Authors
Angela van Sprang
Erman Acar
Willem Zuidema
Date (dd-mm-yyyy)
2025-11-07
Title
Interpretability for Time Series Transformers Using a Concept Bottleneck Framework
Publication Year
2025
Document type
Poster
Abstract
Mechanistic interpretability focuses on reverse engineering the internal mechanisms learned by neural networks. We extend this focus and propose to mechanistically forward engineer models using a framework based on Concept Bottleneck Models. In the context of long-term time series forecasting, we modify the training objective to encourage the model to develop representations that are similar to predefined, interpretable concepts, using Centered Kernel Alignment. This steers the bottleneck components to learn the predefined concepts, while allowing other components to learn other, undefined concepts. We apply the framework to the Vanilla Transformer, Autoformer, and FEDformer, and present an in-depth analysis on synthetic data and on a variety of benchmark datasets. We find that model performance remains largely unaffected, while interpretability improves substantially. Additionally, we verify the interpretation of the bottleneck components with an intervention experiment using activation patching.
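The abstract describes aligning bottleneck representations with predefined concepts via Centered Kernel Alignment (CKA). As a rough illustration of the similarity measure involved, below is a minimal sketch of *linear* CKA between two activation matrices; the function name and the choice of the linear-kernel variant are assumptions, and the poster's training objective may use a different kernel or formulation.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n, d1) and Y (n, d2).

    Returns a similarity in [0, 1]; higher means the two sets of
    representations are more similar up to rotation and isotropic scaling.
    NOTE: illustrative sketch only, not the authors' implementation.
    """
    # Center each feature column so the implicit Gram matrices are centered.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear-kernel CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = (np.linalg.norm(X.T @ X, ord="fro")
                   * np.linalg.norm(Y.T @ Y, ord="fro"))
    return numerator / denominator
```

In a concept-bottleneck training objective, a term like `1 - linear_cka(activations, concept_targets)` could be added to the forecasting loss to pull a component's representations toward a predefined concept; whether this exact combination is used here is not stated in the abstract.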
Permalink
https://hdl.handle.net/11245.1/8bc9261e-4069-46b1-b237-6dddc3a83488