Authors
Angela van Sprang
Erman Acar
Willem Zuidema
Date (dd-mm-yyyy)
2025-11-07
Title
Interpretability for Time Series Transformers Using a Concept Bottleneck Framework
Publication Year
2025
Document type
Poster
Abstract
Mechanistic interpretability focuses on reverse engineering the internal mechanisms learned by neural networks. We extend this focus and propose to mechanistically forward engineer models using a framework based on Concept Bottleneck Models. In the context of long-term time series forecasting, we modify the training objective to encourage the model to develop representations that are similar to predefined, interpretable concepts, using Centered Kernel Alignment. This steers the bottleneck components to learn the predefined concepts, while allowing other components to learn other, undefined concepts. We apply the framework to the Vanilla Transformer, Autoformer, and FEDformer, and present an in-depth analysis on synthetic data and on a variety of benchmark datasets. We find that model performance remains largely unaffected, while interpretability improves substantially. Additionally, we verify the interpretation of the bottleneck components with an intervention experiment using activation patching.
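The abstract describes aligning bottleneck representations with predefined concepts via Centered Kernel Alignment (CKA). As a rough illustration of the similarity measure involved, below is a minimal sketch of *linear* CKA between two activation matrices; the function name and the choice of the linear-kernel variant are assumptions, and the poster's training objective may use a different kernel or formulation.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n, d1) and Y (n, d2).

    Returns a similarity in [0, 1]; higher means the two sets of
    representations are more similar up to rotation and isotropic scaling.
    NOTE: illustrative sketch only, not the authors' implementation.
    """
    # Center each feature column so the implicit Gram matrices are centered.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear-kernel CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = (np.linalg.norm(X.T @ X, ord="fro")
                   * np.linalg.norm(Y.T @ Y, ord="fro"))
    return numerator / denominator
```

In a concept-bottleneck training objective, a term like `1 - linear_cka(activations, concept_targets)` could be added to the forecasting loss to pull a component's representations toward a predefined concept; whether this exact combination is used here is not stated in the abstract.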
Permalink
https://hdl.handle.net/11245.1/8bc9261e-4069-46b1-b237-6dddc3a83488