Authors
Rochelle Choenni
Evi Hendrikx
L.M. Beinborn
Date (dd-mm-yyyy)
10-04-2019
Title
On the Evaluation of Structural Similarity between Brain and Computational Models
Publication Year
2019
Document type
Poster
Abstract
Much progress has been made in representing the meaning of linguistic units such as words, phrases, and sentences, thanks to powerful neural network architectures [1], [2]. These computational representations are high-dimensional vectors that are learned such that units with similar meanings lie closer together in the vector space. As a result, they capture the meaning of concepts without being explicitly informed about what these concepts entail. These computational representations have improved performance on a variety of downstream NLP tasks. The question to what extent they resemble semantic representations in the human brain has drawn the attention of researchers who aim to gain insight into human language processing. Two main approaches for comparing computational representations with human brain activation are encoding/decoding experiments and representational similarity analysis. Both methods attempt to evaluate whether computational models and the brain use similar organizational principles to process language by testing whether their semantic representations exhibit similar patterns. Finding a correlation between the structures of computational and brain representations may contribute to the linguistic, computational, and cognitive sciences. Computational models can operationalize and test cognitive hypotheses about human language understanding; conversely, a better understanding of the human brain enables us to derive more cognitively plausible models [3]. A current issue is that there is no standardized way to evaluate the analysis results. We therefore compared different evaluation methods, using state-of-the-art deep learning models, and tested them on a number of fMRI datasets to allow for a robust comparison. We found that different methods can lead to vastly different results: for example, the way in which pairwise accuracy is defined can make a difference of 30% in accuracy. Such inconsistencies could lead to misleading conclusions about the structural similarity between computational and brain representations. It is therefore important to make evaluation procedures more transparent.
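The abstract does not spell out which competing definitions of pairwise accuracy produce the 30% gap. As an illustration only, the sketch below contrasts two variants that both appear in the brain-decoding literature: a "sum" variant in the style of the 2-vs-2 test of Mitchell et al. (2008), and a "strict" variant that requires each item in a pair to be matched correctly on its own. The function names, the use of cosine distance, and the toy data are assumptions for this sketch, not the authors' implementation.

import numpy as np
from scipy.spatial.distance import cosine

def pairwise_accuracy_sum(preds, targets):
    # "Sum" variant: a pair (i, j) counts as correct when the total
    # distance of the matched assignment beats the total distance of
    # the swapped assignment (2-vs-2 style test).
    correct = total = 0
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            matched = cosine(preds[i], targets[i]) + cosine(preds[j], targets[j])
            swapped = cosine(preds[i], targets[j]) + cosine(preds[j], targets[i])
            correct += matched < swapped
            total += 1
    return correct / total

def pairwise_accuracy_strict(preds, targets):
    # "Strict" variant: a pair counts as correct only if BOTH matched
    # distances individually beat their swapped counterparts.
    correct = total = 0
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            ok = (cosine(preds[i], targets[i]) < cosine(preds[i], targets[j]) and
                  cosine(preds[j], targets[j]) < cosine(preds[j], targets[i]))
            correct += ok
            total += 1
    return correct / total

# Toy usage on synthetic data (hypothetical): the two scores need not agree.
rng = np.random.default_rng(0)
targets = rng.normal(size=(20, 100))
preds = targets + rng.normal(scale=1.5, size=(20, 100))
print(pairwise_accuracy_sum(preds, targets), pairwise_accuracy_strict(preds, targets))

Because a strictly correct pair is always also correct under the sum criterion (two individually smaller distances also have a smaller sum), the sum variant can never score lower than the strict one, which illustrates how two equally reasonable definitions can diverge by tens of percentage points on the same predictions.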
Permalink
https://hdl.handle.net/11245.1/8c3cedba-62d6-437f-81ea-d9bab10b715f