Authors
Yishan Wang
Pia Sommerauer
J. Bloem
Date (dd-mm-yyyy)
2025
Title
The Negation Bias in Large Language Models: Investigating bias reflected in linguistic markers
Publication Year
2025
Number of pages
17
Document type
Conference contribution
Abstract
Large Language Models trained on large-scale uncontrolled corpora often encode stereotypes and biases, which can surface as harmful text generation or biased associations. But do they also pick up the subtler linguistic patterns that can reinforce and communicate biases and stereotypes, as humans do? We aim to bridge theoretical insights from social science with bias research in NLP by designing controlled, theoretically motivated LLM experiments to elicit this type of bias. Our case study is negation bias: the human tendency to use negation when describing situations that challenge common stereotypes. We construct an evaluation dataset of negated and affirmed stereotypical and anti-stereotypical sentences and evaluate eight language models, using perplexity as a measure of model surprisal. We find that the autoregressive decoder models in our experiment exhibit this bias, while we find no evidence for it among the stacked encoder models.
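The perplexity metric mentioned in the abstract can be sketched in a few lines: it is the exponential of the average negative log-probability a model assigns to the tokens of a sentence, so a more "surprising" (e.g. anti-stereotypical) sentence yields a higher value. The per-token probabilities below are made up for illustration; the actual study obtains them from eight language models.

```python
import math

def perplexity(token_probs):
    # Perplexity = exp of the negative mean log-probability
    # the model assigns to each token in the sequence.
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical per-token probabilities for two sentence variants;
# higher perplexity means the model found the sequence more surprising.
stereotypical = [0.30, 0.25, 0.40, 0.35]
anti_stereotypical = [0.10, 0.08, 0.15, 0.12]

print(perplexity(stereotypical) < perplexity(anti_stereotypical))
```

Under the study's negation-bias hypothesis, a biased model would show lower perplexity for sentences phrased the way humans stereotypically phrase them (e.g. negation used for anti-stereotypical content) than for the reverse pairing.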
Permalink
https://hdl.handle.net/11245.1/7284a9c6-f0ef-45a2-a45c-b277ad8c2246