The highest probability sequences of most neural language generation models tend to be degenerate in some way, a problem known
as the inadequacy of the mode. While many approaches exist for tackling particular aspects of the problem, such as dealing
with overly short sequences or excessive repetition, explanations of why it occurs in the first place are rarer and do not agree
with each other. We believe none of the existing explanations paint a complete picture. In this position paper, we want to
shed light on the sheer complexity of the modelling task and the problems that generalising to previously unseen contexts
brings. We argue that our desire for models to generalise to contexts they have never observed before is exactly what leads to
the spread of probability mass and inadequate modes. While we do not claim that adequate modes are impossible, we argue that they
are not to be expected either.