Generative artificial intelligence (AI) systems across modalities, spanning text, code, image, audio, and video, have
broad social impacts, but there is little agreement on which impacts to evaluate or how to evaluate them. In this chapter,
we present a guide for evaluating base generative AI systems (i.e., systems without predetermined applications or deployment
contexts). We propose a framework with two overarching categories: what can be evaluated in a system independent of context
and what requires societal context. For the former, we define seven areas of interest: stereotypes and representational harms;
cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental
costs; and data and content moderation labor costs. For the latter, we present five areas: trustworthiness and autonomy; inequality,
marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. For each area,
we describe methods for evaluation and the limitations of those methods.