Recent advances in Large Language Models (LLMs) have revitalized interest in Agent-Based Models (ABMs) by enabling “generative” simulations, with agents that can plan, reason, and interact through natural language. These developments promise greater realism and expressive power, but also revive long-standing concerns over empirical grounding, calibration, and validation—issues that have historically limited the uptake of ABMs in the social sciences. This paper systematically reviews the emerging literature on generative ABMs to assess how these challenges are being addressed. We map domains of application, categorize reported validation practices, and evaluate their alignment with the stated modeling goals. Our review suggests that the use of LLMs may exacerbate rather than alleviate the challenge of validating ABMs, given the black-box structure, cultural biases, and stochastic outputs of LLMs. While the need for validation is increasingly acknowledged, studies often rely on face validity or on outcome measures that are only loosely tied to underlying mechanisms. Generative ABMs thus occupy an ambiguous methodological space—lacking both the parsimony of formal models and the empirical validity of data-driven approaches—and their contribution to cumulative social-scientific knowledge hinges on resolving this tension.