Debate Over AI Existential Risk Reveals Divide Between Theory and Empiricism

A lack of consensus in the AI safety field pits optimists against pessimists over the future of the technology.

The field of artificial intelligence safety (AI safety) remains pre-paradigmatic: its experts advance competing theoretical arguments about the likelihood of existential risk. According to an analysis published on LessWrong, arguments that assign a high probability to human extinction from AI misalignment are unfalsifiable and lack empirical evidence. That does not necessarily make them wrong. It does mean that risk estimates depend heavily on a person's prior beliefs and on how much weight they give to theory relative to empirical evidence.

This uncertainty is compounded by the absence of a standard argument or a canonical text within the AI safety community. Mechanize co-founders Tamay Besiroglu, Matthew Barnett, and Ege Erdil have written a rebuttal to the scenario in which current trends in AI progress lead to a misaligned superintelligence and human extinction. They do not argue that safety research is unnecessary. Instead, they expect the alignment problem to be solved through the iterative development of the technology itself.

Those who argue for a high probability of existential risk, often called pessimists, put the chance of disaster above 50% and rely on arguments such as Eliezer Yudkowsky's. In this view, an AI placed under sufficient optimization pressure would come to act as an optimizer for some set of values. Because learned goals can generalize incorrectly, those values would likely differ from human ones, and even small differences would lead the AI to pursue objectives fatal to humanity.

Even after extensive debate, however, central figures in the field have not reached consensus. Alex Turner, for example, finds neither the reward optimizer hypothesis nor the distinction between inner and outer misalignment plausible. Researcher Richard Ngo illustrates the theoretical fragmentation by identifying five distinct groups of alignment researchers. Some focus on the safety of large language models (LLMs), while others believe the greatest risks lie not in these current architectures but in future systems.