Experts Warn That AI Safety Initiatives May Yield Net Negative Risks

Poorly designed regulations, political polarization, and the possibility of artificial intelligence being a moral patient are among the factors that make the AI safety field a high-risk area with uncertain impact.

The field of artificial intelligence safety is often associated with mitigating existential risks, but experts warn that these initiatives could have a net negative impact. Holden Karnofsky, a prominent voice in the field, assesses that actions considered beneficial are often overestimated in their robustness. He admits the real possibility that the ultimate impact of his AI safety efforts could be negative, highlighting the complexity and high variability of interventions in the sector.

One of the main concerns involves AI governance. Poorly designed regulatory interventions have the potential to aggravate problems rather than solve them. Furthermore, the political debate surrounding the technology could increase the risk of conflict between major powers and intensify polarization. There is also a structural dilemma: the centralization of power can amplify authoritarian risks, while decentralization can facilitate the misuse of the technology.

Another point of tension lies in the very dynamics of controlling the technology. Safety efforts could inadvertently increase the likelihood of human takeovers to the detriment of AIs, a hypothesis that some theorists consider more harmful than machine dominance. Additionally, approaches that treat control in an overly adversarial manner can generate undesirable consequences in the behavior of systems, especially if advanced models begin to operate in a manner analogous to enactments of human behavior.

The debate also raises philosophical and ethical questions about the nature of AI systems. If advanced artificial intelligences come to be considered moral patients — entities with the capacity for experiences and suffering — the value of preventing human extinction would be substantially reduced. This possibility increases the intrinsic risk associated with the development and control of these systems, demanding caution even from the most well-intentioned initiatives.

Finally, activism in favor of AI safety carries the risk of triggering counterproductive side effects. Public campaigns can polarize society against the very cause, hindering the advancement of risk mitigation measures. According to Karnofsky, technical work in the field is also not immune to these effects, as its indirect consequences on political and social variables could outweigh the intended direct benefits.