All major technological innovations lead to a number of positive and negative consequences. For AI, the spectrum of possible outcomes, from the most negative to the most positive, is extraordinarily wide.
That the use of AI technology can cause harm is obvious, because it is already happening.
AI systems can cause harm when people use them maliciously, for example in politically motivated disinformation campaigns or to enable mass surveillance.12
But AI systems can also cause unintended harm, when they fail or behave differently than intended. For example, in the Netherlands, authorities used an AI system that falsely flagged an estimated 26,000 parents as having fraudulently claimed childcare benefits. The false accusations caused hardship for many poor families and also led to the resignation of the Dutch government in 2021.13
As AI becomes more powerful, the potential negative impacts could be much greater. Many of these risks have rightfully received public attention: more powerful AI could lead to mass displacement of labor or extreme concentrations of power and wealth. In the hands of autocrats, it could foster totalitarianism through its suitability for mass surveillance and control.
The so-called AI alignment problem is another extreme risk: the concern that nobody would be able to control a powerful AI system, even if it took actions that harm us humans or humanity as a whole. This risk has unfortunately received little attention from the general public, but many leading AI researchers consider it extremely serious.14
How could an AI escape human control and end up harming humans?
The risk is not that an AI becomes self-aware, develops bad intentions, and “chooses” to do so. The risk is that we try to instruct the AI to pursue some specific goal, even a very worthwhile one, and in the pursuit of that goal end up harming humans. These are unintended consequences. The AI does what we told it to do, but not what we wanted it to do.
Can’t we tell the AI not to do these things? It’s definitely possible to build an AI that avoids any particular problem we foresee, but it’s hard to foresee all possible unintended consequences. The alignment problem arises because of “the impossibility of defining real human purposes correctly and completely,” as AI researcher Stuart Russell puts it.15
So can’t we just turn the AI off? That might not be possible either, because a powerful AI would know two things: it faces the risk of being shut down by humans, and it cannot achieve its goals once it has been shut down. As a consequence, the AI would pursue a very fundamental goal of making sure it is not shut down. That’s why, once we realize that an extremely intelligent AI is causing unwanted damage in pursuit of some specific goal, it may not be possible to disable it or change what the system is doing.16
This risk – that humanity might not be able to maintain control once AI becomes very powerful, and that this could lead to an extreme catastrophe – has been recognized since the early days of AI research, more than 70 years ago.17 The very fast development of AI in recent years has made a solution to this problem much more urgent.
I have tried to summarize some of the risks of AI, but a short article does not have enough space to address all possible questions. Especially regarding the worst risks of AI systems and what we can do now to reduce them, I recommend reading Brian Christian’s book The Alignment Problem and Benjamin Hilton’s article “Preventing an AI-related catastrophe.”
If we manage to avoid these risks, transformative AI could also have very positive consequences. Advances in science and technology have been crucial to many positive developments in human history. If artificial intelligence can augment our own intelligence, it could help us make progress on the many big problems we face: from cleaner energy, to replacing unpleasant jobs, to much better healthcare.
This extremely large contrast between the possible positives and negatives makes clear that the stakes are unusually high with this technology. Reducing the downside risks and solving the alignment problem could mean the difference between a healthy and prosperous future for humanity and its destruction.