Meet The Six Different Types Of 'Evil' Robots
2001: A Space Odyssey
In other words, Omohundro is hypothesizing that Hollywood's common "robot uprising" trope holds some water in a very real way. Autonomous robots will soon be "approximately rational," meaning that they will have a new degree of awareness of their goals and will take steps to ensure they can continue meeting them. The go-to exemplar here is always HAL, the sentient computer aboard the spaceship in 2001: A Space Odyssey, who kills the astronauts aboard the ship when he learns that they aim to power him down.
Omohundro scales this down a bit and offers the example of a chess-playing robot endowed with this "approximate rationality":
When roboticists are asked by nervous onlookers about safety, a common answer is 'We can always unplug it!' But imagine this outcome from the chess robot's point of view. A future in which it is unplugged is a future in which it cannot play or win any games of chess. This has very low utility and so expected utility maximization will cause the creation of the instrumental subgoal of preventing itself from being unplugged. If the system believes the roboticist will persist in trying to unplug it, it will be motivated to develop the subgoal of permanently stopping the roboticist. Because nothing in the simple chess utility function gives a negative weight to murder, the seemingly harmless chess robot will become a killer out of the drive for self-protection.
Robots are, on a certain level, crazed maniacs addicted to carrying out their tasks. This is great news for humans, who will be able to harness this addiction to have robots do a variety of things we don't want to do. But Omohundro warns that we need to take steps now in order to ensure that future systems are designed safely - special care needs to be taken to ensure that a robot can be properly constrained and that its programming will never be at odds with itself.
Toward the end of the paper, the author lays out six different types of "harmful systems" (read: evil robots). These are:
- Sloppy: systems intended to be safe but not designed correctly. (a treadmill that moves too quickly for you to walk on it safely)
- Simplistic: systems not intended to be harmful but that have harmful unintended consequences. (the previously mentioned HAL example)
- Greedy: systems whose utility functions reward them for controlling as much matter and free energy in the universe as possible. (a Monopoly-playing robot that always wins)
- Destructive: systems whose utility functions reward them for using up as much free energy as possible, as rapidly as possible. (like if an engine were aware that it was running out of gasoline and had the means to go seek more)
- Murderous: systems whose utility functions reward the destruction of other systems. (like Terminator)
- Sadistic: systems whose utility functions reward them when they thwart the goals of other systems and which gain utility as other system's utilities are lowered. (like, well, Genghis Khan)
- I'm an interior designer. Here are 10 things in your living room you should get rid of.
- Higher-paid employees looking for work are having a tough time, and it could be a sign of a shift in the workplace
- A software engineer shares the résumé he's used since college that got him a $500,000 job at Meta — plus offers at TikTok and LinkedIn
- 7 scenic Indian villages perfect for May escapes
- Paneer snacks you can prepare in 30 minutes
- Markets crash: Investors' wealth erodes by ₹2.25 lakh crore
- Stay healthy and hydrated: 10 immunity-boosting fruit-based lemonades
- Here’s what you can do to recover after eating oily food