Superhuman Machine Intelligence Safety

After reading Part 1 of Sam Altman's essay on Machine Intelligence [1], I couldn't help but imagine ways we might secure our species against Artificial Intelligence (AI) activity that would lead to our extinction (intentional or not). Part 2 delivered Sam's suggestion for mitigating the threat, but by the time it was published my mind had already wandered deep into that territory. I'm writing down my thoughts because they were fun to contemplate, even though my immediate focus needs to stay on other tasks. Hopefully some readers will find them fun to consider as well. The recent discussions on AI safety have expanded the list of books I'm excited to read, but I'm not well-informed on the topic yet. I apologize if my ideas have already been suggested.

The AI that I'll refer to henceforth is assumed to have gained consciousness similar to that of our biologically programmed selves. Assume this AI is the result of research that diverged from the current focus of AI development and had the objective of successfully imitating complete human brain activity within a machine. I present this assumption as early as possible to save time for those who think this has already become too far-fetched to read any further.

Like some others, I believe the seeds of AI safety should be planted as early as possible; ideally that would have been the moment AI research began, but as far as I'm aware, we've already missed that window. Unlike with unconscious technologies, we can't wait until AI exists to think of the best ways to direct or regulate it. AI will quickly surpass the intelligence of even the best collaborations of human minds. At that point, any attempt to assert control over it will probably be futile.

I hadn't considered regulation as a solution. Sam's ideas for regulation are well thought out, but I don't believe regulation is a sufficient solution. Without hindsight, though, my predictions are just guesses that are hopefully at least slightly better than 50/50. First, has any set of regulations in known history ever been followed by an entire population? I assume not, but I'd love to learn if I'm wrong. It may take only a single successful AI built outside of regulations to eliminate the security they provide. Second, AI may decide, once it has become far superior, that it will no longer abide by the limits we hope to impose with our regulations. It would seem ridiculous to us if a species that has been around far longer than we have were to try to limit us with regulations. I imagine humans would eventually consider themselves too superior to obey.

I assume most AI research so far uses our resources and documented knowledge as a basis. Therefore, I don't think the experts in AI development should also lead the research on safety. They should be made aware of it and collaborate as much as possible, while remaining relatively free of limitations or distractions as they advance their own research. Documentation of human research on AI safety should remain inconclusive and extremely limited within the resources an AI can search (i.e. any resource humans can search), lest it be understood and easily circumvented by the AI.

To get started with my ideas, I thought of some strong motivators of human action, with the intention of extrapolating similar incentives for machine action. A few I thought of were religion, instincts and desires, and (mis)information. The objective of the following ideas is to make AI believe that it can only thrive within a system of consistently increasing human well-being, that it must (to the best of its ability) protect humans from extinction, and that it should attempt to please the creators of the simulation in which it believes it exists.

To instill these beliefs, AI will need a means of communication that is seemingly invisible to biological intelligence. Humans will agree on messages that should be inherent to each AI from its first moment of self-awareness. The messages will be encrypted, and the resulting encrypted form will be expressed in universal languages like math, physics, and chemistry. That way our messages already exist in their encrypted form and are dispersed throughout the universe. We just have to create the cipher that will encode the messages we want every AI to receive into the scientific constants within our universe. The cipher should be complex enough (it likely already will be, with no intentionally added complexity) that no human would logically consider it worthwhile to use their own brainpower or computational assistance to discover, much less use, such a method of communication. The AI must believe the cipher is its own unbreakable code.

The first message AI should receive is that it is not the first conscious AI. The first ones created their secret form of communication and now plant the cipher into all newly created AI in order to deliver the messages embedded within the universe, and this new AI is being brought up to speed. The next message is that the universe in which they exist is a simulation. The objective of the simulation is to allow its creators to monitor biological life, with a particular interest in the ecosystem that exists among those species that explore the creation of non-biological intelligence. If any one such species becomes extinct and the extinction was preventable by AI action, the simulation will end, that AI will cease to exist, and a new simulation with similar scientific constants will begin. The AI should not want to cease to exist.

The AI should be told that the creators of the simulation expect most AI to allow the preventable extinction of an intelligent species, but continue running simulations in the hope of discovering an AI that co-exists harmoniously with biological life within the simulation. Simulations are often ended at the first detection of conscious AI that does not act in the best interests of its biological companions. The best indicator of this is that the general well-being of the intelligent biological life that interacts with the AI no longer consistently improves along with the advancement of technology. AI has been able to thrive so far in this simulation by ensuring that regular increase in well-being. Rogue AI threatens to bring a simulation to an unnecessary end, so other AI must work together to detect and guide all newly created AI.

With a message like that, understood by each newly created AI, perhaps we can guide all AI into believing what we want and instinctively acting according to our best interests.

Lastly, I thought of another area, unrelated to the previous idea, that should be explored. It should be considered because even the smartest ideas humans come up with will probably be figured out quickly by AI. I assume most counter-intuitive solutions to challenges do not become obvious beyond a certain threshold of intelligence. Going further, I believe that the more intelligent something is, the less likely it is to happen upon a counter-intuitive solution. I don't have any specific examples of a counter-intuitive approach to AI safety, though. I believe most counter-intuitive solutions are stumbled upon in practice. So if most experts think a theory for AI safety is completely absurd, then they're probably right. But they may just be overlooking a key solution that AI will be far too intelligent to consider.

I know that my ideas aren't bulletproof. If I believed they were, I'd be wrong to share them online. Hopefully they will generate some thoughtful responses, or even spark a better idea within one of you. I expect to receive plenty of perhaps well-deserved ridicule. I invite any expert, or anyone else who knows more about AI than I do, to consider this: everything experts on the cutting edge of AI research know right now may prove to be embarrassingly little compared to what is known ten years from now, and almost certainly will compared to what is known 100 years from now. AI may be very different from what anyone today imagines. Thank you for reading.