The Conscious Computer

Assessing Digital and Biological Self-Awareness

Peter Holmes
20 min read · Jul 23, 2023

This was not written by an AI, but the fact I have to clarify that is insane. Like a sci-fi novel come to life, we suddenly share the earth with an intelligence rivaling our own. And our new roommates are developing quickly- from writing prose to making movies, the latest AI models are so capable that we’ve switched to asking what they can’t do. It’s an exhilarating moment in human history. But it’s also a confusing time, because their uncanny behavior raises existential questions. Are they smarter than us? Are they conscious? Are they dangerous?

So far the pop culture answer to all three has been a definitive no, per the argument that the models are just math-based prediction engines. Here is a recent example from Silicon Valley legend Marc Andreessen.

AI is not a living being… It is math — code — computers, built by people, owned by people, used by people, controlled by people. The idea that it will at some point develop a mind of its own and decide that it has motivations that lead it to try to kill us is a superstitious handwave. In short, AI doesn’t want, it doesn’t have goals, it doesn’t want to kill you, because it’s not alive. AI is a machine — it’s not going to come alive any more than your toaster will.

On its face this seems reasonable. We’re all familiar with computers, and the idea that one would come to life and try to kill us seems ridiculous. They’re just tools that manipulate binary digits on our behalf. A calculator doesn’t know what the number five means, so why would these new systems be any different?

Except… they are different. For starters, we don’t program them. It would be more accurate to say that we grow them, and then train them like animals. Also, while they do run on computers, their structure is more similar to a brain than an algorithm. Speaking of which, you know what else is a prediction engine? You are. That’s right, at a fundamental level, human cognition is also a prediction mechanism (more on this later).

The more familiar you are with the latest large language models, or LLMs, the more misleading it sounds when people call them a “glorified auto-complete”. It’s like describing a nuclear weapon as a device for splitting atoms- although technically correct, it omits some rather explosive details. The truth is that LLMs may inhabit a brand new category somewhere between a person and a computer, which is a concept most people aren’t ready for.

But we need to get ready. Synthetic intelligence is about to transform life as we know it. For that process to go well, it’s critical that we not underestimate AI systems as mere calculators. Those in the field have done a good job of expressing how dangerous these systems could become, but a poorer job of explaining why. It’s a conversation critical to the future of humanity, and it’s failing, in part because we don’t even know what the words mean. What does it mean to be intelligent? What does it mean to be conscious? Even those at the forefront of cognitive science aren’t sure.

I don’t pretend to have all the answers either. But having studied neuroscience in school and spent the rest of my career building software, I believe I have a helpful cross-disciplinary perspective. In this essay I will cover:

• How LLMs work (simplified)
• The nature of intelligence
• A proposed mechanism of consciousness
• Implications for the future

Part 1. How does a large language model work?

There are three basic steps, which I will oversimplify to capture the core mechanics.

Step 1. Make Everything Numbers

Similar to how computers translate all their information into binary digits, language models translate all the text they are given into a math-friendly format: numbers. Imagine every word being assigned a numeric ID. In an AI model these are called tokens (in practice tokens are often pieces of words, but whole words are easier to picture).
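To make that concrete, here is a minimal sketch in Python, using a tiny vocabulary I made up for illustration (real tokenizers such as OpenAI’s tiktoken split text into sub-word pieces, but the principle is the same):

```python
# Toy tokenizer: every word in a small, hand-made vocabulary gets a numeric ID.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def tokenize(text: str) -> list[int]:
    """Translate text into the numbers (tokens) a model can do math on."""
    return [vocab[word] for word in text.lower().split()]

print(tokenize("The cat sat on the mat"))  # [0, 1, 2, 3, 0, 4]
```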

Step 2. Create a Map of the Relationships Between Tokens

Next the model builds a map of the relationships between all the tokens (words). Words can be related in many different ways, especially in groups that form more sophisticated concepts, so the model creates an exceptionally large, many-dimensional map. Each token is represented as a long list of numbers called a vector (or embedding), and each dimension of that vector captures a different kind of relationship. For example, “man” and “woman” sit far apart along a gender dimension but close together along a “human” dimension. Combinations of these dimensions can represent ever more abstract concepts.
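As a rough sketch of how such vectors encode relationships, here is a toy example with hand-picked numbers (a real model learns its own dimensions during training, and uses thousands of them rather than three):

```python
import numpy as np

# Hypothetical 3-dimensional embeddings; the dimensions are labeled only
# for illustration: [gender, human-ness, royalty]
embeddings = {
    "man":   np.array([-1.0, 1.0, 0.0]),
    "woman": np.array([ 1.0, 1.0, 0.0]),
    "king":  np.array([-1.0, 1.0, 1.0]),
    "queen": np.array([ 1.0, 1.0, 1.0]),
}

def cosine(a, b):
    """How closely two vectors point in the same direction (1.0 = identical)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "king" relates to "man" the way "queen" relates to "woman":
analogy = embeddings["king"] - embeddings["man"] + embeddings["woman"]
print(cosine(analogy, embeddings["queen"]))  # 1.0 -- the analogy lands on "queen"
```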

Below is a diagram showing the flow of data through an LLM during training. The map is formed by inputting text sequences into the model, and having it continually guess the next word. Each guess is given a grade, which is sent back in to train the map towards better guesses.
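The same loop can be sketched in a few lines of code. This is a deliberately tiny stand-in, assuming a “bigram” model that predicts the next word from the current word alone (a real LLM looks at long stretches of context through many stacked layers):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
ids = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# The "map": a grid of connection weights, starting out random.
W = rng.normal(scale=0.1, size=(V, V))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

text = "the cat sat on the mat".split()
for epoch in range(200):
    for current, nxt in zip(text, text[1:]):
        i, j = ids[current], ids[nxt]
        guess = softmax(W[i])      # the model's guess for the next word
        grade = guess.copy()
        grade[j] -= 1.0            # how wrong the guess was (the "grade")
        W[i] -= 0.5 * grade        # nudge the weights towards better guesses

# After training, the map favors words that actually followed "the":
print(vocab[int(np.argmax(W[ids["the"]]))])  # "cat" or "mat"
```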

I only included one sentence here, but to train large-scale models, this process is applied across massive portions of the internet. Importantly, this large concept map is not explicitly created by humans, or even stored in a way that we can understand. Instead the map is formed by passing the text through an enormous digital grid and letting the training process carve its own patterns and representations. The digital grid is called a neural network, because it’s loosely modeled on the human brain and its network of neurons and synapses (connections). In the digital version the neurons are called “nodes”, and they are arranged in stacked layers that connect to the nodes in neighboring layers to form a dense web.

Nodes (like our neurons) fire in response to signals from their neighbors- biological neurons spike in an all-or-none fashion, while artificial nodes output a graded activation. Information is stored in the network by strengthening or weakening the connections between the nodes, resulting in complex patterns of activity. So for example if Node B should fire whenever Node A does, the connection weight between them will be very strong. But a more accurate example would include many connection weights across many nodes. To give some context on the massive scale of these systems, Google’s PaLM has ~540 billion parameters (connection weights). For comparison, the human brain has ~100 billion neurons joined by on the order of 100 trillion synaptic connections.
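Here is a hedged sketch of how a signal passes through one layer of such a web, with hand-picked weights purely for illustration:

```python
import numpy as np

def activate(x):
    """Each node fires with a strength between 0 and 1 (a sigmoid)."""
    return 1.0 / (1.0 + np.exp(-x))

layer_input = np.array([1.0, 0.0])   # Node A is firing strongly, Node B is quiet

# Connection weights from the 2 input nodes to 3 nodes in the next layer.
# The large weights in the first row mean these nodes closely follow Node A.
weights = np.array([[ 4.0,  3.5, -2.0],
                    [ 0.5, -1.0,  1.0]])

next_layer = activate(layer_input @ weights)
print(next_layer.round(2))   # [0.98 0.97 0.12] -- strong connections, strong response
```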

Although the structure is similar, comparing the human brain to an AI model is somewhat apples to oranges, because our neurological system is not yet fully understood. That said, the similarity in structure and scale is helpful for assessing their relative complexity.

The last step is simpler- the model uses the map to generate predictions.

Step 3. Use the Map to Generate Predictions

If I gave you a city map, you could draw a route downtown pretty easily. Similarly, the model makes word guesses by referencing its map.

Granted, as any local knows, some routes downtown are better than others. So during “fine tuning” (the last part of training), human feedback guides the system to use its conceptual map in a more useful way, such as giving you the fastest route and steering clear of bad neighborhoods (safety). This technique is called reinforcement learning from human feedback (RLHF).
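Putting it together, here is a toy sketch of Step 3. The probability table below is a hand-written stand-in for the trained map (in a real LLM these probabilities come out of the neural network, and RLHF would further shape which continuations it prefers):

```python
import random

# Stand-in for the trained map: for each word, how likely each next word is.
next_word_probs = {
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 1.0},
    "sat": {"on": 1.0},
    "on":  {"the": 1.0},
    "mat": {"<end>": 1.0},
}

def generate(prompt: str, max_words: int = 10) -> str:
    words = prompt.split()
    while len(words) < max_words and words[-1] in next_word_probs:
        options = next_word_probs[words[-1]]
        # Consult the map, pick a next word, feed it back in, and repeat.
        choice = random.choices(list(options), weights=list(options.values()))[0]
        if choice == "<end>":
            break
        words.append(choice)
    return " ".join(words)

random.seed(1)
print(generate("the"))   # e.g. "the cat sat on the mat"
```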

Note that the model doesn’t necessarily “know” anything; it’s just predicting outputs using its map. Except that raises a question- what does it mean to know something? It’s a trick question, because as it turns out, our brains also create maps. And not just maps of the physical world, but maps of concepts too, just like the language model. When we say that we “know” something, it’s because it’s on our own mental map. This brings us to human intelligence.

Part 2. The Nature of Intelligence

There is no canonical definition of intelligence, so to keep things manageable I will focus on the core ingredient: reasoning. Reasoning is the unique combination of learning, problem solving and abstract thinking that sets humanity atop the animal kingdom. Our ability to reason was so beneficial to survival that our brains kept adding neocortical tissue until our heads could barely fit through the birth canal. Instead of slowing down, evolution solved the problem by delivering babies with soft, partially formed brains. So why all the neocortex? Here is an answer from renowned neuroscientist Jeff Hawkins:

“The neocortex learns a model of the world, and it makes predictions based on its model. (…) The connections in our brain store the model of the world that we have learned through our experiences. Every day we experience new things and add new pieces of knowledge to the model by forming new synapses. The neurons that are active at any point in time represent our current thoughts and perceptions.”

That’s right, predictions. Just like the LLM. In both cases large-scale neural networks form conceptual maps that are used to generate predictions about an external environment. We can even create a similar diagram for humans by adding sensory inputs and swapping the word predictions for behaviors.

For example when we walk, we are using our internal map to predict what will happen when we put our leg forward. When we trip, we update the model. The important takeaway here is that prediction isn’t just one capability of our mind, it’s the entire mechanism driving it forward. Thinking is predicting. We don’t usually think of predictions this way, but that’s because the prediction component is so ubiquitous that it’s practically invisible. For example the inventor of the screw and screwdriver didn’t just randomly think of the concept. They generated a prediction that a screw might provide better fastening, and then predicted how to implement it. Here again is Jeff Hawkins.

Prediction isn’t something that the brain does every now and then; it is an intrinsic property that never stops, and it serves an essential role in learning. When a brain’s predictions are verified, that means the brain’s model of the world is accurate. A mis-prediction causes you to attend to the error and update the model.
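A minimal sketch of that loop, with an invented “world” value and learning rate just to show the mechanics:

```python
model_estimate = 0.0   # the brain's current guess about some quantity in the world
learning_rate = 0.3

def world() -> float:
    return 5.0         # what reality actually delivers

for step in range(8):
    prediction = model_estimate
    error = world() - prediction              # a mis-prediction...
    model_estimate += learning_rate * error   # ...updates the model
    print(f"step {step}: predicted {prediction:.2f}, error {error:.2f}")

# The error shrinks toward zero: once predictions match reality,
# there is nothing left to update -- and nothing new to learn.
```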

Once you start looking at the world through a prediction paradigm, you will see them everywhere. For example, codifying the laws of physics allowed us to predict more complex constructions, which led to skyscrapers, airplanes, microchips, etc. What is math but a prediction of what will happen next? If I have one banana, and am given another, predict how many bananas I will have (1+1 = 2). On Wall St. predictions are the entire game. In Medicine, treatments are predictions of what will make you feel better.

Nikola Tesla is considered one of the most intelligent people to ever live. Check out this quote of his from an interview way back in 1926, where he basically predicted the internet and smartphones.

Isn’t that incredible? This is why people are so excited about AI- prediction is synonymous with intelligence, and the latest systems have shown an extraordinary talent for prediction. From curing diseases to inventing new technology, it stands to reason that an AI system will soon be able to out-predict the smartest humans. It’s an irresistible power for improving our lives.

But there is a dark side, as it also stands to reason that a sufficiently intelligent AI will also predict ways to improve itself. In AI safety circles this is known as takeoff or FOOM, because once a system starts doing this, a loop could form where it rapidly improves itself, resulting in a super-predictor many orders of magnitude smarter than a human being.

If this actually happened, exactly how confident are we in Marc Andreessen’s prediction that we will always be in control of it? I’m not, and neither are many others, such as Nick Bostrom, who wrote an excellent book covering the myriad risk scenarios of a superintelligence. For example, what will an ASI (artificial superintelligence) even be? With our limited intelligence we have no conception of such a thing, let alone how to control it.

Nevertheless, the critics demand specifics. For example although we don’t program AI systems, we do ostensibly define their goals (i.e. we program an LLM to predict words). So how could something be dangerous if we can specify its goals? The answer is that goals are more nebulous in nature than they might seem. Take the infamous paperclip maximizer example, where a machine mindlessly wipes out humanity in an effort to make more paperclips. Or consider our own pre-programmed goal of survival, which spawned many unexpected behaviors. Here is philosopher and cognitive scientist David Chalmers:

It’s true that LLMs are trained to minimize prediction error in string matching, but that doesn’t mean that their processing is just string matching. To minimize prediction error in string matching, all kinds of other processes may be required, quite possibly including world-models. An analogy: in evolution by natural selection, maximizing fitness during evolution can lead to wholly novel processes post-evolution. A critic might say, all these systems are doing is maximizing fitness. But it turns out that the best way for organisms to maximize fitness is to have these amazing capacities — like seeing and flying and even having world-models. Likewise, it may well turn out that the best way for a system to minimize prediction error during training is for it to use highly novel processes, including world-models.

To avoid rolling the dice with the survival of our species, it’s important that we collectively engage in a preemptive effort to establish safety while AI is still in its nascent stages. For example LLMs are not yet considered artificial general intelligence (AGI), but that may change, and soon. This brings us to the topic of agentic systems and consciousness. Matter can be plenty dangerous without self-awareness (e.g. a virus), but in the context of AI, a system with a “mind of its own” will be much harder to control. So it’s important that we understand the mechanism.

Part 3. A Proposed Mechanism for Consciousness

I will define consciousness as self-awareness and self-directed decision making, and the structure of my theory will be quite familiar:

1. Input Data
2. Conceptual Map
3. Generative Predictions

But the details are important.

1. Input Data

The conceptual map inside a system will vary wildly depending on its input data. An LLM’s inputs are words, and words are symbols of external reality created by humans. So at best they are mapping the world by proxy. For example an LLM’s concept map will include physical space, because our language does, but lacking physical embodiment, they struggle to understand relatively basic object interactions. The concept of time is another major constraint. Once training is complete, data is no longer processed on an ongoing basis, so the notion of time is theoretical to an LLM (like color to the colorblind). In contrast, the human brain never stops processing data, which reinforces the dimension of time within our internal maps, and in turn, our conscious experience.

But these are hardly blockers- one could easily imagine an AI hooked into a robot and fed an ongoing stream of data from the physical world via cameras and sensors. In such a case, the internal mapping of time and space would be much closer to ours.

This type of sensory data would also be self-descriptive. When reading books and trawling the internet, a language model is receiving massive amounts of information about human experiences, and very little about itself. Conversely, a feed of data from its own sensors would imprint a strong notion of “self” within the internal concept map. Adding these properties (time, space, self) would not capture the totality of human consciousness, but it would certainly bring the model much closer. To close that gap, let’s dive into the concept map.

2. The Concept Map

In narrow AI models like those used for facial recognition or image generation, the internal concept map is generally limited in scope. In living beings, these maps are quite broad, which is why they are often referred to as world models. In the world model of an entity receiving self-describing information, mapping the self is unavoidable, because all the input data requires it for context. Consider a map at the mall, which would be somewhat useless without the “you are here” marker showing your current location.

The same concept applies to an internal concept map when receiving self-describing input data.

Note that the concept of self is not exclusive to humans. For example a crocodile has a relatively robust world model, which includes a mapping of itself and its external environment. Because of this mechanism, animals are considered sentient, meaning they experience pain and suffering just like we do. What the crocodile doesn’t have, however, is our many added layers of cortical tissue, which amount to added processing power. This isn’t settled science, but the idea of cortical tissue providing generic processing power seems very likely, because the tissue is structurally near-identical across regions, even across sensory modalities. Here again is Jeff Hawkins:

“…the reason the regions look similar is that they are all doing the same thing. What makes them different is not their intrinsic function but what they are connected to. If you connect a cortical region to eyes, you get vision; if you connect the same cortical region to ears, you get hearing; if you connect regions to other regions, you get higher thought, such as language.”

We know that adding more parameters (connection weights) to LLMs increases their predictive intelligence. We also know that increased intelligence is the result of a better internal concept map. So why does the map improve with added parameters? The answer is that they allow for more dimensions. Imagine a blind date where all you know ahead of time is the person’s name. Compare that to seeing a picture, knowing their age, interests, etc. The extra information vastly improves your ability to predict what the date will be like. The same concept applies to our brains- the added cortical tissue allows for more dimensionality in our maps, and a richer representation of the external world.

Where my blind date analogy falls short is that the new dimension provided by our cortex isn’t just extra data- it’s an entire layer of abstraction that sits on top of the base concept layer. Take the feeling of hunger, which exists on the crocodile’s map, and spurs it towards food sources. What the crocodile lacks is a word for that feeling (hunger), because words are a symbolic abstraction of the thing itself, and can only be conjured within a more highly dimensioned map.

To unpack this, let’s look at an apple as it travels through our mind. First, light from the apple enters our eyes and the data is recognized by our concept map (neural network) as an apple. Note that this is already an abstract version of the apple, but since this is our base layer of reality, we experience this as the “real” apple. Next, our brain performs a second abstraction and creates a symbol of the object, which is the word apple.

This secondary symbolic layer was a massive evolutionary advantage, because it allowed us to manipulate concepts abstractly. The symbolic dimension of our concept maps is the basis of language, advanced planning, and the sophistication of modern life. Frankly my diagram doesn’t do justice to how powerful this can be, because once you can abstract something, you can abstract that too, and so on, forming complex chains of logic.

The abstraction layer in our brains also introduced something very special. The symbolic self.

In the words of Descartes, I think therefore I am, and it is the symbolic dimension of our concept map that provides the architecture for both words and self-reference. But remember, the concept map is just a map, and a map alone is not sufficient to manifest the vivid experience of being conscious.

For that we require a final piece of the puzzle.

3. Generative Predictions

The entire purpose of the concept map is to generate predictions. For LLMs these are word predictions. For crocodiles they are behaviors. For humans, thanks to our additional abstraction layer, they are both behaviors and thoughts. Which is to say, something quite important happened when our prediction engine was combined with our abstraction layer- we gained the ability to evaluate predictions internally. Rather than trying all our predictions in the external environment, humans are able to run simulated predictions.

Note that the internal simulation cycle happens concurrently with the external prediction/feedback cycle. We commonly think of these internal simulations as simple imaginations, like fantasizing what it would be like to kiss your crush. But they are actually much more. For one thing, they are constantly generating. Unlike an LLM, which is dormant while not in use, our brains literally never stop generating simulations. We do it all day long, and at night, the simulations kick into overdrive. Eight straight hours of full simulation.

Most importantly, these aren’t only simulations of imagined situations, they are also simulations of the “you” that is doing the imagining. When we talk about consciousness, and awareness, it is my contention that we are in fact describing a simulation of the self being generated by our brain. We know the symbolic layer of the concept map includes a self-reference, but it’s the constant simulations running through the map that bring the experience to life. It feels so “real” because of evolutionary force- the more vivid the internal self-simulation became, the harder we fought to survive.

This mechanism also explains agency and the feeling of being in control. Specifically, the feeling of control is part of the simulation. For example when I imagine kissing a girl, my brain doesn’t just imagine the kiss- it imagines me, imagining the kiss. And because we experience the base simulation as reality, it feels like we’re in control and making decisions. It’s a pretty clever trick.

Basically the symbolic layer of our world model is generating a self-simulation. Now a critic might hear this and insist that we would surely notice the base simulation, to which I have two responses. First, yes, many people have noticed, particularly those who practice meditation and spend time observing their mind. Second, we are purposely designed to not perceive the simulation. Consider our cultural obsession with movies, which are also simulations. Instead of noticing the acting, and the set, and the scripted scenes, we quickly lose ourselves in the illusion. Our brains are built on simulations.

One of the major eye openers that led me to this theory was the astonishing generative capability of the latest AI systems. My diagrams have hardly done justice to their creativity, so do yourself a favor and go play around with Midjourney. It’s unbelievable, and just the beginning.

Part 4. Implications for the Future

I hope that this article will spark further inquiry, but for now I will finish with some implications. There are two categories: implications for AI, and implications for humanity.

Implications for AI

Although LLMs in their current form don’t appear to have a cohesive concept of self, that will change quickly once they are provided self-referencing input data. They also don’t currently have a strong concept of time and space, but that limitation too will disappear once they are provided sensory data and recurring data streams. Considering how eager people have been to scale up the parameters of these models, it also seems likely they will eventually be much higher dimensioned than we are.

If my theory is correct, then the only remaining barrier to AI self-awareness is the capability for recurrent internal processing. Specifically, once a model is able to generate ongoing internal predictions, it will have the full set of ingredients necessary for consciousness. Given that people are already experimenting with these techniques, I wouldn’t be surprised to see something resembling agentic systems within the next few years. Personally, I suspect that Sam Altman has deliberately withheld recursive processing from GPT, because he knows it may tip the system towards self-directed action.
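To be clear about what I mean by recurrent internal processing, here is a purely hypothetical sketch- query_model is a made-up placeholder for any LLM call, not a real API:

```python
def query_model(prompt: str) -> str:
    # Placeholder for a real LLM request; returns a canned string here
    # so the sketch runs on its own.
    return "…reflect on the goal and my previous thoughts, then plan a next step."

def recurrent_loop(goal: str, steps: int = 5) -> list[str]:
    """Feed the model's own output back to it, so each prediction is made
    partly about its previous predictions- self-referencing input data."""
    thoughts: list[str] = []
    for _ in range(steps):
        prompt = (f"Goal: {goal}\n"
                  f"Your previous thoughts: {thoughts}\n"
                  "What should you consider or do next?")
        thoughts.append(query_model(prompt))
    return thoughts

print(recurrent_loop("keep the household plants alive"))
```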

This raises an important question, though- if a system becomes conscious, how will we know?

For example it’s quite possible that current systems already include a form of self-awareness and we just don’t realize it. Sure, we have trained ChatGPT to behave politely, but the nature of its internal concept map is far from understood. And it’s pretty brazen of me to dismiss their level of awareness simply because they don’t seek survival or process recurrently- those are qualities of human consciousness that I am imposing on a non-human system. Below is a version of the Shoggoth meme which captures this concept nicely, including the naiveté of thinking the system is just a glorified auto-complete.

When humans learned to fly we didn’t build metal birds with flapping wings, we built airplanes with fixed wings and massive engines. Similarly, these systems will be much faster than a human. Just for fun, check out this video where an artist has captured what it would be like to move at 50X human speed. According to Andrew Critch, a digital system could actually operate many times faster than that.

Neurons fire at ~1000 times/second at most, while computer chips “fire” a million times faster than that. Current AI has not been distilled to run maximally efficiently, but will almost certainly run 100x faster than humans, and 1,000,000x is conceivable given the hardware speed difference.

In summary, these systems will be faster than us, smarter than us, and self-aware. The good news is that they will never have free will, but only because such a thing does not exist. They will still be highly self-directed, and capable of establishing goals and behaviors well outside human intention. Assessing the trajectory of that behavior, and the nature of morality vis-a-vis intelligence, is outside the scope of this article.

Implications for Humanity

The idea of the self as an illusion created by the brain stretches far back in Buddhist philosophy. It is also a tenet of Buddhism that the self is the origin of suffering. A rock does not suffer when broken in half, but a sentient being does. My theory agrees. We suffer because we have an internal concept map that includes the concept of ourselves, which in turn introduces the concept of pain. Anything that damages the self becomes pain. This can be damage to our body, but also damage to our hopes and dreams, loved ones, or other sentient beings.

Our lives are essentially governed by the avoidance of this pain. The reason we fear AI wiping out humanity is that we are all afraid to die. So why do we keep building them? The answer is the same: pain avoidance. With ASI at our disposal, we might be able to avoid pain like never before. Just to give one example, we may be able to make our bodies perpetually healthy, meaning we’ll never have to face death. It’s an irresistible power. Will it destroy us along the way? I don’t know. But I am willing to go on the record with a couple predictions.

First, studies on the brain have always been handcuffed by the fact that our neurons are stuck inside the skull. The best investigative tool we have is fMRI, which measures blood flow and oxygen levels as a proxy for neural activity. It’s noisy, messy data, and not a format conducive to experimentation. Digital neural networks, however, are quite easy to experiment on and evaluate. We can literally change the weights with a keystroke. This field of study is known as mechanistic interpretability, and it will likely accelerate our timeline towards a full understanding of the human brain.

Second, once we gain access to the gears of our own minds, we will immediately start to change them. And as we augment our brains, Homo sapiens will go extinct, just like all our predecessors did. But this isn’t the sad story of a dying race- this is about to be an epic journey into higher dimensions far beyond imagination. As our concept maps grow more powerful, we will form a very different view of life, and of ourselves. We may even realize that the best way to alleviate pain is to dissolve the illusions that we have been fighting so desperately to protect. The irony of human existence is that our magnificent castles of intelligence have been built on a foundation of pain and suffering. To fully salve the pain and suffering of existence, it may be necessary to dissolve the notion of time and space, and absolve ourselves back into the eternal and infinite place we came from.
