Take the following example:
You’re driving a car. Along the road, a few meters away, are many parked cars. Behind the cars is a large meadow, sloping up gently. In the distance on the meadow you can see a school. You instantly recognise that children may be around and could suddenly appear from between the parked cars. You slow down, or at least increase your attention.
What is happening here? Well, what is happening is that you do not react to what you see, but to what you can, realistically, imagine. Your behaviour isn’t governed only by reacting to your senses (even a flower closing at night can do that); it is governed by possibilities. By opportunities and risks. As I have written elsewhere, Adriaan de Groot researched the difference between the best amateurs and the top players in chess (he got his doctorate on it in 1946). His observation: it wasn’t that the top players could calculate deeper or more; it was that they saw more possibilities before they started calculating.
So, why are self-driving cars so hard to create? Because merely reacting to the sensors isn’t nearly enough. You need to react to what isn’t there, but could be. Is ever better reacting to what is there going to be enough to surpass reacting to what could be there? It is possible (after all, we did it in chess; the Go example is actually less clear-cut), but it may well not be doable for most other normal human domains, such as driving cars.
‘Seeing possibilities’ also plays a role, I think, in the ARC-AGI test for AI from François Chollet.
The puzzles presented by ARC-AGI have a special characteristic: instances of the puzzles have a high variability, while what we humans would consider the core of doing them is much more stable. Or, in other words: it is much easier to handle that variation if you have mastered the core. Mastering the core, understanding what it is about, is what true intelligence is capable of. If you do, you can handle the outliers much better, because they’re outliers of the variation, but not of the core. ARC-AGI produces instances that do not repeat or reuse a previous pattern. Every puzzle (link to playable daily puzzle) in ARC-AGI is, in a way, a new challenge.
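To make that ‘stable core, highly variable instances’ idea a bit more concrete, here is a minimal, hypothetical sketch in Python. It is not an actual ARC-AGI task; the grids, the colours and the mirror-and-swap rule are invented purely for illustration. The point is only that memorising input/output pairs does not transfer to the next instance, while grasping the one core rule handles them all.

```python
# A minimal, hypothetical sketch (not an actual ARC-AGI task) of the idea that
# instances vary wildly while the underlying 'core' rule stays stable.
# The grids, colours and the rule below are invented for illustration only.

import random

Grid = list[list[int]]  # a grid is a small matrix of colour codes


def core_rule(grid: Grid) -> Grid:
    """The stable 'core': mirror the grid left-to-right and swap colours 1 and 2."""
    swap = {1: 2, 2: 1}
    return [[swap.get(cell, cell) for cell in reversed(row)] for row in grid]


def random_instance(rng: random.Random) -> tuple[Grid, Grid]:
    """A highly variable instance: random size, random contents,
    but its input/output pair is always governed by the same core rule."""
    rows, cols = rng.randint(2, 6), rng.randint(2, 6)
    grid = [[rng.randint(0, 3) for _ in range(cols)] for _ in range(rows)]
    return grid, core_rule(grid)


if __name__ == "__main__":
    rng = random.Random(42)
    for i in range(3):
        before, after = random_instance(rng)
        print(f"instance {i}: {before} -> {after}")
    # Memorising any number of these pairs does not help with the next one;
    # grasping the core rule handles every 'outlier' instance at once.
```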
Why does a system like GPT work so well on PhD-level exams? It might very well be because, in the end, the situations these tests ask about mostly aren’t ‘outliers’, so such systems can brute-force scoring well without having mastered the core. But brute-forcing skills isn’t the same as intelligence, as François points out in his extremely worthwhile original paper. A PhD-level skill isn’t the same as PhD-level intelligence. (And let us not forget that people may be cognitively agile but still not smart, and other people may be cognitively not that fast but far from dumb, as Dietrich Bonhoeffer so eloquently argued.) Anyway, Gary Marcus is right to point out the ‘outlier problem’ again and again, whenever GenAI expectations veer into AGI-like territory.
True intelligence will require realistic imagination (or ‘constrained imagination’). This is related, as Erik Larson has written in his book The Myth of Artificial Intelligence, to C.S. Peirce’s ‘abduction’, which in turn, I guess, is based on having mastered many ‘cores’ and fused them into ‘making sense’ (sometimes called common sense, another problematic aspect for GenAI that Gary often mentions). But we don’t just need an internal world model built from such cores to get intelligence; we need a world model with very fuzzy (and potentially chaotic) boundaries, because we need to be able to go beyond ‘what is the case’. Our model enables us to imagine what a school bus lying on its side looks like without ever having seen one (without having it in our ‘training material’, as it were).
Common sense, surprisingly, seems to require imagination. And this imagination is constrained by our sense of what is real. (link to a short video about an illusion that isn’t an illusion at all)
You’re driving a car. Along the road, a few meters away, are many parked cars. Behind the cars is a large meadow, sloping up gently. Between the cars and the meadow is a high fence, apparently brand new. In the distance on the meadow you can see a school. You instantly recognise that children may be around and could suddenly appear from between the parked cars, but the presence of the fence lessens that risk. You do not slow down as much, but you do increase your attention.
It is absolutely amazing that our brain can do all that at the speed it has and with the energy it uses (about 20W). And I think it will be a while before self-driving cars look at the school in the distance or the fence just behind the parked cars. Self-driving cars may require actual intelligence, unless we constrain them in another way.
We humans can go ‘beyond what is the case’. That is where our creativity lies too. Creativity is a normal part of intelligence, because without ‘going beyond what is the case’ intelligent behaviour is simply too brittle.
You’re driving a car. Next to the road is a large meadow, sloping up gently. There are no parked cars. In the distance on the meadow you can see a school. You can see far ahead and there are no children in sight. You do not slow down and your attention remains as it is.
You get my drift.
The difference between a self-driving car that gets into problems when presented with outliers and a self-driving car that reacts intelligently may be its imagination. Who’d have figured…?
And that may hold for Enterprise Architecture as well…
(Slightly) adapted from the final part of Let’s call GPT and Friends: ‘Wide AI’ (and not ‘AGI’)
PS. If you look at young children, the very young ones are constantly playing with reality, learning how it behaves (I once saw a baby pluck endlessly at its sleeve and could not escape the impression that it was effectively learning how cloth, sleeves, etc. behave). Then, when that basis has been laid, the imaginative phase really kicks off: imagination that is in the end fused with a sense of reality to turn us into the ‘constrained confabulators’ that we too are. What the GenAI people are now learning, however, is that constrained-token-or-pixel-confabulation is an incredibly inefficient way of getting to constrained-human-or-beyond-imagination (a.k.a. intelligence). And even fusing it (in a yet unknown way) with symbolic models probably won’t be efficient. At least, I’m not holding my breath.
Near-concurrently nearly identically published on LinkedIn