1/68 🧵 AI thread part 3!
This time we look at the impact & future of AI:
- What are AGI (general AI) & superintelligence
- What is GPT missing to achieve AGI?
- Real-world impact of LLMs (does it change everything? will it steal your job?)
4/ ... and part 2, which is specifically on GPT and other large language models (LLMs) and their intrinsic limitations (despite being super impressive):
twitter.com/norswap/status/1636389832279887872
5/ I'll harp some more on the limitations in this thread, in particular why I don't think LLMs can achieve AGI by themselves, and why new breakthroughs will be needed, probably even a new paradigm.
But before that, let's talk about what it means to be "intelligent".
6/ "Intelligence" is a poorly defined word. I'd argue that if you consider computational ability, computers have been more intelligent than us for a long time.
ChatGPT is undoubtedly more "intelligent" than many people for many common knowledge work / white-collar tasks.
7/ (In the sense that there are some tasks I'd rather give to ChatGPT than to some humans.)
8/ That's intelligence; then there's AGI, or artificial general intelligence. This just refers to an AI that is as good as (or better than) a human on ALL tasks. Or maybe most tasks, or maybe just most intellectual tasks.
9/ A particularly important benchmark here is AI's ability to conduct original research and make new discoveries on its own.
Even more specifically, what happens when AI is able to improve itself (i.e. come up with better AIs or machine learning algorithms)?
10/ This leads us to "superintelligence", a state where AI is vastly more capable than us at everything, because it is able to improve itself rapidly.
11/ Is this even possible? Yes, I think it is; there is no fundamental impossibility, at least.
We can imagine the least effective way possible to do this: fully understand the human brain, and improve it.
12/ Because of the messy process of evolution, there is no way our current brain is the best possible thing.
13/ Also, merely hooking a brain up to a computer to perform on-demand calculations, and to a disk to dump memories onto, should lead to very easy "intelligence" improvements.
14/ The whole debate, then, is (1) how close are we to AGI, and (2) what can/should we do about it?
15/ Within the AI X-risk (that's eXistential risk, to the human race) community, there are two camps: those who think the AI will get superintelligent all of a sudden ("fast takeoff") and those who think it will take more time, and we will have warning signs ("slow takeoff").
16/ (Importantly, this is a debate between those that already agree that AI poses extreme risks, and that this risk will come sooner (in the coming decades) rather than later.)
19/ We'll come back to existential risk in part 4. But for now, I want to return to GPT and its potential for achieving AGI.
20/ I believe that LLMs lack two fundamental capabilities to achieve AGI.
1. (easy?) The ability to do formal computation.
2. (hard) The ability to manipulate "concepts", i.e. examine and update the building blocks of its "reasoning" abilities.
21/ Let's tackle computation first, because it's probably a lot easier, though still challenging.
22/ GPT cannot, in general, compute. GPT-3 can't even get multiplications of large numbers right.
This is related to a point I make in part 2, that GPT can't really "reason" in general, because it can only learn "patterns" but not "meta-patterns".
twitter.com/norswap/status/1636389968846417920
23/ And as LLMs grow in size (= the neural networks contain more nodes), they in fact fit the existing corpus more and more closely, because this "literality" is rewarded by their training.
24/ Here is some experimental data pointing in this direction, showing that as LLMs grow, they become less "truthful" because they reflect human misconceptions better.
arxiv.org/abs/2109.07958
25/ Another important limitation here is that every completion (for every word generated by GPT) uses up the same (bounded) amount of computation: the weighted sums are computed for the whole network and that is it. (If that's Chinese to you, see part 1!)
26/ Therefore, it's of course impossible for it to solve a problem where the best-known algorithms would use more computation than is available for the completions needed to type the answer!
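To make that bound concrete, here's a rough back-of-envelope sketch in Python. The numbers are made up (a real per-token budget is vastly larger), but the shape of the argument is the same: a quadratic-cost task eventually outgrows any fixed per-token budget, because the answer's length only grows linearly.

```python
# Back-of-envelope with made-up numbers: the model spends a fixed amount of compute
# per generated token, but multiplying two n-digit numbers (schoolbook method) needs
# roughly n^2 digit operations, while the answer is only ~2n digits long.

OPS_PER_TOKEN = 10_000  # hypothetical fixed compute budget per token (not a real figure)

def available_ops(n_digits: int) -> int:
    return 2 * n_digits * OPS_PER_TOKEN   # ~2n answer tokens, fixed budget each

def required_ops(n_digits: int) -> int:
    return n_digits ** 2                  # schoolbook multiplication cost

for n in (100, 1_000, 10_000, 100_000):
    ok = available_ops(n) >= required_ops(n)
    print(f"n={n:>7}: budget {available_ops(n):>14,} vs needed {required_ops(n):>14,} -> "
          f"{'fits' if ok else 'does not fit'}")
```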
27/ So how could LLMs acquire computational abilities?
One solution is to decompose. We humans can compute very large products (though not fast) by decomposing them into sums and smaller products (and maybe using a piece of paper!).
28/ If you could find a way to automatically get GPT to decompose its reasoning into smaller pieces, it could probably carry out fairly intricate computations: after all, everything can be reduced to simple arithmetic operations and/or logic gates!
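For intuition, here's what such a decomposition looks like in plain Python (no LLM involved): a big product reduced to single-digit products and shifted sums. This is the kind of step-by-step trace you'd hope to coax out of the model.

```python
def long_multiply(a: int, b: int):
    """Multiply a and b the 'schoolbook' way: single-digit products plus shifted sums."""
    steps = []
    total = 0
    for i, da in enumerate(reversed(str(a))):      # digits of a, least significant first
        for j, db in enumerate(reversed(str(b))):  # digits of b
            partial = int(da) * int(db) * 10 ** (i + j)
            total += partial
            steps.append(f"{da}*{db} * 10^{i+j} = {partial}")
    return total, steps

result, trace = long_multiply(1234, 5678)
assert result == 1234 * 5678
print(f"{len(trace)} small steps, e.g. {trace[0]!r}")
```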
29/ One possible recipe would be as though you carried out the following interactive conversation with ChatGPT, but all done automatically (see the sketch after the list):
1. Pose the problem
2. Ask ChatGPT to reason about how to solve the problem
3. Ask it to generate code to solve the problem.
30/
4. Ask it to reduce the code into a simpler "assembly code" form
5. Ask it to run the assembly code instruction by instruction while maintaining the value of the memory, registers, ...
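Here's a hedged sketch of what automating that recipe could look like. `ask_llm` is a stand-in for whatever completion API you'd plug in; it is not a real library call, and the prompts are only illustrative.

```python
# Hypothetical driver for the recipe above. `ask_llm(transcript, instruction)` is a
# placeholder for a real completion API; nothing here is an actual OpenAI call.
from typing import Callable

def solve_by_decomposition(problem: str, ask_llm: Callable[[str, str], str]) -> str:
    transcript = f"Problem: {problem}\n"                     # 1. pose the problem

    # 2. ask the model to reason about an approach
    plan = ask_llm(transcript, "Explain step by step how you would solve this.")
    transcript += f"Plan: {plan}\n"

    # 3. ask it to write code implementing that plan
    code = ask_llm(transcript, "Write a small program implementing that plan.")
    transcript += f"Code: {code}\n"

    # 4. ask it to lower the code to a simple assembly-like form
    asm = ask_llm(transcript, "Rewrite the program as simple assembly-like instructions.")
    transcript += f"Assembly: {asm}\n"

    # 5. ask it to execute instruction by instruction, tracking registers and memory
    trace = ask_llm(transcript, "Execute the instructions one at a time, showing the "
                                "values of registers and memory after each step.")
    transcript += f"Trace: {trace}\n"

    # Note: the transcript grows at every step, and the *whole* transcript is fed
    # back into the model for each completion (see the next tweets).
    return transcript
```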
31/ On the flip side, such a computation will carry an immense overhead compared to an equivalent computer program.
32/ Another issue is that you'd need to maintain a fairly large transcript for performing the computation. (And remember the input for every completion in ChatGPT is the whole transcript of the conversation!)
33/ This is just some speculation of course. Another solution would be to train another network to recognize things that look like computation requests, perform them outside of GPT and then inject the results into the transcript.
34/ This doesn't solve the (difficult by itself) problem of specifying the computation to run, so you'd still need ChatGPT for that.
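A minimal sketch of that second option, under the assumption that a simple regex can stand in for the "computation request" detector (in reality the detector would itself be a learned model): spot arithmetic expressions in the model's output, compute them outside the model, and splice the results back into the transcript.

```python
import re

# Naive stand-in for a learned "is this a computation request?" classifier:
# match simple arithmetic expressions like "12345 * 67890".
CALC = re.compile(r"\b(\d+)\s*([+\-*/])\s*(\d+)\b")

def run_outside(expr_match: re.Match) -> str:
    a, op, b = int(expr_match[1]), expr_match[2], int(expr_match[3])
    ops = {"+": a + b, "-": a - b, "*": a * b, "/": a / b if b else float("nan")}
    return str(ops[op])

def inject_computations(model_output: str) -> str:
    # Replace every detected expression with its externally computed value,
    # so the corrected text can be appended to the transcript.
    return CALC.sub(run_outside, model_output)

print(inject_computations("The answer is 123456 * 789012, give or take."))
# -> "The answer is 97408265472, give or take."
```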
36/ (LLMs can improve from here, but for the reasons outlined above, I believe they can never really get good enough when you ask for something that is too far outside of their corpus.)
37/ Okay, so that was computation.
Let's get to the real hard cap on LLM intelligence: their inability to manipulate "concepts", i.e. to examine and update how they perform completions.
38/ Given a neural network, we currently can't interpret how it works! LLMs won't be able to develop this ability by themselves, as it is completely absent from their training corpus.
39/ Moreover, the only way an LLM has to communicate is to run completions through the network, which will not give an accurate picture of its inner workings.
40/ Why is this important? Concepts are how humans learn in the realm of formal topics (math, science, engineering).
41/ It's relatively obvious to see that blind statistical emulation is bound to create errors when things reach a certain level of complexity. This is what we observe of LLMs today, and there is no reason to think this will change (as explained before).
42/ To avoid these errors, you need a conceptual understanding of the matter at hand. Even if these concepts are not couched in a formal language, they fulfill a role similar to formal axioms: they allow and disallow certain states of the world and certain chains of reasoning.
43/ I believe that until we are able to instruct AIs at a conceptual level, and until they are able to reason by manipulating such concepts, there is no chance that they will reach anything resembling general intelligence.
44/ Could we tweak LLMs to perform this conceptual reasoning? I'm out of my depth here but it seems fundamentally incompatible with their architecture.
45/ Maybe a breakthrough could come from a way to extract understanding from the neural network directly. This would be hard to validate, though: humans can't interpret these networks.
46/ Maybe we can find a way to skip validation, by plugging this directly into the rest of the system and observing the results?
This is not really satisfying: we still wouldn't fully understand how things work, but at least the AI would appear to understand concepts.
47/ That would be *very* unsatisfying from an AI safety perspective: such an AI could lie to us and we would have no way to tell. In fact, we can expect that it will have an explicit concept of "lying".
48/ Such a system would also need a way to update the neural network in response to new concepts.
49/ Speculating a lot, maybe a solution there is to feed it an explanation of the concepts, and then let it auto-generate (in a controlled manner) a bunch of examples which are then used to update the LLM incrementally (= jiggle its weights).
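Just to make the shape of that idea concrete, a sketch under heavy assumptions: every function here is hypothetical, and "jiggle its weights" is doing a lot of work.

```python
# Purely illustrative shape of the loop; none of these callables exist as real APIs.
from typing import Callable

def teach_concept(concept_description: str,
                  generate_examples: Callable[[str, int], list[str]],  # controlled, LLM-driven generation
                  validate: Callable[[str], bool],                     # reject examples that contradict the concept
                  fine_tune: Callable[[list[str]], None],              # incremental weight update ("jiggle")
                  n: int = 1000) -> None:
    examples = [ex for ex in generate_examples(concept_description, n) if validate(ex)]
    fine_tune(examples)
```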
50/ This is speculation for illustration purposes. There might be a way to update LLM to manipulate concepts, but it seems really hard. We're fighting against the gravity of the statistical-pattern completion model!
51/ In fact, notice the two "magical black boxes" in my speculation: one that converts from a neural network to concepts, and one that converts from concepts to training data for the LLM.
Any breakthrough here is predicated on solving this mismatch.
52/ It's also possible that I'm completely wrong and conceptual understanding is not required at all. It's my gut feeling supported by really abstract common-sense arguments. I can't deny there is a possibility it's wrong!
53/ I also want to make clear that I believe conceptual understanding is completely possible, it just probably requires a different architecture than LLMs (maybe to be used in conjunction with LLMs), or some fundamental change to the way we "do" LLMs.
54/ Ok. GPT won't take over the world. I believe LLMs will change the world though.
The analogy I keep coming back to again and again is social networks. Unlike AI, they seemed relatively innocuous at first, but produced vast social changes.
54/ I think this is the nature of the changes AI will have.
On the good side, they can be used to automate a lot of really boring intellectual work.
On the bad side, they can be used for deepfakes, mass-manufactured subtle propaganda, and elaborate scams.
55/ This will be both slow and fast.
On the fast side, look at how quickly Uber & co took root, and how it only took 10 years for smartphones to appear and for everybody to become utterly dependent on them.
56/ On the other hand, relational databases were invented in the '70s, and you wouldn't believe the amount of paper forms and redundant information I have to fill in in 2023.
58/ I think AI is not "priced in" yet. Its productization will bring large and probably yet-unforeseen changes.
59/ On the other hand, we see people claiming that AI will change *everything* and that it's the only thing that matters anymore.
I disagree. The internet and smartphones changed things, but a lot of things remain unchanged.
60/ I can be sympathetic to the "it changes everything" point of view, but I think it's predicated on AIs vastly exceeding their current capabilities. I see LLMs and neural networks still improving, but as explained at length, they have fundamental limitations.
62/ I predict that something similar to what machines did to factory jobs will happen to low-skill intellectual work: slow displacement, with the creation of jobs requiring new skills along the way.
63/ e.g., if you're in "content marketing", I would be worried, though maybe not if you're involved in the creative direction there.
64/ Final thought. Wen AGI?
It's really hard if not impossible to tell, as it depends on breakthroughs we haven't had yet.
In theory, GPT could have been invented in 2005 (see this article); we just weren't looking in that direction back then!
dynomight.net/gpt-2/
65/ What's certain is that GPT has brought tons of attention and capital to the field. But this could be a trap. It's easy (and probably profitable) to build a product on top of LLMs.
66/ Greenfield AI research not building upon LLMs will seem a lot riskier. It will still get more funding than before, but now everyone doing it will know they could get rich from a GPT hack instead...
67/ (This is something that plays out in blockchains, where infrastructure work is underfunded and understaffed compared to apps.)
68/ Next time (finally): thoughts on AI safety, and in particular AI existential risks. Some people that I respect a lot think that by 2050 there is a good chance that we won't exist as a species.
What to make of these claims?