GPT-3 — Sophisticated Gimmick or Pathway to AGI?

Nov 21, 2021

HAL 9000 — A fictional AGI from 2001: A Space Odyssey

About half a year ago, I came across a news report on GPT-3, and I have been fascinated by the technology ever since. As I have no educational background in machine learning, I will stay away from in-depth explanations of how the technology works. I am much more interested in the social and legal implications that might follow from the birth of GPT-3 and similar AI language models.

In this post, I will explain what GPT-3 is, how it works, what it can be used for, and what implications it presents. Finally, I will tackle the more overarching, theoretical issue: Does GPT-3 lean towards being a sophisticated gimmick or the pathway to achieving Artificial General Intelligence (AGI)?

GPT-3 — What Is It?

In May 2020, the AI research and deployment company OpenAI published a paper titled “Language Models Are Few-Shot Learners”, in which it introduced the third-generation “Generative Pre-trained Transformer”: GPT-3.

Provided with a few examples, GPT-3 can write almost anything you want it to: business ideas, news stories, novels, screenplays, even the code for websites and simple apps, you name it. It can also produce convincing impersonations; see, for instance, a GPT-3 poem about Elon Musk in the style of Dr. Seuss, or Tim Ferriss’s interview with a simulated Marcus Aurelius.

GPT-3 is a deep learning model built for natural language processing. It differs from its predecessors, GPT-1 from 2018 and GPT-2 from 2019, mainly in scale: the model is far larger and was trained on far more data. GPT-2 has 1.5 billion parameters and was trained on text from 8 million webpages,[1] whereas GPT-3 has 175 billion parameters and was trained on text from all corners of the internet and on scanned books.[2] To put that into perspective, the whole of English-language Wikipedia was included in GPT-3’s training set, yet it accounted for only about 0.5% of the full training set.[3]

GPT-3 is exclusively licensed by OpenAI to Microsoft. Private access to GPT-3’s Beta application programming interface (API) can be granted upon request.[4]

How GPT-3 works

At the most basic level, a language model is an AI model that has been trained to predict the next word in a text based on the preceding word(s).[5] This is not unlike the predictive-text feature on a smartphone keyboard. Given any text prompt, the GPT-3 API returns a text completion that attempts to match the pattern you gave it.[6] By showing GPT-3 a few examples of what you would like it to do, you can also “program” it for a specific task.[7]
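To make next-word prediction concrete, here is a deliberately tiny Python sketch: a bigram model that counts which word follows which in a made-up corpus and always picks the most frequent follower. It is an assumption-laden toy, not how GPT-3 works; GPT-3 is a transformer neural network that operates on sub-word tokens and conditions on far more than one preceding word. The function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigram_model(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    words = corpus.lower().split()
    followers: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers


def predict_next(model: dict[str, Counter], word: str) -> str | None:
    """Return the word most often seen after `word`, or None if `word` is unknown."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def complete(model: dict[str, Counter], prompt: str, n_words: int = 5) -> str:
    """Greedily extend the prompt, one most-likely next word at a time."""
    words = prompt.split()
    for _ in range(n_words):
        following = predict_next(model, words[-1])
        if following is None:
            break
        words.append(following)
    return " ".join(words)


if __name__ == "__main__":
    corpus = (
        "the cat sat on the mat . the cat sat on the rug . "
        "the dog ate the fish ."
    )
    model = train_bigram_model(corpus)
    print(predict_next(model, "cat"))             # -> sat
    print(complete(model, "the dog", n_words=4))  # -> the dog ate the cat sat
```

The greedy completion shows how locally plausible word choices can still add up to nonsense.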

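GPT-3 is task-agnostic, which means it can perform tasks without any fine-tuning. Unlike task-specific language models, which have to be fine-tuned on a dedicated dataset for each task, GPT-3 can recognize a pattern and perform a task from very few examples, or none at all. In technical terms, GPT-3 performs well overall in the zero-shot, one-shot, and few-shot settings. The paper illustrates the three settings with an English-to-French translation task:[8] in the zero-shot setting, the model receives only a task description (“Translate English to French:”) followed by the word to translate; in the one-shot setting, it also sees one solved example (such as “sea otter => loutre de mer”); and in the few-shot setting, it sees several.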

In the example above, a human would likely know what to do from the text instruction alone. In many other scenarios, however, humans struggle to understand the content or format of a task unless they are given a prior example. In this light, the OpenAI team notes in their paper[9] that the zero-shot setting is in many cases “unfairly hard”, while the one-shot setting most closely resembles how tasks are communicated to humans.
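The three settings differ only in what goes into the prompt. The Python sketch below is a hypothetical helper, not OpenAI code: it assembles zero-, one- and few-shot prompts in the style of the paper’s English-to-French illustration, and the example pairs and the `=>` separator are assumptions modelled on that figure. Note that the “learning” happens entirely in the input text; the model’s weights are not updated, which is what separates few-shot prompting from fine-tuning.

```python
from __future__ import annotations

# Solved examples modelled on the English-to-French illustration in the GPT-3 paper.
EXAMPLES = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe peluche"),
]


def build_prompt(task: str, examples: list[tuple[str, str]], query: str, k: int) -> str:
    """Build a k-shot prompt: task description, k solved examples, then the query.

    k = 0 gives a zero-shot prompt, k = 1 a one-shot prompt, k > 1 a few-shot prompt.
    """
    if k < 0 or k > len(examples):
        raise ValueError(f"k must be between 0 and {len(examples)}")
    lines = [task]
    lines += [f"{source} => {target}" for source, target in examples[:k]]
    lines.append(f"{query} =>")  # the model is expected to continue after "=>"
    return "\n".join(lines)


if __name__ == "__main__":
    for k, name in [(0, "zero-shot"), (1, "one-shot"), (3, "few-shot")]:
        print(f"--- {name} ---")
        print(build_prompt("Translate English to French:", EXAMPLES, "cheese", k))
```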

The OpenAI team evaluated GPT-3 on a wide variety of tasks in the zero-shot, one-shot, and few-shot settings. The tests required GPT-3 to predict the last word of a sentence, finish a story, answer questions about broad factual knowledge, translate between languages (English, French, German, and Romanian), answer questions about how the physical world works, answer multiple-choice questions based on a text, understand the relationship between two sentences, do arithmetic with numbers of up to five digits, solve word-scramble puzzles, correct English grammar, use novel “fantasy words” in a sentence, and generate news articles.

Unsurprisingly, GPT-3 generally performed better the more examples it was “fed” before a task; hence the title of the paper, “Language Models Are Few-Shot Learners”. Many of the few-shot results (where typically 10 to 100 examples were given before a task) were only slightly behind, and sometimes even surpassed, fine-tuned state-of-the-art models.[10] What makes GPT-3 so ground-breaking is not its ability to excel at a specific language task, but its ability to adapt rapidly to new tasks and learn from examples.

GPT-3 — News Articles and Fake News

In my previous post, I addressed how “fake journalists” with GAN-generated profile photos can publish and share articles on social media to spread false information, or “fake news”. GPT-3 can generate full articles that are nearly impossible to distinguish from articles written by humans.

According to OpenAI’s research paper[11], the dataset used to train GPT-3 was not weighted towards news articles. As a result, GPT-3 often interpreted the proposed first sentence of a “news article” as a tweet and then generated synthetic responses or follow-up tweets. The team therefore used GPT-3’s few-shot abilities, conditioning it on just three previous news articles. Given the title and subtitle of a proposed article, GPT-3 could then generate short articles in the news genre.

To investigate how well human evaluators could tell whether an article was written by GPT-3 or by a human journalist, the OpenAI team selected 12 articles from Reuters and used their titles and subtitles to have GPT-3 generate 12 articles of its own. 80 test participants were then asked which articles were written by a human and which by GPT-3. They answered correctly only 52% of the time, barely better than the 50% expected from random guessing.
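Why is 52% so close to chance? A participant guessing at random would be right about half the time, so the question is whether 52% could plausibly come from guessing alone. A one-sided binomial test answers that. The Python sketch below uses a hypothetical 500 independent judgments, since the exact number of judgments behind the paper’s figure is not given here, and treating judgments as independent is itself a simplifying assumption.

```python
from math import comb


def p_value_at_least(correct: int, trials: int, p_chance: float = 0.5) -> float:
    """One-sided binomial test: probability of at least `correct` right answers
    in `trials` independent guesses, each right with probability `p_chance`."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical sample size; not taken from the paper.
    trials = 500
    correct = round(0.52 * trials)  # 52% accuracy
    print(f"{correct}/{trials} correct, p = {p_value_at_least(correct, trials):.3f}")
```

With these assumed numbers, 260 correct answers out of 500 is well within what pure guessing produces, so the test does not rule out that the evaluators were simply guessing.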

American university student Liam Porr started a blog in 2020 where all the content was written by GPT-3. Within two weeks, the blog had attracted 26,000 visitors, and one of GPT-3’s posts, “Feeling unproductive? Maybe you should stop overthinking.”, reached the top spot on the tech-savvy social news site Hacker News. Out of the 26,000 visitors, only one user commented, perhaps jokingly, that the post looked like something GPT-3 could have written. The comment was down-voted by the community and received sharply negative replies.

After the experiment, Liam Porr made an interesting observation[12]:

“GPT-3 is great at creating beautiful language that touches emotion, not hard logic and rational thinking.”

The reason GPT-3 is not good at hard logic or rational thinking seems obvious to me: GPT-3 has no perception of reality. It is good at imitating human intelligence but has no understanding of the words it generates. As a result, GPT-3’s grasp of the world is often seriously off.[13] What’s more, because GPT-3 is trained on data from the internet, it reportedly reveals from time to time how steeped it is in the internet’s toxicity. It indulges in bias and discrimination based on race, gender, and religion, spews profanities, makes racist jokes, condones terrorism, and accuses people of being rapists.[14]

To make an AI model like GPT-3 safe and ensure that it “benefits all of humanity”[15] (the mission stated in OpenAI’s Charter), developers will likely have to improve its capacity for reasoning and rational thinking; in essence, they have to make it more human.

Ramping up to pass the Turing test

“The Turing test” is perhaps the best-known benchmark for measuring whether a machine is “intelligent”. It was proposed by Alan Turing in his 1950 paper “Computing Machinery and Intelligence” under the name “the imitation game”[16]:

“It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either ‘X is A and Y is B’ or ‘X is B and Y is A’ (..)
In order that tones of voice may not help the interrogator the answers should be written, or better still, typewritten. The ideal arrangement is to have a teleprinter communicating between the two rooms (..)
We now ask the question, ‘What will happen when a machine takes the part of A in this game?’ Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman?”

Put simply, if a human interrogator chats with a human and a machine and, after five minutes of questioning, still cannot reliably tell which is which, the machine passes the test.[17] Turing predicted that within about fifty years, computers would play the imitation game so well that an average interrogator would have no more than a 70% chance of making the right identification after five minutes of questioning. So far, however, no computer or AI model has convincingly passed the test.
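The protocol can be sketched as a small simulation. The Python below is a hypothetical harness, not a standard implementation: the function names, the single-question rounds and the toy participants are all assumptions. It hides a human and a machine behind the labels X and Y in random order, passes typed questions to both, and measures how often the interrogator picks out the machine; the sample question is borrowed from Turing’s paper.

```python
from __future__ import annotations

import random
from typing import Callable

# A respondent answers a typed question with a typed reply (the "teleprinter").
Respondent = Callable[[str], str]
# An interrogator sees the question and both replies and names the label
# ("X" or "Y") it believes hides the machine.
Interrogator = Callable[[str, str, str], str]


def play_round(interrogator: Interrogator, human: Respondent, machine: Respondent,
               question: str, rng: random.Random) -> bool:
    """Play one round; return True if the interrogator picks out the machine."""
    # Randomly decide which hidden participant sits behind X and which behind Y.
    if rng.random() < 0.5:
        x, y, machine_label = machine, human, "X"
    else:
        x, y, machine_label = human, machine, "Y"
    guess = interrogator(question, x(question), y(question))
    return guess == machine_label


def identification_rate(interrogator: Interrogator, human: Respondent,
                        machine: Respondent, questions: list[str], seed: int = 0) -> float:
    """Fraction of rounds in which the interrogator correctly identifies the machine."""
    rng = random.Random(seed)
    wins = sum(play_round(interrogator, human, machine, q, rng) for q in questions)
    return wins / len(questions)


# Hypothetical participants, for illustration only.
def human(question: str) -> str:
    return "Hmm, let me think about that for a moment."


def clumsy_machine(question: str) -> str:
    return "As a machine, I have no opinion."


def naive_interrogator(question: str, answer_x: str, answer_y: str) -> str:
    # Accuse Y if Y's reply gives itself away; otherwise default to X.
    return "Y" if "machine" in answer_y.lower() else "X"


if __name__ == "__main__":
    questions = ["Please write me a sonnet on the subject of the Forth Bridge."] * 1000
    print(identification_rate(naive_interrogator, human, clumsy_machine, questions))  # -> 1.0
    # A perfect mimic replies exactly like the human, so the interrogator can only guess.
    print(identification_rate(naive_interrogator, human, human, questions))
```

The clumsy machine gives itself away every time, while a perfect mimic drives the identification rate down to roughly 50%, the level of pure guessing.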

Some researchers argue that, for mathematical reasons, it is impossible to program a machine to master human dialogue in its full generality.[18] First, there is no mathematical model that could serve as a starting point for creating such a machine. Second, in their view, machine learning models cannot be extended to cope with human dialogue.

However, if GPT-3 can generate news articles that look as though they were written by humans, and that readers identify correctly only 52% of the time, then an AI model that can engage in convincing real-time human dialogue does not seem that far-fetched.

GPT-3 already shows hints of AGI, since it performs well on a wide variety of tasks and can learn new tasks as well. As mentioned above, GPT-3 has 175 billion parameters. In comparison, the human brain contains more than 125 trillion synapses. Synapses are the junctions in the nervous system that allow a signal to pass from one neuron to the next.[19] They are also the loose inspiration for the parameters of an artificial neural network.[20] Imagine that GPT-4 were scaled up to more than 125 trillion parameters: would that make it able to communicate like a human? It has indeed been rumored that GPT-4 will have about 100 trillion parameters.[21]
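The scale comparison is easier to see with the numbers side by side. This Python snippet only divides the figures quoted in this article (1.5 billion parameters for GPT-2, 175 billion for GPT-3, the rumoured 100 trillion for GPT-4, and more than 125 trillion synapses); the dictionary and function names are hypothetical, and the arithmetic says nothing about whether a parameter and a synapse are comparable units.

```python
from __future__ import annotations

# Sizes as quoted in this article; the GPT-4 figure is only a rumour.
PARAMETERS = {
    "GPT-2": 1.5e9,
    "GPT-3": 175e9,
    "GPT-4 (rumoured)": 100e12,
}
HUMAN_SYNAPSES = 125e12  # quoted as "more than 125 trillion", so a lower bound


def scale_report(parameters: dict[str, float],
                 synapses: float) -> list[tuple[str, float, float]]:
    """For each model: (name, size relative to GPT-3, size relative to the synapse count)."""
    gpt3 = parameters["GPT-3"]
    return [(name, n / gpt3, n / synapses) for name, n in parameters.items()]


if __name__ == "__main__":
    for name, vs_gpt3, vs_brain in scale_report(PARAMETERS, HUMAN_SYNAPSES):
        print(f"{name:<17} {vs_gpt3:>10.4g} x GPT-3   {vs_brain:>9.3%} of synapse count")
```

Dividing the quoted figures gives roughly a 117-fold jump from GPT-2 to GPT-3, while the synapse count is still about 714 times GPT-3’s parameter count; even the rumoured GPT-4 would reach only about 80% of it.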

GPT-3 — The model to achieve AGI?

The initial question remains: is the model behind GPT-3 merely a sophisticated gimmick, or the pathway to achieving AGI? A better way to formulate the question might be: can AGI ever be achieved by building ever larger models with more and more parameters? Personally, I have a hard time seeing the point at which these unfathomably large AI models “come alive”.

As Canadian computer scientist Richard Sutton puts it in his March 2019 essay “The Bitter Lesson”[22]:

“(..) the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.”

In essence, I believe what Richard Sutton is saying here is that AGI cannot be achieved by training existing models on ever larger datasets. Instead, if we want to create truly intelligent systems, we will need another approach, one in which we teach systems to learn the same way humans learn. The major obstacle, obviously, is that AI systems are, for now, incapable of sensing and experiencing the world as we humans do.[23]

Based on these observations, I believe that GPT-3 leans more towards being a sophisticated gimmick than a pathway to AGI. However, only time will tell whether we are truly building Frankenstein’s monster, a rocket ship, or just an ever greater source of entertainment and wonder.

[1] https://openai.com/blog/better-language-models/ (06–11–2021).

[2] Kindra Cooper (2021), OpenAI GPT-3: Everything You Need to Know, https://www.springboard.com/blog/ai-machine-learning/machine-learning-gpt-3-open-ai/ (09–11–2021).

[3] Noah Giansiracusa (2021), How algorithms create and prevent fake news — exploring the impact of social media, deepfakes, GPT-3, and more, pg. 33.

[4] https://openai.com/blog/openai-api/ (20–11–2021).

[5] https://medium.com/unpackai/language-models-in-ai-70a318f43041 (11–11–2021).

[6] https://openai.com/blog/openai-api/ (14–11–2021).

[7] Ibid.

[8] Brown et al. (July 22, 2020), Language Models are Few-Shot Learners, https://arxiv.org/pdf/2005.14165.pdf.

[9] Ibid. pg. 6–7.

[10] Ibid. pg. 7.

[11] Ibid. pg. 25.

[12] https://liamp.substack.com/p/my-gpt-3-blog-got-26-thousand-visitors (31–10–2021).

[13] Gary Marcus & Ernest Davis (2020), GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about, https://www.technologyreview.com/2020/08/22/1007539/gpt3-openai-language-generator-artificial-intelligence-ai-opinion/.

[14] https://www.wired.com/story/efforts-make-text-ai-less-racist-terrible/ (21–11–2021).

[15] https://openai.com/charter/ (07–11–2021).

[16] A. M. Turing (1950), Computing Machinery and Intelligence, Mind, Volume LIX, Issue 236, October 1950, Pages 433–460, https://doi.org/10.1093/mind/LIX.236.433.

[17] Ibid.

[18] J. Landgrebe & B. Smith (2019), There is no Artificial General Intelligence.

[19] https://www.verywellhealth.com/synapse-anatomy-2795867 (21–11–2021).

[20] https://www.youtube.com/watch?v=kpiY_LemaTc (21–11–2021).

[21] https://www.wired.com/story/cerebras-chip-cluster-neural-networks-ai/ (21–11–2021).

[22] Rich Sutton (2019), The Bitter Lesson, http://www.incompleteideas.net/IncIdeas/BitterLesson.html (21–11–2021).

[23] See H. L. Dreyfus & S. E. Dreyfus (1986), Mind over Machine, Basil Blackwell.


Written by TobiasMJ