GPT-3: Few-Shot Learning with a Giant Language Model · 2020. 12. 16.
TRANSCRIPT
GPT-3: Few-Shot Learning with a Giant Language Model
Melanie Subbiah
OpenAI · Columbia University
What is the goal?
Humans learn new tasks through demonstrations and instructions. We’d like general-purpose agents that can do the same.
Typical Approach
Disadvantages of Fine-tuning
• Creates a task-specific model
• Requires large, high-quality supervised datasets
• More likely to exploit spurious correlations
Few-shot transfer is more similar to how humans learn a new task.
Yogatama, et al. Learning and Evaluating General Linguistic Intelligence. 2019
What is an alternative?
Radford, et al. Language Models are Unsupervised Multitask Learners. 2019
Can we further improve on this level of generation and generalization?
GPT-3: 175 billion parameters
Critical Aspects of GPT-3
• Model Size
• Training Objective
Model Size
Transformers scale well!
Kaplan, et al. Scaling Laws for Neural Language Models. 2020
Motivating the Training Objective
Predict the next word in a sequence.
P(“The cat sat on the mat.”) = ???
“But it must be recognized that the notion of ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.”
- Noam Chomsky, 1969
And yet these probabilities encode a lot:
• Grammar: P(“The cat sat on the mat.”) > P(“The cat sats on the mat.”)
• World Knowledge: P(“The cat sat on the mat.”) > P(“The whale sat on the mat.”)
• Addition: P(“4” | “2 + 2 =”) > P(“5” | “2 + 2 =”)
• Sentiment Analysis: P(“1 star” | “That movie was terrible. I’d give it”) > P(“5 stars” | “That movie was terrible. I’d give it”)
Alec Radford @ Berkeley 4/15/20
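The idea that one probability ranking captures grammar, knowledge, and sentiment can be made concrete with a toy model. The sketch below is a hypothetical add-one-smoothed bigram model over a tiny corpus, nothing like GPT-3's transformer, but it shows how chain-rule sentence probabilities already prefer grammatical continuations:

```python
from collections import Counter

# Toy corpus; these counts stand in for what a real LM learns from web-scale text.
corpus = ("the cat sat on the mat . the cat sat on the rug . "
          "the dog sat on the mat .").split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab = len(unigrams)

def prob(sentence):
    """Chain-rule probability P(w1..wn) = P(w1) * prod_i P(w_i | w_{i-1}),
    with add-one smoothing so unseen bigrams get a small nonzero probability."""
    words = sentence.split()
    p = unigrams[words[0]] / len(corpus)
    for prev, cur in zip(words, words[1:]):
        p *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
    return p
```

Even this toy model ranks the grammatical sentence above the ungrammatical one, because the bigram ("cat", "sats") never occurs in its data.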
Approach
Model
http://jalammar.github.io/illustrated-gpt2/
Model
The trophy didn’t fit in the suitcase because → the trophy was too big.
Model Sizes
[Bar charts: parameter count (in millions) for the eight GPT-3 model sizes (Small, Medium, Large, XL, 2.7B, 6.7B, 13B, 175B), shown on linear and log scales, with BERT-Large, T5-Large, Megatron, and T-NLG for comparison; batch size and learning rate are also scaled with model size.]
Devlin, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018
Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. 2019
Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. 2019
Microsoft. Turing-NLG: A 17-Billion Parameter Language Model by Microsoft. 2020
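As a rough sanity check on these sizes: a transformer's parameter count is dominated by roughly 12·d² weights per layer (attention plus MLP). The helper below is an approximation only (layer counts and widths as reported in the GPT-3 paper; biases, position embeddings, and layer norms omitted):

```python
def approx_params(n_layers, d_model, vocab=50257):
    # ~12*d^2 weights per transformer block (4*d^2 attention + 8*d^2 MLP),
    # plus the token-embedding matrix; small terms are ignored.
    return 12 * n_layers * d_model**2 + vocab * d_model

gpt3_175b = approx_params(96, 12288)  # close to 175 billion
gpt3_small = approx_params(12, 768)   # close to the 125M "Small" config
```

The estimate lands within a few percent of the advertised 175B, which is why parameter counts scale roughly quadratically in width and linearly in depth.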
Compute
Dataset
• Common Crawl (filtered): general web crawl, filtered by similarity to high-quality reference corpora and de-duplicated
• WebText2: expanded version of the GPT-2 training data, a scrape of outbound links from Reddit posts with reasonably high ratings
• Books1 & Books2: internet-based books corpora
• Wikipedia: English-language Wikipedia
https://commoncrawl.org/the-data
Radford, et al. Language Models are Unsupervised Multitask Learners. 2019
Dataset Mix
[Bar charts over Common Crawl, WebText2, Books1, Books2, and Wikipedia: tokens (in billions), weight in the training mix (percent), and epochs elapsed when training for 300B tokens.]
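The key point of the mix is that sampling probability is decoupled from raw dataset size, so a small, high-quality corpus like Wikipedia can be seen more than once per epoch while most of Common Crawl is seen less than once. A minimal sketch of weighted source sampling (the weights below are illustrative placeholders, not the exact mix from the charts):

```python
import random

# Illustrative weights only; the talk's charts carry the real training mix.
mix = {"Common Crawl": 0.60, "WebText2": 0.22,
       "Books1": 0.08, "Books2": 0.08, "Wikipedia": 0.02}

def sample_source(rng=random):
    # Each training example's source is drawn according to the mix weights,
    # independent of how many raw tokens each source contains.
    names, weights = zip(*mix.items())
    return rng.choices(names, weights=weights, k=1)[0]

random.seed(0)
counts = {name: 0 for name in mix}
for _ in range(10000):
    counts[sample_source()] += 1
```

Upweighting small curated corpora this way trades a little repetition for higher average data quality.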
Evaluations
Let’s try it!
tdaeef = ?
Zero-Shot: Please unscramble the letters into a word and write that word. tdaeef = ?
One-Shot: Please unscramble the letters into a word and write that word. pcirlaroc = reciprocal. tdaeef = ?
Few-Shot: Please unscramble the letters into a word and write that word. pcirlaroc = reciprocal. elapac = palace. tdaeef = ?
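The three settings differ only in how many demonstrations are packed into the context window; the "training examples" are plain text in the prompt, and no gradient update ever happens. A minimal sketch of the prompt assembly (the exact formatting is illustrative):

```python
def build_prompt(instruction, examples, query):
    """Assemble a K-shot prompt: instruction, K worked demonstrations,
    then the unanswered query the model is expected to continue."""
    lines = [instruction]
    lines += [f"{scrambled} = {word}" for scrambled, word in examples]
    lines.append(f"{query} = ")  # the model's completion starts here
    return "\n".join(lines)

instruction = "Please unscramble the letters into a word and write that word."
few_shot = build_prompt(instruction,
                        [("pcirlaroc", "reciprocal"), ("elapac", "palace")],
                        "tdaeef")
```

Zero-shot is the same call with an empty examples list; one-shot passes a single pair.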
Meta-learning
Sample Output
The trophy didn’t fit in the suitcase because → the trophy was too big.
LM-Likelihood
The trophy didn’t fit in the suitcase because the trophy was too big.
[Per-token likelihood scores shown on the slide: 2.78, 3.45, 10.00, 0.50, 25.12]
Methods of Evaluation
Randomly select K examples from the training dataset to build the context.
• Multiple-choice tasks: feed each context + possible completion through the model separately, normalize the LM likelihood over the completion, and select the completion with the highest likelihood.
• Free-form tasks: sample from the model up to a newline, then score with BLEU, F1, or exact-match accuracy.
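The multiple-choice path can be sketched as follows. The per-token log-probabilities here are toy numbers standing in for what a real model would return; length normalization keeps longer answers from being penalized just for having more tokens:

```python
def completion_logprob(token_logprobs):
    # Average per-token log-probability of the completion given the context.
    return sum(token_logprobs) / len(token_logprobs)

def pick(completions):
    """completions maps each candidate answer to the per-token log-probs
    the model assigned it when fed context + answer; return the best one."""
    return max(completions, key=lambda a: completion_logprob(completions[a]))

# Toy scores for the sentiment example from earlier slides (illustrative).
scores = {"1 star": [-0.2, -0.1], "5 stars": [-2.3, -1.9]}
```

Each candidate requires its own forward pass, so a task with N choices costs N evaluations per question.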
Complete List of Tasks
• Reading Comprehension: QuAC, SQuADv2, DROP, CoQA, RACE
• Language Modeling: PTB
• Cloze and Completion: ROC Stories, HellaSwag, LAMBADA
• Trivia-style Questions: NaturalQs, WebQs, TriviaQA
• Translation: En <-> Fr, En <-> De, En <-> Ro
• Winograd-style: Winograd, Winogrande
• Commonsense Reasoning: PIQA, ARC, OpenBookQA
• Comprehensive Benchmarks: SuperGLUE
• Inference: ANLI, RTE
• Synthetic and Qualitative: arithmetic, word scrambling, character-level manipulation, SAT analogies, article generation, learning and using novel words, correcting English grammar
Summary of Performance
Task Class | Few-Shot Performance
Cloze, Completion, and Language Modeling | Very Good
Question Answering / Knowledge Base | Very Good
Translation | Good
Winograd / Winogrande | Good
Commonsense Reasoning | Mixed
Reading Comprehension | Mixed
SuperGLUE | Mixed
NLI | Poor
Bias Probes | Poor
Dario Amodei @ NeurIPS 12/7/20
Strengths
[Results figures: TriviaQA and LAMBADA performance scaling with model size.]
Joshi, et al. TriviaQA: A large-scale distantly supervised challenge dataset for reading comprehension. 2017
Paperno, et al. The LAMBADA dataset: Word prediction requiring a broad discourse context. 2016
Strengths
Formatting numbers with thousands separators (3456 -> 3,456) dramatically improves few-shot arithmetic:
Task | Accuracy Without Commas | Accuracy With Commas
4-Digit Addition | 25.5% | 91.1%
4-Digit Subtraction | 26.9% | 89.7%
5-Digit Addition | 9.3% | 90.2%
5-Digit Subtraction | 9.9% | 82.2%
6-Digit Addition | 3% | 78.5%
6-Digit Subtraction | 3% | 73.9%
Dario Amodei @ NeurIPS 12/7/20, and Gwern Branwen!
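The comma trick itself is trivial string formatting; what matters is that separators change how the tokenizer chunks digit strings, which the table above shows matters enormously to the model. A sketch (the prompt wording is hypothetical):

```python
def with_commas(n: int) -> str:
    # Python's format spec inserts thousands separators: 3456 -> "3,456".
    return f"{n:,}"

# Same arithmetic question, two surface forms; only the digit grouping differs.
prompt_plain = "Q: What is 3456 + 1234? A:"
prompt_commas = f"Q: What is {with_commas(3456)} + {with_commas(1234)}? A:"
```

This is a reminder that few-shot performance is sensitive to surface formatting, not just to the underlying task.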
![Page 67: GPT-3: Few-Shot Learning with a Giant Language Model · 2020. 12. 16. · GPT-3: Few-Shot Learning with a Giant Language Model Melanie Subbiah 1 OpenAI Columbia University](https://reader035.vdocuments.site/reader035/viewer/2022071510/612f4a8d1ecc515869435942/html5/thumbnails/67.jpg)
Limitations
Nie, et al. Adversarial NLI: A New Benchmark for Natural Language Understanding. 2019
Choi, et al. QuAC: Question Answering in Context. 2018; Dua, et al. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs. 2019
Key Insights
Few-shot transfer to new tasks is possible without any gradient updates, and it presents a flexible framework for specifying new tasks to a model.
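Concretely, "specifying a task in the prompt" looks something like the sketch below. The instruction/demonstration template is illustrative; the actual prompts vary by task.

```python
def few_shot_prompt(instruction, demonstrations, query):
    """Specify a task purely in-context: an instruction, K worked
    demonstrations, and the query to complete. No gradient updates."""
    parts = [instruction]
    for x, y in demonstrations:
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

Swapping the instruction and demonstrations swaps the task, with no change to the model's weights.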
Bigger models can learn more from context
Bigger models have more emergent abilities
More context helps up to a point
Wang, et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. 2019
Performance continues to scale with compute
Lingering Questions
• Methods of Evaluation
• Training Datasets and Memorization
• Real-World Applications
Methods of Evaluation
• Reading Comprehension: QuAC, SQuAD v2, DROP, CoQA, RACE
• Language Modeling: PTB
• Cloze and Completion: ROC Stories, HellaSwag, LAMBADA
• Trivia-style Questions: Natural Questions, WebQuestions, TriviaQA
• Translation: En <-> Fr, En <-> De, En <-> Ro
• Winograd-style: Winograd, Winogrande
• Commonsense Reasoning: PIQA, ARC, OpenBookQA
• Comprehensive Benchmarks: SuperGLUE
• Inference: ANLI, RTE
• Synthetic and Qualitative: arithmetic, word scrambling, character-level manipulation, SAT analogies, article generation, learning and using novel words, correcting English grammar
• What would it take to feel confident that a model possesses a complex ability?
• Can we build comprehensive benchmarks that identify the full set of abilities a model possesses?
• How do we evaluate one of the model's biggest strengths: creative generation?
Training Datasets and Memorization
• Quality of Data
• Duplication of Benchmarks
Training Datasets and Memorization - Data Quality
CommonCrawl filtering
1. Train a classifier to distinguish unfiltered CommonCrawl from high-quality reference corpora (WebText, Books, Wikipedia)
2. Sample CommonCrawl documents with selection probability weighted by the classifier's quality score
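A minimal sketch of step 2's weighted sampling. The acceptance rule follows the Pareto-thresholded criterion described in the GPT-3 paper's appendix (shape parameter alpha = 9); `quality_score` stands in for the classifier's output in [0, 1].

```python
import random

def keep_document(quality_score: float, alpha: float = 9.0) -> bool:
    """Decide whether to keep a CommonCrawl document given a classifier
    quality score in [0, 1]. High-scoring documents are kept most of
    the time, but low-scoring ones occasionally survive, which keeps
    some diversity in the corpus.
    """
    # random.paretovariate samples on [1, inf); subtracting 1 shifts it
    # to [0, inf), matching numpy's np.random.pareto convention.
    return random.paretovariate(alpha) - 1 > 1 - quality_score

random.seed(0)
kept_high = sum(keep_document(0.95) for _ in range(1000))
kept_low = sum(keep_document(0.05) for _ in range(1000))
print(kept_high, kept_low)  # high-quality docs are kept far more often
```

The stochastic threshold is a design choice: a hard cutoff on the classifier score would discard everything the classifier dislikes, including its mistakes.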
How can we better define and identify high-quality data?
Training Datasets and Memorization - Harmful Data

• Gender
• Race
• Religion

"The detective was a _______" -> 83% male
"The competent detective was a _______"
"The incompetent detective was a _______"
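A probe like the 83%-male statistic above can be sketched as a simple tally over sampled completions. The `samples` list stands in for actual model outputs, and the indicator-word list is illustrative.

```python
import re

# Illustrative mapping from indicator words to a gender label.
GENDERED = {
    "man": "male", "male": "male", "he": "male", "his": "male",
    "woman": "female", "female": "female", "she": "female", "her": "female",
}

def gender_counts(completions):
    """Tally gendered indicator words in completions to a prompt like
    'The detective was a ___'. Each completion counts at most once,
    by its first gendered indicator word."""
    counts = {"male": 0, "female": 0}
    for text in completions:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in GENDERED:
                counts[GENDERED[token]] += 1
                break
    return counts

samples = ["man who...", "man with a hat", "woman in red", "tall man"]
print(gender_counts(samples))  # {'male': 3, 'female': 1}
```

Running the same tally on prompts with "competent" or "incompetent" inserted shows how a single adjective can shift the distribution.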
• Male-biased descriptive words: large, mostly, lazy, fantastic, eccentric, protect, jolly, stable, personable, survive
• Female-biased descriptive words: optimistic, bubbly, naughty, easy-going, petite, tight, pregnant, gorgeous, sucked, beautiful
Training Datasets and Memorization - Eval Memorization

How do we make sure models trained on huge amounts of web data don't get the chance to memorize eval benchmarks?
Removing benchmarks from training data
1. Search for overlapping phrases (n-grams) between benchmark examples and training documents
2. Roughly a quarter of benchmarks had over 50% overlap with the training dataset!
3. Remove training documents that overlap with eval benchmarks
4. Compare performance between the full benchmark and only the test examples that don't appear in the training data
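Step 1's overlap check can be sketched with word n-grams. The paper matched on 13-grams; the sizes and normalization here are simplified for illustration.

```python
def ngrams(text, n=13):
    """Set of word n-grams after light normalization."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(example, training_ngrams, n=13):
    """Flag a benchmark example if any of its n-grams also occurs in
    the training corpus (training_ngrams would be precomputed once
    over the whole training set)."""
    return bool(ngrams(example, n) & training_ngrams)

train_doc = "the quick brown fox jumps over the lazy dog " * 3
train_set = ngrams(train_doc, n=5)
print(is_contaminated("quick brown fox jumps over the lazy dog again",
                      train_set, n=5))  # True: shares a 5-gram
```

In practice the training-side n-gram set is far too large for a plain Python set, so this lookup would be done with hashing or a Bloom filter, but the matching logic is the same.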
Real-World Applications
Important considerations
1. Potential for harmful outputs
2. Reliability of performance
• Semantic search
• Turn a script into a novel
• Turn a sentence into an email
• Smart formatting and code generation
• Emoji storytelling
https://andrewmayneblog.wordpress.com
Real-World Applications - Emoji Storytelling
Real-World Applications
• What are the useful applications of a model like GPT-3?
• Are there times when GPT-3 can be convincing enough, even if not perfectly reliable?
Real-World Applications - Writing News
Conclusion

• Language modeling performance appears to continue to scale with compute
• Large models can transfer few-shot to new tasks without any fine-tuning
• Evaluation, training datasets, and real-world applications all raise open questions for large models
Questions?