
Language Models are Few-Shot Learners

Last updated Dec 23, 2022

# Goal

In this paper, the authors test their hypothesis using the GPT-3 language model. GPT-3 has 175 billion parameters and can learn from information provided in its context. Using this model, they evaluate how quickly it can adapt to a task and how well it learns in few-shot scenarios.

# Is fine-tuning a pre-trained language model (PLM) a silver bullet?

The pre-trained language model approach has been very successful and has made many NLP tasks much easier to tackle, but it still requires a task-specific dataset for fine-tuning. In other words, to get strong performance on any particular task, you need a dataset with a large number of labeled examples for that task, typically thousands or more.

No; a dataset for fine-tuning is still required. There are several reasons for wanting to remove this limitation: collecting a large labeled dataset for every new task limits how broadly language models can be applied, and humans generally need only a brief instruction or a few examples to pick up a new language task.

# Then, is there another direction?

The direction this paper explores is in-context learning: rather than updating the model's weights, the task is specified at inference time through a natural-language description and, optionally, a few demonstrations placed directly in the prompt (the zero-shot, one-shot, and few-shot settings).
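
As a rough sketch of what this looks like in practice, the snippet below assembles a few-shot prompt in the style of the paper's English to French translation illustration. The helper function and the exact French strings are my own and only for illustration; the point is that the "learning" happens entirely inside the context window, with no gradient updates.

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Build a prompt: task description, k demonstrations, then the new query."""
    lines = [task_description]
    for source, target in demonstrations:  # k labeled examples = k "shots"
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")            # the model is asked to continue from here
    return "\n".join(lines)

demonstrations = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe en peluche"),
]
print(build_few_shot_prompt("Translate English to French:", demonstrations, "cheese"))
```

With zero demonstrations this is the zero-shot setting, with one it is one-shot, and with a handful (typically 10 to 100 in the paper, limited by the context window) it is few-shot.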

# Term definition

Tip

When training a language model, the batch size and learning rate are important hyperparameters. Generally, larger models can be trained with a larger batch size, meaning more data is consumed per update step, but need a smaller learning rate, which controls how large each parameter update is.
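
To make the trend concrete, here is a small Python sketch. The three model sizes are real GPT-3 variants, but the batch-size and learning-rate values are approximate recollections of the paper's Table 2.1 rather than exact quotations, so treat them as illustrative.

```python
# Rough illustration of the trend: larger GPT-3 variants use a larger batch
# size (measured in tokens) but a smaller learning rate. The numbers below are
# approximate, not exact quotations from Table 2.1 of the paper.
training_setup = {
    # model               (batch size in tokens, learning rate)
    "GPT-3 Small (125M)": (0.5e6, 6.0e-4),
    "GPT-3 13B":          (2.0e6, 1.0e-4),
    "GPT-3 175B":         (3.2e6, 0.6e-4),
}

for model, (batch_tokens, lr) in training_setup.items():
    print(f"{model:<20} batch ~ {batch_tokens / 1e6:.1f}M tokens, learning rate ~ {lr:.1e}")
```

Note that in the paper the batch size is measured in tokens per gradient update, not in number of sequences.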

# GPT-3 in reading comprehension

On DROP [DWD+19], a dataset testing discrete reasoning and numeracy in the context of reading comprehension, GPT-3 in a few-shot setting outperforms the fine-tuned BERT baseline from the original paper but is still well below both human performance (…)

# Limitations