" MicromOne: OpenAI’s GPT Series

Pagine

OpenAI’s GPT Series

The first model released by OpenAI was GPT-1. It had 117 million parameters, 12 layers, and was trained on about 4.5 GB of text. The next version, GPT-2, was essentially a scaled-up GPT-1: it used the same architecture but had roughly ten times more parameters (1.5 billion), four times more layers (48), and was trained on roughly ten times more data. OpenAI then scaled the architecture further to create GPT-3, which had 175 billion parameters, 96 layers, and was trained on hundreds of gigabytes of filtered web text. GPT-3 became the foundation for OpenAI's later products, including ChatGPT. Little is publicly known about OpenAI's most recent models, but they are widely rumored to be mixture-of-experts systems that combine several models with even larger total parameter counts.

A clear trend emerges from these models: larger models tend to be more capable. Today, some models reportedly contain up to a trillion parameters. However, scaling alone is not enough to produce capabilities like those of ChatGPT. A model trained only to predict the next word in a sequence can generate fluent text, but it does not reliably answer questions: its objective rewards plausible continuations of the input, not helpful responses to it. This limitation is not solved simply by increasing model size or training data.
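To make "predict the next word" concrete, here is a minimal generation loop using the openly available GPT-2 weights through the Hugging Face transformers library (an illustrative stand-in, since OpenAI's internal models and tooling are not public). The model repeatedly picks its most likely next token and appends it to the input:

```python
# Minimal next-token generation loop with GPT-2, a stand-in for any
# decoder-only language model trained purely on next-word prediction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "What is the capital of France?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                    # generate 20 tokens, one at a time
        logits = model(input_ids).logits   # shape: (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()   # greedy: most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

Run on a question like this one, the loop tends to continue the text rather than answer it, which is exactly the limitation described above.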

To illustrate this, consider an older version of GPT-3 that was trained purely for text generation. Given a question such as "What is the capital of France?", instead of answering "Paris," the model would keep generating text related to the question, as if completing a paragraph. It could only produce text likely to surround such a prompt in its training data, rather than a direct response to it. In contrast, ChatGPT answers the question correctly and may also provide additional relevant information.
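This contrast is easy to reproduce with OpenAI's public API (a sketch; the model names used here, davinci-002 as a base completion model and gpt-3.5-turbo as a chat model, are illustrative and their availability changes over time):

```python
# Sketch: base completion model vs. chat-tuned model on the same question.
# Assumes the openai Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# A base model simply continues the text after the prompt.
completion = client.completions.create(
    model="davinci-002",
    prompt="What is the capital of France?",
    max_tokens=30,
)
print(completion.choices[0].text)

# A chat-tuned model treats the prompt as a question and answers it.
chat = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=30,
)
print(chat.choices[0].message.content)
```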

So how did OpenAI train a model that can answer questions? At a high level, they first trained a large GPT model similar in structure to ours, but on a much larger scale and with far more data. Then, they fine-tuned this model using a dataset of questions and answers. Humans wrote high-quality answers to a wide range of prompts, and this dataset was used to update the model’s weights so it could respond directly to questions. The resulting model was called GPT-3.5 and became the basis for the first version of ChatGPT. This process, known as fine-tuning, is widely used in deep learning to adapt a model trained for one task to perform another.
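As a rough sketch of this fine-tuning step (the data, model, and masking scheme below are illustrative assumptions, not OpenAI's actual setup): concatenate each prompt with its human-written answer and train with the usual next-token cross-entropy loss, masking the prompt tokens so that only the answer tokens contribute to the gradient.

```python
# Sketch of supervised fine-tuning on prompt/answer pairs, using GPT-2 as a
# stand-in; OpenAI's actual dataset and hyperparameters are not public.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# A tiny stand-in for the human-written question/answer dataset.
pairs = [("What is the capital of France?", " Paris is the capital of France.")]

for prompt, answer in pairs:
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    answer_ids = tokenizer(answer, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)

    # Labels of -100 mask the prompt positions, so the loss (and therefore
    # the gradient) covers only the answer tokens.
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100

    loss = model(input_ids, labels=labels).loss  # shifted cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```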

After this initial fine-tuning, OpenAI applied an additional refinement step. The model was prompted to generate multiple answers to the same question, and human reviewers ranked these answers from best to worst. The rankings were used to train a separate reward model capable of scoring the quality of an answer. Finally, the GPT model was fine-tuned once more against this reward model so that its outputs aligned more closely with human preferences and expectations. This overall procedure is known as reinforcement learning from human feedback (RLHF).
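The reward-model step can be sketched as pairwise ranking: for each pair of answers where reviewers preferred one over the other, a scalar-output model is trained so that the preferred answer receives the higher score (a minimal sketch with a GPT-2 backbone and made-up data; the loss is the standard Bradley-Terry formulation used in the RLHF literature):

```python
# Sketch of a reward model trained on human preference pairs (illustrative).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
# One scalar output per sequence: the "reward" assigned to that answer.
reward_model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
reward_model.config.pad_token_id = tokenizer.pad_token_id
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

question = "What is the capital of France?"
chosen = question + " Paris."                  # answer ranked higher by humans
rejected = question + " France is a country."  # answer ranked lower

scores = []
for text in (chosen, rejected):
    ids = tokenizer(text, return_tensors="pt")
    scores.append(reward_model(**ids).logits.squeeze())

# Bradley-Terry ranking loss: push the chosen score above the rejected one.
loss = -torch.nn.functional.logsigmoid(scores[0] - scores[1])
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The final stage, fine-tuning the generator against this reward signal, is typically done with a policy-gradient method such as PPO, as described in the InstructGPT work; open-source libraries such as Hugging Face's trl implement that loop.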

In conclusion, training a model that can answer questions reliably is a complex, multi-stage process. It requires not only large models and vast amounts of data, but also carefully curated human feedback and specialized training techniques. These datasets and methods are not publicly available, which makes reproducing systems like ChatGPT significantly more challenging than simply scaling up a language model.