Language modeling (LM) is one of the most important tasks in modern Natural Language Processing (NLP). A language model is a probabilistic model that predicts the next word or character in a document, given the words that came before it.
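To make the idea concrete, here is a minimal sketch of a bigram language model estimated from raw counts over a toy corpus. The corpus and helper function are purely illustrative, not part of any GPT implementation:

```python
from collections import Counter, defaultdict

# Toy bigram language model: estimate P(next_word | current_word) from counts.
corpus = "the cat sat on the mat the cat ate".split()

counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    counts[w1][w2] += 1

def next_word_probs(word):
    # Normalize the raw bigram counts into a probability distribution.
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

print(next_word_probs("the"))  # {'cat': 0.667, 'mat': 0.333}
print(next_word_probs("cat"))  # {'sat': 0.5, 'ate': 0.5}
```

GPT models do exactly this kind of next-token prediction, only with a neural network conditioned on the entire preceding context rather than a single previous word.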

The Generative Pre-trained Transformer (GPT) can be considered a game changer in the field of natural language understanding and a front runner in language modeling. It touches a number of diverse tasks such as textual entailment, question answering, document classification, and semantic similarity assessment. It works with large volumes of unlabeled text, which is abundant in nature but has always presented a challenge for supervised methods. GPT harnesses generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. GPT is very effective and outperforms discriminatively trained models that use architectures specifically crafted for each task.

Three versions have been released so far: GPT, GPT-2, and GPT-3.

1. GPT

GPT [1] is trained with a causal language modeling (CLM) objective, which makes it powerful at predicting the next token in a sequence (a minimal next-token sketch follows the list below). The proposed GPT framework [1] aims to build a strong natural language understanding (NLU) system. It uses a single task-agnostic model, combining generative pre-training with discriminative fine-tuning. Through pre-training on vast amounts of text, GPT acquires broad knowledge of language, which in turn helps it solve discriminative tasks such as:

  • Question answering,
  • Semantic similarity assessment,
  • Entailment determination, and
  • Text classification
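As a concrete illustration of the CLM objective, the following minimal sketch uses the Hugging Face transformers library with the publicly released openai-gpt checkpoint to score candidates for the next token; the prompt text is arbitrary:

```python
import torch
from transformers import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")
model.eval()

inputs = tokenizer("the quick brown fox", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# CLM: the logits at the last position score every candidate next token.
probs = logits[0, -1].softmax(dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(idx)]):>10s}  {p:.3f}")
```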

GPT utilizes a semi-supervised method for NLU: it combines unsupervised pre-training with supervised fine-tuning. GPT uses a large amount of unlabeled text together with several manually annotated datasets of training examples for the target tasks. GPT follows a two-stage training procedure. In the first stage, it uses a language modeling objective on unlabeled data to learn the parameters of the neural network model. In the second stage, these parameters are adapted to a target task using the corresponding supervised objective, as sketched below.
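The second stage can be sketched as follows, again with Hugging Face transformers. The classification head, learning rate, and example sentence here are illustrative assumptions, not the exact recipe from the paper:

```python
import torch
from transformers import OpenAIGPTModel, OpenAIGPTTokenizer

# Stage 2 sketch: put a task-specific classification head on top of the
# pre-trained (stage 1) transformer and train on labeled examples.
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
backbone = OpenAIGPTModel.from_pretrained("openai-gpt")  # stage-1 parameters

classifier = torch.nn.Linear(backbone.config.n_embd, 2)  # e.g. entailment yes/no
optimizer = torch.optim.Adam(
    list(backbone.parameters()) + list(classifier.parameters()), lr=6.25e-5
)

def training_step(text, label):
    ids = tokenizer(text, return_tensors="pt").input_ids
    hidden = backbone(ids).last_hidden_state   # (1, seq_len, n_embd)
    logits = classifier(hidden[:, -1])         # last token summarizes the input
    loss = torch.nn.functional.cross_entropy(logits, torch.tensor([label]))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

print(training_step("a man is sleeping . a person is asleep .", 1))
```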

2. GPT-2

In 2019, OpenAI [2] published GPT-2, which was trained with the same next-word-prediction objective on a much larger corpus. GPT-2 is a transformer-based language model, built on the architecture shown in Figure 1, and is very effective at producing coherent writing.

Figure 1: Transformer architecture [1]

The system, a general-purpose language model, reshaped expectations of what language processing systems can do. Building on the pre-training approach of the first-version GPT, GPT-2 is able to generate syntactically consistent text, and it has given a new direction to work on text data. Like the first-generation GPT, it is a pre-trained language model that can be used for various NLP tasks (a generation sketch follows this list), such as:

  • Text generation
  • Language translation
  • Building question-answering systems, and so on.
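Text generation, the first of these tasks, can be sketched in a few lines with Hugging Face's pipeline API and the small public gpt2 checkpoint (the full 1.5-billion-parameter release is available as gpt2-xl); the prompt is arbitrary:

```python
from transformers import pipeline

# Text generation with the publicly released GPT-2 weights.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Artificial intelligence will change the world because",
    max_length=40,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```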

The full GPT-2 model:

  1. Contains 1.5 billion parameters, which can then be adapted to a target task such as summarization (see the sketch after this list).
  2. Was trained on about 8 million web pages, collected by following outbound links.
  3. Was trained on roughly 40 GB of text in total.
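The 1.5-billion figure can be checked directly from the released weights; a small sketch, assuming the transformers library is installed and noting that the gpt2-xl download is several gigabytes (swap in "gpt2", at 124M parameters, for a quick test):

```python
from transformers import GPT2LMHeadModel

# gpt2-xl is the full 1.5-billion-parameter GPT-2 release.
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")  # roughly 1.5 billion
```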

After training, GPT-2 harnesses the Transformer concept proposed by Google, originally an encoder-decoder mechanism for modeling input-output dependencies; GPT-2 uses the decoder side, in which previously generated symbols are fed back as inputs for upcoming outputs. GPT-2 also adds an additional normalization layer. This autoregressive setup lets it generate a whole article, which is an important development: many earlier NLP models could only generate a single word, or at best find a missing word in a sentence.
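This feedback loop can be sketched with greedy decoding on the public gpt2 checkpoint; the prompt and loop length are arbitrary choices:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The history of language models", return_tensors="pt").input_ids

# Greedy autoregressive loop: each new token is appended to the sequence
# and fed back in — the "previous symbols as inputs" scheme described above.
for _ in range(20):
    with torch.no_grad():
        logits = model(ids).logits
    next_id = logits[0, -1].argmax()
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```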

3. GPT-3

GPT-3, the third version of OpenAI's Generative Pre-trained Transformer [3], can do some outstanding things. It is a language model that can interpret text, answer questions, and accurately compose text. It analyzes a series of words, text, and other information, then builds on those examples to deliver a novel output such as an article or a piece of working code.

The following details underpin how GPT-3 works:

  • It is built on powerful computational models called neural nets. These neural nets, fed by a gigantic data bank of English sentences, enable it to identify patterns and make decisions accordingly.
  • It has 175 billion learnable parameters, which enable it to perform practically any language task it is assigned. It is an order of magnitude bigger than Microsoft Corp's Turing-NLG model, which has 17 billion learnable parameters.

GPT-3's capabilities are remarkable. Potential applications include the following:

  • It can compose fiction,
  • It can generate working code, and
  • It can draft business meeting minutes, among other tasks.

GPT-3 takes text as input and produces the most probable continuation as output; each new piece it creates is conditioned on the user's text and on what it has generated so far (a prompt-completion sketch follows the list below). Trained on this huge amount of data, GPT-3 can attempt an extraordinarily wide range of tasks, yet it still suffers from a few challenges when compared with human intelligence:

  • It cannot reliably produce the right results when we feed it abstract ideas; correct behavior cannot be expected when novel concepts are given to it, so it struggles with questions that require genuine reasoning.
  • Another drawback is that it produces output word by word, which sometimes makes it difficult to keep the narrative intact; its output can become confusing after a few sentences.
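GPT-3 itself is reachable only through OpenAI's hosted API, but the same prompt-in, text-out pattern can be sketched locally with GPT-2 as a (much weaker) stand-in. The few-shot translation prompt below is the style GPT-3 popularized; GPT-2 will continue it far less reliably:

```python
from transformers import pipeline

# Local stand-in for the prompt-completion pattern: the model simply
# continues the most probable text after the prompt.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Translate English to French.\n"
    "sea: mer\n"
    "sky: ciel\n"
    "cheese:"
)
print(generator(prompt, max_new_tokens=8)[0]["generated_text"])
```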


GPT can be considered the next evolutionary step in AI. It opens a possible door to creating human-like intelligence through machine learning. It has brought a revolution in AI and is a step toward matching human intelligence, though it is still in its infancy and we can expect a lot of improvement. It has pointed the way toward better neural networks.


