BigScience Research Workshop Releases AI T0 Language Model
The BigScience research workshop has published T0, a series of natural language processing (NLP) AI models specially trained for zero-shot multitask learning. T0 often outperforms models 6x its size on the BIG-bench benchmark, and outperforms GPT-3, a model 16x its size, on several other NLP benchmarks.
The workshop team described the model and its training datasets in a paper published on arXiv. To study the zero-shot performance of large NLP models on completely unseen tasks, the researchers converted a large set of supervised-learning NLP datasets into a prompted format. The goal of the research was to determine whether training on data in this format improved T0's ability to generalize to unseen tasks. When evaluated on 11 held-out datasets, T0 outperformed GPT-3 on 8 of them. T0 also outperformed the other baseline models on 13 of 14 BIG-bench tasks.
Large language models often perform fairly well on unseen tasks — that is, tasks for which they were not explicitly trained. For example, although GPT-3 was only explicitly trained to fill in masked words in sentences, the model actually performs well on a variety of other tasks, including translation, question answering, and even 3-digit arithmetic. According to the BigScience team, one hypothesis to explain this is that the models encounter a "mixture of implicit tasks" in their training data. In particular, they point out that training data is often scraped from the web and could contain such tasks explicitly; for example, web pages of questions and answers effectively constitute a training dataset for a question-answering task.
The BigScience Research Workshop is a one-year collaboration among "600 researchers from 50 countries and more than 250 institutions," with the goal of creating and studying a very large multilingual dataset and a large deep-learning NLP model. The team chose to build T0 to "focus on intentionally and explicitly training large language models in a supervised and massively multitask fashion." The key feature of the training data is that it specifies language tasks using natural-language instructions; the researchers hypothesized that this format would produce a model that generalizes better to unseen tasks while requiring fewer model parameters.
To create their datasets, the team collected several existing supervised-learning datasets for various NLP tasks. These were then converted into prompted form using a set of templates; for example, a template for a natural-language inference task might be "Suppose X. Can we infer that Y?", where X and Y are sentences such as "The banker contacted the professors and the athlete" and "The banker contacted the professors." In all, the researchers collected 62 datasets organized into 12 tasks.
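This conversion step amounts to filling template slots with fields from each dataset example. The sketch below is a minimal illustration of the idea; the function, template wording, and field names are assumptions, not the exact definitions used in the T0 templates:

```python
# Minimal sketch of converting a supervised NLI example into a natural-language
# prompt, in the spirit of T0's prompted-dataset conversion. The template text
# and field names are illustrative placeholders.

def apply_template(template: str, example: dict) -> str:
    """Fill a prompt template's slots with fields from a dataset example."""
    return template.format(**example)

# A natural-language-inference template similar to the one in the article.
nli_template = "Suppose {premise} Can we infer that {hypothesis}?"

example = {
    "premise": "The banker contacted the professors and the athlete.",
    "hypothesis": "the banker contacted the professors",
}

prompt = apply_template(nli_template, example)
print(prompt)
# Suppose The banker contacted the professors and the athlete. Can we infer
# that the banker contacted the professors?
```

Applying many such templates per dataset yields a large pool of natural-language training prompts from existing labeled data.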
The T0 model is based on Google's pre-trained Text-to-Text Transfer Transformer (T5) model, which was then fine-tuned on the multitask mixture of prompted datasets. The datasets from four tasks were fully held out to evaluate the model's zero-shot generalization. The model, which contains 11B parameters, outperformed a 175B-parameter GPT-3 model on 8 of the 11 held-out datasets.
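The held-out evaluation can be pictured as partitioning the 12 task categories so that no dataset from a held-out category ever appears in training. The sketch below illustrates that split; the task and dataset names are illustrative placeholders, not the paper's actual partition:

```python
# Sketch of holding out entire task categories for zero-shot evaluation,
# as T0 does. Task-category and dataset names below are placeholders.
from typing import Dict, List, Tuple

def split_tasks(
    datasets_by_task: Dict[str, List[str]],
    held_out_tasks: List[str],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Partition datasets so training never sees a held-out task category."""
    train = {t: d for t, d in datasets_by_task.items() if t not in held_out_tasks}
    held = {t: d for t, d in datasets_by_task.items() if t in held_out_tasks}
    return train, held

datasets_by_task = {
    "sentiment": ["imdb", "yelp"],        # placeholder dataset names
    "summarization": ["cnn_dailymail"],
    "nli": ["anli", "cb"],
    "coreference": ["wsc"],
}

train, held_out = split_tasks(datasets_by_task, ["nli", "coreference"])
```

Because the split is by task category rather than by individual dataset, performance on the held-out datasets measures generalization to genuinely unseen tasks.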
Several members of the T0 research team joined a Hacker News discussion about the work. One researcher noted that Google and EleutherAI had recently studied "instruction tuning" of language models to improve their generalization ability. When asked whether the model's size made inference a problem, another researcher replied:
As to whether size is an issue: it is possible to run inference on a single Google Cloud TPU v3-8 device or on a server with 4 x 32 GB V100 GPUs. Hugging Face also has an Inference API…
The T0 model files are available on the Hugging Face website.