SD Times Open Source Project of the Week: BigCode
The goal of the BigCode initiative is to develop state-of-the-art large language models (LLMs) for code in an open and responsible way.
Code LLMs enable the completion and synthesis of code from other code and from natural language descriptions, letting users work across a wide range of domains, tasks, and programming languages.
The initiative is led by ServiceNow Research, which researches AI-powered future experiences, and Hugging Face, a community and data platform that provides tools for users to create, train, and deploy ML models based on open source code and technologies.
BigCode invites AI researchers to collaborate on a representative assessment suite for code LLMs covering a diverse set of tasks and programming languages, responsible development and governance of datasets for code LLMs, and faster training and inference methods for LLMs.
“BigCode’s primary goal is to develop and release a data set large enough to form a state-of-the-art language model for code. We will ensure that only files from repositories with permissive licenses enter the set of data,” wrote ServiceNow Research in a blog post.
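That license-based filtering can be pictured with a minimal sketch. This is purely illustrative and not BigCode's actual pipeline: the allow-list of license identifiers and the repository records are assumptions for the example.

```python
# Hypothetical sketch of permissive-license filtering for a code dataset.
# The license allow-list and repo records are illustrative only.

PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause"}

def filter_permissive(repos):
    """Keep only repositories whose license is on the permissive allow-list."""
    return [r for r in repos if r.get("license", "").lower() in PERMISSIVE_LICENSES]

repos = [
    {"name": "alpha", "license": "MIT"},
    {"name": "beta", "license": "GPL-3.0"},
    {"name": "gamma", "license": "Apache-2.0"},
]
print([r["name"] for r in filter_permissive(repos)])  # → ['alpha', 'gamma']
```

In practice a pipeline like this would run over repository metadata before any source files are collected, so copyleft-licensed code never enters the training set.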
“With this dataset, we will train a 15 billion parameter language model for code using ServiceNow’s internal GPU cluster. With an adapted version of Megatron-LM, we will train the LLM on distributed infrastructure.”
Additional project details are available here.