Enterprise AI solution provider Indico has announced a new open-source project for machine learning and natural language processing. According to its GitHub page, Finetune provides “scikit-learn style model finetuning for NLP.”
Finetuning is a transfer learning approach that takes a model trained on one task and adapts it to solve a different but related task.
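To make the “scikit-learn style” idea concrete, here is a toy Python sketch. It is not Finetune's actual API: the class name `HypotheticalFinetuneClassifier`, the tiny hand-written "pre-trained" word embeddings and the hyperparameters are all invented for illustration. It follows the familiar `fit(X, y)` / `predict(X)` pattern and, unlike training a classifier on frozen features, updates a copy of the pre-trained weights along with a new task-specific output layer.

```python
import copy
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class HypotheticalFinetuneClassifier:
    """Toy binary classifier with a scikit-learn style fit/predict interface.

    base_weights stands in for a pre-trained model: a dict mapping each known
    word to an embedding vector learned on some earlier task (hypothetical).
    """

    def __init__(self, base_weights, lr=0.5, epochs=200):
        self.base_weights = base_weights
        self.lr = lr
        self.epochs = epochs

    def _features(self, text):
        # Average the embeddings of known words (unknown words are ignored).
        words = [w for w in text.lower().split() if w in self.emb_]
        feats = [0.0] * len(self.head_)
        for w in words:
            for i, x in enumerate(self.emb_[w]):
                feats[i] += x / len(words)
        return words, feats

    def _logit(self, feats):
        return sum(h * f for h, f in zip(self.head_, feats)) + self.bias_

    def fit(self, X, y):
        # Start from a copy of the pre-trained weights so the base model is untouched.
        self.emb_ = copy.deepcopy(self.base_weights)
        dim = len(next(iter(self.emb_.values())))
        self.head_ = [0.0] * dim  # new task-specific output layer, zero-initialized
        self.bias_ = 0.0
        for _ in range(self.epochs):
            for text, label in zip(X, y):
                words, feats = self._features(text)
                g = sigmoid(self._logit(feats)) - label  # d(log loss)/d(logit)
                # Backpropagate into the pre-trained embeddings: the finetuning step.
                for w in words:
                    for i in range(dim):
                        self.emb_[w][i] -= self.lr * g * self.head_[i] / len(words)
                # Gradient step on the new output layer.
                for i in range(dim):
                    self.head_[i] -= self.lr * g * feats[i]
                self.bias_ -= self.lr * g
        return self

    def predict(self, X):
        return [1 if self._logit(self._features(t)[1]) > 0 else 0 for t in X]


# Hypothetical "pre-trained" 2-d word embeddings and a tiny labeled set.
base = {
    "great": [1.0, 0.2], "good": [0.8, 0.1],
    "awful": [-0.9, 0.3], "bad": [-0.7, 0.2],
    "movie": [0.0, 1.0], "plot": [0.1, 0.9],
}
X = ["great movie", "good plot", "awful movie", "bad plot"]
y = [1, 1, 0, 0]

model = HypotheticalFinetuneClassifier(base).fit(X, y)
print(model.predict(["good movie", "awful plot"]))
```

Calling `fit` leaves `base` unchanged while `model.emb_` drifts toward the new task. Removing the embedding-update loop would turn this into plain feature extraction on frozen pre-trained weights, which is the main contrast with finetuning.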
“Most organizations have natural language processing problems, but few have the labeled data they need to solve them with machine learning,” said Madison May, Indico machine learning architect and cofounder. “Finetune lets them do more with less labeled training data. And it only requires a base level of IT experience.”
Finetune was developed to help users solve a variety of tasks in text- and document-based workflows. According to Indico, the project extends OpenAI’s original research and development on improving language understanding with generative pre-training, led by Alec Radford. OpenAI provided a model with general capabilities for document classification, comparison and multiple-choice question answering. The Finetune library packages those capabilities for ease of use and adds document annotation, regression and multi-label classification.
“Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture,” OpenAI researchers wrote in a paper.
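The “task-aware input transformations” mentioned in the quote convert structured inputs into a single token sequence that the pre-trained model can read. As the OpenAI paper describes them, entailment concatenates premise and hypothesis with a delimiter, similarity encodes both orderings of the two texts, and multiple choice pairs the context with each candidate answer, all wrapped in start and extract tokens. The Python sketch below is illustrative only: the token strings `<start>`, `<delim>` and `<extract>` are placeholders, and tokenization is simplified to whitespace splitting.

```python
# Placeholder special tokens (hypothetical strings; the paper uses a start
# token, a delimiter token and an extract token whose embeddings are learned).
START, DELIM, EXTRACT = "<start>", "<delim>", "<extract>"


def tokenize(text):
    # Simplified whitespace tokenizer, standing in for the model's real one.
    return text.split()


def pair(first, second):
    # Two texts joined into one sequence with a delimiter in between.
    return [START] + tokenize(first) + [DELIM] + tokenize(second) + [EXTRACT]


def classification(text):
    return [START] + tokenize(text) + [EXTRACT]


def entailment(premise, hypothesis):
    return pair(premise, hypothesis)


def similarity(a, b):
    # The two texts have no inherent order, so both orderings are produced;
    # the model encodes each and combines the two representations.
    return [pair(a, b), pair(b, a)]


def multiple_choice(context, answers):
    # context = document plus question; one sequence per candidate answer.
    # The model scores each sequence and a softmax over scores picks the answer.
    return [pair(context, ans) for ans in answers]


print(entailment("a man sleeps", "a person rests"))
# → ['<start>', 'a', 'man', 'sleeps', '<delim>', 'a', 'person', 'rests', '<extract>']
```

Because every task is reduced to one or more token sequences, the same pre-trained network can be reused with only a small output layer added per task, which is what the quote means by “minimal changes to the model architecture.”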
The project also provides a quickstart guide, installation information, code examples, API reference and model configuration options.
“We have a vested interest in promoting the advantages of transfer learning and giving back to the open source community is a really productive way for us to do that,” said Slater Victoroff, co-founder and CTO of Indico. “I also want to acknowledge the important research and development work done by the team at OpenAI and Alec Radford. They are driving huge innovations in machine learning that really help accelerate the progress of companies like Indico.”