Core Concepts
This study introduces Octopus, a framework that leverages on-device large language models (LLMs) to improve the accuracy and efficiency of software API calling, with reported results that surpass GPT-4 on this task.
Abstract
This study presents a framework called Octopus that aims to improve the integration of large language models (LLMs) with software APIs. The key highlights are:
Dataset Compilation:
The researchers compiled a comprehensive dataset of over 30,000 widely-used APIs from the Rapid API Hub.
They applied a rigorous data refinement process, including negative sampling, clustering of similar functions, and GPT-4 verification, to produce a high-quality training dataset.
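As a rough illustration of the negative-sampling step, the sketch below pairs each user query with its true API (a positive example) and with randomly drawn unrelated APIs (negatives). The API records and the `build_pairs` helper are hypothetical, not the authors' actual pipeline.

```python
import random

# Hypothetical API records: an API name plus a query it should answer.
apis = [
    {"name": "get_weather", "query": "What's the weather in Paris?"},
    {"name": "get_stock_price", "query": "What is AAPL trading at?"},
    {"name": "book_flight", "query": "Book me a flight to Tokyo"},
]

def build_pairs(apis, neg_per_pos=1, seed=0):
    """Pair each query with its true API (label 1) and with
    randomly sampled other APIs (label 0) as negative examples."""
    rng = random.Random(seed)
    pairs = []
    for item in apis:
        pairs.append((item["query"], item["name"], 1))  # positive
        others = [a["name"] for a in apis if a["name"] != item["name"]]
        for neg in rng.sample(others, k=min(neg_per_pos, len(others))):
            pairs.append((item["query"], neg, 0))       # negative
    return pairs

pairs = build_pairs(apis)
```

Training on such contrastive pairs teaches the model to reject plausible but incorrect API choices, not just to reproduce the correct one.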
Model Development:
The researchers fine-tuned several base models, including CodeLlama 7B, Google's Gemma 2B and 7B, and Stable Code 3B, on the curated dataset.
They introduced a conditional masking technique during inference to ensure the generated outputs adhere to the desired format, improving accuracy without sacrificing inference speed.
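The core idea behind such masking can be sketched as constrained decoding: at each step, logits for tokens that would violate the expected output format are suppressed before selecting the next token. This is a minimal illustration of the general technique, not the paper's exact implementation; the vocabulary, logits, and allowed set are invented.

```python
import math

def masked_sample(logits, allowed_ids):
    """Force the next token into a valid set by setting logits of
    disallowed tokens to -inf, then taking the argmax."""
    masked = [l if i in allowed_ids else -math.inf
              for i, l in enumerate(logits)]
    return max(range(len(masked)), key=masked.__getitem__)

# Toy vocabulary of 5 tokens; suppose only tokens {1, 3} are valid
# continuations of the expected function-call format at this step.
logits = [2.0, 0.5, 3.0, 1.0, -1.0]
next_token = masked_sample(logits, allowed_ids={1, 3})  # → 3
```

Because the mask only filters an already-computed logit vector, it adds negligible overhead per decoding step, which is consistent with the paper's claim of improved accuracy without sacrificing inference speed.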
Evaluation and Benchmarking:
The researchers conducted a comprehensive evaluation of the Octopus model against GPT-3.5 and GPT-4 on a custom benchmark dataset.
The results show that the Octopus models, particularly the 7B variants, outperform GPT-4 at identifying and calling the appropriate API functions.
The study highlights the potential of compact LLMs in enhancing software development and API integration, setting a new efficiency benchmark for scalable AI applications.
Stats
The researchers compiled a dataset of over 30,000 widely-used APIs from the Rapid API Hub.
Quotes
"This advancement, validated across our selected base models, not only showcases the potential of compact LLMs in external API integration but also sets a new efficiency benchmark for scalable AI applications."
"To ensure the consistency of our model's output formatting, we introduce a conditional masking technique during inference. This innovative approach guarantees that our LLMs generate outputs in the desired formats, markedly improving accuracy and minimize validation loss without sacrificing inference speed."