Key Concepts
Deploying large language models (LLMs) on mobile devices makes natural language processing available directly on the device, supporting use cases such as accurate and contextually relevant question answering.
Abstract
The paper discusses the process of porting large language models (LLMs) to mobile devices for efficient question answering. The key points are:
Deploying LLMs on mobile devices makes natural language processing capabilities available directly on the device, enabling use cases like accurate and contextually relevant question answering.
The authors employed the llama.cpp framework, a flexible and self-contained C++ framework for LLM inference, to run the models natively on the mobile device. This avoids the complexity of the TensorFlow Lite conversion pipeline.
The authors selected the Orca-Mini-3B model, a 3 billion parameter model with 6-bit quantization, which runs at interactive speed on a recent smartphone such as the Galaxy S21.
Experiments show the model provides accurate and faithful answers to user queries across different subjects like politics, geography, and history, though it can occasionally hallucinate false information.
The authors plan to explore recently introduced LLMs like phi-2 and GPU acceleration via OpenCL or Vulkan in the future.
Statistics
The model has 3 billion parameters and takes approximately 2.2 GB of CPU RAM on the device.
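The reported footprint is consistent with a back-of-the-envelope estimate for the quantized weights alone. A minimal sketch, assuming weight storage dominates and ignoring the KV cache and activation buffers (assumptions not stated in the paper):

```python
# Rough memory estimate for a quantized LLM's weights.
# Assumption: weight storage dominates; KV cache and activations add extra.

def quantized_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Orca-Mini-3B: ~3 billion parameters at 6-bit quantization
size_gb = quantized_weight_size_gb(3e9, 6)
print(f"~{size_gb:.2f} GB of weights")  # ~2.25 GB, close to the reported 2.2 GB
```

The small gap between the 2.25 GB estimate and the reported 2.2 GB is plausibly explained by mixed-precision layers in the actual quantization scheme.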
Quotes
"Deploying Large Language Models (LLMs) on mobile devices makes all the capabilities of natural language processing available on the device."
"An important use case of LLMs is question answering, which can provide accurate and contextually relevant answers to a wide array of user queries."