Core Concepts
The author presents DOMINO, a novel decoding algorithm that achieves efficient and minimally invasive constrained generation, outperforming existing approaches with no loss in accuracy.
Abstract
The content discusses the challenges of constrained decoding for large language models and introduces DOMINO, a novel algorithm that enforces constraints efficiently. It compares DOMINO with other methods, showcasing its superior performance in terms of accuracy and throughput.
The article highlights the importance of aligning sub-word tokens with external constraints to improve task accuracy. It introduces speculative decoding as a technique to speed up inference while maintaining accuracy. The study evaluates different parameters like lookahead and speculative tokens to optimize performance.
Overall, the content emphasizes the significance of efficient and accurate constrained generation for large language models, showcasing how DOMINO addresses these challenges effectively.
Stats
To address this, we present a novel decoding algorithm, DOMINO, that can enforce constraints in a fully subword-aligned fashion.
...in some cases even almost 2× speedup over unconstrained decoding – thereby outperforming existing approaches by a wide margin.
We propose DOMINO, a novel constrained decoding algorithm, that addresses token misalignment and leverages pre-computation and speculative decoding for very low overhead generation.
Our key contributions are: We identify the challenges of constrained decoding...We propose DOMINO...An extensive evaluation shows that DOMINO is minimally-invasive...
...DOMINO is highly efficient and incurs little to no overhead...
...DOMINO achieves the best accuracy for all tasks while also improving throughput well beyond unconstrained generation...
Quotes
"To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding proposes to enforce strict formal language constraints during generation."
"DOMINO is highly efficient and incurs little to no overhead..."
"Our key contributions are: We identify the challenges of constrained decoding...We propose DOMINO..."