Core Concepts
API-protected LLMs leak proprietary information through the softmax bottleneck: their low-rank outputs reveal hidden model details such as embedding size from a small number of API queries.
Abstract
The commercialization of large language models (LLMs) has led to the common practice of high-level API-only access to proprietary models. This work reveals that even under a conservative assumption about the model architecture, it is possible to extract non-public information about an API-protected LLM from a small number of queries. The findings center on the softmax bottleneck in modern LLMs, which allows efficient discovery of hidden model characteristics and parameters. This unlocks several capabilities, including estimating embedding sizes and detecting model updates. The methods discussed enable greater accountability and transparency for LLM providers.
Directory:
Introduction:
Companies increasingly use closed-source LLMs accessible only via APIs.
API-only access can give providers a false sense of security, while users must rely on provider announcements to learn of model updates.
Logits Constrained Space:
Modern LLM outputs are restricted to low-dimensional subspaces due to softmax bottleneck.
Implications for output spaces and probability distributions are explained.
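The bottleneck can be illustrated with a small numerical sketch (hypothetical sizes, not the paper's setup): because every logit vector is the product of a fixed vocabulary-by-dimension output matrix and a low-dimensional hidden state, the collected outputs can never exceed the embedding dimension in rank.

```python
import numpy as np

# Hypothetical sizes for illustration: vocabulary much larger than embedding dim.
vocab_size, embed_dim = 4_000, 64

rng = np.random.default_rng(0)
W = rng.standard_normal((vocab_size, embed_dim))  # output (softmax) matrix

# Logits for many prompts: each column is W @ h for some hidden state h.
H = rng.standard_normal((embed_dim, 500))
logits = W @ H  # shape (4000, 500)

# Although the vocabulary has 4,000 entries, every logit vector lies in a
# 64-dimensional subspace, so the rank of the logit matrix is capped at embed_dim.
print(np.linalg.matrix_rank(logits))  # 64
```

This rank cap is exactly what lets an observer infer hidden properties of the model from its outputs alone.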
Fast Full Outputs Retrieval:
Algorithms proposed for efficiently obtaining full-vocabulary outputs from restricted APIs.
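One family of such algorithms exploits the logit bias parameter exposed by some APIs. The following is a minimal sketch against a simulated top-2 API (the simulated `api_top2` function, the bias value, and the vocabulary size are illustrative assumptions, not the paper's exact algorithm): biasing one token into the top slot per query lets the caller recover every logit relative to a fixed reference token, because the shared normalization constant cancels within a single response.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB = 100
true_logits = rng.standard_normal(VOCAB)  # hidden logits we pretend not to know

def api_top2(logit_bias):
    """Simulated API: returns the top-2 (token, logprob) pairs after applying bias."""
    biased = true_logits.copy()
    for tok, b in logit_bias.items():
        biased[tok] += b
    logprobs = biased - np.log(np.exp(biased).sum())
    order = np.argsort(logprobs)[::-1]
    return [(int(t), float(logprobs[t])) for t in order[:2]]

ref = api_top2({})[0][0]  # unbiased argmax token serves as the reference
BIG = 40.0                # bias large enough to force any token to the top
rel = np.zeros(VOCAB)     # logits relative to the reference token
for tok in range(VOCAB):
    if tok == ref:
        continue
    (t1, lp1), (t2, lp2) = api_top2({tok: BIG})
    # t1 is the biased token, t2 the original argmax; the normalizer cancels:
    # (l_tok + BIG - logZ') - (l_ref - logZ') - BIG = l_tok - l_ref
    rel[tok] = (lp1 - BIG) - lp2

print(np.allclose(rel, true_logits - true_logits[ref]))  # True
```

One query per vocabulary token suffices in this idealized setting; real APIs add rate limits, quantized logprobs, and restricted bias ranges that the paper's algorithms must work around.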
Discovering Embedding Size:
Methodology outlined for inferring embedding size from model outputs alone.
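The core of the methodology can be sketched as follows (synthetic data; the paper applies this to full-vocabulary outputs recovered from the API): stack many output vectors into a matrix and look for the index where the singular values collapse, which exposes the dimension of the subspace they span.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab_size, true_embed_dim = 4_000, 64  # hypothetical sizes for illustration

# Simulate full-vocabulary outputs from many queries: each column is W @ h,
# so all columns live in a true_embed_dim-dimensional subspace.
W = rng.standard_normal((vocab_size, true_embed_dim))
H = rng.standard_normal((true_embed_dim, 500))
outputs = W @ H

# Singular values drop to numerical zero right after index true_embed_dim;
# counting the values above a small relative threshold recovers the dimension.
s = np.linalg.svd(outputs, compute_uv=False)
estimated_dim = int(np.sum(s > s[0] * 1e-8))
print(estimated_dim)  # 64
```

On real, noisy API outputs the drop is less abrupt, which is why the paper reports a drop over a range of indices (4,600 to 4,650 for gpt-3.5-turbo) rather than a single index.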
Identifying LLMs:
The image of each model's output matrix (the low-dimensional subspace its outputs span) acts as a unique signature, allowing precise identification of different models.
Further Applications:
Potential uses include finding unargmaxable tokens, reconstructing softmax matrices, and enhancing decoding algorithms.
Mitigations:
Proposed defenses against attacks include removing logit bias or transitioning to softmax-bottleneck-free architectures.
Discussion:
Assessment of how these methods affect trust between API users and providers.
Simultaneous Discovery:
Comparison with similar work by Carlini et al., highlighting complementary approaches and interactions.
Conclusion:
Summary of key findings regarding vulnerabilities in API-protected LLMs and their implications.
Stats
"Our empirical investigations show the effectiveness of our methods, which allow us to estimate the embedding size of OpenAI’s gpt-3.5-turbo to be about 4,096."
"We find that the singular values for these outputs drop dramatically between index 4,600 and 4,650."
"Extrapolating further... it is likely that the number of parameters in gpt-3.5-turbo is around 7 billion."