Enhancing Language Model Inference Efficiency and Privacy through Model-Aware Retrieval Augmentation
A model-aware approach that uses the language model's own token embeddings to decide efficiently when retrieval augmentation is necessary, without requiring access to the model's potentially sensitive pre-training data.
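As a rough illustration of the idea, the sketch below gates retrieval with a small classifier over token embeddings. Everything specific here is an assumption for illustration and not taken from the source: the mean pooling, the logistic-regression gate, the synthetic embeddings, and the hypothetical names `mean_pool`, `train_gate`, and `needs_retrieval`. In practice the embeddings would come from the language model's own embedding layer.

```python
import numpy as np


def mean_pool(token_embeddings):
    """Average token vectors of shape (n_tokens, dim) into one (dim,) vector."""
    return np.asarray(token_embeddings, dtype=float).mean(axis=0)


def train_gate(features, labels, lr=0.5, steps=500):
    """Fit a logistic-regression gate by plain gradient descent.

    labels: 1 = retrieval needed, 0 = the model can answer on its own.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y                            # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def needs_retrieval(token_embeddings, w, b, threshold=0.5):
    """Return True when the gate predicts that retrieval should be triggered."""
    z = mean_pool(token_embeddings) @ w + b
    return bool(1.0 / (1.0 + np.exp(-z)) >= threshold)


# Synthetic stand-in data (assumption): "well-known" inputs cluster around
# +2 on the first embedding dimension, "unfamiliar" inputs around -2.
rng = np.random.default_rng(0)
dim, n_tokens, n_per_class = 4, 3, 40
center = np.zeros(dim)
center[0] = 2.0

known = [center + 0.5 * rng.normal(size=(n_tokens, dim)) for _ in range(n_per_class)]
unknown = [-center + 0.5 * rng.normal(size=(n_tokens, dim)) for _ in range(n_per_class)]

features = np.stack([mean_pool(e) for e in known + unknown])
labels = np.array([0] * n_per_class + [1] * n_per_class)

w, b = train_gate(features, labels)

# Query the gate with noise-free token embeddings from each cluster.
familiar_query = np.tile(center, (n_tokens, 1))
unfamiliar_query = np.tile(-center, (n_tokens, 1))
print(needs_retrieval(familiar_query, w, b))
print(needs_retrieval(unfamiliar_query, w, b))
```

The gate only sees embeddings, which is what makes the approach model-aware: no statistics over the pre-training corpus are consulted, and the per-query cost is a single pooled vector and a dot product.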