Discovering Latent Knowledge in Language Models Without Supervision
The authors propose Contrast-Consistent Search (CCS), a method that uncovers latent knowledge in a language model's activations without any supervision: it searches for a direction in activation space whose classifications satisfy logical consistency, e.g. a statement and its negation should receive opposite truth values. The resulting probe achieves higher accuracy than zero-shot prompting and remains accurate even when the model is prompted to generate incorrect answers.
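A minimal sketch of the CCS idea in PyTorch, based on the objective described in the paper: a linear probe is trained so that a statement and its negation get complementary probabilities (consistency) while avoiding the degenerate all-0.5 solution (confidence). The probe class, variable names, and random placeholder activations are illustrative, not the authors' code; the paper also normalizes each contrast class's hidden states before probing, which is omitted here.

```python
import torch
import torch.nn as nn

class CCSProbe(nn.Module):
    """Linear probe mapping a hidden state to a probability that the statement is true."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(x)).squeeze(-1)

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    """Unsupervised CCS objective: consistency + confidence.

    p_pos: probe outputs on hidden states of affirmative ("yes") phrasings.
    p_neg: probe outputs on hidden states of the negated ("no") phrasings.
    """
    # Consistency: a statement and its negation should have complementary probabilities.
    consistency = ((p_pos - (1.0 - p_neg)) ** 2).mean()
    # Confidence: penalize the degenerate solution p_pos = p_neg = 0.5.
    confidence = (torch.min(p_pos, p_neg) ** 2).mean()
    return consistency + confidence

# Hypothetical usage with placeholder activations standing in for contrast-pair
# hidden states extracted from a language model.
hidden_dim = 768
h_pos = torch.randn(128, hidden_dim)  # activations for "yes" phrasings
h_neg = torch.randn(128, hidden_dim)  # activations for "no" phrasings

probe = CCSProbe(hidden_dim)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = ccs_loss(probe(h_pos), probe(h_neg))
    loss.backward()
    opt.step()
```

Because the loss never consults labels, the learned direction reflects whatever internally consistent notion of truth the model represents, which is why it is unaffected by prompts that push the model's outputs toward wrong answers.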