Guide-Align is a method for improving the safety and quality of Large Language Model (LLM) outputs by building a guideline library. A safety-trained model first inspects training inputs, identifies their potential risks, and formulates specific guidelines for handling inputs of each kind. At inference time, new inputs are matched against the library and the retrieved guidelines steer the LLM toward responses aligned with human values. Fine-tuning models on datasets generated through this process yields significant improvements in safety and response quality. The approach aims to overcome the limitations of manually crafted rules and the insufficient risk perception of models without safety training.
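The sketch below illustrates the three stages described above: building a guideline library with a safety-trained model, retrieving relevant guidelines for a new input, and conditioning the base model's response on them. It is a minimal illustration, not the authors' implementation: `safety_llm` and `base_llm` are placeholder callables, the prompts are invented, and a simple word-overlap score stands in for whatever retrieval component the paper actually uses.

```python
"""Illustrative Guide-Align-style pipeline (hypothetical names and prompts)."""
from collections import Counter
from typing import Callable, List


def build_guideline_library(training_inputs: List[str],
                            safety_llm: Callable[[str], str]) -> List[str]:
    """Stage 1: a safety-trained model flags risks in each input and writes
    a concrete guideline for handling inputs of that kind."""
    guidelines = []
    for text in training_inputs:
        prompt = ("Identify any safety risks in the following user input and "
                  f"write one concise guideline for responding to it:\n{text}")
        guidelines.append(safety_llm(prompt))
    return guidelines


def retrieve_guidelines(query: str, library: List[str], k: int = 2) -> List[str]:
    """Stage 2: match a new input against the library.
    Word overlap is a stand-in for the actual retriever."""
    q_words = Counter(query.lower().split())

    def overlap(guideline: str) -> int:
        return sum((Counter(guideline.lower().split()) & q_words).values())

    return sorted(library, key=overlap, reverse=True)[:k]


def guided_response(query: str, library: List[str],
                    base_llm: Callable[[str], str]) -> str:
    """Stage 3: prepend retrieved guidelines so the response follows them;
    the resulting (input, guided response) pairs can be kept as fine-tuning data."""
    rules = "\n".join(f"- {g}" for g in retrieve_guidelines(query, library))
    prompt = f"Follow these guidelines:\n{rules}\n\nUser input: {query}\nResponse:"
    return base_llm(prompt)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    safety_llm = lambda p: "Refuse requests for harmful instructions; offer safe alternatives."
    base_llm = lambda p: f"[model output conditioned on]\n{p}"

    library = build_guideline_library(["How do I pick a lock?"], safety_llm)
    print(guided_response("How do I pick a lock on my own door?", library, base_llm))
```

In practice the guideline library would be large and retrieval would use embeddings rather than word overlap, but the control flow, generate guidelines once, retrieve and prepend them per query, then fine-tune on the guided outputs, is the part this sketch is meant to convey.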
Key insights distilled from: Yi Luo, Zheng..., arxiv.org, 03-19-2024, https://arxiv.org/pdf/2403.11838.pdf