Key Concept
Major AI developers should provide legal and technical safe harbors to protect public interest safety research from account suspensions or legal reprisal.
Abstract
1. Abstract:
Independent evaluation and red teaming are crucial for identifying risks posed by generative AI systems.
The terms of service and enforcement strategies prominent AI companies use to deter model misuse also chill good faith safety evaluations.
Proposal for major AI developers to provide legal and technical safe harbors for public interest safety research.
2. Introduction:
Generative AI systems raise concerns about misuse, bias, hate speech, privacy violations, and more.
Leading AI companies offer limited transparency and external access to their systems, hindering independent evaluation.
Terms of service restrict independent evaluation, leading to account suspensions for researchers.
3. Challenges to Independent AI Evaluation:
AI companies' terms of service discourage community-led evaluations.
Companies lack transparency in enforcement processes, limiting independent evaluation.
Existing safe harbors protect security research but not other good faith research.
4. Safe Harbors:
Proposal for legal safe harbor to protect researchers from legal action for good faith research.
Proposal for technical safe harbor to prevent account suspensions for good faith research.
Recommendations for companies to delegate access authorization to trusted third parties.
5. Related Proposals:
Prior calls for expanding independent access for AI evaluation and red teaming.
Government recommendations for independent evaluation and red teaming of AI systems.
Key Points
AI developers should provide legal and technical safe harbors to protect public interest research.
AI companies' terms of service hinder independent evaluation and trigger account suspensions for researchers.
Companies should broaden participation by delegating research access authorization to trusted third parties.
Quotes
"We propose that major AI developers commit to providing a legal and technical safe harbor, indemnifying public interest safety research and protecting it from the threat of account suspensions or legal reprisal." - Authors
"The gaps in the policy architectures of leading AI companies force well-intentioned researchers to either wait for approval from unresponsive access programs, or risk violating company policy and potentially losing access to their accounts." - Authors