Quantifying and Analyzing Copyright Infringement by Large Language Models in Realistic Scenarios under European Law
Large language models vary significantly in their tendency to reproduce copyrighted text. Larger models generally memorize more, but targeted fine-tuning and specific design choices can substantially improve copyright compliance.