Core Concepts
Introducing Large Content Behavior Models (LCBMs) to optimize communication by integrating behavior tokens in training.
Abstract
The paper introduces Large Content Behavior Models (LCBMs) to optimize communication by incorporating behavior tokens in training. It addresses the limitations of current models in predicting and optimizing communication for desired receiver behaviors. LCBMs show promise in simulating behavior, understanding content, and adapting to different behavior domains. The study highlights the importance of including receiver behaviors in training data for effective communication optimization.
The paper draws on Shannon and Weaver's information theory, which distinguishes three levels of communication: technical, semantic, and effectiveness. LLMs such as GPT-3 and T5 have advanced natural language processing tasks, but they fall short at predicting receiver behavior, which belongs to the effectiveness level. To close this gap, the paper proposes a text-to-text approach that models content and behavior together, enabling a wide range of applications.
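To make the text-to-text idea concrete, here is a minimal sketch of how one piece of content and its receiver behavior might be serialized into prompt/target pairs. It is an illustration only, not the paper's actual format: the `Post` record, its fields, the prompt wording, and both function names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Post:
    """Hypothetical record: a piece of content plus the behavior it received."""
    channel: str
    title: str
    views: int
    likes: int


def behavior_simulation_pair(post: Post) -> Tuple[str, str]:
    """Content in the prompt, behavior in the target (predict the response)."""
    prompt = (
        f"Channel: {post.channel}\n"
        f"Title: {post.title}\n"
        "Predict the views and likes for this video."
    )
    target = f"Views: {post.views}\nLikes: {post.likes}"
    return prompt, target


def content_simulation_pair(post: Post) -> Tuple[str, str]:
    """Behavior in the prompt, content in the target (the reverse direction)."""
    prompt = (
        f"Channel: {post.channel}\n"
        f"Views: {post.views}\n"
        f"Likes: {post.likes}\n"
        "Write a title for a video that received this response."
    )
    target = f"Title: {post.title}"
    return prompt, target


if __name__ == "__main__":
    # Made-up values, used only to show the shape of the serialized text.
    example = Post("ExampleChannel", "How to brew pour-over coffee", 12000, 340)
    pairs = (behavior_simulation_pair(example), content_simulation_pair(example))
    for prompt, target in pairs:
        print(prompt)
        print("->", target)
        print()
```

Because both directions are plain text, the same model could in principle be trained on either pair type, which is what lets behavior be treated as just another kind of token alongside content.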
Experiments on YouTube videos and Twitter posts demonstrate LCBM's performance in behavior simulation, content understanding, and domain adaptation. The dataset released with the paper aims to encourage further research on large content and behavior models.
Stats
Large Language Models (LLMs) such as GPT-3 and T5 are discussed.
The Enron Email corpus is cited as a source of LLM training data.
The Common Crawl project is cited as a source of data for language models.
The LVU benchmark dataset is used to test generalization capabilities.
Quotes
"Large Content Behavior Models (LCBMs) show promise in enabling models to predict human behavior over content."
"Behavior simulation can enable real-world applications like content recommendation and A/B testing."
"Training LCBM on both Twitter and YouTube data improves performance through domain adaptation."