Evaluating Long-Term Memory and Continual Learning Capabilities of Large Language Models through Dynamic Conversational Benchmarking
Large language models perform well on isolated tasks but struggle when tasks are interleaved in a continuous conversation, revealing limitations in their long-term memory and information integration capabilities.
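The contrast between isolated and interleaved tasks can be made concrete with a small sketch. It assumes each isolated task is an ordered list of user turns, some of which introduce information and some of which later ask the model to recall it. The names `Turn`, `interleave`, and `memory_gaps` are hypothetical and only illustrate the idea; this is not the benchmark's actual construction procedure. Interleaving preserves each task's internal order but inserts turns from other tasks between a task's setup and its query. `memory_gaps` counts those intervening turns, which is one simple proxy for how far back the model must remember.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Turn:
    task_id: str    # which isolated task this turn belongs to (hypothetical schema)
    text: str       # the user message content
    is_query: bool  # True if this turn asks the model to use earlier task information


def interleave(tasks: dict[str, list[Turn]], seed: int = 0) -> list[Turn]:
    """Merge per-task turn lists into one conversation.

    At each step a random task that still has turns left is chosen and its
    next turn is appended, so every task keeps its own internal order.
    """
    rng = random.Random(seed)
    queues = {tid: list(turns) for tid, turns in tasks.items()}
    conversation: list[Turn] = []
    while any(queues.values()):
        tid = rng.choice([t for t, q in queues.items() if q])
        conversation.append(queues[tid].pop(0))
    return conversation


def memory_gaps(conversation: list[Turn]) -> list[int]:
    """For each query turn, count the turns from other tasks since that task's previous turn."""
    last_seen: dict[str, int] = {}
    gaps: list[int] = []
    for i, turn in enumerate(conversation):
        if turn.is_query and turn.task_id in last_seen:
            gaps.append(i - last_seen[turn.task_id] - 1)
        last_seen[turn.task_id] = i
    return gaps
```

Concatenating the tasks one after another (the isolated setting) yields gaps of 0. An interleaved ordering such as setup A, setup B, query A, query B yields gaps of 1 for both queries, because each query is separated from its context by a turn from the other task.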