Large Language Models Struggle with the "White Bear Phenomenon": Prompt-Based Attacks and Cognitive Therapy-Inspired Defenses
Despite their advanced capabilities, large language models have a fundamental difficulty with negation and absence. This resembles the "white bear phenomenon" in human cognition, where being told not to think of something, such as a white bear, makes that thought harder to suppress. Prompt-based attacks can exploit this weakness, and defense strategies inspired by cognitive therapy can mitigate it.