The author examines data contamination and memorization in Large Language Models (LLMs) applied to tabular data, highlighting the need to verify data integrity before evaluation. The approach tests LLMs for knowledge, learning, and memorization across a range of datasets.
Instruction-based prompts reveal higher levels of memorization in instruction-tuned language models than in their base counterparts.
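One common way to probe memorization of tabular data is a row-completion check: prompt the model with the first few fields of a dataset row and see whether it reproduces the held-out fields verbatim. The sketch below illustrates this idea with a hypothetical `complete` callable standing in for an LLM call; the function name, the toy dataset, and the field-joining format are all assumptions for illustration, not the author's actual protocol.

```python
from typing import Callable, Sequence

def row_completion_rate(
    rows: Sequence[Sequence[str]],
    complete: Callable[[str], str],
    prefix_fields: int = 2,
) -> float:
    """Fraction of rows whose held-out fields the model reproduces verbatim."""
    hits = 0
    for row in rows:
        prompt = ", ".join(row[:prefix_fields])       # shown to the model
        expected = ", ".join(row[prefix_fields:])     # held-out remainder
        if complete(prompt).strip() == expected:
            hits += 1
    return hits / len(rows)

# Toy stand-in for an LLM that has "memorized" exactly one row.
_memorized = {"5.1, 3.5": "1.4, 0.2, setosa"}

def toy_model(prompt: str) -> str:
    return _memorized.get(prompt, "unknown")

rows = [
    ["5.1", "3.5", "1.4", "0.2", "setosa"],
    ["7.0", "3.2", "4.7", "1.4", "versicolor"],
]
print(row_completion_rate(rows, toy_model, prefix_fields=2))  # 0.5
```

A completion rate well above chance on a published dataset suggests the rows were seen during training, which is the kind of signal a contamination test looks for.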