Core Concepts
Large language models like ChatGPT can be understood as lossy text-compression algorithms, an analogy that clarifies both how they work and where they fail.
Abstract
The article compares the lossy compression format used by a Xerox photocopier with large language models like ChatGPT. Both rely on compression that can introduce inaccuracies into the reproduced content; in language models these show up as hallucinations. The lossy-compression analogy helps explain how large language models work and raises the question of whether they truly understand the information they process.
In 2013, workers at a German construction company noticed that copies of a floor plan made on a Xerox photocopier differed from the original. Computer scientist David Kriesel investigated and traced the discrepancies to the lossy compression format the machine applied to its scans: modern photocopiers scan a document digitally, compress the image, and print the stored file.
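The article explains the difference between lossless and lossy compression. Lossless compression lets the original be reconstructed exactly and is used where accuracy matters, such as text files and computer programs. Lossy compression only approximates the original and is typically used for photos, audio, and video, where exactness matters less. Lossy compression, like that used in Xerox photocopiers, can introduce subtle inaccuracies that are not immediately noticeable.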
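The distinction can be made concrete with a minimal Python sketch. The lossless half uses the standard-library zlib module; the lossy half is a toy quantization scheme (the function name and the step size of 0.5 are assumptions chosen for illustration, not anything a photocopier uses).

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress with zlib (lossless) and decompress again."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(values: list[float], step: float = 0.5) -> list[float]:
    """Toy lossy scheme: snap each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]


text = b"the rooms were 14.13, 21.11, and 17.42 square metres"
# Lossless: every byte comes back exactly as it went in.
print(lossless_round_trip(text) == text)  # True

areas = [14.13, 21.11, 17.42]
# Lossy: the values come back close to the originals, but not equal to them.
print(lossy_round_trip(areas))  # [14.0, 21.0, 17.5]
```

The lossy output still looks like a plausible list of room areas, which is why this kind of error is easy to miss.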
Xerox photocopiers use JBIG2, a lossy compression format for black-and-white images. To save space, the copier identifies similar-looking regions of the image and stores a single copy for all of them, so the output can be crisp and readable yet wrong. The article draws a comparison between this technology and large language models like ChatGPT, which can likewise produce fluent output that misrepresents the source.
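The failure mode can be sketched in a few lines of Python. This is a toy pattern-substitution scheme, not the actual JBIG2 algorithm: the 3x5 glyph bitmaps, the function names, and the `max_diff` threshold are all hypothetical, chosen only to show how merging similar-looking patches yields sharp but incorrect output.

```python
def hamming(a: str, b: str) -> int:
    """Count the pixels at which two equal-sized bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def compress(glyphs: list[str], max_diff: int = 1) -> tuple[list[str], list[int]]:
    """Store each glyph once; reuse a stored glyph if it is 'similar enough'."""
    dictionary: list[str] = []
    indices: list[int] = []
    for glyph in glyphs:
        for i, stored in enumerate(dictionary):
            if hamming(glyph, stored) <= max_diff:
                indices.append(i)  # lossy step: substitute the stored patch
                break
        else:
            dictionary.append(glyph)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decompress(dictionary: list[str], indices: list[int]) -> list[str]:
    """Rebuild the page from the stored patches."""
    return [dictionary[i] for i in indices]


# Hypothetical 3x5 bitmaps, written row by row.
SIX = "111" "100" "111" "101" "111"
EIGHT = "111" "101" "111" "101" "111"  # differs from SIX by a single pixel
ONE = "010" "110" "010" "010" "111"

page = [SIX, ONE, EIGHT]
restored = decompress(*compress(page))
print(restored == page)    # False
print(restored[2] == SIX)  # True: the "8" comes back as a sharp "6"
```

With `max_diff=0` the same code is lossless, which illustrates that the error comes from the similarity threshold, not from the dictionary idea itself.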
ChatGPT is likened to a blurry JPEG of all the text on the Web: it retains much of the information, but because the compression is lossy, it can produce hallucinations or incorrect responses. The article explores whether such large language models truly understand the content they process or merely offer statistical approximations.
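The article discusses the relationship between text compression and understanding through two examples: a file of arithmetic calculations could be compressed most effectively by deriving the principles of arithmetic, and text about economics could be compressed further by a program that grasps supply and demand. Large language models' ability to identify correlations in text raises the question of whether they comprehend the material or merely perform statistical analysis.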
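The arithmetic example can be illustrated with a small Python sketch. It compares a general-purpose statistical compressor (zlib) with a one-line rule that regenerates the same text exactly. The corpus format, the limit of 100, and the `RULE` string are assumptions made for this illustration; the article describes the idea, not this code.

```python
import zlib


def build_corpus(limit: int = 100) -> str:
    """Write out every addition fact 'a + b = c' for 0 <= a, b < limit."""
    lines = []
    for a in range(limit):
        for b in range(limit):
            lines.append(f"{a} + {b} = {a + b}\n")
    return "".join(lines)


corpus = build_corpus()

# Statistical compression: zlib finds repeated byte patterns but knows no arithmetic.
statistical = zlib.compress(corpus.encode("utf-8"), 9)

# "Understanding": a short expression that encodes the rule behind the text.
RULE = '"".join(f"{a} + {b} = {a + b}\\n" for a in range(100) for b in range(100))'

# The rule reproduces the corpus exactly, so nothing is lost.
print(eval(RULE) == corpus)  # True

# Sizes in bytes: raw text, zlib output, and the rule itself.
print(len(corpus.encode("utf-8")), len(statistical), len(RULE))
```

The rule is far shorter than the zlib output because it captures the principle that generates the text. It only works, however, because the corpus is perfectly regular.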
Stats
"the rooms were 14.13, 21.11, and 17.42 square metres"
"a solution to the mystery begins to suggest itself"
"achieve the desired compression ratio of a hundred to one"
"the greatest degree of compression can be achieved by understanding the text"
"it looks at nearby pixels and calculates the average"
Quotes
"ChatGPT as a blurry JPEG of all the text on the Web."
"These hallucinations are compression artifacts."
"Models like ChatGPT aren’t eligible for the Hutter Prize."