Core Concepts
There is a significant discontinuity in the language and style of State of the Union addresses before and after the late 1920s, suggesting a fundamental shift in the nature of American governance and politics.
Abstract
The authors use natural language processing techniques, including BERT and GPT-2 embeddings combined with dimensionality reduction methods like UMAP, TriMAP, and PaCMAP, to analyze the State of the Union (SOTU) address dataset from Kaggle. Their analysis reveals a surprising finding: a sharp break in the language and style of SOTU addresses around 1927-1932, suggesting a major discontinuity in American history.
The authors first observe that addresses delivered by the same president cluster closely together, and that addresses written close together in time are also similar. However, the most striking result is the clear separation between addresses written before 1927 and those written after 1932, as shown in the UMAP and TriMAP visualizations.
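The embedding-and-projection pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the summary does not say how long addresses were fed to the models, so the 200-word chunking, the mean pooling, the "bert-base-uncased" checkpoint, and the helper names (chunk_words, mean_vector, load_encoder, embed_address, project_2d) are all assumptions. It also assumes the torch, transformers, numpy, and umap-learn packages are installed; they are imported lazily so the pure-Python helpers work on their own.

```python
from typing import List, Sequence


def chunk_words(text: str, size: int = 200) -> List[str]:
    """Split a long address into consecutive chunks of at most `size` words.

    Assumption: chunking is used here only because BERT-style encoders
    accept a limited number of tokens per input.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def load_encoder(model_name: str = "bert-base-uncased"):
    """Load a tokenizer and encoder (the checkpoint name is an assumption)."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    return tokenizer, model


def embed_address(text: str, tokenizer, model) -> List[float]:
    """Embed one non-empty address: mean-pool tokens per chunk, then average chunks."""
    import torch

    chunk_vectors = []
    with torch.no_grad():
        for chunk in chunk_words(text):
            inputs = tokenizer(chunk, return_tensors="pt",
                               truncation=True, max_length=512)
            hidden = model(**inputs).last_hidden_state  # shape: (1, tokens, dim)
            chunk_vectors.append(hidden.mean(dim=1).squeeze(0).tolist())
    return mean_vector(chunk_vectors)


def project_2d(embeddings: Sequence[Sequence[float]]):
    """Project address embeddings to two dimensions with UMAP for plotting."""
    import numpy as np
    import umap  # provided by the umap-learn package

    reducer = umap.UMAP(n_components=2, random_state=0)
    return reducer.fit_transform(np.asarray(embeddings))
```

Given a list of address texts, one would call load_encoder() once, apply embed_address to each text, pass the resulting vectors to project_2d, and color the 2D points by year or by president to look for the kind of clustering and separation the authors report.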
The authors hypothesize that this shift may be due to two factors: 1) the increased use of speechwriters by presidents, starting with Franklin Roosevelt, and 2) the transformation of the United States from a remote, provincial country to a global superpower after World War II, leading to changes in the focus and emphasis of presidential addresses.
The authors also experiment with authorship attribution and year prediction tasks using fine-tuned DistilBERT models. They achieve high accuracy (93-95%) in identifying the president who delivered a particular address, and a root-mean-square error (RMSE) of around 4.5 years in predicting the year of an address, despite the relatively small amount of training data available for each president.
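For readers unfamiliar with the two metrics reported above, the following sketch shows how accuracy and RMSE are computed from model predictions. The function names and the toy inputs are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of addresses whose predicted president matches the true one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the inputs (here, years)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy, made-up predictions (not data from the paper).
toy_presidents_pred = ["Adams", "Taft", "Taft", "Adams"]
toy_presidents_true = ["Adams", "Taft", "Adams", "Adams"]
toy_years_pred = [1900.0, 1950.0]
toy_years_true = [1903.0, 1946.0]

toy_accuracy = accuracy(toy_presidents_pred, toy_presidents_true)
toy_rmse = rmse(toy_years_pred, toy_years_true)
```

Because RMSE squares each error before averaging, a few addresses dated decades off weigh more heavily than many small misses, so an RMSE of about 4.5 years does not mean every prediction is within 4.5 years.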
The authors conclude by acknowledging that they do not have a definitive explanation for the observed discontinuity, but they believe that there must be an underlying reason or reasons for this significant shift in the language and style of SOTU addresses over time.
Stats
The average length of a State of the Union address is around 8,358 words.
The amount of training data for each president varies greatly, from 1,790 words on average for John Adams to 22,614 words on average for William H. Taft.
Quotes
"What was surprising however, was that there is a large break, as demonstrated by the UMAP visualization (Figures 1a – 1b), between addresses written before 1927 and addresses written after 1932."
"We should note here that the amount of training data for each author is relatively small, and also varies greatly: from 1790 words in an average SOTU address for John Adams to 22614 words on average for William H. Taft [5]. The average length of a SOTU is ∼8358 words."