About
This blog analyzes U.S. presidential inaugural speeches to track changes in language, sentiment, and readability over time. Using the AFINN sentiment lexicon, Flesch Reading Ease formula, and Flesch-Kincaid formula, we examine how emotional tone and speech complexity have shifted from 1789 to 2009.
Methodology
Data Collection & Wrangling
This project analyzes the full text of U.S. presidential inaugural addresses from 1789 to 2009. The speeches were collected from the Avalon Project at Yale Law School, a publicly accessible site of historical, legal, and political documents (“The inaugural addresses of the presidents” n.d.). We scraped the website in March 2025 as part of a previous group project. The Avalon Project does not indicate when each transcript was digitized, but it is maintained by Yale Law School as an educational resource.
Each speech was stored as plain text and processed into a tidy format for text analysis. We removed standard stop words (e.g., “the”, “and”, “is”) using the stop_words list from the tidytext package. We also removed highly repetitive words like “fellow”, “citizens”, and “people”, which appeared across nearly all speeches and contributed little variation to sentiment or bigram frequency.
The cleaned data were then tokenized and organized into tidy data containing word-level, bigram, and sentence-level information, enabling the calculation of sentiment scores, readability metrics, and word frequency patterns. While most presidents were included, Van Buren, Buchanan, Garfield, and Coolidge were excluded due to missing transcripts on the Avalon website.
A full list of references can be found at the end of this blog.
Tools
We conducted our analysis in R using several key packages:
tidyverse for data wrangling and plotting (Wickham et al. 2019)
tidytext for tokenizing text and extracting bigrams (Silge and Robinson 2016)
ggplot2 for static visualizations (Wickham 2016)
plotly for interactive visualizations (“Plot (plotly)” n.d.)
kableExtra for rendering styled tables (Zhu 2024)
These packages enabled us to clean, analyze, and visualize the data in an efficient and reproducible manner.
Sentiment Analysis
For the Sentiment Analysis Over Time, we found the total sentiment score of each speech using the AFINN lexicon. The AFINN Lexicon assigns identified words with a value from -5 to 5, with negative scores representing negative sentiment and positive scores representing positive sentiment. We then calculated average sentiment score = (total sentiment) / (number of words recognized by AFINN). This is the average sentiment score (per word) for each speech, and it accounts for length of speech and allows for fair comparisons across speeches.
For the Normalized Bigram Frequency by Era, we defined six eras: “Revolution & Early Republic,” “Expansion, Reform & Civil War,” “Reconstruction to WWI,” “Interwar & WWII,” “Postwar & Civil Rights,” and “Post-Watergate to Obama.” Then, we got the got the top 8 biwords/bigrams by era with total count greater than 2. Finally, we “normalized” the counts by total bigrams per era, to allow for fair comparisons across eras, since not all eras contain the same number of bigrams.
Readability Analysis
Two measures of readability were used: Flesch Reading Ease (FRE) and the Flesch-Kincaid Grade Level (FKGL)
The FRE measures the “readability” of a text and assigns a score that correlates to a specific level. \[ \text{FRE} = 206.835 - 1.015 \times \left( \frac{\text{Total Words}}{\text{Total Sentences}} \right) - 84.6 \times \left( \frac{\text{Total Syllables}}{\text{Total Words}} \right) \]
- 100 - 90: 5th Grade
- 90 - 80: 6th Grade
- 80 - 70: 7th Grade
- 70 - 60: 8th & 9th Grade
- 60 - 50: 10th to 12th Grade
- 50 - 30: College
- 30 - 10: College Graduate
- 10 - 0: Professional
The FKGL measures the U.S. reading level of a text. \[ \text{FKGL} = 0.39 \times \left( \frac{\text{Total Words}}{\text{Total Sentences}} \right) + 11.8 \times \left( \frac{\text{Total Syllables}}{\text{Total Words}} \right) - 15.59 \] For example:
- A score of 8.0 means the average 8th grader should be able to understand it.
- A score of 12.0 suggests it matches a high school senior’s reading level.
Sentiment and Flesch-Kincaid Scores Difference among Presidents that Served Two Terms
In the Sentiment and Flesch-Kincaid Scores Change among Presidents that Served Two Terms section, Figure 14 compares sentiment scores between first and second term speeches and Figure 15 compares the Flesch-Kincaid scores between first and second term.
Figure 16, Differences Between Terms, displays a table showing the calculated differences of both sentiment scores and Flesch-Kincaid scores, allowing for easier comparison across presidents. Positive and negative differences are interpreted to assess whether emotional tone became more positive, negative or neutral and language complexity increased, decreased, or stayed consistent
Mean Sentence Length
For Mean Sentence Length over time, we calculate the number of words per sentence throughout the entire presidential inaugural speech to obtain the mean sentence length. By measuring the mean sentence length of an speech, we can get a rough idea of how complex the speech’s sentences are. It is important to note that the mean sentence length measures the complexity of the sentence structure based on its length, but it doesn’t measure the complexity of each individual word. For Mean Sentence Length vs. Sentiment, K-means clustering was used to identify distinct groupings of presidential inaugural speeches based on sentence structure complexity (mean sentence length) and emotional tone (average sentiment score), making it easier to spot patterns in speech style and emotion.
Note: We didn’t include these two plots involing mean sentence length on the main page because it didn’t seem as relevant. However, we still thought to show them in the appendix to provide additional insight.
Limitations
This analysis is limited by the availability and consistency of data. Four presidents (Van Buren, Buchanan, Garfield, Coolidge) were excluded due to missing transcripts, and our dataset only includes inaugural addresses from 1789 to 2009, meaning more recent rhetorical trends are not captured. Additionally, while our sentiment analysis using the AFINN lexicon offers a standardized way to quantify emotional tone, it captures only surface-level sentiment and may mischaracterize speeches that use historically or emotionally charged vocabulary, such as those referencing war. Readability metrics like Flesch-Kincaid and Flesch Reading Ease are similarly limited, as they focus on sentence and word structure, but not conceptual complexity, audience targeting, or rhetorical strategy. Clustering results are exploratory, relying on parameter choices like the number of groups, and may not yield interpretable patterns in all comparisons. Because of these constraints, our findings may not generalize beyond inaugural speeches or the historical contexts in which they were delivered.
Future Work
Future research could expand the dataset to include more recent inaugural addresses and improve sentiment analysis by incorporating lexicons or models better suited to political speech. Additional speech types, such as State of the Union addresses or campaign remarks, could offer insight into how presidents adapt tone and complexity for different audiences. More sophisticated natural language processing techniques, such as topic modeling, could also help uncover deeper thematic patterns over time. Finally, improvements in clustering methodology or supervised classification might allow for more meaningful groupings of speeches based on known political or historical events, yielding a more comprehensive view of rhetorical trends over time.
Appendix
Mean Sentence Length over time
This interactive bar chart displays the mean sentence length of the presidential inaugural speeches of U.S presidents that served from 1789 to 2009. By hovering over each bar, the user can view the president’s name, year of the inaugural address, and the mean sentence length of the inaugural address. The bars are color-coded by different shades of blue, which vary depending on the mean sentence length.
As shown on the bar chart, the mean sentence length of presidential inaugural speeches seem to fluctuate slightly. However, we can observe an overall trend: the mean sentence length decreases over time. This shows that the sentence structure of U.S. presidents’ inaugural speeches have become less complex over time. This may be due wanting to make inaugural speeches more accessible to the public (Liberman (2011))
The first figure uses an interactive plotly chart to allow readers to explore the mean sentence length by hovering over each bar (“Plot (plotly)” n.d.). The color gradient, different shades of blue, was created using scale_fill_gradient2() in ggplot2 (Wickham 2024).
Mean Sentence Length vs. Sentiment
Figure 15 displays an interactive clustering of presidential inaugural speeches based on the average sentiment score and the mean sentence length. By hovering over each data point, the user can view the president’s name, sentiment score, Flesch-Kincaid score and the cluster that president’s inaugural speech belongs to.
Clusters description:
Cluster 1: Presidents with moderately low mean sentence length have neutral inaugural speeches.
Cluster 2: Presidents with high mean sentence length have considerably positive inaugural speeches.
Cluster 3: Presidents with medium mean sentence length have positive inaugural speeches.
Cluster 4: Presidents with the highest mean sentence length have more positive inaugural speeches.
Cluster 5: Presidents with low mean sentence length have neutral, but leaning toward negative, inaugural speeches
Cluster 6: Presidents with the lowest mean sentence length have neutral inaugural speeches.