4 Nov 2013

SAT Essay Word Clouds

Submitted by Karl Hagen
For all my students who have recently taken the SAT (or are planning to do so soon), I prepared a little visualization of what sorts of topics appear frequently on SAT essays. I took all the essay topics made public from March 2005 (the first SAT with a writing section) through October 2013, deleted the boilerplate instructions and attribution lines, and ran the remaining text through Wordle to create a word cloud. Here's the result:

SAT essay topic basic word cloud
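
For anyone curious about what goes on behind a cloud like this, the underlying step is just counting words after dropping the ones on a stop list. Here is a minimal Python sketch of that idea. It is not Wordle's actual code: the stop list is a tiny hypothetical stand-in, the function name is my own, and the two sample prompts are invented rather than real SAT topics.

```python
import re
from collections import Counter

# Hypothetical, very small stop list; Wordle's real list is much longer.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "to", "of", "that", "than",
    "do", "we", "it", "in", "or", "more", "be", "from",
}

def word_frequencies(prompts, stop_words=STOP_WORDS):
    """Count how often each non-stop word occurs across all prompts."""
    counts = Counter()
    for prompt in prompts:
        # Lowercase the text and keep runs of letters (and apostrophes).
        words = re.findall(r"[a-z']+", prompt.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts

# Invented example prompts, not actual SAT topics.
sample_prompts = [
    "Many people think that success is more important than happiness.",
    "Do people learn more from success or from failure?",
]

freqs = word_frequencies(sample_prompts)
print(freqs.most_common(5))
```

A word cloud then simply draws each word at a size proportional to its count, which is why a formulaic word like "people" ends up so large.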

What do SAT essays ask you about? The answer seems, rather resoundingly, to be that they ask you about people. Of course, many of the instances of "people" occur because of the formulaic wording of many prompts ("Many people think...," etc.), so I began to do a little massaging of the data. The first picture above involved no additional processing on my part, just Wordle omitting the words in its stop list. Since the nouns, especially the abstract ones, play a key role in essay topics, I next generated a word cloud of just the nouns, with "people" omitted so as not to overwhelm the other results. The frequencies are counted after stemming (so most plural nouns are reduced to their singular forms). Here's the result:

SAT essay topic noun word cloud
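
The noun-only count can be sketched along the following lines. This is only an illustration under several assumptions: the words are taken to be already tagged by part of speech (here by hand, with a made-up "NOUN"/"VERB" label), the plural stripping is a crude rule of my own rather than a real stemmer, and the sample words are invented.

```python
from collections import Counter

def singularize(word):
    """Crude plural stripping; a real stemmer handles many more cases."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # communities -> community
    if word.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]                # successes -> success
    if word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]                # choices -> choice
    return word

def noun_frequencies(tagged_words, omit=frozenset({"people"})):
    """Count singularized nouns, skipping any word listed in `omit`."""
    counts = Counter()
    for word, tag in tagged_words:
        if tag != "NOUN":
            continue
        stem = singularize(word.lower())
        if stem not in omit:
            counts[stem] += 1
    return counts

# Hand-tagged, invented sample; the tags are an assumption of this sketch.
sample = [
    ("people", "NOUN"), ("value", "VERB"), ("choices", "NOUN"),
    ("choice", "NOUN"), ("communities", "NOUN"),
    ("successes", "NOUN"), ("success", "NOUN"),
]

noun_counts = noun_frequencies(sample)
print(noun_counts.most_common())
```

Omitting "people" before counting, rather than after, keeps it from dominating the scale of the cloud.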

Finally, I lemmatized the list so that derivationally related forms were counted together, mostly under the noun form, if one existed in the corpus. For example, "succeed" and "success" were collapsed into the single form "success." I also got rid of all the minor-category words (determiners, etc.) that hadn't already been filtered out by the stop list, and "people" too. This is the result:

SAT essay topic lemmatized cloud
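
Collapsing related forms amounts to merging counts under a chosen head word. The sketch below shows one way that might look. Only the "succeed" to "success" pairing comes from the description above; the other mapping entries, the function name, and the counts are hypothetical, and the rule of merging only when the noun actually occurs in the corpus is my reading of "if one existed in the corpus."

```python
from collections import Counter

# Hypothetical mapping from a word to its related noun.
# Only "succeed" -> "success" is taken from the text above.
DERIVATIONAL_MAP = {
    "succeed": "success",
    "successful": "success",
    "fail": "failure",
}

def collapse(counts, mapping=DERIVATIONAL_MAP, omit=frozenset({"people"})):
    """Merge counts of related forms under the noun, if the noun occurs."""
    merged = Counter()
    for word, n in counts.items():
        noun = mapping.get(word)
        # Use the noun form only when it is itself present in the corpus.
        head = noun if noun in counts else word
        if head not in omit:
            merged[head] += n
    return merged

# Invented counts for illustration.
raw_counts = Counter({"succeed": 3, "success": 5, "fail": 2, "people": 40})

merged_counts = collapse(raw_counts)
print(merged_counts.most_common())
```

In this example "fail" stays as it is, because "failure" never appears in the invented counts, while "succeed" is folded into "success."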