Hello all! Welcome back!
This week I am talking about distant reading and text mining.
Distant reading means looking broadly across articles or other
documents to pull out patterns. This approach is extremely
useful when you are beginning research and need to see how
frequently terms are used, or are trying to determine which synonyms are worth
investigating. It can also help surface hidden tones and reveal
perceptions and biases of the time. One example of this is looking back at
reports during the US Civil War.
I thought Ayers's (2011) New York Times article was really
fascinating. As far as US history goes, the Civil War era is my favorite to
read and learn about. He stated that using computer-aided technology helps to build
a better understanding of a region from large amounts of sources, and that
these methods can elucidate alternative conclusions. In that
article, Ayers (2011) was able to identify a different “primary cause” of the Civil War.
It is interesting that these computer-aided tools can help uncover patterns that
are otherwise difficult to see.
Another interesting way to dig through a lot of information
is text mining. Text mining looks at the frequency of words or topics within a
certain period of time. Ewing et al. (2014) provide an excellent example in
their article on the flu epidemic. They used two text-mining methods: topic modeling
and tone classification. Through their work they were able to uncover how often
different words were used in reports during different stretches of time, both within the
local community and outside of it. They also looked at the tone of newspaper
reports about the flu. Ewing et al. (2014) developed four classifications: alarmist,
warning, reassuring, and explanatory. These were created to determine how the
tone in reporting prompted public health intervention. Through this exercise,
they were able to see the tone of reporting shift from the beginning of the
epidemic to the end.
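To get a feel for how tone classification might work, here is a minimal sketch in Python. The keyword lists and the simple match-counting rule are my own invented stand-ins, not Ewing et al.'s actual classifier, which was far more sophisticated; this only illustrates the basic idea of scoring an article against each tone category.

```python
from collections import Counter

# Hypothetical keyword lists -- invented for illustration only;
# Ewing et al.'s real classification did not work this simply.
TONE_KEYWORDS = {
    "alarmist": {"deadly", "panic", "ravaging"},
    "warning": {"caution", "avoid", "spread"},
    "reassuring": {"mild", "calm", "improving"},
    "explanatory": {"symptoms", "cause", "treatment"},
}

def classify_tone(article_text):
    """Score an article against each tone's keyword list and
    return the tone with the most keyword matches."""
    words = article_text.lower().split()
    scores = Counter()
    for tone, keywords in TONE_KEYWORDS.items():
        scores[tone] = sum(1 for w in words if w in keywords)
    return scores.most_common(1)[0][0]

print(classify_tone("Officials urge caution as the flu may spread further."))
```

Running a classifier like this over every article in a date range is what lets you chart how tone shifted as the epidemic progressed.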
After reading these
articles and beginning to understand their use, I wanted to see how I could
employ these methods in my own research. I attempted to use three different tools:
Google Ngram, Voyant, and JSTOR Data for Research. While playing around with
Google Ngram and Voyant was fun, I was unable to figure out how to use
JSTOR Data for Research. This is more likely an issue on my end than with the
tool itself.
The Google Ngram Viewer is a pretty cool tool. As a test, I used the phrases
Beauty and the Beast, Cinderella, Snow White, and Rapunzel. Then I compared the
frequencies of these phrases across the American English, British English,
French, and German corpora. It was interesting that there would be such a
difference (as seen below).
Then I wanted to see the results for terms I would use in my own research, so I searched for peasant, Christian, and religion. Again, I used the English, French, and German corpora. The reason for this is that information relevant to my research is not likely to be found in English, but rather in French or German, so I wanted to see if there were changes and whether I could reveal anything about them (see below). I wonder how searching in the French or German language would change the results.
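Under the hood, what an n-gram viewer plots is just each term's share of all the words published in a given year. Here is a tiny sketch of that calculation; the two-sentence "corpus" is invented purely for illustration, nothing like the scale of Google's actual data.

```python
# Toy stand-in for a dated corpus -- these snippets are invented
# for illustration; the real Ngram data covers millions of books.
corpus_by_year = {
    1850: "the peasant worked the land while the christian church grew",
    1900: "religion and the church shaped daily life for the peasant",
}

def term_frequency(term, year):
    """Share of that year's tokens matching the term -- the
    quantity an n-gram viewer charts over time."""
    tokens = corpus_by_year[year].lower().split()
    return tokens.count(term.lower()) / len(tokens)

for year in sorted(corpus_by_year):
    print(year, round(term_frequency("peasant", year), 3))
```

Repeating this for each year and each corpus (English, French, German) is what produces the diverging curves in the charts.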
Voyant provides a really cool visual of the words used most often
in a document. For this example, I used my Master’s thesis to create a word
cloud of the top 55 words (see below). While these tools are
undoubtedly useful (and pretty interesting), I am not sure how much they
would contribute to my own research, but they are definitely worth investigating!
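Before drawing a word cloud, a tool like Voyant essentially counts word frequencies, drops common stopwords, and keeps the top N. A rough sketch of that step (the sample sentence and tiny stopword list are my own placeholders, not Voyant's actual defaults):

```python
from collections import Counter

# Minimal stopword list -- a placeholder; Voyant ships a much
# longer list and lets you customize it.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it"}

def top_words(text, n=55):
    """Count word frequencies, skip stopwords, return the n most
    common (word, count) pairs -- the data behind a word cloud."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return counts.most_common(n)

sample = "Isotopes in bone reveal diet; isotopes in teeth reveal migration."
print(top_words(sample, n=3))
```

The word cloud then simply scales each word's display size by its count.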
Thanks for stopping by! I hope you enjoyed your visit.
-The Migrant Isotopist
Articles I included in case you want to check them out for
yourself:
Ewing et al. (2014) https://www.historians.org/publications-and-directories/perspectives-on-history/january-2014/mining-coverage-of-the-flu-big-datas-insights-into-an-epidemic
Website links to the tools I used, if you want to play around
with them (warning: they can be addicting):