Analyzing a Lot of Text

In working on both projects for EDLP 713 and EDLP 717, I have come across a lot of text. In a qualitative analysis, we’re typically looking for themes to emerge from interviews. One of the things I thought about when looking at different people’s answers to the same interview questions was whether there would be commonalities, i.e., would certain words appear more frequently than others?

Likewise, I was interested in analyzing the content of my own text for the 717 project on leadership. In a sense, what do I write about the most?

The type of analysis I’m speaking of is a frequency count: what things are identified the most, and how often? You could count words, for instance, and then graph the occurrence of these words. That may or may not be useful to you, given the specific project. And in terms of mixing up analysis paradigms, it’s just a little bit of quantitative-style technique applied to qualitative objects: words.
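As a minimal sketch of that kind of frequency count, here is one way it could look in Python. The function name `word_frequencies` and the two sample answers are made up for illustration, and the rule for what counts as a "word" (runs of letters, digits, or straight apostrophes, lowercased) is an assumption you would adjust for your own text.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to how often it appears."""
    # Assumption: a "word" is a run of letters, digits, or straight apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


# Hypothetical answers from two people to the same interview question.
answers = [
    "Leadership means listening to teachers.",
    "Good leadership starts with listening.",
]

counts = word_frequencies(" ".join(answers))

# Show the most frequent words first; these are the numbers you would graph.
for word, n in counts.most_common(5):
    print(word, n)
```

Common words such as "to" and "with" are counted too, so for real interview text you would probably want to filter those out before graphing.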

You can also look at this from a different perspective, using tags instead of words. In the interview example, instead of identifying themes, per se, you’d tag areas of text with tag words. These may or may not be the same as the themes you identify. Then, you’d apply the analysis and look for the frequency of tags. It’s up to you, as the researcher, to decide what these tags are.


If we take this further, it’s possible to get a richer analysis by applying the concept of folksonomy to tagging. You could have multiple people tag a text, and then look at the frequency of their tags. I’m guessing that if I had thousands of pages of text to go through, however, I’d want a script or program to process the text for commonalities. What I’m suggesting here is really simple.
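To make the multiple-tagger idea concrete, here is a small Python sketch under some assumptions: each person's tags for a passage are stored as a list, and a tag counts once per person no matter how many times that person used it. The names `tag_agreement`, `tags_by_tagger`, and the sample tags are all hypothetical.

```python
from collections import Counter


def tag_agreement(tags_by_tagger):
    """Count how many taggers applied each tag (once per tagger per tag)."""
    counts = Counter()
    for tags in tags_by_tagger.values():
        # Normalize case and spacing, and use a set so that a tagger who
        # repeats a tag is still only counted once for it.
        counts.update({tag.strip().lower() for tag in tags})
    return counts


# Hypothetical tags from three people reading the same interview passage.
tags_by_tagger = {
    "tagger_a": ["trust", "communication", "time"],
    "tagger_b": ["Trust", "vision"],
    "tagger_c": ["communication", "trust"],
}

agreement = tag_agreement(tags_by_tagger)

# Tags that more people agree on come first.
for tag, n in agreement.most_common():
    print(tag, n)
```

Counting once per tagger is a design choice: it measures agreement between people rather than how heavily any one person leaned on a tag.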

But here’s my assumption, and a really lazy way to test it…

If I’m using a digital tool to communicate, and I want to know if what I’m communicating has anything to do with leadership, and if my “test” of using that tool for leadership means I’m using it frequently, then what are the “things” (read: words) that I’m using most often?

That’s why I chose to look at the frequency of the words I’ve been using on Twitter. Doing this is relatively simple.

  1. First, download all of your Tweets from Twitter.
  2. Clean up the data file that comes back, removing terms that will probably mess up the analysis. Things like “http://” are ripe for removal. (I chose to keep RT to see if I retweet other people’s content frequently.)
  3. Copy the tweets as text and paste them into Wordle. You’ll need to make sure your Java is up to date.
  4. “Export” the resulting diagram, after you’ve styled it, as a PDF. (In Mac OS X, I printed to PDF.)
  5. Import the resulting PDF file into your report to see a “cloud” of what you’ve written. Wordle has automagically applied the frequency counting to your writing, and instead of a graph, it shows this frequency via the size of the words. The rest of the arrangement is up to your own aesthetics.
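If you would rather script the cleanup and counting than do them by hand, the steps above might be sketched in Python like this. It is only an illustration under stated assumptions: the tweets are already loaded as a list of strings (the sample tweets, `@someone`, and the example.com links are invented), the function names are hypothetical, and the linear scaling of counts to font sizes is my guess at the general idea of a word cloud, not a description of how Wordle actually sizes or arranges words.

```python
import re
from collections import Counter

# Matches links such as http://... or https://... up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def clean_tweet(text):
    """Remove links but leave everything else, including 'RT'."""
    return URL_PATTERN.sub("", text)


def count_words(tweets):
    """Count words across all tweets after the links are stripped out."""
    counts = Counter()
    for tweet in tweets:
        # Keep a leading @ or # so mentions and hashtags count as their own words.
        counts.update(re.findall(r"[@#]?\w+", clean_tweet(tweet)))
    return counts


def font_sizes(counts, smallest=10, largest=72):
    """Scale counts linearly to point sizes (an assumed scheme, not Wordle's)."""
    if not counts:
        return {}
    low, top = min(counts.values()), max(counts.values())
    spread = (top - low) or 1  # avoid dividing by zero when all counts match
    return {
        word: smallest + (n - low) * (largest - smallest) / spread
        for word, n in counts.items()
    }


# Hypothetical tweets standing in for a downloaded archive.
tweets = [
    "RT @someone: Great post on leadership http://example.com/a",
    "Thinking about leadership and technology",
    "RT @someone: technology in schools https://example.com/b",
]

counts = count_words(tweets)
sizes = font_sizes(counts)

for word, n in counts.most_common(4):
    print(word, n, sizes[word])
```

Case is left alone here so that "RT" stays recognizable. If you want "Leadership" and "leadership" counted together, lowercase the text before counting.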

So, what have I been tweeting about (and with whom) since 2008, when I started my account, @hendron?

John's Tweets


Filed under Learning Reflections, Professional Reflections
