Language and Numbers: What the Zipf!?


Well, I was a little lost coming up with ideas for a post this week, but luckily the winds of fate, or the internet, have a funny way of leading me to topics. Just yesterday, a Facebook friend sent me the most mind-blowing video ever. And of course, it has to do with language so it had to be addressed on my blog. It’s a bit long, but take a looksee:

If you look up Zipf’s Law on Wikipedia (do not cite Wikipedia in your term papers, folks. It’s not considered reputable. But it will suffice for a blog post today, as it is a very busy time at work right now and I have little time for writing, even on my own time), you’ll find that “Zipf’s law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.” (https://en.wikipedia.org/wiki/Zipf%27s_law).
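For the numbers people in the audience, here is a minimal Python sketch of what that definition implies. It assumes the idealized form of the law, where the word at rank r shows up about (count of the top word) / r times. The function name `zipf_expected_counts` and the starting count of 6000 are made up purely for illustration; they don't come from the video or from any real corpus.

```python
def zipf_expected_counts(top_count, how_many):
    """Return the counts an idealized Zipf's law predicts for ranks 1..how_many.

    Assumes frequency is exactly proportional to 1 / rank, which real text
    only follows approximately.
    """
    return [top_count / rank for rank in range(1, how_many + 1)]


# Hypothetical example: pretend the most frequent word appears 6000 times.
predicted = zipf_expected_counts(top_count=6000, how_many=5)
for rank, count in enumerate(predicted, start=1):
    print(rank, round(count))
# prints:
# 1 6000
# 2 3000
# 3 2000
# 4 1500
# 5 1200
```

So the second word turns up half as often as the first, the third a third as often, and so on. Real texts only roughly follow this curve, so treat it as a rule of thumb rather than an exact prediction.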

Okay, I know that there’s A LOT of math involved with this video and this law, but if you can wade through it, it’s really spectacular. I was endlessly fascinated by the way the video depicts language based on how often each word is used. It’s amazing to me that language can be quantified in such a manner, given how individual language is. One’s vocabulary depends upon their education, how they grew up, where they grew up and more. I suppose, considering these individual factors, it makes sense that small function words like articles, prepositions, pronouns and helping verbs make up the list of the 20 most popular words in the English language, because these essential pieces are virtually universal in their use and don’t depend on a person’s background or experiences. The first twenty words are (for those who may not watch the video): The, Of, And, To, A, In, Is, I, That, It, For, You, Was, With, On, As, Have, But, Be, They. As you can imagine, it is quite difficult to create full sentences without using at least one of the words in this list.
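If you’d like to see how a ranking like the one above gets built, here is a rough Python sketch that counts the words in a piece of text and lists the most frequent ones. The function name `top_words` and the sample sentence are my own inventions for illustration, and the way it splits text into words (lowercase letters and straight apostrophes only) is a simplifying assumption, not how the video’s data was prepared.

```python
import re
from collections import Counter


def top_words(text, n=20):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase everything so "The" and "the" count as the same word,
    # then pull out runs of letters (and straight apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Tiny made-up sample; a real test would use a whole book or a large corpus.
sample = "the cat and the dog and the bird"
for rank, (word, count) in enumerate(top_words(sample, n=2), start=1):
    print(rank, word, count)
# prints:
# 1 the 3
# 2 and 2
```

Feed it something much longer, like a novel saved as plain text, and you can check for yourself whether the second-place word really appears about half as often as the first.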

Ordinarily I would be against the idea of quantifying words, mostly because I don’t like quantifying things. It makes them too complicated. Plus, I find that looking at language through numbers makes it scientific as opposed to artistic, not that I have anything against science. Mostly, as your stereotypical word person, I hate numbers.

But, because of Zipf’s Law, I must accept that language, even though it is artistic and creative, is at the same time scientific. As a word person, I define myself as absolutely NOT being a numbers person. I suppose I can’t do that anymore, as it appears that words and numerical data are intimately linked. But, even though I hate wading through charts and graphs, what I am truly enamored with is the idea that this pattern exists throughout all languages and in many other realms as well. Life almost seems to revolve around this concept somehow.

What do you think about this law? And what does it mean for language?

Feature image courtesy of: http://www.scilt.org.uk/