Zipf’s Law

Zipf’s law is a mysterious empirical law, and it is closely related to Pareto’s rule:

  • It suggests limits on the size of companies and their share of markets
  • According to Annalee Newitz, the editor of io9, linguist George Zipf noticed in 1949 that people use a very small number of words most of the time – we use only as many as we need to convey our messages
  • In particular, Zipf found a pattern: the most popular word is used twice as often as the second most popular, three times as often as the third, and so on
  • A mere 135 words account for 50% of all the word occurrences in everyday use
  • The three most popular words, with their approximate share of all word occurrences, are:
    • The = 7% of occurrences
    • And = 3.5%
    • Of = 2.3%
  • Predictable word frequencies of this kind may well offer a clue as to how Alan Turing and his Bletchley Park team were able to break the Enigma code, although I’m guessing here
  • Zipf then found his law also applied elsewhere:
    • To income and wealth distributions in any given country, where the richest person has twice as much money as the next richest, and so on – much as Pareto observed many years before him
    • To the size of cities, where the city with the largest population in any country is generally twice as large as the next biggest, and so on. This only applies where cities are economically integrated, with a common language, laws and institutions, as in a single nation; it does not apply to a group of nations like the EU
    • To the size of firms in any sector – the biggest firm is twice the size of the second, three times the size of the third, and so on – hence a sector almost inevitably ends up dominated by a Big 4, 5 or 6
  • Other interesting applications include:
    • Books borrowed from libraries
    • Web sites visited
    • Earthquake sizes
  • Quite why the pattern is followed so closely is not understood
  • However, it offers useful predictability for economists and businessmen alike
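
The word-frequency pattern is easy to check against any piece of text. What follows is only a minimal sketch, not Zipf’s own method: the function names `rank_frequencies`, `zipf_prediction` and `compare_with_zipf` and the crude regex tokenizer are assumptions made for illustration.

```python
from collections import Counter
import re


def rank_frequencies(text):
    """Return (word, count) pairs, most frequent first."""
    # Crude tokenizer (an assumption): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def zipf_prediction(top_count, rank):
    """Count Zipf's law predicts for the word at a given 1-based rank."""
    return top_count / rank


def compare_with_zipf(text, top_n=5):
    """Return (rank, word, observed count, predicted count) rows."""
    ranked = rank_frequencies(text)[:top_n]
    if not ranked:
        return []
    top_count = ranked[0][1]
    return [
        (rank, word, count, zipf_prediction(top_count, rank))
        for rank, (word, count) in enumerate(ranked, start=1)
    ]
```

Calling `compare_with_zipf` on a long text, such as a whole book, gives observed and predicted counts side by side; short samples tend to follow the pattern far less closely.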

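The same rule for cities and firms can be sketched in a few lines. This assumes an idealised world in which the firm or city at rank r is exactly 1/r the size of the largest, which real data only approximates; `predicted_sizes` and `top_k_share` are hypothetical helper names.

```python
def predicted_sizes(largest, count):
    """Sizes of the top `count` items if rank r is exactly largest / r."""
    return [largest / rank for rank in range(1, count + 1)]


def top_k_share(k, n):
    """Share of the combined size held by the k largest of n items."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    # With sizes proportional to 1/rank, the largest size cancels out.
    top = sum(1 / rank for rank in range(1, k + 1))
    total = sum(1 / rank for rank in range(1, n + 1))
    return top / total
```

Under this idealised assumption, in a sector of 100 firms the four largest would hold roughly 40% of the combined size, which is one way to read the Big 4, 5 or 6 pattern.
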