When the words of an arbitrary corpus of text are ranked by their frequency, the resulting rank-frequency distribution is generally a power law, a regularity known as Zipf's law.
If one plots the frequency rank of the words contained in a large
corpus of text against their number of occurrences (or their
relative frequencies), one obtains a power-law distribution
with exponent close to one (but see Gelbukh and Sidorov 2001).
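
The rank-versus-frequency relationship described above can be checked on any text with a few lines of code. The following is a minimal sketch, not a definitive method: the function names `rank_frequency` and `zipf_exponent` are hypothetical, the tokenizer (lowercased runs of letters and apostrophes) is an assumption, and the exponent is estimated by an ordinary least-squares fit on the log-log data, which is only one simple choice of estimator.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, count) pairs; rank 1 is the most frequent word.

    Assumption: a word is a run of ASCII letters or apostrophes,
    compared case-insensitively.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))


def zipf_exponent(pairs):
    """Estimate the power-law exponent from (rank, count) pairs.

    Fits log(count) = a - s * log(rank) by least squares and returns s.
    Needs at least two pairs, otherwise the slope is undefined.
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

Calling `zipf_exponent(rank_frequency(text))` on a large text gives a single number to compare against the value of about one mentioned above. Plotting the pairs on log-log axes should give a roughly straight line if the power law holds.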
External References