Histogram

In statistics, a histogram is a graphical display of the data in a table of frequencies. The height of each bar should be the frequency density, i.e. the frequency of a range divided by that range's length; this matters when the ranges have different lengths, because plotting raw frequencies would exaggerate the longer ranges. For example, we might have a collection of data for which the frequency table is as follows:
From  To  Frequency  Density
   0   9          2      0.2
  10  19          3      0.3
  20  29          5      0.5
  30  39          8      0.8
  40  49          6      0.6
  50  59          1      0.1
  60  69          3      0.3
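
As a minimal sketch, the Density column of the table above can be reproduced by dividing each frequency by its range's length. The names TABLE, range_length and frequency_density are illustrative, and the code assumes whole-number data, so that the range 0-9 has length 10.

```python
# Frequency table from the text: (from, to, frequency).
# Assumption: values are whole numbers, so the range 0-9 has length 10.
TABLE = [(0, 9, 2), (10, 19, 3), (20, 29, 5), (30, 39, 8),
         (40, 49, 6), (50, 59, 1), (60, 69, 3)]


def range_length(low, high):
    """Length of an inclusive whole-number range such as 0-9."""
    return high - low + 1


def frequency_density(low, high, frequency):
    """Frequency of a range divided by that range's length."""
    return frequency / range_length(low, high)


densities = [frequency_density(*row) for row in TABLE]
for (low, high, freq), density in zip(TABLE, densities):
    print(f"{low:>4}  {high:>2}  {freq:>9}  {density:>7.1f}")
```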

Here all ranges have the same length, 10, and the histogram corresponding to these data looks like this:

0.8                #####
0.7                #####
0.6                ##########
0.5           ###############
0.4           ###############
0.3      ####################     #####
0.2 #########################     #####
0.1 ###################################
   0---10---20---30---40---50---60---70
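
A text picture like the one above can be generated from the densities. The following is only a sketch: render_histogram is a hypothetical helper, it assumes equal-length ranges and densities that are multiples of 0.1, draws each range five characters wide as in the picture, and leaves out the axis line.

```python
def render_histogram(densities, bar_width=5, levels=8):
    """Return text rows, highest level first, one bar per range.

    Assumes equal-length ranges and densities that are multiples of 0.1.
    """
    # Work in whole tenths to avoid floating-point comparisons.
    tenths = [round(d * 10) for d in densities]
    rows = []
    for level in range(levels, 0, -1):
        bars = "".join(("#" if t >= level else " ") * bar_width for t in tenths)
        rows.append(f"{level / 10:.1f} {bars}".rstrip())
    return rows


DENSITIES = [0.2, 0.3, 0.5, 0.8, 0.6, 0.1, 0.3]
rows = render_histogram(DENSITIES)
print("\n".join(rows))
```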

Suppose we group the above data differently:
From  To  Frequency  Density
   0   9          2      0.2
  10  19          3      0.3
  20  59         20      0.5
  60  69          3      0.3

Now one range is longer than the others, which has to be taken into account when computing the densities: the range 20-59 has length 40, so its density is 20/40 = 0.5. Our histogram now looks like this:

0.8
0.7
0.6
0.5           ####################
0.4           ####################
0.3      ##############################
0.2 ###################################
0.1 ###################################
   0---10---20---30---40---50---60---70
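
The regrouping can be checked with a short sketch. The helper merge_ranges is hypothetical, and the code again assumes whole-number data, so that the merged range 20-59 has length 40.

```python
def merge_ranges(rows):
    """Merge adjacent (from, to, frequency) rows into a single row."""
    low = rows[0][0]
    high = rows[-1][1]
    total = sum(freq for _, _, freq in rows)
    return (low, high, total)


# The four ranges from the first table that are combined in the second.
middle = [(20, 29, 5), (30, 39, 8), (40, 49, 6), (50, 59, 1)]
low, high, total = merge_ranges(middle)
length = high - low + 1      # whole-number range 20-59 has length 40
density = total / length     # 20 / 40
print(low, high, total, density)
```

The merged frequency is the sum of the original frequencies, while the density is recomputed from the new, longer range rather than averaged from the old densities.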

The distinction between a histogram and a bar graph is that in a histogram frequency is represented by area rather than by height: if we wish to find the total frequency of a range of values, we must consider the area under the graph in that range. For instance, for the histogram above, the area under the graph in the range 0-20 is 10×0.2 + 10×0.3 = 2 + 3, for a total frequency of 5.
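
The area calculation can be sketched as follows. BARS and total_frequency are illustrative names, and the bar edges 0, 10, 20, 60 and 70 are read off the axis of the regrouped histogram. For a query range that cuts through a bar, the result assumes the data are spread evenly within that bar.

```python
# Regrouped histogram from the text as (start, end, density) bars,
# using the axis positions 0, 10, 20, 60, 70 as bar edges (an assumption).
BARS = [(0, 10, 0.2), (10, 20, 0.3), (20, 60, 0.5), (60, 70, 0.3)]


def total_frequency(bars, lo, hi):
    """Area under the histogram between lo and hi."""
    area = 0.0
    for start, end, density in bars:
        # Width of the part of this bar that lies inside [lo, hi].
        overlap = min(end, hi) - max(start, lo)
        if overlap > 0:
            area += overlap * density
    return area


print(round(total_frequency(BARS, 0, 20), 6))
print(round(total_frequency(BARS, 0, 70), 6))
```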

If a histogram is based on relative frequencies (i.e. each frequency divided by the total number of observations, often quoted as a percentage) as opposed to absolute frequencies as above, then it will resemble the underlying random variable's probability density function, and the area under the histogram will always be 1 (that is, 100%).
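
A minimal sketch of this normalisation, using the regrouped table with bar edges taken from the axis (an assumption), and a hypothetical helper relative_densities:

```python
# Regrouped table as (start, end, frequency); edges are an assumption
# read off the axis of the histogram.
TABLE = [(0, 10, 2), (10, 20, 3), (20, 60, 20), (60, 70, 3)]


def relative_densities(table):
    """Relative frequency of each range divided by the range's length."""
    n = sum(freq for _, _, freq in table)
    return [(start, end, freq / n / (end - start)) for start, end, freq in table]


bars = relative_densities(TABLE)
area = sum((end - start) * density for start, end, density in bars)
print(round(area, 6))
```

Each bar's area equals the relative frequency of its range, so the areas sum to 1 up to floating-point rounding.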