Boxplots are a common way to represent a statistical distribution of values. Gnuplot boxplots are always vertical, showing a distribution of values along y. Quartile boundaries are determined such that 1/4 of the points have a y value less than or equal to the first quartile boundary, 1/2 of the points have a y value less than or equal to the second quartile (median) value, etc. A box is drawn around the region between the first and third quartiles, with a horizontal line at the median value. Whiskers extend from the box to user-specified limits. Points that lie outside these limits (outliers) are drawn individually. The width of the boxplot can be controlled either by set boxwidth or by providing it in a third field of the using specifier in the plot command.
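The quartile values that define the box can be inspected separately with the stats command, and the two ways of setting the box width can be compared side by side. The following is a minimal sketch, not part of the original examples: the file name 'dataset_A' and the variable prefix "A" are assumptions, and the quartiles reported by stats may differ slightly from those the boxplot style draws, depending on the gnuplot version.

```gnuplot
# Assumed input: file 'dataset_A' with the values of interest in column 2.
stats 'dataset_A' using 2 name "A" nooutput
print sprintf("Q1 = %g, median = %g, Q3 = %g", A_lo_quartile, A_median, A_up_quartile)

# Width set globally with set boxwidth ...
set boxwidth 0.5
plot 'dataset_A' using (1.0):2 with boxplot

# ... or per plot, through the third field of the using specifier.
plot 'dataset_A' using (1.0):2:(0.25) with boxplot
```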
Syntax
    2 columns: x-position y-value
    3 columns: x-position y-value boxwidth
    4 columns: first-x-position y-value boxwidth category
The horizontal position of a boxplot is usually given as a constant value in the first field (x-position) of the using specifier in the plot command. You can place an identifying label at this position under the boxplot by adding an xticlabel specifier in the plot command (two or three column syntax) or by providing it as a string in a separate data column (four column syntax). Both examples below should produce a plot with layout similar to the one in the boxplot example figure.
Examples
    #
    # Compare distribution of y-values from two different files.
    set border 2                  # left-hand border only
    set xtics nomirror scale 0    # no tickmarks; only labels
    set ytics rangelimited nomirror
    plot 'dataset_A' using (1.):2:xticlabel('A') with boxplot, \
         'dataset_B' using (2.):2:xticlabel('B') with boxplot

    #
    # Compare y-values from two categories of data in the same file.
    # Each line contains a category string ("A" or "B") in column 1 and
    # a data value in column 2. Labels auto-generated from category string.
    start_x = 1.0
    boxwidth = 0.5
    plot 'mixeddata' using (start_x):2:(boxwidth):1 with boxplot
By default a single boxplot is produced from all y values found in the column specified by the second field of the using specification. If a fourth field is given in the using specification, the content of that input column will be used as a string that identifies a discrete category. A separate boxplot will be drawn for each category found in the input. The horizontal separation between these boxplots is 1.0 by default; it can be changed by set style boxplot separation. By default the category identifier is shown as a tic label below each boxplot. Note that if the category column contains numerical values, they are nevertheless treated as strings and thus do not usually correspond to the x coordinate of the boxplot.
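As a sketch of the separation setting, assuming a file 'mixeddata' laid out as in the example above (category string in column 1, value in column 2), the per-category boxplots can be spread further apart. The value 2.0 is chosen only for illustration.

```gnuplot
# Assumed input: 'mixeddata' with a category string in column 1
# and a data value in column 2.
set style boxplot separation 2.0    # default separation is 1.0
plot 'mixeddata' using (1.0):2:(0.5):1 with boxplot
```

With a first x position of 1.0, this should place successive categories 2.0 apart along x, each labeled with its category string.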
The order of data points in the input file is not important. If there are multiple blocks of data in the input file separated by two blank lines, individual blocks may be selected with the index keyword or by using the data block number (column(-2)) as a level value in the fourth column. See pseudocolumns, index.
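Selecting blocks with the index keyword might look like the sketch below. The file name 'blocks.dat' and its layout (two blocks separated by two blank lines, values in column 2) are assumptions made for illustration.

```gnuplot
# Assumed input: 'blocks.dat' containing two data blocks separated by
# two blank lines, with the values of interest in column 2.
set xtics nomirror scale 0
plot for [i=0:1] 'blocks.dat' index i \
     using (i+1.0):2:xticlabel(sprintf("block %d", i)) with boxplot notitle
```

Each iteration draws one boxplot from one block, placed at x = 1 and x = 2 respectively.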
By default the whiskers extend vertically from the ends of the box to the most distant point whose y value lies within 1.5 times the interquartile range. By default outliers are drawn as circles (point type 7). The width of bars at the end of the whiskers may be controlled using set bars or set errorbars. Multiple outliers with the same y value are displaced horizontally by one character width. This spacing can be controlled by set jitter.
These default properties may be changed using the set style boxplot command. See set style boxplot, bars, boxwidth, fillstyle, candlesticks.
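A few of these settings are sketched below. The numerical values are arbitrary, the commands are alternatives rather than a sequence to be applied together, and the full option list is in the set style boxplot documentation.

```gnuplot
# Whiskers limited to 1.5 times the interquartile range (the default),
# outliers drawn with point type 7.
set style boxplot range 1.5 outliers pointtype 7

# Alternative: whiskers span the central 95% of the points,
# and outliers are not drawn.
set style boxplot fraction 0.95 nooutliers

# Scale the width of the bars at the ends of the whiskers.
set bars 2.0
```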