set datafile missing

Syntax:

      set datafile missing "<string>"
      set datafile missing NaN
      show datafile missing
      unset datafile

The set datafile missing command tells gnuplot that a special string is used in input data files to denote a missing data entry. There is no default character for missing. Gnuplot makes a distinction between missing data and invalid data (e.g. "NaN", 1/0.). For example, invalid data causes a gap in a line drawn through sequential data points; missing data does not.

Non-numeric characters found in a numeric field will usually be interpreted as invalid rather than as a missing data point unless they happen to match the missing string.

Conversely, set datafile missing NaN causes all data or expressions evaluating to not-a-number (NaN) to be treated as missing data.
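
As a minimal sketch of this setting (the data values are illustrative only), the following plots inline data in which one entry is NaN. With NaN designated as missing, the line should connect the neighboring points rather than showing a gap:

      set datafile missing NaN
      set style data linespoints
      # the NaN entry in the third record is treated as missing, not invalid
      plot '-' using 1:2 title "NaN treated as missing"
         1 10
         2 20
         3 NaN
         4 40
         5 50
         e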

The example below shows differences between gnuplot version 4 and version 5.

[Figure: figure_missing]

Example:

      set style data linespoints
      plot '-' title "(a)"
         1 10
         2 20
         3 ?
         4 40
         5 50
         e
      set datafile missing "?"
      plot '-' title "(b)"
         1 10
         2 20
         3 ?
         4 40
         5 50
         e
      plot '-' using 1:2 title "(c)"
         1 10
         2 20
         3 NaN
         4 40
         5 50
         e
      plot '-' using 1:($2) title "(d)"
         1 10
         2 20
         3 NaN
         4 40
         5 50
         e
 

Plot (a) differs in gnuplot 4 and gnuplot 5 because the third line contains only one valid number. Version 4 switched to a single-datum-on-a-line convention, in which the line number is "x" and the datum is "y", erroneously placing the point at (2,3).

Both the old and new gnuplot versions handle the same data correctly if the '?' character is designated as a marker for missing data (b).

Old gnuplot versions handled NaN differently depending on the form of the using clause, as shown in plots (c) and (d). Gnuplot now handles NaN the same whether the input column was specified as N or ($N). See also the imageNaN demo.

Starting with version 5.4, gnuplot notices a missing value flag in column N when the using specifier in a plot command directly refers to the column as using N, using ($N), or using (function($N)). In these cases of direct reference, the expression, e.g. func($N), is not evaluated at all. This is to forestall floating point errors or other side effects that would cause the program to stop with an error.

The current gnuplot version also notices direct references of the form (column(N)), and it notices during evaluation if the expression depends even indirectly on a column value flagged "missing".
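
The following sketch illustrates the forms of reference described above. The datablock name $DATA and its values are chosen only for illustration, and the behavior of the last two plot elements assumes a current gnuplot version. In each case the record containing "?" should be skipped entirely, so log() is never called on the flagged field:

      $DATA << EOD
      1 10
      2 20
      3 ?
      4 40
      5 50
      EOD
      set datafile missing "?"
      set style data linespoints
      # direct reference through a function, a column() call, and an
      # expression that also involves another column
      plot $DATA using 1:(log($2)) title "function of column 2", \
           $DATA using 1:(column(2)) title "column(2)", \
           $DATA using 1:($1 + $2) title "sum of columns 1 and 2"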

In all these cases the program treats the entire input data line as if it were not present at all. However, if an expression depends on a data value that is truly missing (e.g. an empty field in a csv file), it may not be caught by these checks. If it evaluates to NaN, it will be treated as invalid data rather than as a missing data point. If you want to treat such invalid data the same as missing data, use the command set datafile missing NaN.
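
A sketch of the csv case, assuming comma-separated data held in an illustrative datablock named $CSV: the third record has an empty second field, so the expression evaluates to NaN. With set datafile missing NaN in effect, that record should be treated as missing rather than leaving a gap in the line:

      $CSV << EOD
      1,10,1
      2,20,2
      3,,3
      4,40,4
      5,50,5
      EOD
      set datafile separator ","
      set datafile missing NaN
      # the empty field makes ($2 + $3) evaluate to NaN for the third record
      plot $CSV using 1:($2 + $3) with linespoints title "sum of columns 2 and 3"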