Teem: nrrd: General Description of the NRRD format



General Description of the NRRD format

(See also full definition of NRRD format.)

Like the nrrd library, the main virtue of the NRRD file format is stark simplicity. The NRRD header is simple ASCII text, one field per line. The fields in the header do not have a strict ordering, and most of them are optional. Most strings are case insensitive, and alternate forms of many of the identifiers and descriptors are allowed. When writing non-ASCII data, the byte ordering is recorded, but not altered to match one particular endianness. Key/value pairs (of strings) are stored in plain text, one pair per line.
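To make the header layout concrete, here is a minimal sketch in Python (not the nrrd library's own writer) that builds a small NRRD file with an attached header. The function name build_nrrd and the choice of 32-bit floats are assumptions made for illustration; the field names, the "NRRD0004" magic, and the blank line that ends the header follow the format as described on this page and in the full definition.

```python
import math
import struct
import sys


def build_nrrd(values, sizes, keyvalues=None):
    """Return the bytes of a minimal NRRD file holding 32-bit floats.

    `sizes` lists the axis sizes, fastest axis first.
    """
    if math.prod(sizes) != len(values):
        raise ValueError("sizes do not match the number of values")
    lines = [
        "NRRD0004",                           # magic, on a line by itself
        "# written by an illustrative sketch",  # comment line
        "type: float",
        f"dimension: {len(sizes)}",
        "sizes: " + " ".join(str(n) for n in sizes),
        "encoding: raw",
        f"endian: {sys.byteorder}",           # record the byte order, do not swap
    ]
    # Key/value pairs are plain text, one pair per line.
    for key, value in (keyvalues or {}).items():
        lines.append(f"{key}:={value}")
    header = "\n".join(lines) + "\n\n"        # a blank line ends the header
    data = struct.pack(f"={len(values)}f", *values)  # native byte order
    return header.encode("ascii") + data
```

Calling build_nrrd([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [3, 2], {"subject": "demo"}) yields a seven-line header plus one key/value line, a blank line, and 24 bytes of raw data.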

The NRRD file format was also conceived as being somewhat analogous to the PPM format for color images: straightforward, friendly to programmers, and descriptive of a sufficiently large class of data to be useful in research. Time and experience with the NRRD format have gradually increased its complexity (such as with the introduction of node- versus cell-centered samples), but the feature set has very nearly converged. As a general representation of raster data, NRRD is intended to occupy the very large but sparsely populated niche between:

Raw, headerless data, hopefully with some nearby README file explaining the type and dimensions.
Very sophisticated, powerful (complicated) formats such as HDF (http://hdf.ncsa.uiuc.edu/).

Many aspects of the NRRD format borrow heavily from the PCGV volume dataset format developed by James Durkin at the Cornell Program of Computer Graphics.

Optional encodings

NRRD has two basic encodings: ascii and raw. It has other optional encodings which are useful in different situations:

hex: If you know enough PostScript to learn the image dimensions, this allows you to snarf image data out of a PostScript file.
gzip: Allows you to read and write data with the zlib compression library, in a way that is compatible with the gzip/gunzip command-line tools.
bzip2: Allows you to read and write data with bzip2 compression, compatible with the bzip2/bunzip2 command-line tools.

Having an optional encoding means that the nrrd library can be compiled without these turned on, so that no external libraries are needed. Builds of the nrrd library which are missing the compression encodings will fail with a warning message when asked to read or write compressed data.
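The idea of an optional encoding can be sketched in Python as a table of decoders that is only populated when the supporting module is present. This is an illustration of the design, not the nrrd library's implementation; the names DECODERS and decode are hypothetical.

```python
# Decoders that are always available.
DECODERS = {"raw": lambda payload: payload}

# Optional decoders: registered only if the compression module can be imported,
# mirroring a build of the library with or without the external dependency.
try:
    import gzip
    DECODERS["gzip"] = gzip.decompress
except ImportError:
    pass

try:
    import bz2
    DECODERS["bzip2"] = bz2.decompress
except ImportError:
    pass


def decode(encoding, payload):
    """Decode the data portion of a file, or fail with a clear message."""
    name = encoding.strip().lower()   # encoding names are case insensitive
    if name not in DECODERS:
        raise ValueError(f"encoding {name!r} is not available in this build")
    return DECODERS[name](payload)
```

A reader built this way fails cleanly when it meets an encoding it was not built with, rather than misreading the data.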

Other optional encodings may be added in the future. However, there is no risk that NRRD will turn into another TIFF, a format so flexible that few readers actually support all of the 121-page specification. The only optional encodings which may be added to NRRD in the future will be ones for which there exist freely available command-line tools to convert the encoded data (in isolation) to raw data.

If you have a NRRD file volume.nrrd, with an attached header, using a data encoding not supported by the available nrrd implementation, you can always use the unix/GNU/linux/cygwin command "tail +N volume.nrrd" (where N is two plus the number of lines in the header; current POSIX versions of tail spell this "tail -n +N volume.nrrd") to get at all the data, so as to pass it on to a stand-alone converter. Or, the Utah Nrrd Utilities command "unu data" is a much easier way of doing the exact same thing. Data in a separate file, detached from the NRRD header, is obviously trivial to pass to a converter.
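The same header/data split can be sketched in a few lines of Python. This is only an illustration of the idea behind the tail command, not a replacement for "unu data": the function name split_nrrd is hypothetical, and the sketch assumes Unix line endings and ignores fields such as byte skip and line skip.

```python
def split_nrrd(blob):
    """Split an attached-header NRRD file into (header_lines, data_bytes)."""
    if not blob.startswith(b"NRRD"):
        raise ValueError("not a NRRD file: magic is missing")
    # The header ends at the first blank line; everything after it is data.
    end = blob.find(b"\n\n")
    if end < 0:
        raise ValueError("no blank line after the header; data may be detached")
    header_lines = blob[:end].decode("ascii").split("\n")
    return header_lines, blob[end + 2:]
```

The N given to tail corresponds to len(header_lines) + 2: one line for the blank separator, and one more because tail counts from the first line it should print.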

I couldn't find stand-alone converters for hex data, so I wrote them:

enhex.c: converts from raw to hex
dehex.c: converts from hex to raw
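For readers who want to see what such converters do, here is a rough Python equivalent. It is a sketch of the conversion itself, not a port of the C programs above; the function names and the 70-character line width are assumptions.

```python
def enhex(raw, width=70):
    """Convert raw bytes to hexadecimal text, wrapped to `width` characters."""
    digits = raw.hex()
    return "\n".join(digits[i:i + width] for i in range(0, len(digits), width))


def dehex(text):
    """Convert hexadecimal text back to raw bytes, ignoring all whitespace."""
    return bytes.fromhex("".join(text.split()))
```

Because whitespace is discarded before decoding, dehex accepts hex digits broken across lines in any way, which is how such data tends to appear inside a PostScript file.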

Because unu data will always be able to spit out the data portion of a NRRD file, even if the nrrd library on which it was built wasn't compiled with the optional encodings enabled, other non-teem NRRD readers should feel no obligation to support the optional encodings.

NRRD compared to VTK format

The VTK (http://public.kitware.com/VTK/pdf/file-formats.pdf) file formats (non-XML versions) are more general than NRRD in the types of information represented, and slightly less general than NRRD when it comes to raster data.

Unlike NRRD, VTK can represent:

Point sets, polygonal data, structured and unstructured meshes of various types.
Multiple pieces of data in one file, allowing samples to have many various attributes.
Vector and tensor types explicitly. In NRRD, these are represented implicitly, by using a short non-spatial axis prior to the spatial axes.

But with raster data, unlike VTK, NRRD can:

Read and write data in either byte-ordering (VTK is always big-endian).
Have the data in a separate file from the header.
Represent the difference between cell- and node-centered samples.
Store data of any dimension, and any C scalar type.
Encode data in more than just ascii and binary, including gzip and bzip2 compression.
Store more peripheral information, such as axis labels and units.

NRRD compared to MetaImage format

This format (specs available from http://caddlab.rad.unc.edu/software/MetaIO/) was developed at the Computer Aided Diagnosis and Display Lab at UNC. It is extremely similar to NRRD in terms of representational capabilities, in that it represents arrays of general type and dimension. As of version 4 of the NRRD file format (with magic "NRRD0004") there are basically only superficial differences between the representational abilities of the two formats. In favor of MetaImage:

The ElementNumberOfChannels field allows a nominally 3-D data header to describe what is logically a 4-D array. NRRD suffers from a slight weirdness in this regard (a color image is a three-dimensional nrrd), a consequence of its "everything is a scalar" mentality.
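As a small illustration of that "everything is a scalar" view, the sketch below treats a 640 x 480 RGB image as a three-dimensional array whose fastest axis has size 3. The header lines and the helper name rgb_offset are illustrative assumptions; the only format fact relied on is that NRRD lists axes from fastest to slowest.

```python
# Header of a color image stored as a 3-D array of scalars:
# axis 0 is the color channel, axes 1 and 2 are the spatial axes.
HEADER_LINES = [
    "NRRD0004",
    "type: uchar",
    "dimension: 3",
    "sizes: 3 640 480",
    "encoding: raw",
]


def rgb_offset(channel, x, y, width):
    """Linear index of one color component, with the channel axis fastest."""
    return channel + 3 * (x + width * y)
```

With this layout the three components of a pixel are adjacent in memory, which is the same interleaving a PPM file uses.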

In favor of NRRD:

The NRRD format has a simple "magic" on a line by itself at the beginning of the file, to unambiguously identify the type of the file to multi-format readers.
NRRD supports both gzip and bzip2 compression, as well as hex, and the underlying modular implementation allows new encodings to be easily added, which work with and without attached headers.
A very conservative approach to representing optional information. If you don't know information like sample spacing, you don't include that field. The NRRD reader remembers that you didn't know spacing, rather than contriving a default value.
A flexible and general approach to defining the array orientation, which enables the lossless representation of orientation information from NIFTI-1, Analyze, and DICOM images.
Has a powerful associated command-line program, unu, for doing image manipulation and pre-processing quickly and easily. The underlying nrrd library provides an easy-to-use C API to the same functionality.

There are many one-to-one parallels between the header fields in the two formats:

NRRD	`-`	MetaImage
`#`	`-`	`Comment`
`dimension`	`-`	`NDims`
`sizes`	`-`	`DimSize`
`type`	`-`	`ElementType`
`endian`	`-`	`ElementByteOrderMSB`
`byte skip`	`-`	`HeaderSize`
`min`	`-`	`ElementMin`
`max`	`-`	`ElementMax`
`thicknesses`	`-`	`ElementSize`
`content`	`-`	`Name`
`axis mins`	`-`	`Position`
`space origin`, `space directions`	`-`	`Orientation`, `AnatomicalOrientation`
`data file`	`-`	`ElementDataFile`
(not using a detached header)	`-`	`ElementDataFile LOCAL`

There are various MetaImage fields that NRRD has no immediate analog for, because NRRD aims to be more minimalist in its representational abilities. This sort of information would be stored as key/value pairs in NRRD:

ObjectType, ObjectSubType, TransformType, Modality: descriptive strings
ID, ParentID: integers
Color: arrays
SequenceID: 4-tuple of integers specific to DICOM format.

Some NRRD fields that MetaImage doesn't seem to have analogs for:

centers: Records whether samples are cell-centered or node-centered, which is fundamental to properly representing, for example, histograms (cell-centered) versus some simulation data (node-centered).
axis maxs: Helpful for unambiguously storing histograms, scatterplots, and images with a specific field of view.
old min, old max: Very handy for remembering the original range of floating point values, prior to quantization.
line skip: When snarfing data from other formats (VTK, VFF, VisPack, and even PostScript), specifying the skip in terms of lines (rather than bytes) is much simpler.
labels: arbitrary string per axis
units: arbitrary string per axis giving units that spacing is measured in.
space: the space in which the orientation of the array is defined can be explicitly named, mainly to disambiguate between medical image format standards using different spaces (DICOM's left-posterior-superior versus NIFTI-1's right-anterior-superior).
measurement frame: for non-scalar data such as vectors and tensors, it is extremely useful to record the coordinate frame in which the vector and tensor coefficients are measured.
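To show why centers matters together with axis mins and axis maxs, here is a sketch of how sample positions along one axis follow from the minimum, the maximum, the number of samples, and the centering. The function name sample_positions is hypothetical; the two formulas are the usual node-centered and cell-centered conventions.

```python
def sample_positions(axis_min, axis_max, size, centering):
    """World-space positions of the samples along one axis."""
    if centering == "node":
        # Samples sit on the interval boundaries: first at min, last at max.
        if size < 2:
            raise ValueError("node centering needs at least two samples")
        step = (axis_max - axis_min) / (size - 1)
        return [axis_min + i * step for i in range(size)]
    if centering == "cell":
        # Samples sit at the middle of `size` equal cells spanning [min, max].
        step = (axis_max - axis_min) / size
        return [axis_min + (i + 0.5) * step for i in range(size)]
    raise ValueError(f"unknown centering {centering!r}")
```

For the range 0 to 4, five node-centered samples land on 0, 1, 2, 3, 4, while four cell-centered samples land on 0.5, 1.5, 2.5, 3.5. The same min and max therefore imply different spacings depending on the centering, which is why a histogram stored without that information is ambiguous.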

Future Extensions

I have some ideas on how the NRRD file format may be extended in the future, but these are not likely to happen within the next year.

More than one array per NRRD file: There are many situations where it is good to logically associate multiple NRRD files together as one "data set". Examples include a large volume and its pre-computed univariate histogram (to help in isovalue navigation), or a collection of different one-, two-, and three-dimensional lookup tables which are good transfer functions for a given volume. I am leaning towards implementing this multi-NRRD association with XML on top of regular NRRD files. If you need this, though, you should probably be using HDF (http://hdf.ncsa.uiuc.edu/).
Bricking: Whenever I do get around to implementing bricking in nrrd, the results will be saved in NRRD files. One level of bricking will turn a 3-D array into a 6-D array. For various subtle reasons, the representation of the axis mins and maxs becomes ambiguous in the case of cell-centered data, and this information is meaningless for all the bricked axes. Brick overlap should be represented too, but this means a new field.
Other compression methods: It would be really nice to have a compression method that worked well on floating point data.
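For the bricking item above, the following sketch shows one way a single level of bricking could turn a 3-D index into a 6-D index. Since bricking is not implemented in nrrd, everything here is an assumption: the axis ordering (within-brick axes first, brick-grid axes after), the requirement that the brick size divide the array size evenly, the absence of brick overlap, and the function names.

```python
def bricked_sizes(sizes, brick):
    """Sizes of the 6-D array produced by bricking a 3-D array."""
    for n, b in zip(sizes, brick):
        if n % b != 0:
            raise ValueError("brick size must divide the array size")
    # Three axes within a brick, then three axes over the grid of bricks.
    return tuple(brick) + tuple(n // b for n, b in zip(sizes, brick))


def brick_index(x, y, z, brick):
    """Map a 3-D sample index to (within-brick index, brick-grid index)."""
    bx, by, bz = brick
    return (x % bx, y % by, z % bz, x // bx, y // by, z // bz)
```

A 64 x 64 x 32 volume with 8 x 8 x 8 bricks becomes an 8 x 8 x 8 x 8 x 8 x 4 array, and the sample at (9, 3, 17) moves to position (1, 3, 1) inside brick (1, 0, 2).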