IRATE Format Specification

This document describes the specific required structure of the IRATE format for format version 0.

All IRATE files are HDF5 files, and hence usually have either an .h5 or .hdf5 extension.

IRATE File Format

The main data file in the IRATE format is referred to simply as an “IRATE file”. These files may store any number of the actual outputs of a simulation (typically meaning multiple snapshots), associated halo and/or galaxy catalogs and merger trees, and any other data that might be associated with such a simulation (e.g. black hole catalogs).

Note

An IRATE file is intended to hold at most one simulation. Multiple simulations should be stored as multiple separate IRATE files. A single simulation can, however, be spread over multiple files; see the irate.core.scatter_files and irate.core.gather_files functions for examples. In that case, the root file that contains the main hierarchy is the “IRATE file” and the others are ancillary files.

To conform to the IRATE standard, such a file must satisfy the following conditions:

  • The root of the file must have an integer attribute named ‘IRATEVersion’ that specifies the version of the IRATE format that the file obeys. The format version for this documentation is 0. The format version for the currently-installed IRATE tools can always be accessed as an integer via irate.formatversion.

  • At the root of the file, there must be a Group ‘Cosmology’. This Group must have the following HDF5 attributes to specify the cosmology that defines the data:

    • ‘HubbleParam’
    • ‘OmegaMatter’
    • ‘OmegaLambda’
    • ‘OmegaBaryon’
    • ‘PowerSpectrumIndex’
    • ‘sigma_8’

    Furthermore, if the cosmology used has an accepted name (e.g. WMAP-7), it is strongly recommended that the Group have an additional attribute, ‘Name’, for human readability; such an attribute, however, is not required.

    Some cosmologies may include additional parameters, in which case such parameters can be included as attributes of the ‘Cosmology’ group, or as datasets if such information can only be stored in array form. The naming conventions used above are recommended for custom parameters.

    For non-cosmological simulations, the ‘Cosmology’ Group must still be present; however, to signify that the cosmology is unimportant, all of the attributes should be set to zero and the ‘Name’ attribute should be set to ‘Non-Cosmological’.

  • The root of the file must also contain a Group named ‘SimulationProperties’. Various properties of the simulation, such as the box size and assorted flags, should be provided in this Group. Where possible, they should be given as attributes; however, the format also allows this Group to contain datasets.

  • Also at the root of the file, there may be any number of Groups with names of the form ‘Snapshot#####’, where the # is typically a number identifying the output in the context of the simulation, padded to be five digits long (e.g. Snapshot 35 would be saved under /Snapshot00035). Each Snapshot Group should have an attribute named ‘ScaleFactor’; this attribute is not required if the snapshot contains neither particle nor grid data. A Snapshot Group must contain only other Groups, which may be ‘ParticleData’ or ‘GridData’ (whose individual requirements are discussed in Particle Data and Grid Data, respectively), along with any number of halo or galaxy catalogs (described below in Halo Catalogs and Galaxy Catalogs).

    Todo

    Developers, should redshift be required? It’s not usually provided by halo catalogs, so we’d be requiring users to type it in manually.

    Todo

    Developers, Is requiring that the simulation groups be called “Snapshot#” too restrictive? Should some other naming convention be required, instead? Or just say any groups not explicitly called for here will be treated as snapshots regardless of their names (that’s in conflict with the second bullet point below)?

  • The root of the file may (but is not required to) contain a ‘MergerTrees’ Group, which holds information about the merger trees in the simulation. If present, this group must obey the format specified in Merger Trees.

  • The root of the file may also contain any other Groups that are desired, but their form is not specified in the format. It is strongly recommended, however, that they follow the same conventions with regard to units and naming structure that are laid out elsewhere in this documentation.

    Todo

    Developers, do we want to allow this, or should there be nothing else allowed at the root level?

  • There must not be spaces in any group names so as not to confuse some HDF5 tools that don’t play well with spaces.

Note

All group and attribute names are case-sensitive.
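
The root-level conditions above can be sketched as a small checker. This is only an illustration working on plain Python containers rather than on an open HDF5 file; the function names (snapshot_group_name, check_root) and the way the file contents are passed in are assumptions made for this example and are not part of the IRATE tools.

```python
import numbers

# Attribute names required on the 'Cosmology' group (case-sensitive).
REQUIRED_COSMOLOGY_ATTRS = (
    "HubbleParam", "OmegaMatter", "OmegaLambda",
    "OmegaBaryon", "PowerSpectrumIndex", "sigma_8",
)


def snapshot_group_name(number):
    """Return the group name for a snapshot number, zero-padded to five digits."""
    return "Snapshot{:05d}".format(number)


def check_root(root_attrs, root_groups, cosmology_attrs):
    """Return a list of problems found; an empty list means none were found.

    root_attrs      -- dict of attributes on the root of the file
    root_groups     -- list of group names found at the root
    cosmology_attrs -- dict of attributes on the 'Cosmology' group
    (Hypothetical helper: the arguments stand in for what would be read
    from the HDF5 file.)
    """
    problems = []
    version = root_attrs.get("IRATEVersion")
    if isinstance(version, bool) or not isinstance(version, numbers.Integral):
        problems.append("root attribute 'IRATEVersion' must be an integer")
    for name in ("Cosmology", "SimulationProperties"):
        if name not in root_groups:
            problems.append("missing required group '{}'".format(name))
    for attr in REQUIRED_COSMOLOGY_ATTRS:
        if attr not in cosmology_attrs:
            problems.append("'Cosmology' is missing attribute '{}'".format(attr))
    for name in root_groups:
        if " " in name:
            problems.append("group name '{}' contains a space".format(name))
    return problems
```

The checker covers only the conditions that are stated as requirements; recommendations such as the ‘Name’ attribute are deliberately not enforced.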

Unit Information

For all datasets that have units associated with them, those units should be stored as attributes either of the individual datasets or of the Group that contains the datasets. In either case, the units should be given both in human-readable form and as a conversion factor to CGS units. If a dataset does not have units, it will be assumed to be dimensionless.

Todo

Developers, how does this method of including units sound? It’s based on Andrew’s and the yt/GDF format scheme...

If the units are attached directly to the Dataset that they relate to, they must be named ‘unitname’ and ‘unitcgs’; if they are instead attached to a Group above them, the names should be prepended with the exact name of the Dataset that they relate to; e.g. the units for the Dataset ‘R200b’ would be named ‘R200bunitname’ and ‘R200bunitcgs’, if they are attributes to the group that contains that Dataset.

The ‘unitname’ attribute should be a string defining the unit, e.g. ‘kpc/h’. The ‘unitcgs’ attribute must be a three-element array whose values are, in order: the numerical conversion factor to CGS, the exponent on the Hubble parameter by which the conversion factor should be multiplied, and the exponent on the scale factor by which the conversion factor should be multiplied.

For example, if ‘unitname’ is ‘comoving Mpc/h’, ‘unitcgs’ should be an array containing [3.0857e24, -1, 1].
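
As a rough illustration of how the three ‘unitcgs’ values combine, the sketch below computes the factor that converts a stored value to CGS. The function name cgs_factor is hypothetical and is not the library's irate.core.get_cgs_factor(); it also assumes that the Hubble parameter is the dimensionless h and that the scale factor is that of the snapshot in question.

```python
def cgs_factor(unitcgs, hubble_param, scale_factor):
    """Return the number a stored value is multiplied by to express it in CGS.

    unitcgs      -- (conversion factor, exponent on h, exponent on a)
    hubble_param -- dimensionless Hubble parameter h (assumed convention)
    scale_factor -- scale factor a of the snapshot
    """
    factor, h_exponent, a_exponent = unitcgs
    return factor * hubble_param ** h_exponent * scale_factor ** a_exponent


# A length of 10 'comoving Mpc/h' at h = 0.7 and a = 0.5, expressed in cm.
comoving_mpc_per_h = [3.0857e24, -1, 1]
length_cm = 10.0 * cgs_factor(comoving_mpc_per_h, hubble_param=0.7, scale_factor=0.5)
```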

Note that the core library provides utilities for accessing units - see irate.core.get_units(), irate.core.set_units(), and irate.core.get_cgs_factor().

Other Metadata

Other metadata associated with individual datasets should be included in the same fashion as units. That is, they should either be attributes attached directly to the dataset and named after the metadata field, or attributes of groups further up the hierarchy, named with the dataset name followed immediately by the metadata field name (datasetnamemetadataname). The core library provides utilities for accessing or setting metadata in irate.core.get_metadata() and irate.core.set_metadata().
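
The naming convention for units and other metadata can be sketched as a lookup over plain dictionaries of attributes. The helper find_metadata is hypothetical and is not the library's irate.core.get_metadata(); in particular, checking the dataset first and then the enclosing groups from nearest to farthest is an assumption of this example, since the format does not state a precedence.

```python
def find_metadata(dataset_name, field, dataset_attrs, ancestor_attrs):
    """Look up a metadata field (e.g. 'unitname') for a dataset.

    dataset_attrs  -- dict of attributes attached to the dataset itself
    ancestor_attrs -- list of attribute dicts for the enclosing groups,
                      ordered from the immediate parent upwards
    Returns the value, or None if the field is not recorded anywhere.
    """
    # Attributes attached directly to the dataset use the bare field name.
    if field in dataset_attrs:
        return dataset_attrs[field]
    # Attributes on a group are prefixed with the exact dataset name,
    # e.g. 'R200b' + 'unitname' -> 'R200bunitname'.
    prefixed = dataset_name + field
    for attrs in ancestor_attrs:
        if prefixed in attrs:
            return attrs[prefixed]
    return None
```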

Particle Data

The ParticleData Group, if it exists, must contain at least one group, of which the most common are ‘Dark’, ‘Gas’, and ‘Star’; these contain the data for dark matter, gas, and stars, respectively. Users are free to use other names for particle blocks, e.g. to separate high resolution from low resolution particles, but any Group containing dark matter particles must have a (case-sensitive) name that begins with ‘Dark’ (e.g. ‘Dark_HighRes’), any Group containing gas particles must have a name that begins with ‘Gas’, and any Group containing star particles must have a name that begins with ‘Star’. Users are free to store other particle types in IRATE files; it is strongly recommended that they follow the same convention laid out here (e.g. ‘BlackHole’). Tools that read in IRATE files, such as halo finders, will assume the type of particle based on the group name.
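
A reader might infer the particle type from a group name along the following lines. This is a minimal sketch: the function name particle_type and the choice to return None for unrecognized prefixes are assumptions of the example, not behavior of any IRATE tool.

```python
# Prefixes fixed by the format; the comparison is case-sensitive.
KNOWN_PREFIXES = ("Dark", "Gas", "Star")


def particle_type(group_name):
    """Return 'Dark', 'Gas' or 'Star' for a particle group name, else None."""
    for prefix in KNOWN_PREFIXES:
        if group_name.startswith(prefix):
            return prefix
    return None  # some other particle type, e.g. 'BlackHole'
```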

Any groups within /Snapshot#/ParticleData/ may contain only data sets. For particle data, the following Dataset objects must be present in each group that exists, even if they have 0 particles:

  • ‘Position’ (N x d)
  • ‘Velocity’ (N x d)
  • ‘Mass’ (N)
  • ‘ID’ (N)

where d is the dimensionality (almost always 3) and N is the total number of particles. Additional datasets (e.g. ‘Metallicity’, ‘Entropy’, ‘Density’, etc.) may be present, but the above four are the minimum required. Any other datasets are encouraged to have shape N for scalar data or N x d for vector data.
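
The shape requirements on the four mandatory datasets can be checked as in the sketch below. It works on a dictionary of shape tuples (as a reader might collect from the datasets of one particle group); the function name check_particle_shapes and the wording of the messages are hypothetical.

```python
# Number of array dimensions expected for each required dataset.
REQUIRED_NDIM = {"Position": 2, "Velocity": 2, "Mass": 1, "ID": 1}


def check_particle_shapes(shapes):
    """Return a list of problems for one particle group.

    shapes -- dict mapping dataset name to its shape, e.g. {'Mass': (10,)}
    """
    problems = []
    for name, ndim in REQUIRED_NDIM.items():
        if name not in shapes:
            problems.append("missing dataset '{}'".format(name))
        elif len(shapes[name]) != ndim:
            problems.append("'{}' should have {} dimension(s)".format(name, ndim))
    if problems:
        return problems
    # N and d are taken from 'Position'; the others must agree with it.
    n_particles, dimensionality = shapes["Position"]
    if tuple(shapes["Velocity"]) != (n_particles, dimensionality):
        problems.append("'Velocity' must have the same shape as 'Position'")
    for name in ("Mass", "ID"):
        if tuple(shapes[name]) != (n_particles,):
            problems.append("'{}' must have shape (N,)".format(name))
    return problems
```

A group with zero particles passes as long as the four datasets exist with shapes such as (0, 3) and (0,).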

Grid Data

The grid data specification has not yet been defined.

Halo Catalogs

Each halo catalog is stored as a Group whose name must begin with ‘HaloCatalog’. For example, both ‘HaloCatalog_AHF1’ and ‘HaloCatalog_Rockstar’ are valid names; ‘AHFCatalog’ and ‘Catalog_Rockstar’, however, are not.

Any halo catalogs that are contained within a Snapshot Group should have, as attributes, any parameters that are relevant to the halo finder, such as FOF linking lengths, overdensity criterion, or the code used to produce that catalog (though the latter may be obvious from the name of the group).

Any halo catalog must contain a Dataset with the name ‘Center’ that has shape N x d, where N is the number of halos in the catalog, and d is the dimensionality (typically 3). All other datasets in the catalog should have a matching first dimension, and should be in the same order. That is, the ith entry in ‘Center’ should correspond to the same halo as the ith entry in any of the other datasets.

If the index of the most bound particle of each halo is included in the halo catalog, it should be stored in a Dataset named ‘MostBoundParticleID’.

If particle data is included with the halo catalog, it must be saved in a Group inside the halo catalog with the name ‘HaloParticleData’. This group must contain at least two datasets. The first of these should be named ‘HaloParticleIDs’, while the second should be named ‘ParticlesPerHalo’.

‘HaloParticleIDs’ should contain integer particle IDs in order such that all particles in the first halo come first, followed by those in the second halo, and so on. Here, halo order is the same as the order of the halos in the ‘Center’ dataset. Note that the number of elements of this dataset is not necessarily the same as the total number of particles, because some particles may be members of multiple halos, in which case they appear in ‘HaloParticleIDs’ more than once.

The ‘ParticlesPerHalo’ Dataset, on the other hand, must be of a length matching the first dimension of the ‘Center’ dataset, and should give the (integer) number of particles in each halo. The sum of all of the values in this dataset must match the size of the ‘HaloParticleIDs’ dataset. This allows ‘HaloParticleIDs’ and ‘ParticlesPerHalo’ to provide all the information needed to determine which particles are in which halos.
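
To make the relationship between the two datasets concrete, the sketch below recovers the member particles of each halo from the flat ID list and the per-halo counts by accumulating the counts into start and end offsets. The function name particles_by_halo is hypothetical, and plain Python lists stand in for the contents of ‘HaloParticleIDs’ and ‘ParticlesPerHalo’.

```python
from itertools import accumulate


def particles_by_halo(halo_particle_ids, particles_per_halo):
    """Split the flat ID list into one list of particle IDs per halo.

    The i-th returned list belongs to the i-th halo in 'Center'.
    """
    if sum(particles_per_halo) != len(halo_particle_ids):
        raise ValueError("per-halo counts must sum to the number of stored IDs")
    # Running totals give the end offset of each halo's block of IDs.
    ends = list(accumulate(particles_per_halo))
    starts = [0] + ends[:-1]
    return [list(halo_particle_ids[s:e]) for s, e in zip(starts, ends)]


# Three halos; the second is empty and particle 3 belongs to two halos.
members = particles_by_halo([7, 3, 9, 3, 1], [2, 0, 3])
```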

Many users will find it convenient to store the type of particle as well. This should be saved in a third Dataset named ‘HaloParticleTypes’, but this dataset is not required by the format. If it is present, it should be of the same size as ‘HaloParticleIDs’.

Galaxy Catalogs

Each galaxy catalog is stored as a Group whose name must begin with ‘GalaxyCatalog’. For example, both ‘GalaxyCatalog_Galacticus’ and ‘GalaxyCatalog_LGalaxies’ are valid names; ‘GalacticusCatalog’ and ‘Catalog_LGalaxies’, however, are not.

Any galaxy catalogs that are contained within a Snapshot Group should have, as attributes, any parameters that are relevant to the galaxy formation code, such as input parameter values, or the version of the code used to produce that catalog.

Any galaxy catalog must contain two Datasets with the names ‘HaloID’ and ‘HaloSnapshot’ that have shape N, where N is the number of galaxies in the catalog. All other datasets in the catalog should have a matching first dimension, and should be in the same order. That is, the ith entry in ‘HaloID’ should correspond to the same galaxy as the ith entry in any of the other datasets. The ‘HaloID’ dataset should give the ID of the halo in which the galaxy is located, while ‘HaloSnapshot’ should give the corresponding snapshot number at which that halo exists. (Galaxies may be located in halos which exist at an earlier snapshot if, for example, the halo can no longer be found at the current snapshot, but the galaxy formation code determines that the galaxy itself has not yet merged.) If corresponding halos are not present in the file, these two Datasets should have all values set to -1.
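
As an illustration of how ‘HaloID’ and ‘HaloSnapshot’ tie galaxies to halos, the sketch below groups galaxy row indices by host halo and skips galaxies flagged with -1. The function name group_galaxies_by_host and the use of a (snapshot, halo ID) pair as the key are choices made for this example only.

```python
from collections import defaultdict


def group_galaxies_by_host(halo_id, halo_snapshot):
    """Map (snapshot number, halo ID) to the row indices of its galaxies."""
    if len(halo_id) != len(halo_snapshot):
        raise ValueError("'HaloID' and 'HaloSnapshot' must have the same length")
    by_host = defaultdict(list)
    for row, (hid, snap) in enumerate(zip(halo_id, halo_snapshot)):
        if hid == -1 or snap == -1:
            continue  # no corresponding halo is stored in the file
        by_host[(snap, hid)].append(row)
    return dict(by_host)


# Galaxies 0 and 1 share halo 4 of snapshot 16; galaxy 2 has no stored halo.
hosts = group_galaxies_by_host([4, 4, -1, 2], [16, 16, -1, 15])
```

Because every other dataset in the catalog shares the same row order, the returned row indices can be used directly to pick out the properties of the galaxies in a given halo.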

Merger Trees

Merger trees are stored as a Group whose name must begin with ‘MergerTrees’. For example, both ‘MergerTrees_yt’ and ‘MergerTrees_Millennium’ are valid names; ‘ytTrees’, however, is not.

Merger tree groups should have, as attributes, any parameters that are relevant to the merger tree builder, such as the name of the code used to build the trees.

Any merger tree group must contain a Dataset with the name ‘HaloID’ that has shape N, where N is the total number of halos in all trees. The dataset should give the integer index of a halo in a ‘HaloCatalog’ Group. Also required are Datasets with the names ‘HaloSnapshot’, ‘DescendentID’ and ‘DescendentSnapshot’, which must have the same shape, N, and should be in the same order. That is, the ith entry in ‘HaloID’ should correspond to the same halo as the ith entry in ‘HaloSnapshot’, ‘DescendentID’ and ‘DescendentSnapshot’. The ‘HaloSnapshot’ Dataset must give the index of the snapshot to which this halo belongs. The ‘DescendentID’ and ‘DescendentSnapshot’ Datasets must give the index and snapshot of the halo into which this halo descends. For halos with no descendent (e.g. the root halo of a tree), values of -1 should be used.

In addition, the MergerTrees Group must contain a Dataset named ‘HalosPerTree’, which must have a length equal to the total number of trees present in the group and should give the (integer) number of halos in each tree. The sum of all of the values in this dataset must match the size of the ‘HaloID’ Dataset. ‘HalosPerTree’ provides all the information needed to determine which halos are in which trees. Optionally, a Dataset named ‘TreeID’, which should have the same length as ‘HalosPerTree’, may be present, and should give a unique identifying index for each tree.
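
The sketch below shows one way the merger tree datasets might be used together: ‘HalosPerTree’ is turned into row ranges, and the descendent links are followed from a halo to the root of its tree. Both function names are hypothetical. The example assumes that the rows of each tree are stored contiguously in the order of ‘HalosPerTree’, and that each (snapshot, halo ID) pair appears in only one row.

```python
from itertools import accumulate


def tree_row_ranges(halos_per_tree, n_halos):
    """Return a (start, end) row range for each tree, end exclusive."""
    if sum(halos_per_tree) != n_halos:
        raise ValueError("'HalosPerTree' must sum to the length of 'HaloID'")
    ends = list(accumulate(halos_per_tree))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def descendant_rows(row, halo_id, halo_snapshot, descendent_id, descendent_snapshot):
    """Follow descendent links from a row; return the rows visited, root last."""
    # Index rows by (snapshot, halo ID) so that a descendent can be located.
    index = {(s, h): i for i, (s, h) in enumerate(zip(halo_snapshot, halo_id))}
    chain = [row]
    while descendent_id[row] != -1:  # -1 marks a halo with no descendent
        row = index[(descendent_snapshot[row], descendent_id[row])]
        if row in chain:
            raise ValueError("descendent links form a loop")
        chain.append(row)
    return chain


# Two trees: rows 0-3 (two progenitors merging) and row 4 (a single halo).
halo_id = [3, 5, 1, 0, 2]
halo_snapshot = [14, 14, 15, 16, 16]
descendent_id = [1, 1, 0, -1, -1]
descendent_snapshot = [15, 15, 16, -1, -1]
ranges = tree_row_ranges([4, 1], len(halo_id))
chain = descendant_rows(0, halo_id, halo_snapshot, descendent_id, descendent_snapshot)
```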

Examples

Here we provide the structure of a sample IRATE Format file in the form output by the h5dump utility (included with the HDF5 library). Note that the ‘Halo’, ‘Bulge’, and ‘Disk’ groups are not actually a part of the specification, but are examples of possible ways one might wish to sub-divide the particle data. Also note that a typical IRATE file will contain many more datasets, particularly in the catalogs, which have been removed from here for the sake of brevity:

HDF5 "SampleIRATEfile.hdf5" {
FILE_CONTENTS {
 group      /                       (Contains attribute defining the version of the IRATE format that this file conforms to)
 group      /Cosmology              (Contains attributes defining the cosmology of the simulation)
 group      /SimulationProperties   (Contains attributes defining non-cosmological properties of the simulation)
 group      /Snapshot00144          (Contains attributes defining redshift, scale factor, or both)
 group      /Snapshot00144/HaloCatalog_AHF      (Should contain attributes defining the parameters of the halo finding)
 dataset    /Snapshot00144/HaloCatalog_AHF/Center   (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Ekin     (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Epot     (Contains attributes with unit information)
 group      /Snapshot00144/HaloCatalog_AHF/HaloParticleData
 ext link   /Snapshot00144/HaloCatalog_AHF/HaloParticleData/HaloParticleTypes -> SampleIRATEfile-00144particles.hdf5 /HaloParticleTypes
 ext link   /Snapshot00144/HaloCatalog_AHF/HaloParticleData/HaloParticleIDs -> SampleIRATEfile-00144particles.hdf5 /HaloParticleIDs
 ext link   /Snapshot00144/HaloCatalog_AHF/HaloParticleData/ParticlesPerHalo -> SampleIRATEfile-00144particles.hdf5 /ParticlesPerHalo
 dataset    /Snapshot00144/HaloCatalog_AHF/L        (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Mvir     (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Phi      (Contains attributes with unit information)
 group      /Snapshot00144/HaloCatalog_AHF/RadialProfiles
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/L         (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/M_in_r    (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/dens      (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/npart
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/r         (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/vcirc     (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Rmax         (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Rvir         (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Velocity     (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Vmax         (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/fMhires
 dataset    /Snapshot00144/HaloCatalog_AHF/lambda
 dataset    /Snapshot00144/HaloCatalog_AHF/nbins
 dataset    /Snapshot00144/HaloCatalog_AHF/npart
 group      /Snapshot00144/HaloCatalog_Rockstar     (Should contain attributes defining the parameters of the halo finding)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/Center      (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/M200b       (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/R200b       (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/Rmax        (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/Spin
 dataset    /Snapshot00144/HaloCatalog_Rockstar/Velocity    (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/Vmax        (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/npart
 group      /Snapshot00144/ParticleData                     (Contains attributes with unit information for all datasets within it)
 group      /Snapshot00144/ParticleData/Dark_Bulge
 dataset    /Snapshot00144/ParticleData/Dark_Bulge/ID
 dataset    /Snapshot00144/ParticleData/Dark_Bulge/Mass
 dataset    /Snapshot00144/ParticleData/Dark_Bulge/Position
 dataset    /Snapshot00144/ParticleData/Dark_Bulge/Velocity
 group      /Snapshot00144/ParticleData/Dark_Disk
 dataset    /Snapshot00144/ParticleData/Dark_Disk/ID
 dataset    /Snapshot00144/ParticleData/Dark_Disk/Mass
 dataset    /Snapshot00144/ParticleData/Dark_Disk/Position
 dataset    /Snapshot00144/ParticleData/Dark_Disk/Velocity
 group      /Snapshot00144/ParticleData/Dark_Halo
 dataset    /Snapshot00144/ParticleData/Dark_Halo/ID
 dataset    /Snapshot00144/ParticleData/Dark_Halo/Mass
 dataset    /Snapshot00144/ParticleData/Dark_Halo/Position
 dataset    /Snapshot00144/ParticleData/Dark_Halo/Velocity
 group      /Snapshot00153                      (Contains attributes defining redshift, scale factor, or both)
 group      /Snapshot00153/HaloCatalog_AHF      (Should contain attributes defining the parameters of the halo finding)
 dataset    /Snapshot00153/HaloCatalog_AHF/Center       (Contains attributes with unit information)
 group      /Snapshot00153/HaloCatalog_AHF/HaloParticleData
 ext link   /Snapshot00153/HaloCatalog_AHF/HaloParticleData/HaloParticleTypes -> SampleIRATEfile-00153particles.hdf5 /HaloParticleTypes
 ext link   /Snapshot00153/HaloCatalog_AHF/HaloParticleData/HaloParticleIDs -> SampleIRATEfile-00153particles.hdf5 /HaloParticleIDs
 ext link   /Snapshot00153/HaloCatalog_AHF/HaloParticleData/ParticlesPerHalo -> SampleIRATEfile-00153particles.hdf5 /ParticlesPerHalo
 dataset    /Snapshot00153/HaloCatalog_AHF/L            (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/Mvir         (Contains attributes with unit information)
 group      /Snapshot00153/HaloCatalog_AHF/RadialProfiles
 dataset    /Snapshot00153/HaloCatalog_AHF/RadialProfiles/M_in_r    (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/RadialProfiles/r         (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/RadialProfiles/vcirc     (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/Rmax         (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/Rvir         (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/Velocity     (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/Vmax         (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/nbins
 dataset    /Snapshot00153/HaloCatalog_AHF/npart
 group      /Snapshot00153/HaloCatalog_Rockstar     (Should contain attributes defining the parameters of the halo finding)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/Center          (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/M200b           (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/Mbound200b      (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/R200b           (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/Rmax            (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/Velocity        (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/Vmax            (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/npart           (Contains attributes with unit information)
 group      /Snapshot00153/ParticleData             (Contains attributes with unit information for all datasets within it)
 group      /Snapshot00153/ParticleData/Dark_Bulge
 dataset    /Snapshot00153/ParticleData/Dark_Bulge/ID
 dataset    /Snapshot00153/ParticleData/Dark_Bulge/Mass
 dataset    /Snapshot00153/ParticleData/Dark_Bulge/Position
 dataset    /Snapshot00153/ParticleData/Dark_Bulge/Velocity
 group      /Snapshot00153/ParticleData/Dark_Disk
 dataset    /Snapshot00153/ParticleData/Dark_Disk/ID
 dataset    /Snapshot00153/ParticleData/Dark_Disk/Mass
 dataset    /Snapshot00153/ParticleData/Dark_Disk/Position
 dataset    /Snapshot00153/ParticleData/Dark_Disk/Velocity
 group      /Snapshot00153/ParticleData/Dark_Halo
 dataset    /Snapshot00153/ParticleData/Dark_Halo/ID
 dataset    /Snapshot00153/ParticleData/Dark_Halo/Mass
 dataset    /Snapshot00153/ParticleData/Dark_Halo/Position
 dataset    /Snapshot00153/ParticleData/Dark_Halo/Velocity
 }
}

In addition, we include here an example of a file with merger trees:

HDF5 "treesIRATE.hdf5" {
FILE_CONTENTS {
 group      /
 group      /Cosmology
 group      /MergerTrees
 dataset    /MergerTrees/DescendentID
 dataset    /MergerTrees/DescendentSnapshot
 dataset    /MergerTrees/HaloID
 dataset    /MergerTrees/HaloSnapshot
 dataset    /MergerTrees/HalosPerTree
 dataset    /MergerTrees/HostID
 dataset    /MergerTrees/TreeID
 group      /SimulationProperties
 group      /Snapshot00016
 group      /Snapshot00016/HaloCatalog
 dataset    /Snapshot00016/HaloCatalog/AngularMomentum
 dataset    /Snapshot00016/HaloCatalog/Center
 dataset    /Snapshot00016/HaloCatalog/HalfMassRadius
 dataset    /Snapshot00016/HaloCatalog/Index
 dataset    /Snapshot00016/HaloCatalog/Mass
 dataset    /Snapshot00016/HaloCatalog/MostBoundParticleID
 dataset    /Snapshot00016/HaloCatalog/Velocity
 group      /Snapshot00016/ParticleData
 group      /Snapshot00016/ParticleData/Dark
 dataset    /Snapshot00016/ParticleData/Dark/ID
 dataset    /Snapshot00016/ParticleData/Dark/Mass
 dataset    /Snapshot00016/ParticleData/Dark/Position
 dataset    /Snapshot00016/ParticleData/Dark/Velocity
 }
}
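
Listings like the two above can be inspected programmatically. The sketch below splits such a listing into groups, datasets and external links; it assumes the layout shown here (one object per line, starting with its kind, as printed by h5dump when only the file contents are listed) and ignores everything after the path, including the parenthetical annotations added in the first example. The function name parse_listing is hypothetical.

```python
def parse_listing(text):
    """Collect object paths from a FILE_CONTENTS listing, grouped by kind."""
    entries = {"group": [], "dataset": [], "ext link": []}
    for line in text.splitlines():
        stripped = line.strip()
        for kind in entries:
            if stripped.startswith(kind + " "):
                # The path is the first token after the kind keyword.
                path = stripped[len(kind):].split()[0]
                entries[kind].append(path)
                break
    return entries


def snapshot_names(entries):
    """Return the names of the Snapshot groups found at the root, sorted."""
    names = set()
    for path in entries["group"]:
        parts = path.strip("/").split("/")
        if len(parts) == 1 and parts[0].startswith("Snapshot"):
            names.add(parts[0])
    return sorted(names)
```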