IRATE Format Specification
This document describes the specific required structure of the IRATE format for format version 0.
All IRATE files are HDF5 files, and hence usually have either an .h5 or .hdf5 extension.
IRATE File Format
The main data file in the IRATE format is referred to simply as an “IRATE file”. These files may store any number of the actual outputs of a simulation (typically meaning multiple snapshots), associated halo and/or galaxy catalogs and merger trees, and any other data that might be associated with such a simulation (e.g. black hole catalogs).
Note
An IRATE file is intended to hold at most one simulation. Multiple simulations should be stored as multiple separate IRATE files. A single simulation can, however, be spread over multiple files - see the irate.core.scatter_files and irate.core.gather_files functions for examples. In that case, the root file that contains the main hierarchy is the “IRATE file” and the others are ancillary files.
To conform to the IRATE standard, such a file must satisfy the following conditions:
The root of the file must have an integer attribute named ‘IRATEVersion’ that specifies the version of the IRATE format that the file obeys. The format version for this documentation is 0. The format version for the currently-installed IRATE tools can always be accessed as an integer via irate.formatversion.
At the root of the file, there must be a Group named ‘Cosmology’. This Group must have the following HDF5 attributes to specify the cosmology that defines the data:
- ‘HubbleParam’
- ‘OmegaMatter’
- ‘OmegaLambda’
- ‘OmegaBaryon’
- ‘PowerSpectrumIndex’
- ‘sigma_8’
Furthermore, if the cosmology used has an accepted name (e.g. WMAP-7), it is strongly recommended that the Group have an additional attribute, ‘Name’, for human readability; such an attribute, however, is not required.
Some cosmologies may include additional parameters, in which case such parameters can be included as attributes of the ‘Cosmology’ group, or as datasets if such information can only be stored in array form. The naming conventions used above are recommended for custom parameters.
For non-cosmological simulations, the ‘Cosmology’ Group must still be present; however, to signify that the cosmology is unimportant, all of the attributes should be set to zero and the ‘Name’ attribute should be set to ‘Non-Cosmological’.
The root of the file must also contain a Group named ‘SimulationProperties’. Various properties of the simulation, such as the box size and assorted flags, should be provided in this Group. Where possible, they should be given as attributes; however, the format also permits this group to contain datasets.
Also at the root of the file, there may be any number of Groups with names of the form ‘Snapshot#####’, where ##### is typically a number identifying the output in the context of the simulation, zero-padded to five digits (e.g. snapshot 35 would be saved under /Snapshot00035). Each Snapshot Group should have an attribute named ‘ScaleFactor’, although this is not required if the snapshot contains neither particle nor grid data. A Snapshot Group must contain only other Groups, which may be ‘ParticleData’ or ‘GridData’ (whose individual requirements are discussed in Particle Data and Grid Data, respectively), along with any number of halo or galaxy catalogs (described below in Halo Catalogs and Galaxy Catalogs).
Todo
Developers, should redshift be required? It’s not usually provided by halo catalogs, so we’d be requiring users to type it in manually.
Todo
Developers, Is requiring that the simulation groups be called “Snapshot#” too restrictive? Should some other naming convention be required, instead? Or just say any groups not explicitly called for here will be treated as snapshots regardless of their names (that’s in conflict with the second bullet point below)?
The root of the file may (but is not required to) contain a ‘MergerTrees’ Group, which holds information about the merger trees in the simulation. If present, this group must obey the format specified in Merger Trees.
The root of the file may also contain any other Groups that are desired, but their form is not specified in the format. Additionally, it is strongly recommended that they follow the same conventions with regard to units and naming structure that are laid out elsewhere in this documentation.
Todo
Developers, do we want to allow this, or should there be nothing else allowed at the root level?
Group names must not contain spaces, because some HDF5 tools do not handle spaces well.
Note
All group and attribute names are case-sensitive.
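A few of the root-level rules above can be expressed as small checks. The following is only a sketch in plain Python: the helper names are hypothetical and are not part of the irate package, and the attributes of the ‘Cosmology’ group are modelled as an ordinary dictionary rather than read from an HDF5 file.

```python
# Hypothetical helpers for illustration; none of these exist in the irate package.
REQUIRED_COSMOLOGY_ATTRS = (
    "HubbleParam", "OmegaMatter", "OmegaLambda",
    "OmegaBaryon", "PowerSpectrumIndex", "sigma_8",
)


def snapshot_group_name(number):
    """Return the snapshot group name, zero-padded to five digits."""
    return "Snapshot{:05d}".format(number)


def missing_cosmology_attrs(attrs):
    """List the required 'Cosmology' attributes that are absent from attrs."""
    return [name for name in REQUIRED_COSMOLOGY_ATTRS if name not in attrs]


def is_valid_group_name(name):
    """Group names must not contain spaces."""
    return " " not in name


print(snapshot_group_name(35))
print(missing_cosmology_attrs({"HubbleParam": 0.7, "OmegaMatter": 0.27}))
print(is_valid_group_name("Simulation Properties"))
```

Because names are case-sensitive, the membership tests deliberately use exact string comparison.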
Unit Information
For all datasets that have units associated with them, those units should be stored either as attributes of the individual datasets or as attributes of the Group that contains the datasets. In either case, the units should be given both in human-readable form and as a conversion factor to CGS units. If a dataset does not have units, it will be assumed to be dimensionless.
Todo
Developers, how does this method of including units sound? It’s based on Andrew’s and the yt/GDF format scheme...
If the units are attached directly to the Dataset that they relate to, they must be named ‘unitname’ and ‘unitcgs’; if they are instead attached to a Group above them, the names should be prefixed with the exact name of the Dataset that they relate to. For example, the units for the Dataset ‘R200b’ would be named ‘R200bunitname’ and ‘R200bunitcgs’ if they are attributes of the group that contains that Dataset.
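The naming rule can be illustrated with a small lookup. This is a sketch only: the function names are hypothetical, attributes are modelled as plain dictionaries, the order of the lookup (dataset first, then containing group) is an assumption rather than something the format prescribes, and the numerical values are illustrative.

```python
def unit_attr_names(dataset_name, on_dataset):
    """Return the names of the two unit attributes for a dataset."""
    if on_dataset:
        return ("unitname", "unitcgs")
    # On a containing group, the dataset name is prefixed verbatim.
    return (dataset_name + "unitname", dataset_name + "unitcgs")


def find_units(dataset_name, dataset_attrs, group_attrs):
    """Return (unitname, unitcgs), or None for a dimensionless dataset.

    Assumption: attributes on the dataset itself take precedence.
    """
    for attrs, on_dataset in ((dataset_attrs, True), (group_attrs, False)):
        name_key, cgs_key = unit_attr_names(dataset_name, on_dataset)
        if name_key in attrs and cgs_key in attrs:
            return attrs[name_key], attrs[cgs_key]
    return None


# Illustrative attributes stored on the group that contains 'R200b'.
group_attrs = {"R200bunitname": "kpc/h", "R200bunitcgs": [3.0857e21, -1, 0]}
print(find_units("R200b", {}, group_attrs))
print(find_units("Spin", {}, group_attrs))  # no units found: dimensionless
```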
The ‘unitname’ attribute should be a string defining the unit, e.g. ‘kpc/h’. The ‘unitcgs’ attribute must be a three-element array whose values are, in order: the numerical conversion factor to CGS, the exponent on the Hubble parameter by which the conversion factor should be multiplied, and the exponent on the scale factor by which the conversion factor should be multiplied.
For example, if ‘unitname’ is ‘comoving Mpc/h’, ‘unitcgs’ should be an array containing [3.0857e24, -1, 1].
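As a worked illustration of how the three elements combine, the sketch below multiplies the numerical factor by the Hubble parameter and the scale factor raised to their stored exponents. The function name and the sample values of h and a are assumptions made for this example; it is not the implementation of irate.core.get_cgs_factor().

```python
def cgs_factor(unitcgs, hubble_param, scale_factor):
    """Combine a 'unitcgs' triple into a single multiplicative factor.

    unitcgs is (factor, exponent on h, exponent on a).
    """
    factor, h_exponent, a_exponent = unitcgs
    return factor * hubble_param ** h_exponent * scale_factor ** a_exponent


# 'comoving Mpc/h' from the text; h = 0.7 and a = 0.5 are illustrative values.
comoving_mpc_per_h = [3.0857e24, -1, 1]
to_cm = cgs_factor(comoving_mpc_per_h, hubble_param=0.7, scale_factor=0.5)
print(to_cm)  # centimetres per comoving Mpc/h at this h and a
```

Multiplying a stored value by this factor gives the corresponding physical quantity in CGS units.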
Note that the core library provides utilities for accessing units - see irate.core.get_units(), irate.core.set_units(), and irate.core.get_cgs_factor().
Other Metadata
Other metadata associated with individual datasets should be included in the same fashion as units. That is, they should either be attributes attached directly to the dataset under the metadata field name, or attributes of groups further up the hierarchy, following the simple naming convention datasetnamemetadataname. The core library provides utilities for accessing or setting metadata in irate.core.get_metadata() and irate.core.set_metadata().
Particle Data
The ParticleData Group, if it exists, must contain at least one group, of which the most common are ‘Dark’, ‘Gas’, and ‘Star’; these contain the data for dark matter, gas, and stars, respectively. Users are free to use other names for particle blocks, e.g. to separate high resolution from low resolution particles, but any Group containing dark matter particles must have a (case-sensitive) name that begins with ‘Dark’ (e.g. ‘Dark_HighRes’), any Group containing gas particles must have a name that begins with ‘Gas’, and any Group containing star particles must have a name beginning with ‘Star’. Users are free to store other particle types in IRATE files; it is strongly recommended that they follow the same convention laid out here (e.g. ‘BlackHole’). Tools that read in IRATE files, such as halo finders, will assume the type of particle based on the group name.
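A reader might infer the particle type from the group name along the following lines. This is a minimal sketch with a hypothetical function name; how any particular tool actually performs this check is not specified here.

```python
# The three prefixes that the format reserves for specific particle types.
RESERVED_PREFIXES = ("Dark", "Gas", "Star")


def particle_type(group_name):
    """Return the reserved prefix that group_name starts with, or None.

    The comparison is case-sensitive, as required by the format.
    """
    for prefix in RESERVED_PREFIXES:
        if group_name.startswith(prefix):
            return prefix
    return None  # some other, user-defined particle type


for name in ("Dark_HighRes", "Gas", "Star_Young", "dark_lowres", "BlackHole"):
    print(name, "->", particle_type(name))
```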
Any groups within /Snapshot#/ParticleData/ may contain only datasets. For particle data, the following Dataset objects must be present in each group that exists, even if the group holds 0 particles:
- ‘Position’ (N x d)
- ‘Velocity’ (N x d)
- ‘Mass’ (N)
- ‘ID’ (N)
where d is the dimensionality (almost always 3) and N is the total number of particles in that group. Additional datasets (e.g. ‘Metallicity’, ‘Entropy’, ‘Density’, etc.) may be present, but the above four are the minimum required. Any other datasets are encouraged to have shape N for scalar data or N x d for vector data.
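The shape requirements can be checked without reading any data. The sketch below assumes the shapes of a particle group's datasets have already been collected into a dictionary (for example from an HDF5 library); the function name and the wording of the messages are hypothetical.

```python
REQUIRED_DATASETS = ("Position", "Velocity", "Mass", "ID")


def check_particle_group(shapes):
    """Return a list of problems for a mapping of dataset name -> shape."""
    problems = ["missing dataset '%s'" % name
                for name in REQUIRED_DATASETS if name not in shapes]
    if problems:
        return problems
    id_shape = tuple(shapes["ID"])
    pos_shape = tuple(shapes["Position"])
    if len(id_shape) != 1 or len(pos_shape) != 2:
        return ["'ID' must be 1-D and 'Position' must be 2-D"]
    n, d = id_shape[0], pos_shape[1]
    expected = {"Position": (n, d), "Velocity": (n, d),
                "Mass": (n,), "ID": (n,)}
    return ["'%s' has shape %s, expected %s" % (name, tuple(shapes[name]), shape)
            for name, shape in expected.items()
            if tuple(shapes[name]) != shape]


# A group with zero particles still needs all four datasets.
empty = {"Position": (0, 3), "Velocity": (0, 3), "Mass": (0,), "ID": (0,)}
print(check_particle_group(empty))
print(check_particle_group({"Position": (10, 3), "Mass": (10,), "ID": (10,)}))
```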
Grid Data
The grid data specification has not yet been defined.
Halo Catalogs
Halo catalogs are stored as Groups whose names must begin with the phrase ‘HaloCatalog’. For example, both ‘HaloCatalog_AHF1’ and ‘HaloCatalog_Rockstar’ are valid names; ‘AHFCatalog’ and ‘Catalog_Rockstar’, however, are not.
Any halo catalogs that are contained within a Snapshot Group should have, as attributes, any parameters that are relevant to the halo finder, such as FOF linking lengths, the overdensity criterion, or the code used to produce that catalog (though the latter may be obvious from the name of the group).
Any halo catalog must contain a Dataset with the name ‘Center’ that has shape N x d, where N is the number of halos in the catalog and d is the dimensionality (typically 3). All other datasets in the catalog should have a matching first dimension, and should be in the same order. That is, the ith entry in ‘Center’ should correspond to the same halo as the ith entry in any of the other datasets.
If the index of the most bound particle of each halo is included in the halo catalog, it should be stored in a Dataset named ‘MostBoundParticleID’.
If particle data is included with the halo catalog, it must be saved in a Group inside the halo catalog with the name ‘HaloParticleData’. This group must contain at least two datasets. The first of these should be named ‘HaloParticleIDs’, while the second should be named ‘ParticlesPerHalo’.
‘HaloParticleIDs’ should contain integer particle IDs ordered such that all particles in the first halo come first, followed by those in the second halo, and so on. Here, halo order is the same as the order of the halos in the ‘Center’ dataset. Note that the number of elements of this dataset is not necessarily the same as the total number of particles, because some particles may be members of multiple halos, in which case they appear in ‘HaloParticleIDs’ more than once.
The ‘ParticlesPerHalo’ Dataset, on the other hand, must be of a length matching the first dimension of the ‘Center’ dataset, and should give the (integer) number of particles in each halo. The sum of all of the values in this dataset must match the size of the ‘HaloParticleIDs’ dataset. This allows ‘HaloParticleIDs’ and ‘ParticlesPerHalo’ to provide all the information needed to determine which particles are in which halos.
Many users will find it convenient to store the type of particle as well. This should be saved in a third Dataset named ‘HaloParticleTypes’, but this dataset is not required by the format. If it is present, it should be of the same size as ‘HaloParticleIDs’.
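The way the per-halo counts partition the flat ID list can be sketched as follows. The function name and the sample data are hypothetical; plain Python lists stand in for the two datasets.

```python
def particles_by_halo(halo_particle_ids, particles_per_halo):
    """Split the flat ID list into one list of particle IDs per halo."""
    if sum(particles_per_halo) != len(halo_particle_ids):
        raise ValueError("counts must sum to the number of stored IDs")
    halos = []
    start = 0
    for count in particles_per_halo:
        # Each halo owns the next `count` entries of the flat list.
        halos.append(list(halo_particle_ids[start:start + count]))
        start += count
    return halos


# Particle 12 belongs to two halos, so it is stored twice.
ids = [11, 12, 13, 12, 20]
counts = [3, 0, 2]
print(particles_by_halo(ids, counts))
```

The ith list in the result belongs to the halo in the ith row of ‘Center’; a ‘HaloParticleTypes’ dataset, if present, could be split with the same counts.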
Galaxy Catalogs
Galaxy catalogs are stored as Groups whose names must begin with the phrase ‘GalaxyCatalog’. For example, both ‘GalaxyCatalog_Galacticus’ and ‘GalaxyCatalog_LGalaxies’ are valid names; ‘GalacticusCatalog’ and ‘Catalog_LGalaxies’, however, are not.
Any galaxy catalogs that are contained within a Snapshot Group should have, as attributes, any parameters that are relevant to the galaxy formation code, such as input parameter values, or the version of the code used to produce that catalog.
Any galaxy catalog must contain two Datasets with the names ‘HaloID’ and ‘HaloSnapshot’ that have shape N, where N is the number of galaxies in the catalog. All other datasets in the catalog should have a matching first dimension, and should be in the same order. That is, the ith entry in ‘HaloID’ should correspond to the same galaxy as the ith entry in any of the other datasets. The ‘HaloID’ dataset should give the ID of the halo in which the galaxy is located, while ‘HaloSnapshot’ should give the corresponding snapshot number at which that halo exists. (Galaxies may be located in halos which exist at an earlier snapshot if, for example, the halo can no longer be found at the current snapshot, but the galaxy formation code determines that the galaxy itself has not yet merged.) If corresponding halos are not present in the file, these two Datasets should have all values set to -1.
Merger Trees
Merger trees are stored as Groups whose names must begin with the phrase ‘MergerTrees’. For example, both ‘MergerTrees_yt’ and ‘MergerTrees_Millennium’ are valid names; ‘ytTrees’, however, is not.
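The prefix rules for halo catalogs, galaxy catalogs, and merger trees are all of the same kind, so a single helper can classify a group by name. This is a sketch with a hypothetical function name, not a function provided by the irate package.

```python
# (required prefix, kind of group) pairs taken from the naming rules.
NAME_PREFIXES = (
    ("HaloCatalog", "halo catalog"),
    ("GalaxyCatalog", "galaxy catalog"),
    ("MergerTrees", "merger trees"),
)


def classify_group(name):
    """Return the kind of group implied by name, or None if unrecognised."""
    for prefix, kind in NAME_PREFIXES:
        if name.startswith(prefix):  # case-sensitive by design
            return kind
    return None


for name in ("HaloCatalog_AHF1", "GalaxyCatalog_LGalaxies",
             "MergerTrees_yt", "AHFCatalog", "ytTrees"):
    print(name, "->", classify_group(name))
```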
Merger tree groups should have, as attributes, any parameters that are relevant to the merger tree builder, such as the name of the code used to build the trees.
Any merger tree group must contain a Dataset with the name ‘HaloID’ that has shape N, where N is the total number of halos in all trees. The dataset should give the integer index of a halo in a ‘HaloCatalog’ Group. Also required are Datasets with the names ‘HaloSnapshot’, ‘DescendentID’ and ‘DescendentSnapshot’, which must have the same shape, N, and should be in the same order. That is, the ith entry in ‘HaloID’ should correspond to the same halo as the ith entry in ‘HaloSnapshot’, ‘DescendentID’ and ‘DescendentSnapshot’. The ‘HaloSnapshot’ Dataset must give the index of the snapshot to which this halo belongs. The ‘DescendentID’ and ‘DescendentSnapshot’ Datasets must give the index and snapshot of the halo into which this halo descends. For halos with no descendant (e.g. the root halo of a tree), values of -1 should be used.
In addition, the MergerTrees Group must contain a Dataset named ‘HalosPerTree’, which must be of a length equal to the total number of trees present in the group, and should give the (integer) number of halos in each tree. The sum of all of the values in this dataset must match the size of the ‘HaloID’ Dataset. ‘HalosPerTree’ provides all the information needed to determine which halos are in which trees. Optionally, a Dataset named ‘TreeID’, which should have the same length as ‘HalosPerTree’, may be present, and should give a unique identifying index for each tree.
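The relationship between these datasets can be sketched with plain lists. The function names and sample values below are hypothetical, and the sketch assumes that each (HaloID, HaloSnapshot) pair appears at most once in the merger tree group.

```python
def tree_slices(halos_per_tree, n_halos):
    """Return one (start, stop) row range per tree."""
    if sum(halos_per_tree) != n_halos:
        raise ValueError("'HalosPerTree' must sum to the size of 'HaloID'")
    slices = []
    start = 0
    for count in halos_per_tree:
        slices.append((start, start + count))
        start += count
    return slices


def descendant_rows(halo_id, halo_snapshot, desc_id, desc_snapshot):
    """For each halo, return the row of its descendant, or None if it has none.

    Raises KeyError if a descendant is referenced but not listed.
    """
    row_of = {key: row for row, key in enumerate(zip(halo_id, halo_snapshot))}
    return [None if key[0] == -1 else row_of[key]
            for key in zip(desc_id, desc_snapshot)]


# Two trees: one with a root and two progenitors, one with a lone root.
halo_id = [0, 0, 1, 5]
halo_snapshot = [16, 15, 15, 16]
desc_id = [-1, 0, 0, -1]
desc_snapshot = [-1, 16, 16, -1]

print(tree_slices([3, 1], len(halo_id)))
print(descendant_rows(halo_id, halo_snapshot, desc_id, desc_snapshot))
```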
Examples
Here we provide the structure of a sample IRATE Format file in the form output by the h5dump utility (included with the HDF5 library). Note that the ‘Halo’, ‘Bulge’, and ‘Disk’ suffixes on the particle groups are not actually a part of the specification, but are examples of possible ways one might wish to sub-divide the particle data. Also note that a typical IRATE file will contain many more datasets, particularly in the catalogs, which have been removed from here for the sake of brevity:
HDF5 "SampleIRATEfile.hdf5" {
FILE_CONTENTS {
group / (Contains attribute defining the version of the IRATE format that this file conforms to)
group /Cosmology (Contains attributes defining the cosmology of the simulation)
group /SimulationProperties (Contains attributes defining non-cosmological properties of the simulation)
group /Snapshot00144 (Contains attributes defining redshift, scale factor, or both)
group /Snapshot00144/HaloCatalog_AHF (Should contain attributes defining the parameters of the halo finding)
dataset /Snapshot00144/HaloCatalog_AHF/Center (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/Ekin (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/Epot (Contains attributes with unit information)
group /Snapshot00144/HaloCatalog_AHF/HaloParticleData
ext link /Snapshot00144/HaloCatalog_AHF/HaloParticleData/HaloParticleTypes -> SampleIRATEfile-00144particles.hdf5 /HaloParticleTypes
ext link /Snapshot00144/HaloCatalog_AHF/HaloParticleData/HaloParticleIDs -> SampleIRATEfile-00144particles.hdf5 /HaloParticleIDs
ext link /Snapshot00144/HaloCatalog_AHF/HaloParticleData/ParticlesPerHalo -> SampleIRATEfile-00144particles.hdf5 /ParticlesPerHalo
dataset /Snapshot00144/HaloCatalog_AHF/L (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/Mvir (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/Phi (Contains attributes with unit information)
group /Snapshot00144/HaloCatalog_AHF/RadialProfiles
dataset /Snapshot00144/HaloCatalog_AHF/RadialProfiles/L (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/RadialProfiles/M_in_r (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/RadialProfiles/dens (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/RadialProfiles/npart
dataset /Snapshot00144/HaloCatalog_AHF/RadialProfiles/r (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/RadialProfiles/vcirc (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/Rmax (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/Rvir (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/Velocity (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/Vmax (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_AHF/fMhires
dataset /Snapshot00144/HaloCatalog_AHF/lambda
dataset /Snapshot00144/HaloCatalog_AHF/nbins
dataset /Snapshot00144/HaloCatalog_AHF/npart
group /Snapshot00144/HaloCatalog_Rockstar (Should contain attributes defining the parameters of the halo finding)
dataset /Snapshot00144/HaloCatalog_Rockstar/Center (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_Rockstar/M200b (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_Rockstar/R200b (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_Rockstar/Rmax (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_Rockstar/Spin
dataset /Snapshot00144/HaloCatalog_Rockstar/Velocity (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_Rockstar/Vmax (Contains attributes with unit information)
dataset /Snapshot00144/HaloCatalog_Rockstar/npart
group /Snapshot00144/ParticleData (Contains attributes with unit information for all datasets within it)
group /Snapshot00144/ParticleData/Dark_Bulge
dataset /Snapshot00144/ParticleData/Dark_Bulge/ID
dataset /Snapshot00144/ParticleData/Dark_Bulge/Mass
dataset /Snapshot00144/ParticleData/Dark_Bulge/Position
dataset /Snapshot00144/ParticleData/Dark_Bulge/Velocity
group /Snapshot00144/ParticleData/Dark_Disk
dataset /Snapshot00144/ParticleData/Dark_Disk/ID
dataset /Snapshot00144/ParticleData/Dark_Disk/Mass
dataset /Snapshot00144/ParticleData/Dark_Disk/Position
dataset /Snapshot00144/ParticleData/Dark_Disk/Velocity
group /Snapshot00144/ParticleData/Dark_Halo
dataset /Snapshot00144/ParticleData/Dark_Halo/ID
dataset /Snapshot00144/ParticleData/Dark_Halo/Mass
dataset /Snapshot00144/ParticleData/Dark_Halo/Position
dataset /Snapshot00144/ParticleData/Dark_Halo/Velocity
group /Snapshot00153 (Contains attributes defining redshift, scale factor, or both)
group /Snapshot00153/HaloCatalog_AHF (Should contain attributes defining the parameters of the halo finding)
dataset /Snapshot00153/HaloCatalog_AHF/Center (Contains attributes with unit information)
group /Snapshot00153/HaloCatalog_AHF/HaloParticleData
ext link /Snapshot00153/HaloCatalog_AHF/HaloParticleData/HaloParticleTypes -> SampleIRATEfile-00153particles.hdf5 /HaloParticleTypes
ext link /Snapshot00153/HaloCatalog_AHF/HaloParticleData/HaloParticleIDs -> SampleIRATEfile-00153particles.hdf5 /HaloParticleIDs
ext link /Snapshot00153/HaloCatalog_AHF/HaloParticleData/ParticlesPerHalo -> SampleIRATEfile-00153particles.hdf5 /ParticlesPerHalo
dataset /Snapshot00153/HaloCatalog_AHF/L (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_AHF/Mvir (Contains attributes with unit information)
group /Snapshot00153/HaloCatalog_AHF/RadialProfiles
dataset /Snapshot00153/HaloCatalog_AHF/RadialProfiles/M_in_r (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_AHF/RadialProfiles/r (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_AHF/RadialProfiles/vcirc (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_AHF/Rmax (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_AHF/Rvir (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_AHF/Velocity (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_AHF/Vmax (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_AHF/nbins
dataset /Snapshot00153/HaloCatalog_AHF/npart
group /Snapshot00153/HaloCatalog_Rockstar (Should contain attributes defining the parameters of the halo finding)
dataset /Snapshot00153/HaloCatalog_Rockstar/Center (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_Rockstar/M200b (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_Rockstar/Mbound200b (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_Rockstar/R200b (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_Rockstar/Rmax (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_Rockstar/Velocity (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_Rockstar/Vmax (Contains attributes with unit information)
dataset /Snapshot00153/HaloCatalog_Rockstar/npart (Contains attributes with unit information)
group /Snapshot00153/ParticleData (Contains attributes with unit information for all datasets within it)
group /Snapshot00153/ParticleData/Dark_Bulge
dataset /Snapshot00153/ParticleData/Dark_Bulge/ID
dataset /Snapshot00153/ParticleData/Dark_Bulge/Mass
dataset /Snapshot00153/ParticleData/Dark_Bulge/Position
dataset /Snapshot00153/ParticleData/Dark_Bulge/Velocity
group /Snapshot00153/ParticleData/Dark_Disk
dataset /Snapshot00153/ParticleData/Dark_Disk/ID
dataset /Snapshot00153/ParticleData/Dark_Disk/Mass
dataset /Snapshot00153/ParticleData/Dark_Disk/Position
dataset /Snapshot00153/ParticleData/Dark_Disk/Velocity
group /Snapshot00153/ParticleData/Dark_Halo
dataset /Snapshot00153/ParticleData/Dark_Halo/ID
dataset /Snapshot00153/ParticleData/Dark_Halo/Mass
dataset /Snapshot00153/ParticleData/Dark_Halo/Position
dataset /Snapshot00153/ParticleData/Dark_Halo/Velocity
}
}
In addition, we include here an example of a file with merger trees:
HDF5 "treesIRATE.hdf5" {
FILE_CONTENTS {
group /
group /Cosmology
group /MergerTrees
dataset /MergerTrees/DescendentID
dataset /MergerTrees/DescendentSnapshot
dataset /MergerTrees/HaloID
dataset /MergerTrees/HaloSnapshot
dataset /MergerTrees/HalosPerTree
dataset /MergerTrees/HostID
dataset /MergerTrees/TreeID
group /SimulationProperties
group /Snapshot00016
group /Snapshot00016/HaloCatalog
dataset /Snapshot00016/HaloCatalog/AngularMomentum
dataset /Snapshot00016/HaloCatalog/Center
dataset /Snapshot00016/HaloCatalog/HalfMassRadius
dataset /Snapshot00016/HaloCatalog/Index
dataset /Snapshot00016/HaloCatalog/Mass
dataset /Snapshot00016/HaloCatalog/MostBoundParticleID
dataset /Snapshot00016/HaloCatalog/Velocity
group /Snapshot00016/ParticleData
group /Snapshot00016/ParticleData/Dark
dataset /Snapshot00016/ParticleData/Dark/ID
dataset /Snapshot00016/ParticleData/Dark/Mass
dataset /Snapshot00016/ParticleData/Dark/Position
dataset /Snapshot00016/ParticleData/Dark/Velocity
}
}