Computing and exploring dendrograms

For a graphical description of the actual algorithm used to compute dendrograms, see An Illustrated Description of the Core Algorithm.

Computing a Dendrogram

Dendrograms can be computed from an n-dimensional array using:

>>> from astrodendro import Dendrogram
>>> d = Dendrogram.compute(array)

where array is a Numpy array and d is then an instance of the Dendrogram class, which can be used to access the computed dendrogram (see Exploring the Dendrogram below). Where the array comes from is not important - for example it can be read in from a FITS file, from an HDF5 file, or it can be generated in memory. If you are interested in making a dendrogram from data in a FITS file, you can do:

>>> from astropy.io import fits
>>> array = fits.getdata('observations.fits')
>>> from astrodendro import Dendrogram
>>> d = Dendrogram.compute(array)
  • min_value: the minimum value to consider in the dataset - any value lower than this will not be considered in the dendrogram. If you are working with observations, it is likely that you will want to set this to the “detection level,” for example 3- or 5-sigma, so that only significant values are included in the dendrogram. By default, all values are used.
  • min_delta: how significant a leaf has to be in order to be considered an independent entity. The significance is measured from the difference between its peak flux and the value at which it is being merged into the tree. If you are working with observational data, then you could set this to, e.g., 1-sigma, which means that any leaf that is locally less than 1-sigma tall is combined with its neighboring leaf or branch and is no longer considered a separate entity.
  • min_npix: the minimum number of pixels/values needed for a leaf to be considered an independent entity. When the dendrogram is being computed, and when a leaf is about to be joined onto a branch or another leaf, if the leaf has fewer than this number of pixels, then it is combined with the branch or leaf it is being merged with and is no longer considered a separate entity. By default, this parameter is set to zero, so there is no minimum number of pixels required for leaves to remain independent entities.

These options are illustrated graphically in An Illustrated Description of the Core Algorithm.

As an example, we can use a publicly available extinction map of the Perseus star-formation region from the The COordinated Molecular Probe Line Extinction Thermal Emission (COMPLETE) Survey of Star Forming Regions (PerA_Extn2MASS_F_Gal.fits, originally obtained from http://hdl.handle.net/10904/10080). The units of the map are magnitudes of extinction, and we want to make a dendrogram of all structures above a minimum value of 2 magnitudes, and we only consider leaves with at least 10 pixels and which have a peak to base difference larger than one magnitude of extinction:

>>> from astrodendro import Dendrogram
>>> from astropy.io import fits
>>> image = fits.getdata('PerA_Extn2MASS_F_Gal.fits')
>>> d = Dendrogram.compute(image, min_value=2.0, min_delta=1., min_npix=10)

By default, the computation will be silent, but for large dendrograms, it can be useful to have an idea of how long the computation will take:

>>> d = Dendrogram.compute(image, min_value=2.0, min_delta=1., min_npix=10,
                           verbose=True)
Generating dendrogram using 6,386 of 67,921 pixels (9% of data)
[=========================>               ] 64%

The ‘9% of data’ indicates that only 9% of the data are over the min_value threshold.

Exploring the Dendrogram

Once the dendrogram has been computed, you will want to explore/visualize it. You can access the full tree from the computed dendrogram. Assuming that you have computed a dendrogram with:

>>> d = Dendrogram.compute(array, ...)

you can now access the full tree from the d variable.

The first place to start is the trunk of the tree (the trunk attribute), which is a list of all the structures at the lowest level. Unless you left min_value to the default setting, which would mean that all values in the dataset are used, it’s likely that not all structures are connected. So the trunk is a collection of items at the lowest level, each of which could be a leaf or a branch (itself having leaves and branches). In the case of the Perseus extinction map, we get:

>>> d.trunk
[<Structure type=leaf idx=101>,
 <Structure type=branch idx=2152>,
 <Structure type=leaf idx=733>,
 <Structure type=branch idx=303>]

In the above case, the trunk contains two leaves and two branches. Since trunk is just a list, you can access items in it with e.g.:

>>> d.trunk[1]
<Structure type=branch idx=2152>

Branches have a children attribute that returns a list of all sub-structures, which can include branches and leaves. Thus, we can return the sub-structures of the above branch with:

>>> d.trunk[1].children
[<Structure type=branch idx=1680>,
 <Structure type=branch idx=5771>]

which shows that the child branch is composed of two more branches. We can therefore access the sub-structures of these branch with e.g.:

>>> d.trunk[1].children[0].children
[<Structure type=leaf idx=1748>,
 <Structure type=leaf idx=1842>]

which shows this branch splitting into two leaves.

We can access the properties of leaves as follows:

>>> leaf = d.trunk[1].children[0].children[0]
>>> leaf.indices
(array([143, 142, 142, 142, 139, 141, 141, 141, 143, 140, 140]),
 array([116, 114, 115, 116, 115, 114, 115, 116, 115, 115, 114]))
>>> leaf.values
array([ 2.7043395 ,  2.57071948,  3.4551146 ,  3.29953575,  2.53844047,
        2.59633183,  3.11309052,  2.70936489,  2.81024122,  2.76864815,
        2.52840114], dtype=float32)

A full list of attributes and methods for leaves and branches (i.e. structures) is available from the Structure page, while a list of attributes and methods for the dendrogram itself is available from the Dendrogram page.

Saving and loading the dendrogram

A Dendrogram object can be exported to an HDF5 file (requires h5py) or FITS file (requires astropy). To export the dendrogram to a file, use:

>>> d.save_to('my_dendrogram.hdf5')

or:

>>> d.save_to('my_dendrogram.fits')

and to load and existing dendrogram:

>>> d = Dendrogram.load_from('my_other_dendrogram.hdf5')

or:

>>> d = Dendrogram.load_from('my_other_dendrogram.fits')

If you wish, you can use this to separate the computation and analysis of the dendrogram into two scripts, to ensure that the dendrogram is only computed once. For example, you could have a script compute.py that contains:

from astropy.io import fits
from astrodendro import Dendrogram

array = fits.getdata('observations.fits')
d = Dendrogram.compute(array)
d.save_to('dendrogram.fits')

and a second file containing:

from astrodendro import Dendrogram
d = Dendrogram.load_from('dendrogram.fits')

# any analysis code here