psims

Writing mzMLb Documents

class psims.mzmlb.writer.MzMLbWriter(h5_file, close=None, vocabularies=None, missing_reference_is_error=False, vocabulary_resolver=None, id=None, accession=None, h5_compression='blosc', h5_compression_options=None, h5_blocksize: int = 1048576, buffer_blocks: int = 10, **kwargs)

A high level API for generating mzMLb HDF5 files from simple Python objects.

This class’s public interface is identical to that of IndexedMzMLWriter, with the exception of the parameters related to HDF5 compression, which are described below.

Note

Although h5py can read and write through Python file-like objects, if they are used they must be opened in read+write mode to allow the file to be partially re-read during an update to an existing block.
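The read+write requirement in this note can be checked before a stream is handed to the writer. The following is a minimal sketch using only the standard library; the helper name `opened_read_write` is hypothetical and is not part of psims or h5py.

```python
import io
import tempfile


def opened_read_write(stream):
    """Hypothetical helper: True if the stream allows both reading and writing."""
    return stream.readable() and stream.writable()


# An in-memory buffer can be read and written.
print(opened_read_write(io.BytesIO()))

# A temporary file opened with "w+b" is in read+write mode.
with tempfile.TemporaryFile(mode="w+b") as read_write:
    print(opened_read_write(read_write))

# A file opened with "wb" is write-only and would not satisfy the note.
with tempfile.NamedTemporaryFile(mode="wb") as write_only:
    print(opened_read_write(write_only))
```

A stream that fails this check cannot be partially re-read, so it is likely safer to pass a file path or reopen the stream in "w+b" mode.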

h5_compression

A valid HDF5 compressor ID, a compression scheme name, or None. Available compression schemes are “gzip”/“zlib”, and if hdf5plugin is installed, “blosc”, “blosc:lz4”, “blosc:lz4hc”, “blosc:zlib”, and “blosc:zstd”. All Blosc-based compressors enable byte shuffling.

Type

str

h5_compression_options

The options to provide to the compressor designated by h5_compression. For “gzip”, this is a single integer setting the compression level, while Blosc takes a tuple of integers.

Type

int or tuple
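The two option shapes described above (one integer for gzip, a tuple of integers for Blosc) can be illustrated with a small validator. This is only a sketch: `check_compression_options` is a hypothetical helper, and the 0 to 9 range for gzip levels is the usual zlib convention rather than something stated on this page.

```python
def check_compression_options(name, options):
    """Hypothetical validator for the documented option shapes."""
    if name in ("gzip", "zlib"):
        # gzip/zlib: a single integer level (0-9 by zlib convention, an assumption here).
        is_int = isinstance(options, int) and not isinstance(options, bool)
        return is_int and 0 <= options <= 9
    if name == "blosc" or name.startswith("blosc:"):
        # Blosc variants: a tuple of integers.
        return isinstance(options, tuple) and all(
            isinstance(value, int) for value in options
        )
    raise ValueError("unrecognized compressor name: %r" % (name,))


print(check_compression_options("gzip", 4))
print(check_compression_options("blosc:zstd", (0, 0, 0, 0, 5, 1, 5)))
print(check_compression_options("gzip", (0, 0, 0, 0, 5, 1, 1)))
```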

h5_blocksize

The number of bytes to include in a single HDF5 data block. Smaller blocks improve random access speed at the expense of compression efficiency and space. Defaults to 2 ** 20 bytes (1 MiB).

Type

int
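To get a feel for the block size, the sketch below works out how many array values fit in one block. It assumes the block size is counted in bytes, as stated above, and that values are packed without padding; `values_per_block` is a hypothetical helper, not a psims function.

```python
import struct

H5_BLOCKSIZE = 2 ** 20  # documented default, in bytes


def values_per_block(blocksize, type_code):
    """Hypothetical helper: how many fixed-width values fit in one block."""
    return blocksize // struct.calcsize(type_code)


print(values_per_block(H5_BLOCKSIZE, "<f"))  # 32-bit floats -> 262144
print(values_per_block(H5_BLOCKSIZE, "<d"))  # 64-bit floats -> 131072
```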

buffer_blocks

The number of array blocks to buffer in memory before syncing to disk to reduce the number of resize operations. This applies to each array independently. Defaults to 10.

Type

int
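Because buffering applies to each array independently, memory use grows with the number of arrays being written. The sketch below gives a rough upper-bound estimate, assuming each array may hold up to buffer_blocks full blocks in memory before a sync; the function name and the three-array scenario are illustrative only.

```python
def estimated_buffer_bytes(n_arrays, h5_blocksize=2 ** 20, buffer_blocks=10):
    """Hypothetical estimate: bytes buffered if every array fills its buffer."""
    return n_arrays * buffer_blocks * h5_blocksize


# One array with the documented defaults.
print(estimated_buffer_bytes(1))  # 10485760 bytes, i.e. 10 MiB

# Three arrays buffered independently.
print(estimated_buffer_bytes(3))  # 31457280 bytes, i.e. 30 MiB
```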

create_array(data, name, last=None, dtype=<class 'numpy.float32'>, chunks=True)

Store a typed data array as a named dataset in the HDF5 file.

Note

The data should not be textual unless it has already been translated into a byte array with terminal null bytes.

Parameters
  • data (Iterable) – The data to be stored.

  • name (str) – The name to store the dataset by.

  • last (object, optional) – A value to associate with the final entry of the array.

  • dtype (type) – The type of the entries in the array.

  • chunks (bool) – Whether or not to store the dataset in chunks.
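The note above asks for textual data to be translated into a byte array with terminal null bytes before it is stored. One way this translation might look, using only the standard library, is sketched below; `to_null_terminated_bytes` is a hypothetical helper and UTF-8 is an assumed encoding.

```python
def to_null_terminated_bytes(strings, encoding="utf-8"):
    """Hypothetical helper: encode each string and end it with a null byte."""
    out = bytearray()
    for text in strings:
        out.extend(text.encode(encoding))
        out.append(0)  # terminal null byte
    return out


encoded = to_null_terminated_bytes(["MS1", "MS2"])
print(bytes(encoded))  # b'MS1\x00MS2\x00'
print(len(encoded))    # 8
```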

create_buffer(name, content)

Create a compressed binary buffer with a name and fixed length in the HDF5 file.

Parameters
  • name (str) – The name of the HDF5 dataset

  • content (bytes-like object) – The data to store. Must be convertible into a bytearray, e.g. through the buffer interface.

Returns

n – The size of the buffer written

Return type

int
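The requirement that content be convertible into a bytearray can be tried out without touching an HDF5 file. The sketch below is a stand-in for that contract only, not the method's implementation; `buffer_size` is a hypothetical name.

```python
import array


def buffer_size(content):
    """Hypothetical stand-in: convert to a bytearray and report its length."""
    return len(bytearray(content))


print(buffer_size(b"<mzML/>"))                        # 7
print(buffer_size(memoryview(b"abc")))                # 3
print(buffer_size(array.array("B", [1, 2, 3, 4])))    # 4

# A plain str has no buffer interface, so the conversion fails.
try:
    buffer_size("<mzML/>")
except TypeError:
    print("str must be encoded to bytes first")
```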

mzMLb Compression Methods

mzMLb can use any compression method that HDF5 can use. h5py includes only the “zlib” (or “gzip”) compressor by default. If hdf5plugin is installed, several additional compression options are available as well.

Note

Default Compressor

If hdf5plugin is installed, the default compressor will be "blosc"; otherwise, it will be "gzip".
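The default selection rule can be mirrored in a few lines. This is a sketch of the rule as described in the note, not the library's own code; `pick_default_compressor` is a hypothetical helper that only checks whether hdf5plugin can be imported.

```python
import importlib.util


def pick_default_compressor():
    """Hypothetical helper: "blosc" when hdf5plugin is importable, else "gzip"."""
    if importlib.util.find_spec("hdf5plugin") is not None:
        return "blosc"
    return "gzip"


print(pick_default_compressor())
```

Running this in a given environment shows which default to expect there.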

Compressor Name    Default Options          Available
blosc              (0, 0, 0, 0, 5, 1, 1)    Requires hdf5plugin
blosc:lz4          (0, 0, 0, 0, 5, 1, 1)    Requires hdf5plugin
blosc:lz4hc        (0, 0, 0, 0, 5, 1, 2)    Requires hdf5plugin
blosc:zlib         (0, 0, 0, 0, 5, 1, 4)    Requires hdf5plugin
blosc:zstd         (0, 0, 0, 0, 5, 1, 5)    Requires hdf5plugin
gzip               4                        Built-in to h5py
zlib               4                        Built-in to h5py
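The default options in the table above can be held in a lookup and unpacked for inspection. In the sketch below, reading the last three Blosc entries as compression level, shuffle flag, and codec number follows the convention of the Blosc HDF5 filter; that interpretation is an assumption not stated on this page. The names `DEFAULT_OPTIONS`, `BLOSC_CODECS`, and `describe_defaults` are hypothetical.

```python
# Default options copied from the table above.
DEFAULT_OPTIONS = {
    "blosc": (0, 0, 0, 0, 5, 1, 1),
    "blosc:lz4": (0, 0, 0, 0, 5, 1, 1),
    "blosc:lz4hc": (0, 0, 0, 0, 5, 1, 2),
    "blosc:zlib": (0, 0, 0, 0, 5, 1, 4),
    "blosc:zstd": (0, 0, 0, 0, 5, 1, 5),
    "gzip": 4,
    "zlib": 4,
}

# Assumed codec numbering, following the Blosc HDF5 filter convention.
BLOSC_CODECS = {0: "blosclz", 1: "lz4", 2: "lz4hc", 3: "snappy", 4: "zlib", 5: "zstd"}


def describe_defaults(name):
    """Hypothetical helper: unpack the default options for a compressor name."""
    options = DEFAULT_OPTIONS[name]
    if isinstance(options, int):
        return {"level": options}
    level, shuffle, codec = options[4:7]
    return {"level": level, "shuffle": shuffle, "codec": BLOSC_CODECS[codec]}


print(describe_defaults("gzip"))        # {'level': 4}
print(describe_defaults("blosc:zstd"))  # {'level': 5, 'shuffle': 1, 'codec': 'zstd'}
```

Under this reading, the Blosc variants differ only in the final codec number, and the plain "blosc" name shares its defaults with "blosc:lz4".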