psims

mzML Transformation Stream

Given a file stream from an mzML file, psims.transform.mzml.MzMLTransformer will copy it to a new stream, applying a user-provided transformation function to modify each spectrum en route. It can also optionally sort the spectra by “scan start time”.

Transforming mzML Files

Often, we start with an mzML file that we want to manipulate or change, but we don’t want to write the code to explicitly unpack it and re-pack it.

The MzMLTransformer class is intended to give you a way to wrap an input file-like object over an mzML file and an output file-like object to write the manipulated mzML file to, along with a transformation function to modify spectra, and have it do the rest of the work. It uses pyteomics.mzml to do the parsing internally.

Transformation Function Semantics

The transformation function receives a dict object representing the spectrum, as parsed by pyteomics.mzml, and is expected to return the modified dictionary, or None (in which case the spectrum is not written out).
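
As a sketch of these semantics, the factory below builds a transformation function that drops spectra with too few peaks and passes the rest through unchanged. The names make_min_peaks_filter and min_peaks are illustrative and not part of psims, and the “m/z array” key is assumed to be present, as it normally is for spectra parsed by pyteomics.mzml.

```python
def make_min_peaks_filter(min_peaks):
    """Build a transform that drops spectra with fewer than `min_peaks` peaks.

    `make_min_peaks_filter` is a hypothetical helper, not part of psims.
    """
    def transform(spectrum):
        # Assumes the parsed spectrum carries an 'm/z array' entry.
        if len(spectrum['m/z array']) < min_peaks:
            return None  # returning None means the spectrum is not written out
        return spectrum  # returned unchanged, so it is copied as-is
    return transform


if __name__ == "__main__":
    keep_dense = make_min_peaks_filter(3)
    sparse = {'id': 'scan=1', 'ms level': 1, 'm/z array': [100.0, 200.0]}
    dense = {'id': 'scan=2', 'ms level': 1, 'm/z array': [100.0, 200.0, 300.0]}
    print(keep_dense(sparse) is None)
    print(keep_dense(dense) is dense)
```

The function returned by the factory can then be passed as the transform argument of MzMLTransformer.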

You are free to modify existing keys in the spectrum dictionary, but new keys that are intended to be recognized as either <cvParam /> or <userParam /> elements must be instances of pyteomics.auxiliary.cvstr, or otherwise have an “accession” attribute, to be picked up. Alternatively, the converter will make an effort to coerce keys whose values are scalars, or dicts which look like parameters (having at least a “name” or “accession” key).
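
A minimal sketch of adding a new parameter by marking its key with cvstr follows. It assumes the cvstr(value, accession) call form from pyteomics.auxiliary and uses the PSI-MS term MS:1000505 (base peak intensity); the function name annotate_base_peak is illustrative. If pyteomics is not installed, the sketch falls back to a small stand-in str subclass carrying an “accession” attribute, which relies on the rule stated above and is an assumption rather than tested behavior.

```python
try:
    from pyteomics.auxiliary import cvstr
except ImportError:
    # Stand-in so the sketch runs without pyteomics: a str subclass that
    # carries an `accession` attribute. Not part of psims or pyteomics.
    class cvstr(str):
        def __new__(cls, value, accession=None, unit_accession=None):
            inst = str.__new__(cls, value)
            inst.accession = accession
            inst.unit_accession = unit_accession
            return inst

# PSI-MS controlled vocabulary term for the base peak intensity.
BASE_PEAK_INTENSITY = cvstr('base peak intensity', 'MS:1000505')


def annotate_base_peak(spectrum):
    """Attach the base peak intensity under a key that carries an accession."""
    intensities = spectrum['intensity array']
    if len(intensities) > 0:
        # The key is a cvstr, so it carries the accession needed for it to
        # be recognized as a parameter rather than an ordinary dict entry.
        spectrum[BASE_PEAK_INTENSITY] = float(max(intensities))
    return spectrum
```

Because cvstr is a str subclass, the new entry can still be read back with a plain string key such as spectrum['base peak intensity'].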

You can also inherit from MzMLTransformer and override format_spectrum() to modify the spectrum before or after conversion (letting you directly append to the “params” key of the converted spectrum and avoid needing to mark new params with cvstr). Additionally, you can override the other format_ methods to customize how other elements are converted.

Usage and Examples

In its simplest form, we would use the MzMLTransformer like so:

from psims.transform.mzml import MzMLTransformer, cvstr

def transform_drop_ms2(spectrum):
    if spectrum['ms level'] > 1:
        return None
    return spectrum

with open("input.mzML", 'rb') as in_stream, open("ms1_only.mzML", 'wb') as out_stream:
    MzMLTransformer(in_stream, out_stream, transform_drop_ms2).write()
class psims.transform.mzml.MzMLTransformer(input_stream, output_stream, transform=None, transform_description=None, sort_by_scan_time=False)

Reads an mzML file stream from input_stream, copying its metadata to output_stream, and then copies its spectra, applying transform to each spectrum object as it goes.

If sort_by_scan_time is True, then prior to writing spectra, a first pass will be made over the mzML file and the spectra will be written out ordered by MS:1000016|scan start time.

input_stream

A byte stream from an mzML format data buffer

Type

file-like

output_stream

A writable binary stream to copy the contents of input_stream into

Type

file-like

sort_by_scan_time

Whether or not to sort spectra by scan time prior to writing

Type

bool

transform

A function to call on each spectrum, passed as a dict object as read by pyteomics.mzml.MzML. A spectrum will be skipped if this function returns None.

Type

Callable, optional

transform_description

A description of the transformation to include in the written metadata

Type

str

Parameters
  • input_stream (path or file-like) – A byte stream from an mzML format data buffer

  • output_stream (path or file-like) – A writable binary stream to copy the contents of input_stream into

  • transform (Callable, optional) – A function to call on each spectrum, passed as a dict object as read by pyteomics.mzml.MzML.

  • transform_description (str) – A description of the transformation to include in the written metadata

  • sort_by_scan_time (bool) – Whether or not to sort spectra by scan time prior to writing

write()

Write out the transformed mzML file
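
The optional arguments can be combined in a single call. The sketch below keeps only MS1 spectra, records a description of the change, and asks for the output to be sorted by scan start time. The function names keep_ms1 and write_sorted_ms1 and the description text are illustrative; only the constructor arguments documented above are used.

```python
def keep_ms1(spectrum):
    """Transform: keep MS1 spectra and drop everything else."""
    if spectrum['ms level'] > 1:
        return None
    return spectrum


def write_sorted_ms1(inpath, outpath):
    """Copy `inpath` to `outpath`, keeping MS1 spectra sorted by scan time."""
    # Imported here so `keep_ms1` can be used and tested without psims.
    from psims.transform.mzml import MzMLTransformer

    with open(inpath, 'rb') as in_stream, open(outpath, 'wb') as out_stream:
        MzMLTransformer(
            in_stream,
            out_stream,
            transform=keep_ms1,
            transform_description="Removed all spectra above MS level 1",
            sort_by_scan_time=True,
        ).write()
```

Sorting needs a first pass over the input before any spectra are written, so it is left off by default.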

MzMLb Translation

psims can also translate mzML into mzMLb automatically using a variant of MzMLTransformer called MzMLToMzMLb. It works identically to MzMLTransformer, though it can accept additional arguments to control the HDF5 block size and compression.

class psims.transform.mzml.MzMLToMzMLb(input_stream, output_stream, transform=None, transform_description=None, sort_by_scan_time=False, **hdf5args)

Convert an mzML document into an mzMLb file, with an optional transformation along the way.

Parameters
  • input_stream (path or file-like) – A byte stream from an mzML format data buffer

  • output_stream (path or file-like) – A writable binary stream to copy the contents of input_stream into

  • transform (Callable, optional) – A function to call on each spectrum, passed as a dict object as read by pyteomics.mzml.MzML.

  • transform_description (str) – A description of the transformation to include in the written metadata

  • sort_by_scan_time (bool) – Whether or not to sort spectra by scan time prior to writing

  • h5_compression (str, optional) – The name of the HDF5 compression method to use. Defaults to psims.mzmlb.writer.DEFAULT_COMPRESSOR

  • h5_compression_opts (tuple or int, optional) – The configuration options for the selected compressor. For “gzip”, this is a single integer setting the compression level, while Blosc takes a tuple of integers.

  • h5_blocksize (int, optional) – The size of the compression blocks used when building the HDF5 file. Smaller blocks improve random access speed at the expense of compression efficiency and space. Defaults to 2 ** 20 bytes (1 MiB).

#!/usr/bin/env python
import sys
from psims.transform.mzml import MzMLToMzMLb

# Usage: script.py INPUT OUTPUT [COMPRESSION]
inpath = sys.argv[1]
outpath = sys.argv[2]
try:
    compression = sys.argv[3]
except IndexError:
    # No compressor named on the command line, so use Blosc
    compression = "blosc"

# The output is given as a path; the input is opened as a binary stream
with open(inpath, 'rb') as instream:
    MzMLToMzMLb(instream, outpath, h5_compression=compression).write()
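
To control the HDF5 settings explicitly, the h5_ arguments can be passed alongside the usual ones. This sketch selects “gzip”, for which the compression option is a single integer level, together with a smaller block size. The helper names gzip_hdf5_args and convert_with_gzip and the chosen values are illustrative; the 0 to 9 range is the one gzip accepts as an HDF5 filter.

```python
def gzip_hdf5_args(level=4, blocksize=2 ** 18):
    """Build keyword arguments for gzip-compressed mzMLb output."""
    if not 0 <= level <= 9:
        raise ValueError("gzip compression level must be between 0 and 9")
    return {
        "h5_compression": "gzip",
        "h5_compression_opts": level,  # a single integer for gzip
        "h5_blocksize": blocksize,     # documented default is 2 ** 20
    }


def convert_with_gzip(inpath, outpath, level=4):
    """Convert an mzML file to mzMLb using gzip compression."""
    # Imported here so the helper above works without psims installed.
    from psims.transform.mzml import MzMLToMzMLb

    with open(inpath, 'rb') as instream:
        MzMLToMzMLb(instream, outpath, **gzip_hdf5_args(level)).write()
```

Keeping the options in one dictionary makes it easy to reuse the same settings across several conversions.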