Writing mzML

Using the psims library, ms_deisotope.output.mzml can write an mzML file with all associated metadata, including deconvoluted peak arrays, chromatograms, and data transformations. The MzMLSerializer class handles all facets of this process.

This module also contains a specialized version of MzMLLoader, ProcessedMzMLLoader, which can directly reconstruct each deconvoluted peak list and provides fast access to an extended index of metadata that MzMLSerializer writes to an external file.

import ms_deisotope
from ms_deisotope.test.common import datafile
from ms_deisotope.output.mzml import MzMLSerializer

# Open an example input file from the package's test data
reader = ms_deisotope.MSFileLoader(datafile("small.mzML"))
with open("small.deconvoluted.mzML", 'wb') as fh:
    # n_spectra sizes the <spectrumList> element up front
    writer = MzMLSerializer(fh, n_spectra=len(reader))

    # Copy document-level metadata before any spectra are written
    writer.copy_metadata_from(reader)
    for bunch in reader:
        # Pick peaks and deconvolute the precursor scan, then each product scan
        bunch.precursor.pick_peaks()
        bunch.precursor.deconvolute()
        for product in bunch.products:
            product.pick_peaks()
            product.deconvolute()
        writer.save(bunch)

    # Finish writing scan data and pending metadata, then close the stream
    writer.close()

class ms_deisotope.output.mzml.MzMLSerializer(handle, n_spectra=200000, compression=None, deconvoluted=True, sample_name=None, build_extra_index=True, data_encoding=None, include_software_entry=True, close=None)

Write ms_deisotope data structures to a file in mzML format.

base_peak_chromatogram_tracker

Accumulated mapping of scan time to base peak intensity. This is used to write the base peak chromatogram.

Type

OrderedDict

chromatogram_queue

Accumulate chromatogram data structures which will be written out after all spectra have been written to file.

Type

list

compression

The compression type to use for binary data arrays. Should be one of "zlib", "none", or None.

Type

str
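
To make the compression and data_encoding options more concrete, here is a minimal standard-library sketch of how an mzML binary data array is stored: numbers are packed little-endian at a chosen precision, optionally zlib-compressed, and then base64-encoded. The function names encode_binary_array and decode_binary_array are hypothetical and exist only for illustration; in practice the encoding is handled by the underlying writer, not by code like this.

```python
import base64
import struct
import zlib


def encode_binary_array(values, type_code="d", compression=None):
    """Pack numbers little-endian, optionally zlib-compress, then base64-encode.

    type_code follows the struct module: "d" is a 64-bit float,
    "f" is a 32-bit float. These helpers are illustrative only.
    """
    raw = struct.pack("<%d%s" % (len(values), type_code), *values)
    if compression == "zlib":
        raw = zlib.compress(raw)
    elif compression not in (None, "none"):
        raise ValueError("unsupported compression: %r" % (compression,))
    return base64.b64encode(raw).decode("ascii")


def decode_binary_array(text, type_code="d", compression=None):
    """Invert encode_binary_array using the same type code and compression."""
    raw = base64.b64decode(text)
    if compression == "zlib":
        raw = zlib.decompress(raw)
    count = len(raw) // struct.calcsize("<" + type_code)
    return list(struct.unpack("<%d%s" % (count, type_code), raw))


# Round-trip a small m/z array with zlib compression at 64-bit precision
mz_values = [100.0, 200.5, 300.25]
encoded = encode_binary_array(mz_values, "d", compression="zlib")
assert decode_binary_array(encoded, "d", compression="zlib") == mz_values
```

Both "none" and None leave the packed bytes uncompressed in this sketch, mirroring the accepted values listed above.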

data_encoding

The specification for the binary encoding of numeric data arrays, passed to write_spectrum() and related methods.

Type

dict or int or numpy.dtype or str

data_processing_list

List of packaged DataProcessingInformation to write out

Type

list

deconvoluted

Indicates whether the translation should include extra deconvolution information

Type

bool

file_contents_list

List of terms to include in the <fileContent> tag

Type

list

handle

The file-like object being written to

Type

file-like

indexer

The external index builder

Type

ExtendedScanIndex

instrument_configuration_list

List of packaged InstrumentInformation to write out

Type

list

n_spectra

The number of spectra to declare as the size of the <spectrumList>

Type

int

processing_parameters

List of additional terms to include in a newly created DataProcessingInformation

Type

list

sample_list

List of SampleRun objects to write out

Type

list

sample_name

Default sample name

Type

str

sample_run

Description

Type

SampleRun

software_list

List of packaged Software objects to write out

Type

list

source_file_list

List of packaged SourceFile objects to write out

Type

list

total_ion_chromatogram_tracker

Accumulated mapping of scan time to total intensity. This is used to write the total ion chromatogram.

Type

OrderedDict
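
The two chromatogram trackers described above can be pictured with a small sketch: each is an ordered mapping from scan time to a single intensity summary, filled in as scans are saved. The function track_chromatograms and its (scan_time, intensities) input format are assumptions made for illustration; the real serializer works on scan objects and may restrict which scans contribute.

```python
from collections import OrderedDict


def track_chromatograms(scans):
    """Build total ion and base peak mappings from (scan_time, intensities) pairs."""
    total_ion = OrderedDict()
    base_peak = OrderedDict()
    for scan_time, intensities in scans:
        # Total ion current: the sum of all intensities in the scan
        total_ion[scan_time] = sum(intensities, 0.0)
        # Base peak: the most intense signal, or 0.0 for an empty scan
        base_peak[scan_time] = max(intensities, default=0.0)
    return total_ion, base_peak


scans = [(0.5, [10.0, 40.0, 5.0]), (1.0, [3.0, 7.0]), (1.5, [])]
tic, bpc = track_chromatograms(scans)
print(list(tic.items()))
print(list(bpc.items()))
```

Keeping the mappings ordered by insertion means the chromatograms can be written in acquisition order once all spectra are done.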

writer

The lower level writer implementation

Type

MzMLWriter

add_data_processing(data_processing_description: Union[ms_deisotope.data_source.metadata.data_transformation.DataProcessingInformation, ms_deisotope.data_source.metadata.data_transformation.ProcessingMethod])

Add a new DataProcessingInformation or ProcessingMethod.

Creates a new <dataProcessing> entry describing one or more <processingMethod>s for a single referenced Software instance.

Parameters

data_processing_description (DataProcessingInformation or ProcessingMethod) – Data manipulation sequence to add to the document

add_file_contents(file_contents: Union[str, collections.abc.Mapping, ms_deisotope.data_source.metadata.file_information.FileContent])

Add a key to the resulting <fileDescription> of the output document.

Parameters

file_contents (str or Mapping) – The parameter to add

add_file_information(file_information: ms_deisotope.data_source.metadata.file_information.FileInformation)

Add the information of a FileInformation to the output document.

Parameters

file_information (FileInformation) – The information to add.

add_instrument_configuration(configuration: ms_deisotope.data_source.metadata.instrument_components.InstrumentInformation)

Add an InstrumentInformation object to the output document.

Parameters

configuration (InstrumentInformation) – The instrument configuration to add

add_processing_parameter(name: str, value: Optional[Union[str, int, float]] = None)

Add a new processing method to the writer’s own <dataProcessing> element.

Parameters
  • name (str) – The processing technique’s name

  • value (obj) – The processing technique’s value, if any

add_software(software_description: ms_deisotope.data_source.metadata.software.Software)

Add a Software object to the output document.

Parameters

software_description (Software) – The software description to add

add_source_file(source_file: ms_deisotope.data_source.metadata.file_information.SourceFile)

Add the SourceFile to the output document.

Parameters

source_file (SourceFile) – The source file to add

close()

Finish writing scan data, write any pending metadata and close the file stream.

May call complete().

save(bunch: Union[ms_deisotope.data_source.scan.scan.Scan, ms_deisotope.data_source.scan.base.ScanBunch], **kwargs)

Save any scan information in bunch.

This method can handle ScanBunch or ScanBase instances, dispatching to the appropriate logic.

Parameters

bunch (ScanBunch or ScanBase) – The scan data to save. May be a collection of related scans or a single scan.
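
The dispatch described for save() can be sketched with a toy writer. Scan, ScanBunch, and DispatchingWriter below are hypothetical stand-ins defined only for this example, and routing each member of a bunch through save_scan is an assumption about one possible design, not a description of the library's internals.

```python
from collections import namedtuple

# Hypothetical stand-ins for the library's scan types
Scan = namedtuple("Scan", ["id", "ms_level"])
ScanBunch = namedtuple("ScanBunch", ["precursor", "products"])


class DispatchingWriter:
    """Toy writer that records the order in which scan ids are saved."""

    def __init__(self):
        self.written = []

    def save(self, item):
        # Route grouped scans and single scans to their own handlers
        if isinstance(item, ScanBunch):
            self.save_scan_bunch(item)
        else:
            self.save_scan(item)

    def save_scan_bunch(self, bunch):
        # Write the precursor first (when present), then its products
        if bunch.precursor is not None:
            self.save_scan(bunch.precursor)
        for product in bunch.products:
            self.save_scan(product)

    def save_scan(self, scan):
        self.written.append(scan.id)


writer = DispatchingWriter()
writer.save(Scan("scan=1", 1))
writer.save(ScanBunch(Scan("scan=2", 1), [Scan("scan=3", 2)]))
print(writer.written)
```

A single entry point like this lets calling code pass whatever the reader yields without checking its type first.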

save_scan(scan: ms_deisotope.data_source.scan.base.ScanBase, **kwargs)

Write a Scan to the output document as a collection of related <spectrum> tags.

Note

If no spectra have been written to the output document yet, this method will call _add_spectrum_list() and write out all of the metadata lists. After this point, no new document-level metadata can be added.

Parameters
  • scan (Scan) – The scan to write.

  • deconvoluted (bool) – Whether the scan to write out should include deconvolution information
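
The note above implies an ordering constraint: document-level metadata is buffered, flushed once when the first spectrum is written, and closed to additions afterwards. The sketch below illustrates that pattern with a hypothetical MetadataLockedWriter class. Raising an error on a late addition is an assumption made here to keep the constraint visible; the real class may behave differently.

```python
class MetadataLockedWriter:
    """Toy writer: metadata is buffered until the first spectrum is written."""

    def __init__(self):
        self.software_list = []
        self.output = []
        self._spectrum_list_started = False

    def add_software(self, name):
        # Assumption for this sketch: late additions are rejected loudly
        if self._spectrum_list_started:
            raise RuntimeError("metadata already written; cannot add %r" % (name,))
        self.software_list.append(name)

    def _add_spectrum_list(self):
        # Flush buffered metadata exactly once, then open the spectrum list
        self.output.extend("software:%s" % name for name in self.software_list)
        self.output.append("spectrumList")
        self._spectrum_list_started = True

    def save_scan(self, scan_id):
        if not self._spectrum_list_started:
            self._add_spectrum_list()
        self.output.append("spectrum:%s" % scan_id)


writer = MetadataLockedWriter()
writer.add_software("example-tool")
writer.save_scan("scan=1")
print(writer.output)
```

In practice this means calls such as add_software() or add_processing_parameter() belong before the first call to save().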

save_scan_bunch(bunch: ms_deisotope.data_source.scan.base.ScanBunch, **kwargs)

Write a ScanBunch to the output document as a collection of related <spectrum> tags.

Note

If no spectra have been written to the output document yet, this method will call _add_spectrum_list() and write out all of the metadata lists. After this point, no new document-level metadata can be added.

Parameters

bunch (ScanBunch) – The scan set to write.

class ms_deisotope.output.mzml.ProcessedMzMLLoader(source_file, use_index=True, use_extended_index=True)

Extends MzMLLoader to support deserializing preprocessed data and to provide indexing information.

extended_index

Holds the additional indexing information that may have been generated with the data file being accessed.

Type

ExtendedIndex

sample_run

Type

SampleRun

get_index_information_by_scan_id(scan_id: str) → dict

Get the scan description from the extended index.

has_index_file() → bool

Checks if an extended index file exists for this reader.

Returns

Whether an extended index file exists for this reader.

Return type

bool
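
As a rough illustration of reading a processed file back, the sketch below wraps ProcessedMzMLLoader in a hypothetical helper, summarize_processed_file. It assumes that iterating the reader yields bunches with precursor and products attributes, as in the writing example, that each scan exposes id and deconvoluted_peak_set, and that every scan has an entry in the extended index. None of these assumptions is confirmed by the reference entries above, so treat it as a starting point rather than a verified recipe.

```python
def summarize_processed_file(path):
    """Return (scan id, peak count, index entry) for each scan in a processed mzML file."""
    # Imported lazily so this sketch can be loaded without ms_deisotope installed
    from ms_deisotope.output.mzml import ProcessedMzMLLoader

    reader = ProcessedMzMLLoader(path)
    # Only consult the extended index when its file is actually present
    has_index = reader.has_index_file()
    summary = []
    for bunch in reader:
        scans = [bunch.precursor] if bunch.precursor is not None else []
        scans.extend(bunch.products)
        for scan in scans:
            # Assumed attribute: the reconstructed deconvoluted peak list
            peaks = scan.deconvoluted_peak_set
            n_peaks = len(peaks) if peaks is not None else 0
            info = reader.get_index_information_by_scan_id(scan.id) if has_index else None
            summary.append((scan.id, n_peaks, info))
    return summary
```

Calling summarize_processed_file("small.deconvoluted.mzML") after running the writing example at the top of this page would be the natural way to try it.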