Writing mzML Documents
mzML is a standard XML-based format for storing raw mass spectrometry data. See psidev.info for the detailed specification of the format and the structure of mzML files.
In addition to mzML, there is a wrapping format called indexedmzML, which adds an extra layer to the XML document containing pre-computed byte offsets for each <spectrum> and <chromatogram> element.
To write mzML without an index, use PlainMzMLWriter; to write indexedmzML, use IndexedMzMLWriter. Because so many tools rely on the index, IndexedMzMLWriter is also exported under the alias MzMLWriter. The two classes share the same interface. IndexedMzMLWriter does slightly more work while writing and when finishing the document, since it must record byte offsets and write out the index. You can alter its indexing behavior via IndexedMzMLWriter.index_builder or through inheritance.
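The point of the offset index is that a reader can seek straight to one spectrum without parsing everything before it. The sketch below shows that idea using only the standard library: it records the byte position at which each <spectrum> tag starts while writing, then jumps back to one of those positions. The helper names write_with_offsets and read_at are hypothetical, and this is not how psims's IndexingStream is implemented; a real indexedmzML file also stores the offsets in an <indexList> after the closing </mzML> tag, which the sketch leaves out.

```python
import io


def write_with_offsets(spectrum_ids):
    """Write a toy <spectrumList> and record where each <spectrum> tag starts.

    Hypothetical helper for illustration; this is not psims's IndexingStream.
    """
    buffer = io.BytesIO()
    offsets = {}
    buffer.write(b'<?xml version="1.0" encoding="utf-8"?>\n<mzML>\n  <spectrumList>\n')
    for spectrum_id in spectrum_ids:
        buffer.write(b"    ")  # indentation comes before the tag itself
        offsets[spectrum_id] = buffer.tell()  # byte position of "<spectrum"
        buffer.write(('<spectrum id="%s"/>\n' % spectrum_id).encode("utf-8"))
    buffer.write(b"  </spectrumList>\n</mzML>\n")
    return buffer.getvalue(), offsets


def read_at(data, offset):
    """Jump straight to a recorded offset and return the tag found there."""
    end = data.index(b">", offset) + 1
    return data[offset:end].decode("utf-8")


data, offsets = write_with_offsets(["index=0", "index=1", "index=2"])
print(read_at(data, offsets["index=1"]))  # <spectrum id="index=1"/>
```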
- class psims.mzml.writer.IndexedMzMLWriter(outfile, close=None, vocabularies=None, missing_reference_is_error=False, vocabulary_resolver=None, id=None, accession=None, **kwargs)
A high-level API for generating indexed mzML XML files from simple Python objects.
This class depends heavily on lxml’s incremental file writing API, which in turn depends heavily on context managers. Almost all logic is handled inside a context manager and in the context of a particular document. Because every operation assumes access to a universal identity map for each element in the document, that map is centralized in this class.
MzMLWriter inherits from ComponentDispatcher, giving it a context attribute and access to all Component objects pre-bound to that context with attribute-access notation.
- index_builder
A writing stream that automatically tokenizes and records byte offsets for specific XML tags.
- Type
IndexingStream
- __enter__()
Begins writing, opening the top-level tag.
- __exit__(exc_type, exc_value, traceback)
Closes the top-level tag, the XML formatter, and the file itself.
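Because __enter__ and __exit__ bracket the document, the writer is normally used in a with block. The toy class below (ToyDocumentWriter, hypothetical and far simpler than the lxml-based writer) mirrors that lifecycle against an in-memory stream: entering the block calls begin(), and leaving it calls end(), whether or not the block raised.

```python
import io


class ToyDocumentWriter:
    """Hypothetical stand-in showing the begin/end lifecycle; not psims code."""

    def __init__(self, stream, close=False):
        self.stream = stream
        self.close = close  # whether end() should also close the stream

    def begin(self):
        # Write the XML declaration and open the top-level tag.
        self.stream.write('<?xml version="1.0" encoding="utf-8"?>\n<mzML>\n')

    def end(self, exc_type=None, exc_value=None, traceback=None):
        # Close the top-level tag, then flush and optionally close the stream.
        self.stream.write("</mzML>\n")
        self.stream.flush()
        if self.close:
            self.stream.close()

    def __enter__(self):
        self.begin()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.end(exc_type, exc_value, traceback)
        return False  # never swallow exceptions raised inside the block


stream = io.StringIO()
with ToyDocumentWriter(stream) as writer:
    writer.stream.write("  <run/>\n")
print(stream.getvalue())
```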
- __getattr__(name)
Provides access to an automatically parameterized version of every ComponentBase type, using this instance’s context.
- Parameters
name (str) – The component name
- Returns
A partially parameterized instance constructor for the requested ComponentBase type.
- Return type
ReprBorrowingPartial
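The attribute-access behavior described above can be pictured with functools.partial. In this sketch, ToyDispatcher and ToyComponent are hypothetical stand-ins for ComponentDispatcher and a ComponentBase type, and a plain functools.partial takes the place of ReprBorrowingPartial; the point is only that dispatcher.Component(...) builds an object already bound to the shared context.

```python
from functools import partial


class ToyComponent:
    """Hypothetical component type that needs a shared document context."""

    def __init__(self, *args, context=None, **kwargs):
        self.args = args
        self.kwargs = kwargs
        self.context = context


class ToyDispatcher:
    """Hypothetical dispatcher: attribute access yields context-bound constructors."""

    component_types = {"Component": ToyComponent}

    def __init__(self):
        self.context = {}  # stands in for the document-wide identity map

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            component_type = self.component_types[name]
        except KeyError:
            raise AttributeError(name) from None
        return partial(component_type, context=self.context)


dispatcher = ToyDispatcher()
component = dispatcher.Component("IC1", order=1)
print(component.context is dispatcher.context)  # True
```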
- begin()
Writes the doctype and starts the low-level writing machinery.
- controlled_vocabularies()
Writes out the <cvList> element and all its children, including both this format’s default controlled vocabularies and those passed as arguments to this method.
This method requires writing to have begun.
- data_processing_list(data_processing)
Writes the <dataProcessingList> section of the document.
Note
List and descriptions of data processing applied to this data.
- Parameters
data_processing (list) – A list or other iterable of dict or DataProcessing-like objects
- element(element_name, **kwargs)
Constructs and immediately opens a subclass instance of TagBase with the given tag name. All other arguments are forwarded to the TagBase constructor.
- Parameters
element_name (str) – The name of the tag type to create
*args – Arbitrary arguments for the tag
**kwargs – Keyword arguments for the tag
- end(exc_type=None, exc_value=None, traceback=None)
Ends the XML document, then flushes and closes the file if appropriate.
- file_description(file_contents=None, source_files=None, contacts=None)
Writes the <fileDescription> section of the document.
If file_contents contains a nativeID term and native_id_format has not been set explicitly, that ID format will be used for this document.
Note
Information pertaining to the entire mzML file (i.e. not specific to any part of the data set) is stored here.
- Parameters
file_contents (list, optional) – A list or other iterable of str, dict, or *Param-types which will be placed in the <fileContent> element.
source_files (list) – A list or other iterable of dict or SourceFile-like objects to be placed in the <sourceFileList> element.
- format(*args, **kwargs)
This method is deprecated. Previously, the serialization process did not indent the XML in place, and the lxml pretty printer had to be invoked separately. With the addition of XMLFormattingStreamWriter, the XML stream is formatted in place as it is streamed to the file.
- instrument_configuration_list(instrument_configurations)
Writes the <instrumentConfigurationList> section of the document.
Note
List and descriptions of instrument configurations. At least one instrument configuration MUST be specified, even if it is only to specify that the instrument is unknown. In that case, the “instrument model” term is used to indicate the unknown instrument in the instrumentConfiguration.
- Parameters
instrument_configurations (list) – A list or other iterable of dict or InstrumentConfiguration-like objects
- property native_id_format
The nativeID format of the spectra to assume for this data file.
This is used to determine how to convert an integer into a spectrum’s id. Defaults to MS:1000774: “multiple peak list nativeID format”, which has a pattern of index=<number>.
This attribute has no effect on spectrum id values that are already given as formatted strings.
Note
If not explicitly specified, but a term naming an ID format is passed as a parameter in file contents, that term will be used. The ID format from source files will not be used.
- Return type
NativeIDParser
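As a rough illustration of the default MS:1000774 pattern, the hypothetical helpers below turn an integer into an index=<number> id, pass already-formatted strings through untouched, and parse the number back out. They are not the NativeIDParser API, which may expose different methods.

```python
import re

# Pattern of MS:1000774 "multiple peak list nativeID format": index=<number>
INDEX_NATIVE_ID = re.compile(r"^index=(\d+)$")


def format_native_id(value):
    """Turn an integer into an index-style id; pass pre-formatted strings through."""
    if isinstance(value, str):
        return value
    return "index=%d" % value


def parse_native_id(native_id):
    """Recover the integer from an index-style id."""
    match = INDEX_NATIVE_ID.match(native_id)
    if match is None:
        raise ValueError("not an index-style nativeID: %r" % native_id)
    return int(match.group(1))


print(format_native_id(12))  # index=12
print(format_native_id("controllerType=0 controllerNumber=1 scan=12"))  # unchanged
print(parse_native_id("index=12"))  # 12
```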
- precursor_builder(mz=None, intensity=None, charge=None, spectrum_reference=None, activation=None, isolation_window_args=None, params=None, intensity_unit='number of detector counts', scan_id=None, external_spectrum_id=None, source_file_reference=None, isolation_window=None)
Creates a PrecursorBuilder, an object that helps populate the precursor information data structure.
The helper object should be used to incrementally populate the precursor information passed to the precursor_information argument of spectrum() or write_spectrum().
- Parameters
mz (float, optional) – The m/z of the first selected ion
intensity (float, optional) – The intensity of the first selected ion
charge (int, optional) – The charge state of the first selected ion
spectrum_reference (str, optional) – The id of the precursor <spectrum> for this precursor, mapped through the document context.
activation (dict or list, optional) – Parameters forwarded to PrecursorBuilder.activation(). This should be a dictionary with a key “params” mapping to a list of CVParam-coercible values, with additional optional keys naming other CVParam-coercible values. If a list is passed, it will be wrapped in such a dictionary, e.g. {"params": activation}.
isolation_window_args (tuple, list, or dict, optional) – Parameters forwarded to PrecursorBuilder.isolation_window(). A tuple or list of three values is converted into a dict of the correct structure. The expected keys are “lower”, the lower m/z offset, “target”, the center m/z, and “upper”, the upper m/z offset. You may also pass this argument as isolation_window.
params (list, optional) – The cv- and user-params of the first selected ion, in addition to mz, intensity, and charge.
intensity_unit (str) – The intensity unit of the first selected ion, to be specified with intensity
scan_id (str, optional) – An alias for spectrum_reference
external_spectrum_id (str, optional) – The externalSpectrumID attribute of the precursor
source_file_reference (str, optional) – The sourceFileRef attribute of the precursor
- Return type
PrecursorBuilder
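The activation and isolation_window_args descriptions above amount to two small input coercions. A minimal sketch of them as standalone, hypothetical functions (not psims internals), assuming the three-item form is ordered lower, target, upper as documented; remember that lower and upper are offsets from the target m/z rather than absolute bounds.

```python
def normalize_activation(activation):
    """Wrap a bare list of activation params in a {"params": [...]} dict."""
    if isinstance(activation, list):
        return {"params": activation}
    return activation  # None or an already-structured dict


def normalize_isolation_window(window):
    """Convert a 3-item (lower, target, upper) sequence into the documented dict."""
    if window is None or isinstance(window, dict):
        return window
    lower, target, upper = window  # raises ValueError unless exactly 3 items
    return {"lower": lower, "target": target, "upper": upper}


activation = normalize_activation(
    ["beam-type collision-induced dissociation", {"collision energy": 25}])
window = normalize_isolation_window((1.0, 562.78, 1.0))
print(activation["params"][0])  # beam-type collision-induced dissociation
print(window)  # {'lower': 1.0, 'target': 562.78, 'upper': 1.0}
```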
- prepare_precursor_information(mz=None, intensity=None, charge=None, spectrum_reference=None, activation=None, isolation_window_args=None, params=None, intensity_unit='number of detector counts', scan_id=None, external_spectrum_id=None, source_file_reference=None, **kwargs)
Prepares a Precursor element from disparate data structures.
- Parameters
mz (float, optional) – The m/z of the first selected ion
intensity (float, optional) – The intensity of the first selected ion
charge (int, optional) – The charge state of the first selected ion
spectrum_reference (str, optional) – The id of the precursor <spectrum> for this precursor
activation (list, optional) – A list of parameters describing the ion activation method used.
isolation_window_args (tuple, list, or dict, optional) – Parameters forwarded to PrecursorBuilder.isolation_window(). Tuple or list values are converted into a dict of the correct structure. This argument may also be passed as isolation_window.
params (list, optional) – The cvParams of the first selected ion
intensity_unit (str) – The intensity unit of the first selected ion
scan_id (str, optional) – An alias for spectrum_reference
external_spectrum_id (str, optional) – The externalSpectrumID attribute of the precursor
source_file_reference (str, optional) – The sourceFileRef attribute of the precursor
- reference_param_group_list(groups)
Writes the <referenceableParamGroupList> section of the document.
- Parameters
groups (list) – A list or other iterable of dict or ReferenceableParamGroup-like objects
- register(entity_type, id)
Pre-declares an entity in the document context, ensuring that a later reference lookup for it will be satisfied.
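A sketch of why pre-declaring helps: the hypothetical ToyIdentityMap records declared ids per entity type, so a later reference to a registered id resolves cleanly. Reusing the name missing_reference_is_error from the constructor signature is an assumption of this sketch; whether psims raises, warns, or does something else for an undeclared reference is not specified here.

```python
import warnings


class ToyIdentityMap:
    """Hypothetical document context: tracks declared ids per entity type."""

    def __init__(self, missing_reference_is_error=False):
        self.missing_reference_is_error = missing_reference_is_error
        self._declared = {}

    def register(self, entity_type, id):
        # Pre-declare an id so later references to it resolve.
        self._declared.setdefault(entity_type, set()).add(id)
        return id

    def resolve(self, entity_type, id):
        if id in self._declared.get(entity_type, ()):
            return id
        message = "%s %r was referenced but never declared" % (entity_type, id)
        if self.missing_reference_is_error:
            raise KeyError(message)
        warnings.warn(message)  # assumed lenient behavior for this sketch
        return id


context = ToyIdentityMap(missing_reference_is_error=True)
context.register("Software", "psims-writer")
print(context.resolve("Software", "psims-writer"))  # psims-writer
```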
- run(id=None, instrument_configuration=None, source_file=None, start_time=None, sample=None)
Begins the <run> section of the document, describing a single sample run.
- Parameters
id (str, optional) – The unique identifier for this element
instrument_configuration (str, optional) – The id string for the default InstrumentConfiguration for this sample
source_file (str, optional) – The id string for the source file used to produce this data
start_time (str, optional) – A string encoding the date and time the sample was acquired
sample (str, optional) – The id string for the sample used to produce this data
- Return type
RunSection
- sample_list(samples)
Writes the <sampleList> section of the document.
- software_list(software_list: Iterable[Union[psims.mzml.components.Software, Mapping]])
Writes the <softwareList> section of the document.
Note
List and descriptions of software used to acquire and/or process the data in this mzML file.
- spectrum(mz_array: Optional[numpy.ndarray] = None, intensity_array: Optional[numpy.ndarray] = None, charge_array: Optional[numpy.ndarray] = None, id: Optional[str] = None, polarity='positive scan', centroided=True, precursor_information=None, scan_start_time=None, params=None, compression='zlib', encoding=None, other_arrays=None, scan_params=None, scan_window_list=None, instrument_configuration_id=None, intensity_unit='number of detector counts') -> psims.mzml.components.Spectrum
Creates a new Spectrum instance to be written.
This method does not immediately write and close the spectrum element, leaving it open for modification and embedding.
- Parameters
mz_array (np.ndarray of floats) – The m/z array of the spectrum
intensity_array (np.ndarray of floats) – The intensity array of the spectrum
charge_array (np.ndarray, optional) – The charge state array of the spectrum
id (str) – The native ID of the spectrum.
polarity (str or int, optional) – The polarity of the spectrum. If an integer, the sign of the integer is used; otherwise it is interpreted as a cvParam.
centroided (bool, optional) – Whether the spectrum is discretized by peak picking (True) or continuous (False). Defaults to True.
precursor_information (dict or PrecursorBuilder, optional) – The precursor ion description. Will be passed to _prepare_precursor_list(). This object should either be formatted as arguments to precursor_builder() or be a PrecursorBuilder instance populated with information.
scan_start_time (float, optional) – The scan start time, in minutes
params (list, optional) – The parameters of the spectrum
compression (str, optional) – The name of the compression type to use. Defaults to COMPRESSION_ZLIB.
encoding (dict, optional) – A mapping from array name to NumPy data type.
other_arrays (list, optional) – An iterable of array names to additional data arrays. Array names may be strings, Mapping objects that define a CVParam or UserParam, or such parameter objects themselves. Use the latter two forms when defining arrays with units.
scan_params (list, optional) – A list of cvParams for the scan of this spectrum
scan_window_list (list, optional) – A list of scan windows, each specified as a pair of m/z values bounding the window
instrument_configuration_id (str, optional) – The id of the instrumentConfiguration to associate with this spectrum, if not the default one.
- Return type
Spectrum
See also
write_spectrum(), chromatogram(), write_chromatogram()
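The polarity and centroided arguments describe the spectrum with PSI-MS terms; for centroided this is an assumption, since the docstring describes it only as a flag. The function below is a hypothetical illustration of that mapping using the term names “positive scan” (MS:1000130), “negative scan” (MS:1000129), “centroid spectrum” (MS:1000127) and “profile spectrum” (MS:1000128). Rejecting an integer polarity of 0 is this sketch's own choice, not documented psims behavior.

```python
def spectrum_flags(polarity="positive scan", centroided=True):
    """Map spectrum()-style arguments to PSI-MS term names (illustrative only)."""
    if isinstance(polarity, int):
        if polarity > 0:
            polarity = "positive scan"  # MS:1000130
        elif polarity < 0:
            polarity = "negative scan"  # MS:1000129
        else:
            raise ValueError("an integer polarity must be non-zero")
    # Any string is taken to be the name of a cvParam and passed through.
    representation = "centroid spectrum" if centroided else "profile spectrum"
    return [polarity, representation]


print(spectrum_flags(-1, centroided=False))  # ['negative scan', 'profile spectrum']
```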
- validate()
Attempts to perform XSD validation on the XML document this writer wrote.
- write(*args, **kwargs)
Either writes a complete XML subtree or adds free text to the file stream.
- Parameters
arg (str or lxml.etree.Element) – The entity to be written out.
- write_spectrum(mz_array=None, intensity_array=None, charge_array=None, id=None, polarity='positive scan', centroided=True, precursor_information=None, scan_start_time=None, params=None, compression='zlib', encoding=None, other_arrays=None, scan_params=None, scan_window_list=None, instrument_configuration_id=None, intensity_unit='number of detector counts')
Writes a Spectrum with the provided data.
To create a spectrum element without immediately closing it, see the spectrum() method.
- Parameters
mz_array (np.ndarray of floats) – The m/z array of the spectrum
intensity_array (np.ndarray of floats) – The intensity array of the spectrum
charge_array (np.ndarray, optional) – The charge state array of the spectrum
id (str) – The native ID of the spectrum.
polarity (str or int, optional) – The polarity of the spectrum. If an integer, the sign of the integer is used; otherwise it is interpreted as a cvParam.
centroided (bool, optional) – Whether the spectrum is discretized by peak picking (True) or continuous (False). Defaults to True.
precursor_information (dict or PrecursorBuilder, optional) – The precursor ion description. Will be passed to _prepare_precursor_list(). This object should either be formatted as arguments to precursor_builder() or be a PrecursorBuilder instance populated with information.
scan_start_time (float, optional) – The scan start time, in minutes
params (list, optional) – The parameters of the spectrum
compression (str, optional) – The name of the compression type to use. Defaults to COMPRESSION_ZLIB.
encoding (dict, optional) – A mapping from array name to NumPy data type.
other_arrays (list, optional) – An iterable of array names to additional data arrays. Array names may be strings, Mapping objects that define a CVParam or UserParam, or such parameter objects themselves. Use the latter two forms when defining arrays with units.
scan_params (list, optional) – A list of cvParams for the scan of this spectrum
scan_window_list (list, optional) – A list of scan windows, each specified as a pair of m/z values bounding the window
instrument_configuration_id (str, optional) – The id of the instrumentConfiguration to associate with this spectrum, if not the default one.
- psims.mzml.writer.compression_map
The compression methods available.
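mzML stores each binary data array as base64 text, optionally compressed with zlib before encoding. The standard-library sketch below performs that round trip for 64-bit little-endian floats. It does not consult compression_map; the labels "zlib" and "none" are this sketch's own, and any other schemes psims may support are not covered.

```python
import base64
import struct
import zlib


def encode_array(values, compression="zlib"):
    """Pack floats as little-endian 64-bit, optionally zlib-compress, then base64."""
    raw = struct.pack("<%dd" % len(values), *values)
    if compression == "zlib":
        raw = zlib.compress(raw)
    elif compression != "none":
        raise ValueError("unsupported compression in this sketch: %r" % compression)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, compression="zlib"):
    """Reverse encode_array()."""
    raw = base64.b64decode(text)
    if compression == "zlib":
        raw = zlib.decompress(raw)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


mz = [100.0, 200.5, 300.25]
print(decode_array(encode_array(mz)) == mz)  # True
```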