mzXML¶
mzXML is a standard XML format for raw mass spectrometry data storage created
by the Institute for Systems Biology, intended to be replaced with mzML.
This module provides MzXMLLoader, a RandomAccessScanSource implementation.
The parser is based on pyteomics.mzxml.
- class ms_deisotope.data_source.mzxml.MzXMLLoader(source_file, use_index=True, **kwargs)¶
Reads scans from mzXML files. Provides both iterative and random access.
- source_file¶
Path to file to read from.
- Type
str
- source¶
Underlying scan data source
- Type
pyteomics.mzxml.MzXML
- close()¶
Close the underlying reader.
- data_processing()¶
Describe any preprocessing steps applied to the data described by this instance.
- Returns
- Return type
list of DataProcessingInformation
- file_description()¶
Read the file provenance from the <parentFile> tags if any are present.
This returns no information about the file’s contents, as this was not part of the mzXML schema.
- Returns
The description of the file’s sources
- Return type
- find_next_ms1(start_index: int) → Optional[ms_deisotope.data_source.scan.loader.ScanType] ¶
Locate the MS1 scan following start_index, iterating forwards through scans until either the last scan is reached or an MS1 scan is found.
- Returns
- Return type
ScanBase or None if not found
- find_previous_ms1(start_index: int) → Optional[ms_deisotope.data_source.scan.loader.ScanType] ¶
Locate the MS1 scan preceding start_index, iterating backwards through scans until either the first scan is reached or an MS1 scan is found.
- Returns
- Return type
ScanBase or None if not found
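The forward and backward searches described for find_next_ms1() and find_previous_ms1() can be pictured as linear walks over the MS levels in acquisition order. The following is a minimal sketch over a plain list, not the loader's own code: `ms_levels` and both helper names are hypothetical, the helpers return a position rather than a scan object, and the search is assumed to exclude `start_index` itself.

```python
from typing import List, Optional


def next_ms1_index(ms_levels: List[int], start_index: int) -> Optional[int]:
    """Return the position of the first MS1 entry after start_index, or None."""
    for i in range(start_index + 1, len(ms_levels)):
        if ms_levels[i] == 1:
            return i
    return None


def previous_ms1_index(ms_levels: List[int], start_index: int) -> Optional[int]:
    """Return the position of the closest MS1 entry before start_index, or None."""
    for i in range(start_index - 1, -1, -1):
        if ms_levels[i] == 1:
            return i
    return None


# MS levels of six consecutive scans: MS1, MS2, MS2, MS1, MS2, MS2.
levels = [1, 2, 2, 1, 2, 2]
after_first = next_ms1_index(levels, 0)       # next MS1 after position 0
before_last = previous_ms1_index(levels, 5)   # closest MS1 before position 5
past_end = next_ms1_index(levels, 3)          # no MS1 remains after position 3
```

Both helpers return None when the walk runs off the end of the list, mirroring the "None if not found" behavior documented above.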
- get_scan_by_id(scan_id)¶
Retrieve the scan object for the specified scan id.
If the scan object is still bound and in memory somewhere, a reference to that same object will be returned. Otherwise, a new object will be created.
- Parameters
scan_id (str) – The unique scan id value to be retrieved
- Returns
- Return type
- get_scan_by_index(index)¶
Retrieve the scan object for the specified scan index.
This internally calls get_scan_by_id(), which will use its cache.
- Parameters
index (int) – The index to get the scan for
- Returns
- Return type
- get_scan_by_time(time)¶
Retrieve the scan object for the specified scan time.
This internally calls get_scan_by_id(), which will use its cache.
- Parameters
time (float) – The time to get the nearest scan from
- Returns
- Return type
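Finding the scan nearest to a requested time amounts to a nearest-neighbor lookup over the sorted scan times. The sketch below shows one way to do that with a binary search. It is an illustration under stated assumptions, not the loader's implementation: `nearest_time_index` and `scan_times` are hypothetical names, times are assumed to be in minutes and ascending, and the tie-break toward the earlier entry is a choice made for the sketch.

```python
from bisect import bisect_left
from typing import Sequence


def nearest_time_index(times: Sequence[float], query: float) -> int:
    """Return the position of the entry in sorted `times` closest to `query`."""
    if not times:
        raise ValueError("times must not be empty")
    pos = bisect_left(times, query)
    if pos == 0:
        return 0
    if pos == len(times):
        return len(times) - 1
    before, after = times[pos - 1], times[pos]
    # Ties go to the earlier entry; this tie-break is specific to the sketch.
    return pos - 1 if query - before <= after - query else pos


scan_times = [0.10, 0.35, 0.60, 0.85]  # assumed to be minutes, ascending
```

The position returned here would then be handed to an index-based lookup such as get_scan_by_index().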
- property has_fast_random_access¶
Check whether the underlying data stream supports fast random access or not.
Even if the file format supports random access, it may be impractical due to overhead in parsing the underlying data stream, e.g. calling gzip.GzipFile.seek() can force the file to be decompressed from the beginning of the file on each call. This property can be used to signal to the caller whether or not it should use a different strategy.
- Returns
One of DefinitelyNotFastRandomAccess, MaybeFastRandomAccess, or DefinitelyFastRandomAccess. The first is a False-y value, the latter two will evaluate to True
- Return type
Constant
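A three-valued flag whose first member is False-y lets callers branch with a plain `if` while still distinguishing "maybe" from "definitely" when they need to. The sketch below shows how such sentinels could behave. `AccessConstant`, the upper-case names, and `choose_strategy` are hypothetical and written for illustration; the library's own Constant type may be implemented differently.

```python
class AccessConstant:
    """A named sentinel with an explicit truth value."""

    def __init__(self, name: str, truthy: bool):
        self.name = name
        self._truthy = truthy

    def __bool__(self) -> bool:
        return self._truthy

    def __repr__(self) -> str:
        return self.name


# Hypothetical stand-ins for the three documented values.
DEFINITELY_NOT_FAST = AccessConstant("DefinitelyNotFastRandomAccess", False)
MAYBE_FAST = AccessConstant("MaybeFastRandomAccess", True)
DEFINITELY_FAST = AccessConstant("DefinitelyFastRandomAccess", True)


def choose_strategy(flag: AccessConstant) -> str:
    """Pick a read strategy from the flag's truth value alone."""
    return "random-access" if flag else "sequential"
```

A caller that only needs a yes/no answer can test the flag directly, while one that wants to be conservative can compare against the "definitely" value by identity.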
- has_ms1_scans() → bool ¶
Checks if this ScanDataSource contains MS1 spectra.
- Returns
Returns a boolean value if the presence of MS1 scans is known for certain, or None if it cannot be determined in the case of missing metadata.
- Return type
bool or None
- has_msn_scans() → bool ¶
Checks if this ScanDataSource contains MSn spectra.
- Returns
Returns a boolean value if the presence of MSn scans is known for certain, or None if it cannot be determined in the case of missing metadata.
- Return type
bool or None
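Because has_ms1_scans() and has_msn_scans() may return True, False, or None, a caller that cares about the "unknown" case should compare against None explicitly rather than rely on truthiness, since both False and None are False-y. A minimal sketch, with `describe_presence` as a hypothetical helper:

```python
from typing import Optional


def describe_presence(flag: Optional[bool]) -> str:
    """Map a tri-state presence flag to a label, keeping None distinct from False."""
    if flag is None:
        # Metadata was missing, so presence could not be determined.
        return "unknown"
    return "present" if flag else "absent"
```

Writing `if not flag:` instead would treat missing metadata the same as a confirmed absence.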
- property index¶
The byte offset index used to achieve fast random access.
Maps ScanBase IDs to the byte offsets, implying the order the scans reside in the file.
- Returns
- Return type
pyteomics.xml.ByteEncodingOrderedDict
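The idea behind a byte offset index is that recording where each record starts allows a reader to seek directly to it instead of parsing everything that precedes it. The sketch below illustrates that with an in-memory stream and an ordinary OrderedDict. The record contents, `offsets`, and `read_record` are hypothetical, and the real index type is the pyteomics class named above.

```python
import io
from collections import OrderedDict

# Three fake records laid end to end, standing in for <scan> elements.
records = [
    ("1", b"<scan num='1'/>"),
    ("2", b"<scan num='2'/>"),
    ("3", b"<scan num='3'/>"),
]

buffer = io.BytesIO()
offsets = OrderedDict()
for scan_id, payload in records:
    offsets[scan_id] = buffer.tell()  # remember where this record starts
    buffer.write(payload)


def read_record(stream, index, scan_id, length):
    """Seek straight to a record's offset and read `length` bytes."""
    stream.seek(index[scan_id])
    return stream.read(length)


second = read_record(buffer, offsets, "2", len(records[1][1]))
```

Since the offsets are inserted in file order, iterating over the mapping's keys also recovers the order in which the records appear.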
- initialize_scan_cache()¶
Initialize a cache which keeps track of which Scan objects are still in memory using a weakref.WeakValueDictionary.
When a scan is requested, if the scan object is found in the cache, the existing object is returned rather than re-read from disk.
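The caching behavior described for initialize_scan_cache() can be reproduced with the standard library alone. The sketch below uses a stand-in `Scan` class and a hypothetical `get_scan` function to show that a live object is returned as-is, while an object with no remaining strong references drops out of the cache and has to be loaded again. It illustrates the mechanism and is not the loader's code.

```python
import gc
import weakref


class Scan:
    """Stand-in for a scan object; plain instances support weak references."""

    def __init__(self, scan_id):
        self.id = scan_id


cache = weakref.WeakValueDictionary()
loads = []  # records every simulated read from disk


def get_scan(scan_id):
    """Return the cached scan if it is still alive, else 'load' a new one."""
    scan = cache.get(scan_id)
    if scan is None:
        loads.append(scan_id)
        scan = Scan(scan_id)
        cache[scan_id] = scan
    return scan


first = get_scan("scan=1")
again = get_scan("scan=1")       # served from the cache
same_object = first is again

del first, again                 # drop the only strong references
gc.collect()                     # make collection explicit on any interpreter
evicted = "scan=1" not in cache
reloaded = get_scan("scan=1")    # has to be 'loaded' a second time
```

The cache never keeps a scan alive on its own, so memory use is bounded by what the caller still holds.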
- instrument_configuration()¶
Read the instrument configuration settings from the <msInstrument> elements.
- Returns
A list of different instrument states that scans may be acquired under
- Return type
list of InstrumentConfiguration
- make_iterator(iterator=None, grouped=None, **kwargs) → ms_deisotope.data_source.scan.loader.ScanIterator ¶
Configure the ScanIterator’s behavior, selecting its iteration strategy over either its default iterator or the provided iterator argument.
- Parameters
iterator (Iterator, optional) – The iterator to manipulate. If missing, the default iterator will be used.
grouped (bool, optional) – Whether the iterator should be grouped and produce ScanBunch objects or single Scan objects. If None is passed, has_ms1_scans() will be used instead. Defaults to None.
- next()¶
Advance the iterator, fetching the next ScanBunch or ScanBase depending upon iteration strategy.
- Returns
- Return type
ScanBunch or ScanBase
- classmethod prebuild_byte_offset_file(path)¶
Parse the file given by path, generating a byte offset index in JSON format and saving it to disk for future use.
This method is intended to provide a way to save time during repeated instantiation of this type over the same file by removing the need to do a full scan of the file to rebuild the offset index each time.
Note
This assumes that path is either a path to a file in a directory which the invoking user has read and write access to, or that it is a file-like object whose name attribute gives a path that satisfies the same requirements.
- Parameters
path (str or file-like) – The path to the file to index, or a file-like object with a name attribute.
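Saving an offset index next to the data file is what makes the write-access requirement in the note above necessary. The sketch below shows the general pattern of writing such an index as JSON and reading it back on a later run. The `.offsets.json` suffix, both helper names, and the example offsets are hypothetical and are not the naming scheme or values the library uses.

```python
import json
import os
import tempfile


def save_offsets(data_path, offsets):
    """Write `offsets` as JSON next to `data_path`; return the index path."""
    index_path = data_path + ".offsets.json"  # hypothetical naming scheme
    with open(index_path, "w") as handle:
        json.dump(offsets, handle)
    return index_path


def load_offsets(data_path):
    """Return the saved offsets, or None if no index file exists yet."""
    index_path = data_path + ".offsets.json"
    if not os.path.exists(index_path):
        return None
    with open(index_path) as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as workdir:
    data_path = os.path.join(workdir, "run.mzXML")
    missing = load_offsets(data_path)             # nothing saved yet
    save_offsets(data_path, {"1": 0, "2": 1532})  # made-up offsets
    restored = load_offsets(data_path)
```

A reader that finds the saved index can skip the full pass over the file; one that does not falls back to building the index from scratch.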
- reset()¶
Reset the object, clearing out any existing state.
This resets the underlying file iterator, then calls make_iterator(), and clears the scan cache.
- property scan_cache¶
A weakref.WeakValueDictionary mapping used to retrieve scans from memory if available before re-reading them from disk.
- software_list() → List[ms_deisotope.data_source.metadata.software.Software] ¶
Describe any software used on the data described by this instance.
- Returns
- Return type
list of Software
- property source¶
The file parser that this reader consumes.
- property source_file_name: Optional[str]¶
Return the name of the file that backs this data source, if available.
- Returns
- Return type
str or None
- start_from_scan(scan_id=None, rt=None, index=None, require_ms1=True, grouped=True, **kwargs)¶
Reconstruct an iterator which will start from the scan matching one of scan_id, rt, or index. Only one may be provided.
After invoking this method, the iterator this object wraps will be changed to begin yielding scan bunches (or single scans if grouped is False).
This method will trigger several random-access operations, making it prohibitively expensive for normally compressed files.
- Parameters
scan_id (str, optional) – Start from the scan with the specified id.
rt (float, optional) – Start from the scan nearest to the specified time (in minutes) in the run. If no exact match is found, the nearest scan time will be found, rounded up.
index (int, optional) – Start from the scan with the specified index.
require_ms1 (bool, optional) – Whether the iterator must start from an MS1 scan. True by default.
grouped (bool, optional) – Whether the iterator should yield scan bunches or single scans. True by default.
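A typical use of start_from_scan() is to jump to a point in the run and read a few scan bunches from there. The sketch below wraps that in a hypothetical helper, `scans_from_time`. It uses only the constructor, start_from_scan(), iteration, and close() as documented on this page. The `precursor` and `products` attributes of each bunch and the `id` attribute of a scan are assumed from the wider library rather than from this page, and `path` is expected to point to a real mzXML file.

```python
def scans_from_time(path, start_time, limit=5):
    """Collect up to `limit` (precursor id, product count) pairs from `start_time`."""
    # Imported here so the sketch can be loaded without ms_deisotope installed.
    from ms_deisotope.data_source.mzxml import MzXMLLoader

    reader = MzXMLLoader(path)
    try:
        # Exactly one of scan_id, rt, or index may be given; rt is in minutes.
        reader.start_from_scan(rt=start_time, require_ms1=True, grouped=True)
        summary = []
        for bunch in reader:
            precursor = bunch.precursor  # assumed attribute of a scan bunch
            precursor_id = precursor.id if precursor is not None else None
            summary.append((precursor_id, len(bunch.products)))
            if len(summary) >= limit:
                break
        return summary
    finally:
        reader.close()
```

As noted above, repositioning relies on random access, so for compressed input it is worth checking has_fast_random_access before calling a helper like this one.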
- property time¶
An indexer facade that lets you index and slice by scan time.
- Returns
- Return type
TimeIndex
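An indexer facade of this kind can be built by overriding `__getitem__` so that a number selects the nearest entry and a slice selects a time window. The sketch below does this over a plain list of (time, label) pairs. `TimeFacade` is hypothetical, and treating both slice bounds as inclusive is a choice made for the sketch; the library's TimeIndex may behave differently.

```python
from bisect import bisect_left, bisect_right


class TimeFacade:
    """Index a sorted list of (time, label) pairs by time instead of position."""

    def __init__(self, entries):
        self._entries = sorted(entries)
        self._times = [t for t, _ in self._entries]

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Both bounds are treated as inclusive in this sketch.
            lo = 0 if key.start is None else bisect_left(self._times, key.start)
            hi = len(self._times) if key.stop is None else bisect_right(self._times, key.stop)
            return [label for _, label in self._entries[lo:hi]]
        # Scalar lookup: pick the entry whose time is closest to the key.
        closest = min(range(len(self._times)), key=lambda i: abs(self._times[i] - key))
        return self._entries[closest][1]


by_time = TimeFacade([(0.10, "a"), (0.35, "b"), (0.60, "c"), (0.85, "d")])
```

With this in place, `by_time[0.4]` selects the entry nearest to 0.4 and `by_time[0.3:0.7]` selects every entry inside that window.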