Common MS File Model
ms_deisotope.data_source uses a set of common interfaces for reading mass spectrometry data files, so that code written for one format should work for all formats which implement the same interfaces.
ms_deisotope.data_source.metadata defines a set of data structures and a collection of controlled vocabulary terms describing mass spectrometers and mass spectrometry data files.
Abstract Base Classes
ms_deisotope supports reading from many different file formats. While the file format is abstracted away as much as possible, the modes of access are built into the type hierarchy.
All of the currently implemented formats implement both ScanIterator and RandomAccessScanSource.
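The layering described above can be pictured with Python's abc module. This is a simplified sketch, not the library's source: the Toy* classes and InMemorySource are hypothetical names, and only a few of the documented methods are mirrored.

```python
from abc import ABC, abstractmethod


class ToyScanDataSource(ABC):
    """Hypothetical stand-in for ScanDataSource: format-specific accessors."""

    def close(self):
        # Nothing to release in this in-memory sketch.
        pass


class ToyScanIterator(ToyScanDataSource):
    """Hypothetical stand-in for ScanIterator: adds sequential access."""

    @abstractmethod
    def next(self):
        ...

    def __iter__(self):
        return self

    def __next__(self):
        return self.next()


class ToyRandomAccessScanSource(ToyScanIterator):
    """Hypothetical stand-in for RandomAccessScanSource: adds lookup."""

    @abstractmethod
    def get_scan_by_index(self, index):
        ...


class InMemorySource(ToyRandomAccessScanSource):
    """A made-up concrete reader backed by a list of scan ids."""

    def __init__(self, scan_ids):
        self._scan_ids = list(scan_ids)
        self._position = 0

    def next(self):
        if self._position >= len(self._scan_ids):
            raise StopIteration
        scan_id = self._scan_ids[self._position]
        self._position += 1
        return scan_id

    def get_scan_by_index(self, index):
        return self._scan_ids[index]


source = InMemorySource(["scan=1", "scan=2", "scan=3"])
# Code written against the iterator interface works for any subclass.
all_ids = list(source)
```

Because the access modes live in the type hierarchy, a caller can check isinstance(source, ToyRandomAccessScanSource) before relying on indexed lookup.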
- class ms_deisotope.data_source.common.ScanDataSource(*args, **kwds)
An Abstract Base Class describing an object which can provide a consistent set of accessors for a particular format of mass spectrometry data.
Data files come in many shapes and sizes, with different underlying structures. This class provides an API that should make features as consistent as possible to clients of Scan objects.
- close()
Close the underlying scan data stream, which may be a file or other system resource.
A closed data source may not be able to serve data requests, but not all ScanDataSource implementations require the data stream be open for all operations.
- property source_file_name: Optional[str]
Return the name of the file that backs this data source, if available.
- Returns
- Return type
str or None
- class ms_deisotope.data_source.common.ScanIterator(*args, **kwds)
An Abstract Base Class that extends ScanDataSource with additional requirements that enable clients of the class to treat the object as an iterator over the underlying data file.
- iteration_mode
A string denoting ITERATION_MODE_GROUPED or ITERATION_MODE_SINGLE that controls whether ScanBunch or Scan objects are produced by iteration.
- Type
str
- has_ms1_scans() -> bool
Checks if this ScanDataSource contains MS1 spectra.
- Returns
A boolean value if the presence of MS1 scans is known for certain, or None if it cannot be determined in the case of missing metadata.
- Return type
bool or None
- has_msn_scans() -> bool
Checks if this ScanDataSource contains MSn spectra.
- Returns
A boolean value if the presence of MSn scans is known for certain, or None if it cannot be determined in the case of missing metadata.
- Return type
bool or None
- initialize_scan_cache()
Initialize a cache which keeps track of which Scan objects are still in memory using a weakref.WeakValueDictionary.
When a scan is requested, if the scan object is found in the cache, the existing object is returned rather than re-read from disk.
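The caching behaviour described above can be sketched with the standard library alone. ToyScan, ToyCachedSource, and the reads_from_disk counter are hypothetical names used for illustration; only weakref.WeakValueDictionary is taken from the text.

```python
import weakref


class ToyScan:
    """Hypothetical scan object; plain class instances support weak references."""

    def __init__(self, scan_id):
        self.id = scan_id


class ToyCachedSource:
    """Made-up data source that counts how often it has to hit the disk."""

    def __init__(self):
        self.reads_from_disk = 0
        self.initialize_scan_cache()

    def initialize_scan_cache(self):
        # Entries disappear automatically once no strong reference remains.
        self._scan_cache = weakref.WeakValueDictionary()

    def _read_from_disk(self, scan_id):
        # Placeholder for an expensive parse of the underlying file.
        self.reads_from_disk += 1
        return ToyScan(scan_id)

    def get_scan_by_id(self, scan_id):
        try:
            # Reuse the live object if some client still holds it.
            return self._scan_cache[scan_id]
        except KeyError:
            scan = self._read_from_disk(scan_id)
            self._scan_cache[scan_id] = scan
            return scan


source = ToyCachedSource()
first = source.get_scan_by_id("scan=1")
second = source.get_scan_by_id("scan=1")
```

While first is still referenced, the second request returns the same object without another read. Because the cache holds only weak references, it never keeps a scan alive on its own.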
- make_iterator(iterator=None, grouped=None, **kwargs) -> ms_deisotope.data_source.scan.loader.ScanIterator
Configure the ScanIterator’s behavior, selecting its iteration strategy over either its default iterator or the provided iterator argument.
- Parameters
iterator (Iterator, optional) – The iterator to manipulate. If missing, the default iterator will be used.
grouped (bool, optional) – Whether the iterator should be grouped and produce ScanBunch objects or single Scan objects. If None is passed, has_ms1_scans() will be used instead. Defaults to None.
- abstract next() -> Union[ms_deisotope.data_source.scan.loader.ScanType, ms_deisotope.data_source.scan.base.ScanBunch]
Advance the iterator, fetching the next ScanBunch or ScanBase depending upon iteration strategy.
- Returns
- Return type
ScanBunch or ScanBase
- property scan_cache
A weakref.WeakValueDictionary mapping used to retrieve scans from memory if available before re-reading them from disk.
- class ms_deisotope.data_source.common.RandomAccessScanSource(*args, **kwds)
An Abstract Base Class that extends ScanIterator with additional requirements that the implementation support random access to individual scans. Scans should be retrievable by unique identifier, by sequential index, or by scan time.
- find_next_ms1(start_index: int) -> Optional[ms_deisotope.data_source.scan.loader.ScanType]
Locate the MS1 scan following start_index, iterating forwards through scans until either the last scan is reached or an MS1 scan is found.
- Returns
- Return type
ScanBase or None if not found
- find_previous_ms1(start_index: int) -> Optional[ms_deisotope.data_source.scan.loader.ScanType]
Locate the MS1 scan preceding start_index, iterating backwards through scans until either the first scan is reached or an MS1 scan is found.
- Returns
- Return type
ScanBase or None if not found
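The two searches can be illustrated on a plain list of MS levels. This sketch assumes that the scan at start_index itself is excluded from the search, and it returns positions rather than scan objects. The function bodies and the ms_levels list are illustrative, not the library's implementation.

```python
from typing import Optional, Sequence


def find_next_ms1(ms_levels: Sequence[int], start_index: int) -> Optional[int]:
    """Return the position of the first MS1 entry after start_index, or None."""
    for index in range(start_index + 1, len(ms_levels)):
        if ms_levels[index] == 1:
            return index
    return None


def find_previous_ms1(ms_levels: Sequence[int], start_index: int) -> Optional[int]:
    """Return the position of the last MS1 entry before start_index, or None."""
    for index in range(start_index - 1, -1, -1):
        if ms_levels[index] == 1:
            return index
    return None


# MS levels of a made-up run: an MS1 scan followed by its MSn scans, twice.
ms_levels = [1, 2, 2, 1, 2]
next_after_first_msn = find_next_ms1(ms_levels, 1)
previous_before_second_msn = find_previous_ms1(ms_levels, 2)
```

Both functions stop at the end of the list and return None, matching the documented "None if not found" behaviour.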
- abstract get_scan_by_id(scan_id: str) -> ms_deisotope.data_source.scan.loader.ScanType
Retrieve the scan object for the specified scan id.
If the scan object is still bound and in memory somewhere, a reference to that same object will be returned. Otherwise, a new object will be created.
- Parameters
scan_id (str) – The unique scan id value to be retrieved
- Returns
- Return type
- abstract get_scan_by_index(index: int) -> ms_deisotope.data_source.scan.loader.ScanType
Retrieve the scan object for the specified scan index.
This internally calls get_scan_by_id(), which will use its cache.
- Parameters
index (int) – The index to get the scan for
- Returns
- Return type
- abstract get_scan_by_time(time: float) -> ms_deisotope.data_source.scan.loader.ScanType
Retrieve the scan object for the specified scan time.
This internally calls get_scan_by_id(), which will use its cache.
- Parameters
time (float) – The time to get the nearest scan from
- Returns
- Return type
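The text does not say how each format locates the nearest scan for a given time. One common approach, shown here as a sketch, is a binary search over scan times sorted in ascending order. The nearest_time_index helper and the scan_times list are hypothetical, and the tie-breaking rule is a choice made for this example.

```python
from bisect import bisect_left


def nearest_time_index(scan_times, time):
    """Return the position of the entry in sorted scan_times closest to time."""
    if not scan_times:
        raise ValueError("no scans to search")
    position = bisect_left(scan_times, time)
    if position == 0:
        # The requested time is at or before the first scan.
        return 0
    if position == len(scan_times):
        # The requested time is after the last scan.
        return len(scan_times) - 1
    before = scan_times[position - 1]
    after = scan_times[position]
    # Ties go to the earlier scan in this sketch.
    return position - 1 if time - before <= after - time else position


# Scan times in minutes for a made-up run.
scan_times = [0.5, 1.0, 1.5, 2.0]
nearest = nearest_time_index(scan_times, 1.2)
```

The resulting position could then be mapped to a scan id and passed to get_scan_by_id(), in line with the note above that time lookups go through the id-based cache.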
- property has_fast_random_access: ms_deisotope.utils.Constant
Check whether the underlying data stream supports fast random access or not.
Even if the file format supports random access, it may be impractical due to overhead in parsing the underlying data stream, e.g. calling gzip.GzipFile.seek() can force the file to be decompressed from the beginning of the file on each call. This property can be used to signal to the caller whether or not it should use a different strategy.
- Returns
One of DefinitelyNotFastRandomAccess, MaybeFastRandomAccess, or DefinitelyFastRandomAccess. The first is a falsy value; the latter two evaluate to True.
- Return type
Constant
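A three-valued flag with this truthiness can be sketched as follows. ToyConstant is a hypothetical stand-in for ms_deisotope.utils.Constant, whose real implementation may differ, and choose_strategy with its return strings is invented for illustration.

```python
class ToyConstant:
    """Hypothetical named constant with an explicit truth value."""

    def __init__(self, name, is_true=True):
        self.name = name
        self.is_true = is_true

    def __bool__(self):
        return self.is_true

    def __repr__(self):
        return self.name


DefinitelyNotFastRandomAccess = ToyConstant(
    "DefinitelyNotFastRandomAccess", is_true=False)
MaybeFastRandomAccess = ToyConstant("MaybeFastRandomAccess")
DefinitelyFastRandomAccess = ToyConstant("DefinitelyFastRandomAccess")


def choose_strategy(has_fast_random_access):
    """Pick an access plan from the three-valued flag."""
    if not has_fast_random_access:
        # Falsy: avoid seeking, read the stream front to back.
        return "sequential"
    if has_fast_random_access is MaybeFastRandomAccess:
        # Truthy but uncertain: seeking works, though it may be slow.
        return "random-access-with-caution"
    return "random-access"
```

A plain truth test covers the common case, and an identity check separates the two truthy values when the caller cares about the difference.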
- abstract start_from_scan(scan_id: Optional[str] = None, rt: Optional[float] = None, index: Optional[int] = None, require_ms1: bool = True, grouped=True, **kwargs) -> ms_deisotope.data_source.scan.loader.RandomAccessScanSource
Reconstruct an iterator which will start from the scan matching one of scan_id, rt, or index. Only one may be provided.
After invoking this method, the iterator this object wraps will be changed to begin yielding scan bunches (or single scans if grouped is False).
This method will trigger several random-access operations, making it prohibitively expensive for normally compressed files.
- Parameters
scan_id (str, optional) – Start from the scan with the specified id.
rt (float, optional) – Start from the scan nearest to the specified time (in minutes) in the run. If no exact match is found, the nearest scan time will be found, rounded up.
index (int, optional) – Start from the scan with the specified index.
require_ms1 (bool, optional) – Whether the iterator must start from an MS1 scan. True by default.
grouped (bool, optional) – Whether the iterator should yield scan bunches or single scans. True by default.
- property time
An indexer facade that lets you index and slice by scan time.
- Returns
- Return type
TimeIndex
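The idea of indexing and slicing by time can be sketched with __getitem__. ToyTimeIndex is a hypothetical class, not the library's TimeIndex. Its choices are assumptions made for this example: nearest-match lookup for a single time, and slices that include both endpoints.

```python
from bisect import bisect_left, bisect_right


class ToyTimeIndex:
    """Hypothetical facade that maps time lookups onto a list of scans."""

    def __init__(self, times, scans):
        self._times = list(times)  # assumed sorted in ascending order
        self._scans = list(scans)

    def __getitem__(self, key):
        if isinstance(key, slice):
            if key.step is not None:
                raise ValueError("step is not supported in this sketch")
            # Translate the time bounds into positions, inclusive on both ends.
            lo = 0 if key.start is None else bisect_left(self._times, key.start)
            hi = (len(self._times) if key.stop is None
                  else bisect_right(self._times, key.stop))
            return self._scans[lo:hi]
        # Single time: linear scan for the closest entry, earliest wins ties.
        nearest = min(range(len(self._times)),
                      key=lambda i: abs(self._times[i] - key))
        return self._scans[nearest]


# Made-up scan times in minutes and placeholder scan labels.
time_index = ToyTimeIndex([0.5, 1.0, 1.5, 2.0], ["a", "b", "c", "d"])
single = time_index[1.1]
window = time_index[1.0:1.5]
```

A facade like this lets a caller write time_index[1.0:1.5] instead of converting times to scan indices by hand.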
Iteration Strategies
ScanIterator instances may iterate over scans in single or grouped strategies. Single mode produces one Scan on each iteration, while grouped mode produces a ScanBunch containing an MS1 Scan (which may be None) and zero or more related MSn Scan instances derived from that MS1 Scan. The default mode for a given ScanIterator depends upon both the file format and available metadata.
You can force the iteration strategy to be grouped when calling ScanIterator.make_iterator() by passing grouped=True, and single by passing grouped=False. The same applies to RandomAccessScanSource.start_from_scan(). When grouped mode is requested but cannot be fulfilled, ScanBunch objects are still produced, but the precursor may be None or products may be empty.
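The difference between the two strategies can be shown on a toy scan list. ToyScan, ToyScanBunch, and both generator functions are hypothetical. This sketch groups MSn scans with the most recent MS1 scan in file order, which simplifies whatever precursor information a real reader would use.

```python
from collections import namedtuple

# Hypothetical minimal stand-ins for Scan and ScanBunch.
ToyScan = namedtuple("ToyScan", ["id", "ms_level"])
ToyScanBunch = namedtuple("ToyScanBunch", ["precursor", "products"])


def iterate_single(scans):
    """Single mode: yield each scan on its own."""
    for scan in scans:
        yield scan


def iterate_grouped(scans):
    """Grouped mode: yield one bunch per MS1 scan with the MSn scans after it."""
    precursor = None
    products = []
    started = False
    for scan in scans:
        if scan.ms_level == 1:
            if started:
                # A new MS1 scan closes the bunch collected so far.
                yield ToyScanBunch(precursor, products)
            precursor, products, started = scan, [], True
        else:
            products.append(scan)
            started = True
    if started:
        yield ToyScanBunch(precursor, products)


scans = [
    ToyScan("s1", 2),  # MSn scan with no MS1 scan before it
    ToyScan("s2", 1),
    ToyScan("s3", 2),
    ToyScan("s4", 2),
    ToyScan("s5", 1),  # MS1 scan with no MSn scans after it
]
grouped = list(iterate_grouped(scans))
single = list(iterate_single(scans))
```

The first bunch has a precursor of None and the last has an empty products list. These are the two degenerate cases a client of grouped iteration should be prepared to handle.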
The iteration mode of a ScanIterator is always available through its iteration_mode attribute, which should have the value "single" or "grouped" accordingly.