Common MS File Model
ms_deisotope.data_source uses a set of common interfaces for reading mass spectrometry data files so that code written for one format should work for all formats which implement the same interfaces.
ms_deisotope.data_source.metadata defines a set of data structures and a collection of controlled vocabulary terms describing mass spectrometers and mass spectrometry data files.
Abstract Base Classes
ms_deisotope supports reading from many different file formats. While
the file format is abstracted away as much as possible, the modes of access are
built into the type hierarchy.
All of the currently implemented formats implement both ScanIterator and
RandomAccessScanSource.
- class ms_deisotope.data_source.common.ScanDataSource(*args, **kwds)
An Abstract Base Class describing an object which can provide a consistent set of accessors for a particular format of mass spectrometry data.
Data files come in many shapes and sizes, with different underlying structures. This class provides an API that should make features as consistent as possible to clients of Scan objects.
- close()
Close the underlying scan data stream, which may be a file or other system resource.
A closed data source may not be able to serve data requests, but not all ScanDataSource implementations require the data stream be open for all operations.
- property source_file_name: Optional[str]
Return the name of the file that backs this data source, if available.
- Returns
- Return type
str or None
- class ms_deisotope.data_source.common.ScanIterator(*args, **kwds)
An Abstract Base Class that extends ScanDataSource with additional requirements that enable clients of the class to treat the object as an iterator over the underlying data file.
- iteration_mode
A string denoting ITERATION_MODE_GROUPED or ITERATION_MODE_SINGLE that controls whether ScanBunch or Scan are produced by iteration.
- Type
str
- has_ms1_scans() → bool
Checks if this ScanDataSource contains MS1 spectra.
- Returns
Returns a boolean value if the presence of MS1 scans is known for certain, or None if it cannot be determined in the case of missing metadata.
- Return type
bool or None
- has_msn_scans() → bool
Checks if this ScanDataSource contains MSn spectra.
- Returns
Returns a boolean value if the presence of MSn scans is known for certain, or None if it cannot be determined in the case of missing metadata.
- Return type
bool or None
- initialize_scan_cache()
Initialize a cache which keeps track of which Scan objects are still in memory using a weakref.WeakValueDictionary.
When a scan is requested, if the scan object is found in the cache, the existing object is returned rather than re-read from disk.
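The caching behaviour described above can be illustrated with a minimal sketch. ToyScan and ToySource are hypothetical stand-ins written for this example and are not part of ms_deisotope; only weakref.WeakValueDictionary is a real standard library class. The sketch assumes the cache is keyed by scan id.

```python
import weakref


class ToyScan:
    """Hypothetical stand-in for a scan object; not the library's Scan class."""

    def __init__(self, scan_id):
        self.id = scan_id


class ToySource:
    """Hypothetical reader that counts how often it has to 're-read' a scan."""

    def __init__(self):
        # Values are held weakly: an entry disappears once no caller
        # holds a reference to the scan any more.
        self.scan_cache = weakref.WeakValueDictionary()
        self.reads = 0

    def get_scan_by_id(self, scan_id):
        scan = self.scan_cache.get(scan_id)
        if scan is None:
            self.reads += 1  # simulate the expensive read from disk
            scan = ToyScan(scan_id)
            self.scan_cache[scan_id] = scan
        return scan


source = ToySource()
first = source.get_scan_by_id("scan=1")
second = source.get_scan_by_id("scan=1")

# While `first` is alive, the second request is served from the cache.
same_object = first is second
reads_while_alive = source.reads
```

Because the dictionary holds its values weakly, the cache never keeps a scan alive on its own. Once every caller drops its reference, the entry is discarded and the next request reads the scan again.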
- make_iterator(iterator=None, grouped=None, **kwargs) → ms_deisotope.data_source.scan.loader.ScanIterator
Configure the ScanIterator’s behavior, selecting its iteration strategy over either its default iterator or the provided iterator argument.
- Parameters
iterator (Iterator, optional) – The iterator to manipulate. If missing, the default iterator will be used.
grouped (bool, optional) – Whether the iterator should be grouped and produce ScanBunch objects or single Scan. If None is passed, has_ms1_scans() will be used instead. Defaults to None.
- abstract next() → Union[ms_deisotope.data_source.scan.loader.ScanType, ms_deisotope.data_source.scan.base.ScanBunch]
Advance the iterator, fetching the next ScanBunch or ScanBase depending upon iteration strategy.
- Returns
- Return type
ScanBunch or ScanBase
- property scan_cache
A weakref.WeakValueDictionary mapping used to retrieve scans from memory if available before re-reading them from disk.
- class ms_deisotope.data_source.common.RandomAccessScanSource(*args, **kwds)
An Abstract Base Class that extends ScanIterator with additional requirements that the implementation support random access to individual scans. This should be doable by unique identifier, sequential index, or by scan time.
- find_next_ms1(start_index: int) → Optional[ms_deisotope.data_source.scan.loader.ScanType]
Locate the MS1 scan following start_index, iterating forwards through scans until either the last scan is reached or an MS1 scan is found.
- Returns
- Return type
ScanBase or None if not found
- find_previous_ms1(start_index: int) → Optional[ms_deisotope.data_source.scan.loader.ScanType]
Locate the MS1 scan preceding start_index, iterating backwards through scans until either the first scan is reached or an MS1 scan is found.
- Returns
- Return type
ScanBase or None if not found
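The forward and backward searches described for find_next_ms1() and find_previous_ms1() can be sketched over a plain list of MS levels. This is an illustration only: the functions below are hypothetical, return positions rather than scan objects, and assume that the scan at start_index itself is excluded from the search.

```python
from typing import Optional, Sequence


def find_previous_ms1(ms_levels: Sequence[int], start_index: int) -> Optional[int]:
    """Return the position of the nearest MS1 scan before start_index, or None."""
    index = start_index - 1
    while index >= 0:
        if ms_levels[index] == 1:
            return index
        index -= 1
    return None  # reached the first scan without finding an MS1 scan


def find_next_ms1(ms_levels: Sequence[int], start_index: int) -> Optional[int]:
    """Return the position of the nearest MS1 scan after start_index, or None."""
    index = start_index + 1
    while index < len(ms_levels):
        if ms_levels[index] == 1:
            return index
        index += 1
    return None  # reached the last scan without finding an MS1 scan


# One MS1 scan, two MSn scans, another MS1 scan, one trailing MSn scan.
levels = [1, 2, 2, 1, 2]
previous_from_2 = find_previous_ms1(levels, 2)
next_from_1 = find_next_ms1(levels, 1)
```

Both searches are linear in the distance to the nearest MS1 scan, which is usually small in data-dependent acquisitions where MS1 scans recur regularly.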
- abstract get_scan_by_id(scan_id: str) → ms_deisotope.data_source.scan.loader.ScanType
Retrieve the scan object for the specified scan id.
If the scan object is still bound and in memory somewhere, a reference to that same object will be returned. Otherwise, a new object will be created.
- Parameters
scan_id (str) – The unique scan id value to be retrieved
- Returns
- Return type
- abstract get_scan_by_index(index: int) → ms_deisotope.data_source.scan.loader.ScanType
Retrieve the scan object for the specified scan index.
This internally calls get_scan_by_id() which will use its cache.
- Parameters
index (int) – The index to get the scan for
- Returns
- Return type
- abstract get_scan_by_time(time: float) → ms_deisotope.data_source.scan.loader.ScanType
Retrieve the scan object for the specified scan time.
This internally calls get_scan_by_id() which will use its cache.
- Parameters
time (float) – The time to get the nearest scan from
- Returns
- Return type
- property has_fast_random_access: ms_deisotope.utils.Constant
Check whether the underlying data stream supports fast random access or not.
Even if the file format supports random access, it may be impractical due to overhead in parsing the underlying data stream, e.g. calling gzip.GzipFile.seek() can force the file to be decompressed from the beginning of the file on each call. This property can be used to signal to the caller whether or not it should use a different strategy.
- Returns
One of DefinitelyNotFastRandomAccess, MaybeFastRandomAccess, or DefinitelyFastRandomAccess. The first is a False-y value, the latter two will evaluate to True.
- Return type
Constant
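A caller might branch on this flag roughly as follows. ToyConstant is a hypothetical stand-in for ms_deisotope.utils.Constant, whose real implementation is not shown in this section. The three module-level names reuse the documented constant names for readability only, and the strategy labels are invented for the example.

```python
class ToyConstant:
    """Hypothetical named constant that carries an explicit truth value."""

    def __init__(self, name, truthy):
        self.name = name
        self.truthy = truthy

    def __bool__(self):
        return self.truthy

    def __repr__(self):
        return self.name


# Stand-ins that mirror the documented truthiness of the three values.
DefinitelyNotFastRandomAccess = ToyConstant("DefinitelyNotFastRandomAccess", False)
MaybeFastRandomAccess = ToyConstant("MaybeFastRandomAccess", True)
DefinitelyFastRandomAccess = ToyConstant("DefinitelyFastRandomAccess", True)


def choose_strategy(has_fast_random_access):
    """Pick an access pattern from the flag; the labels are illustrative."""
    if has_fast_random_access:
        return "random-access"
    return "sequential"


strategies = [
    choose_strategy(flag)
    for flag in (
        DefinitelyNotFastRandomAccess,
        MaybeFastRandomAccess,
        DefinitelyFastRandomAccess,
    )
]
```

Using a three-valued constant instead of a plain bool lets a plain `if` test keep working while still leaving room for callers that want to treat the "maybe" case more cautiously by comparing against the specific value.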
- abstract start_from_scan(scan_id: Optional[str] = None, rt: Optional[float] = None, index: Optional[int] = None, require_ms1: bool = True, grouped=True, **kwargs) → ms_deisotope.data_source.scan.loader.RandomAccessScanSource
Reconstruct an iterator which will start from the scan matching one of scan_id, rt, or index. Only one may be provided.
After invoking this method, the iterator this object wraps will be changed to begin yielding scan bunches (or single scans if grouped is False).
This method will trigger several random-access operations, making it prohibitively expensive for normally compressed files.
- Parameters
scan_id (str, optional) – Start from the scan with the specified id.
rt (float, optional) – Start from the scan nearest to the specified time (in minutes) in the run. If no exact match is found, the nearest scan time will be found, rounded up.
index (int, optional) – Start from the scan with the specified index.
require_ms1 (bool, optional) – Whether the iterator must start from an MS1 scan. True by default.
grouped (bool, optional) – Whether the iterator should yield scan bunches or single scans. True by default.
- property time
An indexer facade that lets you index and slice by scan time.
- Returns
- Return type
TimeIndex
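The indexer-facade idea can be sketched with a small class that translates times into scan positions. ToyTimeIndex is hypothetical and is not the library's TimeIndex: it assumes scan times are sorted in ascending order, returns positions rather than scan objects, treats the slice stop as inclusive, and resolves ties toward the later scan. The real TimeIndex may differ on each of these points.

```python
from bisect import bisect_left


class ToyTimeIndex:
    """Hypothetical facade: look up scan positions by time instead of by index."""

    def __init__(self, times):
        self.times = list(times)  # scan times in minutes, assumed ascending

    def _nearest(self, time):
        """Return the position of the scan whose time is closest to `time`."""
        pos = bisect_left(self.times, time)
        if pos == 0:
            return 0
        if pos == len(self.times):
            return len(self.times) - 1
        before, after = self.times[pos - 1], self.times[pos]
        # Ties go to the later scan in this sketch.
        return pos if after - time <= time - before else pos - 1

    def __getitem__(self, key):
        if isinstance(key, slice):
            # A slice selects every position between the two nearest scans.
            start = 0 if key.start is None else self._nearest(key.start)
            stop = len(self.times) - 1 if key.stop is None else self._nearest(key.stop)
            return list(range(start, stop + 1))
        return self._nearest(key)


index = ToyTimeIndex([0.5, 1.0, 1.5, 2.0, 2.5])
single_position = index[1.1]
window_positions = index[1.0:2.0]
```

Putting the lookup behind `__getitem__` keeps time-based access visually distinct from positional access while reusing Python's familiar indexing and slicing syntax.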
Iteration Strategies
ScanIterator instances may iterate over scans in single or grouped strategies.
single mode produces a single instance of Scan on each iteration, while grouped
produces a ScanBunch containing an MS1 Scan (may be None) and 0 or more
related MSn Scan instances which are derived from the MS1 Scan. The default
mode for a given ScanIterator depends upon both the file format and available metadata.
You can force the iteration strategy to be grouped when calling ScanIterator.make_iterator()
by passing grouped=True, and single by passing grouped=False. The same applies to
RandomAccessScanSource.start_from_scan(). When grouped mode is requested but cannot be
fulfilled, ScanBunch objects are still produced, but the precursor may be
None or products may be empty.
The iteration mode of a ScanIterator is always available through its iteration_mode
attribute, which should have the value "single" or "grouped" accordingly.
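The difference between the two strategies can be sketched with toy data. ToyScan, ToyBunch, and iterate() are hypothetical names for this example and are not ms_deisotope classes. As a simplification, each MSn scan is attached to the most recent MS1 scan seen, whereas a real reader may rely on precursor metadata instead.

```python
from collections import namedtuple

# Hypothetical stand-ins for Scan and ScanBunch.
ToyScan = namedtuple("ToyScan", ["id", "ms_level"])
ToyBunch = namedtuple("ToyBunch", ["precursor", "products"])


def iterate(scans, grouped):
    """Yield single scans, or (precursor, products) bunches when grouped."""
    if not grouped:
        yield from scans
        return
    precursor, products, started = None, [], False
    for scan in scans:
        if scan.ms_level == 1:
            if started:
                # A new MS1 scan closes the bunch that was being built.
                yield ToyBunch(precursor, products)
            precursor, products, started = scan, [], True
        else:
            # MSn scans seen before any MS1 scan end up in a bunch
            # whose precursor is None.
            products.append(scan)
            started = True
    if started:
        yield ToyBunch(precursor, products)


scans = [ToyScan("a", 1), ToyScan("b", 2), ToyScan("c", 2), ToyScan("d", 1)]
singles = list(iterate(scans, grouped=False))
bunches = list(iterate(scans, grouped=True))
orphans = list(iterate([ToyScan("x", 2), ToyScan("a", 1)], grouped=True))
```

Here `singles` holds the four scans unchanged, while `bunches` holds two bunches: the first pairs scan "a" with products "b" and "c", and the second holds scan "d" with no products. The `orphans` case shows a bunch whose precursor is None, matching the fallback described above for grouped mode.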