Common MS File Model

ms_deisotope.data_source uses a set of common interfaces for reading mass spectrometry data files so that code written for one format should work for all formats which implement the same interfaces.

ms_deisotope.data_source.metadata defines a set of data structures and a collection of controlled vocabulary terms describing mass spectrometers and mass spectrometry data files.

Abstract Base Classes

ms_deisotope supports reading from many different file formats. While the file format is abstracted away as much as possible, the modes of access are built into the type hierarchy.

All of the currently implemented formats implement both ScanIterator and RandomAccessScanSource.

class ms_deisotope.data_source.common.ScanDataSource(*args, **kwds)

An Abstract Base Class describing an object which can provide a consistent set of accessors for a particular format of mass spectrometry data.

Data files come in many shapes and sizes, with different underlying structures. This class provides an API that should make features as consistent as possible to clients of Scan objects.

close()

Close the underlying scan data stream, which may be a file or other system resource.

A closed data source may not be able to serve data requests, but not all ScanDataSource implementations require the data stream be open for all operations.
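close() releases the underlying stream. The standard library's contextlib.closing() can guarantee that it is called even if processing raises an exception. The sketch below uses a hypothetical ToySource class in place of a real ScanDataSource implementation. ToySource is not part of ms_deisotope.

```python
from contextlib import closing


class ToySource:
    """Hypothetical stand-in for a ScanDataSource; not part of ms_deisotope."""

    def __init__(self, scan_ids):
        self._scan_ids = list(scan_ids)
        self.closed = False

    def scan_ids(self):
        return list(self._scan_ids)

    def close(self):
        # Release the (pretend) underlying file handle or other resource.
        self.closed = True


source = ToySource(["scan=1", "scan=2"])
# closing() calls source.close() when the block exits, even on error.
with closing(source) as reader:
    first = reader.scan_ids()[0]
```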

property source_file_name: Optional[str]

Return the name of the file that backs this data source, if available.

Return type

str or None

class ms_deisotope.data_source.common.ScanIterator(*args, **kwds)

An Abstract Base Class that extends ScanDataSource with additional requirements that enable clients of the class to treat the object as an iterator over the underlying data file.

iteration_mode

A string, either ITERATION_MODE_GROUPED or ITERATION_MODE_SINGLE, that controls whether iteration produces ScanBunch or Scan objects.

Type

str

has_ms1_scans() → bool

Checks if this ScanDataSource contains MS1 spectra.

Returns

Returns a boolean value if the presence of MS1 scans is known for certain, or None if it cannot be determined in the case of missing metadata.

Return type

bool or None

has_msn_scans() → bool

Checks if this ScanDataSource contains MSn spectra.

Returns

Returns a boolean value if the presence of MSn scans is known for certain, or None if it cannot be determined in the case of missing metadata.

Return type

bool or None

initialize_scan_cache()

Initialize a cache which keeps track of which Scan objects are still in memory using a weakref.WeakValueDictionary.

When a scan is requested, if the scan object is found in the cache, the existing object is returned rather than re-read from disk.
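Here is a minimal sketch of the weak-reference cache pattern described above. Scan and CachingReader are hypothetical classes written for illustration only, and _read_scan stands in for actually parsing a scan from disk. A cache entry survives only while some other part of the program still holds a reference to that scan.

```python
import weakref


class Scan:
    """Hypothetical scan object; WeakValueDictionary values must be weak-referenceable."""

    def __init__(self, scan_id):
        self.id = scan_id


class CachingReader:
    """Hypothetical reader illustrating a weakref-based scan cache."""

    def __init__(self):
        self.reads_from_disk = 0
        self.initialize_scan_cache()

    def initialize_scan_cache(self):
        # Entries disappear automatically once no other reference to the scan exists.
        self._scan_cache = weakref.WeakValueDictionary()

    def _read_scan(self, scan_id):
        # Stand-in for parsing the scan from the underlying file.
        self.reads_from_disk += 1
        return Scan(scan_id)

    def get_scan_by_id(self, scan_id):
        scan = self._scan_cache.get(scan_id)
        if scan is None:
            scan = self._read_scan(scan_id)
            self._scan_cache[scan_id] = scan
        return scan


reader = CachingReader()
first = reader.get_scan_by_id("scan=1")
second = reader.get_scan_by_id("scan=1")  # served from the cache
```

Because the dictionary holds only weak references, the cache never keeps a scan alive by itself. Memory use is therefore bounded by whatever the caller chooses to retain.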

make_iterator(iterator=None, grouped=None, **kwargs) → ms_deisotope.data_source.scan.loader.ScanIterator

Configure the ScanIterator’s behavior, selecting its iteration strategy over either its default iterator or the provided iterator argument.

Parameters
  • iterator (Iterator, optional) – The iterator to manipulate. If missing, the default iterator will be used.

  • grouped (bool, optional) – Whether the iterator should be grouped and produce ScanBunch objects, or produce single Scan objects. If None is passed, the result of has_ms1_scans() is used instead. Defaults to None.
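With grouped=None, the decision is deferred to has_ms1_scans(), which may return True, False, or None when metadata is missing. The sketch below shows one way to turn that tri-state answer into an iteration mode. The helper name resolve_iteration_mode is hypothetical. Falling back to grouped mode when the answer is unknown is also an assumption, not documented ms_deisotope behavior. The mode strings match the "single" and "grouped" values of iteration_mode.

```python
from typing import Optional

ITERATION_MODE_GROUPED = "grouped"
ITERATION_MODE_SINGLE = "single"


def resolve_iteration_mode(grouped: Optional[bool], has_ms1: Optional[bool]) -> str:
    """Hypothetical helper: map the grouped argument to an iteration mode."""
    if grouped is None:
        # Defer to has_ms1_scans(); assume grouping when the answer is unknown (None).
        grouped = True if has_ms1 is None else has_ms1
    return ITERATION_MODE_GROUPED if grouped else ITERATION_MODE_SINGLE
```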

abstract next() → Union[ms_deisotope.data_source.scan.loader.ScanType, ms_deisotope.data_source.scan.base.ScanBunch]

Advance the iterator, fetching the next ScanBunch or ScanBase depending on the iteration strategy.

Return type

ScanBunch or ScanBase

reset()

Reset the iterator, if possible, and clear any caches.

property scan_cache

A weakref.WeakValueDictionary mapping used to retrieve scans from memory if available before re-reading them from disk.

class ms_deisotope.data_source.common.RandomAccessScanSource(*args, **kwds)

An Abstract Base Class that extends ScanIterator with the additional requirement that the implementation support random access to individual scans by unique identifier, by sequential index, or by scan time.

find_next_ms1(start_index: int) → Optional[ms_deisotope.data_source.scan.loader.ScanType]

Locate the MS1 scan following start_index, iterating forwards through scans until either the last scan is reached or an MS1 scan is found.

Return type

ScanBase or None if not found

find_previous_ms1(start_index: int) → Optional[ms_deisotope.data_source.scan.loader.ScanType]

Locate the MS1 scan preceding start_index, iterating backwards through scans until either the first scan is reached or an MS1 scan is found.

Return type

ScanBase or None if not found
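Both methods are linear walks over scan indices. The sketch below works on a plain list of MS levels instead of real scan objects, and it returns indices rather than scans. It assumes the search excludes start_index itself.

```python
from typing import Optional, Sequence


def find_previous_ms1(ms_levels: Sequence[int], start_index: int) -> Optional[int]:
    """Index of the closest MS1 scan before start_index, or None if there is none."""
    index = start_index - 1
    while index >= 0:
        if ms_levels[index] == 1:
            return index
        index -= 1
    return None


def find_next_ms1(ms_levels: Sequence[int], start_index: int) -> Optional[int]:
    """Index of the closest MS1 scan after start_index, or None if there is none."""
    index = start_index + 1
    while index < len(ms_levels):
        if ms_levels[index] == 1:
            return index
        index += 1
    return None


levels = [1, 2, 2, 1, 2, 2]
```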

abstract get_scan_by_id(scan_id: str) → ms_deisotope.data_source.scan.loader.ScanType

Retrieve the scan object for the specified scan id.

If the scan object is still bound and in memory somewhere, a reference to that same object will be returned. Otherwise, a new object will be created.

Parameters

scan_id (str) – The unique scan id value to be retrieved

Return type

Scan

abstract get_scan_by_index(index: int) → ms_deisotope.data_source.scan.loader.ScanType

Retrieve the scan object for the specified scan index.

This internally calls get_scan_by_id(), which will use its cache.

Parameters

index (int) – The index to get the scan for

Return type

Scan

abstract get_scan_by_time(time: float) → ms_deisotope.data_source.scan.loader.ScanType

Retrieve the scan object for the specified scan time.

This internally calls get_scan_by_id(), which will use its cache.

Parameters

time (float) – The time for which to retrieve the nearest scan

Return type

Scan
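A nearest-time lookup can be done with a binary search over sorted scan times. The sketch below uses bisect on a hypothetical list of times and returns an index; a real implementation would then pass that position on to fetch the scan. Resolving ties toward the later scan is an assumption, loosely following the "rounded up" wording under start_from_scan().

```python
from bisect import bisect_left
from typing import Sequence


def nearest_scan_index(times: Sequence[float], time: float) -> int:
    """Index of the scan whose time is closest to `time`.

    `times` must be sorted ascending and non-empty; ties go to the later scan.
    """
    if not times:
        raise ValueError("no scans to search")
    i = bisect_left(times, time)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    before, after = times[i - 1], times[i]
    # Prefer the later scan when both neighbours are equally close.
    return i if (after - time) <= (time - before) else i - 1


times = [0.5, 1.0, 1.5, 2.0]
```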

property has_fast_random_access: ms_deisotope.utils.Constant

Check whether the underlying data stream supports fast random access or not.

Even if the file format supports random access, it may be impractical due to overhead in parsing the underlying data stream, e.g. calling gzip.GzipFile.seek() can force the file to be decompressed from the beginning of the file on each call. This property can be used to signal to the caller whether or not it should use a different strategy.

Returns

One of DefinitelyNotFastRandomAccess, MaybeFastRandomAccess, or DefinitelyFastRandomAccess. The first is falsy; the latter two are truthy.

Return type

Constant
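The three values differ in truthiness, so callers can branch on the property directly. The sketch below models them with a hypothetical Constant class whose __bool__ returns a fixed flag. The real ms_deisotope.utils.Constant may be implemented differently, and choose_strategy is a hypothetical caller.

```python
class Constant:
    """Hypothetical named constant with an explicit truth value."""

    def __init__(self, name: str, is_true: bool):
        self.name = name
        self._is_true = is_true

    def __bool__(self) -> bool:
        return self._is_true

    def __repr__(self) -> str:
        return self.name


DefinitelyNotFastRandomAccess = Constant("DefinitelyNotFastRandomAccess", False)
MaybeFastRandomAccess = Constant("MaybeFastRandomAccess", True)
DefinitelyFastRandomAccess = Constant("DefinitelyFastRandomAccess", True)


def choose_strategy(has_fast_random_access: Constant) -> str:
    # Fall back to one sequential pass when random access would be slow.
    return "random-access" if has_fast_random_access else "sequential"
```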

abstract start_from_scan(scan_id: Optional[str] = None, rt: Optional[float] = None, index: Optional[int] = None, require_ms1: bool = True, grouped=True, **kwargs) → ms_deisotope.data_source.scan.loader.RandomAccessScanSource

Reconstruct an iterator which will start from the scan matching one of scan_id, rt, or index. Only one may be provided.

After this method is invoked, the iterator this object wraps will begin yielding scan bunches (or single scans if grouped is False).

This method triggers several random-access operations, which can make it prohibitively expensive for compressed files that do not support efficient seeking, such as plain gzip files.

Parameters
  • scan_id (str, optional) – Start from the scan with the specified id.

  • rt (float, optional) – Start from the scan nearest to the specified time (in minutes) in the run. If no exact match is found, the nearest scan time is used, rounded up.

  • index (int, optional) – Start from the scan with the specified index.

  • require_ms1 (bool, optional) – Whether the iterator must start from an MS1 scan. True by default.

  • grouped (bool, optional) – Whether the iterator should yield scan bunches or single scans. True by default.
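The sketch below shows one way to resolve the starting position. Exactly one of scan_id, rt, or index is accepted, rt is rounded up to the first scan at or after that time, and, when require_ms1 is set, the position backs up to the preceding MS1 scan. It operates on hypothetical parallel lists of ids, times, and MS levels. The helper name resolve_start_index and the back-up rule are assumptions, not the documented implementation.

```python
from bisect import bisect_left
from typing import List, Optional


def resolve_start_index(
    ids: List[str],
    times: List[float],
    ms_levels: List[int],
    scan_id: Optional[str] = None,
    rt: Optional[float] = None,
    index: Optional[int] = None,
    require_ms1: bool = True,
) -> int:
    """Hypothetical helper: find the index where iteration should begin."""
    given = [arg for arg in (scan_id, rt, index) if arg is not None]
    if len(given) != 1:
        raise ValueError("provide exactly one of scan_id, rt, or index")
    if scan_id is not None:
        start = ids.index(scan_id)
    elif rt is not None:
        # First scan at or after rt ("rounded up"), clamped to the last scan.
        start = min(bisect_left(times, rt), len(times) - 1)
    else:
        start = index
    if require_ms1:
        # Walk backwards until an MS1 scan (or the first scan) is reached.
        while start > 0 and ms_levels[start] != 1:
            start -= 1
    return start


ids = ["s0", "s1", "s2", "s3"]
times = [0.1, 0.2, 0.3, 0.4]
levels = [1, 2, 1, 2]
```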

property time

An indexer facade that lets you index and slice by scan time.

Return type

TimeIndex
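A time-indexing facade can translate float keys and slices into index lookups. The sketch below is a hypothetical SimpleTimeIndex over a sorted list of scan times; it is not the TimeIndex class itself. Its semantics are assumptions: slices are inclusive on both ends and ignore any step, and a single key maps to the first scan at or after that time.

```python
from bisect import bisect_left, bisect_right
from typing import List


class SimpleTimeIndex:
    """Hypothetical facade mapping scan times to scan indices."""

    def __init__(self, times: List[float]):
        self.times = times  # must be sorted ascending and non-empty

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Inclusive window [start, stop]; an open end extends to the edge of the run.
            lo = 0 if key.start is None else bisect_left(self.times, key.start)
            hi = len(self.times) if key.stop is None else bisect_right(self.times, key.stop)
            return list(range(lo, hi))
        # Single time: first scan at or after `key`, clamped to the last scan.
        return min(bisect_left(self.times, key), len(self.times) - 1)


time_index = SimpleTimeIndex([0.5, 1.0, 1.5, 2.0])
```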

Iteration Strategies

ScanIterator instances may iterate over scans in single or grouped strategies. single mode produces a single instance of Scan on each iteration, while grouped produces a ScanBunch containing an MS1 Scan (may be None) and 0 or more related MSn Scan instances which are derived from the MS1 Scan. The default mode for a given ScanIterator depends upon both the file format and available metadata.
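Grouped iteration can be pictured as splitting the scan stream at each MS1 scan. The sketch below uses a hypothetical Bunch named tuple in place of ScanBunch, and plain (scan_id, ms_level) pairs in place of Scan objects. It relies only on acquisition order, whereas a real reader may also consult precursor metadata. MSn scans that appear before the first MS1 scan end up in a bunch whose precursor is None.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple

ScanRecord = Tuple[str, int]  # (scan_id, ms_level)


class Bunch(NamedTuple):
    """Hypothetical stand-in for ScanBunch."""
    precursor: Optional[ScanRecord]
    products: List[ScanRecord]


def group_scans(scans: Iterable[ScanRecord]) -> Iterator[Bunch]:
    precursor: Optional[ScanRecord] = None
    products: List[ScanRecord] = []
    started = False
    for scan in scans:
        if scan[1] == 1:
            # A new MS1 scan closes the bunch collected so far.
            if started:
                yield Bunch(precursor, products)
            precursor, products = scan, []
        else:
            products.append(scan)
        started = True
    if started:
        yield Bunch(precursor, products)


stream = [("s1", 2), ("s2", 1), ("s3", 2), ("s4", 2), ("s5", 1)]
bunches = list(group_scans(stream))
```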

You can force the iteration strategy to be grouped when calling ScanIterator.make_iterator() by passing grouped=True, and single by passing grouped=False. The same applies to RandomAccessScanSource.start_from_scan(). When grouped mode is requested but cannot be fulfilled, ScanBunch objects are still produced, but the precursor may be None or products may be empty.

The iteration mode of a ScanIterator is always available through its iteration_mode attribute, which should have the value "single" or "grouped" accordingly.