Formats¶
There are many different types of mass spectrometry data files, each requiring different methods
to read them. When working with a path or file-like object, you can use MSFileLoader() to open
most formats using the same uniform API. Otherwise, you may directly use the file reading classes
in the appropriate data source submodule.
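As a minimal sketch of the uniform entry point, the helper below opens a file with MSFileLoader() and returns the first few items the reader yields. The names open_reader and peek, and the loader parameter, are hypothetical conveniences introduced here (the latter only so the helper can be exercised without a real data file); the default branch assumes ms_deisotope is installed.

```python
from itertools import islice


def open_reader(path, loader=None):
    """Open `path` with MSFileLoader unless another factory is supplied.

    `loader` is a hypothetical hook added for this sketch; it is not part
    of ms_deisotope.
    """
    if loader is None:
        # Imported lazily; this line requires ms_deisotope to be installed.
        from ms_deisotope import MSFileLoader
        loader = MSFileLoader
    return loader(path)


def peek(path, n=3, loader=None):
    """Return the first `n` items produced by iterating the reader."""
    reader = open_reader(path, loader=loader)
    return list(islice(reader, n))
```

Called as peek("run.mzML"), where "run.mzML" is a placeholder path, this would be expected to return scan bunches for mzML or mzXML input and single scans for MGF input.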
mzML and mzXML¶
ms_deisotope supports reading mzML [Martens2011] and mzXML [Pedrioli2004a] files on all platforms,
and can provide fast random access to uncompressed files. By default, these file types are assumed
to be collections of MS1 and MSn scans, and iteration yields scan bunches (a precursor scan grouped
with its product scans). To iterate over single scans, call make_iterator() or another iterator
creation method with the keyword argument grouped=False.
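The difference between the two iteration modes can be sketched as follows. The function names ids_by_ms_level and products_per_bunch are hypothetical; both take an already opened reader, and the sketch assumes iterating the reader after calling make_iterator() reflects the requested mode.

```python
from itertools import islice


def ids_by_ms_level(reader, limit=100):
    """Iterate single scans and group their ids by MS level."""
    reader.make_iterator(grouped=False)
    ids = {}
    for scan in islice(reader, limit):
        ids.setdefault(scan.ms_level, []).append(scan.id)
    return ids


def products_per_bunch(reader, limit=100):
    """Iterate scan bunches and count the product scans in each one."""
    reader.make_iterator(grouped=True)
    counts = []
    for bunch in islice(reader, limit):
        # Guard against a bunch whose product scans have no MS1 parent.
        precursor_id = bunch.precursor.id if bunch.precursor is not None else None
        counts.append((precursor_id, len(bunch.products)))
    return counts
```

The limit argument only keeps the sketch cheap on large files; drop the islice() call to walk the whole run.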
mzMLb¶
With h5py installed, ms_deisotope supports reading mzMLb [Bhamber2021].
The reader behaves identically to the mzML reader, and provides full random access even when the
file is compressed conventionally.
MGF¶
ms_deisotope supports reading MGF files on all platforms, and can provide fast random access to
uncompressed files. As this format does not store MS1 scans, only single MSn scans are produced by
iteration, as if grouped=False were passed to make_iterator().
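Because MGF iteration produces single MSn scans, a reader can be consumed without unpacking bunches. The sketch below uses the MGFLoader class from the mgf data source submodule directly and lists the precursor m/z and charge recorded for each scan. The function name precursor_table and the loader parameter are hypothetical, and the default branch assumes ms_deisotope is installed.

```python
from itertools import islice


def precursor_table(path, limit=10, loader=None):
    """List (scan id, precursor m/z, precursor charge) for scans in an MGF file."""
    if loader is None:
        # Imported lazily; this line requires ms_deisotope to be installed.
        from ms_deisotope.data_source.mgf import MGFLoader
        loader = MGFLoader
    reader = loader(path)
    rows = []
    for scan in islice(reader, limit):
        info = scan.precursor_information
        if info is None:
            # Skip scans that carry no precursor annotation.
            continue
        rows.append((scan.id, info.mz, info.charge))
    return rows
```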
Vendor Readers¶
If the appropriate COM DLL has been registered with Windows, ms_deisotope is able to read a subset
of the commonly used vendor data file formats. Which formats are available depends on the platform
and on the external libraries installed. Some of these external libraries may be detected
automatically, while others need to be formally registered before use.
Note
By using an instrument vendor’s library to read their proprietary file format, you are agreeing to the requisite license and terms associated with that library.
Thermo Fisher RAW¶
If the .NET runtime is available and pythonnet is installed, Thermo’s RawFileReader library will be
used to open Thermo Fisher RAW files.
With comtypes, if the MSFileReader package that provides the XRawfile2_<arch>.dll library has been
installed and registered with Windows, ms_deisotope can open Thermo Fisher RAW files. The
implementation of the MSFileReader bindings is derived from the work of François Allain,
distributed under the MIT license, and is included in this codebase.
Agilent .d¶
With comtypes, if Agilent’s MassSpecDataReader.dll and its supporting libraries have been installed,
and their type libraries (files whose names match the DLL name, but whose extension is .tlb) have
been built and registered as described in the vendor’s installation instructions, ms_deisotope can
open Agilent .d directories directly. Because there is no standard registered installation
location, you must explicitly tell comtypes where to look for the DLLs before you can use this
feature. See ms_deisotope.data_source.agilent_d.register_dll_dir() for more details.
Waters¶
On Windows, if the MassLynx SDK is either installed on the system path or registered in
ms_deisotope’s configuration file, ms_deisotope can open Waters .RAW directories directly.
To register the SDK programmatically, see
ms_deisotope.data_source._vendor.masslynx.libload.register_dll() for more details.
References¶
- Martens2011
Martens, L., Chambers, M., Sturm, M., Kessner, D., Levander, F., Shofstahl, J., … Deutsch, E. W. (2011). mzML–a community standard for mass spectrometry data. Molecular & Cellular Proteomics : MCP, 10(1), R110.000133. https://doi.org/10.1074/mcp.R110.000133
- Pedrioli2004a
Pedrioli, P. G. A., Eng, J. K., Hubley, R., Vogelzang, M., Deutsch, E. W., Raught, B., … Aebersold, R. (2004). A common open representation of mass spectrometry data and its application to proteomics research. Nature Biotechnology, 22(11), 1459–1466. https://doi.org/10.1038/nbt1031
- Bhamber2021
Bhamber, R. S., Jankevics, A., Deutsch, E. W., Jones, A. R., & Dowsey, A. W. (2021). MzMLb: A Future-Proof Raw Mass Spectrometry Data Format Based on Standards-Compliant mzML and Optimized for Speed and Storage Requirements. Journal of Proteome Research, 20(1), 172–183. https://doi.org/10.1021/acs.jproteome.0c00192