Formats¶
There are many different types of mass spectrometry data files, each requiring different methods
to read them. When working with a path or file-like object, you can use
MSFileLoader() to open most formats using the same
uniform API. Otherwise, you may directly use the file reading classes in the appropriate data source
submodule.
mzML and mzXML¶
ms_deisotope supports reading from mzML [Martens2011]
and mzXML [Pedrioli2004a] files on all platforms, and can
provide fast random access to uncompressed files. By default, these file types are assumed to be
collections of MS1 and MSn scans, and will iterate over scan bunches. To iterate over single scans,
call make_iterator() or another iterator creation method with the keyword argument grouped=False.
mzMLb¶
With h5py, ms_deisotope supports reading mzMLb [Bhamber2021].
The reader provides identical behaviors to those for mzML, while providing full random access even while
compressed conventionally.
MGF¶
ms_deisotope supports reading MGF files on all platforms,
and can provide fast random access to uncompressed files. As this format does not store MS1 scans, only
single MSn scans are produced by iteration, as if grouped=False were passed to make_iterator().
Vendor Readers¶
If the appropriate COM DLL has been registered with Windows, ms_deisotope is able to read a subset
of the commonly used vendor data file formats. These depend upon the platform and external libraries. Some
of these external libraries may be automatically detected while others will need to be formally registered
before use.
Note
By using an instrument vendor’s library to read their proprietary file format, you are agreeing to the requisite license and terms associated with that library.
Thermo Fisher RAW¶
If the .NET run time is available and pythonnet is installed, Thermo’s RawFileReader library will be used
to open Thermo Fisher RAW.
With comtypes, if the MSFileReader package
that provides the XRawfile2_<arch>.dll library has been installed and registered with Windows, ms_deisotope
can open Thermo Fisher RAW files. The implementation of the MSFileReader
bindings are derived from the work of François Allain,
distributed under the MIT license, which have been included in this codebase.
Agilent .d¶
With comtypes, if Agilent’s MassSpecDataReader.dll and its supporting libraries have been installed, and
had their type libraries built and registered (files whose names match the DLL name, but whose extension is .tlb)
as described in the vendor’s installation instructions, ms_deisotope can open
Agilent .d directories directly. Because there is no standard registered
installation location, you must explicitly tell comtypes where to look for the DLLs before you
can use this feature. See ms_deisotope.data_source.agilent_d.register_dll_dir() for more details.
Waters¶
On Windows, if the MassLynx SDK is installed either on the system path or are registered with ms_deisotope’s
configuration file, ms_deisotope can open Waters .RAW directories directly.
To programmatically register the SDK, see ms_deisotope.data_source._vendor.masslynx.libload.register_dll() for more
details.
References¶
- Martens2011
Martens, L., Chambers, M., Sturm, M., Kessner, D., Levander, F., Shofstahl, J., … Deutsch, E. W. (2011). mzML–a community standard for mass spectrometry data. Molecular & Cellular Proteomics : MCP, 10(1), R110.000133. https://doi.org/10.1074/mcp.R110.000133
- Pedrioli2004a
Pedrioli, P. G. A., Eng, J. K., Hubley, R., Vogelzang, M., Deutsch, E. W., Raught, B., … Aebersold, R. (2004). A common open representation of mass spectrometry data and its application to proteomics research. Nature Biotechnology, 22(11), 1459–1466. https://doi.org/10.1038/nbt1031
- Bhamber2021
Bhamber, R. S., Jankevics, A., Deutsch, E. W., Jones, A. R., & Dowsey, A. W. (2021). MzMLb: A Future-Proof Raw Mass Spectrometry Data Format Based on Standards-Compliant mzML and Optimized for Speed and Storage Requirements. Journal of Proteome Research, 20(1), 172–183. https://doi.org/10.1021/acs.jproteome.0c00192