LC-MS/MS Data Preprocessing and Deconvolution¶
Convert raw mass spectral data files into deisotoped neutral mass peak lists written to a new mzML [Martens2011] file. For tandem mass spectra, recalculate precursor ion monoisotopic peaks.
This task is computationally intensive, so the work is shared across several cooperating worker processes.
glycresoft mzml preprocess¶
Convert raw mass spectral data into deisotoped neutral mass peak lists.
glycresoft mzml preprocess [OPTIONS] MS_FILE OUTFILE_PATH
Options
- -a, --averagine <averagine>¶
Averagine model to use for MS1 scans. Either a name or formula. (May specify more than once)
- -an, --msn-averagine <averagine>¶
Averagine model to use for MS^n scans. Either a name or formula. (May specify more than once)
- -s, --start-time <float>¶
Scan time, in minutes, at which to begin processing
- -e, --end-time <float>¶
Scan time, in minutes, at which to stop processing
- -c, --maximum-charge <int>¶
Highest absolute charge state to consider
- -n, --name <string>¶
Name for the sample run to be stored. Defaults to the base name of the input data file
- -t, --score-threshold <float>¶
Minimum score to accept an isotopic pattern fit in an MS1 scan
- -tn, --msn-score-threshold <float>¶
Minimum score to accept an isotopic pattern fit in an MS^n scan
- -m, --missed-peaks <int>¶
Number of missing peaks to permit before an isotopic fit is discarded
- -mn, --msn-missed-peaks <int>¶
Number of missing peaks to permit before an isotopic fit is discarded in an MS^n scan
- -p, --processes <int>¶
Number of worker processes to use. Defaults to 4 or the number of CPUs, whichever is lower
- -b, --background-reduction <float>¶
Background reduction factor. Larger values more aggressively remove low abundance signal in MS1 scans.
- -bn, --msn-background-reduction <float>¶
Background reduction factor. Larger values more aggressively remove low abundance signal in MS^n scans.
- -r, --transform <func>¶
Scan transformations to apply to MS1 scans. (May specify more than once)
- -rn, --msn-transform <func>¶
Scan transformations to apply to MS^n scans. (May specify more than once)
- -v, --extract-only-tandem-envelopes¶
Only work on regions that will be chosen for MS/MS
- --verbose¶
Log additional diagnostic information for each scan.
- -g, --ms1-averaging <int>¶
The number of MS1 scans before and after the current MS1 scan to average when picking peaks.
- --ignore-msn¶
Ignore MS^n scans
- -i, --isotopic-strictness <float>¶
- -in, --msn-isotopic-strictness <float>¶
- -snr, --signal-to-noise-threshold <float>¶
Signal-to-noise ratio threshold to apply when filtering peaks
- -mo, --mass-offset <float>¶
Shift peak masses by the given amount
- -D, --default-precursor-ion-selection-window <float>¶
The isolation window width to assume when it is not specified.
Arguments
- MS_FILE¶
Required argument <path>
- OUTFILE_PATH¶
Required argument <path>
Usage example¶
glycresoft-cli mzml preprocess -a permethylated-glycan -t 20 -p 6 \
-s 5.0 -e 60.0 "path/to/input" "path/to/output.mzML"
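When several files need the same settings, the command above can be assembled and launched from a script. The following is a minimal sketch, not part of glycresoft: the helper names `build_preprocess_command` and `run_preprocess` are hypothetical, the executable name is taken from the usage example and exposed as a parameter, and only flags documented on this page are used.

```python
import subprocess


def build_preprocess_command(ms_file, outfile_path, averagine="permethylated-glycan",
                             score_threshold=20, processes=6,
                             start_time=5.0, end_time=60.0,
                             executable="glycresoft-cli"):
    """Assemble the argument list for one preprocessing run.

    Hypothetical helper: it only mirrors the flags shown in the usage example.
    """
    return [
        executable, "mzml", "preprocess",
        "-a", averagine,
        "-t", str(score_threshold),
        "-p", str(processes),
        "-s", str(start_time),
        "-e", str(end_time),
        str(ms_file), str(outfile_path),
    ]


def run_preprocess(ms_file, outfile_path, **options):
    """Run the command; raises CalledProcessError if the tool exits with an error."""
    command = build_preprocess_command(ms_file, outfile_path, **options)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_preprocess_command("path/to/input", "path/to/output.mzML")))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.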
Averagine Models¶
Argument type for <averagine>. The model selected influences how isotopic
patterns are estimated for an arbitrary mass. The value of this parameter may
be a builtin model name or a formula.
For a more complete discussion of how “averagine” isotopic models work, see [Senko1995].
Builtin Models¶
Supported File Formats¶
MS_FILE may be in mzML or mzXML format.
Signal Filters¶
Prior to picking peaks, the raw mass spectral signal may be filtered in a number
of ways. By default, a local noise reduction filter is applied, with its strength
set by the -b and -bn options for MS1 and MS^n scans respectively. Other filters
may be set using -r and -rn:
mean_below_mean
- Remove all points below the mean of all points below the mean of all unfiltered points of this scan
median
- Remove all points below the median intensity of this scan
one_percent_of_max
- Remove all points with intensity less than 1% of the maximum intensity point of this scan
fticr_baseline
- Apply the same background reduction algorithm used by -b and -bn
savitsky_golay
- Apply Savitzky-Golay smoothing on the intensities of this scan
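The three threshold-based filters can be illustrated in a few lines of plain Python. This is a sketch of the descriptions above, not glycresoft's implementation: the function names and the representation of a scan as a list of (m/z, intensity) tuples are assumptions made for the example.

```python
from statistics import mean, median


def keep_at_or_above(points, threshold):
    """Keep only the (mz, intensity) points whose intensity reaches the threshold."""
    return [(mz, inten) for mz, inten in points if inten >= threshold]


def mean_below_mean(points):
    """Threshold at the mean of the points that lie below the overall mean."""
    if not points:
        return []
    intensities = [inten for _, inten in points]
    overall_mean = mean(intensities)
    lower = [x for x in intensities if x < overall_mean]
    # If all intensities are equal, nothing lies below the mean.
    threshold = mean(lower) if lower else overall_mean
    return keep_at_or_above(points, threshold)


def median_filter(points):
    """Threshold at the median intensity of the scan."""
    if not points:
        return []
    return keep_at_or_above(points, median([inten for _, inten in points]))


def one_percent_of_max(points):
    """Threshold at 1% of the most intense point of the scan."""
    if not points:
        return []
    return keep_at_or_above(points, 0.01 * max(inten for _, inten in points))


if __name__ == "__main__":
    scan = [(100.0, 1.0), (101.0, 2.0), (102.0, 3.0), (103.0, 20.0), (104.0, 1000.0)]
    for name, func in [("mean_below_mean", mean_below_mean),
                       ("median", median_filter),
                       ("one_percent_of_max", one_percent_of_max)]:
        print(name, func(scan))
```

Because one very intense peak dominates the overall mean, mean_below_mean computes its cutoff only from the low-intensity population, which is why it behaves differently from a plain mean threshold.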
Output Information¶
The resulting mzML file from this tool attempts to preserve as much metadata as possible from the source data file, and records its own metadata in the appropriate sections of the document.
Each scan has a standard set of cvParam entries covering scan polarity,
peak mode, and MS level. In addition to the normal m/z array and intensity array
entries, each scan also includes the standardized charge array, as well as two
non-standard arrays, deconvolution score array and isotopic envelopes array.
The deconvolution score array holds the value of the goodness-of-fit function used
to evaluate the isotopic envelope behind each reported peak. The isotopic envelopes array
is more complex, as it encodes the set of isotopic peaks used to fit each reported peak,
and does not have a one-to-one relationship with other arrays.
To unpack the isotopic envelopes array after decoding, we use the following logic:
# Envelope and EnvelopePair are assumed here to be the ms_deisotope types
from ms_deisotope.peak_set import Envelope, EnvelopePair


def decode_envelopes(array):
    '''
    Arguments
    ---------
    array: float32 array
    '''
    envelope_list = []
    current_envelope = []
    i = 0
    n = len(array)
    # values are stored as consecutive (mz, intensity) pairs
    while i + 1 < n:
        # fetch the next two values
        mz = array[i]
        intensity = array[i + 1]
        i += 2

        # if both numbers are zero, this denotes the beginning
        # of a new envelope
        if mz == 0 and intensity == 0:
            if current_envelope:
                envelope_list.append(Envelope(current_envelope))
            current_envelope = []
        # otherwise add the current point to the existing envelope
        else:
            current_envelope.append(EnvelopePair(mz, intensity))
    # store the final envelope, skipping it if the array held no points
    if current_envelope:
        envelope_list.append(Envelope(current_envelope))
    return envelope_list
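To make the array layout concrete, here is a small round-trip sketch using plain tuples instead of the library's envelope types. The layout is inferred from the decoding logic above (a pair of zeros marks the start of each envelope); the names `encode_envelopes` and `decode_envelopes_as_tuples` are hypothetical and exist only for this illustration.

```python
def encode_envelopes(envelopes):
    """Flatten a list of envelopes into one list of floats.

    Each envelope is a list of (mz, intensity) tuples. A (0, 0) pair is
    written before each envelope to mark where it begins.
    """
    flat = []
    for envelope in envelopes:
        flat.extend((0.0, 0.0))
        for mz, intensity in envelope:
            flat.extend((mz, intensity))
    return flat


def decode_envelopes_as_tuples(array):
    """Invert encode_envelopes, returning lists of (mz, intensity) tuples."""
    envelopes = []
    current = []
    # Walk the array two values at a time.
    for i in range(0, len(array) - 1, 2):
        mz, intensity = array[i], array[i + 1]
        if mz == 0 and intensity == 0:
            # Marker pair: close the envelope collected so far, if any.
            if current:
                envelopes.append(current)
            current = []
        else:
            current.append((mz, intensity))
    if current:
        envelopes.append(current)
    return envelopes


if __name__ == "__main__":
    original = [[(500.25, 1000.0), (500.75, 600.0)], [(812.5, 300.0)]]
    flat = encode_envelopes(original)
    print(flat)
    print(decode_envelopes_as_tuples(flat) == original)
```

Because each envelope may contain a different number of peaks, the flattened array has no fixed stride per reported peak, which is why it cannot be indexed in parallel with the m/z, intensity, or charge arrays.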
Bibliography¶
[Senko1995] Senko, M. W., Beu, S. C., & McLafferty, F. W. (1995). Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. Journal of the American Society for Mass Spectrometry, 6(4), 229–233. https://doi.org/10.1016/1044-0305(95)00017-8
[Martens2011] Martens, L., Chambers, M., Sturm, M., Kessner, D., Levander, F., Shofstahl, J., … Deutsch, E. W. (2011). mzML: a community standard for mass spectrometry data. Molecular & Cellular Proteomics, 10(1), R110.000133. https://doi.org/10.1074/mcp.R110.000133