LC-MS/MS Data Preprocessing and Deconvolution

Convert raw mass spectral data files into deisotoped neutral mass peak lists written to a new mzML [Martens2011] file. For tandem mass spectra, recalculate precursor ion monoisotopic peaks.

This task is computationally intensive and distributes the work across several cooperating worker processes.

glycresoft mzml preprocess

Convert raw mass spectra data into deisotoped neutral mass peak lists.

glycresoft mzml preprocess [OPTIONS] MS_FILE OUTFILE_PATH

Options

-a, --averagine <averagine>

Averagine model to use for MS1 scans. Either a name or a formula. May specify more than once.

-an, --msn-averagine <averagine>

Averagine model to use for MS^n scans. Either a name or a formula. May specify more than once.

-s, --start-time <float>

Scan time to begin processing at in minutes

-e, --end-time <float>

Scan time to stop processing at in minutes

-c, --maximum-charge <int>

Highest absolute charge state to consider

-n, --name <string>

Name for the sample run to be stored. Defaults to the base name of the input data file

-t, --score-threshold <float>

Minimum score to accept an isotopic pattern fit in an MS1 scan

-tn, --msn-score-threshold <float>

Minimum score to accept an isotopic pattern fit in an MS^n scan

-m, --missed-peaks <int>

Number of missing peaks to permit before an isotopic fit is discarded

-mn, --msn-missed-peaks <int>

Number of missing peaks to permit before an isotopic fit is discarded in an MS^n scan

-p, --processes <int>

Number of worker processes to use. Defaults to 4 or the number of CPUs, whichever is lower

-b, --background-reduction <float>

Background reduction factor. Larger values more aggressively remove low abundance signal in MS1 scans.

-bn, --msn-background-reduction <float>

Background reduction factor. Larger values more aggressively remove low abundance signal in MS^n scans.

-r, --transform <func>

Scan transformations to apply to MS1 scans. May specify more than once.

-rn, --msn-transform <func>

Scan transformations to apply to MS^n scans. May specify more than once.

-v, --extract-only-tandem-envelopes

Only work on regions that will be chosen for MS/MS

--verbose

Log additional diagnostic information for each scan.

-g, --ms1-averaging <int>

The number of MS1 scans before and after the current MS1 scan to average when picking peaks.

--ignore-msn

Ignore MS^n scans

-i, --isotopic-strictness <float>

-in, --msn-isotopic-strictness <float>

-snr, --signal-to-noise-threshold <float>

Signal-to-noise ratio threshold to apply when filtering peaks

-mo, --mass-offset <float>

Shift peak masses by the given amount

-D, --default-precursor-ion-selection-window <float>

The isolation window width to assume when it is not specified.

Arguments

MS_FILE

Required argument <path>

OUTFILE_PATH

Required argument <path>

Usage example

example usage
glycresoft-cli mzml preprocess -a permethylated-glycan -t 20 -p 6 \
    -s 5.0 -e 60.0 "path/to/input" "path/to/output.mzML"
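
When preprocessing is driven from a script, the same invocation can be assembled programmatically. The following is a minimal sketch that uses only the options shown in the example above; the helper names `build_preprocess_command` and `run_preprocess` are hypothetical, and the executable name is a parameter because it depends on how the tool is installed.

```python
import subprocess


def build_preprocess_command(ms_file, outfile_path, executable="glycresoft-cli",
                             averagine="permethylated-glycan", score_threshold=20,
                             processes=6, start_time=5.0, end_time=60.0):
    """Assemble the argument list for the usage example above (hypothetical helper)."""
    return [
        executable, "mzml", "preprocess",
        "-a", averagine,
        "-t", str(score_threshold),
        "-p", str(processes),
        "-s", str(start_time),
        "-e", str(end_time),
        ms_file, outfile_path,
    ]


def run_preprocess(ms_file, outfile_path, **options):
    """Run the tool and raise CalledProcessError if it exits with an error."""
    command = build_preprocess_command(ms_file, outfile_path, **options)
    return subprocess.run(command, check=True)


# Inspect the command without executing it
print(" ".join(build_preprocess_command("path/to/input", "path/to/output.mzML")))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.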

Averagine Models

Argument type for <averagine>. The model selected influences how isotopic patterns are estimated for an arbitrary mass. The value of this parameter may be a builtin model name or a formula.

For a more complete discussion of how “averagine” isotopic models work, see [Senko1995].
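
To illustrate the idea, an averagine model is an average elemental composition per unit of mass that is scaled up to the mass of interest, and the isotopic pattern is then predicted from the scaled composition. The sketch below uses the peptide composition reported in [Senko1995]; the function names and the rounded atomic masses are assumptions made for illustration, and this is not the code the tool itself runs.

```python
# Average elemental composition of one peptide "averagine" unit [Senko1995]
PEPTIDE_AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}

# Rounded average atomic masses in Da (illustrative values)
AVERAGE_ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def unit_mass(model):
    """Average mass of a single averagine unit."""
    return sum(AVERAGE_ATOMIC_MASS[element] * count for element, count in model.items())


def scale_averagine(neutral_mass, model=PEPTIDE_AVERAGINE):
    """Estimate an integer elemental composition for an arbitrary neutral mass.

    The model composition is multiplied by the number of averagine units that
    fit into the mass, then rounded. Full implementations usually also adjust
    the hydrogen count to make up the remaining mass difference; that step is
    omitted here.
    """
    n_units = neutral_mass / unit_mass(model)
    return {element: round(count * n_units) for element, count in model.items()}


print(scale_averagine(2000.0))
```

A different model, such as one for permethylated glycans, changes the composition per unit mass and therefore the predicted isotopic pattern at the same mass.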

Supported File Formats

MS_FILE may be in mzML or mzXML format.

Signal Filters

Prior to picking peaks, the raw mass spectral signal may be filtered in a number of ways. By default, a local noise reduction filter is applied, controlled by the -b and -bn options for MS1 and MS^n scans respectively. Other filters may be set using -r and -rn:

  1. mean_below_mean - Compute the mean intensity of all unfiltered points of this scan, then the mean of the points that fall below it, and remove all points below that second mean

  2. median - Remove all points below the median intensity of this scan

  3. one_percent_of_max - Remove all points with intensity less than 1% of the maximum intensity point of this scan

  4. fticr_baseline - Apply the same background reduction algorithm used by -b and -bn

  5. savitsky_golay - Apply Savitzky-Golay smoothing on the intensities of this scan
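
The three threshold-based filters in the list above can be sketched in a few lines. This is an illustration of the descriptions only, assuming a scan is a list of (m/z, intensity) tuples; the function names other than the filter names are hypothetical and the tool's own implementation may differ in detail.

```python
from statistics import mean, median


def mean_below_mean_threshold(intensities):
    """Mean of the points that lie below the overall mean intensity."""
    overall = mean(intensities)
    below = [x for x in intensities if x < overall]
    return mean(below) if below else overall


def median_threshold(intensities):
    """Median intensity of the scan."""
    return median(intensities)


def one_percent_of_max_threshold(intensities):
    """One percent of the most intense point."""
    return 0.01 * max(intensities)


THRESHOLDS = {
    "mean_below_mean": mean_below_mean_threshold,
    "median": median_threshold,
    "one_percent_of_max": one_percent_of_max_threshold,
}


def apply_filter(points, name):
    """Drop every (m/z, intensity) point whose intensity is below the threshold."""
    if not points:
        return []
    threshold = THRESHOLDS[name]([intensity for _, intensity in points])
    return [(mz, intensity) for mz, intensity in points if intensity >= threshold]


scan = [(100.0, 0.5), (101.0, 2.0), (102.0, 3.0), (103.0, 10.0), (104.0, 100.0)]
for name in THRESHOLDS:
    print(name, apply_filter(scan, name))
```

On this toy scan, mean_below_mean is the most aggressive of the three and one_percent_of_max the least, which is typical when a few intense peaks dominate the scan.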

Output Information

The resulting mzML file from this tool attempts to preserve as much metadata as possible from the source data file, and records its own metadata in the appropriate sections of the document.

Each scan has a standard set of cvParam entries covering scan polarity, peak mode, and MS level. In addition to the normal m/z array and intensity array entries, each scan also includes the standardized charge array, as well as two non-standard arrays, deconvolution score array and isotopic envelopes array. The deconvolution score array is just the result of the goodness-of-fit function used to evaluate the isotopic envelopes resulting in the reported peaks. The isotopic envelopes array is more complex, as it encodes the set of isotopic peaks used to fit each reported peak, and does not have a one-to-one relationship with other arrays.

To unpack the isotopic envelopes array after decoding, we use the following logic:

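from collections import namedtuple

# Stand-in containers so that the example runs on its own; substitute the
# envelope types of whichever library you use to read the file.
EnvelopePair = namedtuple("EnvelopePair", ["mz", "intensity"])
Envelope = namedtuple("Envelope", ["pairs"])


def decode_envelopes(array):
    '''
    Split the flat isotopic envelopes array into a list of envelopes.

    Arguments
    ---------
    array: float32 array
        Consecutive (m/z, intensity) pairs, where a (0, 0) pair marks
        the beginning of a new envelope.
    '''
    envelope_list = []
    current_envelope = []
    i = 0
    n = len(array)
    while i < n:
        # fetch the next two values
        mz = array[i]
        intensity = array[i + 1]
        i += 2

        # if both numbers are zero, this denotes the beginning
        # of a new envelope
        if mz == 0 and intensity == 0:
            # close the envelope collected so far, skipping empty ones
            if current_envelope:
                envelope_list.append(Envelope(current_envelope))
            current_envelope = []
        # otherwise add the current point to the existing envelope
        else:
            current_envelope.append(EnvelopePair(mz, intensity))
    # the last envelope has no trailing separator, so add it here
    envelope_list.append(Envelope(current_envelope))
    return envelope_list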
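
The flat layout is easier to see from the writing side. The following sketch is the inverse operation inferred from the decoding logic above, not code taken from the tool; the name `encode_envelopes` is hypothetical, and each envelope is assumed to be a sequence of (m/z, intensity) pairs.

```python
def encode_envelopes(envelopes):
    """Flatten envelopes into one list, starting each with a (0, 0) marker."""
    flat = []
    for envelope in envelopes:
        # a pair of zeros denotes the beginning of a new envelope
        flat.extend((0.0, 0.0))
        for mz, intensity in envelope:
            flat.extend((mz, intensity))
    return flat


envelopes = [
    [(500.25, 1000.0), (500.75, 600.0)],  # two isotopic peaks for the first reported peak
    [(800.5, 300.0)],                     # a single isotopic peak for the second
]
print(encode_envelopes(envelopes))
# [0.0, 0.0, 500.25, 1000.0, 500.75, 600.0, 0.0, 0.0, 800.5, 300.0]
```

Because each envelope contributes a variable number of pairs plus a marker, the length of this array does not match the length of the m/z, intensity, or charge arrays.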

Bibliography

[Senko1995]

Senko, M. W., Beu, S. C., & McLafferty, F. W. (1995). Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. Journal of the American Society for Mass Spectrometry, 6(4), 229–233. https://doi.org/10.1016/1044-0305(95)00017-8

[Martens2011]

Martens, L., Chambers, M., Sturm, M., Kessner, D., Levander, F., Shofstahl, J., … Deutsch, E. W. (2011). mzML–a community standard for mass spectrometry data. Molecular & Cellular Proteomics : MCP, 10(1), R110.000133. https://doi.org/10.1074/mcp.R110.000133