Searching a Processed Sample with a Glycan Database

Match features from a deconvoluted LC-MS or LC-MS/MS data file with released glycan compositions from a glycan hypothesis (see Combinatorial, Text, and glySpace glycan database construction methods).

glycresoft analyze search-glycan

Identify glycan compositions from preprocessed LC-MS data, stored in mzML format.

glycresoft analyze search-glycan [OPTIONS] DATABASE_CONNECTION SAMPLE_PATH
                                 HYPOTHESIS_IDENTIFIER

Options

-m, --mass-error-tolerance <relative mass error>: Mass accuracy constraint for matching, given as a relative error (1e-05 is 10 parts-per-million). [default: 1e-05]

-mn, --msn-mass-error-tolerance <relative mass error>: Mass accuracy constraint for matching MS^n ions, given as a relative error (2e-05 is 20 parts-per-million). [default: 2e-05]

-g, --grouping-error-tolerance <relative mass error>: Mass accuracy constraint for grouping chromatograms, given as a relative error (1.5e-05 is 15 parts-per-million). [default: 1.5e-05]

-n, --analysis-name <string>: Name for the analysis to be performed.

-a, --mass_shift <string>: Adducts to consider. Specify a name or formula, and a multiplicity. (May specify more than once)

--mass_shift-combination-limit <int>: Maximum number of mass shift combinations to consider. [default: 8]

-d, --minimum-mass <float>: The minimum mass at which to consider signal. [default: 500.0]

-o, --output-path <string>: Path to write the resulting analysis to. [required]

-f, --ms1-scoring-feature <choice>: Additional features to include in evaluating chromatograms (May specify more than once)

Choices: [null-charge; permethylated-ammonium-adducts; methyl-loss; permethylated-ammonium-adducts-methyl-loss; formate-adduct-model]

-r, --regularize <regularization parameter>: Apply Laplacian regularization. Pass a weight, or “grid” to choose the weight by grid search, or a pair of values separated by a / to specify a weight or grid search for model fitting and a separate weight for scoring.

-w, --regularization-model-path <path>: Path to a file containing a neighborhood model for regularization

-k, --network-path <path>: Path to a file containing the glycan composition network and neighborhood rules

-t, --delta-rt <float>: The maximum time between observed data points before splitting features [default: 0.5]

--export <choice>: Export command to run after the search is complete (May specify more than once)

Choices: [csv; glycan-list; html; model]

-s, --require-msms-signature <float>: Minimum oxonium ion signature score required in MS/MS scans for a feature to be included. [default: 0.0]

-p, --processes <int>: Number of worker processes to use. Defaults to 4 or the number of CPUs, whichever is lower [default: 4]
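The three error tolerance options above take a relative mass error rather than a number in parts-per-million. The sketch below shows the arithmetic only; the function names and the example masses are hypothetical and are not part of glycresoft.

```python
def ppm_to_relative(ppm):
    """Convert parts-per-million to the relative form the options expect."""
    return ppm / 1e6


def within_tolerance(observed_mass, theoretical_mass, tolerance=1e-05):
    """Return True if the relative mass error does not exceed the tolerance."""
    relative_error = abs(observed_mass - theoretical_mass) / theoretical_mass
    return relative_error <= tolerance


# Illustrative masses only, not taken from a real glycan.
theoretical = 1000.0
for observed in (1000.009, 1000.011):
    print(observed, within_tolerance(observed, theoretical))
```

With a theoretical mass of 1000.0, an observation 9 ppm away is accepted by the default matching tolerance and one 11 ppm away is rejected.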

Arguments

DATABASE_CONNECTION: Required argument <databaseconnectionparam>. A connection URI for a database, or a path on the file system.

SAMPLE_PATH: Required argument <path>. The path to the deconvoluted sample file.

HYPOTHESIS_IDENTIFIER: Required argument <string>. The ID number or name of the glycan hypothesis to use.

Usage Example

$ glycresoft analyze search-glycan -a Formate 1 -o agp-native-results.db\
    ../hypothesis/native-n-glycans.db path/to/sample.preprocessed.mzML 1\
    --export csv
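When the search is launched from a script, the same command can be assembled as an argument list. This is a minimal sketch: the helper name build_search_glycan_command is hypothetical, and it uses only the options and arguments shown in the example above.

```python
import shlex


def build_search_glycan_command(database, sample, hypothesis, output_path,
                                mass_shifts=(), exports=()):
    """Assemble an argument list for `glycresoft analyze search-glycan`."""
    command = ["glycresoft", "analyze", "search-glycan"]
    # Each mass shift is a (name or formula, multiplicity) pair passed via -a.
    for name, multiplicity in mass_shifts:
        command += ["-a", name, str(multiplicity)]
    command += ["-o", output_path]
    for export in exports:
        command += ["--export", export]
    # Positional arguments: DATABASE_CONNECTION SAMPLE_PATH HYPOTHESIS_IDENTIFIER
    command += [database, sample, str(hypothesis)]
    return command


command = build_search_glycan_command(
    "../hypothesis/native-n-glycans.db",
    "path/to/sample.preprocessed.mzML",
    1,
    "agp-native-results.db",
    mass_shifts=[("Formate", 1)],
    exports=["csv"],
)
print(shlex.join(command))
# To execute it, pass the list to subprocess.run(command, check=True).
```

Building a list rather than a single string avoids shell quoting problems with formulas such as "C-1H-2".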

Adducts

Adducts are mass shifts that may represent alternative charge carriers such as formate or sodium, or chemical defects such as water loss or incomplete permethylation. The software internally refers to these as “mass shifts”.

Adducts are considered combinatorially, so if you were to pass -a Ammonium 3 and -a "C-1H-2" 1 to indicate up to three ammonium adducts and up to one incomplete permethylation, the program would search for the following combinations:

0 Ammonium, 0 C-1H-2	1 Ammonium, 0 C-1H-2	2 Ammonium, 0 C-1H-2	3 Ammonium, 0 C-1H-2
0 Ammonium, 1 C-1H-2	1 Ammonium, 1 C-1H-2	2 Ammonium, 1 C-1H-2	3 Ammonium, 1 C-1H-2

At this time, adduction models do not have any interaction with charge state.
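The combinatorial expansion described above can be sketched as a Cartesian product over the allowed counts of each mass shift. The function name mass_shift_combinations is hypothetical, and the sketch does not model how --mass_shift-combination-limit is applied.

```python
from itertools import product


def mass_shift_combinations(mass_shifts):
    """Enumerate every count combination for (name, max multiplicity) pairs."""
    names = [name for name, _ in mass_shifts]
    # Each mass shift may occur 0, 1, ..., limit times.
    count_ranges = [range(limit + 1) for _, limit in mass_shifts]
    return [dict(zip(names, counts)) for counts in product(*count_ranges)]


combinations = mass_shift_combinations([("Ammonium", 3), ("C-1H-2", 1)])
for combination in combinations:
    print(", ".join(f"{count} {name}" for name, count in combination.items()))
```

For the two mass shifts in the example this yields the eight combinations listed above.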

Network Regularization

Apply network smoothing by Laplacian regularization to the glycan composition identification scores. This procedure is described in detail in “Klein, J., Carvalho, L., & Zaia, J. (2018). Application of network smoothing to glycan LC-MS profiling. Bioinformatics, 34(20), 3511-3518. https://doi.org/10.1093/bioinformatics/bty397”. By default, the network is the full set of glycan compositions in the hypothesis, with edges between compositions whose composition distance is less than or equal to 1, and with neighborhoods defined for N-glycans.
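To give a feel for what smoothing over such a network does, here is a generic Laplacian smoothing sketch. It is not the model from the cited paper, which also uses neighborhoods. It assumes that composition distance is the sum of absolute differences in monosaccharide counts, and every function name is hypothetical.

```python
def composition_distance(a, b):
    """Assumed metric: sum of absolute differences in monosaccharide counts."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)


def build_neighbors(compositions, max_distance=1):
    """Connect every pair of compositions within max_distance of each other."""
    neighbors = {i: [] for i in range(len(compositions))}
    for i in range(len(compositions)):
        for j in range(i + 1, len(compositions)):
            if composition_distance(compositions[i], compositions[j]) <= max_distance:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def laplacian_smooth(scores, neighbors, weight, iterations=200):
    """Minimise sum_i (x_i - s_i)^2 + weight * sum_edges (x_i - x_j)^2.

    Solved by Jacobi iteration on (I + weight * L) x = s, where L is the
    graph Laplacian of the network.
    """
    smoothed = list(scores)
    for _ in range(iterations):
        smoothed = [
            (scores[i] + weight * sum(smoothed[j] for j in neighbors[i]))
            / (1.0 + weight * len(neighbors[i]))
            for i in range(len(scores))
        ]
    return smoothed


# Three illustrative compositions forming a chain in the network.
compositions = [
    {"Hex": 5, "HexNAc": 4},
    {"Hex": 5, "HexNAc": 4, "NeuAc": 1},
    {"Hex": 5, "HexNAc": 4, "NeuAc": 2},
]
neighbors = build_neighbors(compositions)
smoothed = laplacian_smooth([1.0, 0.0, 1.0], neighbors, weight=1.0)
print(smoothed)
```

The middle composition has no evidence of its own but borrows score from its two well-supported neighbors, while their scores are pulled down slightly. A larger weight strengthens this effect.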

MS/MS Signatures

This tool is designed to annotate putative glycan compositions from LC-MS data, which can produce many spurious matches. If your data contain MS/MS scans, passing a non-zero value to --require-msms-signature causes the program to include only features which contain MS/MS scans that look “glycan-like”. Here “glycan-like” means containing abundant peaks which have masses derived from mono-, di-, or tri-saccharide losses. The value of this parameter is the minimum ratio score required.
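One way to picture such a ratio score is the intensity of the strongest signature peak divided by the intensity of the base peak. The sketch below rests on that assumption and is not glycresoft's actual scoring function. The m/z values are common singly protonated oxonium ions chosen for illustration, and all names are hypothetical.

```python
# Illustrative signature m/z values; the list glycresoft uses may differ.
SIGNATURE_MZ = {
    "Hex": 163.0601,
    "HexNAc": 204.0867,
    "NeuAc-H2O": 274.0921,
    "NeuAc": 292.1027,
    "HexHexNAc": 366.1395,
}


def signature_ratio(peaks, tolerance=2e-05):
    """Ratio of the strongest signature peak to the base peak.

    peaks is a list of (m/z, intensity) pairs from one MS/MS scan.
    """
    if not peaks:
        return 0.0
    base_intensity = max(intensity for _, intensity in peaks)
    if base_intensity <= 0:
        return 0.0
    best = 0.0
    for mz, intensity in peaks:
        for target in SIGNATURE_MZ.values():
            if abs(mz - target) / target <= tolerance:
                best = max(best, intensity)
    return best / base_intensity


def passes_signature_filter(scans, minimum_ratio):
    """Keep a feature if any of its MS/MS scans reaches the minimum ratio."""
    if minimum_ratio <= 0:
        return True  # a threshold of 0.0 disables the filter in this sketch
    return any(signature_ratio(peaks) >= minimum_ratio for peaks in scans)


# One made-up scan: a HexNAc peak at half the base peak intensity.
scan = [(204.0867, 500.0), (366.1395, 250.0), (650.0, 1000.0)]
print(signature_ratio(scan))
print(passes_signature_filter([scan], 0.3), passes_signature_filter([scan], 0.6))
```

Under this assumed definition, raising the threshold demands that signature ions dominate the spectrum, which removes more spurious features at the cost of dropping weakly fragmented glycans.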
