Searching a Processed Sample with a Glycan Database
Match features from a deconvoluted LC-MS or LC-MS/MS data file with released glycan compositions from a glycan hypothesis (see Combinatorial, Text, and glySpace glycan database construction methods).
glycresoft analyze search-glycan
- Identify glycan compositions from preprocessed LC-MS data, stored in mzML format.
glycresoft analyze search-glycan [OPTIONS] DATABASE_CONNECTION SAMPLE_PATH HYPOTHESIS_IDENTIFIER
Options
- -m, --mass-error-tolerance <relative mass error>
Mass accuracy constraint for matching, given as a relative error (1e-05 is 10 parts-per-million). [default: 1e-05]
- -mn, --msn-mass-error-tolerance <relative mass error>
Mass accuracy constraint for matching MS^n ions, given as a relative error (2e-05 is 20 parts-per-million). [default: 2e-05]
- -g, --grouping-error-tolerance <relative mass error>
Mass accuracy constraint for grouping chromatograms, given as a relative error (1.5e-05 is 15 parts-per-million). [default: 1.5e-05]
- -n, --analysis-name <string>
Name for analysis to be performed.
- -a, --mass_shift <string>
Adducts to consider. Specify name or formula, and a multiplicity. (May specify more than once)
- --mass_shift-combination-limit <int>
Maximum number of mass_shift combinations to consider [default: 8]
- -d, --minimum-mass <float>
The minimum mass to consider signal at. [default: 500.0]
- -o, --output-path <string>
Path to write resulting analysis to. [required]
- -f, --ms1-scoring-feature <choice>
Additional features to include in evaluating chromatograms (May specify more than once)
Choices: [null-charge; permethylated-ammonium-adducts; methyl-loss; permethylated-ammonium-adducts-methyl-loss; formate-adduct-model]
- -r, --regularize <regularization parameter>
Apply Laplacian regularization. Pass a numeric weight, or “grid” to select the weight by grid search. A pair of values separated by a / specifies a weight or “grid” for model fitting, followed by a separate weight for scoring.
- -w, --regularization-model-path <path>
Path to a file containing a neighborhood model for regularization
- -k, --network-path <path>
Path to a file containing the glycan composition network and neighborhood rules
- -t, --delta-rt <float>
The maximum time between observed data points before splitting features [default: 0.5]
- --export <choice>
Export command to run after the search is complete (May specify more than once)
Choices: [csv; glycan-list; html; model]
- -s, --require-msms-signature <float>
Minimum oxonium ion signature required in MS/MS scans to include. [default: 0.0]
- -p, --processes <int>
Number of worker processes to use. Defaults to 4 or the number of CPUs, whichever is lower [default: 4]
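The three mass-error tolerance options above take relative errors rather than absolute mass differences. The following is a minimal sketch of what such a check looks like, not the tool's actual implementation. The function names and the example masses are hypothetical, and computing the error relative to the theoretical mass is an assumption.

```python
def ppm_to_relative(ppm):
    """Convert a parts-per-million value to a relative error (10 ppm -> 1e-05)."""
    return ppm / 1e6


def within_tolerance(observed_mass, theoretical_mass, tolerance=1e-5):
    """Return True if the relative mass error is within `tolerance`.

    Assumption: the error is measured relative to the theoretical mass.
    """
    relative_error = (observed_mass - theoretical_mass) / theoretical_mass
    return abs(relative_error) <= tolerance


# Illustrative masses only, not taken from any real sample.
theoretical = 1640.5870
close_match = within_tolerance(1640.6000, theoretical)  # about 7.9 ppm off
far_match = within_tolerance(1640.6100, theoretical)    # about 14 ppm off
print(close_match, far_match)
```

With the default of 1e-05, the first observation falls inside the window and the second does not.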
Arguments
- DATABASE_CONNECTION
Required argument <databaseconnectionparam> A connection URI for a database, or a path on the file system
- SAMPLE_PATH
Required argument <path> The path to the deconvoluted sample file
- HYPOTHESIS_IDENTIFIER
Required argument <string> The ID number or name of the glycan hypothesis to use
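The --delta-rt option listed above splits a feature wherever consecutive observations are too far apart in time. As a rough illustration of that idea only, the sketch below splits a time-sorted list of points at any gap larger than the threshold. The function name and the (time, intensity) representation are hypothetical, and the time unit is assumed to be whatever unit the sample file uses.

```python
def split_on_time_gaps(points, delta_rt=0.5):
    """Split time-sorted (retention_time, intensity) points at large gaps.

    A new segment starts whenever the gap to the previous point
    exceeds `delta_rt`.
    """
    segments = []
    current = []
    for time, intensity in points:
        if current and time - current[-1][0] > delta_rt:
            segments.append(current)
            current = []
        current.append((time, intensity))
    if current:
        segments.append(current)
    return segments


# Illustrative data: the gap between 10.4 and 11.5 exceeds 0.5.
points = [(10.0, 1.0), (10.2, 2.0), (10.4, 3.0), (11.5, 4.0), (11.7, 5.0)]
segments = split_on_time_gaps(points)
print(len(segments))
```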
Usage Example
$ glycresoft analyze search-glycan -a Formate 1 -o agp-native-results.db \
    ../hypothesis/native-n-glycans.db path/to/sample.preprocessed.mzML 1 \
    --export csv
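When the search is launched from a script, it can help to assemble the argument list programmatically. The helper below is a hypothetical convenience wrapper, not part of glycresoft. It uses only the options documented above and reproduces the same invocation as the usage example.

```python
import shlex


def build_search_glycan_command(database, sample, hypothesis, output_path,
                                mass_shifts=(), exports=()):
    """Build an argv list for `glycresoft analyze search-glycan`.

    `mass_shifts` is a sequence of (name_or_formula, multiplicity) pairs,
    each emitted as a separate -a option.
    """
    argv = ["glycresoft", "analyze", "search-glycan"]
    for name, multiplicity in mass_shifts:
        argv += ["-a", name, str(multiplicity)]
    argv += ["-o", output_path]
    for export_format in exports:
        argv += ["--export", export_format]
    argv += [database, sample, str(hypothesis)]
    return argv


argv = build_search_glycan_command(
    "../hypothesis/native-n-glycans.db",
    "path/to/sample.preprocessed.mzML",
    1,
    "agp-native-results.db",
    mass_shifts=[("Formate", 1)],
    exports=["csv"],
)
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where glycresoft is installed.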
Adducts
Adducts are mass shifts that may represent alternative charge carriers such as formate or sodium, or chemical defects such as water loss or incomplete permethylation. The software internally refers to these as “mass shifts”.
Adducts are considered combinatorially, so if you were to pass -a Ammonium 3 and -a "C-1H-2" 1 to indicate up to three ammonium adducts and up to one incomplete permethylation, the program would search for the following combinations:
0 Ammonium, 0 C-1H-2
1 Ammonium, 0 C-1H-2
2 Ammonium, 0 C-1H-2
3 Ammonium, 0 C-1H-2
0 Ammonium, 1 C-1H-2
1 Ammonium, 1 C-1H-2
2 Ammonium, 1 C-1H-2
3 Ammonium, 1 C-1H-2
At this time, adduction models do not have any interaction with charge state.
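The combinatorial expansion described above is a Cartesian product over the allowed counts of each adduct. The sketch below illustrates that enumeration. The function name is hypothetical, and the code is not taken from glycresoft itself; it also ignores any limit on the number of combinations.

```python
from itertools import product


def mass_shift_combinations(limits):
    """Enumerate every count combination from 0 up to each adduct's maximum.

    `limits` maps an adduct name or formula to its maximum multiplicity.
    """
    names = list(limits)
    ranges = [range(limits[name] + 1) for name in names]
    return [dict(zip(names, counts)) for counts in product(*ranges)]


combos = mass_shift_combinations({"Ammonium": 3, "C-1H-2": 1})
for combo in combos:
    print(", ".join(f"{count} {name}" for name, count in combo.items()))
```

For the Ammonium and C-1H-2 example this yields the same eight combinations, though not necessarily in the same order.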
Network Regularization
Apply network smoothing by Laplacian regularization to the glycan composition identification scores. This procedure is described in detail in Klein, J., Carvalho, L., & Zaia, J. (2018). “Application of network smoothing to glycan LC-MS profiling.” Bioinformatics, 34(20), 3511-3518. https://doi.org/10.1093/bioinformatics/bty397. By default, the network is the full set of glycan compositions in the hypothesis, with edges between compositions whose composition-distance is less than or equal to 1 and with neighborhoods defined for N-glycans.
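To make the default edge rule concrete, the sketch below connects compositions whose distance is at most 1. It assumes that composition-distance is the sum of absolute differences in monosaccharide counts, which may not match the tool's exact definition. The function names and the example compositions are hypothetical.

```python
from itertools import combinations


def composition_distance(a, b):
    """Sum of absolute differences in monosaccharide counts (an assumed metric)."""
    keys = set(a) | set(b)
    return sum(abs(a.get(key, 0) - b.get(key, 0)) for key in keys)


def build_edges(compositions, max_distance=1):
    """Return name pairs for compositions within `max_distance` of each other."""
    edges = []
    for (name_a, comp_a), (name_b, comp_b) in combinations(compositions.items(), 2):
        if composition_distance(comp_a, comp_b) <= max_distance:
            edges.append((name_a, name_b))
    return edges


# Illustrative compositions only.
compositions = {
    "Hex5HexNAc4": {"Hex": 5, "HexNAc": 4},
    "Hex5HexNAc4NeuAc1": {"Hex": 5, "HexNAc": 4, "NeuAc": 1},
    "Hex5HexNAc4NeuAc2": {"Hex": 5, "HexNAc": 4, "NeuAc": 2},
    "Hex6HexNAc5": {"Hex": 6, "HexNAc": 5},
}
edges = build_edges(compositions)
print(edges)
```

Under this metric, compositions that differ by a single sialic acid are neighbors, while Hex6HexNAc5 is two steps away from Hex5HexNAc4 and gets no edge.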
MS/MS Signatures
Although this tool is designed to annotate putative glycan compositions from LC-MS data, doing so without MS/MS evidence can produce many spurious matches. If your data contain MS/MS scans, passing a non-zero value to --require-msms-signature causes the program to include only features containing MS/MS scans that look “glycan-like”. Here “glycan-like” means containing abundant peaks whose masses are derived from mono-, di-, or tri-saccharide losses. The value of this parameter sets the minimum ratio score.
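The exact definition of the ratio score is not given here. As one plausible illustration, the sketch below scores a scan by the intensity of its strongest oxonium ion peak relative to the base peak. This scoring rule, the function name, and the short ion list (common singly protonated oxonium ions of native glycans) are all assumptions, not the tool's actual method.

```python
# Assumed reference m/z values for common [M+H]+ oxonium ions of native glycans.
OXONIUM_MZ = {
    "Hex": 163.0601,
    "HexNAc": 204.0867,
    "HexHexNAc": 366.1395,
}


def signature_ratio(peaks, tolerance=2e-5):
    """Return strongest oxonium peak intensity divided by base peak intensity.

    `peaks` is a list of (mz, intensity) pairs; `tolerance` is a relative error.
    """
    if not peaks:
        return 0.0
    base_intensity = max(intensity for _, intensity in peaks)
    if base_intensity <= 0:
        return 0.0
    best = 0.0
    for mz, intensity in peaks:
        for reference in OXONIUM_MZ.values():
            if abs(mz - reference) / reference <= tolerance:
                best = max(best, intensity)
    return best / base_intensity


# Illustrative scan: HexNAc oxonium peak at 80% of the base peak.
scan = [(204.0866, 800.0), (366.1393, 500.0), (500.2500, 1000.0)]
score = signature_ratio(scan)
print(score >= 0.5)
```

A feature would then be kept only if at least one of its MS/MS scans scores at or above the threshold passed to --require-msms-signature.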