Searching a Processed Sample with a Glycopeptide Database

The end goal of these tools is to identify glycopeptides from experimental data. Once you have built a glycopeptide database and deconvoluted an LC-MS/MS data file, you are ready to do just that.

glycresoft analyze search-glycopeptide

Identify glycopeptide sequences from processed LC-MS/MS data. This algorithm requires a fully materialized cross-product database (the default), and uses a reverse-peptide decoy by default, evaluated on the total score.

For a search algorithm that applies separate FDR control to the peptide and the glycan, see search-glycopeptide-multipart.

glycresoft analyze search-glycopeptide [OPTIONS] DATABASE_CONNECTION
                                       SAMPLE_PATH HYPOTHESIS_IDENTIFIER

Options

-m, --mass-error-tolerance <relative mass error>

Mass accuracy constraint for matching MS^1 ions, given as a relative error (1e-05 is 10 parts-per-million). [default: 1e-05]

-mn, --msn-mass-error-tolerance <relative mass error>

Mass accuracy constraint for matching MS^n ions, given as a relative error (2e-05 is 20 parts-per-million). [default: 2e-05]

-g, --grouping-error-tolerance <relative mass error>

Mass accuracy constraint for grouping chromatograms, given as a relative error (1.5e-05 is 15 parts-per-million). [default: 1.5e-05]

-n, --analysis-name <string>

Name for analysis to be performed.

-q, --psm-fdr-threshold <float>

FDR threshold used to filter glycopeptide-spectrum matches (GPSMs) when selecting identified glycopeptides [default: 0.05]

-s, --tandem-scoring-model <choice or>

Select a scoring function to use for evaluating glycopeptide-spectrum matches [default: coverage_weighted_binomial]

Choices: [penalized_log_intensty; log_intensity; simple; coverage_weighted_binomial; peptide_only_cw_binomial; binomial]

-x, --oxonium-threshold <float>

Minimum HexNAc-derived oxonium ion abundance ratio required to keep an MS/MS scan. [default: 0.05]

-a, --adduct <string>

Adducts to consider. Specify name or formula, and a multiplicity. (May specify more than once)

-f, --use-peptide-mass-filter

Filter putative spectrum matches by estimating the peptide backbone mass from the precursor mass and stub glycopeptide signature ions [default: False]

-p, --processes <int>

Number of worker processes to use. Defaults to 4 or the number of CPUs, whichever is lower [default: 4]

--export <choice>

Export command to run after the search is complete (May specify more than once)

Choices: [csv; html; psm-csv]

-o, --output-path <path>

Path to write resulting analysis to. [required]

-w, --workload-size <int>

Number of spectra to process at once [default: 500]

--save-intermediate-results <path>

Save intermediate spectrum matches to a file

--maximum-mass <float>

[default: inf]

-D, --decoy-database-connection <string>

Provide an alternative hypothesis to draw decoy glycopeptides from instead of the simpler reversed-peptide decoy. This is especially necessary when the stub peptide+Y ions account for a large fraction of MS2 signal.

-G, --permute-decoy-glycan-fragments

Whether or not to permute decoy glycopeptides’ peptide+Y ions. The intact mass, peptide, and peptide+Y1 ions are unchanged. [default: False]

--isotope-probing-range <int>

The maximum number of isotopic peak errors to allow when searching for untrusted precursor masses [default: 3]

-R, --rare-signatures

Look for rare signature ions when scoring glycan oxonium signature [default: False]

Arguments

DATABASE_CONNECTION

Required argument <databaseconnectionparam> A connection URI for a database, or a path on the file system

SAMPLE_PATH

Required argument <path> The path to the deconvoluted sample file

HYPOTHESIS_IDENTIFIER

Required argument <string> The ID number or name of the glycopeptide hypothesis to use

Usage Example

$ glycresoft analyze search-glycopeptide -m 5e-6 -mn 1e-5 fasta-glycopeptides.db path/to/processed/sample.mzML 1 \
     -o "agp-glycopeptides-in-sample.db"
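
The tolerances in the example above are relative errors, not parts-per-million values, so 5e-6 stands for 5 ppm and 1e-5 for 10 ppm. The sketch below shows one way to do that conversion in a shell script and print the resulting command for review rather than running it. The helper name ppm_to_relative and the variable names are illustrative assumptions, not part of glycresoft, and it is assumed that the tool accepts the "5e-06" notation that awk prints.

```shell
# Convert a tolerance given in parts-per-million to a relative error.
# ppm_to_relative is a hypothetical helper, not a glycresoft command.
ppm_to_relative() {
    awk -v ppm="$1" 'BEGIN { printf "%g\n", ppm / 1000000 }'
}

ms1_tol=$(ppm_to_relative 5)    # 5 ppm for MS^1 matching
msn_tol=$(ppm_to_relative 10)   # 10 ppm for MS^n matching

# Print the command instead of executing it, so it can be checked first.
echo glycresoft analyze search-glycopeptide \
    -m "$ms1_tol" -mn "$msn_tol" \
    fasta-glycopeptides.db path/to/processed/sample.mzML 1 \
    -o agp-glycopeptides-in-sample.db
```

Remove the leading echo to run the search once the printed command looks right.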

glycresoft analyze search-glycopeptide-multipart

glycresoft analyze search-glycopeptide-multipart [OPTIONS] DATABASE_CONNECTION
                                                 DECOY_DATABASE_CONNECTION
                                                 SAMPLE_PATH

Options

-T, --target-hypothesis-identifier <int>

The ID number or name of the target glycopeptide hypothesis to use [default: 1]

-D, --decoy-hypothesis-identifier <int>

The ID number or name of the decoy glycopeptide hypothesis to use [default: 1]

-M, --memory-database-index

Whether to load the entire peptide database into memory during spectrum mapping. Uses more memory but substantially accelerates the process [default: False]

-m, --mass-error-tolerance <relative mass error>

Mass accuracy constraint for matching MS^1 ions, given as a relative error (1e-05 is 10 parts-per-million). [default: 1e-05]

-mn, --msn-mass-error-tolerance <relative mass error>

Mass accuracy constraint for matching MS^n ions, given as a relative error (2e-05 is 20 parts-per-million). [default: 2e-05]

-g, --grouping-error-tolerance <relative mass error>

Mass accuracy constraint for grouping chromatograms, given as a relative error (1.5e-05 is 15 parts-per-million). [default: 1.5e-05]

-n, --analysis-name <string>

Name for analysis to be performed.

-q, --psm-fdr-threshold <float>

FDR threshold used to filter glycopeptide-spectrum matches (GPSMs) when selecting identified glycopeptides [default: 0.05]

-f, --fdr-estimation-strategy <choice>

The FDR estimation strategy to use. joint uses both peptide and glycan scores, peptide uses only peptide scores, glycan uses only glycan scores, and any uses the smallest FDR of the joint, peptide, and glycan estimates. [default: joint]

Choices: [joint; peptide; glycan; any]

-s, --tandem-scoring-model <choice or>

Select a scoring function to use for evaluating glycopeptide-spectrum matches [default: log_intensity]

Choices: [log_intensity; simple; penalized_log_intensty]

-y, --glycan-score-threshold <float>

The minimum glycan score required to consider a peptide mass [default: 1.0]

-a, --adduct <string>

Adducts to consider. Specify name or formula, and a multiplicity. (May specify more than once)

-p, --processes <int>

Number of worker processes to use. Defaults to 4 or the number of CPUs, whichever is lower [default: 4]

--export <choice>

Export command to run after the search is complete (May specify more than once)

Choices: [csv; html; psm-csv]

-o, --output-path <path>

Path to write resulting analysis to. [required]

-w, --workload-size <int>

Number of spectra to process at once [default: 100]

-R, --rare-signatures

Look for rare signature ions when scoring glycan oxonium signature [default: False]

--isotope-probing-range <int>

The maximum number of isotopic peak errors to allow when searching for untrusted precursor masses [default: 3]

-S, --glycoproteome-smoothing-model <path>

Path to a glycoproteome site-specific glycome model

Arguments

DATABASE_CONNECTION

Required argument <databaseconnectionparam> A connection URI for a database, or a path on the file system

DECOY_DATABASE_CONNECTION

Required argument <databaseconnectionparam> A connection URI for a database, or a path on the file system

SAMPLE_PATH

Required argument <path> The path to the deconvoluted sample file
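
Usage Example

This page gives no example for search-glycopeptide-multipart, so the following is a sketch built only from the synopsis and options listed above. The database and output file names are hypothetical placeholders, and the command is assembled as text and printed rather than executed.

```shell
# Hypothetical file names; substitute your own target and decoy databases.
target_db=target-glycopeptides.db
decoy_db=decoy-glycopeptides.db
sample=path/to/processed/sample.mzML

# Assemble the command as text so it can be inspected before running it.
# Options come first, then the three positional arguments in synopsis order.
cmd="glycresoft analyze search-glycopeptide-multipart \
-T 1 -D 1 -f joint -m 5e-6 -mn 1e-5 \
--export csv --export psm-csv \
-o multipart-results.db \
$target_db $decoy_db $sample"

echo "$cmd"
```

Here -T and -D select hypothesis 1 in the target and decoy databases respectively, and --export is given twice because it may be specified more than once.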

Memory Consumption and Workload Size

Caching and work-sharing are used extensively to keep searches of very large databases tractable. If you run out of memory during a search, consider reducing the -w (--workload-size) parameter.
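
As a rough illustration of shrinking -w, the sketch below halves the documented search-glycopeptide default of 500 and prints a command that uses the smaller value. The helper halve_workload and its floor of 25 spectra are arbitrary assumptions for this example, not glycresoft behavior or a recommended setting.

```shell
# Halve a workload size, never going below a floor of 25 spectra.
# halve_workload and the floor are assumptions made for this sketch.
halve_workload() {
    half=$(( $1 / 2 ))
    if [ "$half" -lt 25 ]; then
        half=25
    fi
    echo "$half"
}

w=500                       # documented default for search-glycopeptide
w=$(halve_workload "$w")    # try half as many spectra per batch

# Print the command instead of executing it.
echo glycresoft analyze search-glycopeptide -w "$w" \
    fasta-glycopeptides.db path/to/processed/sample.mzML 1 \
    -o agp-glycopeptides-in-sample.db
```

If the search still runs out of memory, the same step can be repeated with the new value of w.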