Searching a Processed Sample with a Glycopeptide Database

The end goal of these tools is to identify glycopeptides from experimental data. After you’ve constructed a glycopeptide database and deconvoluted an LC-MS/MS data file, you’re ready to do just that.

Memory Consumption and Workload Size

The search makes extensive use of caching and work-sharing to keep searches of enormous databases tractable. If you find you are running out of memory during a search, consider reducing the workload size with the -w parameter.

Build a Glycosite Network Smoothing Model

glycresoft analyze fit-glycoproteome-smoothing-model

glycresoft analyze fit-glycoproteome-smoothing-model [OPTIONS]

Options

-p, --processes <int>

Number of worker processes to use. Defaults to 4 or the number of CPUs, whichever is lower [default: 4]

-i, --analysis-path <string>

[required] (May specify more than once)

-o, --output-path <path>

[required]

-q, --fdr-threshold <float>

The FDR threshold to apply when selecting identified glycopeptides [default: 0.05]

-P, --glycopeptide-hypothesis <tuple>

-g, --glycan-hypothesis <tuple>

-u, --unobserved-penalty-scale <float>

A penalty to scale unobserved-but-suggested glycans by. Defaults to 1.0, no penalty. [default: 1.0]

-a, --smoothing-limit <float>

An upper bound on the network smoothness to use when estimating the posterior probability. [default: 0.2]

-r, --require-multiple-observations, --no-require-multiple-observations

Require a glycan/glycosite combination be observed in multiple samples to treat it as real. Defaults to False. [default: False]

-w, --network-path <path>

The path to a text file defining the glycan network and its neighborhoods, as produced by glycresoft build-hypothesis glycan-network. If omitted, the default human N-glycan network is used with the glycans defined in -g.
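
As a rough illustration of how the options above fit together, the following Python sketch assembles an argument list for the command without running it. The helper name build_smoothing_command and all file names are hypothetical. The -P and -g options are left out because their tuple format is not described here; a real invocation may need them.

```python
import shlex


def build_smoothing_command(analysis_paths, output_path,
                            fdr_threshold=0.05, processes=4,
                            smoothing_limit=0.2):
    """Assemble (but do not run) an argument list for the command.

    Hypothetical helper for illustration only; it uses just the flags
    listed in the options above and their documented defaults.
    """
    argv = ["glycresoft", "analyze", "fit-glycoproteome-smoothing-model"]
    # -i may be specified more than once, one flag per analysis
    for path in analysis_paths:
        argv += ["-i", path]
    argv += [
        "-o", output_path,
        "-q", str(fdr_threshold),
        "-p", str(processes),
        "-a", str(smoothing_limit),
    ]
    return argv


# Hypothetical file names, used only to show the shape of the command line
argv = build_smoothing_command(["sample-1.db", "sample-2.db"],
                               "smoothing-model.out")
print(shlex.join(argv))
```

Building the list first and printing it makes it easy to check the command line before passing it to a shell or to subprocess.run.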

Adducts

Unlike the glycan search tool, the glycopeptide search tool does not apply combinatorial expansion of adducts. It will not mix mass shifts of different types together, so if both Ammonium 2 and Na1H-1 1 are specified, the algorithm will search for 0, 1, or 2 Ammonium shifts and for 0 or 1 Na1H-1 shifts, but never for both types on the same ion. This keeps the search space tractable, and in the datasets tested, most multiply adducted ion species were low in abundance.
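
To make the difference concrete, here is a minimal Python sketch that enumerates the adduct states under both schemes for the example above. It reflects one reading of the behavior described in this section and is not the tool's actual implementation; the function names are made up for illustration.

```python
from itertools import product


def independent_shifts(adducts):
    """Non-combinatorial scheme: the unmodified state, plus each adduct
    type on its own at 1..max copies. Types are never mixed."""
    states = [{}]  # the empty dict stands for "no adduct"
    for name, max_count in adducts.items():
        for count in range(1, max_count + 1):
            states.append({name: count})
    return states


def combinatorial_shifts(adducts):
    """Full combinatorial expansion, shown only for comparison:
    every count of every type combined with every other."""
    names = list(adducts)
    count_ranges = [range(adducts[name] + 1) for name in names]
    return [
        {name: count for name, count in zip(names, counts) if count}
        for counts in product(*count_ranges)
    ]


# The example from the text: up to 2 Ammonium and up to 1 Na1H-1
adducts = {"Ammonium": 2, "Na1H-1": 1}
independent = independent_shifts(adducts)
combined = combinatorial_shifts(adducts)

print(len(independent), independent)
print(len(combined), combined)
```

For this example the non-combinatorial scheme yields 4 states and the full expansion yields 6. The gap grows multiplicatively as more adduct types are added, which is the search-space saving this section refers to.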