Envelope Scoring

The ms_deisotope.scoring module contains classes for evaluating the goodness-of-fit of isotopic pattern matches. The deconvoluters defined in ms_deisotope.deconvolution use it to decide which pattern fit is best. Each IsotopicFitterBase instance takes a score threshold (a float) along with any additional configuration arguments. The score threshold is used to filter out matches to noise.

When the isotopic patterns of the target data are of high quality and intensity can effectively filter out noise peaks, PenalizedMSDeconVFitter often works best. If the isotopic patterns are not of high quality but intensity can still discriminate noise, MSDeconVFitter is more forgiving. If intensity is not informative, either ScaledGTestFitter or LeastSquaresFitter may work.

Scoring Functions

class ms_deisotope.scoring.IsotopicFitterBase(score_threshold=0.5)

A base class for Isotopic Pattern Fitters: objects which, given a set of experimental peaks and a set of matching theoretical peaks, return a fit score describing how good the match is.

An IsotopicFitter may be optimal when the score is small (minimizing) or when the score is large (maximizing), and the appropriate FitSelectorBase type is used for selection. This is also reflected by is_maximizing().

__call__(*args, **kwargs)

Invokes evaluate()

Parameters

*args – Forwarded to evaluate()

**kwargs – Forwarded to evaluate()

Returns

The score

Return type

float

evaluate(peaklist, observed, expected, **kwargs)

Evaluate a pair of peak lists for goodness-of-fit.

Parameters

peaklist (PeakSet) – The full set of all experimental peaks

observed (list) – The list of experimental peaks that are part of this fit

expected (list) – The list of theoretical peaks that are part of this fit

**kwargs – Additional keyword arguments

Returns

The score

Return type

float

is_maximizing()

Whether or not this fitter’s score gets better as it grows

Returns

Whether or not this fitter is a maximizing fitter

Return type

bool

reject(fit)

Test whether this fit is too poor to be used

Parameters

fit (IsotopicFitRecord) – The fit to test

Returns

Whether or not the fit should be rejected

Return type

bool
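To make the interface above concrete, here is a minimal sketch of a fitter that follows the documented contract: calling the object forwards to evaluate(), is_maximizing() reports the score direction, and reject() applies the threshold. It is a hypothetical stand-alone class, not a subclass of the real IsotopicFitterBase. The Peak and FitRecord tuples and the toy absolute-difference score are illustrative assumptions, and the real reject() may delegate to a FitSelectorBase rather than compare against the threshold directly.

```python
from collections import namedtuple

# Hypothetical stand-ins: the real library passes FittedPeak/TheoreticalPeak
# objects and IsotopicFitRecord instances.
Peak = namedtuple("Peak", ["mz", "intensity"])
FitRecord = namedtuple("FitRecord", ["score"])


class SketchFitter:
    """Hypothetical fitter mirroring the documented IsotopicFitterBase methods."""

    def __init__(self, score_threshold=0.5):
        self.score_threshold = score_threshold

    def evaluate(self, peaklist, observed, expected, **kwargs):
        # Toy error score: total absolute intensity difference (smaller is better).
        # peaklist is accepted for interface parity but unused here.
        return sum(abs(o.intensity - e.intensity) for o, e in zip(observed, expected))

    def __call__(self, *args, **kwargs):
        # Calling the fitter forwards everything to evaluate().
        return self.evaluate(*args, **kwargs)

    def is_maximizing(self):
        # An error score improves as it shrinks, so this fitter is minimizing.
        return False

    def reject(self, fit):
        # Maximizing: scores that are too low are bad. Minimizing: too high is bad.
        if self.is_maximizing():
            return fit.score < self.score_threshold
        return fit.score > self.score_threshold


fitter = SketchFitter(score_threshold=0.5)
observed = [Peak(500.0, 0.6), Peak(500.5, 0.3)]
expected = [Peak(500.0, 0.5), Peak(500.5, 0.4)]
score = fitter(None, observed, expected)
print(round(score, 3), fitter.reject(FitRecord(score)))  # 0.2 False
```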

Minimizing Fitters

Minimizing envelope scoring methods aim to minimize some “fit error” criterion and discard solutions that don’t fit well. They tend to be quite sensitive and can be independent of the magnitude of the signal. Because the score need not reflect signal magnitude, a low-intensity noise pattern can fit as well as a real signal, so these methods do not handle detector noise or interference well. They can work well when targeting a list of compositions instead of exhaustively deconvoluting an entire spectrum.

class ms_deisotope.scoring.GTestFitter(score_threshold=0.5)

Evaluate an isotopic fit using a G-test

\[G = 2\sum_{i=1}^{n} o_i \left(\log o_i - \log e_i\right)\]

where \(o_i\) is the intensity of the ith experimental peak and \(e_i\) is the intensity of the ith theoretical peak.
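A direct Python transcription of the G statistic above, as a sketch that operates on plain lists of paired intensities rather than peak objects (an assumption made for brevity). It assumes every intensity is strictly positive, since \(\log 0\) is undefined.

```python
import math


def g_test(observed, expected):
    """G = 2 * sum(o_i * (log o_i - log e_i)) over paired intensities.

    Assumes both lists are the same length and every intensity is > 0.
    """
    total = 0.0
    for o, e in zip(observed, expected):
        total += o * (math.log(o) - math.log(e))
    return 2.0 * total


print(g_test([0.5, 0.3, 0.2], [0.5, 0.3, 0.2]))  # 0.0 for a perfect match
print(g_test([0.6, 0.4], [0.5, 0.5]) > 0)  # True
```

Because this version does not normalize, the statistic depends on the overall scale of the intensities and can even be negative when the experimental intensities are uniformly smaller than the theoretical ones; the normalization performed by ScaledGTestFitter avoids this.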

class ms_deisotope.scoring.ScaledGTestFitter

Evaluate an isotopic fit using a G-test after normalizing the list of experimental and theoretical peaks to both sum to 1.

\[G = 2\sum_{i=1}^{n} o_i \left(\log o_i - \log e_i\right)\]

where \(o_i\) is the intensity of the ith experimental peak and \(e_i\) is the intensity of the ith theoretical peak.
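The same statistic after normalizing both lists to sum to 1, again as a sketch over plain intensity lists rather than peak objects. The check at the end illustrates the property the minimizing-fitter discussion relies on: multiplying all experimental intensities by a constant leaves the score unchanged.

```python
import math


def scaled_g_test(observed, expected):
    """G-test after normalizing both intensity lists to sum to 1.

    Assumes equal-length lists of strictly positive intensities.
    """
    obs_total = sum(observed)
    exp_total = sum(expected)
    g = 0.0
    for o, e in zip(observed, expected):
        o_hat = o / obs_total  # normalized experimental intensity
        e_hat = e / exp_total  # normalized theoretical intensity
        g += o_hat * (math.log(o_hat) - math.log(e_hat))
    return 2.0 * g


# Scaling the experimental intensities by a constant does not change the score.
a = scaled_g_test([600.0, 400.0], [0.5, 0.5])
b = scaled_g_test([6.0, 4.0], [0.5, 0.5])
print(math.isclose(a, b))  # True
```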

class ms_deisotope.scoring.LeastSquaresFitter

Evaluate an isotopic fit using the least-squares coefficient of determination \(R^2\).

\[
\begin{aligned}
\hat{e}_i &= e_i / \max(e)\\
\hat{t}_i &= t_i / \max(t)\\
\hat{t} &= \sum_{i=1}^{n} \hat{t}_i^2\\
R^2 &= \frac{1}{\hat{t}} \sum_{i=1}^{n} \left(\hat{e}_i - \hat{t}_i\right)^2
\end{aligned}
\]

where \(e_i\) is the ith experimental peak intensity and \(t_i\) is the ith theoretical peak intensity
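A sketch of the normalized squared-error formula above over plain intensity lists (an assumption; the real fitter works on peak objects). A score of 0 means the experimental and theoretical patterns have identical shape after max-normalization, which is why this fitter is minimizing.

```python
def least_squares_score(experimental, theoretical):
    """Normalized squared error: sum((e_hat - t_hat)^2) / sum(t_hat^2).

    Each list is first divided by its own maximum; 0.0 means identical shape.
    """
    e_max = max(experimental)
    t_max = max(theoretical)
    e_hat = [e / e_max for e in experimental]
    t_hat = [t / t_max for t in theoretical]
    t_norm = sum(t * t for t in t_hat)
    return sum((e - t) ** 2 for e, t in zip(e_hat, t_hat)) / t_norm


print(least_squares_score([100.0, 50.0], [1.0, 0.5]))   # 0.0
print(least_squares_score([100.0, 100.0], [1.0, 0.5]))  # 0.2
```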

Maximizing Fitters

Maximizing envelope scoring methods aim to maximize some “goodness-of-fit” criterion and discard solutions that don’t score highly enough. While not universally true, many of the maximizing scoring functions here depend on the magnitude of the signal, which means the appropriate threshold is also signal magnitude-dependent. This has the advantage of letting the score threshold double as a detector noise filter, but it also makes the threshold dependent on the instrument type.

A maximizing fitter with a well-chosen threshold is good at exhaustively deconvoluting a spectrum because it can more easily eliminate bad fits. However, as an isotopic pattern becomes more homogeneous around its base peak, these fitters more easily misidentify the monoisotopic peak. This is where MSDeconVFitter is less accurate than PenalizedMSDeconVFitter, although neither is perfect.

class ms_deisotope.scoring.MSDeconVFitter(minimum_score=10, mass_error_tolerance=0.02)

An implementation of the scoring function used in MSDeconV

\[
\begin{aligned}
s_{mz}(e, t) &= \begin{cases} 1 - \frac{\lvert mz(e) - mz(t)\rvert}{d} & \text{if } \lvert mz(e) - mz(t)\rvert < d,\\ 0 & \text{otherwise} \end{cases}\\
s_{int}(e, t) &= \begin{cases} 1 - \frac{int(t) - int(e)}{int(e)} & \text{if } int(e) < int(t) \text{ and } \frac{int(t) - int(e)}{int(e)} \le 1,\\ \sqrt{1 - \frac{int(e) - int(t)}{int(t)}} & \text{if } int(e) \ge int(t) \text{ and } \frac{int(e) - int(t)}{int(t)} \le 1,\\ 0 & \text{otherwise} \end{cases}\\
S(e, t) &= \sqrt{int(t)} \times s_{mz}(e, t) \times s_{int}(e, t)
\end{aligned}
\]

where \(e\) is an experimental peak, \(t\) is its matching theoretical peak, and \(d\) is the mass error tolerance.

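A sketch of the per-peak score above in Python. Two assumptions, both labeled in the code: the envelope score is taken to be the sum of the per-peak scores over matched pairs, and the theoretical intensities are assumed to already be scaled to the experimental abundance. The example shows the magnitude dependence described for maximizing fitters: a perfect match with four times the intensity scores twice as high, because of the \(\sqrt{int(t)}\) factor.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for experimental and theoretical peaks.
Peak = namedtuple("Peak", ["mz", "intensity"])


def msdeconv_peak_score(e, t, d=0.02):
    """Per-peak S(e, t) = sqrt(int(t)) * s_mz * s_int; d is the m/z tolerance.

    Assumes strictly positive intensities.
    """
    mz_error = abs(e.mz - t.mz)
    s_mz = 1.0 - mz_error / d if mz_error < d else 0.0

    if e.intensity < t.intensity:
        rel = (t.intensity - e.intensity) / e.intensity
        s_int = 1.0 - rel if rel <= 1 else 0.0
    else:
        rel = (e.intensity - t.intensity) / t.intensity
        s_int = math.sqrt(1.0 - rel) if rel <= 1 else 0.0

    return math.sqrt(t.intensity) * s_mz * s_int


def msdeconv_score(experimental, theoretical, d=0.02):
    # Assumption: the envelope score is the sum of per-peak scores over matched
    # pairs, with theoretical intensities already scaled to the experimental abundance.
    return sum(msdeconv_peak_score(e, t, d) for e, t in zip(experimental, theoretical))


low = msdeconv_score([Peak(500.0, 100.0)], [Peak(500.0, 100.0)])
high = msdeconv_score([Peak(500.0, 400.0)], [Peak(500.0, 400.0)])
print(low, high)  # 10.0 20.0
```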
References

Liu, X., Inbar, Y., Dorrestein, P. C., Wynne, C., Edwards, N., Souda, P., … Pevzner, P. A. (2010). Deconvolution and database search of complex tandem mass spectra of intact proteins: a combinatorial approach. Molecular & Cellular Proteomics : MCP, 9(12), 2772–2782. https://doi.org/10.1074/mcp.M110.002766

class ms_deisotope.scoring.PenalizedMSDeconVFitter(minimum_score=10, penalty_factor=1, mass_error_tolerance=0.02)

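An Isotopic Fitter which uses the MSDeconVFitter score weighted by 1 - penalty_factor * ScaledGTestFitter score.

\[S(e, t) = M(e, t) \times \left(1 - p \, G(e, t)\right)\]

where \(e\) is the experimental peak list, \(t\) is the theoretical peak list, \(M\) is the MSDeconVFitter score, \(G\) is the ScaledGTestFitter score, and \(p\) is penalty_factor.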
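The penalty weighting can be sketched as a one-line function that takes the two component scores as plain numbers (a simplification; the real class computes both from the peak lists). With a penalty_factor of 1, a ScaledGTestFitter score of 0.25 removes a quarter of the MSDeconV score.

```python
def penalized_score(msdeconv_score, scaled_g_score, penalty_factor=1.0):
    """S = M * (1 - penalty_factor * G), with both component scores given as numbers."""
    return msdeconv_score * (1.0 - penalty_factor * scaled_g_score)


print(penalized_score(20.0, 0.0))   # 20.0: perfect shape, no penalty
print(penalized_score(20.0, 0.25))  # 15.0
print(penalized_score(20.0, 0.25, penalty_factor=2.0))  # 10.0
```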

Other Fitters

class ms_deisotope.scoring.FunctionScorer(function, minimum_score=10, selector_type=MaximizeFitSelector)

Use a user-provided Python function or callable object to evaluate an isotopic envelope fit.

The provided function should take two arguments, a list of experimental fitted peaks and a list of theoretical peaks, and return a single number as an output.

When creating an instance of this class, make sure to pass a threshold and selector type that match the semantics of the provided function.

While this type is available for convenience, it has considerable overhead compared to the C-accelerated fitters.
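As an illustration of the expected function shape, the sketch below scores a fit by the cosine similarity of paired intensities, where higher is better. The choice of cosine similarity and the Peak stand-in (assumed to expose an intensity attribute, as the peaks passed to the function are assumed to do) are illustrative, not part of the library.

```python
import math
from collections import namedtuple

# Hypothetical stand-in; assumes the peaks passed in expose an ``intensity`` attribute.
Peak = namedtuple("Peak", ["mz", "intensity"])


def cosine_score(experimental, theoretical):
    """Cosine similarity of paired intensities: 1.0 means identical shape (higher is better).

    Assumes both lists have the same length.
    """
    dot = sum(e.intensity * t.intensity for e, t in zip(experimental, theoretical))
    norm_e = math.sqrt(sum(e.intensity ** 2 for e in experimental))
    norm_t = math.sqrt(sum(t.intensity ** 2 for t in theoretical))
    return dot / (norm_e * norm_t)


experimental = [Peak(500.0, 60.0), Peak(500.5, 30.0)]
theoretical = [Peak(500.0, 0.6), Peak(500.5, 0.3)]
print(round(cosine_score(experimental, theoretical), 6))  # 1.0
```

Because higher values are better, such a function would be paired with a maximizing selector and a threshold on its 0 to 1 scale, for example FunctionScorer(cosine_score, minimum_score=0.9, selector_type=MaximizeFitSelector) following the signature above; the 0.9 threshold is an arbitrary value for illustration.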

Support Structures

class ms_deisotope.scoring.IsotopicFitRecord(FittedPeak seed_peak, double score, int charge, TheoreticalIsotopicPattern theoretical, list experimental, data=None, int missed_peaks=0)

Describes a single isotopic pattern fit, comparing how well an experimentally observed sequence of peaks matches a theoretical isotopic pattern.

IsotopicFitRecord instances are hashable and orderable (by score).

charge

The charge state used to generate the theoretical pattern

Type

int

data

An arbitrary Python object containing extra information

Type

object

experimental

The observed experimental peaks to be fitted

Type

list of FittedPeak

missed_peaks

The number of peaks in the theoretical pattern that do not have a matching experimental peak

Type

int

monoisotopic_peak

The fitted peak which corresponds to the monoisotopic peak

Type

FittedPeak

score

The score assigned to the fit by an IsotopicFitter object

Type

float

seed_peak

The peak that was used to initiate the fit. This may be unused if not using an Averagine method

Type

FittedPeak

theoretical

The theoretical isotopic pattern being fitted on the experimental data

Type

TheoreticalIsotopicPattern

class ms_deisotope.scoring.FitSelectorBase(minimum_score=0)

An object that controls the filtering and selection of IsotopicFitRecord

minimum_score

The minimum score needed to be a candidate for selection. If the FitSelector is minimizing, it is instead the maximum score a fit may have and still be a candidate.

Type

int

class ms_deisotope.scoring.MaximizeFitSelector(minimum_score=0)

A FitSelector which tries to maximize the score of the best fit.

class ms_deisotope.scoring.MinimizeFitSelector(minimum_score=0)

A FitSelector which tries to minimize the score of the best fit.
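The filtering and selection behavior described above can be sketched with two small selector classes. These are hypothetical stand-ins, not the library's FitSelectorBase subclasses, and the Fit tuple keeps only a charge and a score from IsotopicFitRecord. Note how minimum_score acts as a floor for the maximizing selector and as a ceiling for the minimizing one.

```python
from collections import namedtuple

# Hypothetical stand-in for IsotopicFitRecord, keeping only charge and score.
Fit = namedtuple("Fit", ["charge", "score"])


class SketchFitSelector:
    """Hypothetical base: filter out rejected fits, then pick the best survivor."""

    def __init__(self, minimum_score=0):
        self.minimum_score = minimum_score

    def reject(self, fit):
        raise NotImplementedError

    def candidates(self, fits):
        return [f for f in fits if not self.reject(f)]


class SketchMaximizeSelector(SketchFitSelector):
    def reject(self, fit):
        # minimum_score is a floor: lower scores are discarded.
        return fit.score < self.minimum_score

    def best(self, fits):
        pool = self.candidates(fits)
        return max(pool, key=lambda f: f.score) if pool else None


class SketchMinimizeSelector(SketchFitSelector):
    def reject(self, fit):
        # minimum_score is a ceiling: higher scores are discarded.
        return fit.score > self.minimum_score

    def best(self, fits):
        pool = self.candidates(fits)
        return min(pool, key=lambda f: f.score) if pool else None


fits = [Fit(charge=1, score=12.0), Fit(charge=2, score=35.0), Fit(charge=3, score=8.0)]
print(SketchMaximizeSelector(10).best(fits))  # Fit(charge=2, score=35.0)
print(SketchMinimizeSelector(10).best(fits))  # Fit(charge=3, score=8.0)
```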