Envelope Scoring
The ms_deisotope.scoring module contains classes for evaluating the goodness-of-fit of isotopic pattern matches. It is used by the deconvoluters defined in ms_deisotope.deconvolution to decide which pattern fit is best. Each instance of IsotopicFitterBase takes a score threshold, a float, and additional configuration arguments. The score threshold is used to filter out noise matches.
When the isotopic patterns of the target data are of high quality and intensity can be used to effectively filter out noise peaks, PenalizedMSDeconVFitter often works best. If the isotopic patterns are not of high quality, but intensity can still be used to discriminate noise, MSDeconVFitter is more forgiving. If intensity is not important, either ScaledGTestFitter or LeastSquaresFitter may work.
Scoring Functions
- class ms_deisotope.scoring.IsotopicFitterBase(score_threshold=0.5)
A base class for Isotopic Pattern Fitters: objects which, given a set of experimental peaks and a set of matching theoretical peaks, return a fit score describing how good the match is.
An IsotopicFitter may be optimal when the score is small (minimizing) or when the score is large (maximizing), and the appropriate FitSelectorBase type will be used for select. This is also reflected by is_maximizing().
- __call__(*args, **kwargs)
Invokes evaluate()
- Parameters
*args – Forwarded to evaluate()
**kwargs – Forwarded to evaluate()
- Returns
The score
- Return type
float
- evaluate(peaklist, observed, expected, **kwargs)
Evaluate a pair of peak lists for goodness-of-fit.
- Parameters
peaklist (PeakSet) – The full set of all experimental peaks
observed (list) – The list of experimental peaks that are part of this fit
expected (list) – The list of theoretical peaks that are part of this fit
**kwargs –
- Returns
The score
- Return type
float
- is_maximizing()
Whether or not this fitter’s score gets better as it grows
- Returns
Whether or not this fitter is a maximizing fitter
- Return type
bool
- reject(fit)
Test whether this fit is too poor to be used
- Parameters
fit (IsotopicFitRecord) – The fit to test
- Returns
Whether the fit should be rejected
- Return type
bool
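The interface described above can be sketched as a small pure-Python class hierarchy. This is not ms_deisotope's implementation: SketchFitterBase and SumOfSquaresSketch are hypothetical names, and the rule used in reject() (compare the fit's score to score_threshold in the direction reported by is_maximizing()) is an assumption drawn from the descriptions of score_threshold and reject.

```python
from abc import ABC, abstractmethod


class SketchFitterBase(ABC):
    """Hypothetical stand-in mirroring the documented IsotopicFitterBase interface."""

    def __init__(self, score_threshold=0.5):
        self.score_threshold = score_threshold

    def __call__(self, *args, **kwargs):
        # As documented, calling the fitter simply forwards to evaluate().
        return self.evaluate(*args, **kwargs)

    @abstractmethod
    def evaluate(self, peaklist, observed, expected, **kwargs):
        """Return a float score comparing observed and expected peaks."""

    def is_maximizing(self):
        # Minimizing by default; a maximizing fitter would override this.
        return False

    def reject(self, fit):
        # Assumed rule: reject a fit on the wrong side of the threshold.
        # `fit` only needs a `score` attribute, like IsotopicFitRecord.
        if self.is_maximizing():
            return fit.score < self.score_threshold
        return fit.score > self.score_threshold


class SumOfSquaresSketch(SketchFitterBase):
    """Toy minimizing fitter: sum of squared intensity differences."""

    def evaluate(self, peaklist, observed, expected, **kwargs):
        return sum((o - e) ** 2 for o, e in zip(observed, expected))
```

For example, SumOfSquaresSketch(score_threshold=0.5)(None, [1.0, 0.5], [1.0, 0.4]) returns about 0.01, which this minimizing sketch would keep because it is below the threshold.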
Minimizing Fitters
Minimizing envelope scoring methods aim to minimize some “fit error” criterion, and discard solutions which don’t fit well. They tend to be quite sensitive and can be independent of the magnitude of the signal. This also means they do not handle detector noise or interference well. These methods can work well when targeting a list of compositions instead of exhaustively deconvolving an entire spectrum.
- class ms_deisotope.scoring.GTestFitter(score_threshold=0.5)
Evaluate an isotopic fit using a G-test
\[G = 2\sum_{i}^{n} o_i \left(\log o_i - \log e_i\right)\]
where \(o_i\) is the intensity of the ith experimental peak and \(e_i\) is the intensity of the ith theoretical peak.
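A minimal pure-Python sketch of the G-test formula above, assuming the experimental and theoretical intensities arrive as parallel lists of positive numbers. g_test is a hypothetical helper name, not part of ms_deisotope.

```python
import math


def g_test(observed, expected):
    """G = 2 * sum_i o_i * (log o_i - log e_i) over parallel intensity lists."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Each term compares an experimental intensity o to its theoretical partner e.
    return 2.0 * sum(o * (math.log(o) - math.log(e)) for o, e in zip(observed, expected))
```

Identical lists give a score of 0, the best possible value for this minimizing criterion.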
- class ms_deisotope.scoring.ScaledGTestFitter
Evaluate an isotopic fit using a G-test after normalizing the lists of experimental and theoretical peaks so that each sums to 1.
\[G = 2\sum_{i}^{n} o_i \left(\log o_i - \log e_i\right)\]
where \(o_i\) is the normalized intensity of the ith experimental peak and \(e_i\) is the normalized intensity of the ith theoretical peak.
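The scaled variant only adds a normalization step. The sketch below (scaled_g_test is a hypothetical name) assumes positive intensities. Because both lists are normalized, multiplying every experimental intensity by a constant leaves the score unchanged, which illustrates how minimizing fitters can be independent of signal magnitude.

```python
import math


def scaled_g_test(observed, expected):
    """G-test after normalizing both intensity lists to sum to 1."""
    total_o = sum(observed)
    total_e = sum(expected)
    score = 0.0
    for o, e in zip(observed, expected):
        o_hat = o / total_o  # normalized experimental intensity
        e_hat = e / total_e  # normalized theoretical intensity
        score += o_hat * (math.log(o_hat) - math.log(e_hat))
    return 2.0 * score
```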
- class ms_deisotope.scoring.LeastSquaresFitter
Evaluate an isotopic fit using the least squares coefficient of determination \(R^2\).
\[
\begin{aligned}
\hat{e}_i &= e_i / \max(e)\\
\hat{t}_i &= t_i / \max(t)\\
\hat{t} &= \sum_i^n \hat{t}_i^2\\
R^2 &= \frac{1}{\hat{t}}\sum_i^n \left(\hat{e}_i - \hat{t}_i\right)^2
\end{aligned}
\]
where \(e_i\) is the ith experimental peak intensity and \(t_i\) is the ith theoretical peak intensity.
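A direct transcription of the LeastSquaresFitter formula, as a sketch assuming parallel lists of positive intensities (least_squares_score is a hypothetical name). Since this is a minimizing criterion, a perfect match scores 0.

```python
def least_squares_score(experimental, theoretical):
    """Sum of squared residuals between max-normalized lists, scaled by 1 / t_hat."""
    max_e = max(experimental)
    max_t = max(theoretical)
    e_hat = [e / max_e for e in experimental]  # scale so the tallest peak is 1
    t_hat = [t / max_t for t in theoretical]
    t_norm = sum(t * t for t in t_hat)  # the t-hat term in the formula
    return sum((e - t) ** 2 for e, t in zip(e_hat, t_hat)) / t_norm
```

For instance, experimental intensities [100.0, 50.0] against a flat theoretical pattern [1.0, 1.0] give (0.5 - 1)^2 / 2 = 0.125.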
Maximizing Fitters
Maximizing envelope scoring methods aim to maximize some “goodness-of-fit” criterion, and discard solutions that don’t score highly enough. While not universally true, many of the maximizing scoring functions here depend on the magnitude of the signal, which means that the selected threshold is also signal-magnitude-dependent. This has the advantage of letting the score threshold double as a detector noise filter, but it also makes the threshold dependent on the instrument type.
A maximizing fitter with a well-chosen threshold is good at exhaustively deconvoluting a spectrum because it can more easily eliminate bad fits. However, as the isotopic pattern becomes more homogeneous around its base peak, maximizing fitters more easily misidentify the monoisotopic peak. In this situation MSDeconVFitter is less accurate than PenalizedMSDeconVFitter, although neither is perfect.
- class ms_deisotope.scoring.MSDeconVFitter(minimum_score=10, mass_error_tolerance=0.02)
An implementation of the scoring function used in MSDeconV, where \(e\) is an experimental peak, \(t\) is its matching theoretical peak, and \(d\) is the mass error tolerance:
\[
\begin{aligned}
s_{mz}(e, t) &= \begin{cases} 1 - \frac{|mz(e) - mz(t)|}{d} & \text{if } |mz(e) - mz(t)| < d,\\ 0 & \text{otherwise} \end{cases}\\
s_{int}(e, t) &= \begin{cases} 1 - \frac{int(t) - int(e)}{int(e)} & \text{if } int(e) < int(t) \text{ and } \frac{int(t) - int(e)}{int(e)} \le 1,\\ \sqrt{1 - \frac{int(e) - int(t)}{int(t)}} & \text{if } int(e) \ge int(t) \text{ and } \frac{int(e) - int(t)}{int(t)} \le 1,\\ 0 & \text{otherwise} \end{cases}\\
S(e, t) &= \sqrt{int(t)} \times s_{mz}(e, t) \times s_{int}(e, t)
\end{aligned}
\]
References
Liu, X., Inbar, Y., Dorrestein, P. C., Wynne, C., Edwards, N., Souda, P., … Pevzner, P. A. (2010). Deconvolution and database search of complex tandem mass spectra of intact proteins: a combinatorial approach. Molecular & Cellular Proteomics : MCP, 9(12), 2772–2782. https://doi.org/10.1074/mcp.M110.002766
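The per-peak MSDeconV formula can be transcribed as below. This is a sketch, not the library's code: the function names are hypothetical, intensities are assumed positive, and the envelope-level score is assumed to be the sum of per-peak scores over matched peak pairs. The \(\sqrt{int(t)}\) factor makes the score grow with signal magnitude, which is why thresholds for maximizing fitters depend on the data and instrument.

```python
import math


def msdeconv_peak_score(mz_e, int_e, mz_t, int_t, d=0.02):
    """Per-peak score S(e, t); d plays the role of mass_error_tolerance."""
    # m/z agreement term: 1 for a perfect match, falling linearly to 0 at d.
    mz_error = abs(mz_e - mz_t)
    s_mz = 1.0 - mz_error / d if mz_error < d else 0.0

    # Intensity agreement term, using the two relative-difference branches.
    if int_e < int_t:
        rel = (int_t - int_e) / int_e
        s_int = 1.0 - rel if rel <= 1 else 0.0
    else:
        rel = (int_e - int_t) / int_t
        s_int = math.sqrt(1.0 - rel) if rel <= 1 else 0.0

    return math.sqrt(int_t) * s_mz * s_int


def msdeconv_envelope_score(pairs, d=0.02):
    """Assumed envelope score: sum over (mz_e, int_e, mz_t, int_t) tuples."""
    return sum(msdeconv_peak_score(*pair, d=d) for pair in pairs)
```

Two perfectly matched peaks with theoretical intensities 100 and 400 contribute 10 and 20, so quadrupling the signal doubles a peak's contribution.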
- class ms_deisotope.scoring.PenalizedMSDeconVFitter(minimum_score=10, penalty_factor=1, mass_error_tolerance=0.02)
An Isotopic Fitter which uses the MSDeconVFitter score weighted by 1 - penalty_factor * ScaledGTestFitter score:
\[S(e, t) = M(e, t) \times \left(1 - p \, G(e, t)\right)\]
where \(e\) is the experimental peak list, \(t\) is the theoretical peak list, \(M\) is the MSDeconVFitter score, \(G\) is the ScaledGTestFitter score, and \(p\) is penalty_factor.
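Given an already-computed MSDeconV-style score M, the penalty can be sketched as follows. The helper names are hypothetical, and scaled_g repeats the normalized G-test so the block stays self-contained; intensities are assumed positive.

```python
import math


def scaled_g(observed, expected):
    # G-test on intensity lists normalized to sum to 1 (ScaledGTestFitter formula).
    total_o, total_e = sum(observed), sum(expected)
    return 2.0 * sum(
        (o / total_o) * (math.log(o / total_o) - math.log(e / total_e))
        for o, e in zip(observed, expected)
    )


def penalized_score(msdeconv_score, observed, expected, penalty_factor=1.0):
    """S = M * (1 - penalty_factor * G), with M supplied by the caller."""
    return msdeconv_score * (1.0 - penalty_factor * scaled_g(observed, expected))
```

When the envelope shape matches, G is near 0 and S stays close to M; a poor shape shrinks S, and once \(p\,G > 1\) the score turns negative and falls below any positive threshold.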
Other Fitters
- class ms_deisotope.scoring.FunctionScorer(function, minimum_score=10, selector_type=MaximizeFitSelector)
Use a user-provided Python function or callable object to evaluate an isotopic envelope fit.
The provided function should take two arguments, a list of experimental fitted peaks and a list of theoretical peaks, and return a single number.
When creating an instance of this class, pass a threshold and selector type that match the semantics of the provided function.
While this type is available for convenience, using it carries considerable overhead compared to the C-accelerated fitters.
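As an illustration of the kind of callable FunctionScorer expects, here is a cosine-similarity score over peak intensities. The Peak namedtuple is a hypothetical stand-in that assumes only an intensity attribute on each peak. Because this score is maximizing and bounded above by 1, it would need a threshold well below the default minimum_score=10; the constructor call in the comment follows the documented signature but has not been checked against the library.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for peak objects; only `intensity` is used below.
Peak = namedtuple("Peak", ["mz", "intensity"])


def cosine_similarity_score(experimental, theoretical):
    """Cosine of the angle between intensity vectors: 1.0 means identical shape."""
    e = [p.intensity for p in experimental]
    t = [p.intensity for p in theoretical]
    dot = sum(a * b for a, b in zip(e, t))
    norm = math.sqrt(sum(a * a for a in e)) * math.sqrt(sum(b * b for b in t))
    return dot / norm if norm else 0.0


# Possible use (unverified): FunctionScorer(cosine_similarity_score, minimum_score=0.9)
# with the default MaximizeFitSelector, since larger scores are better here.
```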
Support Structures
- class ms_deisotope.scoring.IsotopicFitRecord(FittedPeak seed_peak, double score, int charge, TheoreticalIsotopicPattern theoretical, list experimental, data=None, int missed_peaks=0)
Describes a single isotopic pattern fit, comparing how well an experimentally observed sequence of peaks matches a theoretical isotopic pattern.
IsotopicFitRecord instances are hashable and orderable (by score).
- charge¶
The charge state used to generate the theoretical pattern
- Type
int
- data¶
An arbitrary Python object containing extra information
- Type
object
- experimental¶
The observed experimental peaks to be fitted
- Type
list of FittedPeak
- missed_peaks¶
The number of peaks in the theoretical pattern that do not have a matching experimental peak
- Type
int
- monoisotopic_peak¶
The fitted peak which corresponds to the monoisotopic peak
- Type
FittedPeak
- score¶
The score assigned to the fit by an IsotopicFitter object
- Type
float
- seed_peak¶
The peak that was used to initiate the fit. This may be unused if not using an Averagine method
- Type
FittedPeak
- theoretical¶
The theoretical isotopic pattern being fitted to the experimental data
- Type
TheoreticalIsotopicPattern
- class ms_deisotope.scoring.FitSelectorBase(minimum_score=0)
An object that controls the filtering and selection of IsotopicFitRecord instances.
- minimum_score
The minimum score needed to be a candidate for selection. If the FitSelector is minimizing, it is instead the maximum score a candidate may have.
- Type
int
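The filter-then-select behavior described for FitSelectorBase, together with the score ordering of IsotopicFitRecord, can be sketched as follows. FitSketch and select_best are hypothetical names, and whether the library's comparison against minimum_score is inclusive is an assumption here.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class FitSketch:
    """Hypothetical stand-in for IsotopicFitRecord, ordered by score only."""
    score: float
    charge: int = field(default=1, compare=False)


def select_best(fits, minimum_score, maximizing=True):
    """Drop fits failing the threshold, then return the best remaining fit or None."""
    if maximizing:
        # Maximizing: minimum_score is a lower bound and larger is better.
        candidates = [f for f in fits if f.score >= minimum_score]
        return max(candidates) if candidates else None
    # Minimizing: minimum_score acts as an upper bound and smaller is better.
    candidates = [f for f in fits if f.score <= minimum_score]
    return min(candidates) if candidates else None
```

With fits scoring 12, 35, and 8 at charges 1, 2, and 3, a maximizing selection with minimum_score=10 picks the charge 2 fit, while minimum_score=40 leaves no candidate.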