tidyms.chem.envelope_tools
Scores sum formula candidates using the isotopic envelope.
- class EnvelopeScorer(bounds: Dict[str, Tuple[int, int]], max_M: float | None = None, max_length: int = 10, scorer: Callable | None = None, custom_abundances: dict | None = None, **kwargs)
Ranks formula candidates by comparing a measured isotopic envelope against the theoretical envelopes of candidates.
Methods
score(M, p, tol): Scores the isotopic envelope.
get_top_results([n]): Return the top ranked formula candidates and their score.
Constructor method.
- Parameters:
- formula_generator : FormulaGenerator
- scorer : Callable or None, default=None
Function used to score formula candidate envelopes. If None, the function score_envelope() is used. A custom scoring function can be passed with the following signature: def score(M, p, Mq, pq, **kwargs): pass
where M and p are arrays with the exact masses and abundances of a formula candidate, and Mq and pq are the query masses and query abundances.
- max_length : int, default=10
Length of the generated envelopes.
- custom_abundances : dict, optional
Overrides the natural abundances of elements. A mapping from element symbols (str) to an abundance array. The abundance array must have the same size as the natural abundance array and its sum must be equal to one. For example, for "C", an alternative abundance can be array([0.15, 0.85]) for isotopes with nominal mass 12 and 13.
- Other Parameters:
- kwargs
Optional parameters passed to the scoring function.
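The custom scorer signature described above can be illustrated with a minimal sketch. The function name abundance_overlap_score and the max_mass_error keyword are hypothetical, and the sketch assumes the scorer receives one candidate at a time as 1-D arrays and should return a value between 0 and 1, which the description above does not state explicitly.

```python
import numpy as np


def abundance_overlap_score(M, p, Mq, pq, **kwargs):
    """Hypothetical scorer: abundance overlap, gated by a mass error limit."""
    # `max_mass_error` is an illustrative keyword, not a documented parameter.
    max_mass_error = kwargs.get("max_mass_error", 0.01)
    M, p, Mq, pq = (np.asarray(x, dtype=float) for x in (M, p, Mq, pq))
    n = min(M.size, Mq.size)  # compare only the peaks present in both envelopes
    # Reject the candidate if any peak is farther than the allowed mass error.
    if np.any(np.abs(M[:n] - Mq[:n]) > max_mass_error):
        return 0.0
    # Sum of element-wise minima: close to 1 for matching normalized envelopes.
    return float(np.minimum(p[:n], pq[:n]).sum())


# Illustrative values only: a two-peak envelope compared against itself.
same = abundance_overlap_score(
    [180.0634, 181.0668], [0.92, 0.08], [180.0634, 181.0668], [0.92, 0.08]
)
```

A function like this would be passed through the scorer argument, with extra keywords such as max_mass_error forwarded through kwargs.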
- get_top_results(n: int | None = 10)
Return the top ranked formula candidates and their score.
- Parameters:
- n : int or None, default=10
Number of top ranked results to return. If None, all formula candidates are returned.
- Returns:
- coefficients : array
Formula coefficients. Each row is a formula candidate, each column is an element.
- elements : array
The element corresponding to each column of coefficients.
- scores : array
The score corresponding to each row of coefficients.
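The relation between the three returned arrays can be sketched as follows. The helper format_formulas is hypothetical, the arrays are made-up stand-ins for the output of get_top_results, and the sketch assumes that elements holds plain element symbols.

```python
import numpy as np


def format_formulas(coefficients, elements):
    """Hypothetical helper: build one formula string per row of coefficients."""
    formulas = []
    for row in coefficients:
        parts = []
        for symbol, count in zip(elements, row):
            if count == 1:
                parts.append(str(symbol))
            elif count > 1:
                parts.append(f"{symbol}{int(count)}")
            # elements with a zero coefficient are omitted
        formulas.append("".join(parts))
    return formulas


# Stand-in values with the documented layout:
# one row per candidate, one column per element, one score per row.
coefficients = np.array([[6, 12, 6], [5, 8, 7]])
elements = np.array(["C", "H", "O"])
scores = np.array([0.92, 0.31])

for formula, score in zip(format_formulas(coefficients, elements), scores):
    print(formula, score)
```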
- score(M: ndarray, p: ndarray, tol: float)
Scores the isotopic envelope. The results can be recovered using the get_top_results method.
Formulas are generated assuming that the first element in the envelope is the minimum mass isotopologue.
- Parameters:
- M : array
Exact mass of the envelope.
- p : array
Abundance of the envelope.
- tol : float
Mass tolerance used in formula generation.
- class EnvelopeValidator(bounds: Dict[str, Tuple[int, int]], max_M: float | None = None, max_length: int = 10, p_tol: float = 0.05, min_M_tol: float = 0.01, max_M_tol: float = 0.01, custom_abundances: dict | None = None)
- Parameters:
- max_length : int, default=10
Maximum length of the envelopes.
- min_M_tol : float or None, default=None
Exact mass tolerance for high abundance isotopologues. If None, the parameter is set based on the mode value. See the notes for an explanation of how this value is used.
- max_M_tol : float or None, default=None
Exact mass tolerance for low abundance isotopologues. If None, the parameter is set based on the mode value. See the notes for an explanation of how this value is used.
- p_tol : float or None, default=None
Tolerance threshold for the abundance values.
- custom_abundances : dict, optional
Provides custom elemental abundances. A mapping from element symbols (str) to an abundance array. The abundance array must have the same size as the natural abundance array and its sum must be equal to one. For example, for "C", an alternative abundance can be array([0.15, 0.85]) for isotopes with nominal mass 12 and 13.
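The two constraints on custom_abundances (same size as the natural abundance array, sum equal to one) can be checked before building the mapping. This is a minimal sketch: check_custom_abundances and the n_isotopes lookup are hypothetical helpers written for illustration, not part of the documented API.

```python
import numpy as np

# Mapping from element symbol to abundance array, as described above.
custom_abundances = {"C": np.array([0.15, 0.85])}

# Hypothetical lookup: number of entries in the natural abundance array.
# Carbon has two entries, for nominal masses 12 and 13.
n_isotopes = {"C": 2}


def check_custom_abundances(custom, n_isotopes):
    """Raise ValueError if an abundance array has the wrong size or sum."""
    for symbol, abundance in custom.items():
        abundance = np.asarray(abundance, dtype=float)
        if abundance.size != n_isotopes[symbol]:
            raise ValueError(f"{symbol}: expected {n_isotopes[symbol]} values")
        if not np.isclose(abundance.sum(), 1.0):
            raise ValueError(f"{symbol}: abundances must sum to one")


check_custom_abundances(custom_abundances, n_isotopes)
```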
Notes
Envelope validation is performed as follows:
For a query envelope with mass Mq and abundance pq, all formulas compatible with the minimum mass isotopologue (MMI) are computed (see FormulaGenerator).
For each i-th pair of Mq and pq, a mass tolerance and abundance tolerance is defined as follows:
\[dM_{i} = dM^{\textrm{min}} pq_{i} + dM^{\textrm{max}} (1 - pq_{i})\]
Where \(dM^{\textrm{min}}\) is min_M_tol, \(dM^{\textrm{max}}\) is max_M_tol and \(pq_{i}\) is the i-th query abundance, so that high abundance isotopologues get a tolerance close to min_M_tol and low abundance isotopologues get a tolerance close to max_M_tol. Using the mass tolerance and abundance tolerance, candidates with mass or abundance values outside this interval are removed.
The candidates that remain define a mass and abundance window for the (i + 1)-th elements of Mq and pq. If the values fall inside the window, the (i + 1)-th elements are validated and the procedure is repeated until all isotopologues are validated or until an invalid isotopologue is found.
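The validation steps above can be sketched with NumPy. This is a simplified reading of the notes, not the library implementation: mass_tolerance and validated_length are hypothetical names, the candidate envelopes are passed as precomputed 2-D arrays (one row per candidate), and the window check is approximated by requiring at least one surviving candidate within tolerance of each query peak.

```python
import numpy as np


def mass_tolerance(pq, min_M_tol, max_M_tol):
    """Per-peak mass tolerance: min_M_tol weights pq, max_M_tol weights 1 - pq."""
    pq = np.asarray(pq, dtype=float)
    return min_M_tol * pq + max_M_tol * (1.0 - pq)


def validated_length(Mc, pc, Mq, pq, min_M_tol, max_M_tol, p_tol):
    """Count validated query peaks and return the mask of surviving candidates."""
    dM = mass_tolerance(pq, min_M_tol, max_M_tol)
    keep = np.ones(Mc.shape[0], dtype=bool)
    n_valid = 0
    for i in range(len(Mq)):
        inside = (np.abs(Mc[:, i] - Mq[i]) <= dM[i]) & (
            np.abs(pc[:, i] - pq[i]) <= p_tol
        )
        new_keep = keep & inside
        if not new_keep.any():
            break  # no candidate explains peak i: stop at the first invalid peak
        keep = new_keep
        n_valid += 1
    return n_valid, keep


# Illustrative values only: two candidate envelopes and one query envelope.
Mc = np.array([[100.0, 101.0034], [100.0, 101.02]])
pc = np.array([[0.9, 0.1], [0.9, 0.1]])
n_valid, keep = validated_length(
    Mc, pc, np.array([100.001, 101.004]), np.array([0.9, 0.1]), 0.005, 0.01, 0.05
)
```

With these tolerances the most abundant peak is checked against a window close to min_M_tol, while the low abundance peak is allowed a wider window close to max_M_tol.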
- make_formula_coefficients_envelopes(bounds: Dict[str, Tuple[int, int]], coefficients: FormulaCoefficients, max_length: int, p: Dict[str, ndarray] | None = None)
Computes the isotopic envelopes for coefficient formulas.
- score_envelope(M: ndarray, p: ndarray, Mq: ndarray, pq: ndarray, min_sigma_M: float = 0.01, max_sigma_M: float = 0.01, min_sigma_p: float = 0.05, max_sigma_p: float = 0.05)
Scores the similarity between two isotopic envelopes.
- Parameters:
- M : array
Theoretical mass values.
- p : array
Theoretical abundances.
- Mq : array
Query mass values.
- pq : array
Query abundances.
- min_sigma_M : float
Minimum mass standard deviation.
- max_sigma_M : float
Maximum mass standard deviation.
- min_sigma_p : float
Minimum abundance standard deviation.
- max_sigma_p : float
Maximum abundance standard deviation.
- Returns:
- score : float
Number between 0 and 1. Higher values indicate more similar envelopes.
Notes
The query envelope is compared against the theoretical envelope using a likelihood approach, similar to the one described in [Rdacaac115aab-1]. It is assumed that the theoretical mass and abundance are normal random variables, with mean values defined by M and p and standard deviations computed as follows:
\[\sigma_{M,i} = p_{i} \sigma_{M}^{\textrm{max}} + (1 - p_{i}) \sigma_{M}^{\textrm{min}}\]
Where \(\sigma_{M,i}\) is the standard deviation for the i-th element of M, \(p_{i}\) is the i-th element of p, \(\sigma_{M}^{\textrm{max}}\) is max_sigma_M and \(\sigma_{M}^{\textrm{min}}\) is min_sigma_M. An analogous computation is done to obtain the standard deviation for each abundance. Using these values, the likelihood of generating the values Mq and pq from M and p is computed using the error function.
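The computation described in these notes can be sketched as follows. This is one possible reading, not the library implementation: interpolate_sigma and envelope_likelihood are hypothetical names, and the sketch assumes that each term is the two-sided tail probability of a normal deviation (complementary error function) and that the per-peak terms are multiplied together.

```python
import math

import numpy as np


def interpolate_sigma(p, min_sigma, max_sigma):
    """Standard deviation per peak: p * max_sigma + (1 - p) * min_sigma."""
    p = np.asarray(p, dtype=float)
    return p * max_sigma + (1.0 - p) * min_sigma


# math.erfc works on scalars, so wrap it to apply it element-wise.
_erfc = np.vectorize(math.erfc)


def envelope_likelihood(M, p, Mq, pq, min_sigma_M=0.01, max_sigma_M=0.01,
                        min_sigma_p=0.05, max_sigma_p=0.05):
    """Hypothetical score in [0, 1]; equals 1 when both envelopes match exactly."""
    M, p, Mq, pq = (np.asarray(x, dtype=float) for x in (M, p, Mq, pq))
    sigma_M = interpolate_sigma(p, min_sigma_M, max_sigma_M)
    sigma_p = interpolate_sigma(p, min_sigma_p, max_sigma_p)
    # Probability of a normal deviation at least as large as the observed one.
    tail_M = _erfc(np.abs(Mq - M) / (np.sqrt(2.0) * sigma_M))
    tail_p = _erfc(np.abs(pq - p) / (np.sqrt(2.0) * sigma_p))
    return float(np.prod(tail_M) * np.prod(tail_p))


# Illustrative values only: a theoretical envelope and a slightly shifted query.
M = np.array([180.0634, 181.0668])
p = np.array([0.92, 0.08])
exact = envelope_likelihood(M, p, M, p)
shifted = envelope_likelihood(M, p, M + 0.005, p)
```

The default standard deviations are copied from the signature of score_envelope; with equal minimum and maximum values the interpolation reduces to a constant standard deviation for every peak.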
References