tidyms.chem.envelope_tools

Scores sum formula candidates using the isotopic envelope.

class EnvelopeScorer(bounds: Dict[str, Tuple[int, int]], max_M: float | None = None, max_length: int = 10, scorer: Callable | None = None, custom_abundances: dict | None = None, **kwargs)

Ranks formula candidates by comparing a measured isotopic envelope against the theoretical envelopes of candidates.

Methods

score(M, p, tol): Scores the isotopic envelope.
get_top_results([n]): Returns the top ranked formula candidates and their scores.

Constructor method.

Parameters:

formula_generator: FormulaGenerator

scorer: Callable or None, default=None

Function used to score formula candidate envelopes. If None, the function score_envelope() is used. A custom scoring function can be passed with the following signature:

def score(M, p, Mq, pq, **kwargs):
    pass

where M and p are arrays with the exact masses and abundances of the formula candidates, and Mq and pq are the query masses and query abundances.
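As an illustration of this signature, the sketch below scores a candidate by the cosine similarity between theoretical and query abundances. It is a toy example, not a function provided by the library: the name `cosine_abundance_score` is hypothetical, and it assumes that M, p, Mq and pq are one-dimensional sequences of equal length describing a single candidate and the query envelope.

```python
import math


def cosine_abundance_score(M, p, Mq, pq, **kwargs):
    """Hypothetical custom scorer: cosine similarity of abundances.

    Assumes M, p, Mq and pq are 1-D sequences of equal length that
    describe a single candidate and the query envelope. The masses
    are ignored by this toy scorer.
    """
    dot = sum(a * b for a, b in zip(p, pq))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_pq = math.sqrt(sum(b * b for b in pq))
    if norm_p == 0.0 or norm_pq == 0.0:
        # An all-zero envelope carries no information to compare.
        return 0.0
    return dot / (norm_p * norm_pq)
```

A function like this would be passed through the scorer parameter; any extra keyword arguments given to the constructor would reach it through kwargs.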

max_length: int, default=10

Length of the generated envelopes.

custom_abundances: dict, optional

Overrides the natural abundances of elements. A mapping from element symbols (str) to abundance arrays. Each abundance array must have the same size as the natural abundance array of the element, and its sum must be equal to one. For example, for “C”, an alternative abundance can be array([0.15, 0.85]) for the isotopes with nominal mass 12 and 13.
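The two constraints on a custom abundance can be checked before building the scorer. The sketch below is only an illustration: the helper `check_custom_abundance` is hypothetical, the natural carbon abundances are approximate values written out by hand, and plain lists stand in for the arrays that would be used in practice.

```python
import math

# Approximate natural abundances of carbon (12C, 13C), for illustration only.
NATURAL_C = [0.9893, 0.0107]


def check_custom_abundance(custom, natural, tol=1e-8):
    """Return True if `custom` has the same length as `natural` and sums to one."""
    same_size = len(custom) == len(natural)
    sums_to_one = math.isclose(sum(custom), 1.0, abs_tol=tol)
    return same_size and sums_to_one


# The 13C-enriched example from the text.
custom_abundances = {"C": [0.15, 0.85]}
is_valid = check_custom_abundance(custom_abundances["C"], NATURAL_C)
```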

Other Parameters:

kwargs: Optional parameters passed to the scoring function.

get_top_results(n: int | None = 10)

Returns the top ranked formula candidates and their scores.

Parameters:

n: int or None, default=10: Number of top ranked results to return. If None, all formula candidates are returned.

Returns:

coefficients: array: Formula coefficients. Each row is a formula candidate, each column is an element.
elements: array: The element corresponding to each column of coefficients.
scores: array: The score corresponding to each row of coefficients.
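To make the layout of the three returned arrays concrete, the sketch below pairs each row of coefficients with its score and renders the row as a formula string. The helper `format_candidates` is hypothetical and the values are made up; nested lists stand in for the arrays, and the element labels are assumed to be plain symbols.

```python
def format_candidates(coefficients, elements, scores):
    """Pair each coefficient row with its score as (formula string, score).

    Assumes row i of `coefficients` matches scores[i] and column j
    matches elements[j], as described in the Returns section.
    """
    results = []
    for row, score in zip(coefficients, scores):
        formula = "".join(
            f"{el}{int(k)}" if k > 1 else str(el)
            for el, k in zip(elements, row)
            if k > 0  # skip elements that are absent from the candidate
        )
        results.append((formula, score))
    return results


# Made-up values standing in for the output of get_top_results.
coefficients = [[6, 12, 6], [5, 8, 7]]
elements = ["C", "H", "O"]
scores = [0.92, 0.31]
table = format_candidates(coefficients, elements, scores)
```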

score(M: ndarray, p: ndarray, tol: float)

Scores the isotopic envelope. The results can be recovered using the get_top_results method.

Formulas are generated assuming that the first element in the envelope is the minimum mass isotopologue.

Parameters:

M: array: Exact masses of the envelope.
p: array: Abundances of the envelope.
tol: float: Mass tolerance used in formula generation.

class EnvelopeValidator(bounds: Dict[str, Tuple[int, int]], max_M: float | None = None, max_length: int = 10, p_tol: float = 0.05, min_M_tol: float = 0.01, max_M_tol: float = 0.01, custom_abundances: dict | None = None)

Parameters:

max_length: int, default=10: Maximum length of the envelopes.
min_M_tol: float, default=0.01: Exact mass tolerance for high abundance isotopologues. See the notes for an explanation of how this value is used.
max_M_tol: float, default=0.01: Exact mass tolerance for low abundance isotopologues. See the notes for an explanation of how this value is used.
p_tol: float, default=0.05: Abundance tolerance used to decide which candidates are kept.
custom_abundances: dict, optional: Provides custom elemental abundances. A mapping from element symbols (str) to abundance arrays. Each abundance array must have the same size as the natural abundance array of the element, and its sum must be equal to one. For example, for “C”, an alternative abundance can be array([0.15, 0.85]) for the isotopes with nominal mass 12 and 13.

Notes

Envelope validation is performed as follows:

For a query envelope with mass Mq and abundance pq, all formulas compatible with the minimum mass isotopologue (MMI) are computed (see FormulaGenerator).
For each i-th pair of Mq and pq, a mass tolerance and an abundance tolerance are defined as follows:

\[dM_{i} = dM^{\textrm{max}} * pq_{i} + dM^{\textrm{min}} (1 - pq_{i})\]

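Where \(dM^{\textrm{max}}\) is min_M_tol, \(dM^{\textrm{min}}\) is max_M_tol and \(pq_{i}\) is the i-th query abundance. With this mapping, high abundance isotopologues (\(pq_{i}\) close to 1) get a tolerance close to min_M_tol and low abundance isotopologues get a tolerance close to max_M_tol. Candidates whose mass or abundance falls outside the interval defined by these tolerances are removed.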
The remaining candidates define a mass and abundance window for the (i + 1)-th elements of Mq and pq. If the values fall inside the window, the (i + 1)-th elements are validated and the procedure is repeated until all isotopologues are validated or until an invalid isotopologue is found.
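The tolerance interpolation used in the procedure above can be worked through numerically. The sketch below only evaluates the formula from the notes, with the symbol mapping stated there; the function name `mass_tolerance` and the numeric values are made up for illustration, and it is not the validator's implementation.

```python
def mass_tolerance(pq_i, min_M_tol, max_M_tol):
    """Interpolate the mass tolerance for one query abundance.

    Follows the formula and the symbol mapping given in the notes:
    dM_max is min_M_tol and dM_min is max_M_tol.
    """
    dM_max = min_M_tol
    dM_min = max_M_tol
    return dM_max * pq_i + dM_min * (1.0 - pq_i)


# Made-up query abundances and tolerances, for illustration only.
pq = [0.90, 0.09, 0.01]
tolerances = [mass_tolerance(x, min_M_tol=0.005, max_M_tol=0.02) for x in pq]
# The most abundant peak gets the tightest tolerance (about 0.0065),
# the least abundant one gets a tolerance close to max_M_tol (about 0.0199).
```

When min_M_tol and max_M_tol are equal, as with the defaults in the signature, the tolerance is the same for every isotopologue.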

make_formula_coefficients_envelopes(bounds: Dict[str, Tuple[int, int]], coefficients: FormulaCoefficients, max_length: int, p: Dict[str, ndarray] | None = None): Computes the isotopic envelopes for coefficient formulas.

score_envelope(M: ndarray, p: ndarray, Mq: ndarray, pq: ndarray, min_sigma_M: float = 0.01, max_sigma_M: float = 0.01, min_sigma_p: float = 0.05, max_sigma_p: float = 0.05)

Scores the similarity between two isotopic envelopes.

Parameters:

M: array: Theoretical mass values.

p: array: Theoretical abundances.
Mq: array: Query mass values.
pq: array: Query abundances.
min_sigma_M: float: Minimum mass standard deviation.
max_sigma_M: float: Maximum mass standard deviation.
min_sigma_p: float: Minimum abundance standard deviation.
max_sigma_p: float: Maximum abundance standard deviation.

Returns:

score: float: Number between 0 and 1. Higher values indicate more similar envelopes.

Notes

The query envelope is compared against the theoretical envelope using a likelihood approach, similar to the one described in [1]. The theoretical mass and abundance are assumed to be normal random variables, with mean values defined by M and p and standard deviations computed as follows:

\[\sigma_{M,i} = p_{i} \sigma_{M}^{\textrm{max}} + (1 - p_{i}) \sigma_{M}^{\textrm{min}}\]

Where \(\sigma_{M,i}\) is the standard deviation for the i-th element of M, \(p_{i}\) is the i-th element of p, \(\sigma_{M}^{\textrm{max}}\) is max_sigma_M and \(\sigma_{M}^{\textrm{min}}\) is min_sigma_M. An analogous computation gives the standard deviation for each abundance. Using these values, the likelihood of generating the values Mq and pq from M and p is computed using the error function.
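The following sketch shows one way such a likelihood-style score could be computed. It is an assumption-laden illustration, not the implementation of score_envelope: the text does not specify how the individual probabilities are combined, so the two-sided tail probability and the product over isotopologues are choices made here, and the names `interpolated_sigma`, `two_sided_tail` and `sketch_score` are hypothetical.

```python
import math


def interpolated_sigma(p_i, min_sigma, max_sigma):
    """Standard deviation for one isotopologue, per the formula in the notes."""
    return p_i * max_sigma + (1.0 - p_i) * min_sigma


def two_sided_tail(x, mean, sigma):
    """P(|X - mean| >= |x - mean|) for X ~ Normal(mean, sigma), via erfc."""
    return math.erfc(abs(x - mean) / (sigma * math.sqrt(2.0)))


def sketch_score(M, p, Mq, pq, min_sigma_M=0.01, max_sigma_M=0.01,
                 min_sigma_p=0.05, max_sigma_p=0.05):
    """Illustrative score in [0, 1]: product of mass and abundance tail probabilities.

    Assumes 1-D sequences of equal length and strictly positive sigmas.
    """
    score = 1.0
    for m, a, mq, aq in zip(M, p, Mq, pq):
        sigma_m = interpolated_sigma(a, min_sigma_M, max_sigma_M)
        sigma_p = interpolated_sigma(a, min_sigma_p, max_sigma_p)
        score *= two_sided_tail(mq, m, sigma_m) * two_sided_tail(aq, a, sigma_p)
    return score
```

With this construction, identical envelopes score exactly 1, and the score decreases as the query masses or abundances move away from the theoretical values.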

References