tidyms.chem.FormulaGenerator

class FormulaGenerator(bounds: Dict[str, Tuple[int, int]], max_M: float | None = None)

Generates sum formulas based on exact mass values.

Attributes:
n_results: int

Number of valid formulas generated.

results: dict

A mapping from the nominal masses of the results to a tuple of three arrays:

1. the row indices of the positive coefficients;
2. the row indices of the negative coefficients;
3. the number of 12C in each formula.

Methods

generate_formulas(M, tolerance[, ...])

Computes formulas compatible with the given query mass.

results_to_array()

Convert results to an array of coefficients.

from_hmdb(mass[, bounds])

Creates a FormulaGenerator using elemental bounds obtained from molecules present in the Human Metabolome database.

FormulaGenerator constructor.

Parameters:
bounds: Dict

A dictionary that maps isotope strings to the lower and upper bounds of the formula coefficients. An isotope string can be an element symbol (e.g. "C") or an isotope representation (e.g. "13C"). An element symbol is converted to its most abundant isotope ("C" becomes "12C").

max_M: float or None, default=None

Maximum mass of the generated formulas. If specified, it is used to tighten the bounds. For example, if max_M=300 and the bounds for 32S are (0, 10), they are updated to (0, 9), because ten 32S atoms already weigh more than 300.
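The two bound adjustments described above (element symbol to most abundant isotope, and capping each upper bound by max_M) can be sketched as follows. This is a minimal illustration, not the tidyms implementation: `prepare_bounds`, `MOST_ABUNDANT` and `ISOTOPE_MASS` are hypothetical names, and the lookup tables cover only a few isotopes.

```python
import math

# Hypothetical lookup tables for illustration only (not tidyms internals).
MOST_ABUNDANT = {"C": "12C", "H": "1H", "N": "14N", "O": "16O", "P": "31P", "S": "32S"}
# Monoisotopic masses in Da.
ISOTOPE_MASS = {
    "12C": 12.0,
    "13C": 13.00335483507,
    "1H": 1.00782503207,
    "14N": 14.0030740048,
    "16O": 15.99491461956,
    "31P": 30.97376199842,
    "32S": 31.97207117,
}


def prepare_bounds(bounds, max_M=None):
    """Normalize element symbols to isotopes and cap upper bounds using max_M."""
    prepared = {}
    for key, (lo, hi) in bounds.items():
        # An isotope string starts with its mass number; a bare symbol does not.
        isotope = key if key[0].isdigit() else MOST_ABUNDANT[key]
        if max_M is not None:
            # A formula cannot hold more atoms of this isotope than fit in max_M.
            hi = min(hi, math.floor(max_M / ISOTOPE_MASS[isotope]))
        prepared[isotope] = (lo, hi)
    return prepared


print(prepare_bounds({"S": (0, 10)}, max_M=300))  # {'32S': (0, 9)}
```

For 32S, 300 / 31.972 is about 9.38, so the upper bound drops from 10 to 9, matching the example in the parameter description.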

Examples

>>> import tidyms as ms
>>> fg_bounds = {"C": (0, 5), "H": (0, 10), "O": (0, 4)}
>>> fg = ms.chem.FormulaGenerator(fg_bounds)

static from_hmdb(mass: int, bounds: Dict[str, Tuple[int, int]] | None = None)

Creates a FormulaGenerator using elemental bounds obtained from molecules present in the Human Metabolome database. By default, bounds for CHNOPS elements are included.

Parameters:
mass: {500, 1000, 1500, 2000}

Bounds are created using molecules with molecular mass lower than this value.

bounds: Dict[str, Tuple[int, int]] or None, default=None

Bounds for additional isotopes to pass to the generator.

Returns:
FormulaGenerator

See also

get_chnops_bounds

Examples

>>> import tidyms as ms
>>> # Create a formula generator using a max mass of 500.
>>> # Also include chlorine in the bounds.
>>> fg = ms.chem.FormulaGenerator.from_hmdb(500, bounds={"Cl": (0, 2)})

generate_formulas(M: float, tolerance: float, min_defect: float | None = None, max_defect: float | None = None)

Computes formulas compatible with the given query mass. The formulas are computed assuming neutral species. If charged species are used, the mass values must be corrected using the electron mass.

Results are stored in an internal format; use results_to_array to obtain the compatible formulas.
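The electron-mass correction mentioned above can be sketched as a small helper. The name `ion_to_formula_mass` is hypothetical and not part of tidyms. It returns the mass of the ion's elemental composition computed with neutral atomic masses, which is what a neutral-species search expects: a cation with charge z has lost z electrons, so that mass exceeds the ion mass by z electron masses, and an anion's is smaller.

```python
ELECTRON_MASS = 0.000548579909  # Da (CODATA value)


def ion_to_formula_mass(mz, charge):
    """Convert an observed m/z into the neutral-atom mass of the ion's composition.

    For charge z (positive for cations, negative for anions):
    ion mass = |z| * m/z and formula mass = ion mass + z * electron mass.
    """
    if charge == 0:
        raise ValueError("charge must be non-zero for a charged species")
    ion_mass = mz * abs(charge)
    return ion_mass + charge * ELECTRON_MASS


# Protonated ethanol, C2H7O+, observed at m/z 47.04914.
print(ion_to_formula_mass(47.04914, 1))  # about 47.04969, the C2H7O formula mass
```

The corrected value, not the raw m/z, would then be passed as M to generate_formulas.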

Parameters:
M: float

Exact mass used for formula generation.

tolerance: float

Mass tolerance used to search for compatible formulas.

min_defect: float or None, default=None

Minimum mass defect allowed for the results. If None, all values are allowed.

max_defect: float or None, default=None

Maximum mass defect allowed for the results. If None, all values are allowed.
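A sketch of how min_defect and max_defect restrict the results. It assumes the mass defect is the exact mass minus the nominal (rounded) mass; this definition is an assumption, not taken from the tidyms source. `mass_defect` and `filter_by_defect` are hypothetical helpers.

```python
def mass_defect(M):
    # Assumption: defect = exact mass - nominal mass (nearest integer).
    return M - round(M)


def filter_by_defect(masses, min_defect=None, max_defect=None):
    """Keep masses whose defect lies in [min_defect, max_defect]; None disables a limit."""
    kept = []
    for M in masses:
        defect = mass_defect(M)
        if min_defect is not None and defect < min_defect:
            continue
        if max_defect is not None and defect > max_defect:
            continue
        kept.append(M)
    return kept


candidates = [46.04186, 46.00548, 46.07825]  # C2H6O, CH2O2, C3H10
print(filter_by_defect(candidates, max_defect=0.05))  # [46.04186, 46.00548]
```

Leaving both limits as None keeps every candidate, which matches the "all values are allowed" default.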

Examples

>>> import tidyms as ms
>>> fg_bounds = {"C": (0, 5), "H": (0, 10), "O": (0, 4)}
>>> fg = ms.chem.FormulaGenerator(fg_bounds)
>>> fg.generate_formulas(46.042, 0.005)
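To show which formulas satisfy the search criterion in the example above, here is a brute-force enumeration over the same bounds. It is a deliberately naive stand-in, not necessarily how FormulaGenerator searches internally. `brute_force_formulas` is a hypothetical name, and element symbols stand in for their most abundant isotopes.

```python
from itertools import product

# Monoisotopic masses (Da) of the most abundant isotopes of C, H and O.
MASSES = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def brute_force_formulas(bounds, M, tolerance):
    """Return every coefficient combination whose exact mass is within tolerance of M."""
    elements = list(bounds)
    ranges = [range(lo, hi + 1) for lo, hi in bounds.values()]
    matches = []
    # Try every combination of coefficients allowed by the bounds.
    for coefficients in product(*ranges):
        mass = sum(n * MASSES[e] for n, e in zip(coefficients, elements))
        if abs(mass - M) <= tolerance:
            matches.append(dict(zip(elements, coefficients)))
    return matches


fg_bounds = {"C": (0, 5), "H": (0, 10), "O": (0, 4)}
print(brute_force_formulas(fg_bounds, 46.042, 0.005))  # [{'C': 2, 'H': 6, 'O': 1}]
```

Only C2H6O (46.04186 Da) lies within 0.005 of 46.042. Other nominal-mass-46 candidates such as CH2O2 (46.00548) and C3H10 (46.07825) fall outside the tolerance. The number of combinations grows multiplicatively with each element's bound range, which is why a dedicated generator is preferable for realistic bounds.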

results_to_array() -> Tuple[ndarray, List[Isotope], ndarray]

Convert results to an array of coefficients.

Returns:
coefficients: np.array

Formula coefficients. Each row is a formula, each column is an isotope.

isotopes: list[Isotope]

Isotopes associated with each column of coefficients.

M: array

Exact mass associated with each row of coefficients.

Examples

>>> import tidyms as ms
>>> fg_bounds = {"C": (0, 5), "H": (0, 10), "O": (0, 4)}
>>> fg = ms.chem.FormulaGenerator(fg_bounds)
>>> fg.generate_formulas(46.042, 0.005)
>>> coeff, isotopes, M = fg.results_to_array()
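Once the coefficient matrix is available, each row can be turned into a readable formula string. The sketch below is hypothetical. `rows_to_formulas` is not a tidyms function, and it takes plain isotope labels such as "12C" instead of the Isotope objects returned by results_to_array, because how to obtain those labels from Isotope objects is not documented here. It drops mass numbers, so it assumes one column per element and writes elements in column order.

```python
import numpy as np


def rows_to_formulas(coefficients, labels):
    """Turn each row of a coefficient matrix into a formula string in column order."""
    # Strip the leading mass number: "12C" -> "C".
    symbols = [label.lstrip("0123456789") for label in labels]
    formulas = []
    for row in coefficients:
        parts = []
        for symbol, count in zip(symbols, row):
            if count == 0:
                continue  # omit absent elements
            parts.append(symbol if count == 1 else f"{symbol}{count}")
        formulas.append("".join(parts))
    return formulas


coeff = np.array([[2, 6, 1], [1, 2, 2]])
print(rows_to_formulas(coeff, ["12C", "1H", "16O"]))  # ['C2H6O', 'CH2O2']
```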