Chemical data utilities
The chem module contains utilities for working with chemical data such as isotopes, elements and formulas. It also contains utilities to generate formulas from an exact mass, score isotopic envelopes and search for isotopic envelope candidates in a list of m/z values.
Searching chemical data
PeriodicTable() contains element and isotope information.
The get_element method returns an Element:
>>> import tidyms as ms
>>> ptable = ms.chem.PeriodicTable()
>>> oxygen = ptable.get_element("O")
>>> oxygen
Element(O)
Element information can be retrieved easily:
>>> oxygen.z
8
>>> oxygen.symbol
'O'
>>> oxygen.isotopes
{16: Isotope(16O), 17: Isotope(17O), 18: Isotope(18O)}
>>> oxygen.get_monoisotope()
Isotope(16O)
>>> oxygen.get_abundances()
(array([16, 17, 18]),
array([15.99491462, 16.9991317 , 17.999161 ]),
array([9.9757e-01, 3.8000e-04, 2.0500e-03]))
Isotope objects store the exact mass, nominal mass and abundance
of each isotope:
>>> o16 = oxygen.get_monoisotope()
>>> o16.m
15.99491462
>>> o16.a
16
>>> o16.p
0.99757
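As a small illustration of how the isotope data above can be used, the following sketch computes the abundance-weighted average mass of oxygen. It is plain Python and does not call tidyms; the numbers are copied from the get_abundances() output shown above, and the helper name average_mass is hypothetical.

```python
# Exact masses and abundances of the oxygen isotopes, copied from the
# get_abundances() output shown above.
exact_mass = [15.99491462, 16.9991317, 17.999161]
abundance = [9.9757e-01, 3.8000e-04, 2.0500e-03]


def average_mass(masses, abundances):
    """Return the abundance-weighted mean of the isotope masses.

    Hypothetical helper written for this example; it is not part of tidyms.
    """
    total = sum(abundances)
    return sum(m * p for m, p in zip(masses, abundances)) / total


oxygen_average_mass = average_mass(exact_mass, abundance)
print(oxygen_average_mass)
```

The result should be close to the standard atomic weight of oxygen, which is a quick sanity check on the abundance values.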
Working with chemical formulas
Chemical formulas can be created with the Formula object:
>>> water = ms.chem.Formula("H2O")
>>> water
Formula(H2O)
Formula objects can be used to compute a formula mass and its isotopic envelope:
>>> water.get_exact_mass()
18.010564684
>>> M, p = water.get_isotopic_envelope()
>>> M
array([18.01056468, 19.01555724, 20.01481138, 21.02108788])
>>> p
array([9.97340572e-01, 6.09327319e-04, 2.04962911e-03, 4.71450803e-07])
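The values above are consistent with an envelope in which isotopologues are grouped by nominal mass: each peak's abundance is the summed probability of the group, and its mass is the probability-weighted mean. The sketch below reproduces this in plain Python under that assumption; it is not necessarily how tidyms computes envelopes. The oxygen data come from the example above, the hydrogen data are assumed standard reference values, and the names add_atom and isotopic_envelope are hypothetical.

```python
from collections import defaultdict

# Isotope data as (nominal mass, exact mass, abundance).
# Oxygen values are taken from the example above; hydrogen values are
# assumed standard reference values.
ISOTOPES = {
    "H": [(1, 1.007825032, 0.999885), (2, 2.014101778, 0.000115)],
    "O": [
        (16, 15.99491462, 0.99757),
        (17, 16.9991317, 0.00038),
        (18, 17.999161, 0.00205),
    ],
}


def add_atom(envelope, isotopes):
    """Combine an envelope {nominal: (mean mass, abundance)} with one atom."""
    acc = defaultdict(lambda: [0.0, 0.0])  # nominal -> [sum(p * m), sum(p)]
    for nominal, (mass, p) in envelope.items():
        for a, m, q in isotopes:
            entry = acc[nominal + a]
            entry[0] += p * q * (mass + m)
            entry[1] += p * q
    # Convert the accumulated sums back to (mean mass, abundance) pairs.
    return {k: (pm / p, p) for k, (pm, p) in acc.items()}


def isotopic_envelope(composition):
    """Return (masses, abundances) for a dict of element -> atom count."""
    envelope = {0: (0.0, 1.0)}  # start from an empty formula
    for symbol, count in composition.items():
        for _ in range(count):
            envelope = add_atom(envelope, ISOTOPES[symbol])
    nominal = sorted(envelope)
    masses = [envelope[k][0] for k in nominal]
    abundances = [envelope[k][1] for k in nominal]
    return masses, abundances


masses, abundances = isotopic_envelope({"H": 2, "O": 1})
for m, p in zip(masses, abundances):
    print(f"{m:.8f}  {p:.6e}")
```

Unlike the output above, this sketch does not truncate the envelope, so it also returns the very low abundance peak at nominal mass 22.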
Formulas can also be created by passing a dictionary that maps elements or isotopes to formula coefficients, together with the numerical charge of the formula. Internally, formulas are stored as dictionaries of isotopes to formula coefficients, so when an element is passed, it is assumed to be its most abundant isotope:
>>> f = ms.chem.Formula({"C": 1, "13C": 1, "O": 4}, 0)
>>> f
Formula(C(13C)O4)
Isotopes can also be specified in the string format:
>>> f = ms.chem.Formula("[C(13C)2H2O4]2-")
>>> f
Formula([C(13C)2H2O4]2-)
>>> f.charge
-2
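To make the string format concrete, here is a minimal sketch of a parser that handles the notation used in the examples above: element symbols with optional coefficients, isotopes written as (13C), and an optional charge written as [formula]2-. It is an illustration only and not the parser used by tidyms; the function name parse_formula and both regular expressions are assumptions of this example.

```python
import re

# "[body]2-" or "[body]+": optional charge wrapper around the formula.
CHARGE_PATTERN = re.compile(r"^\[(?P<body>.+)\](?P<n>\d*)(?P<sign>[+-])$")
# Either an isotope such as "(13C)2" or an element such as "H2".
TOKEN_PATTERN = re.compile(r"\((\d+)([A-Z][a-z]?)\)(\d*)|([A-Z][a-z]?)(\d*)")


def parse_formula(text):
    """Return (composition, charge) for a formula string.

    Hypothetical helper for illustration; it is not part of tidyms.
    """
    charge = 0
    match = CHARGE_PATTERN.match(text)
    if match:
        magnitude = int(match.group("n") or 1)
        charge = magnitude if match.group("sign") == "+" else -magnitude
        text = match.group("body")

    composition = {}
    position = 0
    for token in TOKEN_PATTERN.finditer(text):
        if token.start() != position:
            raise ValueError(f"Unexpected characters at position {position}")
        position = token.end()
        if token.group(2):  # isotope token, e.g. "(13C)2"
            key = token.group(1) + token.group(2)
            count = int(token.group(3) or 1)
        else:  # element token, e.g. "H2"
            key = token.group(4)
            count = int(token.group(5) or 1)
        composition[key] = composition.get(key, 0) + count
    if position != len(text):
        raise ValueError(f"Unexpected characters at position {position}")
    return composition, charge


composition, charge = parse_formula("[C(13C)2H2O4]2-")
print(composition, charge)
```

The sketch does not validate element symbols or support nested groups; it only covers the forms shown on this page.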
Sum formula generation
The FormulaGenerator generates sum formulas from a mass
value. To generate formulas, the formula space must first be defined as a
dictionary that maps each element to its minimum and maximum coefficients,
and passed to the formula generator constructor:
>>> bounds = {"C": (0, 20), "H": (0, 40), "O": (0, 10), "N": (0, 5)}
>>> formula_generator = ms.chem.FormulaGenerator(bounds)
To generate formulas, pass an exact mass value along with the tolerance used to find compatible formulas:
>>> f = ms.chem.Formula("C5H10O2")
>>> M = f.get_exact_mass() # Mass value to generate formulas
>>> tolerance = 0.005
>>> formula_generator.generate_formulas(M, tolerance)
>>> coefficients, isotopes, M_coeff = formula_generator.results_to_array()
>>> coefficients
array([[ 0, 10, 2, 4],
[ 3, 8, 3, 1],
[ 5, 10, 0, 2]])
>>> isotopes
[Isotope(12C), Isotope(1H), Isotope(14N), Isotope(16O)]
coefficients is a 2D NumPy array in which each row holds the coefficients of a valid formula and each column corresponds to an isotope.
Formula generator objects can be created easily by using the static method
from_hmdb(), which generates reasonable
coefficient spaces for the CHNOPS elements by finding the maximum coefficients
in compounds from the Human Metabolome Database:
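To illustrate what the search does conceptually, the sketch below enumerates every coefficient combination inside the bounds and keeps those whose monoisotopic mass lies within the tolerance of the query mass. This brute-force approach is only meant to show the idea; it is not the algorithm used by FormulaGenerator, which is presumably far more efficient. The function name, the inclusive treatment of the bounds and the monoisotopic masses (standard reference values) are assumptions of this example.

```python
from itertools import product

# Monoisotopic masses of the most abundant isotopes (assumed standard
# reference values, not read from tidyms).
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.007825032,
    "N": 14.003074004,
    "O": 15.99491462,
}


def brute_force_formulas(mass, tolerance, bounds):
    """Return (elements, matches) for all formulas within the tolerance.

    Hypothetical helper; bounds are treated as inclusive (min, max) pairs.
    """
    elements = sorted(bounds)
    ranges = [range(bounds[e][0], bounds[e][1] + 1) for e in elements]
    matches = []
    for coefficients in product(*ranges):
        candidate = sum(
            n * MONOISOTOPIC_MASS[e] for n, e in zip(coefficients, elements)
        )
        if abs(candidate - mass) <= tolerance:
            matches.append(coefficients)
    return elements, matches


bounds = {"C": (0, 20), "H": (0, 40), "O": (0, 10), "N": (0, 5)}
# Monoisotopic mass of C5H10O2, used as the query mass.
query = (
    5 * MONOISOTOPIC_MASS["C"]
    + 10 * MONOISOTOPIC_MASS["H"]
    + 2 * MONOISOTOPIC_MASS["O"]
)
elements, matches = brute_force_formulas(query, 0.005, bounds)
print(elements)
for row in matches:
    print(row)
```

Each tuple in matches follows the element order in elements, which mirrors the row and column layout of the coefficients array described above.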
m = 1000
formula_generator = ms.chem.FormulaGenerator.from_hmdb(m)
m defines the maximum mass of the compounds used to create the coefficient
space. m can take values of 500, 1000, 1500 and 2000. Other elements can be
added as follows:
m = 1000
bounds = {"Cl": (0, 2)}
formula_generator = ms.chem.FormulaGenerator.from_hmdb(m, bounds=bounds)
Scoring isotopic envelopes
Scoring measured envelopes against theoretical values is a common strategy
to establish a formula candidate for an unknown compound. The
EnvelopeScorer uses the formulas generated by a formula
generator and scores them using a measure of similarity between the measured and
theoretical envelopes:
>>> bounds = {"C": (0, 20), "H": (0, 40), "O": (0, 10), "N": (0, 5)}
>>> fg = ms.chem.FormulaGenerator(bounds)
>>> envelope_scorer = ms.chem.EnvelopeScorer(fg, scorer="qtof", max_length=10)
The max_length parameter sets the maximum length of the measured envelopes to
compare against theoretical values. The scorer parameter can be qtof,
orbitrap or a callable that implements a custom scorer. In the first two
cases, default parameters are set for values measured in Q-TOF or Orbitrap
instruments. The score method takes the exact masses and abundances of an
envelope and scores them against all compatible formulas. See the API for a detailed
description of how to customize the scorer function. The results can be obtained
with the tidyms.chem.EnvelopeScorer.get_top_results() method:
>>> import numpy as np
>>> f = ms.chem.Formula("C5H10O2")
>>> M, p = f.get_isotopic_envelope(4) # Get first four peaks from the envelope
>>> tolerance = 0.005
>>> envelope_scorer.score(M, p, tolerance)
>>> coefficients, isotopes, score = envelope_scorer.get_top_results()
>>> coefficients[np.argmax(score)]
array([ 5, 10, 0, 2])
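As a rough illustration of what "a measure of similarity between the measured and theoretical envelopes" can look like, the sketch below scores abundance vectors with cosine similarity. This is a generic similarity measure chosen for the example and is not the qtof or orbitrap scorer implemented in tidyms, which also take mass accuracy into account. The theoretical abundances are the water values shown earlier on this page; the measured abundances are hypothetical.

```python
import math


def cosine_score(measured, theoretical):
    """Cosine similarity between two abundance vectors.

    Hypothetical helper; only the peaks present in both envelopes are
    compared, and 0.0 is returned if either vector is all zeros.
    """
    n = min(len(measured), len(theoretical))
    a, b = measured[:n], theoretical[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


# Theoretical abundances of water, copied from the envelope shown earlier.
water_theoretical = [9.97340572e-01, 6.09327319e-04, 2.04962911e-03, 4.71450803e-07]
# Hypothetical measured abundances, invented for this example.
water_measured = [0.9950, 0.0012, 0.0038]

perfect_score = cosine_score(water_theoretical, water_theoretical)
noisy_score = cosine_score(water_measured, water_theoretical)
print(perfect_score, noisy_score)
```

Because the monoisotopic peak dominates both vectors, cosine similarity separates candidates only weakly here, which is one reason a practical scorer would combine abundance and mass errors.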