Chemical data utilities

The chem module contains utilities for working with chemical data such as isotopes, elements and formulas. It also contains utilities to generate formulas from an exact mass, score isotopic envelopes and search for isotopic envelope candidates in a list of m/z values.

Searching chemical data

PeriodicTable() contains element and isotope information. The get_element method returns an Element object:

>>> import tidyms as ms
>>> ptable = ms.chem.PeriodicTable()
>>> oxygen = ptable.get_element("O")
>>> oxygen
Element(O)

Element information can be retrieved easily:

>>> oxygen.z
8
>>> oxygen.symbol
'O'
>>> oxygen.isotopes
{16: Isotope(16O), 17: Isotope(17O), 18: Isotope(18O)}
>>> oxygen.get_monoisotope()
Isotope(16O)
>>> oxygen.get_abundances()
(array([16, 17, 18]),
 array([15.99491462, 16.9991317 , 17.999161  ]),
 array([9.9757e-01, 3.8000e-04, 2.0500e-03]))

Isotope objects store the exact mass, nominal mass and abundance of each isotope:

>>> o16 = oxygen.get_monoisotope()
>>> o16.m
15.99491462
>>> o16.a
16
>>> o16.p
0.99757
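
As a small illustration of how the three arrays returned by get_abundances() relate to each other, the sketch below computes the abundance-weighted average mass of oxygen in plain Python. It does not call tidyms; the numbers are copied from the output above, and average_mass is a hypothetical helper introduced only for this example.

```python
# Isotope data for oxygen, copied from the get_abundances() output above.
exact_masses = [15.99491462, 16.9991317, 17.999161]
abundances = [0.99757, 0.00038, 0.00205]


def average_mass(masses, probabilities):
    """Return the abundance-weighted mean of the isotope masses.

    Hypothetical helper for illustration; it is not part of tidyms.
    """
    total = sum(probabilities)
    return sum(m * p for m, p in zip(masses, probabilities)) / total


oxygen_average_mass = average_mass(exact_masses, abundances)
print(round(oxygen_average_mass, 4))
```

The result is close to the standard atomic weight of oxygen, which differs from the monoisotopic mass stored in Isotope(16O).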

Working with chemical formulas

Chemical formulas can be created with the Formula object:

>>> water = ms.chem.Formula("H2O")
>>> water
Formula(H2O)

Formula objects can be used to compute a formula mass and its isotopic envelope:

>>> water.get_exact_mass()
18.010564684
>>> M, p = water.get_isotopic_envelope()
>>> M
array([18.01056468, 19.01555724, 20.01481138, 21.02108788])
>>> p
array([9.97340572e-01, 6.09327319e-04, 2.04962911e-03, 4.71450803e-07])
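
To make the abundances above less opaque, here is a minimal sketch of one common way to obtain an isotopic envelope: repeatedly convolving the isotope abundances of each atom, indexed by nominal-mass offset. This is not necessarily how tidyms computes envelopes. The oxygen abundances come from the text above, the hydrogen abundances are assumed values, and ABUNDANCES, convolve and envelope_abundances are hypothetical names.

```python
# Isotope abundances indexed by nominal-mass offset from the lightest isotope.
# Oxygen values come from the text above; hydrogen values are an assumption.
ABUNDANCES = {
    "H": [0.999885, 0.000115],          # 1H, 2H (assumed)
    "O": [0.99757, 0.00038, 0.00205],   # 16O, 17O, 18O
}


def convolve(a, b):
    """Discrete convolution of two abundance lists."""
    result = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            result[i + j] += x * y
    return result


def envelope_abundances(composition):
    """Abundance of each nominal-mass peak (M, M+1, ...) of a formula.

    composition maps element symbols to coefficients, e.g. {"H": 2, "O": 1}.
    """
    envelope = [1.0]
    for element, count in composition.items():
        for _ in range(count):
            envelope = convolve(envelope, ABUNDANCES[element])
    return envelope


water_envelope = envelope_abundances({"H": 2, "O": 1})
print(water_envelope[:3])
```

Under these assumptions, the first peaks agree with the p values shown above for water. The sketch only tracks abundances; computing the mass of each peak would additionally require an abundance-weighted average of the contributing isotopologue masses.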

Formulas can also be created by passing a dictionary that maps elements or isotopes to formula coefficients, together with the numerical charge of the formula. Formulas are implemented as dictionaries of isotopes to formula coefficients, so if an element is passed, it is assumed to be its most abundant isotope.

>>> f = ms.chem.Formula({"C": 1, "13C": 1, "O": 4}, 0)
>>> f
Formula(C(13C)O4)

Isotopes can also be specified in the string format:

>>> f = ms.chem.Formula("[C(13C)2H2O4]2-")
>>> f
Formula([C(13C)2H2O4]2-)
>>> f.charge
-2
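
The string format used above can be illustrated with a small stand-alone parser. This is only a sketch of the notation (element symbols, isotopes written in parentheses, and an optional charge after square brackets); it is not the parser used by tidyms, and parse_formula, TOKEN and CHARGED are hypothetical names.

```python
import re

# An isotope such as "(13C)2" or a plain element such as "H2".
TOKEN = re.compile(r"\((\d+[A-Z][a-z]?)\)(\d*)|([A-Z][a-z]?)(\d*)")
# A charged formula such as "[C(13C)2H2O4]2-".
CHARGED = re.compile(r"^\[(.+)\](\d*)([+-])$")


def parse_formula(text):
    """Return (composition, charge) for a formula string.

    Hypothetical helper for illustration; it is not part of tidyms.
    """
    charge = 0
    match = CHARGED.match(text)
    if match:
        text, magnitude, sign = match.groups()
        charge = int(magnitude or 1) * (1 if sign == "+" else -1)

    composition = {}
    position = 0
    while position < len(text):
        token = TOKEN.match(text, position)
        if token is None:
            raise ValueError(f"Cannot parse {text!r} at position {position}")
        isotope, n_isotope, element, n_element = token.groups()
        species = isotope or element
        # A missing coefficient means one atom.
        count = int((n_isotope if isotope else n_element) or 1)
        composition[species] = composition.get(species, 0) + count
        position = token.end()
    return composition, charge


composition, charge = parse_formula("[C(13C)2H2O4]2-")
print(composition, charge)
```

The sketch keeps "C" and "13C" as separate keys, mirroring the dictionary form accepted by the Formula constructor.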

Sum formula generation

The FormulaGenerator generates sum formulas from a mass value. To generate formulas, the formula space must be defined with a dictionary that maps each element to its minimum and maximum coefficients, and passed to the formula generator constructor:

>>> bounds = {"C": (0, 20), "H": (0, 40), "O": (0, 10), "N": (0, 5)}
>>> formula_generator = ms.chem.FormulaGenerator(bounds)

To generate formulas, an exact mass value must be passed, along with a tolerance to find compatible formulas.

>>> f = ms.chem.Formula("C5H10O2")
>>> M = f.get_exact_mass()  # Mass value to generate formulas
>>> tolerance = 0.005
>>> formula_generator.generate_formulas(M, tolerance)
>>> coefficients, isotopes, M_coeff = formula_generator.results_to_array()
>>> coefficients
array([[ 0, 10,  2,  4],
       [ 3,  8,  3,  1],
       [ 5, 10,  0,  2]])
>>> isotopes
[Isotope(12C), Isotope(1H), Isotope(14N), Isotope(16O)]

coefficients is a 2D NumPy array where each row holds the coefficients of a valid formula and each column corresponds to an isotope, in the order given by isotopes.
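
Conceptually, formula generation is a search over the coefficient space for combinations whose mass falls within the tolerance. The brute-force sketch below illustrates that idea in plain Python. It is not the algorithm used by FormulaGenerator, which may be considerably more efficient; the monoisotopic masses are assumed values, the bounds are assumed to be inclusive, and brute_force_formulas is a hypothetical name.

```python
from itertools import product

# Monoisotopic masses (assumed values, rounded to 8 decimals).
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
}


def brute_force_formulas(target, tolerance, bounds):
    """Enumerate all coefficient combinations within tolerance of target.

    bounds maps element symbols to inclusive (min, max) coefficients.
    Returns the element order used for the columns and the matching rows.
    """
    elements = sorted(bounds)
    ranges = [range(bounds[e][0], bounds[e][1] + 1) for e in elements]
    matches = []
    for coefficients in product(*ranges):
        mass = sum(n * MONOISOTOPIC_MASS[e] for n, e in zip(coefficients, elements))
        if abs(mass - target) <= tolerance:
            matches.append(coefficients)
    return elements, matches


bounds = {"C": (0, 20), "H": (0, 40), "O": (0, 10), "N": (0, 5)}
# Monoisotopic mass of C5H10O2 computed from the assumed masses.
target = 5 * 12.0 + 10 * 1.00782503 + 2 * 15.99491462
elements, matches = brute_force_formulas(target, 0.005, bounds)
for row in matches:
    print(dict(zip(elements, row)))
```

Each row in matches plays the same role as a row of the coefficients array above, with columns ordered as in elements. Exhaustive enumeration grows quickly with the number of elements and the width of the bounds, which is why narrow, realistic bounds matter.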

Formula generator objects can be created easily by using the static method from_hmdb(), which generates reasonable coefficient spaces for the CHNOPS elements by finding the maximum coefficients in compounds from the Human Metabolome Database (HMDB):

>>> m = 1000
>>> formula_generator = ms.chem.FormulaGenerator.from_hmdb(m)

m defines the maximum mass of the compounds included to create the coefficient space. m can take values of 500, 1000, 1500 and 2000. Other elements can be added as follows:

>>> m = 1000
>>> bounds = {"Cl": (0, 2)}
>>> formula_generator = ms.chem.FormulaGenerator.from_hmdb(m, bounds=bounds)

Scoring isotopic envelopes

Scoring measured envelopes against theoretical values is a common strategy to establish a formula candidate for an unknown compound. The EnvelopeScorer uses the formulas generated by a formula generator and scores them using a measure of similarity between the measured and theoretical envelopes:

>>> bounds = {"C": (0, 20), "H": (0, 40), "O": (0, 10), "N": (0, 5)}
>>> fg = ms.chem.FormulaGenerator(bounds)
>>> envelope_scorer = ms.chem.EnvelopeScorer(fg, scorer="qtof", max_length=10)

The max_length parameter sets the maximum length of the measured envelopes to compare against theoretical values. The scorer parameter can be qtof, orbitrap or a callable that implements a custom scorer. In the first two cases, default parameters are set for values measured on Q-TOF or Orbitrap instruments. See the API for a detailed description of how to customize the scorer function. The score method takes the exact masses and abundances of an envelope and scores them against all compatible formulas. The results can be obtained with the tidyms.chem.EnvelopeScorer.get_top_results() method:

>>> import numpy as np
>>> f = ms.chem.Formula("C5H10O2")
>>> M, p = f.get_isotopic_envelope(4)  # Get first four peaks from the envelope
>>> tolerance = 0.005
>>> envelope_scorer.score(M, p, tolerance)
>>> coefficients, isotopes, score = envelope_scorer.get_top_results()
>>> coefficients[np.argmax(score)]
array([ 5, 10,  0,  2])
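
To give an intuition for what "a measure of similarity" between envelopes can look like, the sketch below compares abundance vectors with cosine similarity. This is a generic similarity measure chosen for illustration, not the qtof or orbitrap scorer implemented in tidyms, and it ignores mass errors entirely. The measured envelopes are made-up values, and cosine_similarity is a hypothetical name.

```python
import math


def cosine_similarity(measured, theoretical):
    """Cosine similarity between two abundance vectors (1.0 means identical shape).

    Illustrative stand-in for a scorer; it is not part of tidyms.
    """
    n = min(len(measured), len(theoretical))
    a, b = measured[:n], theoretical[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Theoretical abundances of water, copied from the envelope shown earlier.
theoretical_water = [9.97340572e-01, 6.09327319e-04, 2.04962911e-03]
# Hypothetical measured envelopes: one close to water, one far from it.
measured_close = [0.9970, 0.0007, 0.0023]
measured_far = [0.90, 0.09, 0.01]

score_close = cosine_similarity(measured_close, theoretical_water)
score_far = cosine_similarity(measured_far, theoretical_water)
print(score_close > score_far)
```

A practical scorer would typically also account for the expected measurement error of both m/z and abundance on a given instrument, which is what the instrument-specific defaults are meant to capture.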