Working with raw data

TidyMS works with raw data in the mzML format using the MSData class. This section shows common operations on raw data. For file conversion to the mzML format, see this guide.

The examples use a sample mzML file that can be downloaded with the following code:

import numpy as np
import tidyms as ms

filename = "NZ_20200227_039.mzML"
dataset = "test-nist-raw-data"
ms.fileio.download_tidyms_data(dataset, [filename], download_dir=".")

Raw data

Raw MS data in the mzML format can be read through the MSData object.

ms_data = ms.MSData.create_MSData_instance(
    filename,
    ms_mode="centroid",
    instrument="qtof",
    separation="uplc"
)

The ms_mode parameter must state whether the data is in centroid or profile mode, because some methods behave differently for each type of data. Specifying the instrument and separation is also recommended, as these parameters set reasonable defaults in several functions.

MSData is optimized for low memory usage and only loads the required data into memory. A single MS spectrum can be loaded using get_spectrum(), which returns an MSSpectrum.

index = 20
sp = ms_data.get_spectrum(index)

The index is the position of the spectrum in the order in which the data was stored in the file. In the same way, a stored chromatogram can be retrieved using get_chromatogram(). The total count of spectra and chromatograms in the file can be obtained using tidyms.MSData.get_n_spectra() and tidyms.MSData.get_n_chromatograms() respectively. Iterating over all the spectra in a file can be done using get_spectra_iterator(), which generates each one of the spectra in the file and allows filtering by acquisition time or MS level. Common operations with raw data are located in tidyms.raw_data_utils.
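The lazy, filtered iteration described above can be pictured with a plain Python generator. This is only a conceptual sketch: the Scan record, the iter_scans function and its parameters are hypothetical stand-ins and are not part of the TidyMS API. The half-open time window is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Scan:
    """Hypothetical stand-in for one stored spectrum."""
    index: int
    time: float  # acquisition time, in seconds
    ms_level: int


def iter_scans(
    scans: Iterable[Scan],
    ms_level: Optional[int] = None,
    start_time: float = 0.0,
    end_time: Optional[float] = None,
) -> Iterator[Scan]:
    """Yield scans one at a time, keeping only those that pass the filters."""
    for scan in scans:
        if ms_level is not None and scan.ms_level != ms_level:
            continue
        if scan.time < start_time:
            continue
        if end_time is not None and scan.time >= end_time:
            continue
        yield scan


stored = [Scan(0, 0.5, 1), Scan(1, 1.0, 2), Scan(2, 1.5, 1), Scan(3, 2.0, 1)]
selected = [s.index for s in iter_scans(stored, ms_level=1, start_time=1.0, end_time=2.0)]
```

Because the generator yields one scan at a time, only the scan being processed needs to be held in memory, which is the same idea behind loading spectra on demand.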

Working with Mass Spectra

MSSpectrum stores the information from one scan. It is mostly used as a data storage class in several data processing steps, but it also has functionality to visualize the spectrum using the plot() method and to convert a profile data spectrum into centroid mode using tidyms.MSSpectrum.find_centroids().
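To illustrate what converting profile data into centroid mode means, the sketch below reduces each local maximum of a profile spectrum to a single (m/z, intensity) pair. It is a deliberately simple, hypothetical routine written for this explanation. It is not the algorithm used by tidyms.MSSpectrum.find_centroids(), and the three-point window and the min_intensity parameter are assumptions.

```python
def centroid_profile(mz, intensity, min_intensity=0.0):
    """Return one (centroid m/z, summed intensity) pair per local maximum.

    Illustrative only: each peak is described by the apex and its two
    neighbors, using the intensity-weighted mean of their m/z values.
    """
    centroids = []
    for k in range(1, len(mz) - 1):
        is_apex = intensity[k - 1] < intensity[k] >= intensity[k + 1]
        if not is_apex or intensity[k] <= min_intensity:
            continue
        window_mz = mz[k - 1:k + 2]
        window_int = intensity[k - 1:k + 2]
        total = sum(window_int)
        weighted_mz = sum(m * i for m, i in zip(window_mz, window_int)) / total
        centroids.append((weighted_mz, total))
    return centroids


profile_mz = [100.00, 100.01, 100.02, 100.03, 100.04, 100.05, 100.06]
profile_int = [0.0, 2.0, 10.0, 2.0, 0.0, 0.0, 0.0]
centroids = centroid_profile(profile_mz, profile_int)
```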

tidyms.raw_data_utils.accumulate_spectra() combines a series of scans in a file into a single spectrum:

combined_sp = ms.accumulate_spectra(ms_data, start_time=110, end_time=115)
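Conceptually, accumulating spectra means merging the signals of every scan inside a time window into one spectrum. The sketch below shows one simple way to do this, by summing intensities on a fixed m/z grid. The function name, the bin_width parameter, the tuple layout of the scans and the half-open time window are all assumptions made for illustration. The actual implementation in tidyms.raw_data_utils.accumulate_spectra() may work differently.

```python
from collections import defaultdict


def accumulate(scans, start_time, end_time, bin_width=0.01):
    """Sum the intensities of all scans in [start_time, end_time) on an m/z grid.

    scans: iterable of (time, mz_values, intensities) tuples.
    Returns two lists: bin-center m/z values (sorted) and summed intensities.
    """
    bins = defaultdict(float)
    for time, mz_values, intensities in scans:
        if not (start_time <= time < end_time):
            continue
        for mz, intensity in zip(mz_values, intensities):
            # Signals that round to the same grid index share a bin.
            bins[round(mz / bin_width)] += intensity
    keys = sorted(bins)
    return [k * bin_width for k in keys], [bins[k] for k in keys]


scans = [
    (109.0, [189.07], [50.0]),  # outside the window, ignored
    (110.0, [189.07, 205.10], [100.0, 40.0]),
    (112.0, [189.07, 205.10], [120.0, 60.0]),
]
combined_mz, combined_int = accumulate(scans, start_time=110, end_time=115)
```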

Chromatograms

Besides the chromatograms stored in a file, extracted chromatograms can be created using tidyms.raw_data_utils.make_chromatograms(), which takes an array of m/z values and returns a list of tidyms.Chromatogram objects, each one associated with one of the m/z values provided:

mz_list = np.array([189.0734, 205.0967, 188.071])
chromatograms = ms.make_chromatograms(ms_data, mz_list)
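The idea behind an extracted chromatogram is to follow, scan by scan, the intensity found inside a narrow m/z window around each target value. A minimal sketch of that idea follows. The make_xic function, its tolerance parameter and the tuple layout of the scans are hypothetical and only serve to illustrate the concept. They do not reproduce how make_chromatograms() is implemented.

```python
def make_xic(scans, target_mz, tolerance=0.005):
    """Build one intensity trace per target m/z.

    scans: list of (time, mz_values, intensities) tuples, ordered by time.
    Returns the list of scan times and a list with one trace per target.
    """
    times = [time for time, _, _ in scans]
    traces = []
    for target in target_mz:
        trace = []
        for _, mz_values, intensities in scans:
            # Sum every signal whose m/z falls inside the tolerance window.
            total = sum(
                (i for mz, i in zip(mz_values, intensities)
                 if abs(mz - target) <= tolerance),
                0.0,
            )
            trace.append(total)
        traces.append(trace)
    return times, traces


example_scans = [
    (1.0, [189.0730, 205.0970], [10.0, 5.0]),
    (2.0, [189.0736, 250.0000], [30.0, 7.0]),
    (3.0, [205.0965], [8.0]),
]
times, traces = make_xic(example_scans, [189.0734, 205.0967])
```

Scans that contain no signal inside the window contribute a zero, so every trace has the same length as the list of scan times.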

A chromatogram can be visualized using the plot() method:

chrom = chromatograms[0]
chrom.plot()

Peaks in a chromatogram are detected using tidyms.lcms.LCRoi.extract_features(), which stores a list of tidyms.lcms.Peak objects in the features attribute of the chromatogram. Plotting the chromatogram again shows the detected peaks:

chrom.extract_features()
chrom.plot()

Peak descriptors can be obtained using tidyms.lcms.Roi.describe_features():

>>> chrom.describe_features()
[{'height': 16572.38, 'area': 108529.94, 'rt': 125.73, 'width': 14.06,
  'snr': 385.44, 'mz': None, 'mz_std': None}]
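To give an intuition for descriptors such as height, area, rt and width, the sketch below computes them for a peak delimited by a start and an end index. The definitions used here (zero baseline, trapezoidal area, intensity-weighted retention time, width as the full peak extension) are assumptions chosen for illustration. The values reported by describe_features() may be defined differently, and describe_peak is a hypothetical helper.

```python
def describe_peak(time, intensity, start, end):
    """Compute simple descriptors for the peak spanning samples start..end-1.

    The baseline is assumed to be zero; all definitions are illustrative.
    """
    t = time[start:end]
    y = intensity[start:end]
    height = max(y)
    # Area under the peak using the trapezoidal rule.
    area = sum(
        (t[k + 1] - t[k]) * (y[k] + y[k + 1]) / 2 for k in range(len(t) - 1)
    )
    # Retention time as the intensity-weighted mean of the time values.
    rt = sum(tk * yk for tk, yk in zip(t, y)) / sum(y)
    width = t[-1] - t[0]
    return {"height": height, "area": area, "rt": rt, "width": width}


peak_time = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
peak_intensity = [0.0, 0.0, 5.0, 10.0, 5.0, 0.0, 0.0]
descriptors = describe_peak(peak_time, peak_intensity, start=1, end=6)
```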

A detailed description of the algorithm used for peak picking can be found here. These methods are also used to create a data matrix from a dataset. A tutorial on how to work with complete datasets to extract a data matrix can be found here.