Working with raw data
TidyMS works with raw data in the mzML format using the MSData
class. This section shows common operations on raw data. For file
conversion to the mzML format, see this guide.
The examples use an mzML file that can be downloaded with the following code:
import numpy as np
import tidyms as ms
filename = "NZ_20200227_039.mzML"
dataset = "test-nist-raw-data"
ms.fileio.download_tidyms_data(dataset, [filename], download_dir=".")
Raw data
Raw MS data in the mzML format can be read through the MSData
object.
ms_data = ms.MSData.create_MSData_instance(
    filename,
    ms_mode="centroid",
    instrument="qtof",
    separation="uplc"
)
It is necessary to specify whether the data is in centroid or profile mode using the
ms_mode parameter, as some methods work differently for each
type of data. Specifying the instrument and separation is also
recommended, as these parameters are used to set reasonable defaults in several
functions.
MSData is optimized for low memory usage and only loads the
required data into memory. A single MS spectrum can be loaded using
get_spectrum() which returns a
MSSpectrum.
index = 20
sp = ms_data.get_spectrum(index)
The index used is the order in which the data was stored in the file. In the
same way, a stored chromatogram can be retrieved using
get_chromatogram(). The total count of spectra and
chromatograms in the file can be obtained using
tidyms.MSData.get_n_spectra() and
tidyms.MSData.get_n_chromatograms(), respectively. To iterate over the
spectra in a file, use get_spectra_iterator(), which yields the spectra
one at a time and allows filtering by acquisition time or MS level.
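As a hedged illustration of iterating over spectra, the sketch below builds a total ion current trace for a time window. It assumes that get_spectra_iterator() accepts ms_level, start_time and end_time keyword arguments and yields (scan index, spectrum) pairs, and that each MSSpectrum exposes time and spint (intensity) attributes; check these against the API reference for the installed version. The helper name total_ion_current is hypothetical and is not part of TidyMS.

```python
def total_ion_current(ms_data, start_time=0.0, end_time=None, ms_level=1):
    """Return (times, tic): one acquisition time and one summed intensity per scan.

    Assumptions (verify against the TidyMS API reference):
    - ms_data.get_spectra_iterator() accepts ms_level, start_time, end_time
    - it yields (scan_index, spectrum) pairs
    - each spectrum has `time` and `spint` attributes
    """
    times, tic = [], []
    iterator = ms_data.get_spectra_iterator(
        ms_level=ms_level, start_time=start_time, end_time=end_time
    )
    for _scan_index, sp in iterator:
        times.append(sp.time)
        # Sum all intensities in the scan; only one spectrum is held at a time.
        tic.append(float(sum(sp.spint)))
    return times, tic
```

With the ms_data object created earlier, total_ion_current(ms_data, start_time=110, end_time=115) would return one value per MS1 scan in that window, if the assumptions above hold.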
Common operations with raw data are located in tidyms.raw_data_utils.
Working with Mass Spectra
MSSpectrum stores the information from one scan. It is mostly
used as a data storage class in several data processing steps, but it also has
functionality to visualize the spectrum using the
plot() method and to convert a profile data spectrum
into centroid mode using tidyms.MSSpectrum.find_centroids().
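To make the difference between profile and centroid data concrete, here is a minimal, library-independent sketch of centroiding. It is not the algorithm used by find_centroids(): it simply collapses each contiguous run of points above a threshold into an intensity-weighted mean m/z and a summed intensity. The function names and the min_intensity parameter are illustrative assumptions.

```python
def _summarize_run(run_mz, run_int):
    """Return (intensity-weighted mean m/z, summed intensity) for one run."""
    total = sum(run_int)
    center = sum(m * i for m, i in zip(run_mz, run_int)) / total
    return center, total


def centroid_profile(mz, intensity, min_intensity=0.0):
    """Collapse each contiguous run of points above min_intensity into one centroid.

    Simplified illustration only; not the TidyMS implementation.
    """
    centroids = []
    run_mz, run_int = [], []
    for m, i in zip(mz, intensity):
        if i > min_intensity:
            run_mz.append(m)
            run_int.append(i)
        elif run_int:
            # The run ended: emit its centroid and start a new run.
            centroids.append(_summarize_run(run_mz, run_int))
            run_mz, run_int = [], []
    if run_int:
        centroids.append(_summarize_run(run_mz, run_int))
    return centroids


profile_mz = [100.00, 100.01, 100.02, 100.03, 100.04, 200.00, 200.01, 200.02]
profile_int = [0.0, 10.0, 20.0, 10.0, 0.0, 0.0, 5.0, 0.0]
peaks = centroid_profile(profile_mz, profile_int)  # two centroids, one per run
```

A profile spectrum stores many points per peak, while the centroid form keeps one (m/z, intensity) pair per peak, which is why methods may treat the two modes differently.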
tidyms.raw_data_utils.accumulate_spectra() combines a series of scans in
a file into a single spectrum:
combined_sp = ms.accumulate_spectra(ms_data, start_time=110, end_time=115)
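The idea of combining scans can be sketched without the library. The example below is a simplified stand-in, not the method used by accumulate_spectra(): it keeps the scans acquired inside a time window and sums their intensities on a fixed m/z grid. The function name, the (time, m/z values, intensities) tuple layout and the bin_width parameter are assumptions made for illustration.

```python
def accumulate_scans(scans, start_time, end_time, bin_width=0.01):
    """Sum intensities of scans acquired in [start_time, end_time] on an m/z grid.

    scans: iterable of (time, mz_values, intensities) tuples.
    Returns a list of (bin_center_mz, summed_intensity) sorted by m/z.
    Simplified illustration only; not the TidyMS implementation.
    """
    bins = {}
    for time, mz_values, intensities in scans:
        if not (start_time <= time <= end_time):
            continue  # scan acquired outside the requested window
        for m, i in zip(mz_values, intensities):
            key = round(m / bin_width)  # integer index of the m/z bin
            bins[key] = bins.get(key, 0.0) + i
    return [(key * bin_width, total) for key, total in sorted(bins.items())]


example_scans = [
    (109.0, [100.0], [5.0]),              # before the window, ignored
    (111.0, [100.004, 150.0], [10.0, 1.0]),
    (114.0, [99.998], [20.0]),
    (120.0, [100.0], [7.0]),              # after the window, ignored
]
combined = accumulate_scans(example_scans, start_time=110, end_time=115)
```

Here the two signals near m/z 100 fall in the same bin and are added together, while the scans outside the window do not contribute.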
Chromatograms
Besides the chromatograms stored in a file, extracted chromatograms can be
created using tidyms.raw_data_utils.make_chromatograms(), which takes an array of
m/z values and returns a list of tidyms.Chromatogram objects, each one associated
with one of the m/z values provided:
mz_list = np.array([189.0734, 205.0967, 188.071])
chromatograms = ms.make_chromatograms(ms_data, mz_list)
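Conceptually, an extracted chromatogram records, for every scan, the signal found close to a target m/z. The sketch below shows that idea in plain Python under stated assumptions: the function name, the (time, m/z values, intensities) tuple layout and the window tolerance are hypothetical, and the way make_chromatograms() selects and aggregates signals may differ.

```python
def extract_chromatogram(scans, target_mz, window=0.05):
    """Build an extracted ion chromatogram for one m/z value.

    scans: iterable of (time, mz_values, intensities) tuples.
    Returns (times, signal), where signal[k] is the summed intensity of all
    points in scan k with |m/z - target_mz| <= window.
    Simplified illustration only; not the TidyMS implementation.
    """
    times, signal = [], []
    for time, mz_values, intensities in scans:
        in_window = sum(
            (i for m, i in zip(mz_values, intensities) if abs(m - target_mz) <= window),
            0.0,
        )
        times.append(time)
        signal.append(in_window)
    return times, signal


eic_scans = [
    (1.0, [189.07, 205.10], [10.0, 3.0]),
    (2.0, [189.09, 189.30], [20.0, 99.0]),
    (3.0, [300.0], [5.0]),
]
eic_times, eic_signal = extract_chromatogram(eic_scans, target_mz=189.0734)
```

Scans with no signal near the target still contribute a zero, so every chromatogram built from the same scans shares the same time axis.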
A chromatogram can be visualized using the plot() method:
chrom = chromatograms[0]
chrom.plot()
Peaks in a chromatogram are detected using
tidyms.lcms.LCRoi.extract_features(), which stores a list of
tidyms.lcms.Peak objects in the features attribute of the
chromatogram. Plotting the chromatogram again shows the detected peaks:
chrom.extract_features()
chrom.plot()
Peak descriptors can be obtained using
tidyms.lcms.Roi.describe_features():
>>> chrom.describe_features()
[{'height': 16572.38, 'area': 108529.94, 'rt': 125.73, 'width': 14.06,
'snr': 385.44, 'mz': None, 'mz_std': None}]
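To give a feel for what descriptors such as height, area and rt measure, the following sketch computes them for a single peak using common textbook definitions: maximum intensity, trapezoidal area and intensity-weighted mean time. These definitions are assumptions for illustration; TidyMS may compute its descriptors differently, for example after baseline correction. The function name describe_peak is hypothetical.

```python
def describe_peak(time, intensity):
    """Compute simple descriptors for one chromatographic peak.

    time, intensity: equal-length sequences covering only the peak region.
    Simplified illustration only; not the TidyMS implementation.
    """
    height = max(intensity)
    # Trapezoidal rule: average of neighboring intensities times the time step.
    area = sum(
        0.5 * (intensity[k] + intensity[k + 1]) * (time[k + 1] - time[k])
        for k in range(len(time) - 1)
    )
    # Intensity-weighted mean of the time axis.
    rt = sum(t * i for t, i in zip(time, intensity)) / sum(intensity)
    return {"height": height, "area": area, "rt": rt}


descriptors = describe_peak(
    time=[0.0, 1.0, 2.0, 3.0, 4.0],
    intensity=[0.0, 1.0, 2.0, 1.0, 0.0],
)
```

For this symmetric triangular peak, the weighted retention time coincides with the apex, and the area equals that of the triangle.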
A detailed description of the algorithm used for peak picking can be found here. These methods are also used to create a data matrix from a dataset. For a tutorial on working with complete datasets to extract a data matrix, see here.