tidyms.fileio

Functions and objects to work with mzML data and with tabular data produced by third-party software used to process mass spectrometry data.

Objects

MSData: reads raw MS data in the mzML format. Creates Chromatogram, Roi and MSSpectrum objects from raw data.

Functions

read_pickle(path): Reads a DataContainer stored as a pickle.
read_progenesis(path): Reads a data matrix from a csv file generated with Progenesis software.
read_data_matrix(path, mode): Reads a data matrix in several formats. Calls the other read functions.

See Also

Chromatogram, MSSpectrum, DataContainer, Roi

class MSData(ms_mode: str = 'centroid', instrument: str = 'qtof', separation: str = 'uplc', is_virtual_sample: bool = False)

Reader object for raw MS data.

Manages chromatogram, roi and spectra creation.

Attributes:
path : str

Path to an mzML file.

ms_mode : {"centroid", "profile"}, default="centroid"

The mode in which the MS data is stored.

instrument : {"qtof", "orbitrap"}, default="qtof"

The MS instrument type used to acquire the experimental data. Used to set default parameters in the methods.

separation : {"uplc", "hplc"}, default="uplc"

The separation technique used before MS analysis. Used to set default parameters in the methods.

abstract get_chromatogram(n: int) -> Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:
n : int
Returns:
name : str
chromatogram : lcms.Chromatogram
abstract get_n_chromatograms() -> int

Get the chromatogram count in the file.

Returns:
n_chromatograms : int
abstract get_n_spectra() -> int

Get the spectra count in the file.

Returns:
n_spectra : int
abstract get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) -> Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:
ms_level : int, default=1

Use data from this MS level.

start : int, default=0

Starts iteration at this spectrum index.

end : int or None, default=None

Ends iteration at this spectrum index. If None, stops after the last spectrum.

start_time : float, default=0.0

Ignore scans with acquisition times lower than this value.

end_time : float or None, default=None

Ignore scans with acquisition times higher than this value.

Yields:
scan_number : int
spectrum : lcms.MSSpectrum
abstract get_spectrum(n: int) -> MSSpectrum

Get the nth spectrum stored in the file.

Parameters:
n : int

scan number

Returns:
MSSpectrum
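
The filtering rules of get_spectra_iterator can be illustrated with a small stand-in. The sketch below is not the tidyms implementation: ToySpectrum and iter_spectra are hypothetical names, spectra are reduced to two fields, and end is assumed to be exclusive (as in Python's range), which the parameter description does not state.

```python
from dataclasses import dataclass
from typing import Generator, List, Optional, Tuple


@dataclass
class ToySpectrum:
    """Hypothetical stand-in for MSSpectrum with only the fields used for filtering."""
    time: float
    ms_level: int


def iter_spectra(
    spectra: List[ToySpectrum],
    ms_level: int = 1,
    start: int = 0,
    end: Optional[int] = None,
    start_time: float = 0.0,
    end_time: Optional[float] = None,
) -> Generator[Tuple[int, ToySpectrum], None, None]:
    """Yield (scan_number, spectrum) pairs that pass every filter."""
    if end is None:
        end = len(spectra)  # stop after the last spectrum
    for k in range(start, end):  # assumption: `end` is exclusive
        sp = spectra[k]
        if sp.ms_level != ms_level:
            continue  # wrong MS level
        if sp.time < start_time:
            continue  # acquired too early
        if end_time is not None and sp.time > end_time:
            continue  # acquired too late
        yield k, sp


scans = [
    ToySpectrum(time=0.5, ms_level=1),
    ToySpectrum(time=1.0, ms_level=2),
    ToySpectrum(time=1.5, ms_level=1),
    ToySpectrum(time=2.0, ms_level=1),
    ToySpectrum(time=2.5, ms_level=1),
]
selected = [k for k, _ in iter_spectra(scans, start_time=1.0, end_time=2.0)]
```

Here selected holds the scan numbers 2 and 3: scan 0 is too early, scan 1 has the wrong MS level and scan 4 is too late.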
class MSData_Proxy(to_MSData_object)
get_chromatogram(n: int) -> Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:
n : int
Returns:
name : str
chromatogram : lcms.Chromatogram
get_n_chromatograms() -> int

Get the chromatogram count in the file.

Returns:
n_chromatograms : int
get_n_spectra() -> int

Get the spectra count in the file.

Returns:
n_spectra : int
abstract get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) -> Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:
ms_level : int, default=1

Use data from this MS level.

start : int, default=0

Starts iteration at this spectrum index.

end : int or None, default=None

Ends iteration at this spectrum index. If None, stops after the last spectrum.

start_time : float, default=0.0

Ignore scans with acquisition times lower than this value.

end_time : float or None, default=None

Ignore scans with acquisition times higher than this value.

Yields:
scan_number : int
spectrum : lcms.MSSpectrum
get_spectrum(n: int) -> MSSpectrum

Get the nth spectrum stored in the file.

Parameters:
n : int

scan number

Returns:
MSSpectrum
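
MSData_Proxy is listed without a description. Its name and its single constructor argument suggest a wrapper that forwards calls to another MSData object, but that reading is an assumption. The sketch below shows plain delegation with hypothetical ToyReader and ToyProxy classes and does not use tidyms.

```python
class ToyReader:
    """Hypothetical reader that holds a list of spectra (any objects)."""

    def __init__(self, spectra):
        self._spectra = list(spectra)

    def get_n_spectra(self) -> int:
        return len(self._spectra)

    def get_spectrum(self, n: int):
        return self._spectra[n]


class ToyProxy:
    """Hypothetical proxy: every call is forwarded to the wrapped reader."""

    def __init__(self, to_reader):
        self._to_reader = to_reader

    def get_n_spectra(self) -> int:
        return self._to_reader.get_n_spectra()

    def get_spectrum(self, n: int):
        return self._to_reader.get_spectrum(n)


proxy = ToyProxy(ToyReader(["scan-0", "scan-1", "scan-2"]))
count = proxy.get_n_spectra()
second = proxy.get_spectrum(1)
```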
class MSData_from_file(path: str | Path, ms_mode: str = 'centroid', instrument: str = 'qtof', separation: str = 'uplc')

Class for reading data from files without holding much data in memory.

get_chromatogram(n: int) -> Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:
n : int
Returns:
name : str
chromatogram : lcms.Chromatogram
get_n_chromatograms() -> int

Get the chromatogram count in the file.

Returns:
n_chromatograms : int
get_n_spectra() -> int

Get the spectra count in the file.

Returns:
n_spectra : int
get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) -> Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:
ms_level : int, default=1

Use data from this MS level.

start : int, default=0

Starts iteration at this spectrum index.

end : int or None, default=None

Ends iteration at this spectrum index. If None, stops after the last spectrum.

start_time : float, default=0.0

Ignore scans with acquisition times lower than this value.

end_time : float or None, default=None

Ignore scans with acquisition times higher than this value.

Yields:
scan_number : int
spectrum : lcms.MSSpectrum
get_spectrum(n: int) -> MSSpectrum

Get the nth spectrum stored in the file.

Parameters:
n : int

scan number

Returns:
MSSpectrum
class MSData_in_memory(ms_mode: str = 'centroid', instrument: str = 'qtof', separation: str = 'uplc')

Class that reads the entire file into memory once.

get_chromatogram(n: int) -> Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:
n : int
Returns:
name : str
chromatogram : lcms.Chromatogram
get_n_chromatograms() -> int

Get the chromatogram count in the file.

Returns:
n_chromatograms : int
get_n_spectra() -> int

Get the spectra count in the file.

Returns:
n_spectra : int
get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) -> Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:
ms_level : int, default=1

Use data from this MS level.

start : int, default=0

Starts iteration at this spectrum index.

end : int or None, default=None

Ends iteration at this spectrum index. If None, stops after the last spectrum.

start_time : float, default=0.0

Ignore scans with acquisition times lower than this value.

end_time : float or None, default=None

Ignore scans with acquisition times higher than this value.

Yields:
scan_number : int
spectrum : lcms.MSSpectrum
get_spectrum(n: int) -> MSSpectrum

Get the nth spectrum stored in the file.

Parameters:
n : int

scan number

Returns:
MSSpectrum
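
The trade-off between MSData_from_file and MSData_in_memory can be pictured with plain Python. This sketch does not use tidyms: read_lazily and read_eagerly are hypothetical helpers, and each line of a text buffer stands in for one spectrum.

```python
import io
from typing import Iterator, List, TextIO


def read_lazily(handle: TextIO) -> Iterator[str]:
    """Yield one record at a time, so only the current record is held in memory."""
    for line in handle:
        yield line.rstrip("\n")


def read_eagerly(handle: TextIO) -> List[str]:
    """Read every record up front, so later access needs no further I/O."""
    return [line.rstrip("\n") for line in handle]


raw = "scan-0\nscan-1\nscan-2\n"

lazy = read_lazily(io.StringIO(raw))
first = next(lazy)   # reads only the first record

eager = read_eagerly(io.StringIO(raw))
third = eager[2]     # random access, everything is already loaded
```

The lazy variant keeps memory use low at the cost of sequential access. The eager variant pays the full memory cost once and then allows random access.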
class MSData_simulated(mz_values: ndarray, rt_values: ndarray, mz_params: ndarray, rt_params: ndarray, ft_noise: ndarray | None = None, noise: float | None = None, ms_mode: str = 'centroid', separation: str = 'uplc', instrument: str = 'qtof')

Emulates an MSData object using simulated data. Used for tests.

get_chromatogram(n: int) -> Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:
n : int
Returns:
name : str
chromatogram : lcms.Chromatogram
get_n_chromatograms() -> int

Get the chromatogram count in the file.

Returns:
n_chromatograms : int
get_n_spectra() -> int

Get the spectra count in the file.

Returns:
n_spectra : int
get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) -> Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:
ms_level : int, default=1

Use data from this MS level.

start : int, default=0

Starts iteration at this spectrum index.

end : int or None, default=None

Ends iteration at this spectrum index. If None, stops after the last spectrum.

start_time : float, default=0.0

Ignore scans with acquisition times lower than this value.

end_time : float or None, default=None

Ignore scans with acquisition times higher than this value.

Yields:
scan_number : int
spectrum : lcms.MSSpectrum
get_spectrum(n: int) -> MSSpectrum

Get the nth spectrum stored in the file.

Parameters:
n : int

scan number

Returns:
MSSpectrum
class MSData_subset_spectra(start_ind: int, end_ind: int, from_MSData_object: MSData)

Subset of another MSData object.

Attributes:
start_ind : int (inclusive)
end_ind : int (inclusive; must be greater than or equal to start_ind)
from_MSData_object : MSData
abstract get_chromatogram(n: int) -> Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:
n : int
Returns:
name : str
chromatogram : lcms.Chromatogram
get_n_chromatograms() -> int

Get the chromatogram count in the file.

Returns:
n_chromatograms : int
get_n_spectra() -> int

Get the spectra count in the file.

Returns:
n_spectra : int
abstract get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) -> Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:
ms_level : int, default=1

Use data from this MS level.

start : int, default=0

Starts iteration at this spectrum index.

end : int or None, default=None

Ends iteration at this spectrum index. If None, stops after the last spectrum.

start_time : float, default=0.0

Ignore scans with acquisition times lower than this value.

end_time : float or None, default=None

Ignore scans with acquisition times higher than this value.

Yields:
scan_number : int
spectrum : lcms.MSSpectrum
abstract get_spectrum(n: int) -> MSSpectrum

Get the nth spectrum stored in the file.

Parameters:
n : int

scan number

Returns:
MSSpectrum
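
Because start_ind and end_ind are both inclusive, the subset size is end_ind - start_ind + 1. The sketch below works through that arithmetic with a hypothetical ToySubset class. It assumes that spectra are re-indexed from zero inside the subset, which the description of MSData_subset_spectra does not confirm.

```python
class ToySubset:
    """Hypothetical view over items start_ind..end_ind (both inclusive) of a parent list."""

    def __init__(self, start_ind: int, end_ind: int, parent: list):
        if end_ind < start_ind:
            raise ValueError("end_ind must be greater than or equal to start_ind")
        self.start_ind = start_ind
        self.end_ind = end_ind
        self._parent = parent

    def get_n_spectra(self) -> int:
        # Both ends are included, hence the +1.
        return self.end_ind - self.start_ind + 1

    def get_spectrum(self, n: int):
        # Assumption: n is relative to the subset, so it is shifted by start_ind.
        if not 0 <= n < self.get_n_spectra():
            raise IndexError(n)
        return self._parent[self.start_ind + n]


parent = ["scan-0", "scan-1", "scan-2", "scan-3", "scan-4"]
subset = ToySubset(1, 3, parent)
size = subset.get_n_spectra()
first_in_subset = subset.get_spectrum(0)
```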
download_dataset(name: str, download_dir: str | None = None)

Download a directory from the data repository.

https://github.com/griquelme/tidyms-data

Parameters:
name : str

Name of the data directory.

download_dir : str or None, default=None

String representation of a path to download the data. If None, downloads the data to the .tidyms directory.

Examples

Download the reference-materials directory into the current directory:

>>> import tidyms as ms
>>> dataset = "reference-materials"
>>> ms.fileio.download_dataset(dataset, download_dir=".")
download_tidyms_data(name: str, files: List[str], download_dir: str | None = None)

Download a list of files from the data repository.

https://github.com/griquelme/tidyms-data

Parameters:
name : str

Name of the data directory.

files : List[str]

List of files inside the data directory.

download_dir : str or None, default=None

String representation of a path to download the data. If None, downloads the data to the .tidyms directory.

Examples

Download the data.csv file from the reference-materials directory into the current directory:

>>> import tidyms as ms
>>> dataset = "reference-materials"
>>> file_list = ["data.csv"]
>>> ms.fileio.download_tidyms_data(dataset, file_list, download_dir=".")
list_available_datasets(hide_test_data: bool = True) -> List[str]

List available example datasets.

Parameters:
hide_test_data : bool
Returns:
datasets : List[str]
load_dataset(name: str, cache: bool = True, **kwargs) -> DataContainer

Load an example dataset into a DataContainer. Available datasets can be listed with the list_available_datasets function.

Parameters:
name : str

Name of an available dataset.

cache : bool

If True, tries to read the dataset from a local cache.

kwargs : additional parameters to pass to the Pandas read_csv function
Returns:
data_matrix : DataFrame
feature_metadata : DataFrame
sample_metadata : DataFrame
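
The cache parameter describes a read-through cache: use the local copy when it exists, otherwise fetch the data and store it. The sketch below shows that pattern with the standard library only. load_with_cache and fake_fetch are hypothetical, and the cache lives in a temporary directory rather than in whatever location tidyms uses.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def load_with_cache(name: str, cache_dir: Path, fetch: Callable[[str], str],
                    cache: bool = True) -> str:
    """Return the dataset text, reading the local copy first when cache is True."""
    local = cache_dir / f"{name}.csv"
    if cache and local.exists():
        return local.read_text()
    text = fetch(name)                  # stand-in for a download
    cache_dir.mkdir(parents=True, exist_ok=True)
    local.write_text(text)              # store for the next call
    return text


calls: List[str] = []


def fake_fetch(name: str) -> str:
    """Hypothetical downloader that records each call."""
    calls.append(name)
    return "sample,class\nA,qc\n"


with tempfile.TemporaryDirectory() as tmp:
    first = load_with_cache("demo", Path(tmp), fake_fetch)
    second = load_with_cache("demo", Path(tmp), fake_fetch)  # served from the cache
```

After both calls, calls contains a single entry, because the second call found the file written by the first.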
read_data_matrix(path: str | TextIO | BinaryIO, data_matrix_format: str, sample_metadata: str | None = None) -> DataContainer

Read different data matrix formats into a DataContainer.

Parameters:
path : str

Path to the data matrix file.

data_matrix_format : {"progenesis", "pickle", "mzmine"}
sample_metadata : str, file or DataFrame

Required for mzmine data.

Returns:
DataContainer
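
Since read_data_matrix calls the other read functions, it can be thought of as a dispatcher keyed on data_matrix_format. The sketch below illustrates that idea only: read_any and the three stand-in readers are hypothetical, they return labels instead of DataContainer objects, and the error messages are invented for the example.

```python
from typing import Callable, Dict, Optional


def _read_progenesis(path: str, sample_metadata: Optional[str]) -> str:
    """Stand-in reader: returns a label instead of a DataContainer."""
    return f"progenesis:{path}"


def _read_pickle(path: str, sample_metadata: Optional[str]) -> str:
    return f"pickle:{path}"


def _read_mzmine(path: str, sample_metadata: Optional[str]) -> str:
    return f"mzmine:{path}+{sample_metadata}"


READERS: Dict[str, Callable[[str, Optional[str]], str]] = {
    "progenesis": _read_progenesis,
    "pickle": _read_pickle,
    "mzmine": _read_mzmine,
}


def read_any(path: str, data_matrix_format: str,
             sample_metadata: Optional[str] = None) -> str:
    """Pick the reader that matches the format and call it."""
    try:
        reader = READERS[data_matrix_format]
    except KeyError:
        raise ValueError(f"unknown format: {data_matrix_format!r}") from None
    if data_matrix_format == "mzmine" and sample_metadata is None:
        raise ValueError("sample_metadata is required for mzmine data")
    return reader(path, sample_metadata)


result = read_any("data_path.csv", "progenesis")

try:
    read_any("peaks.csv", "mzmine")  # no sample metadata given
except ValueError as error:
    message = str(error)
```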

Examples

>>> data = read_data_matrix("data_path.csv", "progenesis")
read_mzmine(data: str | TextIO, sample_metadata: str | TextIO) -> DataContainer

Read a MZMine2 csv file into a DataContainer.

Parameters:
data : str or file

csv file generated with MZMine.

sample_metadata : str, file or DataFrame

csv file with sample metadata. The following columns are required:

* sample : the same sample names used in data
* class : the sample classes

Columns with run order and analytical batch information are optional.

Returns:
DataContainer
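
A sample metadata file that satisfies the column requirements of read_mzmine might look like the text in the sketch below. check_sample_metadata is a hypothetical helper, not part of tidyms, and the optional column names order and batch are assumptions, since the description does not name them.

```python
import csv
import io
from typing import Dict, List, TextIO

REQUIRED_COLUMNS = ("sample", "class")


def check_sample_metadata(handle: TextIO) -> List[Dict[str, str]]:
    """Parse a csv file and fail early if a required column is missing."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return list(reader)


# "order" and "batch" are assumed names for the optional columns.
text = (
    "sample,class,order,batch\n"
    "QC-1,QC,1,1\n"
    "S-1,healthy,2,1\n"
)
rows = check_sample_metadata(io.StringIO(text))
```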
read_pickle(path: str | BinaryIO) -> DataContainer

Read a DataContainer stored as a pickle.

Parameters:
path : str or file

Path from which to read the DataContainer.

Returns:
DataContainer
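
The path parameter accepts either a path string or an open binary file. One way to handle both cases is sketched below with the standard pickle module. load_pickled is a hypothetical helper, and a plain dictionary stands in for a DataContainer.

```python
import io
import pickle
from typing import Any, BinaryIO, Union


def load_pickled(path: Union[str, BinaryIO]) -> Any:
    """Load an object from a file path or from an already opened binary file."""
    if isinstance(path, str):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    return pickle.load(path)


# A dictionary stands in for a DataContainer in this example.
container = {"data_matrix": [[1.0, 2.0]], "classes": ["QC"]}
buffer = io.BytesIO(pickle.dumps(container))
restored = load_pickled(buffer)
```

Unpickling can execute arbitrary code, so pickle files should only be read from trusted sources.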
read_progenesis(path: str | TextIO)

Read a Progenesis file into a DataContainer.

Parameters:
path : str or file
Returns:
dc : DataContainer
read_xcms(data_matrix: str, feature_metadata: str, sample_metadata: str, class_column: str = 'class', sep: str = '\t')

Reads tabular data generated with xcms.

Parameters:
data_matrix : str

Path to a tab-delimited data matrix generated with the R package SummarizedExperiment assay method.

feature_metadata : str

Path to a tab-delimited file with feature metadata (called feature definitions in XCMS), generated with the R package SummarizedExperiment rowData method.

sample_metadata : str

Path to a tab-delimited file with sample metadata, generated with the R package SummarizedExperiment colData method. A column named class is required. If the class information is stored under another name, that name must be passed in the class_column parameter.

class_column : str

Column name which holds sample class information in the sample metadata.

sep : str

Separator used in the files. As the feature metadata generated with XCMS contains comma characters, the default value for the separator is "\t".

Returns:
DataContainer
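
The reason for the tab default can be seen on a small example. The sketch below uses the standard csv module, not tidyms, and the column names and values are made up for illustration. A field that contains commas stays intact when the file is split on tabs, whereas splitting on commas breaks it apart.

```python
import csv
import io

# Illustrative feature metadata: the last column holds comma-separated values.
feature_metadata = (
    "feature\tmzmed\tpeakidx\n"
    "FT001\t150.05\t1, 7, 12, 15\n"
    "FT002\t301.14\t3, 9\n"
)

# Splitting on tabs keeps each row at three fields.
rows = list(csv.reader(io.StringIO(feature_metadata), delimiter="\t"))

# Splitting the first data row on commas cuts through the last field.
naive = feature_metadata.splitlines()[1].split(",")
```

With the tab delimiter every row has three fields. The comma split of the first data row produces four fragments that no longer match the header.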