tidyms.fileio

Functions and objects to work with mzML data and with tabular data produced by third-party software for processing mass spectrometry data.

Objects

MSData: reads raw MS data in the mzML format. Creates Chromatogram, Roi and MSSpectrum objects from raw data.

Functions

read_pickle(path): Reads a DataContainer stored as a pickle.
read_progenesis(path): Reads a data matrix from a csv file generated with Progenesis software.
read_data_matrix(path, data_matrix_format): Reads data matrices in several formats. Calls the other read functions.

See Also

Chromatogram
MSSpectrum
DataContainer
Roi

class MSData(ms_mode: str = 'centroid', instrument: str = 'qtof', separation: str = 'uplc', is_virtual_sample: bool = False)

Reader object for raw MS data.

Manages chromatogram, ROI and spectra creation.

Attributes:

path: str: Path to an mzML file.
ms_mode: {"centroid", "profile"}, default="centroid": The mode in which the MS data is stored.
instrument: {"qtof", "orbitrap"}, default="qtof": The MS instrument type used to acquire the experimental data. Used to set default parameters in the methods.
separation: {"uplc", "hplc"}, default="uplc": The separation technique used before MS analysis. Used to set default parameters in the methods.

abstract get_chromatogram(n: int) → Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:

n: int

Returns:

name: str
chromatogram: lcms.Chromatogram

abstract get_n_chromatograms() → int

Get the chromatogram count in the file.

Returns:

n_chromatograms: int

abstract get_n_spectra() → int

Get the spectra count in the file.

Returns:

n_spectra: int

abstract get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) → Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:

ms_level: int, default=1: Use data from this ms level.
start: int, default=0: Starts iteration at this spectrum index.
end: int or None, default=None: Ends iteration at this spectrum index. If None, stops after the last spectrum.
start_time: float, default=0.0: Ignore scans with acquisition times lower than this value.
end_time: float or None, default=None: Ignore scans with acquisition times higher than this value.

Yields:

scan_number: int
spectrum: lcms.MSSpectrum

abstract get_spectrum(n: int) → MSSpectrum

Get the nth spectrum stored in the file.

Parameters:

n: int: Scan number.

Returns:

MSSpectrum
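
The abstract methods above define the interface that the concrete readers implement. The following is a minimal sketch of that contract, not the tidyms implementation: `ToySpectrum`, `ToyReaderBase` and `ToyReader` are hypothetical stand-ins, and treating `end` as exclusive is an assumption, since the parameter description does not say.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generator, List, Optional, Tuple


@dataclass
class ToySpectrum:
    """Hypothetical stand-in for MSSpectrum (not the tidyms class)."""

    time: float
    ms_level: int


class ToyReaderBase(ABC):
    """Hypothetical abstract reader that mirrors the documented method names."""

    @abstractmethod
    def get_n_spectra(self) -> int: ...

    @abstractmethod
    def get_spectrum(self, n: int) -> ToySpectrum: ...

    def get_spectra_iterator(
        self,
        ms_level: int = 1,
        start: int = 0,
        end: Optional[int] = None,
        start_time: float = 0.0,
        end_time: Optional[float] = None,
    ) -> Generator[Tuple[int, ToySpectrum], None, None]:
        # Assumption: `end` is exclusive, like range(); None means "up to the last spectrum".
        stop = self.get_n_spectra() if end is None else end
        for k in range(start, stop):
            sp = self.get_spectrum(k)
            if sp.ms_level != ms_level:
                continue  # wrong MS level
            if sp.time < start_time:
                continue  # acquired before start_time
            if end_time is not None and sp.time > end_time:
                continue  # acquired after end_time
            yield k, sp


class ToyReader(ToyReaderBase):
    """Concrete reader backed by a plain list."""

    def __init__(self, spectra: List[ToySpectrum]):
        self._spectra = spectra

    def get_n_spectra(self) -> int:
        return len(self._spectra)

    def get_spectrum(self, n: int) -> ToySpectrum:
        return self._spectra[n]


# Six scans, 0.5 time units apart, alternating between MS levels 1 and 2.
reader = ToyReader([ToySpectrum(0.5 * k, 1 + k % 2) for k in range(6)])
scans = [k for k, _ in reader.get_spectra_iterator(ms_level=1, start_time=1.0)]
```

Here `scans` holds the indices of the MS level 1 scans acquired at or after time 1.0.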

class MSData_Proxy(to_MSData_object)

get_chromatogram(n: int) → Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:

n: int

Returns:

name: str
chromatogram: lcms.Chromatogram

get_n_chromatograms() → int

Get the chromatogram count in the file.

Returns:

n_chromatograms: int

get_n_spectra() → int

Get the spectra count in the file.

Returns:

n_spectra: int

abstract get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) → Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:

ms_level: int, default=1: Use data from this ms level.
start: int, default=0: Starts iteration at this spectrum index.
end: int or None, default=None: Ends iteration at this spectrum index. If None, stops after the last spectrum.
start_time: float, default=0.0: Ignore scans with acquisition times lower than this value.
end_time: float or None, default=None: Ignore scans with acquisition times higher than this value.

Yields:

scan_number: int
spectrum: lcms.MSSpectrum

get_spectrum(n: int) → MSSpectrum

Get the nth spectrum stored in the file.

Parameters:

n: int: Scan number.

Returns:

MSSpectrum

class MSData_from_file(path: str | Path, ms_mode: str = 'centroid', instrument: str = 'qtof', separation: str = 'uplc')

Class for reading data from files without keeping much data in memory.

get_chromatogram(n: int) → Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:

n: int

Returns:

name: str
chromatogram: lcms.Chromatogram

get_n_chromatograms() → int

Get the chromatogram count in the file.

Returns:

n_chromatograms: int

get_n_spectra() → int

Get the spectra count in the file.

Returns:

n_spectra: int

get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) → Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:

ms_level: int, default=1: Use data from this ms level.
start: int, default=0: Starts iteration at this spectrum index.
end: int or None, default=None: Ends iteration at this spectrum index. If None, stops after the last spectrum.
start_time: float, default=0.0: Ignore scans with acquisition times lower than this value.
end_time: float or None, default=None: Ignore scans with acquisition times higher than this value.

Yields:

scan_number: int
spectrum: lcms.MSSpectrum

get_spectrum(n: int) → MSSpectrum

Get the nth spectrum stored in the file.

Parameters:

n: int: Scan number.

Returns:

MSSpectrum

class MSData_in_memory(ms_mode: str = 'centroid', instrument: str = 'qtof', separation: str = 'uplc')

Class that reads the entire file into memory once.

get_chromatogram(n: int) → Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:

n: int

Returns:

name: str
chromatogram: lcms.Chromatogram

get_n_chromatograms() → int

Get the chromatogram count in the file.

Returns:

n_chromatograms: int

get_n_spectra() → int

Get the spectra count in the file.

Returns:

n_spectra: int

get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) → Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:

ms_level: int, default=1: Use data from this ms level.
start: int, default=0: Starts iteration at this spectrum index.
end: int or None, default=None: Ends iteration at this spectrum index. If None, stops after the last spectrum.
start_time: float, default=0.0: Ignore scans with acquisition times lower than this value.
end_time: float or None, default=None: Ignore scans with acquisition times higher than this value.

Yields:

scan_number: int
spectrum: lcms.MSSpectrum

get_spectrum(n: int) → MSSpectrum

Get the nth spectrum stored in the file.

Parameters:

n: int: Scan number.

Returns:

MSSpectrum
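
MSData_from_file and MSData_in_memory trade memory use against repeated reads. The toy example below illustrates that general tradeoff only; it assumes nothing about how tidyms implements either class, and `LazyReader`, `EagerReader` and `parse_record` are hypothetical names.

```python
from typing import Dict, List

RAW_RECORDS = ["10.0 20.0", "11.0 21.0", "12.0 22.0"]  # stand-in for file content
CALLS: Dict[str, int] = {"parse": 0}  # counts how often a record is parsed


def parse_record(n: int) -> List[float]:
    """Hypothetical parser; increments a counter to make the cost visible."""
    CALLS["parse"] += 1
    return [float(x) for x in RAW_RECORDS[n].split()]


class LazyReader:
    """Parses a record every time it is requested and stores nothing."""

    def get_spectrum(self, n: int) -> List[float]:
        return parse_record(n)


class EagerReader:
    """Parses every record once at construction and serves them from a list."""

    def __init__(self) -> None:
        self._cache = [parse_record(n) for n in range(len(RAW_RECORDS))]

    def get_spectrum(self, n: int) -> List[float]:
        return self._cache[n]


lazy = LazyReader()
lazy.get_spectrum(0)
lazy.get_spectrum(0)
lazy_calls = CALLS["parse"]  # one parse per request

CALLS["parse"] = 0
eager = EagerReader()
eager.get_spectrum(0)
eager.get_spectrum(0)
eager_calls = CALLS["parse"]  # all parsing happened in __init__
```

The lazy reader pays for parsing on every request, while the eager reader pays once for the whole file and holds all of it in memory afterwards.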

class MSData_simulated(mz_values: ndarray, rt_values: ndarray, mz_params: ndarray, rt_params: ndarray, ft_noise: ndarray | None = None, noise: float | None = None, ms_mode: str = 'centroid', separation: str = 'uplc', instrument: str = 'qtof')

Emulates an MSData object using simulated data. Used for tests.

get_chromatogram(n: int) → Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:

n: int

Returns:

name: str
chromatogram: lcms.Chromatogram

get_n_chromatograms() → int

Get the chromatogram count in the file.

Returns:

n_chromatograms: int

get_n_spectra() → int

Get the spectra count in the file.

Returns:

n_spectra: int

get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) → Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:

ms_level: int, default=1: Use data from this ms level.
start: int, default=0: Starts iteration at this spectrum index.
end: int or None, default=None: Ends iteration at this spectrum index. If None, stops after the last spectrum.
start_time: float, default=0.0: Ignore scans with acquisition times lower than this value.
end_time: float or None, default=None: Ignore scans with acquisition times higher than this value.

Yields:

scan_number: int
spectrum: lcms.MSSpectrum

get_spectrum(n: int) → MSSpectrum

Get the nth spectrum stored in the file.

Parameters:

n: int: Scan number.

Returns:

MSSpectrum

class MSData_subset_spectra(start_ind: int, end_ind: int, from_MSData_object: MSData)

Subset of another MSData object.

Attributes:

start_ind: int: Start index (inclusive).
end_ind: int: End index (inclusive). Must be greater than or equal to start_ind.
from_MSData_object: MSData: The MSData object to take the subset from.

abstract get_chromatogram(n: int) → Tuple[str, Chromatogram]

Get the nth chromatogram stored in the file.

Parameters:

n: int

Returns:

name: str
chromatogram: lcms.Chromatogram

get_n_chromatograms() → int

Get the chromatogram count in the file.

Returns:

n_chromatograms: int

get_n_spectra() → int

Get the spectra count in the file.

Returns:

n_spectra: int

abstract get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) → Generator[Tuple[int, MSSpectrum], None, None]

Yields the spectra in the file.

Parameters:

ms_level: int, default=1: Use data from this ms level.
start: int, default=0: Starts iteration at this spectrum index.
end: int or None, default=None: Ends iteration at this spectrum index. If None, stops after the last spectrum.
start_time: float, default=0.0: Ignore scans with acquisition times lower than this value.
end_time: float or None, default=None: Ignore scans with acquisition times higher than this value.

Yields:

scan_number: int
spectrum: lcms.MSSpectrum

abstract get_spectrum(n: int) → MSSpectrum

Get the nth spectrum stored in the file.

Parameters:

n: int: Scan number.

Returns:

MSSpectrum
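
Because both start_ind and end_ind of MSData_subset_spectra are inclusive, the subset size and any index translation differ by one from Python's usual half-open ranges. The helpers below are a hypothetical sketch of that arithmetic; in particular, mapping position 0 of the subset to start_ind in the parent is an assumption that this page does not confirm.

```python
def subset_size(start_ind: int, end_ind: int) -> int:
    """Number of spectra in a subset with inclusive bounds."""
    if end_ind < start_ind:
        raise ValueError("end_ind must be greater than or equal to start_ind")
    return end_ind - start_ind + 1  # both endpoints count


def to_parent_index(n: int, start_ind: int, end_ind: int) -> int:
    """Translate position n inside the subset to an index in the parent.

    Assumption: position 0 of the subset corresponds to start_ind.
    """
    if not 0 <= n < subset_size(start_ind, end_ind):
        raise IndexError(f"position {n} is outside the subset")
    return start_ind + n


size = subset_size(3, 5)              # spectra 3, 4 and 5
first = to_parent_index(0, 3, 5)      # first spectrum of the subset
last = to_parent_index(size - 1, 3, 5)  # last spectrum of the subset
```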

download_dataset(name: str, download_dir: str | None = None)

Download a directory from the data repository.

https://github.com/griquelme/tidyms-data

Parameters:

name: str: Name of the data directory.
download_dir: str or None, default=None: String representation of a path to download the data. If None, downloads the data to the .tidyms directory.

See also

load_dataset

Examples

Download the reference-materials directory, using the current directory as the download directory:

>>> import tidyms as ms
>>> dataset = "reference-materials"
>>> ms.fileio.download_dataset(dataset, download_dir=".")

list_available_datasets(hide_test_data: bool = True) → List[str]

List available example datasets.

Parameters:

hide_test_data: bool

Returns:

datasets: List[str]

load_dataset(name: str, cache: bool = True, **kwargs) → DataContainer

Load an example dataset into a DataContainer. Available datasets can be listed with the list_available_datasets function.

Parameters:

name: str: Name of an available dataset.
cache: bool: If True, tries to read the dataset from a local cache.
kwargs: Additional parameters to pass to the pandas read_csv function.

Returns:

DataContainer: built from the following tables of the dataset:
data_matrix: DataFrame
feature_metadata: DataFrame
sample_metadata: DataFrame

read_data_matrix(path: str | TextIO | BinaryIO, data_matrix_format: str, sample_metadata: str | None = None) → DataContainer

Read different data matrix formats into a DataContainer.

Parameters:

path: str or file: Path to the data matrix file.
data_matrix_format: {"progenesis", "pickle", "mzmine"}
sample_metadata: str, file or DataFrame: Required for mzmine data.

Returns:

DataContainer

Examples

>>> data = read_data_matrix("data_path.csv", "progenesis")
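
read_data_matrix picks a reader according to data_matrix_format. The sketch below illustrates that kind of dispatch with hypothetical stub readers that return a label instead of a DataContainer; it is not the tidyms implementation, and the error messages are invented.

```python
from typing import Callable, Dict, Optional


# Hypothetical stubs standing in for the real readers, so that the sketch
# runs without tidyms installed.
def _stub_progenesis(path: str) -> str:
    return f"progenesis:{path}"


def _stub_pickle(path: str) -> str:
    return f"pickle:{path}"


def _stub_mzmine(path: str, sample_metadata: str) -> str:
    return f"mzmine:{path}+{sample_metadata}"


def dispatch_reader(
    path: str, data_matrix_format: str, sample_metadata: Optional[str] = None
) -> str:
    """Choose a reader from the format name (illustrative only)."""
    if data_matrix_format == "mzmine":
        # The parameter description states that sample metadata is required here.
        if sample_metadata is None:
            raise ValueError("sample_metadata is required for mzmine data")
        return _stub_mzmine(path, sample_metadata)
    readers: Dict[str, Callable[[str], str]] = {
        "progenesis": _stub_progenesis,
        "pickle": _stub_pickle,
    }
    try:
        reader = readers[data_matrix_format]
    except KeyError:
        raise ValueError(f"unknown format: {data_matrix_format!r}") from None
    return reader(path)


result = dispatch_reader("data_path.csv", "progenesis")
```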

read_mzmine(data: str | TextIO, sample_metadata: str | TextIO) → DataContainer

Read an MZMine2 csv file into a DataContainer.

Parameters:

data: str or file: csv file generated with MZMine.
sample_metadata: str, file or DataFrame: csv file with sample metadata. The following columns are required:
* sample: the same sample names used in data
* class: the sample classes
Columns with run order and analytical batch information are optional.

Returns:

DataContainer
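
Before calling read_mzmine, it can help to check that the sample metadata file contains the required sample and class columns. The following is a minimal sketch that uses only the standard library; the helper name `missing_metadata_columns`, the example rows and the `order` and `batch` column names are made up for illustration.

```python
import csv
import io
from typing import List, TextIO

REQUIRED_COLUMNS = ("sample", "class")  # the required columns listed above


def missing_metadata_columns(metadata: TextIO) -> List[str]:
    """Return the required columns that are absent from the csv header."""
    header = next(csv.reader(metadata), [])  # empty file -> empty header
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Made-up metadata files held in memory.
good = io.StringIO("sample,class,order,batch\nS1,QC,1,1\nS2,healthy,2,1\n")
bad = io.StringIO("sample,group\nS1,QC\n")

missing_good = missing_metadata_columns(good)
missing_bad = missing_metadata_columns(bad)
```

An empty result means both required columns are present; otherwise the list names the columns to add.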

read_pickle(path: str | BinaryIO) → DataContainer

Read a DataContainer stored as a pickle.

Parameters:

path: str or file: Path to read the DataContainer from.

Returns:

DataContainer
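
read_pickle accepts either a path or a binary file object. The round trip below uses the standard pickle module with a plain dict as a stand-in for a DataContainer; how tidyms writes the pickle is not covered on this page. Only unpickle files from a trusted source, because loading a pickle can execute arbitrary code.

```python
import io
import pickle

# A plain dict stands in for a DataContainer in this sketch.
container = {"data_matrix": [[1.0, 2.0], [3.0, 4.0]], "classes": ["QC", "sample"]}

buffer = io.BytesIO()  # binary file-like object, usable where a BinaryIO is expected
pickle.dump(container, buffer)
buffer.seek(0)  # rewind to the start before reading

restored = pickle.load(buffer)
```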

read_progenesis(path: str | TextIO)

Read a Progenesis file into a DataContainer.

Parameters:

path: str or file

Returns:

dc: DataContainer

read_xcms(data_matrix: str, feature_metadata: str, sample_metadata: str, class_column: str = 'class', sep: str = '\t')

Reads tabular data generated with XCMS.

Parameters:

data_matrix: str: Path to a tab-delimited data matrix generated with the assay method of the R package SummarizedExperiment.
feature_metadata: str: Path to a tab-delimited file with feature metadata (called feature definitions in XCMS), generated with the rowData method of the R package SummarizedExperiment.
sample_metadata: str: Path to a tab-delimited file with sample metadata, generated with the colData method of the R package SummarizedExperiment. A column named class is required. If the class information is stored under another name, it must be specified in the class_column parameter.
class_column: str: Name of the column that holds sample class information in the sample metadata.
sep: str: Separator used in the files. As the feature metadata generated with XCMS contains comma characters, the default value for the separator is "\t".

Returns:

DataContainer
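
The reason for the tab default can be seen with the standard csv module: a field that contains commas stays intact when tabs separate the columns, but is broken apart when commas are used as the delimiter. The column names and values below are invented for illustration.

```python
import csv
import io

# Invented feature metadata; the third field of each row contains commas.
text = "feature\tmzmed\tpeakidx\nFT001\t301.14\t1,5,9\nFT002\t455.29\t2,6\n"

rows_tab = list(csv.reader(io.StringIO(text), delimiter="\t"))
rows_comma = list(csv.reader(io.StringIO(text), delimiter=","))

# With tabs, every row has the same number of fields as the header.
n_fields_tab = [len(row) for row in rows_tab]
# With commas, rows are split inside the third field and no longer line up.
n_fields_comma = [len(row) for row in rows_comma]
```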