tidyms.fileio¶
Functions and objects to work with mzML data and tabular data obtained from third party software used to process Mass Spectrometry data.
Objects¶
MSData: reads raw MS data in the mzML format. Creates Chromatograms, ROI and MSSpectrum from raw data.
Functions¶
read_pickle(path): Reads a DataContainer stored as a pickle.
read_progenesis(path): Reads a data matrix in a csv file generated with Progenesis software.
read_data_matrix(path, data_matrix_format): Reads a data matrix in several formats. Calls the other read functions.
See Also¶
Chromatogram MSSpectrum DataContainer Roi
- class MSData(ms_mode: str = 'centroid', instrument: str = 'qtof', separation: str = 'uplc', is_virtual_sample: bool = False)¶
Reader object for raw MS data.
Manages chromatogram, roi and spectra creation.
- Attributes:
- pathstr
Path to a mzML file.
- ms_mode{“centroid”, “profile”}, default=”centroid”
The mode in which the MS data is stored.
- instrument{“qtof”, “orbitrap”}, default=”qtof”
The MS instrument type used to acquire the experimental data. Used to set default parameters in the methods.
- separation{“uplc”, “hplc”}, default=”uplc”
The separation technique used before MS analysis. Used to set default parameters in the methods.
- abstract get_chromatogram(n: int) Tuple[str, Chromatogram]¶
Get the nth chromatogram stored in the file.
- Parameters:
- nint
- Returns:
- namestr
- chromatogramlcms.Chromatogram
- abstract get_n_chromatograms() int¶
Get the chromatogram count in the file
- Returns:
- n_chromatogramsint
- abstract get_n_spectra() int¶
Get the spectra count in the file
- Returns:
- n_spectraint
- abstract get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) Generator[Tuple[int, MSSpectrum], None, None]¶
Yields the spectra in the file.
- Parameters:
- ms_levelint, default=1
Use data from this ms level.
- startint, default=0
Starts iteration at this spectrum index.
- endint or None, default=None
Ends iteration at this spectrum index. If None, stops after the last spectrum.
- start_timefloat, default=0.0
Ignore scans with acquisition times lower than this value.
- end_timefloat or None, default=None
Ignore scans with acquisition times higher than this value.
- Yields:
- scan_number: int
- spectrumlcms.MSSpectrum
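The way the filters above combine can be illustrated with a minimal, standalone sketch. It does not use tidyms: `ToySpectrum` and `iterate_spectra` are hypothetical names, and treating `end` as an exclusive bound (as in `range`) is an assumption, since the description above does not say whether the end index is included.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class ToySpectrum:
    """Hypothetical stand-in for MSSpectrum; not a tidyms class."""

    time: float  # acquisition time
    ms_level: int


def iterate_spectra(
    spectra: List[ToySpectrum],
    ms_level: int = 1,
    start: int = 0,
    end: Optional[int] = None,
    start_time: float = 0.0,
    end_time: Optional[float] = None,
) -> Iterator[Tuple[int, ToySpectrum]]:
    """Yield (scan_number, spectrum) pairs that pass every filter."""
    # Assumption: `end` is exclusive; None means "up to the last spectrum".
    stop = len(spectra) if end is None else min(end, len(spectra))
    for scan_number in range(start, stop):
        sp = spectra[scan_number]
        if sp.ms_level != ms_level:
            continue  # wrong MS level
        if sp.time < start_time:
            continue  # acquired before the time window
        if end_time is not None and sp.time > end_time:
            continue  # acquired after the time window
        yield scan_number, sp


# Six toy scans with alternating MS levels 1, 2, 1, 2, ... and times 0..5.
spectra = [ToySpectrum(time=float(k), ms_level=1 + k % 2) for k in range(6)]
selected = [
    k for k, _ in iterate_spectra(spectra, ms_level=1, start_time=1.0, end_time=4.0)
]
```

In this sketch the index filters and the time filters are applied independently, so a scan must satisfy both to be yielded.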
- abstract get_spectrum(n: int) MSSpectrum¶
Get the nth spectrum stored in the file.
- Parameters:
- n: int
scan number
- Returns:
- MSSpectrum
- class MSData_Proxy(to_MSData_object)¶
- get_chromatogram(n: int) Tuple[str, Chromatogram]¶
Get the nth chromatogram stored in the file.
- Parameters:
- nint
- Returns:
- namestr
- chromatogramlcms.Chromatogram
- get_n_chromatograms() int¶
Get the chromatogram count in the file
- Returns:
- n_chromatogramsint
- get_n_spectra() int¶
Get the spectra count in the file
- Returns:
- n_spectraint
- abstract get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) Generator[Tuple[int, MSSpectrum], None, None]¶
Yields the spectra in the file.
- Parameters:
- ms_levelint, default=1
Use data from this ms level.
- startint, default=0
Starts iteration at this spectrum index.
- endint or None, default=None
Ends iteration at this spectrum index. If None, stops after the last spectrum.
- start_timefloat, default=0.0
Ignore scans with acquisition times lower than this value.
- end_timefloat or None, default=None
Ignore scans with acquisition times higher than this value.
- Yields:
- scan_number: int
- spectrumlcms.MSSpectrum
- get_spectrum(n: int) MSSpectrum¶
Get the nth spectrum stored in the file.
- Parameters:
- n: int
scan number
- Returns:
- MSSpectrum
- class MSData_from_file(path: str | Path, ms_mode: str = 'centroid', instrument: str = 'qtof', separation: str = 'uplc')¶
Class for reading data from a file on demand, without loading the whole file into memory.
- get_chromatogram(n: int) Tuple[str, Chromatogram]¶
Get the nth chromatogram stored in the file.
- Parameters:
- nint
- Returns:
- namestr
- chromatogramlcms.Chromatogram
- get_n_chromatograms() int¶
Get the chromatogram count in the file
- Returns:
- n_chromatogramsint
- get_n_spectra() int¶
Get the spectra count in the file
- Returns:
- n_spectraint
- get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) Generator[Tuple[int, MSSpectrum], None, None]¶
Yields the spectra in the file.
- Parameters:
- ms_levelint, default=1
Use data from this ms level.
- startint, default=0
Starts iteration at this spectrum index.
- endint or None, default=None
Ends iteration at this spectrum index. If None, stops after the last spectrum.
- start_timefloat, default=0.0
Ignore scans with acquisition times lower than this value.
- end_timefloat or None, default=None
Ignore scans with acquisition times higher than this value.
- Yields:
- scan_number: int
- spectrumlcms.MSSpectrum
- get_spectrum(n: int) MSSpectrum¶
Get the nth spectrum stored in the file.
- Parameters:
- n: int
scan number
- Returns:
- MSSpectrum
- class MSData_in_memory(ms_mode: str = 'centroid', instrument: str = 'qtof', separation: str = 'uplc')¶
Class that reads the entire file into memory once.
- get_chromatogram(n: int) Tuple[str, Chromatogram]¶
Get the nth chromatogram stored in the file.
- Parameters:
- nint
- Returns:
- namestr
- chromatogramlcms.Chromatogram
- get_n_chromatograms() int¶
Get the chromatogram count in the file
- Returns:
- n_chromatogramsint
- get_n_spectra() int¶
Get the spectra count in the file
- Returns:
- n_spectraint
- get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) Generator[Tuple[int, MSSpectrum], None, None]¶
Yields the spectra in the file.
- Parameters:
- ms_levelint, default=1
Use data from this ms level.
- startint, default=0
Starts iteration at this spectrum index.
- endint or None, default=None
Ends iteration at this spectrum index. If None, stops after the last spectrum.
- start_timefloat, default=0.0
Ignore scans with acquisition times lower than this value.
- end_timefloat or None, default=None
Ignore scans with acquisition times higher than this value.
- Yields:
- scan_number: int
- spectrumlcms.MSSpectrum
- get_spectrum(n: int) MSSpectrum¶
Get the nth spectrum stored in the file.
- Parameters:
- n: int
scan number
- Returns:
- MSSpectrum
- class MSData_simulated(mz_values: ndarray, rt_values: ndarray, mz_params: ndarray, rt_params: ndarray, ft_noise: ndarray | None = None, noise: float | None = None, ms_mode: str = 'centroid', separation: str = 'uplc', instrument: str = 'qtof')¶
Emulates an MSData object using simulated data. Used for tests.
- get_chromatogram(n: int) Tuple[str, Chromatogram]¶
Get the nth chromatogram stored in the file.
- Parameters:
- nint
- Returns:
- namestr
- chromatogramlcms.Chromatogram
- get_n_chromatograms() int¶
Get the chromatogram count in the file
- Returns:
- n_chromatogramsint
- get_n_spectra() int¶
Get the spectra count in the file
- Returns:
- n_spectraint
- get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) Generator[Tuple[int, MSSpectrum], None, None]¶
Yields the spectra in the file.
- Parameters:
- ms_levelint, default=1
Use data from this ms level.
- startint, default=0
Starts iteration at this spectrum index.
- endint or None, default=None
Ends iteration at this spectrum index. If None, stops after the last spectrum.
- start_timefloat, default=0.0
Ignore scans with acquisition times lower than this value.
- end_timefloat or None, default=None
Ignore scans with acquisition times higher than this value.
- Yields:
- scan_number: int
- spectrumlcms.MSSpectrum
- get_spectrum(n: int) MSSpectrum¶
Get the nth spectrum stored in the file.
- Parameters:
- n: int
scan number
- Returns:
- MSSpectrum
- class MSData_subset_spectra(start_ind: int, end_ind: int, from_MSData_object: MSData)¶
Subset of another MSData object.
- Attributes:
- start_ind: integer (inclusive)
- end_ind: integer (inclusive; must be greater than or equal to start_ind)
- from_MSData_object: MSData object
- abstract get_chromatogram(n: int) Tuple[str, Chromatogram]¶
Get the nth chromatogram stored in the file.
- Parameters:
- nint
- Returns:
- namestr
- chromatogramlcms.Chromatogram
- get_n_chromatograms() int¶
Get the chromatogram count in the file
- Returns:
- n_chromatogramsint
- get_n_spectra() int¶
Get the spectra count in the file
- Returns:
- n_spectraint
- abstract get_spectra_iterator(ms_level: int = 1, start: int = 0, end: int | None = None, start_time: float = 0.0, end_time: float | None = None) Generator[Tuple[int, MSSpectrum], None, None]¶
Yields the spectra in the file.
- Parameters:
- ms_levelint, default=1
Use data from this ms level.
- startint, default=0
Starts iteration at this spectrum index.
- endint or None, default=None
Ends iteration at this spectrum index. If None, stops after the last spectrum.
- start_timefloat, default=0.0
Ignore scans with acquisition times lower than this value.
- end_timefloat or None, default=None
Ignore scans with acquisition times higher than this value.
- Yields:
- scan_number: int
- spectrumlcms.MSSpectrum
- abstract get_spectrum(n: int) MSSpectrum¶
Get the nth spectrum stored in the file.
- Parameters:
- n: int
scan number
- Returns:
- MSSpectrum
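The inclusive bounds of MSData_subset_spectra can be pictured with a small standalone sketch. `SubsetView` is a hypothetical class operating on a plain list, not the tidyms implementation, and mapping index 0 of the subset to `start_ind` is an assumption made for illustration.

```python
from typing import Sequence


class SubsetView:
    """Hypothetical view over items start_ind..end_ind (both inclusive)."""

    def __init__(self, start_ind: int, end_ind: int, source: Sequence[str]):
        if end_ind < start_ind:
            raise ValueError("end_ind must be greater than or equal to start_ind")
        self.start_ind = start_ind
        self.end_ind = end_ind
        self.source = source

    def get_n_spectra(self) -> int:
        # Both bounds are inclusive, hence the +1.
        return self.end_ind - self.start_ind + 1

    def get_spectrum(self, n: int) -> str:
        # Assumption: n is relative to the subset, so 0 maps to start_ind.
        if not 0 <= n < self.get_n_spectra():
            raise IndexError(n)
        return self.source[self.start_ind + n]


parent = ["scan0", "scan1", "scan2", "scan3", "scan4"]
view = SubsetView(1, 3, parent)
n_in_view = view.get_n_spectra()
first_in_view = view.get_spectrum(0)
```

Because the view only stores indices and a reference to its source, no data is copied in this sketch.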
- download_dataset(name: str, download_dir: str | None = None)¶
Download a directory from the data repository.
https://github.com/griquelme/tidyms-data
- Parameters:
- namestr
Name of the data directory
- download_dirstr or None, default=None
String representation of a path to download the data. If None, downloads the data to the .tidyms directory
Examples
Download the reference-materials directory into the current directory:
>>> import tidyms as ms
>>> dataset = "reference-materials"
>>> ms.fileio.download_dataset(dataset, download_dir=".")
- download_tidyms_data(name: str, files: List[str], download_dir: str | None = None)¶
Download a list of files from the data repository
https://github.com/griquelme/tidyms-data
- Parameters:
- namestr
Name of the data directory
- filesList[str]
List of files inside the data directory.
- download_dirstr or None, default=None
String representation of a path to download the data. If None, downloads the data to the .tidyms directory
Examples
Download the data.csv file from the reference-materials directory into the current directory:
>>> import tidyms as ms
>>> dataset = "reference-materials"
>>> file_list = ["data.csv"]
>>> ms.fileio.download_tidyms_data(dataset, file_list, download_dir=".")
- list_available_datasets(hide_test_data: bool = True) List[str]¶
List available example datasets
- Parameters:
- hide_test_databool
- Returns:
- datasets: List[str]
- load_dataset(name: str, cache: bool = True, **kwargs) DataContainer¶
Load an example dataset into a DataContainer. Available datasets can be listed with the list_available_datasets function.
- Parameters:
- namestr
name of an available dataset.
- cachebool
If True tries to read the dataset from a local cache.
- kwargs: additional parameters to pass to the Pandas read_csv function
- Returns:
- data_matrixDataFrame
- feature_metadataDataFrame
- sample_metadataDataFrame
- read_data_matrix(path: str | TextIO | BinaryIO, data_matrix_format: str, sample_metadata: str | None = None) DataContainer¶
Read different Data Matrix formats into a DataContainer.
- Parameters:
- path: str
path to the data matrix file.
- data_matrix_format: {“progenesis”, “pickle”, “mzmine”}
- sample_metadatastr, file or DataFrame.
Required for mzmine data.
- Returns:
- DataContainer
Examples
>>> data = read_data_matrix("data_path.csv", "progenesis")
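The dispatch behaviour described above (one entry point that calls a format-specific reader, with sample metadata required for mzmine data) can be sketched as follows. This is an illustration under stated assumptions, not the tidyms source: `read_matrix` and the `_read_*` stubs are hypothetical and return strings instead of DataContainer objects.

```python
from typing import Callable, Dict, Optional


# Hypothetical stub readers; the real functions return a DataContainer.
def _read_progenesis(path: str, sample_metadata: Optional[str]) -> str:
    return f"progenesis:{path}"


def _read_pickle(path: str, sample_metadata: Optional[str]) -> str:
    return f"pickle:{path}"


def _read_mzmine(path: str, sample_metadata: Optional[str]) -> str:
    return f"mzmine:{path}+{sample_metadata}"


READERS: Dict[str, Callable[[str, Optional[str]], str]] = {
    "progenesis": _read_progenesis,
    "pickle": _read_pickle,
    "mzmine": _read_mzmine,
}


def read_matrix(
    path: str, data_matrix_format: str, sample_metadata: Optional[str] = None
) -> str:
    """Select a reader by format name and call it."""
    if data_matrix_format not in READERS:
        raise ValueError(f"unknown format: {data_matrix_format!r}")
    if data_matrix_format == "mzmine" and sample_metadata is None:
        # Mirrors the note above: sample metadata is required for mzmine data.
        raise ValueError("sample_metadata is required for mzmine data")
    return READERS[data_matrix_format](path, sample_metadata)


result = read_matrix("data_path.csv", "progenesis")
```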
- read_mzmine(data: str | TextIO, sample_metadata: str | TextIO) DataContainer¶
Read an MZmine 2 csv file into a DataContainer.
- Parameters:
- datastr or file
csv file generated with MZMine.
- sample_metadatastr, file or DataFrame
csv file with sample metadata. The following columns are required:
* sample : the same sample names used in data
* class : the sample classes
Columns with run order and analytical batch information are optional.
- Returns:
- DataContainer
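Before calling read_mzmine, it can help to confirm that the sample metadata file has the two required columns. The sketch below uses only the standard library; `missing_columns` is a hypothetical helper, and the `order` and `batch` column names in the sample text are illustrative assumptions, since the optional column names are not given above.

```python
import csv
import io
from typing import List, TextIO

REQUIRED_COLUMNS = ("sample", "class")


def missing_columns(metadata_file: TextIO) -> List[str]:
    """Return the required column names absent from the csv header."""
    reader = csv.DictReader(metadata_file)
    header = reader.fieldnames or []
    return [name for name in REQUIRED_COLUMNS if name not in header]


# A metadata file with both required columns plus two optional ones.
good_missing = missing_columns(
    io.StringIO("sample,class,order,batch\nQC-1,QC,1,1\nS-1,healthy,2,1\n")
)
# A metadata file where the class information has another column name.
bad_missing = missing_columns(io.StringIO("sample,group\nQC-1,QC\n"))
```

An empty list means the header is acceptable under this check; it says nothing about whether the sample names match those in the data file.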
- read_pickle(path: str | BinaryIO) DataContainer¶
Read a DataContainer stored as a pickle.
- Parameters:
- path: str or file
path to read DataContainer
- Returns:
- DataContainer
- read_progenesis(path: str | TextIO)¶
Read a Progenesis file into a DataContainer.
- Parameters:
- pathstr or file
- Returns:
- dcDataContainer
- read_xcms(data_matrix: str, feature_metadata: str, sample_metadata: str, class_column: str = 'class', sep: str = '\t')¶
Reads tabular data generated with xcms.
- Parameters:
- data_matrixstr
Path to a tab-delimited data matrix generated with the R package SummarizedExperiment assay method.
- feature_metadatastr
Path to a tab-delimited file with feature metadata (called feature definitions in XCMS), generated with the R package SummarizedExperiment rowData method.
- sample_metadatastr
Path to a tab-delimited file with sample metadata, generated with the R package SummarizedExperiment colData method. A column named class is required. If the class information is under another name, it must be specified in the class_column parameter.
- class_columnstr
Column name which holds sample class information in the sample metadata.
- sepstr
Separator used in the files. As the feature metadata generated with XCMS has comma characters, the default value for the separator is “\t”.
- Returns:
- DataContainer
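The reason for the tab default can be seen with a small standalone example. The row below is made up for illustration: the column names and values are assumptions, chosen so that one field contains commas, as the description of `sep` says XCMS feature metadata does.

```python
import csv
import io

# One header line and one feature row; the last field contains commas.
text = "feature\tmzmed\tpeakidx\nFT001\t200.05\t12,57,103,140\n"

# Parsing with the tab separator keeps the comma-containing field intact.
tab_rows = list(csv.reader(io.StringIO(text), delimiter="\t"))

# Parsing with a comma separator splits the row in the wrong places.
comma_rows = list(csv.reader(io.StringIO(text), delimiter=","))
```

With the tab separator, both lines have three fields. With the comma separator, the header collapses into a single field and the data row is cut into four, so the columns no longer line up.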