tidyms.dartms¶
Functionality to process DART-MS datasets
Objects¶
DartMSAssay: Stores raw and processed DART-MS data.
Usage¶
Predefined workflow:
prefab_DARTMS_dataProcessing_pipeline
Semi-automated parameter optimization:
compare_parameters_for_function
Spot detection:
create_assay_from_chronogramFiles
Data import:
create_assay_from_chronogramFiles
Data manipulation:
select_top_n_spectra
correct_MZ_shift_across_samples
calculate_consensus_spectra_for_samples
bracket_consensus_spectrum_samples
build_data_matrix
batch_correction
blank_subtraction
Annotation
annotate_features
annotate_with_compounds
Import/Export
save_self_to_dill_file
read_from_dill_file
export_data_matrix
export_for_R
write_bracketing_results_to_featureML
generate_feature_raw_plot
Statistics
restrict_to_high_quality_features__found_in_replicates
restrict_to_high_quality_features__minimum_intensity_filter
print_results_overview
plot_RSDs_per_group
calc_volcano_plots
calc_2D_Embedding
generate_feature_abundance_plot
- class DartMSAssay(name='Generic')¶
An object inspired by tidyms’ Assay class that encapsulates a DARTMS experiment.
Constructor for a new DartMSAssay object
- Parameters:
- namestr, optional
name of the experiment. Defaults to “Generic”.
- add_data_processing_step(step_identifier_text, log_text, processing_data=None)¶
Adds a data processing step to the log of the DartMSAssay object
- Parameters:
- step_identifier_textstring
name of the data processing step
- log_textstring
description of the data processing step
- processing_datadict, optional
further information (e.g., parameters) of the data processing step. Defaults to None.
- annotate_features(useGroups=None, max_deviation_ppm=100, search_ions=None, remove_other_ions=True, plot=False)¶
Function to annotate the bracketed features with different sister ions (adducts, isotopologs, etc.) relative to parent ions (mostly [M+H]+ or [M-H]-)
- Parameters:
- useGroupslist of str, optional
groups to be used for the annotation (important for testing the ratio). Defaults to None.
- max_deviation_ppmint, optional
the maximum allowed deviation between the calculated and observed mz value of a sister ion. Defaults to 100.
- search_ionsdict, optional
the ions to search for. keys are ion names, values are mz increments (no decrements allowed). Defaults to None.
- remove_other_ionsbool, optional
indicator if annotated sister ions should be removed. Defaults to True.
- plotbool, optional
indicator if annotation results should be plotted. Defaults to False.
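The ppm tolerance check behind max_deviation_ppm and search_ions can be sketched as follows. This is a minimal illustration and not the library's implementation: the helper names ppm_deviation and find_sister_ions and the example m/z values are hypothetical, and the intensity-ratio test mentioned for useGroups is left out.

```python
def ppm_deviation(observed_mz, theoretical_mz):
    """Signed deviation of an observed m/z from a theoretical m/z in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def find_sister_ions(parent_mz, feature_mzs, search_ions, max_deviation_ppm=100):
    """Return {ion name: matching feature m/z} for every sister ion that is found.

    search_ions maps ion names to m/z increments relative to the parent ion.
    """
    hits = {}
    for ion_name, mz_increment in search_ions.items():
        expected = parent_mz + mz_increment
        for mz in feature_mzs:
            if abs(ppm_deviation(mz, expected)) <= max_deviation_ppm:
                hits[ion_name] = mz
                break  # keep the first match only (simplification)
    return hits


# Hypothetical example: a parent ion, its 13C isotopolog and a sodium adduct.
features = [166.0863, 167.0897, 188.0682]
ions = {"13C": 1.003355, "[M+Na]+": 21.981944}
print(find_sister_ions(166.0863, features, ions, max_deviation_ppm=100))
```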
- annotate_with_compounds(tsv_file, max_ppm_dev=15.0, adducts=None, delimiter='\t', quote_character='', comment_character='#')¶
Annotation of detected features with compounds from a database.
- Parameters:
- tsv_filestr
Path to a tab-separated file containing the database.
- max_ppm_devfloat, optional
Maximum allowed mz difference in ppm relative to the theoretical value. Defaults to 15.0.
- adductsdict, optional
key: str, value: tuple of (charge number, mz increment). Defaults to None.
- delimiterstr, optional
delimiter character of the database. Defaults to “\t”.
- quote_characterstr, optional
quote character of the database. Defaults to “”.
- comment_characterstr, optional
comment character of the database (not allowed in first row/header). Defaults to “#”.
- batch_correction(by_group, plot=True)¶
Correct the abundances of all detected features in different batches. The algorithm is as follows: An overall mean-QC-overall-value is derived from all QC samples (regardless of the batch) using all features detected in these QC samples. For each batch, a mean-QC-sample-value is derived from the QC samples in that batch using all features detected in these QC samples. All samples in the batch are corrected by this mean-QC-sample-value: all feature abundances are divided by this value. The corrected abundance values are then multiplied by the mean-QC-overall-value to obtain abundance values similar to those before the correction.
- Parameters:
- by_groupstring
the name of the group to be used for the batch correction
- plotbool, optional
show the correction results as plots. Defaults to True.
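The correction described above can be sketched in plain Python. This is an illustration under assumptions, not the library's code: the function name batch_correct, the list-of-lists data layout and the use of None for undetected features are all hypothetical.

```python
from statistics import mean


def batch_correct(abundances, batches, is_qc):
    """Scale every sample by (overall QC mean) / (QC mean of its batch).

    abundances: one list of feature abundances per sample (None = not detected)
    batches:    batch id per sample
    is_qc:      True for QC samples, False otherwise
    """
    def qc_mean(sample_indices):
        # mean over all detected feature abundances in the given QC samples
        return mean(v for i in sample_indices for v in abundances[i] if v is not None)

    qc_all = [i for i, qc in enumerate(is_qc) if qc]
    overall = qc_mean(qc_all)

    corrected = []
    for i, row in enumerate(abundances):
        batch_qc = [j for j in qc_all if batches[j] == batches[i]]
        factor = overall / qc_mean(batch_qc)
        corrected.append([None if v is None else v * factor for v in row])
    return corrected


# Two batches, one QC sample and one study sample each (made-up numbers).
data = [[100, 200], [10, None], [200, 400], [40, 80]]
result = batch_correct(data, batches=[1, 1, 2, 2], is_qc=[True, False, True, False])
print(result)
```

In this made-up example both QC samples end up with identical abundances after the correction, which is the intended effect of dividing by the per-batch QC value and rescaling by the overall QC value.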
- blank_subtraction(blankGroup, toTestGroups, foldCutoff=2, pvalueCutoff=0.05, minDetected=2, plot=False)¶
Method to remove background features from the data matrix. Repeated calls with different blank groups are possible. Inspired by the background-subtraction module of MZmine3.
- Parameters:
- blankGroupstring
the name of the blank group
- toTestGroupslist of str
the name of the groups to test against the blank group
- foldCutoffint, optional
the minimum fold-change between at least one test-group and the blank group in order for a feature to not be considered a background feature. Defaults to 2.
- pvalueCutofffloat, optional
the alpha-threshold for the ttest. Defaults to 0.05.
- minDetectedint, optional
the minimum number of background samples in which a feature must be detected in order to be considered a background feature. Defaults to 2.
- plotbool, optional
indicator whether the subtraction shall be plotted as a volcano plot. Defaults to False.
- Raises:
- NotImplementedError
should never be raised, but is if the algorithm’s implementation is incorrect
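A simplified sketch of the decision for a single feature is given below. It covers only the foldCutoff and minDetected criteria; the t-test controlled by pvalueCutoff is deliberately left out. The function name is_background, the data layout and the exact way the criteria are combined are assumptions and may differ from the actual implementation.

```python
from statistics import mean


def is_background(blank_values, test_groups, fold_cutoff=2, min_detected=2):
    """Decide whether one feature looks like background.

    blank_values: abundances in the blank samples (None = not detected)
    test_groups:  {group name: abundances in that group}
    """
    detected_blank = [v for v in blank_values if v is not None]
    if len(detected_blank) < min_detected:
        return False  # seen too rarely in the blank to call it background
    blank_mean = mean(detected_blank)
    for values in test_groups.values():
        detected = [v for v in values if v is not None]
        if detected and mean(detected) >= fold_cutoff * blank_mean:
            return False  # clearly more abundant in at least one test group
    return True


print(is_background([100, 110, None], {"A": [120, 130], "B": [90, 100]}))
print(is_background([100, 110], {"A": [300, 320]}))
```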
- bracket_consensus_spectrum_samples(closest_signal_max_deviation_ppm=20, max_ppm_deviation=25, show_diagnostic_plots=False)¶
Function to bracket consensus spectra across different samples
- Parameters:
- max_ppm_deviationfloat, optional
the maximum allowed deviation (in ppm) a consensus group is allowed to have. Defaults to 25.
- show_diagnostic_plotsbool, optional
indicator if a diagnostic plot shall be shown. Defaults to False.
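To illustrate what a width limit such as max_ppm_deviation means for bracketing, the following sketch groups sorted m/z values greedily so that no group spans more than the allowed ppm range. The function bracket_mzs is hypothetical, and the library's bracketing algorithm is likely more elaborate.

```python
def bracket_mzs(mzs, max_ppm_deviation=25):
    """Greedily group sorted m/z values; a group never spans more than max_ppm_deviation."""
    groups = []
    for mz in sorted(mzs):
        if groups and (mz - groups[-1][0]) / groups[-1][0] * 1e6 <= max_ppm_deviation:
            groups[-1].append(mz)  # still within the ppm window of the group's first value
        else:
            groups.append([mz])  # start a new bracketed feature
    return groups


# Consensus m/z values pooled from several samples (made-up numbers).
print(bracket_mzs([166.0861, 200.1000, 166.0865, 200.1060]))
```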
- build_data_matrix(on='originalData', originalData_mz_deviation_multiplier_PPM=0, aggregation_fun='average')¶
generates a data matrix from corrected, consensus spectra and bracketed features
- Parameters:
- onstr, optional
the data used to derive abundance values from. With ‘processedData’ the consensus spectra will be used, while with ‘originalData’ the raw-data will be used. Defaults to “originalData”.
- originalData_mz_deviation_multiplier_PPMint, optional
an optional mz deviation allowed for the raw-data integration. Defaults to 0.
- aggregation_funstr, optional
the method to calculate the derived abundance on integration of raw data. Defaults to “average”.
- Raises:
- ValueError
raised if parameters on and aggregation_fun have invalid values
- calc_2D_Embedding(keep_features=None, remove_features=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None, imputation='zero', scaling='standard', embedding='pca')¶
Calculates a two-dimensional embedding of the data matrix (or a subset) and illustrates it as a scores/component plot
- Parameters:
- keep_featureslist of indices, optional
features to be included for the embedding. Defaults to None.
- remove_featureslist of indices, optional
features to not be included for the embedding. Defaults to None.
- keep_sampleslist of str, optional
samples to be included for the embedding. Defaults to None.
- remove_sampleslist of str, optional
samples to not be included for the embedding. Defaults to None.
- keep_groupslist of str, optional
groups to be included for the embedding. Defaults to None.
- remove_groupslist of str, optional
groups to not be included for the embedding. Defaults to None.
- keep_batcheslist of int, optional
batches to be included for the embedding. Defaults to None.
- remove_batcheslist of int, optional
batches to not be included for the embedding. Defaults to None.
- imputationstr, optional
imputation method for features with missing values (i.e., no signals detected for them). Defaults to “zero”, allowed are “zero” and “omitNA”.
- scalingstr, optional
the scaling to be applied before the embedding calculation. Defaults to “standard”, allowed are None, “”, “standard”.
- embeddingstr, optional
the embedding type. Defaults to “pca”, allowed are “pca”, “lda”, “umap”.
- Returns:
- plotnine
the plot
- Raises:
- ValueError
raised if invalid parameters are provided
- calc_volcano_plots(comparisons, alpha_critical=0.05, minimum_fold_change=2, keep_features=None, remove_features=None, highlight_features=None, min_ratio_samples_for_found=0.75, min_ratio_samples_for_not_found=0.25, sig_color='firebrick', not_different_color='cadetblue')¶
generate a volcano plot from the results
- Parameters:
- comparisonslist of tuple of (str, str)
the names of the two groups to be compared
- alpha_criticalfloat, optional
the critical alpha value for a significant difference. Defaults to 0.05.
- minimum_fold_changeint, optional
the minimum required fold-change for a significant difference. Defaults to 2.
- keep_featureslist of indices, optional
a list of indices to be used for the uni-variate comparison. Defaults to None.
- remove_featureslist of indices, optional
a list of features to not be used for the uni-variate comparison. Defaults to None.
- highlight_featureslist of feature indices, optional
features to highlight, indices according to self.dat matrix and self.features must be provided.
- sig_colorstr, optional
the name of the color used for plotting significantly different features. Defaults to “firebrick”.
- not_different_colorstr, optional
the name of the color used for plotting not significantly different features. Defaults to “cadetblue”.
- Returns:
- Pandas DataFrame
the data matrix for the volcano plot
- calculate_consensus_spectra_for_samples(min_difference_ppm=30, closest_signal_max_deviation_ppm=20, max_mz_deviation_ppm=20, min_signals_per_cluster=10, minimum_intensity_for_signals=0, cluster_quality_check_functions=None, aggregation_function='average', exportAsFeatureML=True, featureMLlocation='.')¶
Function to collapse several spectra into a single consensus spectrum per spot
- Parameters:
- min_difference_ppmfloat, optional
Minimum difference in PPM required to separate into different clusters. Defaults to 30.
- min_signals_per_clusterint, optional
Minimum number of signals for a certain MZ cluster for it to be used in the collapsed spectrum. Defaults to 10.
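How min_difference_ppm and min_signals_per_cluster interact can be sketched as a simple gap-based clustering. This is an assumption-laden illustration (the function consensus_spectrum and the averaging of both m/z and intensity are hypothetical) and not the library's clustering routine.

```python
from statistics import mean


def consensus_spectrum(signals, min_difference_ppm=30, min_signals_per_cluster=10):
    """Collapse (mz, intensity) signals pooled from all spectra of one spot.

    A new cluster starts whenever the gap to the previous signal is at least
    min_difference_ppm; clusters with too few signals are discarded.
    """
    clusters = []
    for mz, intensity in sorted(signals):
        if clusters and (mz - clusters[-1][-1][0]) / mz * 1e6 < min_difference_ppm:
            clusters[-1].append((mz, intensity))
        else:
            clusters.append([(mz, intensity)])
    return [
        (mean(mz for mz, _ in c), mean(i for _, i in c))  # 'average' aggregation
        for c in clusters
        if len(c) >= min_signals_per_cluster
    ]


signals = [(100.0000, 10), (100.0010, 20), (100.0020, 30), (150.0, 5)]
print(consensus_spectrum(signals, min_signals_per_cluster=3))
```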
- clone_DartMSAssay()¶
Clones the DartMSAssay object (deepcopy)
- Returns:
- DartMSAssay
the cloned, new DartMSAssay object
- correct_MZ_shift_across_samples(referenceMZs=[166.086254594], max_mz_deviation_absolute=0.1, correctby='mzDeviationPPM', max_deviationPPM_to_use_for_correction=80, selection_criteria='mostAbundant', correct_on_level='file', plot=False)¶
Function to correct systematic shifts of mz values in individual spot samples. Currently, only a constant MZ offset relative to several reference features can be corrected. The correction is carried out by calculating the median error relative to the reference features and then applying either the absolute or the ppm deviation to all mz values in the spot sample. A signal for a feature on the referenceMZs list is said to be found if it is within the max_mz_deviation_absolute parameter. The feature with the closest mz difference will be used in cases where several features are present in the search window.
- Parameters:
- assayAssay
The assay object of the experiment
- referenceMZslist of MZ values (float), optional
The reference features for which the offsets shall be calculated. Defaults to [165.078978594 + 1.007276].
- max_mz_deviation_absolutefloat, optional
Maximum deviation used for the search. Defaults to 0.1.
- correctbystr, optional
Either “mzDeviation” for correcting by a constant mz offset or “mzDeviationPPM” to correct by a constant PPM offset. Defaults to “mzDeviationPPM”.
- plotbool, optional
Indicates if a plot shall be generated and returned. Defaults to False.
- Returns:
- pandas.DataFrame, plot
Overview of the correction and plot (if it shall be generated)
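The ppm-based variant of the correction (correctby = “mzDeviationPPM”) can be sketched as follows. The function name correct_mz_shift and its return value are hypothetical; the sketch picks the closest signal per reference, as the description states, and ignores the other selection options.

```python
from statistics import median


def correct_mz_shift(mzs, reference_mzs, max_mz_deviation_absolute=0.1):
    """Correct a constant ppm shift using the median error at the reference m/z values.

    Returns the corrected m/z values and the estimated shift in ppm
    (None if no reference signal was found).
    """
    errors_ppm = []
    for ref in reference_mzs:
        candidates = [mz for mz in mzs if abs(mz - ref) <= max_mz_deviation_absolute]
        if candidates:
            closest = min(candidates, key=lambda mz: abs(mz - ref))
            errors_ppm.append((closest - ref) / ref * 1e6)
    if not errors_ppm:
        return list(mzs), None
    shift_ppm = median(errors_ppm)
    # observed = true * (1 + shift), hence true = observed / (1 + shift)
    return [mz / (1 + shift_ppm / 1e6) for mz in mzs], shift_ppm


ref = 166.086254594
observed = [ref * (1 + 50e-6), 300.0 * (1 + 50e-6)]  # simulated +50 ppm shift
corrected, shift = correct_mz_shift(observed, [ref])
print(shift, corrected)
```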
- static create_assay_from_chronogramFiles(assay_name, filenames, spot_file, ms_mode, instrument, centroid_profileMode=True, fileNameChangeFunction=None, use_signal_function=None, rewriteRTinFiles=False, rewriteDeleteUnusedScans=True, intensity_threshold_spot_extraction=0, import_filters=None)¶
Generates a new assay from a series of chronograms and a spot_file
- Parameters:
- filenameslist of str
File path of the chronograms
- spot_filestr
File path of the spot file
- centroid_profileModebool
indicates if profile mode data shall be centroided automatically
- Returns:
- Assay
The newly generated assay object with the spots from the spot_file (either automatically detected or user-guided)
- drop_lower_spectra(drop_rate)¶
A function to restrict chronogram spots to only certain spectra in the dataset (e.g., use the ‘core’ of the spot). From all spectra assigned to the spot, only the drop_rate % with the highest abundance will be used. Use with caution, as a variable number of scans over the spot might affect abundances, especially when aggregation_method = “sum” is used.
- Parameters:
- drop_ratefloat
the ratio of the highest-abundant spectra to be used.
- export_data_matrix(to_file, separator='\t', quotechar='"')¶
Export the data matrix to a tsv file
- Parameters:
- to_filestring
the file to save the results to
- export_for_R(to_file)¶
Export the results to a tsv file and generate R code to import it
- Parameters:
- to_filestring
the location of the file (without extension, ‘.tsv’ will be added automatically)
- static generate_database_template(to_tsv_file)¶
Generates a template for the database search
- Parameters:
- to_tsv_filestr
file to which the template shall be written
- generate_feature_abundance_plot(feature_index, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None)¶
Shows a single feature
- Parameters:
- feature_indexindex
the index of the feature to be plotted
- Returns:
- Pandas DataFrame
the data matrix for the plot
- generate_feature_raw_plot(refMZ, reverseCorrectMZ=True, ppmDev=20, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None)¶
generates a raw data plot for a detected feature
- Parameters:
- refMZfloat
the mz value of the feature to plot after mz correction (reverseCorrectMZ = True) or in the raw data (reverseCorrectMZ = False)
- reverseCorrectMZboolean, optional
indicates if the refMZ value is after (True) or before (False) mz correction
- ppmDevfloat, optional
the allowed mz deviation for the feature
- Raises:
- ValueError
raised if parameters on and aggregation_fun have invalid values
- get_data_matrix_and_meta(keep_features=None, remove_features=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None, copy=False)¶
Returns the data matrix as well as feature and sample information available in the DartMSAssay object. Features and samples/groups/batches can be included or excluded before the export.
- Parameters:
- keep_featureslist of indices, optional
features to be included in the export. To export all, the parameter must be set to None. Defaults to None.
- remove_featureslist of indices, optional
features to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
- keep_sampleslist of strings, optional
samples to be included in the export. To export all, the parameter must be set to None. Defaults to None.
- remove_sampleslist of strings, optional
samples to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
- keep_groupslist of strings, optional
groups to be included in the export. To export all, the parameter must be set to None. Defaults to None.
- remove_groupslist of strings, optional
groups to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
- keep_batcheslist of strings, optional
batches to be included in the export. To export all, the parameter must be set to None. Defaults to None.
- remove_batcheslist of integers, optional
batches to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
- copybool, optional
indicator if the object should be cloned before export. This will be done automatically if any inclusion or restriction is provided. Defaults to False.
- Returns:
- (numpy data matrix [samples x features], list of feature properties, list of feature annotations, list of sample names, list of assigned group names, list of assigned batches)
data matrix and meta-data
- Raises:
- RuntimeError
an exception is raised when the necessary data is not available
- get_summary_of_results(reference_features=None, reference_features_allowed_deviationPPM=20.0)¶
Show a summary of the results.
- Parameters:
- reference_featureslist of float, optional
reference mz values to be used. Defaults to None.
- reference_features_allowed_deviationPPMfloat, optional
allowed mz deviation. Defaults to 20.0.
- Returns:
- Pandas DataFrame
Summary table
- normalize_samples_by_TIC(multiplication_factor=1)¶
abundances of spot spectra can be normalized by the total intensity of the spectra
- Parameters:
- multiplication_factorint, optional
a factor that is applied on top of the normalization (i.e., shifts the maximum abundance to this value). Defaults to 1.
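A minimal sketch of total-intensity normalization is shown below. The function normalize_by_tic is hypothetical, and it assumes that intensities are divided by their sum and then scaled by multiplication_factor, which may not match the library's exact scaling.

```python
def normalize_by_tic(intensities, multiplication_factor=1):
    """Divide all intensities by their sum (the TIC) and scale by multiplication_factor."""
    tic = sum(intensities)
    if tic == 0:
        return list(intensities)  # nothing to normalize in an empty spectrum
    return [i / tic * multiplication_factor for i in intensities]


print(normalize_by_tic([25, 75]))
print(normalize_by_tic([10, 30, 60], multiplication_factor=100))
```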
- normalize_to_internal_standard(std, multiplication_factor=1, plot=False)¶
Abundances of spot spectra are normalized by the abundance of a selected internal standard
- Parameters:
- stdfloat
standard to normalize to
- multiplication_factorint, optional
Defaults to 1.
- plotbool, optional
Defaults to False.
- plot_RSDs_per_group(uGroups=None, include=None, plotType='points', scales='free_y')¶
Plots an rsd distribution per group
- Parameters:
- uGroupslist of str, optional
the groups to be included in the overview. Defaults to None.
- includelist of str, optional
missing values replacement strategies to be used. Defaults to None.
- plotTypestr, optional
type of plot (either points or histogram). Defaults to “points”.
- scalesstr, optional
parameter for plotnine and the y-scale of facetted plots. Defaults to “free_y”.
- Raises:
- ValueError
raised if an unknown plottype is provided
- plot_feature_mz_deviations(featureInd, refMZ=None, types=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None)¶
plots the mz deviation for a particular feature
- Parameters:
- featureIndindex
the index of the feature to plot
- typeslist of str, optional
plots to include. Defaults to None.
- plot_heatmap(keep_features=None, remove_features=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None, linkage_method='ward', distance_metric='euclidean')¶
Calculates and plots a heatmap.
- Parameters:
- keep_featureslist of indices, optional
features to be included for the embedding. Defaults to None.
- remove_featureslist of indices, optional
features to not be included for the embedding. Defaults to None.
- keep_sampleslist of str, optional
samples to be included for the embedding. Defaults to None.
- remove_sampleslist of str, optional
samples to not be included for the embedding. Defaults to None.
- keep_groupslist of str, optional
groups to be included for the embedding. Defaults to None.
- remove_groupslist of str, optional
groups to not be included for the embedding. Defaults to None.
- keep_batcheslist of int, optional
batches to be included for the embedding. Defaults to None.
- remove_batcheslist of int, optional
batches to not be included for the embedding. Defaults to None.
- linkage_methodstr, optional
linkage method to be used for generating the feature clustering. Defaults to ‘ward’, options are from scipy.cluster.hierarchy.linkage
- distance_metricstr, optional
distance metric to be used for generating the feature clustering. Defaults to ‘euclidean’, options are from scipy.cluster.hierarchy.linkage
- Returns:
- plot
the heatmap
- Raises:
- ValueError
raised if invalid parameters are provided
- plot_mz_deviation_overview(show=None, random_fraction=1)¶
plots an overview of the mz deviation in the bracketed results
- Parameters:
- show_type_, optional
indicator which plots should be shown. Defaults to None.
- random_fractionint, optional
use only a random fraction of the features. Defaults to 1.
- Raises:
- ValueError
Unknown option(s) specified
- plot_sample_TICs(separate=True, separate_by='group')¶
Plots the detected spots of the chronograms
- Parameters:
- separatebool, optional
indicator if a facetted plot shall be used or not. Defaults to True.
- separate_bystr, optional
the variable used for grouping the results (can be file, group, or batch). Defaults to “group”.
- plot_sample_abundances()¶
plots an overview of the feature abundances per sample and group
- Returns:
- plotnine plot
the generated plot
- print_results_overview()¶
prints an overview of the bracketed results to the console
- print_sample_overview()¶
Prints an overview of the samples
- static read_from_dill_file(dill_file)¶
Load a DartMSAssay from a file
- Parameters:
- dill_filestring
the *.dill file to load the DartMSAssay from
- Returns:
- DartMSAssay
the loaded DartMSAssay object
- restrict_to_high_quality_features__found_in_replicates(test_groups, minimum_ratio_found, found_in_type='anyGroup')¶
Function to select high-quality features (after bracketing). This function selects only features that are found in at least the minimum_ratio_found fraction of replicates.
- Parameters:
- test_groupslist of str
the groups to be tested
- minimum_ratio_foundfloat
the minimum ratio of all samples in a group in which the feature must have been detected
- found_in_typestr, optional
indicator if the feature must be present in all groups or in any group in at least the minimum_ratio_found fraction of samples. Defaults to “anyGroup”.
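The replicate filter for a single feature might look like the sketch below. The function found_in_replicates and the data layout are hypothetical; only the value “anyGroup” is taken from the signature, and every other value of found_in_type is treated here as "all groups", which is an assumption.

```python
def found_in_replicates(values_by_group, minimum_ratio_found, found_in_type="anyGroup"):
    """Check whether one feature is detected often enough within the tested groups.

    values_by_group: {group name: abundances per sample, None = not detected}
    """
    passed = []
    for values in values_by_group.values():
        ratio_found = sum(v is not None for v in values) / len(values)
        passed.append(ratio_found >= minimum_ratio_found)
    if found_in_type == "anyGroup":
        return any(passed)
    return all(passed)  # assumption: any other value means 'all groups'


feature = {"A": [1.0, 2.0, None, 4.0], "B": [None, None, 3.0, None]}
print(found_in_replicates(feature, 0.75))
```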
- restrict_to_high_quality_features__low_RSD_in_groups(test_groups, maximum_RSD)¶
Function to select high-quality features (after bracketing). This function selects only features that have a low within-group variability.
- Parameters:
- test_groupslist of str
the groups to be tested
- maximum_RSDfloat
the maximum allowed rsd for a feature to be used, must be lower in all groups
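The relative standard deviation (RSD) criterion can be illustrated as follows. The helpers rsd and passes_rsd_filter are hypothetical, and the sketch assumes that the RSD is expressed as a fraction (standard deviation divided by mean) rather than in percent.

```python
from statistics import mean, stdev


def rsd(values):
    """Relative standard deviation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def passes_rsd_filter(values_by_group, maximum_RSD):
    """A feature passes only if its RSD is at most maximum_RSD in every tested group."""
    return all(rsd(values) <= maximum_RSD for values in values_by_group.values())


feature = {"A": [90, 100, 110], "B": [50, 100, 150]}
print(rsd(feature["A"]), rsd(feature["B"]))
print(passes_rsd_filter(feature, 0.2))
```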
- restrict_to_high_quality_features__minimum_intensity_filter(test_groups, minimum_intensity)¶
Function to select high-quality features (after bracketing). This function selects only features that have a minimum intensity in at least one sample of a group.
- Parameters:
- test_groupslist of str
the groups to test
- minimum_intensityfloat
the minimum required signal intensity
- restrict_to_high_quality_features__most_n_abundant(n_features)¶
Function to select high-quality features (after bracketing). This function selects the top-n most abundant features.
- Parameters:
- n_featuresinteger
the number of features to select
- save_self_to_dill_file(dill_file)¶
Save the DartMSAssay object to a file
- Parameters:
- dill_filestring
the *.dill file to save the DartMSAssay to
- select_top_n_spectra(n)¶
A function to restrict chronogram spots to only certain spectra in the dataset (e.g., use the ‘core’ of the spot). From all spectra assigned to the spot, only the n most abundant will be used.
- Parameters:
- ninteger
the number of highest-abundant spectra to be used.
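The selection can be sketched as ranking the spectra of a spot by abundance and keeping the best n. The function name top_n_spectra and the data layout are hypothetical, and the sketch assumes that "most abundant" refers to the summed intensity of a spectrum.

```python
def top_n_spectra(spectra, n):
    """Keep the n spectra with the highest summed intensity, in their original order.

    spectra: list of (scan_id, list of intensities) tuples of one spot
    """
    ranked = sorted(spectra, key=lambda s: sum(s[1]), reverse=True)
    keep = {scan_id for scan_id, _ in ranked[:n]}
    return [s for s in spectra if s[0] in keep]


spot = [(1, [10, 20]), (2, [100, 200]), (3, [50, 60]), (4, [5])]
print(top_n_spectra(spot, 2))
```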
- set_data(dat, features, featureAnnotations, samples, groups, batches)¶
Sets the data for a DartMSAssay object
- Parameters:
- datnumpy.ndarray of [n, m]
the feature table of the experiment
- featureslist of mz values
the features’ information (mz values)
- featureAnnotationslist of dictionaries
the features’ derived information (sister ions, etc.)
- sampleslist of string
the names of the samples in the experiment
- groupslist of str
the group names the samples in the experiment are assigned to
- batcheslist of int
the batch ids the samples in the experiment are assigned to
- static show_sample_overview(filenames, ms_mode, instrument, separation_intensity=1000.0)¶
Generates an overview of the data to be imported in subsequent steps
- Parameters:
- filenameslist of str
File path of the chronograms
- subset_features(keep_features_with_indices=None, remove_features_with_indices=None)¶
Subset the detected features and include or exclude them
- Parameters:
- keep_features_with_indiceslist of indices, optional
features to be included in the export. To export all, the parameter must be set to None. Defaults to None.
- remove_features_with_indiceslist of indices, optional
features to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
- Raises:
- RuntimeError
if no features have been detected, this exception will be raised
- subset_samples(keep_samples=None, keep_groups=None, keep_batches=None, remove_samples=None, remove_groups=None, remove_batches=None)¶
Subset certain samples, groups or batches in the DartMSAssay object
- Parameters:
- keep_sampleslist of strings, optional
samples to be included in the export. To export all, the parameter must be set to None. Defaults to None.
- remove_sampleslist of strings, optional
samples to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
- keep_groupslist of strings, optional
groups to be included in the export. To export all, the parameter must be set to None. Defaults to None.
- remove_groupslist of strings, optional
groups to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
- keep_batcheslist of strings, optional
batches to be included in the export. To export all, the parameter must be set to None. Defaults to None.
- remove_batcheslist of integers, optional
batches to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
- Raises:
- RuntimeError
if no features have been detected, this exception will be raised
- write_bracketing_results_to_featureML(featureMLlocation='./results.featureML', featureMLStartRT=0, featureMLEndRT=1400)¶
export the bracketed results to a featureML file for easy visualization in TOPPView
- Parameters:
- featureMLlocationstr, optional
path to the featureML file. Defaults to “./results.featureML”.
- featureMLStartRTint, optional
the earliest chronogram time. Defaults to 0.
- featureMLEndRTint, optional
the latest chronogram time. Defaults to 1400.
- write_consensus_spectrum_to_featureML_file_per_sample(widthRT=40)¶
export the consensus results to a featureML file for easy visualization in TOPPView. A separate featureML file will be generated for each sample. The path of the file will be the path of the mzML file with the extension replaced by ‘.featureML’
- Parameters:
- widthRTint, optional
the width of the chronogram spots. Defaults to 40.
- cluster_quality_check_function__peak_form(sample, msDataObj, spectrumIDs, time, mz, intensity, cluster, min_correlation_for_cutoff=0.5)¶
A function to check the detected feature clusters for certain attributes. This particular function checks if the distribution form somehow resembles a spot (approximated by a normal distribution). Any cluster not resembling such a form will be removed in a subsequent step (by setting the cluster ids to -1)
- Parameters:
- samplestring
the name of the sample
- msDataObjMSData object of tidyms
the MSData object in which this feature was detected
- spectrumIDslist of ids
the spectrum id of each found signal in a cluster
- timelist of numeric
the chronogram time of each found signal in a cluster
- mzlist of numeric
the mz value of each found signal in a cluster
- intensitylist of numeric
the intensity value of each found signal in a cluster
- clusterlist of integer
the cluster ids each signal was assigned to
- min_correlation_for_cutofffloat, optional
the minimum Pearson correlation cutoff for the spot shape form comparison [-1 to 1]. Defaults to 0.5.
- Returns:
- list of integer
the new cluster ids each signal is assigned to. clusters to be removed are set to -1
- cluster_quality_check_function__ppmDeviationCheck(sample, msDataObj, spectrumIDs, time, mz, intensity, cluster, max_weighted_ppm_deviation=15)¶
A function to check the detected feature clusters for certain attributes. This particular function checks if the clusters are within a certain ppm deviation. Any cluster exceeding this deviation will be removed in a subsequent step (by setting the cluster ids to -1)
- Parameters:
- samplestring
the name of the sample
- msDataObjMSData object of tidyms
the MSData object in which this feature was detected
- spectrumIDslist of ids
the spectrum id of each found signal in a cluster
- timelist of numeric
the chronogram time of each found signal in a cluster
- mzlist of numeric
the mz value of each found signal in a cluster
- intensitylist of numeric
the intensity value of each found signal in a cluster
- clusterlist of integer
the cluster ids each signal was assigned to
- max_weighted_ppm_deviationfloat, optional
the maximum allowed ppm deviation for all signals within a cluster.
- Returns:
- list of integer
the new cluster ids each signal is assigned to. clusters to be removed are set to -1
- cohen_d(d1, d2)¶
Calculate Cohen’s d value for effect size
- Parameters:
- d1list or numpy array of numerics
numeric values of group 1
- d2list or numpy array of numerics
numeric values of group 2
- Returns:
- numeric
Cohen’s d value for the two groups
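For reference, the common textbook form of Cohen's d with a pooled standard deviation can be written as below. Whether the library uses exactly this variant (pooling, sign convention) is an assumption; the function is named cohen_d_sketch to make clear that it is not the library function.

```python
from math import sqrt
from statistics import mean, variance


def cohen_d_sketch(d1, d2):
    """Cohen's d: difference of the group means divided by the pooled standard deviation."""
    n1, n2 = len(d1), len(d2)
    pooled_sd = sqrt(((n1 - 1) * variance(d1) + (n2 - 1) * variance(d2)) / (n1 + n2 - 2))
    return (mean(d1) - mean(d2)) / pooled_sd


print(cohen_d_sketch([2, 4, 6], [1, 3, 5]))
```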
- import_filter_artifact_removal(msData, artifacts)¶
Filter artefacts before importing the mzML data
- Parameters:
- msDataMSData object of tidyms
the loaded mzML raw data to be filtered
- artifactslist of (mz_min and mz_max) tuples
a variable number of artifacts to be removed from the dataset. Each entry must be a tuple of a minimum and maximum mz value describing the artifact
- Returns:
- MSData
the altered or changed MSData object
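The effect of the artifacts list can be sketched on plain (mz, intensity) pairs. The function remove_artifacts is hypothetical and operates on a simple list rather than on an MSData object; it also assumes that both range bounds are inclusive.

```python
def remove_artifacts(signals, artifacts):
    """Drop every signal whose m/z lies inside one of the (mz_min, mz_max) ranges.

    signals:   list of (mz, intensity) tuples
    artifacts: list of (mz_min, mz_max) tuples, bounds treated as inclusive
    """
    return [
        (mz, intensity)
        for mz, intensity in signals
        if not any(mz_min <= mz <= mz_max for mz_min, mz_max in artifacts)
    ]


signals = [(100.0, 1), (150.5, 2), (200.0, 3)]
print(remove_artifacts(signals, [(150.0, 151.0)]))
```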
- import_filter_mz_range(msData, min_mz, max_mz)¶
Filter mz values within a certain range before importing the mzML data
- Parameters:
- msDataMSData object of tidyms
the loaded mzML raw data to be filtered
- min_mznumeric
the lower mz value to be used
- max_mznumeric
the higher mz value to be used
- Returns:
- MSData
the altered or changed MSData object
- import_filter_remove_signals_below_intensity(msData, minimum_signal_intensity)¶
Filter signals below a minimum intensity threshold
- Parameters:
- msDataMSData object of tidyms
the loaded mzML raw data to be filtered
- minimum_signal_intensitynumeric
the minimum intensity value for signals to be used
- Returns:
- MSData
the altered or changed MSData object
- prefab_DARTMS_dataProcessing_pipeline(spotFile, files, ms_mode='centroid', instrument='qtof', fileNameChangeFunction=None, create_assay_from_chronogramFiles__import_filters=None, select_top_n_spectra__top_n_spectra=None, correct_mz_shift__referenceMZs=None, correct_mz_shift__max_mz_deviation_absolute=0.1, correct_mz_shift__max_deviationPPM_to_use_for_correction=200, correct_mz_shift__correctby='mzDeviationPPM', correct_mz_shift__selection_criteria='mostabundant', correct_mz_shift__correct_on_level='file', calculate_consensus_spectra_for_samples__min_difference_ppm=100, calculate_consensus_spectra_for_samples__min_signals_per_cluster=5, calculate_consensus_spectra_for_samples__minimum_intensity_for_signals=250.0, calculate_consensus_spectra_for_samples__cluster_quality_check_functions=None, normalize_to_internal_standard__perform=False, normalize_to_internal_standard__internal_standard_mzs=None, normalize_to_internal_standard__multiplication_factor=1, bracket_consensus_spectrum_samples__max_ppm_deviation=25, annotate_features__useGroups=None, annotate_features_remove_other_ions=False, build_data_matrix__originalData_mz_deviation_multiplier_PPM=30, build_data_matrix__aggregation_fun='average', results_file=None, dill_file=None)¶
Entire DART-MS workflow
Function processes samples (parameter files) and their spots (spotFile) using
1. Top-N spectra selection (optional)
2. m/z shift correction
3. Consensus spectra generation
4. Internal standard normalization
5. Bracketing of features across samples
6. Generation of data matrix
7. Annotation of common adducts and isotopologs
The results of the processing are returned in form of a new DartMSAssay object and can optionally also be saved to a dill file.
For a detailed explanation of a parameter <parameter_name>, please see the respective function <function_name>. Each parameter name has the form <function_name>__<parameter_name>.
Note: the spotFile must already exist
- Parameters:
- spotFilestring
The path to the spotFile
- fileslist of str
The path to the raw-data
- dill_fileoptional, str
the path to the dill file for storing the results
- Returns:
- (DartMSAssay object of the processing)
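The <function_name>__<parameter_name> naming convention of the pipeline parameters can be taken apart with a small helper. The function split_pipeline_parameter is hypothetical and only serves to show how a pipeline parameter maps to the function whose documentation explains it.

```python
def split_pipeline_parameter(name):
    """Split '<function_name>__<parameter_name>' at the last double underscore.

    Parameters without a double underscore (e.g. 'spotFile') belong to the
    pipeline itself and are returned with an empty function name.
    """
    function_name, _, parameter_name = name.rpartition("__")
    return function_name, parameter_name


for name in ["correct_mz_shift__referenceMZs",
             "build_data_matrix__aggregation_fun",
             "spotFile"]:
    print(split_pipeline_parameter(name))
```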