tidyms.dartms

Functionality to process DART-MS datasets

Objects

DartMSAssay: Stores raw and processed DART-MS data.

Usage

Predefined workflow:

prefab_DARTMS_dataProcessing_pipeline

Semi-automated parameter optimization:

compare_parameters_for_function

Spot detection:

create_assay_from_chronogramFiles

Data import:

create_assay_from_chronogramFiles

Data manipulation:

select_top_n_spectra
correct_MZ_shift_across_samples
calculate_consensus_spectra_for_samples
bracket_consensus_spectrum_samples
build_data_matrix
batch_correction
blank_subtraction

Annotation:

annotate_features
annotate_with_compounds

Import/Export:

save_self_to_dill_file
read_from_dill_file
export_data_matrix
export_for_R
write_bracketing_results_to_featureML
generate_feature_raw_plot

Statistics:

restrict_to_high_quality_features__found_in_replicates
restrict_to_high_quality_features__minimum_intensity_filter
print_results_overview
plot_RSDs_per_group
calc_volcano_plots
calc_2D_Embedding
generate_feature_abundance_plot

class DartMSAssay(name='Generic')

An object inspired by tidyms’ Assay class that encapsulates a DART-MS experiment.

Constructor for a new DartMSAssay object.

Parameters:

name (str, optional): name of the experiment. Defaults to “Generic”.

add_data_processing_step(step_identifier_text, log_text, processing_data=None)

Adds a data processing step to the log of the DartMSAssay object.

Parameters:

step_identifier_text (str): name of the data processing step
log_text (str): description of the data processing step
processing_data (dict, optional): further information (e.g., parameters) about the data processing step. Defaults to None.

annotate_features(useGroups=None, max_deviation_ppm=100, search_ions=None, remove_other_ions=True, plot=False)

Annotates the bracketed features with sister ions (adducts, isotopologs, etc.) relative to their parent ions (mostly [M+H]+ or [M-H]-).

Parameters:

useGroups (list of str, optional): groups to be used for the annotation (important for testing the ratio). Defaults to None.
max_deviation_ppm (int, optional): the maximum allowed deviation between the calculated and the observed mz value of a sister ion. Defaults to 100.
search_ions (dict, optional): the ions to search for; keys are ion names, values are mz increments (decrements are not allowed). Defaults to None.
remove_other_ions (bool, optional): indicates whether annotated sister ions should be removed. Defaults to True.
plot (bool, optional): indicates whether the annotation results should be plotted. Defaults to False.
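To illustrate how a sister-ion search of this kind can work, the sketch below checks, for every feature, whether another feature lies within max_deviation_ppm of its mz plus each increment in search_ions. It is a minimal sketch under assumptions: find_sister_ions is a hypothetical helper, not part of tidyms, and it skips the group-based ratio test that useGroups enables.

```python
import numpy as np


def find_sister_ions(mzs, search_ions, max_deviation_ppm=100):
    """Return (parent_index, sister_index, ion_name) tuples.

    Hypothetical helper: feature j is reported as sister ion `ion_name` of
    feature i if its mz lies within max_deviation_ppm of mz_i + increment.
    """
    mzs = np.asarray(mzs, dtype=float)
    matches = []
    for i, parent_mz in enumerate(mzs):
        for ion_name, increment in search_ions.items():
            expected_mz = parent_mz + increment  # only increments, no decrements
            deviation_ppm = np.abs(mzs - expected_mz) / expected_mz * 1e6
            for j in np.flatnonzero(deviation_ppm <= max_deviation_ppm):
                if j != i:
                    matches.append((i, int(j), ion_name))
    return matches


features = [166.0863, 167.0896, 200.0]
print(find_sister_ions(features, {"[M+1]": 1.003355}, max_deviation_ppm=10))
# [(0, 1, '[M+1]')]
```

Here 1.003355 approximates the 13C/12C mass difference, so the second feature is reported as the M+1 isotopolog of the first.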

annotate_with_compounds(tsv_file, max_ppm_dev=15.0, adducts=None, delimiter='\t', quote_character='', comment_character='#')

Annotates the detected features with compounds from a database.

Parameters:

tsv_file (str): path to a tab-separated file containing the database.
max_ppm_dev (float, optional): maximum allowed mz difference in ppm relative to the theoretical value. Defaults to 15.0.
adducts (dict, optional): key: str, value: tuple of (charge number, mz increment). Defaults to None.
delimiter (str, optional): delimiter character of the database. Defaults to a tab character (“\t”).
quote_character (str, optional): quote character of the database. Defaults to “”.
comment_character (str, optional): comment character of the database (not allowed in the first row/header). Defaults to “#”.

batch_correction(by_group, plot=True)

Corrects the abundances of all detected features across different batches. The algorithm is as follows: an overall mean-QC-overall-value is derived from all QC samples (regardless of the batch) using all features detected in these QC samples. For each batch, a mean-QC-sample-value is derived from the QC samples of that batch, again using all features detected in them. All samples in the batch are then corrected by this mean-QC-sample-value, that is, all their feature abundances are divided by it. Finally, the corrected abundance values are multiplied by the mean-QC-overall-value so that they remain on a scale similar to the abundances before the correction.

Parameters:

by_group (str): the name of the group to be used for the batch correction
plot (bool, optional): show the correction results as plots. Defaults to True.
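The correction described above can be sketched with NumPy on a plain samples x features matrix. This is a minimal sketch, not the tidyms implementation: batch_correct is a hypothetical name, missing values are assumed to be encoded as NaN, and every batch is assumed to contain at least one QC sample.

```python
import numpy as np


def batch_correct(data, batches, is_qc):
    """Sketch of the QC-based batch correction described above.

    data: samples x features array, np.nan marks features not detected.
    batches: batch id per sample; is_qc: True for QC samples.
    Assumes every batch contains at least one QC sample.
    """
    data = np.asarray(data, dtype=float)
    batches = np.asarray(batches)
    is_qc = np.asarray(is_qc, dtype=bool)
    # mean-QC-overall-value: all QC samples, all detected features
    overall_mean = np.nanmean(data[is_qc])
    corrected = data.copy()
    for batch in np.unique(batches):
        in_batch = batches == batch
        # mean-QC-sample-value of this batch
        batch_mean = np.nanmean(data[in_batch & is_qc])
        corrected[in_batch] = data[in_batch] / batch_mean * overall_mean
    return corrected


data = [[10, 20], [12, 18], [20, 40], [22, 38]]
print(batch_correct(data, batches=[1, 1, 2, 2], is_qc=[True, False, True, False]))
```

After the correction, the QC samples of both batches (rows 0 and 2) have identical abundances, which is the intended effect.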

blank_subtraction(blankGroup, toTestGroups, foldCutoff=2, pvalueCutoff=0.05, minDetected=2, plot=False)

Removes background features from the data matrix. Repeated calls with different blank groups are possible. Inspired by the background-subtraction module of MZmine3.

Parameters:

blankGroup (str): the name of the blank group
toTestGroups (list of str): the names of the groups to test against the blank group
foldCutoff (int, optional): the minimum fold-change between at least one test group and the blank group required for a feature not to be considered background. Defaults to 2.
pvalueCutoff (float, optional): the alpha threshold for the t-test. Defaults to 0.05.
minDetected (int, optional): the minimum number of background samples in which a feature must be detected to be considered a background feature. Defaults to 2.
plot (bool, optional): indicates whether the subtraction shall be plotted as a volcano plot. Defaults to False.

Raises:

NotImplementedError: should never be raised; it is raised only if the algorithm’s implementation is incorrect
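The fold-change and detection-count criteria above can be sketched as follows. This is a simplified sketch under stated assumptions: flag_background is a hypothetical helper, the t-test controlled by pvalueCutoff is deliberately omitted, and non-detected signals (NaN) are treated as zero when group means are computed.

```python
import numpy as np


def flag_background(blank, test_groups, fold_cutoff=2.0, min_detected=2):
    """Simplified background flagging (t-test step omitted).

    blank: blank samples x features; test_groups: list of samples x features
    arrays. np.nan marks non-detected signals and is treated as 0 for means.
    Returns a boolean mask where True marks a background feature.
    """
    blank = np.asarray(blank, dtype=float)
    detected_in_blank = np.sum(~np.isnan(blank), axis=0)
    blank_mean = np.nan_to_num(blank, nan=0.0).mean(axis=0)
    max_fold = np.zeros(blank.shape[1])
    for group in test_groups:
        group_mean = np.nan_to_num(np.asarray(group, dtype=float), nan=0.0).mean(axis=0)
        with np.errstate(divide="ignore", invalid="ignore"):
            fold = np.where(blank_mean > 0, group_mean / blank_mean, np.inf)
        max_fold = np.maximum(max_fold, fold)
    # background: seen often enough in the blanks and no test group exceeds the cutoff
    return (detected_in_blank >= min_detected) & (max_fold < fold_cutoff)


blank = [[100, np.nan, 50], [110, np.nan, 60]]
samples = [[120, 500, 1000], [130, 400, 900]]
print(flag_background(blank, [samples]))
```

Only the first feature is flagged: it is detected in both blanks and the test group is less than 2-fold higher.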

bracket_consensus_spectrum_samples(closest_signal_max_deviation_ppm=20, max_ppm_deviation=25, show_diagnostic_plots=False)

Brackets the consensus spectra across different samples.

Parameters:

max_ppm_deviation (float, optional): the maximum allowed deviation (in ppm) of a consensus group. Defaults to 25.
show_diagnostic_plots (bool, optional): indicates whether a diagnostic plot shall be shown. Defaults to False.

build_data_matrix(on='originalData', originalData_mz_deviation_multiplier_PPM=0, aggregation_fun='average')

Generates a data matrix from the corrected consensus spectra and the bracketed features.

Parameters:

on (str, optional): the data from which abundance values are derived. With ‘processedData’ the consensus spectra are used, while with ‘originalData’ the raw data are used. Defaults to “originalData”.
originalData_mz_deviation_multiplier_PPM (int, optional): an optional mz deviation allowed for the raw-data integration. Defaults to 0.
aggregation_fun (str, optional): the method used to calculate the derived abundance when integrating raw data. Defaults to “average”.

Raises:

ValueError: raised if the parameters on and aggregation_fun have invalid values

calc_2D_Embedding(keep_features=None, remove_features=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None, imputation='zero', scaling='standard', embedding='pca')

Calculates a two-dimensional embedding of the data matrix (or a subset of it) and illustrates it as a scores/component plot.

Parameters:

keep_features (list of indices, optional): features to be included in the embedding. Defaults to None.
remove_features (list of indices, optional): features not to be included in the embedding. Defaults to None.
keep_samples (list of str, optional): samples to be included in the embedding. Defaults to None.
remove_samples (list of str, optional): samples not to be included in the embedding. Defaults to None.
keep_groups (list of str, optional): groups to be included in the embedding. Defaults to None.
remove_groups (list of str, optional): groups not to be included in the embedding. Defaults to None.
keep_batches (list of int, optional): batches to be included in the embedding. Defaults to None.
remove_batches (list of int, optional): batches not to be included in the embedding. Defaults to None.
imputation (str, optional): imputation method for features with missing values (i.e., features for which no signals were detected). Defaults to “zero”; allowed values are “zero” and “omitNA”.
scaling (str, optional): the scaling applied before the embedding is calculated. Defaults to “standard”; allowed values are None, “”, and “standard”.
embedding (str, optional): the embedding type. Defaults to “pca”; allowed values are “pca”, “lda”, and “umap”.

Returns:

plotnine plot: the plot

Raises:

ValueError: raised if invalid parameters are provided
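The combination imputation="zero", scaling="standard", embedding="pca" can be approximated in a few lines of NumPy: replace missing values by zero, z-score each feature, and take the first two principal-component scores from a singular value decomposition. This is a minimal sketch (pca_scores is a hypothetical name); it does not cover the “lda” or “umap” options or the plotting step.

```python
import numpy as np


def pca_scores(data, n_components=2):
    """PCA scores of a samples x features matrix via SVD."""
    X = np.nan_to_num(np.asarray(data, dtype=float), nan=0.0)  # imputation="zero"
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    X = (X - mean) / std  # scaling="standard"
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # scores = U * singular values


rng = np.random.default_rng(0)
scores = pca_scores(rng.normal(size=(6, 4)))
print(scores.shape)  # (6, 2)
```

The two columns of the result are the coordinates a scores plot would show, with the first column explaining the most variance.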

calc_volcano_plots(comparisons, alpha_critical=0.05, minimum_fold_change=2, keep_features=None, remove_features=None, highlight_features=None, min_ratio_samples_for_found=0.75, min_ratio_samples_for_not_found=0.25, sig_color='firebrick', not_different_color='cadetblue')

Generates a volcano plot from the results.

Parameters:

comparisons (list of (str, str) tuples): the names of the two groups to be compared in each comparison
alpha_critical (float, optional): the critical alpha value for a significant difference. Defaults to 0.05.
minimum_fold_change (int, optional): the minimum required fold-change for a significant difference. Defaults to 2.
keep_features (list of indices, optional): features to be used for the univariate comparison. Defaults to None.
remove_features (list of indices, optional): features not to be used for the univariate comparison. Defaults to None.
highlight_features (list of feature indices, optional): features to highlight; the indices must refer to self.dat and self.features.
sig_color (str, optional): the name of the color used for plotting significantly different features. Defaults to “firebrick”.
not_different_color (str, optional): the name of the color used for plotting features that are not significantly different. Defaults to “cadetblue”.

Returns:

pandas DataFrame: the data matrix for the volcano plot

calculate_consensus_spectra_for_samples(min_difference_ppm=30, closest_signal_max_deviation_ppm=20, max_mz_deviation_ppm=20, min_signals_per_cluster=10, minimum_intensity_for_signals=0, cluster_quality_check_functions=None, aggregation_function='average', exportAsFeatureML=True, featureMLlocation='.')

Collapses several spectra into a single consensus spectrum per spot.

Parameters:

min_difference_ppm (float, optional): minimum difference in ppm required to separate signals into different clusters. Defaults to 30.
min_signals_per_cluster (int, optional): minimum number of signals an mz cluster must contain to be used in the collapsed spectrum. Defaults to 10.
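One way to read min_difference_ppm and min_signals_per_cluster is as a gap-based clustering along the mz axis. The sketch below follows that reading; the clustering strategy, the intensity-weighted mean mz, and the name consensus_spectrum are assumptions for illustration and may differ from how tidyms actually builds its consensus spectra.

```python
import numpy as np


def consensus_spectrum(mzs, intensities, min_difference_ppm=30, min_signals_per_cluster=10):
    """Collapse the signals of several spectra into (mz, intensity) pairs.

    Assumed strategy: sort all signals by mz, start a new cluster wherever two
    neighbouring signals are more than min_difference_ppm apart, drop clusters
    with too few signals, and report the intensity-weighted mean mz and the
    average intensity of each remaining cluster.
    """
    order = np.argsort(mzs)
    mz = np.asarray(mzs, dtype=float)[order]
    intensity = np.asarray(intensities, dtype=float)[order]
    gaps_ppm = np.diff(mz) / mz[:-1] * 1e6
    cluster_ids = np.concatenate([[0], np.cumsum(gaps_ppm > min_difference_ppm)])
    consensus = []
    for cluster_id in np.unique(cluster_ids):
        members = cluster_ids == cluster_id
        if members.sum() < min_signals_per_cluster:
            continue  # too few signals for a reliable consensus peak
        mean_mz = np.average(mz[members], weights=intensity[members])
        consensus.append((float(mean_mz), float(intensity[members].mean())))
    return consensus


mzs = [100.0, 100.001, 100.0005, 200.0, 200.002]
print(consensus_spectrum(mzs, [1, 1, 1, 1, 1], min_signals_per_cluster=2))
```

With min_signals_per_cluster=3, the two-signal cluster near mz 200 would be dropped.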

clone_DartMSAssay()

Clones the DartMSAssay object (deepcopy).

Returns:

DartMSAssay: the cloned, new DartMSAssay object

correct_MZ_shift_across_samples(referenceMZs=[166.086254594], max_mz_deviation_absolute=0.1, correctby='mzDeviationPPM', max_deviationPPM_to_use_for_correction=80, selection_criteria='mostAbundant', correct_on_level='file', plot=False)

Corrects systematic shifts of mz values in individual spot samples. Currently, only a constant mz offset relative to several reference features can be corrected. The correction is carried out by calculating the median error relative to the reference features and then applying either the absolute or the ppm deviation to all mz values in the spot sample. A signal for a feature on the referenceMZs list is considered found if it is within max_mz_deviation_absolute of the reference value. If several signals are present in the search window, the one with the closest mz value is used.

Parameters:

assay (Assay): the assay object of the experiment
referenceMZs (list of float, optional): the mz values of the reference features for which the offsets shall be calculated. Defaults to [165.078978594 + 1.007276], i.e., [166.086254594].
max_mz_deviation_absolute (float, optional): maximum deviation used for the search. Defaults to 0.1.
correctby (str, optional): either “mzDeviation” to correct by a constant mz offset or “mzDeviationPPM” to correct by a constant ppm offset. Defaults to “mzDeviationPPM”.
plot (bool, optional): indicates whether a plot shall be generated and returned. Defaults to False.

Returns:

pandas.DataFrame, plot: overview of the correction and the plot (if it was generated)
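For the correctby="mzDeviationPPM" case, the idea above reduces to: find the closest signal to each reference within the absolute search window, take the median ppm error, and undo it. This is a minimal sketch for a single spot spectrum; ppm_shift_correction is a hypothetical name, and options such as max_deviationPPM_to_use_for_correction, selection_criteria, and correct_on_level are not modeled.

```python
import numpy as np


def ppm_shift_correction(mzs, reference_mzs=(166.086254594,), max_mz_deviation_absolute=0.1):
    """Correct a spot spectrum by the median ppm error of its reference signals.

    Returns the corrected mz values and the median ppm error that was removed.
    """
    mzs = np.asarray(mzs, dtype=float)
    errors_ppm = []
    for ref in reference_mzs:
        distance = np.abs(mzs - ref)
        if distance.size and distance.min() <= max_mz_deviation_absolute:
            closest = mzs[np.argmin(distance)]  # closest signal in the search window
            errors_ppm.append((closest - ref) / ref * 1e6)
    if not errors_ppm:
        return mzs, 0.0  # no reference found: leave the spectrum unchanged
    median_ppm = float(np.median(errors_ppm))
    # observed = true * (1 + ppm * 1e-6), so divide to undo a constant ppm shift
    return mzs / (1 + median_ppm * 1e-6), median_ppm


ref = 166.086254594
corrected, shift = ppm_shift_correction([100.0, ref * (1 + 10e-6), 300.0])
print(round(shift, 6))  # 10.0
```

A ppm correction scales every mz value proportionally, so signals at higher mz are moved by a larger absolute amount than with correctby="mzDeviation".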

static create_assay_from_chronogramFiles(assay_name, filenames, spot_file, ms_mode, instrument, centroid_profileMode=True, fileNameChangeFunction=None, use_signal_function=None, rewriteRTinFiles=False, rewriteDeleteUnusedScans=True, intensity_threshold_spot_extraction=0, import_filters=None)

Generates a new assay from a series of chronograms and a spot_file.

Parameters:

filenames (list of str): file paths of the chronograms
spot_file (str): file path of the spot file
centroid_profileMode (bool): indicates whether profile-mode data shall be centroided automatically

Returns:

Assay: the newly generated assay object with either automatic or user-guided spots from the spot_file

drop_lower_spectra(drop_rate)

Restricts chronogram spots to only certain spectra in the dataset (e.g., to use only the ‘core’ of the spot). Of all spectra assigned to a spot, only the drop_rate fraction with the highest abundance will be used. Use with caution, as a variable number of scans per spot might affect abundances, especially when aggregation_method = “sum” is used.

Parameters:

drop_rate (float): the ratio of the highest-abundant spectra to be used.
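Selecting the ‘core’ of a spot amounts to ranking its spectra by abundance and keeping the top fraction. The sketch below assumes that abundance is measured as the total ion current (TIC) of each spectrum and that the kept spectra are returned in their original scan order; select_core_spectra is a hypothetical helper. Replacing the ratio by a fixed count gives the behavior described for select_top_n_spectra.

```python
import numpy as np


def select_core_spectra(spectrum_tics, keep_ratio):
    """Indices of the highest-TIC spectra of a spot, in original scan order."""
    tics = np.asarray(spectrum_tics, dtype=float)
    n_keep = max(1, int(round(len(tics) * keep_ratio)))  # keep at least one spectrum
    top = np.argsort(tics)[::-1][:n_keep]  # highest TIC first
    return np.sort(top)


print(select_core_spectra([5, 50, 80, 60, 10], keep_ratio=0.6))  # [1 2 3]
```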

export_data_matrix(to_file, separator='\t', quotechar='"')

Exports the data matrix to a tsv file.

Parameters:

to_file (str): the file to save the results to

export_for_R(to_file)

Exports the results to a tsv file and generates R code to import it.

Parameters:

to_file (str): the location of the file (without extension; ‘.tsv’ is added automatically)

static generate_database_template(to_tsv_file)

Generates a template for the database search.

Parameters:

to_tsv_file (str): file to which the template shall be written

generate_feature_abundance_plot(feature_index, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None)

Plots the abundances of a single feature.

Parameters:

feature_index (index): the index of the feature to be plotted

Returns:

pandas DataFrame: the data matrix for the plot

generate_feature_raw_plot(refMZ, reverseCorrectMZ=True, ppmDev=20, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None)

Generates a raw-data plot for a detected feature.

Parameters:

refMZ (float): the mz value of the feature to plot, either after mz correction (reverseCorrectMZ = True) or in the raw data (reverseCorrectMZ = False)
reverseCorrectMZ (bool, optional): indicates whether the refMZ value is given after (True) or before (False) the mz correction
ppmDev (float, optional): the allowed mz deviation for the feature

Raises:

ValueError: raised if parameters on and aggregation_fun have invalid values

get_data_matrix_and_meta(keep_features=None, remove_features=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None, copy=False)

Returns the data matrix as well as the feature and sample information available in the DartMSAssay object. Features and samples/groups/batches can be included or excluded before the export.

Parameters:

keep_features (list of indices, optional): features to be included in the export. To export all, the parameter must be set to None. Defaults to None.
remove_features (list of indices, optional): features to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
keep_samples (list of str, optional): samples to be included in the export. To export all, the parameter must be set to None. Defaults to None.
remove_samples (list of str, optional): samples to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
keep_groups (list of str, optional): groups to be included in the export. To export all, the parameter must be set to None. Defaults to None.
remove_groups (list of str, optional): groups to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
keep_batches (list of str, optional): batches to be included in the export. To export all, the parameter must be set to None. Defaults to None.
remove_batches (list of int, optional): batches to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.
copy (bool, optional): indicates whether the object should be cloned before the export. This is done automatically if any inclusion or exclusion is provided. Defaults to False.

Returns:

(numpy data matrix [samples x features], list of feature properties, list of feature annotations, list of sample names, list of assigned group names, list of assigned batches): data matrix and meta-data

Raises:

RuntimeError: raised when the necessary data is not available

get_summary_of_results(reference_features=None, reference_features_allowed_deviationPPM=20.0)

Shows a summary of the results.

Parameters:

reference_features (list of float, optional): reference mz values to be used. Defaults to None.
reference_features_allowed_deviationPPM (float, optional): allowed mz deviation in ppm. Defaults to 20.0.

Returns:

pandas DataFrame: summary table

normalize_samples_by_TIC(multiplication_factor=1)

Normalizes the abundances of spot spectra by the total intensity of the spectra.

Parameters:

multiplication_factor (int, optional): a factor that is applied on top of the normalization (i.e., shifts the maximum abundance to this value). Defaults to 1.
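Total-intensity normalization divides every signal of a spectrum by the sum of all its intensities. The sketch below is one plausible reading in which the normalized intensities sum to multiplication_factor; normalize_by_tic is a hypothetical helper, and the exact scaling applied by tidyms may differ.

```python
import numpy as np


def normalize_by_tic(intensities, multiplication_factor=1):
    """Divide a spectrum by its total intensity and rescale it."""
    intensities = np.asarray(intensities, dtype=float)
    total = intensities.sum()
    if total == 0:
        return intensities.copy()  # empty spectrum: nothing to normalize
    return intensities / total * multiplication_factor


print(normalize_by_tic([1, 3], multiplication_factor=100))  # [25. 75.]
```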

normalize_to_internal_standard(std, multiplication_factor=1, plot=False)

Normalizes the abundances of spot spectra by the abundance of a selected internal standard.

Parameters:

std (float): the internal standard to normalize to
multiplication_factor (int, optional): a factor applied on top of the normalization. Defaults to 1.
plot (bool, optional): indicates whether the normalization shall be plotted. Defaults to False.
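A minimal sketch of internal-standard normalization for a single spectrum follows. It assumes that std is the mz value of the internal standard and that its signal is the closest one within a tolerance; both the helper name normalize_to_std and its max_ppm parameter are hypothetical and not part of the documented signature.

```python
import numpy as np


def normalize_to_std(mz, intensity, std_mz, multiplication_factor=1, max_ppm=20):
    """Divide all intensities by the intensity of the internal standard signal.

    max_ppm is a hypothetical search tolerance for locating the standard.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    deviation_ppm = np.abs(mz - std_mz) / std_mz * 1e6
    if not np.any(deviation_ppm <= max_ppm):
        raise ValueError("internal standard not found in spectrum")
    std_intensity = intensity[np.argmin(deviation_ppm)]  # closest signal to the standard
    return intensity / std_intensity * multiplication_factor


print(normalize_to_std([100.0, 250.001, 300.0], [50, 200, 400], std_mz=250.0))
```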

plot_RSDs_per_group(uGroups=None, include=None, plotType='points', scales='free_y')

Plots the RSD distribution per group.

Parameters:

uGroups (list of str, optional): the groups to be included in the overview. Defaults to None.
include (list of str, optional): missing-value replacement strategies to be used. Defaults to None.
plotType (str, optional): type of plot (either points or histogram). Defaults to “points”.
scales (str, optional): plotnine parameter controlling the y-scale of faceted plots. Defaults to “free_y”.

Raises:

ValueError: raised if an unknown plotType is provided

plot_feature_mz_deviations(featureInd, refMZ=None, types=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None)

Plots the mz deviation of a particular feature.

Parameters:

featureInd (index): the index of the feature to plot
types (list of str, optional): plots to include. Defaults to None.

plot_heatmap(keep_features=None, remove_features=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None, linkage_method='ward', distance_metric='euclidean')

Calculates and plots a heatmap.

Parameters:

keep_features (list of indices, optional): features to be included in the heatmap. Defaults to None.
remove_features (list of indices, optional): features not to be included in the heatmap. Defaults to None.
keep_samples (list of str, optional): samples to be included in the heatmap. Defaults to None.
remove_samples (list of str, optional): samples not to be included in the heatmap. Defaults to None.
keep_groups (list of str, optional): groups to be included in the heatmap. Defaults to None.
remove_groups (list of str, optional): groups not to be included in the heatmap. Defaults to None.
keep_batches (list of int, optional): batches to be included in the heatmap. Defaults to None.
remove_batches (list of int, optional): batches not to be included in the heatmap. Defaults to None.
linkage_method (str, optional): linkage method used for the feature clustering. Defaults to ‘ward’; options are those of scipy.cluster.hierarchy.linkage.
distance_metric (str, optional): distance metric used for the feature clustering. Defaults to ‘euclidean’; options are those of scipy.cluster.hierarchy.linkage.

Returns:

plot: the heatmap

Raises:

ValueError: raised if invalid parameters are provided

plot_mz_deviation_overview(show=None, random_fraction=1)

Plots an overview of the mz deviation in the bracketed results.

Parameters:

show (optional): indicates which plots should be shown. Defaults to None.
random_fraction (int, optional): use only a random fraction of the features. Defaults to 1.

Raises:

ValueError: unknown option(s) specified

plot_sample_TICs(separate=True, separate_by='group')

Plots the detected spots of the chronograms.

Parameters:

separate (bool, optional): indicates whether a faceted plot shall be used. Defaults to True.
separate_by (str, optional): the variable used for grouping the results (file, group, or batch). Defaults to “group”.

plot_sample_abundances()

Plots an overview of the feature abundances per sample and group.

Returns:

plotnine plot: the generated plot

print_results_overview()

Prints an overview of the bracketed results to the console.

print_sample_overview()

Prints an overview of the samples.

static read_from_dill_file(dill_file)

Loads a DartMSAssay from a file.

Parameters:

dill_file (str): the *.dill file to load the DartMSAssay from

Returns:

DartMSAssay: the loaded DartMSAssay object

restrict_to_high_quality_features__found_in_replicates(test_groups, minimum_ratio_found, found_in_type='anyGroup')

Selects high-quality features (after bracketing). This function keeps only features that are found in at least the fraction minimum_ratio_found of the replicates.

Parameters:

test_groups (list of str): the groups to be tested
minimum_ratio_found (float): the minimum ratio of all samples in a group in which the feature must have been detected
found_in_type (str, optional): indicates whether the feature must be present in at least this ratio of samples in all groups or in any group. Defaults to “anyGroup”.
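The replicate filter can be expressed as a per-group detection ratio compared against minimum_ratio_found. In this sketch, missing detections are assumed to be NaN, found_in_replicates is a hypothetical helper, and the string "allGroups" is a hypothetical label for the “all groups” mode (only “anyGroup” appears in the signature above).

```python
import numpy as np


def found_in_replicates(data, groups, test_groups, minimum_ratio_found, found_in_type="anyGroup"):
    """Boolean mask of features detected in enough replicates (NaN = not detected)."""
    data = np.asarray(data, dtype=float)
    groups = np.asarray(groups)
    # fraction of samples per tested group in which each feature was detected
    detected_ratio = np.array([np.mean(~np.isnan(data[groups == g]), axis=0) for g in test_groups])
    passes = detected_ratio >= minimum_ratio_found
    if found_in_type == "anyGroup":
        return passes.any(axis=0)
    if found_in_type == "allGroups":  # hypothetical value for the "all groups" mode
        return passes.all(axis=0)
    raise ValueError(f"unknown found_in_type: {found_in_type}")


data = [[1, np.nan], [1, np.nan], [np.nan, 1], [1, 1]]
groups = ["A", "A", "B", "B"]
print(found_in_replicates(data, groups, ["A", "B"], 0.75))  # [ True  True]
```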

restrict_to_high_quality_features__low_RSD_in_groups(test_groups, maximum_RSD)

Selects high-quality features (after bracketing). This function keeps only features with a low within-group variability.

Parameters:

test_groups (list of str): the groups to be tested
maximum_RSD (float): the maximum allowed RSD for a feature to be used; the RSD must be lower in all groups
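The relative standard deviation (RSD) is the standard deviation divided by the mean. The sketch below keeps a feature only if its RSD is below maximum_RSD in every tested group, as stated above. Expressing the RSD in percent and using the sample standard deviation (ddof=1) are assumptions, and low_rsd_features is a hypothetical helper.

```python
import numpy as np


def low_rsd_features(data, groups, test_groups, maximum_RSD):
    """Boolean mask of features whose RSD (in percent, assumed) is below
    maximum_RSD in every tested group. data: samples x features."""
    data = np.asarray(data, dtype=float)
    groups = np.asarray(groups)
    keep = np.ones(data.shape[1], dtype=bool)
    for group in test_groups:
        values = data[groups == group]
        rsd_percent = 100 * np.nanstd(values, axis=0, ddof=1) / np.nanmean(values, axis=0)
        keep &= rsd_percent < maximum_RSD  # must be lower in all groups; NaN RSD fails
    return keep


print(low_rsd_features([[10, 10], [10, 20]], ["A", "A"], ["A"], maximum_RSD=20))
```

The second feature (values 10 and 20) has an RSD of about 47 % and is removed.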

restrict_to_high_quality_features__minimum_intensity_filter(test_groups, minimum_intensity)

Selects high-quality features (after bracketing). This function keeps only features that reach a minimum intensity in at least one sample of a group.

Parameters:

test_groups (list of str): the groups to test
minimum_intensity (float): the minimum required signal intensity

restrict_to_high_quality_features__most_n_abundant(n_features)

Selects high-quality features (after bracketing). This function keeps the n most abundant features.

Parameters:

n_features (int): the number of features to select

save_self_to_dill_file(dill_file)

Saves the DartMSAssay object to a file.

Parameters:

dill_file (str): the *.dill file to save the DartMSAssay to

select_top_n_spectra(n)

Restricts chronogram spots to only certain spectra in the dataset (e.g., to use only the ‘core’ of the spot). Of all spectra assigned to a spot, only the n most abundant will be used.

Parameters:

n (int): the number of highest-abundant spectra to be used.

set_data(dat, features, featureAnnotations, samples, groups, batches)

Sets the data for a DartMSAssay object.

Parameters:

dat (numpy.ndarray of shape [n, m]): the feature table of the experiment
features (list of mz values): the features’ information (mz values)
featureAnnotations (list of dict): the features’ derived information (sister ions, etc.)
samples (list of str): the names of the samples in the experiment
groups (list of str): the group names the samples in the experiment are assigned to
batches (list of int): the batch ids the samples in the experiment are assigned to

static show_sample_overview(filenames, ms_mode, instrument, separation_intensity=1000.0)

Generates an overview of the data to be imported in subsequent steps.

Parameters:

filenames (list of str): file paths of the chronograms

subset_features(keep_features_with_indices=None, remove_features_with_indices=None)

Subsets the detected features by including or excluding them.

Parameters:

keep_features_with_indices (list of indices, optional): features to be kept. To keep all, the parameter must be set to None. Defaults to None.
remove_features_with_indices (list of indices, optional): features to be removed. To remove none, the parameter must be set to None. Defaults to None.

Raises:

RuntimeError: raised if no features have been detected

subset_samples(keep_samples=None, keep_groups=None, keep_batches=None, remove_samples=None, remove_groups=None, remove_batches=None)

Subsets certain samples, groups, or batches in the DartMSAssay object.

Parameters:

keep_samples (list of str, optional): samples to be kept. To keep all, the parameter must be set to None. Defaults to None.
remove_samples (list of str, optional): samples to be removed. To remove none, the parameter must be set to None. Defaults to None.
keep_groups (list of str, optional): groups to be kept. To keep all, the parameter must be set to None. Defaults to None.
remove_groups (list of str, optional): groups to be removed. To remove none, the parameter must be set to None. Defaults to None.
keep_batches (list of str, optional): batches to be kept. To keep all, the parameter must be set to None. Defaults to None.
remove_batches (list of int, optional): batches to be removed. To remove none, the parameter must be set to None. Defaults to None.

Raises:

RuntimeError: raised if no features have been detected

write_bracketing_results_to_featureML(featureMLlocation='./results.featureML', featureMLStartRT=0, featureMLEndRT=1400)

Exports the bracketed results to a featureML file for easy visualization in TOPPView.

Parameters:

featureMLlocation (str, optional): path to the featureML file. Defaults to “./results.featureML”.
featureMLStartRT (int, optional): the earliest chronogram time. Defaults to 0.
featureMLEndRT (int, optional): the latest chronogram time. Defaults to 1400.

write_consensus_spectrum_to_featureML_file_per_sample(widthRT=40)

Exports the consensus results to featureML files for easy visualization in TOPPView. A separate featureML file is generated for each sample; its path is the path of the sample’s mzML file with the extension replaced by ‘.featureML’.

Parameters:

widthRT (int, optional): the width of the chronogram spots. Defaults to 40.

cluster_quality_check_function__peak_form(sample, msDataObj, spectrumIDs, time, mz, intensity, cluster, min_correlation_for_cutoff=0.5)

Checks the detected feature clusters for certain attributes. This particular function checks whether the distribution of a cluster’s signals resembles a spot (approximated by a normal distribution). Any cluster not resembling such a form is removed in a subsequent step (by setting its cluster ids to -1).

Parameters:

sample (str): the name of the sample
msDataObj (MSData object of tidyms): the MSData object in which this feature was detected
spectrumIDs (list of ids): the spectrum id of each found signal in a cluster
time (list of numeric): the chronogram time of each found signal in a cluster
mz (list of numeric): the mz value of each found signal in a cluster
intensity (list of numeric): the intensity value of each found signal in a cluster
cluster (list of int): the cluster ids each signal was assigned to
min_correlation_for_cutoff (float, optional): the minimum Pearson correlation, in the range [-1, 1], for the spot-shape comparison. Defaults to 0.5.

Returns:

list of int: the new cluster ids each signal is assigned to; clusters to be removed are set to -1
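A spot-shape check of this kind can be sketched by fitting a Gaussian to each cluster (intensity-weighted mean and standard deviation of the chronogram times) and correlating the observed intensities with it. The moment-based fit and the reduced signature (time, intensity, cluster only) are assumptions; peak_form_check is a hypothetical helper, not the tidyms function.

```python
import numpy as np


def peak_form_check(time, intensity, cluster, min_correlation_for_cutoff=0.5):
    """Set cluster ids to -1 where intensities over time do not look like a Gaussian spot."""
    time = np.asarray(time, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    cluster = np.asarray(cluster).copy()
    for cluster_id in np.unique(cluster):
        if cluster_id == -1:
            continue  # already removed
        members = cluster == cluster_id
        t, y = time[members], intensity[members]
        if len(t) < 3 or np.all(y == y[0]):
            cluster[members] = -1  # too few points or flat profile: no shape to compare
            continue
        mu = np.average(t, weights=y)
        sigma = np.sqrt(np.average((t - mu) ** 2, weights=y))
        if sigma == 0:
            cluster[members] = -1
            continue
        expected = np.exp(-0.5 * ((t - mu) / sigma) ** 2)  # moment-matched Gaussian
        r = np.corrcoef(y, expected)[0, 1]  # Pearson correlation
        if not r >= min_correlation_for_cutoff:  # also catches r == nan
            cluster[members] = -1
    return cluster.tolist()


t = np.arange(11, dtype=float)
peak = 100 * np.exp(-0.5 * ((t - 5) / 2) ** 2)  # spot-like profile
valley = (t - 5) ** 2 + 1  # inverted, not spot-like
result = peak_form_check(np.concatenate([t, t]), np.concatenate([peak, valley]), [0] * 11 + [1] * 11)
print(result)
```

The spot-like cluster 0 keeps its id, while every signal of cluster 1 is set to -1.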

cluster_quality_check_function__ppmDeviationCheck(sample, msDataObj, spectrumIDs, time, mz, intensity, cluster, max_weighted_ppm_deviation=15)

Checks the detected feature clusters for certain attributes. This particular function checks whether the clusters are within a certain ppm deviation. Any cluster exceeding this deviation is removed in a subsequent step (by setting its cluster ids to -1).

Parameters:

sample (str): the name of the sample
msDataObj (MSData object of tidyms): the MSData object in which this feature was detected
spectrumIDs (list of ids): the spectrum id of each found signal in a cluster
time (list of numeric): the chronogram time of each found signal in a cluster
mz (list of numeric): the mz value of each found signal in a cluster
intensity (list of numeric): the intensity value of each found signal in a cluster
cluster (list of int): the cluster ids each signal was assigned to
max_weighted_ppm_deviation (float, optional): the maximum allowed ppm deviation for all signals within a cluster. Defaults to 15.

Returns:

list of int: the new cluster ids each signal is assigned to; clusters to be removed are set to -1
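A weighted ppm-deviation check can be sketched as follows: compute the intensity-weighted mean mz of each cluster, then the intensity-weighted mean of the absolute ppm deviations from it, and discard clusters above max_weighted_ppm_deviation. This exact weighting scheme and the reduced signature are assumptions; ppm_deviation_check is a hypothetical helper, and intensities are assumed to be positive.

```python
import numpy as np


def ppm_deviation_check(mz, intensity, cluster, max_weighted_ppm_deviation=15):
    """Set cluster ids to -1 for clusters with a too large weighted ppm spread."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    cluster = np.asarray(cluster).copy()
    for cluster_id in np.unique(cluster):
        if cluster_id == -1:
            continue  # already removed
        members = cluster == cluster_id
        center = np.average(mz[members], weights=intensity[members])
        deviation_ppm = np.abs(mz[members] - center) / center * 1e6
        if np.average(deviation_ppm, weights=intensity[members]) > max_weighted_ppm_deviation:
            cluster[members] = -1
    return cluster.tolist()


print(ppm_deviation_check([100.0, 100.0005, 200.0, 200.02], [1, 1, 1, 1], [0, 0, 1, 1]))
# [0, 0, -1, -1]
```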

cohen_d(d1, d2)

Calculates Cohen’s d as a measure of effect size.

Parameters:

d1 (list or numpy array of numerics): numeric values of group 1
d2 (list or numpy array of numerics): numeric values of group 2

Returns:

numeric: Cohen’s d for the two groups
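Cohen’s d is commonly defined as the difference of the group means divided by the pooled standard deviation. The sketch below uses that common pooled-variance definition; the tidyms function may use a different variant or sign convention.

```python
import numpy as np


def cohen_d(d1, d2):
    """Cohen's d with the pooled (ddof=1) standard deviation."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n1, n2 = len(d1), len(d2)
    pooled_var = ((n1 - 1) * np.var(d1, ddof=1) + (n2 - 1) * np.var(d2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(d1) - np.mean(d2)) / np.sqrt(pooled_var)


print(cohen_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

Here the means differ by 1 and the pooled standard deviation is 2, giving d = 0.5.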

import_filter_artifact_removal(msData, artifacts)

Filters artifacts before the mzML data is imported.

Parameters:

msData (MSData object of tidyms): the loaded mzML raw data to be filtered
artifacts (list of (mz_min, mz_max) tuples): a variable number of artifacts to be removed from the dataset; each entry must be a tuple of the minimum and maximum mz value describing the artifact

Returns:

MSData: the altered MSData object
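On a single spectrum held as plain arrays, removing artifact windows is a boolean mask over the mz axis. This sketch does not operate on a tidyms MSData object; remove_artifacts is a hypothetical helper, and treating the window bounds as inclusive is an assumption.

```python
import numpy as np


def remove_artifacts(mz, intensity, artifacts):
    """Drop all signals whose mz falls into any (mz_min, mz_max) window (inclusive)."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = np.ones(mz.shape, dtype=bool)
    for mz_min, mz_max in artifacts:
        keep &= ~((mz >= mz_min) & (mz <= mz_max))
    return mz[keep], intensity[keep]


mz, intensity = remove_artifacts([100, 150, 200, 250], [1, 2, 3, 4], [(149, 151), (240, 260)])
print(mz, intensity)  # [100. 200.] [1. 3.]
```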

import_filter_mz_range(msData, min_mz, max_mz)

Restricts the mz values to a certain range before the mzML data is imported.

Parameters:

msData (MSData object of tidyms): the loaded mzML raw data to be filtered
min_mz (numeric): the lower mz value to be used
max_mz (numeric): the upper mz value to be used

Returns:

MSData: the altered MSData object

import_filter_remove_signals_below_intensity(msData, minimum_signal_intensity)

Removes signals below a minimum intensity threshold.

Parameters:

msData (MSData object of tidyms): the loaded mzML raw data to be filtered
minimum_signal_intensity (numeric): the minimum intensity value for signals to be used

Returns:

MSData: the altered MSData object

prefab_DARTMS_dataProcessing_pipeline(spotFile, files, ms_mode='centroid', instrument='qtof', fileNameChangeFunction=None, create_assay_from_chronogramFiles__import_filters=None, select_top_n_spectra__top_n_spectra=None, correct_mz_shift__referenceMZs=None, correct_mz_shift__max_mz_deviation_absolute=0.1, correct_mz_shift__max_deviationPPM_to_use_for_correction=200, correct_mz_shift__correctby='mzDeviationPPM', correct_mz_shift__selection_criteria='mostabundant', correct_mz_shift__correct_on_level='file', calculate_consensus_spectra_for_samples__min_difference_ppm=100, calculate_consensus_spectra_for_samples__min_signals_per_cluster=5, calculate_consensus_spectra_for_samples__minimum_intensity_for_signals=250.0, calculate_consensus_spectra_for_samples__cluster_quality_check_functions=None, normalize_to_internal_standard__perform=False, normalize_to_internal_standard__internal_standard_mzs=None, normalize_to_internal_standard__multiplication_factor=1, bracket_consensus_spectrum_samples__max_ppm_deviation=25, annotate_features__useGroups=None, annotate_features_remove_other_ions=False, build_data_matrix__originalData_mz_deviation_multiplier_PPM=30, build_data_matrix__aggregation_fun='average', results_file=None, dill_file=None)

Entire DART-MS workflow.

Processes the samples (parameter files) and their spots (spotFile) using the following steps:

1. Top-N spectra selection (optional)
2. m/z shift correction
3. Consensus spectra generation
4. Internal standard normalization
5. Bracketing of features across samples
6. Generation of the data matrix
7. Annotation of common adducts and isotopologs

The results of the processing are returned as a new DartMSAssay object and can optionally also be saved to a dill file.

Each parameter name has the form <function_name>__<parameter_name>; for a detailed explanation of a parameter, see the documentation of the respective function function_name.

Note: the spotFile must already exist.

Parameters:

spotFile (str): the path to the spotFile
files (list of str): the paths to the raw data
dill_file (str, optional): the path to the dill file for storing the results

Returns:

DartMSAssay: the object holding the results of the processing