tidyms.dartms

Functionality to process DART-MS datasets

Objects

  • DartMSAssay: Stores raw and processed DART-MS data.

Usage

Predefined workflow:

  • prefab_DARTMS_dataProcessing_pipeline

Semi-automated parameter optimization:

  • compare_parameters_for_function

Spot detection:

  • create_assay_from_chronogramFiles

Data import:

  • create_assay_from_chronogramFiles

Data manipulation:

  • select_top_n_spectra

  • correct_MZ_shift_across_samples

  • calculate_consensus_spectra_for_samples

  • bracket_consensus_spectrum_samples

  • build_data_matrix

  • batch_correction

  • blank_subtraction

Annotation:

  • annotate_features

  • annotate_with_compounds

Import/Export:

  • save_self_to_dill_file

  • read_from_dill_file

  • export_data_matrix

  • export_for_R

  • write_bracketing_results_to_featureML

  • generate_feature_raw_plot

Statistics:

  • restrict_to_high_quality_features__found_in_replicates

  • restrict_to_high_quality_features__minimum_intensity_filter

  • print_results_overview

  • plot_RSDs_per_group

  • calc_volcano_plots

  • calc_2D_Embedding

  • generate_feature_abundance_plot

class DartMSAssay(name='Generic')

An object inspired by tidyms’ Assay class that encapsulates a DARTMS experiment.

Constructor for a new DartMSAssay object

Parameters:
name : str, optional

name of the experiment. Defaults to “Generic”.

add_data_processing_step(step_identifier_text, log_text, processing_data=None)

Adds a data processing step to the log of the DartMSAssay object

Parameters:
step_identifier_text : string

name of the data processing step

log_text : string

description of the data processing step

processing_data : dict, optional

further information (e.g., parameters) of the data processing step. Defaults to None.

annotate_features(useGroups=None, max_deviation_ppm=100, search_ions=None, remove_other_ions=True, plot=False)

Function to annotate the bracketed features with different sister ions (adducts, isotopologs, etc.) relative to parent ions (mostly [M+H]+ or [M-H]-)

Parameters:
useGroups : list of str, optional

groups to be used for the annotation (important for testing the ratio). Defaults to None.

max_deviation_ppm : int, optional

the maximum allowed deviation between the calculated and observed mz value of a sister ion. Defaults to 100.

search_ions : dict, optional

the ions to search for. keys are ion names, values are mz increments (no decrements allowed). Defaults to None.

remove_other_ions : bool, optional

indicator if annotated sister ions should be removed. Defaults to True.

plot : bool, optional

indicator if annotation results should be plotted. Defaults to False.
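The matching of sister ions by mz increment and ppm tolerance can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper names `ppm_deviation` and `find_sister_ions`, the example increments, and the feature list are all assumptions made for the example.

```python
def ppm_deviation(observed_mz, theoretical_mz):
    """Signed deviation of observed_mz from theoretical_mz in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def find_sister_ions(parent_mz, feature_mzs, search_ions, max_deviation_ppm=100):
    """Return {ion name: matching feature mz} for one parent ion (hypothetical helper)."""
    hits = {}
    for ion_name, mz_increment in search_ions.items():
        expected_mz = parent_mz + mz_increment
        for mz in feature_mzs:
            if abs(ppm_deviation(mz, expected_mz)) <= max_deviation_ppm:
                hits[ion_name] = mz
                break  # keep only the first match in this sketch
    return hits


# Hypothetical increments relative to [M+H]+: 13C isotopolog and sodium adduct
search_ions = {"13C": 1.00335, "[M+Na]+": 21.98194}
features = [166.0863, 167.0897, 188.0682, 200.1234]
print(find_sister_ions(166.0863, features, search_ions))
```

The dictionary mirrors the documented shape of search_ions (ion name mapped to a positive mz increment); how the library resolves several candidates within the tolerance is not specified here.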

annotate_with_compounds(tsv_file, max_ppm_dev=15.0, adducts=None, delimiter='\t', quote_character='', comment_character='#')

Annotation of detected features with compounds from a database.

Parameters:
tsv_file : str

Path to a tab-separated file containing the database.

max_ppm_dev : float, optional

Maximum allowed mz difference in ppm relative to the theoretical value. Defaults to 15.0.

adducts : dict, optional

key: str, value: tuple of (charge number, mz increment). Defaults to None.

delimiter : str, optional

delimiter character of the database. Defaults to “\t” (tab).

quote_character : str, optional

quote character of the database. Defaults to “”.

comment_character : str, optional

comment character of the database (not allowed in first row/header). Defaults to “#”.

batch_correction(by_group, plot=True)

Correct the abundances of all detected features in different batches. The algorithm is as follows: An overall mean-QC-overall-value is derived from all QC samples (regardless of the batch) using all features detected in these QC samples. For each batch, a mean-QC-sample-value is derived from the QC samples in that batch using all features detected in these QC samples. All samples in the batch are corrected by this mean-QC-sample-value; that is, all feature abundances are divided by this value. Finally, the corrected abundance values are multiplied by the mean-QC-overall-value so that they remain on a scale similar to that before the correction.

Parameters:
by_group : string

the name of the group to be used for the batch correction

plot : bool, optional

show the correction results as plots. Defaults to True.
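The correction described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `batch_correct_sketch`, the list-based data layout, and the rule that a feature counts as detected when its abundance is greater than zero are not taken from the library.

```python
from statistics import mean


def batch_correct_sketch(abundances, batches, is_qc):
    """abundances: one list of feature abundances per sample;
    batches: batch id per sample; is_qc: True for QC samples."""

    def qc_mean(sample_indices):
        # Assumption: a feature is 'detected' if its abundance is > 0
        return mean(v for i in sample_indices for v in abundances[i] if v > 0)

    overall_value = qc_mean([i for i, qc in enumerate(is_qc) if qc])
    corrected = [None] * len(abundances)
    for batch in set(batches):
        in_batch = [i for i, b in enumerate(batches) if b == batch]
        batch_value = qc_mean([i for i in in_batch if is_qc[i]])
        for i in in_batch:
            # divide by the batch QC value, then rescale by the overall QC value
            corrected[i] = [v / batch_value * overall_value for v in abundances[i]]
    return corrected


# Batch 2 was measured with doubled intensities; sample 0 and 2 are QC samples
abundances = [[10, 20], [12, 18], [20, 40], [24, 36]]
print(batch_correct_sketch(abundances, [1, 1, 2, 2], [True, False, True, False]))
```

In this toy dataset the two batches end up with matching abundances after the correction, because the only difference between them was a constant intensity factor.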

blank_subtraction(blankGroup, toTestGroups, foldCutoff=2, pvalueCutoff=0.05, minDetected=2, plot=False)

Method to remove background features from the data matrix. Repeated calls with different blank groups are possible. Inspired by the background-subtraction module of MZmine3.

Parameters:
blankGroup : string

the name of the blank group

toTestGroups : list of str

the names of the groups to test against the blank group

foldCutoff : int, optional

the minimum fold-change between at least one test group and the blank group for a feature not to be considered a background feature. Defaults to 2.

pvalueCutoff : float, optional

the alpha threshold for the t-test. Defaults to 0.05.

minDetected : int, optional

the minimum number of background samples in which a feature must be detected in order to be considered a background feature. Defaults to 2.

plot : bool, optional

indicator whether the subtraction shall be plotted as a volcano plot. Defaults to False.

Raises:
NotImplementedError

should never be raised; if it is, the algorithm’s implementation is incorrect

bracket_consensus_spectrum_samples(closest_signal_max_deviation_ppm=20, max_ppm_deviation=25, show_diagnostic_plots=False)

Function to bracket consensus spectra across different samples

Parameters:
max_ppm_deviation : float, optional

the maximum deviation (in ppm) a consensus group is allowed to have. Defaults to 25.

show_diagnostic_plots : bool, optional

indicator if a diagnostic plot shall be shown. Defaults to False.
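One simple way to picture bracketing is a greedy grouping of sorted mz values in which no group spans more than max_ppm_deviation. The sketch below is an assumption about the general idea only; the library's actual bracketing procedure may differ, and `bracket_mz_values` is a hypothetical name.

```python
def bracket_mz_values(mz_values, max_ppm_deviation=25):
    """Greedily group sorted mz values; each group spans at most max_ppm_deviation ppm."""
    groups = []
    for mz in sorted(mz_values):
        if groups:
            lowest_mz = groups[-1][0]
            span_ppm = (mz - lowest_mz) / lowest_mz * 1e6
            if span_ppm <= max_ppm_deviation:
                groups[-1].append(mz)
                continue
        groups.append([mz])  # start a new bracket
    return groups


# Consensus-spectrum mz values pooled from several samples (made-up numbers)
pooled = [166.0865, 301.1410, 166.0860, 166.0880, 301.1415]
for group in bracket_mz_values(pooled):
    print(len(group), "signals from", min(group), "to", max(group))
```

Each resulting group would correspond to one feature (one column) of the later data matrix.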

build_data_matrix(on='originalData', originalData_mz_deviation_multiplier_PPM=0, aggregation_fun='average')

generates a data matrix from corrected, consensus spectra and bracketed features

Parameters:
on : str, optional

the data from which abundance values are derived. With ‘processedData’ the consensus spectra are used, while with ‘originalData’ the raw data are used. Defaults to “originalData”.

originalData_mz_deviation_multiplier_PPM : int, optional

an optional mz deviation allowed for the raw-data integration. Defaults to 0.

aggregation_fun : str, optional

the method to calculate the derived abundance on integration of raw data. Defaults to “average”.

Raises:
ValueError

raised if parameters on and aggregation_fun have invalid values

calc_2D_Embedding(keep_features=None, remove_features=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None, imputation='zero', scaling='standard', embedding='pca')

Calculates a two-dimensional embedding of the data matrix (or a subset) and illustrates it as a scores/component plot

Parameters:
keep_features : list of indices, optional

features to be included for the embedding. Defaults to None.

remove_features : list of indices, optional

features to not be included for the embedding. Defaults to None.

keep_samples : list of str, optional

samples to be included for the embedding. Defaults to None.

remove_samples : list of str, optional

samples to not be included for the embedding. Defaults to None.

keep_groups : list of str, optional

groups to be included for the embedding. Defaults to None.

remove_groups : list of str, optional

groups to not be included for the embedding. Defaults to None.

keep_batches : list of int, optional

batches to be included for the embedding. Defaults to None.

remove_batches : list of int, optional

batches to not be included for the embedding. Defaults to None.

imputation : str, optional

imputation method for features with missing values (i.e., no signals detected for them). Defaults to “zero”, allowed are “zero” and “omitNA”.

scaling : str, optional

the scaling to be applied before the embedding calculation. Defaults to “standard”, allowed are None, “”, “standard”.

embedding : str, optional

the embedding type. Defaults to “pca”, allowed are “pca”, “lda”, “umap”.

Returns:
plotnine

the plot

Raises:
ValueError

raised if invalid parameters are provided

calc_volcano_plots(comparisons, alpha_critical=0.05, minimum_fold_change=2, keep_features=None, remove_features=None, highlight_features=None, min_ratio_samples_for_found=0.75, min_ratio_samples_for_not_found=0.25, sig_color='firebrick', not_different_color='cadetblue')

generate a volcano plot from the results

Parameters:
comparisons : list of tuple of (str, str)

the names of the two groups to be compared

alpha_critical : float, optional

the critical alpha value for a significant difference. Defaults to 0.05.

minimum_fold_change : int, optional

the minimum required fold-change for a significant difference. Defaults to 2.

keep_features : list of indices, optional

a list of indices to be used for the uni-variate comparison. Defaults to None.

remove_features : list of indices, optional

a list of features to not be used for the uni-variate comparison. Defaults to None.

highlight_features : list of feature indices, optional

features to highlight, indices according to self.dat matrix and self.features must be provided.

sig_color : str, optional

the name of the color used for plotting significantly different features. Defaults to “firebrick”.

not_different_color : str, optional

the name of the color used for plotting not significantly different features. Defaults to “cadetblue”.

Returns:
pandas DataFrame

the data matrix for the volcano plot

calculate_consensus_spectra_for_samples(min_difference_ppm=30, closest_signal_max_deviation_ppm=20, max_mz_deviation_ppm=20, min_signals_per_cluster=10, minimum_intensity_for_signals=0, cluster_quality_check_functions=None, aggregation_function='average', exportAsFeatureML=True, featureMLlocation='.')

Function to collapse several spectra into a single consensus spectrum per spot

Parameters:
min_difference_ppm : float, optional

Minimum difference in PPM required to separate into different clusters. Defaults to 30.

min_signals_per_cluster : int, optional

Minimum number of signals for a certain MZ cluster for it to be used in the collapsed spectrum. Defaults to 10.
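How min_difference_ppm and min_signals_per_cluster interact can be illustrated with a small gap-based clustering of the signals of one spot. This is a sketch under assumptions, not the library's clustering: `consensus_spectrum_sketch` is a hypothetical name, and averaging both mz and intensity per cluster is just one possible aggregation.

```python
def consensus_spectrum_sketch(signals, min_difference_ppm=30, min_signals_per_cluster=10):
    """signals: (mz, intensity) tuples pooled from all spectra of one spot.
    Returns one (mean mz, mean intensity) tuple per sufficiently populated cluster."""
    clusters = []
    for mz, intensity in sorted(signals):
        if clusters:
            previous_mz = clusters[-1][-1][0]
            gap_ppm = (mz - previous_mz) / previous_mz * 1e6
            if gap_ppm < min_difference_ppm:
                clusters[-1].append((mz, intensity))
                continue
        clusters.append([(mz, intensity)])  # gap is large enough: new cluster

    consensus = []
    for cluster in clusters:
        if len(cluster) >= min_signals_per_cluster:
            mean_mz = sum(mz for mz, _ in cluster) / len(cluster)
            mean_intensity = sum(i for _, i in cluster) / len(cluster)
            consensus.append((mean_mz, mean_intensity))
    return consensus


signals = [(100.0000, 10), (100.0010, 20), (100.0020, 30), (100.0500, 5)]
print(consensus_spectrum_sketch(signals, min_signals_per_cluster=3))
```

The isolated signal at mz 100.05 forms a cluster of its own and is discarded because it has fewer than three members.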

clone_DartMSAssay()

Clones the DartMSAssay object (deepcopy)

Returns:
DartMSAssay

the cloned, new DartMSAssay object

correct_MZ_shift_across_samples(referenceMZs=[166.086254594], max_mz_deviation_absolute=0.1, correctby='mzDeviationPPM', max_deviationPPM_to_use_for_correction=80, selection_criteria='mostAbundant', correct_on_level='file', plot=False)

Function to correct systematic shifts of mz values in individual spot samples. Currently, only a constant mz offset relative to several reference features can be corrected. The correction is carried out by calculating the median error relative to the reference features and then applying either the absolute or the ppm deviation to all mz values in the spot sample. A signal for a feature on the referenceMZs list is said to be found if it is within the max_mz_deviation_absolute parameter. The feature with the closest mz difference will be used in cases where several features are present in the search window.

Parameters:
assay : Assay

The assay object of the experiment

referenceMZs : list of MZ values (float), optional

The reference features for which the offsets shall be calculated. Defaults to [165.078978594 + 1.007276].

max_mz_deviation_absolute : float, optional

Maximum deviation used for the search. Defaults to 0.1.

correctby : str, optional

Either “mzDeviation” for correcting by a constant mz offset or “mzDeviationPPM” to correct by a constant PPM offset. Defaults to “mzDeviationPPM”.

plot : bool, optional

Indicates if a plot shall be generated and returned. Defaults to False.

Returns:
pandas.DataFrame, plot

Overview of the correction and plot (if it shall be generated)
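The ppm-based variant of this correction can be sketched as follows: find the closest signal to each reference mz within the absolute search window, take the median ppm error, and rescale all mz values. The function name `correct_mz_shift_ppm` and the flat list of mz values are assumptions for illustration; the library works on its own data structures.

```python
from statistics import median


def correct_mz_shift_ppm(spectrum_mzs, reference_mzs, max_mz_deviation_absolute=0.1):
    """Return (corrected mz values, median shift in ppm or None if no reference was found)."""
    errors_ppm = []
    for ref in reference_mzs:
        in_window = [mz for mz in spectrum_mzs if abs(mz - ref) <= max_mz_deviation_absolute]
        if in_window:
            closest = min(in_window, key=lambda mz: abs(mz - ref))
            errors_ppm.append((closest - ref) / ref * 1e6)
    if not errors_ppm:
        return list(spectrum_mzs), None  # nothing to correct against
    shift_ppm = median(errors_ppm)
    # observed = true * (1 + shift), so divide to undo the shift
    corrected = [mz / (1 + shift_ppm / 1e6) for mz in spectrum_mzs]
    return corrected, shift_ppm


reference = 166.086254594
shifted = [reference * (1 + 20e-6), 300.0 * (1 + 20e-6)]  # simulated +20 ppm shift
corrected, shift = correct_mz_shift_ppm(shifted, [reference])
print(round(shift, 3), [round(mz, 6) for mz in corrected])
```

With “mzDeviation” instead of “mzDeviationPPM”, the analogous step would subtract a constant mz offset rather than rescale.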

static create_assay_from_chronogramFiles(assay_name, filenames, spot_file, ms_mode, instrument, centroid_profileMode=True, fileNameChangeFunction=None, use_signal_function=None, rewriteRTinFiles=False, rewriteDeleteUnusedScans=True, intensity_threshold_spot_extraction=0, import_filters=None)

Generates a new assay from a series of chronograms and a spot_file

Parameters:
filenames : list of str

File paths of the chronograms

spot_file : str

File path of the spot file

centroid_profileMode : bool

indicates if profile mode data shall be centroided automatically

Returns:
Assay

The newly generated assay object with the spots from the spot_file, which are either detected automatically or user-guided

drop_lower_spectra(drop_rate)

A function to restrict chronogram spots to only certain spectra in the dataset (e.g., use the ‘core’ of the spot). From all spectra assigned to the spot, only the drop_rate % will be used. Use with caution, as a variable number of scans over the spot might affect abundances, especially when aggregation_method = “sum” is used.

Parameters:
drop_rate : float

the ratio of the highest-abundant spectra to be used.

export_data_matrix(to_file, separator='\t', quotechar='"')

Export the data matrix to a tsv file

Parameters:
to_file : string

the file to save the results to

export_for_R(to_file)

Export the results to a tsv file and generate R code to import it

Parameters:
to_file : string

the location of the file (without extension, ‘.tsv’ will be added automatically)

static generate_database_template(to_tsv_file)

Generates a template for the database search

Parameters:
to_tsv_file : str

file to which the template shall be written

generate_feature_abundance_plot(feature_index, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None)

Shows a single feature

Parameters:
feature_index : index

the index of the feature to be plotted

Returns:
Pandas DataFrame

the data matrix for the plot

generate_feature_raw_plot(refMZ, reverseCorrectMZ=True, ppmDev=20, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None)

generates a raw data plot for a detected feature

Parameters:
refMZ : float

the mz value of the feature to plot after mz correction (reverseCorrectMZ = True) or in the raw data (reverseCorrectMZ = False)

reverseCorrectMZ : boolean, optional

indicates if the refMZ value is after (True) or before (False) mz correction

ppmDev : float, optional

the allowed mz deviation for the feature

Raises:
ValueError

raised if invalid parameter values are provided

get_data_matrix_and_meta(keep_features=None, remove_features=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None, copy=False)

Returns the data matrix as well as feature and sample information available in the DartMSAssay object. Features and samples/groups/batches can be included or excluded before the export.

Parameters:
keep_features : list of indices, optional

features to be included in the export. To export all, the parameter must be set to None. Defaults to None.

remove_features : list of indices, optional

features to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.

keep_samples : list of strings, optional

samples to be included in the export. To export all, the parameter must be set to None. Defaults to None.

remove_samples : list of strings, optional

samples to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.

keep_groups : list of strings, optional

groups to be included in the export. To export all, the parameter must be set to None. Defaults to None.

remove_groups : list of strings, optional

groups to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.

keep_batches : list of integers, optional

batches to be included in the export. To export all, the parameter must be set to None. Defaults to None.

remove_batches : list of integers, optional

batches to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.

copy : bool, optional

indicator if the object should be cloned before export. This will be done automatically if any inclusion or restriction is provided. Defaults to False.

Returns:
(numpy data matrix [samples x features], list of feature properties, list of feature annotations, list of sample names, list of assigned group names, list of assigned batches)

data matrix and meta-data

Raises:
RuntimeError

an exception is raised when the necessary data is not available

get_summary_of_results(reference_features=None, reference_features_allowed_deviationPPM=20.0)

Show a summary of the results.

Parameters:
reference_features : list of float, optional

reference mz values to be used. Defaults to None.

reference_features_allowed_deviationPPM : float, optional

allowed mz deviation. Defaults to 20.0.

Returns:
Pandas DataFrame

Summary table

normalize_samples_by_TIC(multiplication_factor=1)

Abundances of spot spectra can be normalized by the total intensity of the spectra

Parameters:
multiplication_factor : int, optional

a factor that is applied on top of the normalization (i.e., shifts the maximum abundance to this value). Defaults to 1.
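A minimal sketch of total-intensity normalization for one spectrum is shown below. It assumes that each intensity is divided by the summed intensity and then scaled by multiplication_factor; the function name `normalize_by_tic_sketch` and the handling of empty spectra are assumptions, and the library's exact scaling may differ.

```python
def normalize_by_tic_sketch(intensities, multiplication_factor=1):
    """Divide each intensity by the total intensity, then scale by multiplication_factor."""
    total_intensity = sum(intensities)
    if total_intensity == 0:
        return list(intensities)  # nothing to normalize in an empty or all-zero spectrum
    return [i / total_intensity * multiplication_factor for i in intensities]


spectrum = [500.0, 300.0, 200.0]
normalized = normalize_by_tic_sketch(spectrum, multiplication_factor=100)
print(normalized, "sum:", sum(normalized))
```

In this sketch the normalized intensities of a spectrum add up to multiplication_factor, which makes spectra with different overall signal levels comparable.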

normalize_to_internal_standard(std, multiplication_factor=1, plot=False)

Abundances of spot spectra are normalized by the abundance of a selected internal standard

Parameters:
std : float

standard to normalize to

multiplication_factor : int, optional

Defaults to 1.

plot : bool, optional

Defaults to False.

plot_RSDs_per_group(uGroups=None, include=None, plotType='points', scales='free_y')

Plots an RSD distribution per group

Parameters:
uGroups : list of str, optional

the groups to be included in the overview. Defaults to None.

include : list of str, optional

missing values replacement strategies to be used. Defaults to None.

plotType : str, optional

type of plot (either points or histogram). Defaults to “points”.

scales : str, optional

parameter for plotnine and the y-scale of faceted plots. Defaults to “free_y”.

Raises:
ValueError

raised if an unknown plot type is provided

plot_feature_mz_deviations(featureInd, refMZ=None, types=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None)

plots the mz deviation for a particular feature

Parameters:
featureInd : index

the index of the feature to plot

types : list of str, optional

plots to include. Defaults to None.

plot_heatmap(keep_features=None, remove_features=None, keep_samples=None, remove_samples=None, keep_groups=None, remove_groups=None, keep_batches=None, remove_batches=None, linkage_method='ward', distance_metric='euclidean')

Calculates and plots a heatmap.

Parameters:
keep_features : list of indices, optional

features to be included for the heatmap. Defaults to None.

remove_features : list of indices, optional

features to not be included for the heatmap. Defaults to None.

keep_samples : list of str, optional

samples to be included for the heatmap. Defaults to None.

remove_samples : list of str, optional

samples to not be included for the heatmap. Defaults to None.

keep_groups : list of str, optional

groups to be included for the heatmap. Defaults to None.

remove_groups : list of str, optional

groups to not be included for the heatmap. Defaults to None.

keep_batches : list of int, optional

batches to be included for the heatmap. Defaults to None.

remove_batches : list of int, optional

batches to not be included for the heatmap. Defaults to None.

linkage_method : str, optional

linkage method to be used for generating the feature clustering. Defaults to ‘ward’, options are from scipy.cluster.hierarchy.linkage

distance_metric : str, optional

distance metric to be used for generating the feature clustering. Defaults to ‘euclidean’, options are from scipy.cluster.hierarchy.linkage

Returns:
plot

the heatmap

Raises:
ValueError

raised if invalid parameters are provided

plot_mz_deviation_overview(show=None, random_fraction=1)

plots an overview of the mz deviation in the bracketed results

Parameters:
show : optional

indicator which plots should be shown. Defaults to None.

random_fraction : int, optional

use only a random fraction of the features. Defaults to 1.

Raises:
ValueError

raised if unknown options are specified

plot_sample_TICs(separate=True, separate_by='group')

Plots the detected spots of the chronograms

Parameters:
separate : bool, optional

indicator if a faceted plot shall be used or not. Defaults to True.

separate_by : str, optional

the variable used for grouping the results (can be file, group, or batch). Defaults to “group”.

plot_sample_abundances()

plots an overview of the feature abundances per sample and group

Returns:
plotnine plot

the generated plot

print_results_overview()

prints an overview of the bracketed results to the console

print_sample_overview()

Prints an overview of the samples

static read_from_dill_file(dill_file)

Load a DartMSAssay from a file

Parameters:
dill_file : string

the *.dill file to load the DartMSAssay from

Returns:
DartMSAssay

the loaded DartMSAssay object

restrict_to_high_quality_features__found_in_replicates(test_groups, minimum_ratio_found, found_in_type='anyGroup')

Function to select high-quality features (after bracketing). This function selects only features that are found in at least minimum_ratio_found of the replicates.

Parameters:
test_groups : list of str

the groups to be tested

minimum_ratio_found : float

the minimum ratio of all samples in a group in which the feature must have been detected

found_in_type : str, optional

indicator if the feature must be present in all groups or in any group in at least minimum_ratio_found of the samples. Defaults to “anyGroup”.
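The replicate filter can be sketched for a single feature as follows. The sketch assumes that a feature counts as found in a sample when its abundance is greater than zero and that any found_in_type other than “anyGroup” means all groups must pass; both are assumptions, as is the name `is_found_in_replicates`.

```python
def is_found_in_replicates(feature_values, groups, test_groups,
                           minimum_ratio_found, found_in_type="anyGroup"):
    """feature_values: abundance of one feature per sample; groups: group name per sample."""
    passed = []
    for group in test_groups:
        values = [v for v, g in zip(feature_values, groups) if g == group]
        if not values:
            continue  # group has no samples, nothing to test
        ratio_found = sum(1 for v in values if v > 0) / len(values)
        passed.append(ratio_found >= minimum_ratio_found)
    if found_in_type == "anyGroup":
        return any(passed)
    return bool(passed) and all(passed)


values = [5.0, 0.0, 7.0, 0.0, 0.0, 3.0]
groups = ["A", "A", "A", "B", "B", "B"]
print(is_found_in_replicates(values, groups, ["A", "B"], 0.6))             # found in 2/3 of A
print(is_found_in_replicates(values, groups, ["A", "B"], 0.6, "allGroups"))  # B has only 1/3
```

The string "allGroups" in the usage lines is a placeholder for the stricter mode and is not a documented option name.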

restrict_to_high_quality_features__low_RSD_in_groups(test_groups, maximum_RSD)

Function to select high-quality features (after bracketing). This function selects only features that have a low within-group variability.

Parameters:
test_groups : list of str

the groups to be tested

maximum_RSD : float

the maximum allowed rsd for a feature to be used, must be lower in all groups
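The relative standard deviation (RSD) criterion can be illustrated for a single feature as below. The sketch assumes the RSD is expressed as a fraction (not percent), uses the sample standard deviation, and requires at least two replicates with a non-zero mean per group; the helper names are hypothetical.

```python
from statistics import mean, stdev


def relative_standard_deviation(values):
    """Sample standard deviation divided by the mean (as a fraction, not percent)."""
    return stdev(values) / mean(values)


def has_low_rsd_in_all_groups(feature_values, groups, test_groups, maximum_RSD):
    """True if the feature's RSD does not exceed maximum_RSD in every tested group."""
    for group in test_groups:
        values = [v for v, g in zip(feature_values, groups) if g == group]
        if relative_standard_deviation(values) > maximum_RSD:
            return False
    return True


values = [90.0, 100.0, 110.0, 50.0, 100.0, 150.0]
groups = ["A", "A", "A", "B", "B", "B"]
print(relative_standard_deviation(values[:3]))                     # group A
print(has_low_rsd_in_all_groups(values, groups, ["A", "B"], 0.2))  # group B is too variable
```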

restrict_to_high_quality_features__minimum_intensity_filter(test_groups, minimum_intensity)

Function to select high-quality features (after bracketing). This function selects only features that have a minimum intensity in at least one sample of a group.

Parameters:
test_groups : list of str

the groups to test

minimum_intensity : float

the minimum requested signal intensity

restrict_to_high_quality_features__most_n_abundant(n_features)

Function to select high-quality features (after bracketing). This function selects the top-n most abundant features.

Parameters:
n_features : integer

the number of features to select

save_self_to_dill_file(dill_file)

Save the DartMSAssay object to a file

Parameters:
dill_file : string

the *.dill file to save the DartMSAssay to

select_top_n_spectra(n)

A function to restrict chronogram spots to only certain spectra in the dataset (e.g., use the ‘core’ of the spot). From all spectra assigned to the spot, only the n most abundant will be used.

Parameters:
n : integer

the number of highest-abundant spectra to be used.
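Selecting the n most abundant spectra of a spot can be sketched as follows. The sketch assumes that abundance means the summed intensity of a spectrum and that the kept spectra stay in acquisition order; `top_n_spectra` and the list-of-tuples layout are illustrative choices, not the library's data model.

```python
def top_n_spectra(spectra, n):
    """spectra: list of spectra, each a list of (mz, intensity) tuples.
    Keeps the n spectra with the highest summed intensity, in original order."""
    def total_intensity(index):
        return sum(intensity for _, intensity in spectra[index])

    ranked = sorted(range(len(spectra)), key=total_intensity, reverse=True)
    keep = sorted(ranked[:n])  # restore acquisition order
    return [spectra[i] for i in keep]


# Four scans over one spot; the two middle-to-late scans form the 'core'
spot = [[(100.0, 5)], [(100.0, 50)], [(100.0, 20)], [(100.0, 40)]]
print(top_n_spectra(spot, 2))
```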

set_data(dat, features, featureAnnotations, samples, groups, batches)

Sets the data for a DartMSAssay object

Parameters:
dat : numpy.ndarray of [n, m]

the feature table of the experiment

features : list of mz values

the features’ information (mz values)

featureAnnotations : list of dictionaries

the features’ derived information (sister ions, etc.)

samples : list of string

the names of the samples in the experiment

groups : list of str

the group names the samples in the experiment are assigned to

batches : list of int

the batch ids the samples in the experiment are assigned to

static show_sample_overview(filenames, ms_mode, instrument, separation_intensity=1000.0)

Generates an overview of the data to be imported in subsequent steps

Parameters:
filenames : list of str

File path of the chronograms

subset_features(keep_features_with_indices=None, remove_features_with_indices=None)

Subset the detected features and include or exclude them

Parameters:
keep_features_with_indices : list of indices, optional

features to be included in the export. To export all, the parameter must be set to None. Defaults to None.

remove_features_with_indices : list of indices, optional

features to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.

Raises:
RuntimeError

if no features have been detected, this exception will be raised

subset_samples(keep_samples=None, keep_groups=None, keep_batches=None, remove_samples=None, remove_groups=None, remove_batches=None)

Subset certain samples, groups or batches in the DartMSAssay object

Parameters:
keep_samples : list of strings, optional

samples to be included in the export. To export all, the parameter must be set to None. Defaults to None.

remove_samples : list of strings, optional

samples to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.

keep_groups : list of strings, optional

groups to be included in the export. To export all, the parameter must be set to None. Defaults to None.

remove_groups : list of strings, optional

groups to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.

keep_batches : list of integers, optional

batches to be included in the export. To export all, the parameter must be set to None. Defaults to None.

remove_batches : list of integers, optional

batches to be removed before exporting. To remove none, the parameter must be set to None. Defaults to None.

Raises:
RuntimeError

if no features have been detected, this exception will be raised

write_bracketing_results_to_featureML(featureMLlocation='./results.featureML', featureMLStartRT=0, featureMLEndRT=1400)

export the bracketed results to a featureML file for easy visualization in TOPPView

Parameters:
featureMLlocation : str, optional

path to the featureML file. Defaults to “./results.featureML”.

featureMLStartRT : int, optional

the earliest chronogram time. Defaults to 0.

featureMLEndRT : int, optional

the latest chronogram time. Defaults to 1400.

write_consensus_spectrum_to_featureML_file_per_sample(widthRT=40)

export the consensus results to a featureML file for easy visualization in TOPPView. A separate featureML file will be generated for each sample. The path of the file will be the path of the mzML file with the extension replaced by ‘.featureML’.

Parameters:
widthRT : int, optional

the width of the chronogram spots. Defaults to 40.

cluster_quality_check_function__peak_form(sample, msDataObj, spectrumIDs, time, mz, intensity, cluster, min_correlation_for_cutoff=0.5)

A function to check the detected feature clusters for certain attributes. This particular function checks if the distribution form somehow resembles a spot (approximated by a normal distribution). Any cluster not resembling such a form will be removed in a subsequent step (by setting the cluster ids to -1)

Parameters:
sample : string

the name of the sample

msDataObj : MSData object of tidyms

the MSData object in which this feature was detected

spectrumIDs : list of ids

the spectrum id of each found signal in a cluster

time : list of numeric

the chronogram time of each found signal in a cluster

mz : list of numeric

the mz value of each found signal in a cluster

intensity : list of numeric

the intensity value of each found signal in a cluster

cluster : list of integer

the cluster ids each signal was assigned to

min_correlation_for_cutoff : float, optional

the minimum Pearson correlation cutoff for the spot shape form comparison [-1 to 1]. Defaults to 0.5.

Returns:
list of integer

the new cluster ids each signal is assigned to. clusters to be removed are set to -1

cluster_quality_check_function__ppmDeviationCheck(sample, msDataObj, spectrumIDs, time, mz, intensity, cluster, max_weighted_ppm_deviation=15)

A function to check the detected feature clusters for certain attributes. This particular function checks if the clusters are within a certain ppm deviation. Any cluster exceeding this deviation will be removed in a subsequent step (by setting the cluster ids to -1)

Parameters:
sample : string

the name of the sample

msDataObj : MSData object of tidyms

the MSData object in which this feature was detected

spectrumIDs : list of ids

the spectrum id of each found signal in a cluster

time : list of numeric

the chronogram time of each found signal in a cluster

mz : list of numeric

the mz value of each found signal in a cluster

intensity : list of numeric

the intensity value of each found signal in a cluster

cluster : list of integer

the cluster ids each signal was assigned to

max_weighted_ppm_deviation : float, optional

the maximum allowed ppm deviation for all signals within a cluster.

Returns:
list of integer

the new cluster ids each signal is assigned to. clusters to be removed are set to -1

cohen_d(d1, d2)

Calculate Cohen’s d value for effect size

Parameters:
d1 : list or numpy array of numerics

numeric values of group 1

d2 : list or numpy array of numerics

numeric values of group 2

Returns:
numeric

Cohen’s d value for the two groups
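For reference, the textbook form of Cohen’s d divides the difference of the group means by the pooled standard deviation. The sketch below implements that common definition with the standard library; whether the library uses exactly this pooled variant is an assumption, and the function is named `cohens_d_pooled` to keep it distinct from the documented one.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d_pooled(d1, d2):
    """Cohen's d using the pooled sample standard deviation of both groups."""
    n1, n2 = len(d1), len(d2)
    pooled_variance = ((n1 - 1) * variance(d1) + (n2 - 1) * variance(d2)) / (n1 + n2 - 2)
    return (mean(d1) - mean(d2)) / sqrt(pooled_variance)


group_1 = [2.0, 4.0, 6.0]
group_2 = [1.0, 3.0, 5.0]
print(cohens_d_pooled(group_1, group_2))
```

Both example groups have a sample variance of 4, so the pooled standard deviation is 2 and a mean difference of 1 gives an effect size of 0.5.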

import_filter_artifact_removal(msData, artifacts)

Filter artifacts before importing the mzML data

Parameters:
msData : MSData object of tidyms

the loaded mzML raw data to be filtered

artifacts : list of (mz_min, mz_max) tuples

a variable number of artifacts to be removed from the dataset. Each entry must be a tuple of a minimum and maximum mz value describing the artifact

Returns:
MSData

the altered or changed MSData object
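The effect of the artifacts argument can be illustrated on a plain list of signals. This sketch does not touch MSData objects; `remove_artifact_signals`, the inclusive window bounds, and the example mz values are assumptions made for illustration only.

```python
def remove_artifact_signals(signals, artifacts):
    """signals: (mz, intensity) tuples; artifacts: list of (mz_min, mz_max) tuples.
    Drops every signal whose mz lies inside any artifact window (bounds inclusive)."""
    return [
        (mz, intensity)
        for mz, intensity in signals
        if not any(mz_min <= mz <= mz_max for mz_min, mz_max in artifacts)
    ]


signals = [(100.0, 10), (149.02, 500), (200.0, 20)]
artifacts = [(149.0, 149.05)]  # hypothetical background window
print(remove_artifact_signals(signals, artifacts))
```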

import_filter_mz_range(msData, min_mz, max_mz)

Filter mz values within a certain range before importing the mzML data

Parameters:
msData : MSData object of tidyms

the loaded mzML raw data to be filtered

min_mz : numeric

the lower mz value to be used

max_mz : numeric

the higher mz value to be used

Returns:
MSData

the altered or changed MSData object

import_filter_remove_signals_below_intensity(msData, minimum_signal_intensity)

Filter signals below a minimum intensity threshold

Parameters:
msData : MSData object of tidyms

the loaded mzML raw data to be filtered

minimum_signal_intensity : numeric

the minimum intensity value for signals to be used

Returns:
MSData

the altered or changed MSData object

prefab_DARTMS_dataProcessing_pipeline(spotFile, files, ms_mode='centroid', instrument='qtof', fileNameChangeFunction=None, create_assay_from_chronogramFiles__import_filters=None, select_top_n_spectra__top_n_spectra=None, correct_mz_shift__referenceMZs=None, correct_mz_shift__max_mz_deviation_absolute=0.1, correct_mz_shift__max_deviationPPM_to_use_for_correction=200, correct_mz_shift__correctby='mzDeviationPPM', correct_mz_shift__selection_criteria='mostabundant', correct_mz_shift__correct_on_level='file', calculate_consensus_spectra_for_samples__min_difference_ppm=100, calculate_consensus_spectra_for_samples__min_signals_per_cluster=5, calculate_consensus_spectra_for_samples__minimum_intensity_for_signals=250.0, calculate_consensus_spectra_for_samples__cluster_quality_check_functions=None, normalize_to_internal_standard__perform=False, normalize_to_internal_standard__internal_standard_mzs=None, normalize_to_internal_standard__multiplication_factor=1, bracket_consensus_spectrum_samples__max_ppm_deviation=25, annotate_features__useGroups=None, annotate_features_remove_other_ions=False, build_data_matrix__originalData_mz_deviation_multiplier_PPM=30, build_data_matrix__aggregation_fun='average', results_file=None, dill_file=None)

Entire DART-MS workflow

Function processes samples (parameter files) and their spots (spotFile) using

  1. Top-N spectra selection (optional)

  2. m/z shift correction

  3. Consensus spectra generation

  4. Internal standard normalization

  5. Bracketing of features across samples

  6. Generation of data matrix

  7. Annotation of common adducts and isotopologs

The results of the processing are returned in form of a new DartMSAssay object and can optionally also be saved to a dill file.

For a detailed explanation of a parameter <parameter_name>, please see the respective function <function_name>. Each parameter name has the form <function_name>__<parameter_name>.

Note: the spotFile must already exist

Parameters:
spotFile : string

The path to the spotFile

files : list of str

The paths to the raw-data files

dill_file : str, optional

the path to the dill file for storing the results

Returns:
DartMSAssay

the DartMSAssay object resulting from the processing
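The <function_name>__<parameter_name> naming convention of this pipeline can be made concrete with a small helper that groups keyword arguments by their prefix. The helper `group_parameters_by_function` is purely illustrative and not part of the module; the file name in the usage lines is a placeholder.

```python
def group_parameters_by_function(**kwargs):
    """Split 'function__parameter' keys into {function: {parameter: value}}.
    Keys without a double underscore are collected under the empty string."""
    grouped = {}
    for key, value in kwargs.items():
        function_name, separator, parameter_name = key.partition("__")
        if separator:
            grouped.setdefault(function_name, {})[parameter_name] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


grouped = group_parameters_by_function(
    spotFile="spots.tsv",  # placeholder path
    bracket_consensus_spectrum_samples__max_ppm_deviation=25,
    build_data_matrix__aggregation_fun="average",
)
for function_name, parameters in grouped.items():
    print(function_name or "(pipeline)", parameters)
```

Reading the pipeline signature this way shows at a glance which processing step each setting is forwarded to.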