tidyms.correspondence

Functions used to match features

match_features(feature_table: DataFrame, samples_per_class: dict, include_classes: List[int] | None, mz_tolerance: float, rt_tolerance: float, min_fraction: float, max_deviation: float, n_jobs: int | None = None, verbose: bool = False)

Match features across samples using DBSCAN and GMM. See the user guide for a detailed description of the algorithm.

Parameters:
feature_table : pd.DataFrame

Feature table obtained after feature detection.

samples_per_class : dict

Maps a class name to the number of samples in the class.

include_classes : List or None, default=None

Sample classes used to estimate the minimum cluster size and number of chemical species in a cluster.

mz_tolerance : float

m/z tolerance used to group close features. Sets the eps parameter in the DBSCAN algorithm.

rt_tolerance : float

Retention time (Rt) tolerance used to group close features. Sets the eps parameter in the DBSCAN algorithm.

min_fraction : float

Minimum fraction of samples of a given class in a cluster. If include_classes is None, the total number of samples is used to compute the minimum fraction.

max_deviation : float

Maximum deviation of a feature from a cluster, measured in number of standard deviations of the cluster.

n_jobs : int or None, default=None

Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

verbose : bool, default=False

If True, shows a progress bar.
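
The roles of min_fraction and max_deviation can be illustrated with a small sketch. It does not reproduce the tidyms implementation: the helpers `min_cluster_size` and `deviation_in_std` are hypothetical names, and both the rounding rule (round up) and the choice of the smallest included class as the reference are assumptions made only for illustration.

```python
import math
from statistics import mean, stdev
from typing import Dict, Hashable, List, Optional, Sequence


def min_cluster_size(
    samples_per_class: Dict[Hashable, int],
    include_classes: Optional[List[Hashable]],
    min_fraction: float,
) -> int:
    """Hypothetical helper: minimum number of features expected in a cluster."""
    if include_classes is None:
        # No classes selected: the total number of samples is the reference.
        n_samples = sum(samples_per_class.values())
    else:
        # Assumption: the smallest included class sets the threshold.
        n_samples = min(samples_per_class[c] for c in include_classes)
    # Assumption: round up so the fraction is always satisfied.
    return math.ceil(min_fraction * n_samples)


def deviation_in_std(value: float, cluster_values: Sequence[float]) -> float:
    """Hypothetical helper: distance of a value from the cluster mean,
    expressed in units of the cluster's sample standard deviation."""
    return abs(value - mean(cluster_values)) / stdev(cluster_values)


samples_per_class = {"qc": 10, "sample": 20, "blank": 4}
size_all = min_cluster_size(samples_per_class, None, 0.5)
size_selected = min_cluster_size(samples_per_class, ["qc", "sample"], 0.5)

# A feature would be kept only if its deviation does not exceed max_deviation.
max_deviation = 3.0
keep = deviation_in_std(2.5, [1.0, 2.0, 3.0]) <= max_deviation
```

Under these assumptions, passing include_classes lowers the threshold from half of all 34 samples to half of the smallest selected class, so classes such as blanks do not inflate the minimum cluster size.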

Returns:
results: dictionary

clusters_ : Contains the results from the feature matching, where each number is a different ionic species. Features labelled with -1 are considered noise.

indecisiveness : A metric that counts the fraction of features in a cluster that could potentially be assigned to more than one cluster. Values close to zero indicate higher quality grouping.
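
As a minimal sketch of how the cluster labels might be summarized, the snippet below counts features per cluster and the fraction labelled as noise. The container type of the labels is not specified here, so the sketch assumes they can be iterated as plain integers; `cluster_sizes` and `noise_fraction` are hypothetical helpers, not part of tidyms.

```python
from collections import Counter
from typing import Dict, Sequence


def cluster_sizes(labels: Sequence[int]) -> Dict[int, int]:
    """Count features per cluster, ignoring noise (label -1)."""
    return dict(Counter(label for label in labels if label != -1))


def noise_fraction(labels: Sequence[int]) -> float:
    """Fraction of features labelled as noise (label -1)."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == -1) / len(labels)


# Illustrative labels only, not output from match_features.
labels = [0, 0, 1, -1, 1, 0, -1, 2]
sizes = cluster_sizes(labels)
noise = noise_fraction(labels)
```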