tidyms.correspondence

Functions used to match features

match_features(feature_table: DataFrame, samples_per_class: dict, include_classes: List[int] | None, mz_tolerance: float, rt_tolerance: float, min_fraction: float, max_deviation: float, n_jobs: int | None = None, verbose: bool = False)

Match features across samples using DBSCAN and Gaussian mixture models (GMM). See the user guide for a detailed description of the algorithm.
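The DBSCAN grouping step can be sketched in plain Python over (m/z, Rt) pairs. This is a minimal sketch, not the tidyms implementation: the function name `dbscan_features` is hypothetical, the box-shaped neighbourhood (a feature is a neighbour only when it lies within both `mz_tolerance` and `rt_tolerance`) is an assumption, and the GMM refinement step is omitted.

```python
from typing import List, Tuple

NOISE = -1
UNVISITED = -2


def dbscan_features(
    features: List[Tuple[float, float]],
    mz_tolerance: float,
    rt_tolerance: float,
    min_samples: int,
) -> List[int]:
    """Hypothetical helper: label (mz, rt) features with DBSCAN; -1 marks noise."""

    def neighbors(i: int) -> List[int]:
        mz_i, rt_i = features[i]
        # Box neighbourhood (an assumption of this sketch): a feature is a
        # neighbour when it is within BOTH tolerances. Includes i itself.
        return [
            j
            for j, (mz_j, rt_j) in enumerate(features)
            if abs(mz_i - mz_j) <= mz_tolerance and abs(rt_i - rt_j) <= rt_tolerance
        ]

    labels = [UNVISITED] * len(features)
    cluster = 0
    for i in range(len(features)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels
```

For example, with features `[(100.0, 10.0), (100.001, 10.4), (100.002, 10.8), (200.0, 50.0)]`, `mz_tolerance=0.005`, `rt_tolerance=1.0` and `min_samples=2`, the three features near m/z 100 form cluster 0 and the isolated feature is labelled -1.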

Parameters:

feature_table (pd.DataFrame): Feature table obtained after feature detection.
samples_per_class (dict): Maps a class name to the number of samples in the class.
include_classes (List or None, default=None): Sample classes used to estimate the minimum cluster size and the number of chemical species in a cluster.
mz_tolerance (float): m/z tolerance used to group close features. Sets the eps parameter in the DBSCAN algorithm.
rt_tolerance (float): Rt tolerance used to group close features. Sets the eps parameter in the DBSCAN algorithm.
min_fraction (float): Minimum fraction of samples of a given class that must be present in a cluster. If include_classes is None, the total number of samples is used to compute the minimum fraction.
max_deviation (float): Maximum deviation of a feature from a cluster, measured in number of standard deviations from the cluster.
n_jobs (int or None, default=None): Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.
verbose (bool, default=False): If True, shows a progress bar.
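The way min_fraction, samples_per_class and include_classes combine into a minimum cluster size can be sketched as follows. This is a hedged illustration, not the tidyms code: the helper name `min_cluster_size` is hypothetical, class names are assumed to be strings, and using the smallest included class as the reference (with rounding to the nearest integer and a floor of one sample) is an assumption.

```python
from typing import Dict, List, Optional


def min_cluster_size(
    samples_per_class: Dict[str, int],
    include_classes: Optional[List[str]],
    min_fraction: float,
) -> int:
    """Hypothetical helper: minimum number of samples a cluster must span."""
    if include_classes is None:
        # No classes selected: use the total number of samples.
        n_samples = sum(samples_per_class.values())
    else:
        # Assumption: the smallest included class sets the threshold.
        n_samples = min(samples_per_class[c] for c in include_classes)
    # Round to the nearest integer, but require at least one sample.
    return max(1, round(min_fraction * n_samples))
```

With `{"QC": 6, "healthy": 20, "disease": 18}` and `min_fraction=0.3`, this sketch gives 13 when include_classes is None (44 samples in total) and 5 when only `["healthy", "disease"]` are included.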

Returns:

results (dict): Results of the feature matching, with the following keys:
clusters_: Cluster label of each feature, where each label corresponds to a different ionic species. Features labelled with -1 are considered noise.
indecisiveness: Fraction of features in a cluster that could potentially be assigned to more than one cluster. Values close to zero indicate a higher-quality grouping.
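The relation between max_deviation and the indecisiveness score can be illustrated with a small sketch. It assumes, as a simplification not confirmed by this page, that each cluster is summarised by a per-axis (diagonal) Gaussian with a mean and standard deviation in m/z and Rt, and that a feature "could be assigned" to a cluster when its deviation along both axes is at most max_deviation standard deviations. The names `is_assignable` and `indecisiveness_score` are hypothetical.

```python
from typing import List, Tuple

# A cluster is summarised as (mean_mz, std_mz, mean_rt, std_rt);
# this diagonal-Gaussian summary is an assumption of the sketch.
Cluster = Tuple[float, float, float, float]


def is_assignable(
    feature: Tuple[float, float], cluster: Cluster, max_deviation: float
) -> bool:
    """True if the feature lies within max_deviation std along both axes."""
    mz, rt = feature
    mean_mz, std_mz, mean_rt, std_rt = cluster
    z_mz = abs(mz - mean_mz) / std_mz  # deviation in m/z, in std units
    z_rt = abs(rt - mean_rt) / std_rt  # deviation in Rt, in std units
    return max(z_mz, z_rt) <= max_deviation


def indecisiveness_score(
    features: List[Tuple[float, float]],
    clusters: List[Cluster],
    max_deviation: float,
) -> float:
    """Fraction of features assignable to more than one cluster."""
    if not features:
        return 0.0
    ambiguous = sum(
        1
        for f in features
        if sum(is_assignable(f, c, max_deviation) for c in clusters) > 1
    )
    return ambiguous / len(features)
```

With two clusters centred at (100.0, 10.0) and (100.004, 10.2), both with std 0.001 in m/z and 0.1 in Rt, the feature (100.002, 10.1) lies about 2 standard deviations from each centre, so with `max_deviation=3.0` it counts as ambiguous and three features yield a score of 1/3; lowering max_deviation to 1.5 makes it assignable to neither cluster and the score drops to 0.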