tidyms.correspondence
Functions used to match features
- match_features(feature_table: DataFrame, samples_per_class: dict, include_classes: List[int] | None, mz_tolerance: float, rt_tolerance: float, min_fraction: float, max_deviation: float, n_jobs: int | None = None, verbose: bool = False)
Match features across samples using DBSCAN and Gaussian mixture models (GMM). See the user guide for a detailed description of the algorithm.
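The tolerance-based grouping step can be illustrated with a minimal, self-contained DBSCAN sketch. This is not the tidyms implementation: the helper `dbscan_tolerance` is hypothetical, and the way the two tolerances are combined (dividing m/z and Rt by their tolerances, then using eps = 1 with the Chebyshev, or maximum-coordinate, distance) is an assumption chosen so that two features are neighbours only when they are within `mz_tolerance` and within `rt_tolerance`. `min_samples` is the usual DBSCAN density threshold.

```python
from typing import List, Optional, Sequence, Tuple


def dbscan_tolerance(
    features: Sequence[Tuple[float, float]],
    mz_tolerance: float,
    rt_tolerance: float,
    min_samples: int,
) -> List[int]:
    """Hypothetical helper: label (mz, rt) features with DBSCAN; -1 is noise.

    Assumption: each axis is divided by its tolerance, so eps = 1 under the
    Chebyshev distance means "within mz_tolerance AND within rt_tolerance".
    """
    scaled = [(mz / mz_tolerance, rt / rt_tolerance) for mz, rt in features]

    def neighbors(i: int) -> List[int]:
        # All points (including i itself) within eps = 1 in scaled units.
        xi, yi = scaled[i]
        return [j for j, (xj, yj) in enumerate(scaled)
                if max(abs(xi - xj), abs(yi - yj)) <= 1.0]

    labels: List[Optional[int]] = [None] * len(scaled)  # None = unvisited
    cluster = -1
    for i in range(len(scaled)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: expand
    return [int(label) for label in labels]


features = [(100.0, 10.0), (100.001, 10.5), (100.002, 9.8),
            (250.0, 30.0), (100.0, 20.0)]
labels = dbscan_tolerance(features, mz_tolerance=0.005,
                          rt_tolerance=2.0, min_samples=2)
print(labels)
```

With these values the first three features form one group, while the last two are labelled -1: one is far from every other feature, and the other matches in m/z but lies outside the Rt tolerance.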
- Parameters:
- feature_table : pd.DataFrame
Feature table obtained after feature detection.
- samples_per_class : dict
Maps a class name to the number of samples in the class.
- include_classes : List or None, default=None
Sample classes used to estimate the minimum cluster size and the number of chemical species in a cluster.
- mz_tolerance : float
m/z tolerance used to group close features. Sets the eps parameter in the DBSCAN algorithm.
- rt_tolerance : float
Retention time (Rt) tolerance used to group close features. Together with mz_tolerance, it sets the eps parameter in the DBSCAN algorithm.
- min_fraction : float
Minimum fraction of samples of a given class in a cluster. If include_classes is None, the total number of samples is used to compute the minimum fraction.
- max_deviation : float
The maximum deviation of a feature from a cluster, measured in number of standard deviations from the cluster.
- n_jobs : int or None, default=None
Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
- verbose : bool, default=False
If True, shows a progress bar.
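The interplay between include_classes, samples_per_class and min_fraction can be sketched as a small helper that turns the fraction into a minimum cluster size. The function `min_cluster_size` is hypothetical, and two choices in it are assumptions rather than documented tidyms behaviour: when include_classes is given, the smallest included class sets the threshold, and the product is rounded up.

```python
import math
from typing import Dict, Hashable, Optional, Sequence


def min_cluster_size(
    samples_per_class: Dict[Hashable, int],
    include_classes: Optional[Sequence[Hashable]],
    min_fraction: float,
) -> int:
    """Hypothetical helper: minimum number of samples a cluster must contain."""
    if include_classes is None:
        # No classes selected: use the total number of samples.
        n_samples = sum(samples_per_class.values())
    else:
        # Assumption: the smallest included class sets the threshold.
        n_samples = min(samples_per_class[c] for c in include_classes)
    # Round to 9 decimals first so float noise (0.3 * 10 == 3.0000000000000004)
    # does not push the result up by one, then round up; require at least 1.
    return max(1, math.ceil(round(min_fraction * n_samples, 9)))


samples_per_class = {"blank": 4, "QC": 6, "sample": 20}
print(min_cluster_size(samples_per_class, None, 0.25))               # -> 8
print(min_cluster_size(samples_per_class, ["QC", "sample"], 0.25))   # -> 2
```

Using the smallest included class keeps the threshold reachable for every included class; other choices, such as a per-class threshold, are equally plausible.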
- Returns:
- results : dict
Dictionary with the following entries:
- clusters_ : results of the feature matching. Each label corresponds to a different ionic species. Features labelled with -1 are considered noise.
- indecisiveness : fraction of features in a cluster that could potentially be assigned to more than one cluster. Values close to zero indicate a higher quality grouping.
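The two quality criteria, max_deviation and indecisiveness, can be sketched together. The assumptions here are not taken from tidyms: each ionic species in a cluster is summarised by a Gaussian with a mean and a standard deviation along m/z and Rt (a diagonal covariance), a feature's deviation is the larger of its two per-axis z-scores, and a feature counts as indecisive when it lies within max_deviation of more than one species. The helpers `candidate_components` and `indecisiveness` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (mz, rt)


def candidate_components(
    feature: Point,
    means: Sequence[Point],
    stds: Sequence[Point],
    max_deviation: float,
) -> List[int]:
    """Indices of species whose Gaussian lies within max_deviation stds."""
    mz, rt = feature
    candidates = []
    for k, ((mu_mz, mu_rt), (sd_mz, sd_rt)) in enumerate(zip(means, stds)):
        # Assumption: deviation is the larger of the per-axis z-scores.
        z_mz = abs(mz - mu_mz) / sd_mz
        z_rt = abs(rt - mu_rt) / sd_rt
        if max(z_mz, z_rt) <= max_deviation:
            candidates.append(k)
    return candidates


def indecisiveness(
    features: Sequence[Point],
    means: Sequence[Point],
    stds: Sequence[Point],
    max_deviation: float,
) -> float:
    """Fraction of features that could be assigned to more than one species."""
    if not features:
        return 0.0
    n_ambiguous = sum(
        1 for f in features
        if len(candidate_components(f, means, stds, max_deviation)) > 1
    )
    return n_ambiguous / len(features)


means = [(100.0, 10.0), (100.0, 12.0)]   # two species in one cluster
stds = [(0.001, 0.5), (0.001, 0.5)]
features = [(100.0, 10.0), (100.0, 11.0), (100.0, 12.1), (100.0, 20.0)]
print(indecisiveness(features, means, stds, max_deviation=3.0))
```

Only the feature at Rt 11.0 lies within three standard deviations of both species, so the indecisiveness is 0.25. The feature at Rt 20.0 fits no species; under these assumptions it would be a candidate for the noise label -1.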