tidyms.utils
Utility functions used inside several modules.
- array1d_to_str(arr: ndarray)
Encode a numpy array into a string.
- Parameters:
- arr: array
- Returns:
- str
- cv(df: DataFrame | Series, fill_value: float | None = None) -> Series | float
Computes the coefficient of variation for each column.
Used by DataContainer objects to compute metrics.
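As a rough illustration of the metric, the coefficient of variation of a column is its standard deviation divided by its mean. The sketch below is not the tidyms implementation: `cv_sketch` is a hypothetical helper, and both the use of the sample standard deviation and the way `fill_value` replaces undefined results are assumptions.

```python
import pandas as pd


def cv_sketch(df, fill_value=None):
    """Coefficient of variation (std / mean) for each column of a DataFrame."""
    res = df.std() / df.mean()  # pandas computes the sample std (ddof=1)
    if fill_value is not None:
        # Assumption: undefined results (such as 0 / 0) become fill_value
        res = res.fillna(fill_value)
    return res


data = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0.0, 0.0, 0.0]})
result = cv_sketch(data, fill_value=0.0)
```

Column `b` has zero mean and zero spread, so its ratio is undefined and is the case where a fill value matters.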
- detection_rate(df: DataFrame | Series, threshold: float = 0.0) -> Series | float
Computes the fraction of values in a column above the threshold.
- Parameters:
- df: DataFrame
- threshold: float
- Returns:
- dr: pd.Series
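The detection rate can be pictured as the mean of a boolean mask. This is a minimal sketch under assumptions, not the library code: `detection_rate_sketch` is a hypothetical name, and the strict `>` comparison (which also counts missing values as not detected) is a guess at the intended behavior.

```python
import pandas as pd


def detection_rate_sketch(df, threshold=0.0):
    """Fraction of values in each column that are strictly above threshold."""
    # Assumption: strict comparison; NaN compares as False (not detected)
    return (df > threshold).mean()


data = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0], "b": [0.0, 0.0, 0.0, 5.0]})
rates = detection_rate_sketch(data)
```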
- find_closest(x: ndarray, xq: ndarray | float | int, is_sorted: bool = True) -> ndarray
Finds, for each query value in xq, the index of the closest value in x.
- Parameters:
- x: array
Array to search in.
- xq: array or scalar
Query values.
- is_sorted: bool, default=True
If True, assumes that x is sorted.
- Returns:
- array of indices in x
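One common way to implement this kind of nearest-value lookup is a binary search followed by a comparison of the two neighboring candidates. The sketch below shows that approach with `numpy.searchsorted`. It is an assumption about how such a search can work, not the tidyms source; `find_closest_sketch` is a hypothetical name, and the code assumes `x` has at least two elements.

```python
import numpy as np


def find_closest_sketch(x, xq, is_sorted=True):
    """Index in x of the element closest to each value in xq."""
    x = np.asarray(x)
    xq = np.atleast_1d(xq)
    # Work on a sorted view of x and remember how to map indices back
    order = np.arange(x.size) if is_sorted else np.argsort(x)
    xs = x[order]
    # Insertion points, clipped so that a left and a right neighbor exist
    right = np.clip(np.searchsorted(xs, xq), 1, xs.size - 1)
    left = right - 1
    # Keep whichever neighbor is nearer to the query value
    closest = np.where(xq - xs[left] <= xs[right] - xq, left, right)
    return order[closest]


x_sorted = np.array([1.0, 2.0, 4.0, 8.0])
idx = find_closest_sketch(x_sorted, np.array([0.0, 2.9, 3.1, 100.0]))

x_unsorted = np.array([8.0, 1.0, 4.0, 2.0])
idx_unsorted = find_closest_sketch(x_unsorted, 2.9, is_sorted=False)
```

Sorting first costs O(n log n), which is why a flag that lets the caller skip it for already sorted data is useful.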
- gauss(x: ndarray, mu: float, sigma: float, amp: float)
Gaussian curve.
- Parameters:
- x: np.array
- mu: float
- sigma: float
- amp: float
- Returns:
- gaussian: np.array
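For reference, a Gaussian curve with these three parameters is usually written as amp * exp(-0.5 * ((x - mu) / sigma)^2). The sketch below evaluates that expression; treating `amp` as the peak height rather than the area is an assumption, and `gauss_sketch` is a hypothetical name.

```python
import numpy as np


def gauss_sketch(x, mu, sigma, amp):
    """Gaussian curve centered at mu with width sigma and peak height amp."""
    # Assumption: amp is the height at x == mu, not the integrated area
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


x = np.array([-1.0, 0.0, 1.0])
y = gauss_sketch(x, mu=0.0, sigma=1.0, amp=2.0)
```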
- gaussian_mixture(x: ndarray, params: ndarray) -> ndarray
Mixture of Gaussian curves.
- Parameters:
- x: array
- params: np.ndarray
Parameters for each curve. The shape of the array is n_curves by 3. Each row has the parameters for one curve (mu, sigma, amp).
- Returns:
- mixture: np.ndarray
Array with Gaussian curves. Each row is a Gaussian curve. The shape of the array is params.shape[0] by x.size.
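The shapes described above follow naturally from NumPy broadcasting: column vectors of parameters combined with a 1-D `x` give one row per curve. The following sketch shows this under the assumption that `amp` is the peak height; `gaussian_mixture_sketch` is a hypothetical name and not the library function.

```python
import numpy as np


def gaussian_mixture_sketch(x, params):
    """Evaluate one Gaussian per row of params; result is n_curves by x.size."""
    # Slicing with 0:1 keeps each parameter as a column vector (n_curves, 1)
    mu, sigma, amp = params[:, 0:1], params[:, 1:2], params[:, 2:3]
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


x = np.linspace(0.0, 10.0, 101)
params = np.array([[3.0, 0.5, 1.0], [7.0, 1.0, 2.0]])  # rows: (mu, sigma, amp)
mixture = gaussian_mixture_sketch(x, params)
signal = mixture.sum(axis=0)  # combined signal, if the sum of curves is needed
```

Because each curve is kept in its own row, the caller decides whether to sum them or inspect them individually.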
- get_filename(full_path: str) -> str
Gets the filename from a full path.
- Parameters:
- full_path: str
- Returns:
- filename: str
- get_settings() -> dict
Loads the settings into a dictionary object.
- Returns:
- settings: dict
- get_tidyms_path() -> str
Returns the path to the directory where datasets and config files are stored.
- Returns:
- path: str
- is_notebook() -> bool
Returns True if the environment is a Jupyter notebook.
- Returns:
- bool
- mad(df: DataFrame | Series) -> Series | float
Computes the median absolute deviation for each column. Missing values are filled with zero.
- metadata_correlation(y, x, mode: str = 'ols')
Computes correlation metrics between two variables.
- Parameters:
- y: array
- x: array
- mode: {“ols”, “pearson”, “spearman”}
ols computes r squared, the Jarque-Bera test p-value and the Durbin-Watson statistic from the ordinary least squares linear regression. spearman computes the Spearman rank correlation coefficient.
- Returns:
- dict
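To make two of the ols metrics concrete, the sketch below fits a straight line with NumPy and derives r squared and the Durbin-Watson statistic from the residuals. It is only an illustration under assumptions: `ols_metrics_sketch` and the dictionary keys are hypothetical, the Jarque-Bera p-value is left out, and the actual function may rely on a statistics package instead.

```python
import numpy as np


def ols_metrics_sketch(y, x):
    """r squared and Durbin-Watson statistic of the linear fit y ~ x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    # r squared: fraction of the variance of y explained by the fit
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    # Durbin-Watson: near 2 means no autocorrelation in consecutive residuals
    dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
    return {"r2": r2, "DW": dw}  # hypothetical key names


x = np.arange(10.0)
y = 2.0 * x + 1.0 + 0.5 * (-1.0) ** np.arange(10)  # line plus alternating noise
metrics = ols_metrics_sketch(y, x)
```

The alternating noise produces negatively autocorrelated residuals, which pushes the Durbin-Watson statistic above 2.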
- normalize(df: DataFrame, method: str, feature: str | None = None) -> DataFrame
Normalize samples using different methods.
- Parameters:
- df: pandas.DataFrame
- method: {“sum”, “max”, “euclidean”, “feature”}
Normalization method. sum normalizes using the sum along each row. max normalizes using the maximum of each row. euclidean normalizes using the Euclidean norm of each row. feature normalizes using the value of a specified feature in each row.
- feature: str, optional
Feature used for normalization in feature mode.
- Returns:
- normalized: pandas.DataFrame
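All four methods amount to dividing each row by a per-row factor, which the sketch below spells out. It assumes rows are samples and columns are features, as the description suggests; `normalize_sketch` is a hypothetical helper and not the tidyms code.

```python
import pandas as pd


def normalize_sketch(df, method, feature=None):
    """Divide each row (sample) by a normalization factor chosen by method."""
    if method == "sum":
        factor = df.sum(axis=1)
    elif method == "max":
        factor = df.max(axis=1)
    elif method == "euclidean":
        factor = (df ** 2).sum(axis=1) ** 0.5
    elif method == "feature":
        factor = df[feature]  # value of the chosen feature in each sample
    else:
        raise ValueError(f"unknown method: {method}")
    return df.divide(factor, axis=0)


data = pd.DataFrame({"f1": [1.0, 3.0], "f2": [3.0, 4.0]}, index=["s1", "s2"])
by_sum = normalize_sketch(data, "sum")
by_norm = normalize_sketch(data, "euclidean")
by_feature = normalize_sketch(data, "feature", feature="f1")
```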
- robust_cv(df: DataFrame | Series, fill_value: float | None = None) -> Series | float
Estimation of the coefficient of variation using the MAD and median. Assumes a normal distribution.
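Under a normal distribution, the MAD multiplied by about 1.4826 is a consistent estimate of the standard deviation, so a robust coefficient of variation can be formed by dividing that estimate by the median. The sketch below illustrates the idea; whether tidyms applies exactly this constant is an assumption, `robust_cv_sketch` is a hypothetical name, and the `fill_value` handling is omitted.

```python
import pandas as pd

MAD_TO_SD = 1.4826  # consistency constant for normally distributed data


def robust_cv_sketch(df):
    """Robust coefficient of variation (scaled MAD / median) for each column."""
    median = df.median()
    mad = (df - median).abs().median()  # median absolute deviation
    return MAD_TO_SD * mad / median


data = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0]})
result = robust_cv_sketch(data)
```

Unlike the standard deviation, the MAD is barely affected by a single extreme value, which is the motivation for this variant.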
- sample_to_path(samples, path)
Maps sample names to raw paths if available.
- Parameters:
- samples: Iterable[str]
Sample names.
- path: str
Path to raw sample data.
- Returns:
- d: dict
- scale(df: DataFrame, method: str) -> DataFrame
Scales features using different methods.
- Parameters:
- df: pandas.DataFrame
- method: {“autoscaling”, “rescaling”, “pareto”}
Scaling method. autoscaling performs mean centering and scaling of features to unitary variance. rescaling scales data to a 0-1 range. pareto performs mean centering and scaling using the square root of the standard deviation.
- Returns:
- scaled: pandas.DataFrame
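The three scaling methods can be written as short column-wise expressions. The sketch below follows the descriptions above and assumes the sample standard deviation (pandas default); `scale_sketch` is a hypothetical helper rather than the library function.

```python
import pandas as pd


def scale_sketch(df, method):
    """Scale each column (feature) of df according to method."""
    if method == "autoscaling":
        # Mean centering, then unit variance
        return (df - df.mean()) / df.std()
    if method == "rescaling":
        # Map each column onto the 0-1 range
        return (df - df.min()) / (df.max() - df.min())
    if method == "pareto":
        # Mean centering, then division by sqrt(std)
        return (df - df.mean()) / df.std() ** 0.5
    raise ValueError(f"unknown method: {method}")


data = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})
auto = scale_sketch(data, "autoscaling")
rescaled = scale_sketch(data, "rescaling")
pareto = scale_sketch(data, "pareto")
```

Pareto scaling shrinks large features less aggressively than autoscaling, so the scaled column keeps a standard deviation equal to the square root of the original one.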
- sd_ratio(df1: DataFrame, df2: DataFrame, robust: bool = False, fill_value: float | None = None) -> Series
Computes the ratio between the standard deviation of the columns of df1 and df2.
Used to compute the D-Ratio metric.
- Parameters:
- df1: DataFrame with shape (n1, m)
- df2: DataFrame with shape (n2, m)
- robust: bool
If True, uses the MAD as an estimator of the standard deviation. Otherwise, computes the sample standard deviation.
- fill_value: float, optional
Number used to fill NaNs.
- Returns:
- ratio: pd.Series
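A column-wise ratio of spreads can be sketched in a few lines. The version below is an illustration under assumptions: `sd_ratio_sketch` is a hypothetical name, the robust branch uses the raw MAD (any constant scale factor cancels in the ratio), and `fill_value` is assumed to replace undefined ratios in the result.

```python
import pandas as pd


def _spread(df, robust):
    """Column-wise spread: MAD if robust, else the sample standard deviation."""
    if robust:
        return (df - df.median()).abs().median()
    return df.std()


def sd_ratio_sketch(df1, df2, robust=False, fill_value=None):
    """Ratio of the column spreads of df1 to those of df2."""
    ratio = _spread(df1, robust) / _spread(df2, robust)
    if fill_value is not None:
        # Assumption: undefined ratios (such as 0 / 0) become fill_value
        ratio = ratio.fillna(fill_value)
    return ratio


df1 = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [5.0, 5.0, 5.0]})
df2 = pd.DataFrame({"a": [2.0, 4.0, 6.0, 8.0], "b": [1.0, 2.0, 3.0, 4.0]})
ratio = sd_ratio_sketch(df1, df2)
robust_ratio = sd_ratio_sketch(df1, df2, robust=True)
```

The two frames may have different numbers of rows, but they must share the same columns for the division to align.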
- str_to_array1d(s: str)
Decode a string generated with array1d_to_str into a numpy array.
- Parameters:
- s: str
- Returns:
- numpy.ndarray