tidyms.utils

Utility functions used inside several modules.

array1d_to_str(arr: ndarray)

Encode a numpy array into a string.

Parameters:
arr: array
Returns:
str

cv(df: DataFrame | Series, fill_value: float | None = None) -> Series | float

Computes the coefficient of variation for each column.

Used by DataContainer objects to compute metrics.
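
As a rough illustration of the quantity described above, the coefficient of variation can be sketched with plain pandas. The helper name `cv_sketch` is hypothetical, and the use of the sample standard deviation (pandas' default, `ddof=1`) and of `fill_value` to replace undefined results are assumptions, not confirmed details of the tidyms implementation.

```python
import pandas as pd


def cv_sketch(df, fill_value=None):
    """Hypothetical helper: standard deviation divided by mean, per column."""
    res = df.std() / df.mean()  # pandas uses the sample standard deviation
    if fill_value is not None:
        # Assumption: fill_value replaces undefined results such as 0 / 0
        res = res.fillna(fill_value)
    return res


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0.0, 0.0, 0.0]})
result = cv_sketch(df, fill_value=0.0)
```

Column `b` has zero mean and zero spread, so its ratio is undefined and is replaced by the fill value.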

detection_rate(df: DataFrame | Series, threshold: float = 0.0) -> Series | float

Computes the fraction of values in a column above the threshold.

Parameters:
df: DataFrame
threshold: float
Returns:
dr: pd.Series
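
The fraction described above could be computed along these lines. `detection_rate_sketch` is a hypothetical name, and dividing by the total number of rows (so that missing values count as not detected) is an assumption.

```python
import pandas as pd


def detection_rate_sketch(df, threshold=0.0):
    """Hypothetical helper: fraction of values strictly above threshold, per column."""
    # Comparisons with NaN are False, so missing values count as not detected
    return (df > threshold).sum() / df.shape[0]


df = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0], "b": [0.0, 0.0, 0.0, 5.0]})
dr = detection_rate_sketch(df)
```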

find_closest(x: ndarray, xq: ndarray | float | int, is_sorted: bool = True) -> ndarray

Finds, for each query value, the closest value in another array.

Parameters:
x: array

Array to search.

xq: array

Query values.

is_sorted: bool, default=True

If True, assumes that x is sorted.

Returns:
array of indices in x
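
One common way to implement this kind of lookup on a sorted array is a binary search followed by a comparison of the two neighbors. The sketch below uses `numpy.searchsorted`; the name `find_closest_sketch` is hypothetical, it only covers the sorted case, and it assumes `x` has at least two elements.

```python
import numpy as np


def find_closest_sketch(x, xq):
    """Hypothetical helper: index in sorted x of the value closest to each query."""
    x = np.asarray(x)
    xq = np.atleast_1d(xq)
    ind = np.searchsorted(x, xq)        # insertion points, in 0..x.size
    ind = np.clip(ind, 1, x.size - 1)   # keep both neighbors inside the array
    left, right = x[ind - 1], x[ind]
    # Step back by one where the left neighbor is strictly closer
    return ind - ((xq - left) < (right - xq))


x = np.array([0.0, 1.0, 2.0, 4.0])
closest = find_closest_sketch(x, [0.4, 1.6, 3.5, -1.0, 10.0])
```

Queries outside the range of `x` map to the first or last index. For unsorted input, one option is to sort first with `numpy.argsort` and map the result back through the sorting indices.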

gauss(x: ndarray, mu: float, sigma: float, amp: float)

Gaussian curve.

Parameters:
x: np.array
mu: float
sigma: float
amp: float
Returns:
gaussian: np.array

gaussian_mixture(x: ndarray, params: ndarray) -> ndarray

Mixture of gaussian curves.

Parameters:
x: array
params: np.ndarray

Parameters of each curve. The array has shape (n_curves, 3); each row holds the parameters of one curve: (mu, sigma, amp).

Returns:
mixture: np.ndarray

Array of Gaussian curves, one per row, with shape (params.shape[0], x.size).
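
The shapes involved in gauss and gaussian_mixture can be illustrated with a small sketch. The helper names are hypothetical, and treating amp as the peak height (the value at x == mu) is an assumption; the library may define the amplitude differently.

```python
import numpy as np


def gauss_sketch(x, mu, sigma, amp):
    """Hypothetical helper. Assumption: amp is the curve height at x == mu."""
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


def gaussian_mixture_sketch(x, params):
    """Hypothetical helper: one row per curve; params has shape (n_curves, 3)."""
    mixture = np.zeros((params.shape[0], x.size))
    for k, (mu, sigma, amp) in enumerate(params):
        mixture[k] = gauss_sketch(x, mu, sigma, amp)
    return mixture


x = np.linspace(0, 10, 101)
params = np.array([[3.0, 0.5, 10.0], [7.0, 1.0, 5.0]])  # rows of (mu, sigma, amp)
curves = gaussian_mixture_sketch(x, params)
total = curves.sum(axis=0)  # the summed signal, if a single trace is needed
```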

get_filename(full_path: str) -> str

Gets the filename from a full path.

Parameters:
full_path: str
Returns:
filename: str
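
A minimal sketch of this kind of path handling with the standard library follows. Whether the returned name keeps its extension is not stated here; the hypothetical helper below assumes that both the directory and the extension are dropped.

```python
import os


def get_filename_sketch(full_path):
    """Hypothetical helper. Assumption: directory and extension are both removed."""
    return os.path.splitext(os.path.basename(full_path))[0]


name = get_filename_sketch("data/raw/sample-01.mzML")
```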

get_settings() -> dict

Loads the settings into a dictionary object.

Returns:
settings: dict

get_tidyms_path() -> str

Returns the path to the directory where datasets and config files are stored.

Returns:
path: str

is_notebook() -> bool

Returns True if the environment is a Jupyter notebook.

Returns:
bool
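
A widely used way to detect a notebook is to inspect the class of the active IPython shell. The sketch below follows that pattern; it is not necessarily how tidyms performs the check, and `is_notebook_sketch` is a hypothetical name.

```python
def is_notebook_sketch():
    """Hypothetical helper: True when running inside a Jupyter kernel."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed, so this is not a notebook
    shell = get_ipython()  # None when no IPython shell is active
    # Jupyter kernels use the ZMQInteractiveShell class
    return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"


in_notebook = is_notebook_sketch()
```

A check like this is typically used to choose between notebook widgets and plain console output, for example for progress bars.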

mad(df: DataFrame | Series) -> Series | float

Computes the median absolute deviation for each column. Missing values are filled with zero.
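
The median absolute deviation is the median of the absolute differences between each value and the column median. A minimal pandas sketch follows; `mad_sketch` is a hypothetical name, and filling undefined results (rather than missing inputs) with zero is an assumption about the description above.

```python
import pandas as pd


def mad_sketch(df):
    """Hypothetical helper: median absolute deviation per column."""
    res = (df - df.median()).abs().median()
    return res.fillna(0)  # assumption: undefined results become zero


df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 100.0]})
mad_values = mad_sketch(df)
```

The outlier at 100 barely affects the result, which is why the MAD is often preferred over the standard deviation for noisy data.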

metadata_correlation(y, x, mode: str = 'ols')

Computes correlation metrics between two variables.

Parameters:
y: array
x: array
mode: {“ols”, “pearson”, “spearman”}

ols computes the r squared, the Jarque-Bera test p-value and the Durbin-Watson statistic from the ordinary least squares linear regression. spearman computes the Spearman rank correlation coefficient.

Returns:
dict
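
To make the ols metrics more concrete, the sketch below computes r squared and the Durbin-Watson statistic with NumPy only. The Jarque-Bera p-value is left out because it needs a statistics package. The function name and the dictionary keys are hypothetical and do not reflect the keys returned by the library.

```python
import numpy as np


def ols_metrics_sketch(y, x):
    """Hypothetical helper: r squared and Durbin-Watson from a straight-line fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # least squares line
    resid = y - (slope * x + intercept)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # Durbin-Watson: values near 2 suggest uncorrelated residuals
    dw = np.sum(np.diff(resid) ** 2) / ss_res
    return {"r2": r2, "DW": dw}  # hypothetical keys


x = np.arange(10)
y = 2.0 * x + 1.0 + 0.1 * (-1.0) ** np.arange(10)  # line plus alternating noise
metrics = ols_metrics_sketch(y, x)
```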

normalize(df: DataFrame, method: str, feature: str | None = None) -> DataFrame

Normalize samples using different methods.

Parameters:
df: pandas.DataFrame
method: {“sum”, “max”, “euclidean”, “feature”}

Normalization method. sum normalizes using the sum along each row. max normalizes using the maximum of each row. euclidean normalizes using the Euclidean norm of each row. feature normalizes area using the value of a specified feature.

feature: str, optional

Feature used for normalization in feature mode.

Returns:
normalized: pandas.DataFrame
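
The four methods differ only in the per-row factor that each sample is divided by. A minimal sketch, with the hypothetical name `normalize_sketch`, assuming rows are samples and columns are features:

```python
import numpy as np
import pandas as pd


def normalize_sketch(df, method, feature=None):
    """Hypothetical helper: divide each row by a method-dependent factor."""
    if method == "sum":
        factor = df.sum(axis=1)
    elif method == "max":
        factor = df.max(axis=1)
    elif method == "euclidean":
        factor = np.sqrt((df ** 2).sum(axis=1))
    elif method == "feature":
        factor = df[feature]
    else:
        raise ValueError(f"unknown method: {method}")
    return df.divide(factor, axis=0)  # align the factor with the row index


df = pd.DataFrame({"f1": [3.0, 1.0], "f2": [4.0, 1.0]}, index=["s1", "s2"])
by_sum = normalize_sketch(df, "sum")
by_norm = normalize_sketch(df, "euclidean")
by_feature = normalize_sketch(df, "feature", feature="f1")
```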

robust_cv(df: DataFrame | Series, fill_value: float | None = None) -> Series | float

Estimates the coefficient of variation using the MAD and the median. Assumes normally distributed data.
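
For normally distributed data, 1.4826 times the MAD is a consistent estimator of the standard deviation, so a robust coefficient of variation can be sketched as follows. `robust_cv_sketch` is a hypothetical name, and the exact constant and fill behavior used by the library are assumptions.

```python
import pandas as pd


def robust_cv_sketch(df, fill_value=None):
    """Hypothetical helper: 1.4826 * MAD / median, per column."""
    median = df.median()
    mad = (df - median).abs().median()
    res = 1.4826 * mad / median  # scaled MAD stands in for the standard deviation
    if fill_value is not None:
        res = res.fillna(fill_value)  # assumption: replaces undefined results
    return res


df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 100.0]})
rcv = robust_cv_sketch(df)
```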

sample_to_path(samples, path)

Maps sample names to raw data paths, if available.

Parameters:
samples: Iterable[str]

Sample names.

path: str

Path to raw sample data.

Returns:
d: dict
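
A mapping like this can be built by listing the directory and matching file names against sample names. The sketch below assumes that a raw file belongs to a sample when its name without the extension equals the sample name; that matching rule and the helper name are assumptions.

```python
import os
import tempfile


def sample_to_path_sketch(samples, path):
    """Hypothetical helper: map each sample name to a matching file in path."""
    available = {
        os.path.splitext(f)[0]: os.path.join(path, f) for f in os.listdir(path)
    }
    # Samples without a matching file are left out of the mapping
    return {s: available[s] for s in samples if s in available}


with tempfile.TemporaryDirectory() as tmp:
    for file_name in ("s1.mzML", "s2.mzML"):
        open(os.path.join(tmp, file_name), "w").close()
    mapping = sample_to_path_sketch(["s1", "s3"], tmp)
```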

scale(df: DataFrame, method: str) -> DataFrame

Scales features using different methods.

Parameters:
df: pandas.DataFrame
method: {“autoscaling”, “rescaling”, “pareto”}

Scaling method. autoscaling performs mean centering and scaling of features to unit variance. rescaling scales data to a 0-1 range. pareto performs mean centering and scaling using the square root of the standard deviation.

Returns:
scaled: pandas.DataFrame
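
The three scaling methods described above can be written compactly with pandas column-wise statistics. This is a sketch under the assumption that columns are features and that the sample standard deviation is used; `scale_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def scale_sketch(df, method):
    """Hypothetical helper: column-wise scaling of a feature table."""
    if method == "autoscaling":
        return (df - df.mean()) / df.std()
    if method == "rescaling":
        return (df - df.min()) / (df.max() - df.min())
    if method == "pareto":
        return (df - df.mean()) / np.sqrt(df.std())
    raise ValueError(f"unknown method: {method}")


df = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [10.0, 20.0, 60.0]})
auto = scale_sketch(df, "autoscaling")
rescaled = scale_sketch(df, "rescaling")
pareto = scale_sketch(df, "pareto")
```

Pareto scaling reduces the dominance of high-variance features less aggressively than autoscaling: each scaled column ends up with a standard deviation equal to the square root of its original one.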

sd_ratio(df1: DataFrame, df2: DataFrame, robust: bool = False, fill_value: float | None = None) -> Series

Computes the ratio between the standard deviations of the columns of df1 and df2.

Used to compute the D-Ratio metric.

Parameters:
df1: DataFrame with shape (n1, m)
df2: DataFrame with shape (n2, m)
robust: bool

If True, uses the MAD as an estimator of the standard deviation. Otherwise, computes the sample standard deviation.

fill_value: Number used to fill NaNs.
Returns:
ratio: pd.Series
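
A sketch of the column-wise ratio follows. The helper name, the 1.4826 scaling of the MAD in the robust branch, and the example tables (`qc` for quality control samples, `study` for study samples) are assumptions made for illustration.

```python
import pandas as pd


def sd_ratio_sketch(df1, df2, robust=False, fill_value=None):
    """Hypothetical helper: ratio of column-wise spread of df1 to that of df2."""
    if robust:
        # Assumption: MAD scaled by 1.4826 estimates the standard deviation
        sd1 = 1.4826 * (df1 - df1.median()).abs().median()
        sd2 = 1.4826 * (df2 - df2.median()).abs().median()
    else:
        sd1, sd2 = df1.std(), df2.std()
    ratio = sd1 / sd2
    if fill_value is not None:
        ratio = ratio.fillna(fill_value)
    return ratio


qc = pd.DataFrame({"f1": [10.0, 11.0, 12.0]})     # hypothetical QC samples
study = pd.DataFrame({"f1": [5.0, 10.0, 15.0]})   # hypothetical study samples
ratio = sd_ratio_sketch(qc, study)
```

A low ratio indicates that the spread in the first table is small compared with the spread in the second.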

str_to_array1d(s: str)

Decodes a string generated with array1d_to_str into a numpy array.

Parameters:
s: str
Returns:
numpy.ndarray
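
The contract between array1d_to_str and str_to_array1d is a round trip: decoding an encoded array should give back the original values. The encoding itself is not specified here, so the sketch below uses a space-separated text format purely as an assumption, with hypothetical helper names.

```python
import numpy as np


def array_to_str_sketch(arr):
    """Hypothetical encoder. Assumed format: space-separated decimal values."""
    return " ".join(repr(float(v)) for v in arr)


def str_to_array_sketch(s):
    """Hypothetical decoder for the format produced by array_to_str_sketch."""
    return np.array([float(v) for v in s.split()])


arr = np.array([0.1, 2.5, 1e-7])
encoded = array_to_str_sketch(arr)
decoded = str_to_array_sketch(encoded)
```

Using `repr` on Python floats keeps the round trip exact, which matters when encoded arrays are stored in tables and compared later.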

transform(df: DataFrame, method: str) -> DataFrame

Performs common data transformations.

Parameters:
df: pandas.DataFrame
method: {“log”, “power”}

Transform method. log applies the base 10 logarithm to the data. power

Returns:
transformed: pandas.DataFrame