tidyms.utils

Utility functions used inside several modules.

array1d_to_str(arr: ndarray)

Encode a numpy array into a string.

Parameters:

arr: array

Returns:

str

cv(df: DataFrame | Series, fill_value: float | None = None) → Series | float

Computes the coefficient of variation for each column.

Used by DataContainer objects to compute metrics.
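As an illustration, the coefficient of variation of a column is its standard deviation divided by its mean. The sketch below is not the tidyms implementation: the name `cv_sketch` is hypothetical, and it assumes that the sample standard deviation is used and that `fill_value` replaces undefined results.

```python
import pandas as pd


def cv_sketch(df, fill_value=None):
    """Coefficient of variation (std / mean) for each column."""
    # Assumption: sample standard deviation (pandas default, ddof=1).
    result = df.std() / df.mean()
    if fill_value is not None:
        # Assumption: fill_value replaces undefined results such as 0 / 0.
        result = result.fillna(fill_value)
    return result


data = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0.0, 0.0, 0.0]})
cv_values = cv_sketch(data, fill_value=0.0)
print(cv_values)
```

Column `a` has mean 2 and standard deviation 1, so its coefficient of variation is 0.5. Column `b` gives 0 / 0, which is replaced by the fill value.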

detection_rate(df: DataFrame | Series, threshold: float = 0.0) → Series | float

Computes the fraction of values in each column above the threshold.

Parameters:

df: DataFrame
threshold: float

Returns:

dr: pd.Series
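The computation described above can be sketched with a boolean comparison followed by a column mean. The name `detection_rate_sketch` is hypothetical, and the strict comparison (`>`) is an assumption based on the wording "above the threshold".

```python
import pandas as pd


def detection_rate_sketch(df, threshold=0.0):
    """Fraction of values in each column strictly above the threshold."""
    # Comparisons with NaN are False, so missing values count as not detected.
    return (df > threshold).mean()


data = pd.DataFrame({
    "a": [0.0, 5.0, 10.0, 20.0],
    "b": [0.0, 0.0, 0.0, 3.0],
})
dr = detection_rate_sketch(data, threshold=1.0)
print(dr)
```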

find_closest(x: ndarray, xq: ndarray | float | int, is_sorted: bool = True) → ndarray

Finds the index of the value in x closest to each query value in xq.

Parameters:

x: array: Array used to search.
xq: array: Query values.
is_sorted: bool, default=True: If True, assumes that x is sorted.

Returns:

array of indices in x
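A common way to implement this kind of lookup is a binary search followed by a comparison of the two neighboring values. The sketch below uses `numpy.searchsorted`; the name `find_closest_sketch` is hypothetical, and whether tidyms uses the same approach or the same tie-breaking rule is an assumption.

```python
import numpy as np


def find_closest_sketch(x, xq, is_sorted=True):
    """Index in x of the value closest to each query value in xq."""
    x = np.asarray(x, dtype=float)
    xq = np.atleast_1d(xq).astype(float)
    order = None
    if not is_sorted:
        # Sort a copy and remember how to map back to the original indices.
        order = np.argsort(x)
        x = x[order]
    # Index of the first element of x that is >= each query value.
    right = np.searchsorted(x, xq)
    # Keep both neighbors inside the array (assumes x has at least 2 items).
    right = np.clip(right, 1, x.size - 1)
    left = right - 1
    # Pick the nearer neighbor; ties go to the left one (an assumption).
    use_left = (xq - x[left]) <= (x[right] - xq)
    idx = np.where(use_left, left, right)
    return idx if order is None else order[idx]


x = np.array([1.0, 2.0, 4.0, 8.0])
xq = np.array([0.0, 2.9, 3.1, 100.0])
closest = find_closest_sketch(x, xq)

x_unsorted = np.array([8.0, 1.0, 4.0, 2.0])
closest_unsorted = find_closest_sketch(x_unsorted, xq, is_sorted=False)
```

Query values outside the range of x map to the first or last element. Passing `is_sorted=False` costs an extra sort, but the returned indices still refer to the original, unsorted array.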

gauss(x: ndarray, mu: float, sigma: float, amp: float)

Gaussian curve.

Parameters:

x: np.array
mu: float
sigma: float
amp: float

Returns:

gaussian: np.array
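For reference, a Gaussian curve with these three parameters can be written as amp * exp(-(x - mu)^2 / (2 * sigma^2)). The sketch below assumes that `amp` is the peak height rather than the area under the curve, which the text above does not state; the name `gauss_sketch` is hypothetical.

```python
import numpy as np


def gauss_sketch(x, mu, sigma, amp):
    """Gaussian curve centered at mu, with width sigma and height amp."""
    # Assumption: amp is the value of the curve at x == mu.
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


x = np.array([-1.0, 0.0, 1.0])
y = gauss_sketch(x, mu=0.0, sigma=1.0, amp=2.0)
print(y)
```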

gaussian_mixture(x: ndarray, params: ndarray) → ndarray

Mixture of Gaussian curves.

Parameters:

x: array
params: np.ndarray: Parameters for each curve. The shape of the array is n_curves by 3. Each row has the parameters for one curve (mu, sigma, amp).

Returns:

mixture: np.ndarray: Array with Gaussian curves. Each row is a Gaussian curve. The shape of the array is params.shape[0] by x.size.
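The output shape described above follows naturally from numpy broadcasting: each parameter column is reshaped to (n_curves, 1) and combined with x of shape (x.size,). This is a sketch under the assumption that `amp` is the peak height; the name `gaussian_mixture_sketch` is hypothetical.

```python
import numpy as np


def gaussian_mixture_sketch(x, params):
    """One Gaussian curve per row of params, columns (mu, sigma, amp)."""
    # Column vectors of shape (n_curves, 1) broadcast against x.
    mu = params[:, 0, np.newaxis]
    sigma = params[:, 1, np.newaxis]
    amp = params[:, 2, np.newaxis]
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


x = np.linspace(0.0, 10.0, 101)
params = np.array([
    [3.0, 0.5, 1.0],  # mu, sigma, amp of the first curve
    [7.0, 1.0, 2.0],  # mu, sigma, amp of the second curve
])
curves = gaussian_mixture_sketch(x, params)
# Summing over rows gives the combined signal.
signal = curves.sum(axis=0)
```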

get_filename(full_path: str) → str

Get the filename from a full path.

Parameters:

full_path: str

Returns:

filename: str
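A minimal sketch of this kind of helper using the standard library follows. Whether the file extension is removed is not stated above; the sketch assumes that it is, and the name `get_filename_sketch` is hypothetical.

```python
import os


def get_filename_sketch(full_path):
    """File name of a path, without directory and extension."""
    base = os.path.basename(full_path)
    # Assumption: the extension is dropped from the returned name.
    return os.path.splitext(base)[0]


name = get_filename_sketch(os.path.join("data", "raw", "sample01.mzML"))
print(name)
```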

get_settings() → dict

Loads the settings into a dictionary object.

Returns:

settings: dict

get_tidyms_path() → str

Returns the path to the directory where datasets and config files are stored.

Returns:

path: str

is_notebook() → bool

Returns True if the code is running in a Jupyter notebook.

Returns:

bool
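A widely used idiom for this check inspects the class of the active IPython shell. The sketch below shows that idiom; it is an assumption that tidyms uses the same test, and the name `is_notebook_sketch` is hypothetical.

```python
def is_notebook_sketch():
    """True if running inside a Jupyter notebook kernel."""
    try:
        # get_ipython is injected by IPython; it is undefined in plain Python.
        shell = get_ipython().__class__.__name__
    except NameError:
        return False
    # Jupyter kernels use ZMQInteractiveShell; the terminal IPython
    # shell uses a different class.
    return shell == "ZMQInteractiveShell"


print(is_notebook_sketch())
```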

mad(df: DataFrame | Series) → Series | float

Computes the median absolute deviation for each column. Missing values are filled with zero.

metadata_correlation(y, x, mode: str = 'ols')

Computes correlation metrics between two variables.

Parameters:

y: array
x: array
mode: {“ols”, “pearson”, “spearman”}: ols computes the r squared, the Jarque-Bera test p-value and the Durbin-Watson statistic from the ordinary least squares linear regression. spearman computes the Spearman rank correlation coefficient.

Returns:

dict
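To make the metrics concrete, the sketch below computes two of the ols quantities (r squared and the Durbin-Watson statistic) and the Spearman coefficient with numpy only. It omits the Jarque-Bera test, the function names and dictionary keys are hypothetical, and it does not reproduce the structure of the dictionary returned by tidyms.

```python
import numpy as np


def ols_metrics_sketch(y, x):
    """r squared and Durbin-Watson statistic of a straight-line fit."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    # Durbin-Watson: squared successive residual differences over the
    # residual sum of squares. Values lie between 0 and 4.
    dw = np.sum(np.diff(resid) ** 2) / ss_res
    return {"r2": 1.0 - ss_res / ss_tot, "DW": dw}


def spearman_sketch(y, x):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    # Double argsort yields ranks; this is only valid without ties.
    rank_y = np.argsort(np.argsort(y))
    rank_x = np.argsort(np.argsort(x))
    return np.corrcoef(rank_x, rank_y)[0, 1]


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
ols = ols_metrics_sketch(y, x)
rho = spearman_sketch(x ** 3, x)
```

A monotonic but nonlinear relationship such as `x ** 3` against `x` gives a Spearman coefficient of 1, which is why the rank-based mode is useful when linearity cannot be assumed.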

normalize(df: DataFrame, method: str, feature: str | None = None) → DataFrame

Normalize samples using different methods.

Parameters:

df: pandas.DataFrame
method: {“sum”, “max”, “euclidean”, “feature”}: Normalization method. sum normalizes using the sum along each row. max normalizes using the maximum of each row. euclidean normalizes using the Euclidean norm of the row. feature normalizes each row using the value of a specified feature.
feature: str, optional: Feature used for normalization in feature mode.

Returns:

normalized: pandas.DataFrame
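The four methods differ only in the per-row factor that each sample is divided by. The sketch below illustrates this with pandas; the name `normalize_sketch` is hypothetical, and details such as the handling of zeros or missing values in tidyms are not covered.

```python
import numpy as np
import pandas as pd


def normalize_sketch(df, method, feature=None):
    """Divide each row (sample) by a factor that depends on the method."""
    if method == "sum":
        factor = df.sum(axis=1)
    elif method == "max":
        factor = df.max(axis=1)
    elif method == "euclidean":
        factor = np.sqrt((df ** 2).sum(axis=1))
    elif method == "feature":
        factor = df[feature]
    else:
        raise ValueError(f"Unknown method: {method}")
    # axis=0 aligns the factor with the row index.
    return df.div(factor, axis=0)


data = pd.DataFrame(
    {"f1": [1.0, 3.0], "f2": [3.0, 4.0]}, index=["s1", "s2"]
)
by_sum = normalize_sketch(data, "sum")
by_norm = normalize_sketch(data, "euclidean")
by_feature = normalize_sketch(data, "feature", feature="f1")
```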

robust_cv(df: DataFrame | Series, fill_value: float | None = None) → Series | float

Estimation of the coefficient of variation using the MAD and median. Assumes a normal distribution.
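For normally distributed data, the standard deviation can be estimated as approximately 1.4826 times the MAD, so a robust coefficient of variation is 1.4826 * MAD / median. The sketch below follows that formula. It is an assumption that tidyms uses the same constant and the same meaning of `fill_value`, and the name `robust_cv_sketch` is hypothetical.

```python
import pandas as pd


def robust_cv_sketch(df, fill_value=None):
    """Robust coefficient of variation for each column of a DataFrame."""
    median = df.median()
    # Median absolute deviation around the column median.
    mad = (df - median).abs().median()
    # 1.4826 * MAD estimates the standard deviation of normal data.
    result = 1.4826 * mad / median
    if fill_value is not None:
        # Assumption: fill_value replaces undefined results.
        result = result.fillna(fill_value)
    return result


# The outlier (100.0) barely affects the robust estimate.
data = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 100.0]})
rcv = robust_cv_sketch(data)
plain_cv = data.std() / data.mean()
```

In this example the median is 3 and the MAD is 1, so the robust estimate is about 0.49, while the ordinary coefficient of variation is inflated by the single outlier.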

sample_to_path(samples, path)

Maps sample names to raw data paths, if available.

Parameters:

samples: Iterable[str]: Sample names.
path: str: Path to raw sample data.

Returns:

d: dict
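One plausible reading of this function is sketched below: files in the directory are matched to sample names by their name without extension, and samples with no matching file are left out. Both the matching rule and the omission of missing samples are assumptions, and the name `sample_to_path_sketch` is hypothetical.

```python
import os
import tempfile


def sample_to_path_sketch(samples, path):
    """Map each sample name to a file in path with the same stem."""
    available = {}
    for fname in os.listdir(path):
        stem, _ = os.path.splitext(fname)
        available[stem] = os.path.join(path, fname)
    # Assumption: samples without a matching file are omitted.
    return {s: available[s] for s in samples if s in available}


with tempfile.TemporaryDirectory() as tmp:
    # Create two empty placeholder files standing in for raw data.
    for fname in ("sample1.mzML", "sample2.mzML"):
        open(os.path.join(tmp, fname), "w").close()
    mapping = sample_to_path_sketch(["sample1", "sample3"], tmp)

print(mapping)
```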

scale(df: DataFrame, method: str) → DataFrame

Scales features using different methods.

Parameters:

df: pandas.DataFrame
method: {“autoscaling”, “rescaling”, “pareto”}: Scaling method. autoscaling performs mean centering and scaling of features to unit variance. rescaling scales data to a 0-1 range. pareto performs mean centering and scaling using the square root of the standard deviation.

Returns:

scaled: pandas.DataFrame
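The three methods can be written as column-wise operations, as in the sketch below. The name `scale_sketch` is hypothetical, and the use of the sample standard deviation (pandas default) is an assumption.

```python
import numpy as np
import pandas as pd


def scale_sketch(df, method):
    """Scale each column (feature) of df."""
    if method == "autoscaling":
        # Zero mean and unit variance.
        return (df - df.mean()) / df.std()
    if method == "rescaling":
        # Map the minimum to 0 and the maximum to 1.
        return (df - df.min()) / (df.max() - df.min())
    if method == "pareto":
        # Mean centering, then division by sqrt(standard deviation).
        return (df - df.mean()) / np.sqrt(df.std())
    raise ValueError(f"Unknown method: {method}")


data = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [0.0, 4.0, 8.0]})
auto = scale_sketch(data, "autoscaling")
resc = scale_sketch(data, "rescaling")
pareto = scale_sketch(data, "pareto")
```

Pareto scaling shrinks large features less aggressively than autoscaling: here `f2` has a standard deviation of 4, so its centered values are divided by 2 instead of 4.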

sd_ratio(df1: DataFrame, df2: DataFrame, robust: bool = False, fill_value: float | None = None) → Series

Computes the ratio between the standard deviation of the columns of df1 and df2.

Used to compute the D-Ratio metric.

Parameters:

df1: DataFrame with shape (n1, m)
df2: DataFrame with shape (n2, m)
robust: bool: If True, uses the MAD as an estimator of the standard deviation. Otherwise, computes the sample standard deviation.
fill_value: Number used to fill NaNs.

Returns:

ratio: pd.Series
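A minimal sketch of the ratio follows, with both the sample and the MAD-based estimator. The names `sd_ratio_sketch` and `_mad` are hypothetical, and it is an assumption that `fill_value` is applied to the final ratio.

```python
import pandas as pd


def _mad(df):
    """Median absolute deviation of each column."""
    return (df - df.median()).abs().median()


def sd_ratio_sketch(df1, df2, robust=False, fill_value=None):
    """Column-wise ratio of the spread of df1 to the spread of df2."""
    if robust:
        # Any constant scaling the MAD to a standard deviation cancels
        # in the ratio, so the raw MAD is used here.
        sd1, sd2 = _mad(df1), _mad(df2)
    else:
        sd1, sd2 = df1.std(), df2.std()
    ratio = sd1 / sd2
    if fill_value is not None:
        # Assumption: fill_value replaces undefined ratios such as 0 / 0.
        ratio = ratio.fillna(fill_value)
    return ratio


df1 = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [5.0, 5.0, 5.0]})
df2 = pd.DataFrame({"f1": [0.0, 4.0, 8.0], "f2": [5.0, 5.0, 5.0]})
ratio = sd_ratio_sketch(df1, df2, fill_value=0.0)
robust_ratio = sd_ratio_sketch(df1, df2, robust=True, fill_value=0.0)
```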

str_to_array1d(s: str)

Decode a string generated with array1d_to_str into a numpy array.

Parameters:

s: str

Returns:

numpy.ndarray
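The string format used by array1d_to_str is not described here. The sketch below only illustrates the round trip between the two functions, assuming a hypothetical space-separated encoding; both function names are hypothetical as well.

```python
import numpy as np


def array1d_to_str_sketch(arr):
    """Encode a 1D array as space-separated values (assumed format)."""
    # repr() of a Python float round-trips exactly through float().
    return " ".join(repr(float(v)) for v in arr)


def str_to_array1d_sketch(s):
    """Decode a string produced by array1d_to_str_sketch."""
    return np.array([float(v) for v in s.split()], dtype=float)


original = np.array([0.1, 2.5, 1e-9])
encoded = array1d_to_str_sketch(original)
decoded = str_to_array1d_sketch(encoded)
```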

transform(df: DataFrame, method: str) → DataFrame

Performs common data transformations.

Parameters:

df: pandas.DataFrame
method: {“log”, “power”}: Transform method. log applies the base 10 logarithm to the data. power

Returns:

transformed: pandas.DataFrame