Utilities

sumo.utils.adjusted_rand_index(cl: numpy.ndarray, org: numpy.ndarray)

Clustering accuracy measure calculated by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings
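
The pair-counting definition above can be sketched in plain numpy. This is an illustrative reimplementation, not sumo's own code; the `_sketch` name is hypothetical:

```python
import numpy as np

def adjusted_rand_index_sketch(cl, org):
    """Pair-counting ARI: compares how pairs of samples are grouped
    in the predicted (cl) and true (org) clusterings."""
    cl, org = np.asarray(cl), np.asarray(org)
    # Contingency table between true classes (rows) and predicted clusters (columns)
    classes, class_idx = np.unique(org, return_inverse=True)
    clusters, cluster_idx = np.unique(cl, return_inverse=True)
    contingency = np.zeros((classes.size, clusters.size))
    np.add.at(contingency, (class_idx, cluster_idx), 1)

    def comb2(x):
        # number of unordered pairs that can be drawn from x items
        return x * (x - 1) / 2.0

    sum_comb = comb2(contingency).sum()
    sum_rows = comb2(contingency.sum(axis=1)).sum()
    sum_cols = comb2(contingency.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / comb2(cl.size)
    max_index = (sum_rows + sum_cols) / 2.0
    # ARI = (index - expected index) / (max index - expected index)
    return (sum_comb - expected) / (max_index - expected)
```

Identical clusterings score 1 regardless of how the cluster ids are labeled, and the score is adjusted for chance, so random assignments score near (or below) 0.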

sumo.utils.check_accuracy(cl: numpy.ndarray, org: numpy.ndarray, method='purity')

Check clustering accuracy

Args:
cl (Numpy.ndarray): one dimensional array containing computed cluster ids for every node
org (Numpy.ndarray): one dimensional array containing true class ids for every node
method (str): accuracy assessment function from [‘NMI’, ‘purity’, ‘ARI’]
sumo.utils.check_categories(a: numpy.ndarray)

Check categories in data

sumo.utils.check_matrix_symmetry(m: numpy.ndarray, tol=1e-08, equal_nan=True)

Check symmetry of numpy array, after removal of missing samples
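
A minimal sketch of such a symmetry test (illustrative only; sumo's actual handling of missing samples may differ, and the `_sketch` name is hypothetical):

```python
import numpy as np

def check_symmetry_sketch(m, tol=1e-8, equal_nan=True):
    """A matrix is symmetric iff it equals its transpose within tolerance;
    NaNs in matching positions compare equal when equal_nan=True."""
    m = np.asarray(m)
    if m.shape[0] != m.shape[1]:
        return False
    return np.allclose(m, m.T, atol=tol, equal_nan=equal_nan)
```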

sumo.utils.close_logger(logger)

Remove all handlers of logger

sumo.utils.docstring_formatter(*args, **kwargs)

Decorator allowing for printing variable values in docstrings

sumo.utils.extract_max_value(h: numpy.ndarray)

Select clusters based on maximum value in feature matrix H for every sample/row

Args:
h (Numpy.ndarray): feature matrix from optimization algorithm run, of shape (n, k), where ‘n’ is the number of nodes and ‘k’ is the number of clusters
Returns:
one dimensional array containing cluster ids for every node
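The row-wise maximum rule can be sketched with a single argmax (illustrative; the `_sketch` name is hypothetical):

```python
import numpy as np

def extract_max_value_sketch(h):
    """Assign each sample (row of H) to the cluster (column)
    holding that row's maximum value."""
    return np.argmax(np.asarray(h), axis=1)
```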
sumo.utils.extract_ncut(a: numpy.ndarray, k: int)

Select clusters using normalized cut based on graph similarity matrix

Args:
a (Numpy.ndarray): symmetric similarity matrix
k (int): number of clusters
Returns:
one dimensional array containing cluster ids for every node
sumo.utils.extract_spectral(h: numpy.ndarray, assign_labels: str = 'kmeans', n_neighbors: int = 10, n_clusters: int = None)

Select clusters using spectral clustering of feature matrix H

Args:
h (Numpy.ndarray): feature matrix from optimization algorithm run, of shape (n, k), where ‘n’ is the number of nodes and ‘k’ is the number of clusters
assign_labels ({‘kmeans’, ‘discretize’}): strategy used to assign labels in the embedding space
n_neighbors (int): number of neighbors to use when constructing the affinity matrix
n_clusters (int): number of clusters; if not set, the number of columns of ‘h’ is used
Returns:
one dimensional array containing cluster ids for every node
sumo.utils.filter_features_and_samples(data: pandas.DataFrame, drop_features: float = 0.1, drop_samples: float = 0.1)

Filter data frame features and samples

Args:
data (pandas.DataFrame): data frame with samples in columns and features in rows
drop_features (float): if the fraction of missing values for a feature exceeds this value, remove the feature
drop_samples (float): if the fraction of missing values for a sample (among features that remain after feature dropping) exceeds this value, remove the sample
Returns:
filtered data frame
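The two-pass filter (features first, then samples) can be sketched as follows, assuming the thresholds are fractions in [0, 1]; this is illustrative, not sumo's source, and the `_sketch` name is hypothetical:

```python
import numpy as np
import pandas as pd

def filter_sketch(data, drop_features=0.1, drop_samples=0.1):
    """Drop features (rows) whose fraction of missing values exceeds
    drop_features, then drop samples (columns) whose fraction of
    missing values, among the remaining features, exceeds drop_samples."""
    keep_rows = data.isna().mean(axis=1) <= drop_features
    data = data.loc[keep_rows]
    keep_cols = data.isna().mean(axis=0) <= drop_samples
    return data.loc[:, keep_cols]

# Feature 'f2' is entirely missing, so it is dropped; both samples survive.
demo = pd.DataFrame({'s1': [1.0, np.nan, 3.0], 's2': [4.0, np.nan, 6.0]},
                    index=['f1', 'f2', 'f3'])
filtered = filter_sketch(demo)
```

Filtering features before samples matters: a sample may look mostly missing only because of features that are about to be removed anyway.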
sumo.utils.is_standardized(a: numpy.ndarray, axis: int = 1, atol: float = 0.001)

Check if matrix values are standardized (have mean equal to 0 and standard deviation equal to 1)

Args:
a (Numpy.ndarray): feature matrix
axis (int): either 0 (column-wise standardization) or 1 (row-wise standardization)
atol (float): absolute tolerance
Returns:
is_standard (bool): True if data is standardized
mean (float): maximum mean of columns/rows
std (float): maximum standard deviation of columns/rows
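A sketch of this check (illustrative; sumo's exact return convention may differ — here the reported mean is the maximum absolute mean, and the `_sketch` name is hypothetical):

```python
import numpy as np

def is_standardized_sketch(a, axis=1, atol=1e-3):
    """Check row-wise (axis=1) or column-wise (axis=0) standardization:
    every mean must be ~0 and every standard deviation ~1 within atol."""
    a = np.asarray(a, dtype=float)
    means = a.mean(axis=axis)
    stds = a.std(axis=axis)
    ok = np.allclose(means, 0, atol=atol) and np.allclose(stds, 1, atol=atol)
    return ok, np.max(np.abs(means)), np.max(stds)
```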
sumo.utils.load_data_text(file_path: str, sample_names: int = None, feature_names: int = None, drop_features: float = 0.1, drop_samples: float = 0.1)

Load data from a text file (with samples in columns and features in rows) into a pandas.DataFrame

Args:
file_path (str): path to the tab delimited .txt file
sample_names (int): index of row with sample names
feature_names (int): index of column with feature names
drop_features (float): if percentage of missing values for a feature exceeds this value, remove this feature
drop_samples (float): if percentage of missing values for a sample (that remains after feature dropping) exceeds this value, remove this sample
Returns:
data (pandas.DataFrame): data frame loaded from file, with missing values removed
sumo.utils.load_npz(file_path: str)

Load data from .npz file

Args:
file_path (str): path to .npz file
Returns:
dictionary with arrays as values and their indices used during saving to .npz file as keys
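A round-trip with numpy's own savez/load shows the key-to-array mapping such a loader returns (illustrative; sumo may wrap this differently):

```python
import os
import tempfile
import numpy as np

# Save two arrays under named keys, then load them back into a dict.
arrays = {'layer0': np.eye(2), 'layer1': np.ones((2, 2))}
path = os.path.join(tempfile.mkdtemp(), 'example.npz')
np.savez(path, **arrays)

with np.load(path) as loaded:
    # loaded.files lists the keys used during saving
    restored = {key: loaded[key] for key in loaded.files}
```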
sumo.utils.normalized_mutual_information(cl: numpy.ndarray, org: numpy.ndarray)

Clustering accuracy measure, which takes into account mutual information between two clusterings and entropy of each cluster
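
A numpy sketch of this measure, normalizing mutual information by the geometric mean of the two entropies (one of several common NMI normalizations; sumo may use a different one, and the `_sketch` name is hypothetical):

```python
import numpy as np

def nmi_sketch(cl, org):
    """NMI = I(cl; org) / sqrt(H(cl) * H(org)), computed from the
    joint distribution of the two labelings."""
    cl, org = np.asarray(cl), np.asarray(org)
    _, ci = np.unique(cl, return_inverse=True)
    _, oi = np.unique(org, return_inverse=True)
    # Joint probability table of (predicted cluster, true class)
    joint = np.zeros((ci.max() + 1, oi.max() + 1))
    np.add.at(joint, (ci, oi), 1)
    joint /= cl.size
    px, py = joint.sum(axis=1), joint.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    mask = joint > 0
    mi = (joint[mask] * np.log(joint[mask] / np.outer(px, py)[mask])).sum()
    return mi / np.sqrt(entropy(px) * entropy(py))
```

Like ARI, the score is invariant to relabeling: identical partitions score 1 even when the cluster ids differ.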

sumo.utils.plot_metric(x: list, y: list, xlabel='x', ylabel='y', title='', file_path: str = None, color='blue')

Create plot of median metric values, with ribbon between min and max values for each x

sumo.utils.purity(cl: numpy.ndarray, org: numpy.ndarray)

Clustering accuracy measure representing the percentage of nodes classified correctly
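
A sketch of the standard purity computation: each predicted cluster is credited with its majority true class (illustrative, not sumo's source; the `_sketch` name is hypothetical):

```python
import numpy as np

def purity_sketch(cl, org):
    """For each predicted cluster, count its most common true class;
    purity is the fraction of samples covered by those majorities."""
    cl, org = np.asarray(cl), np.asarray(org)
    total = 0
    for cluster in np.unique(cl):
        members = org[cl == cluster]
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()
    return total / cl.size
```

Note that purity never penalizes over-splitting: putting every sample in its own cluster trivially scores 1, which is why it is usually reported alongside NMI or ARI.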

sumo.utils.save_arrays_to_npz(data: Union[dict, list], file_path: str)

Save numpy arrays to .npz file

Args:
data (dict/list): list of numpy arrays, or dictionary with a specified keyword for every array
file_path (str): path to the output file
sumo.utils.setup_logger(logger_name, level='INFO', log_file: str = None)

Create and configure logging object
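
A sketch of this setup/teardown pair using the standard logging module (illustrative; sumo's formatter and handler choices may differ, and the `_sketch` names are hypothetical):

```python
import logging
import sys

def setup_logger_sketch(logger_name, level='INFO', log_file=None):
    """Create a named logger that writes to log_file when given,
    otherwise to stdout."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(getattr(logging, level))
    handler = (logging.FileHandler(log_file) if log_file
               else logging.StreamHandler(sys.stdout))
    handler.setFormatter(logging.Formatter('%(message)s'))
    logger.addHandler(handler)
    return logger

def close_logger_sketch(logger):
    """Remove (and close) all handlers attached to the logger,
    mirroring close_logger above."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)
```

Closing handlers explicitly matters when logging to files: it flushes buffers and releases the file descriptor, and removing them prevents duplicate output if the logger is configured again.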