Utilities

sumo.utils.adjusted_rand_index(cl: numpy.ndarray, org: numpy.ndarray)

Clustering accuracy measure calculated by considering all pairs of samples and counting pairs that are assigned to the same or different clusters in the predicted and true clusterings
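
sumo's internal implementation is not shown here; as an illustration of the pair-counting formula behind the adjusted Rand index, a minimal pure-Python sketch (using plain list inputs as a stand-in for Numpy arrays) could look like:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(cl, org):
    """Pair-counting ARI between a predicted labeling (cl) and a true one (org)."""
    n = len(cl)
    # contingency table: counts of (predicted cluster, true class) pairs
    contingency = Counter(zip(cl, org))
    sum_comb = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(cl).values())
    sum_cols = sum(comb(c, 2) for c in Counter(org).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_comb - expected) / (max_index - expected)
```

ARI is invariant to label permutation, so relabeled but identical partitions still score 1.0.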

sumo.utils.check_accuracy(cl: numpy.ndarray, org: numpy.ndarray, method='purity')

Check clustering accuracy

Args:
cl (Numpy.ndarray): one dimensional array containing computed cluster ids for every node
org (Numpy.ndarray): one dimensional array containing true class ids for every node
method (str): accuracy assessment function from [‘NMI’, ‘purity’, ‘ARI’]
sumo.utils.check_categories(a: numpy.ndarray)

Check categories in data

sumo.utils.check_matrix_symmetry(m: numpy.ndarray, tol=1e-08, equal_nan=True)

Check symmetry of numpy array, after removal of missing samples
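
A rough sketch of such a check, assuming plain nested lists with NaN marking missing values (sumo itself operates on numpy arrays, and its exact handling of missing samples may differ):

```python
import math

def check_matrix_symmetry(m, tol=1e-8, equal_nan=True):
    """Return True if square matrix m (list of lists) is symmetric within tol.
    Rows whose values are all NaN (missing samples) are excluded first."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    # keep only samples that have at least one observed value
    keep = [i for i in range(n) if not all(math.isnan(v) for v in m[i])]
    for i in keep:
        for j in keep:
            a, b = m[i][j], m[j][i]
            if math.isnan(a) or math.isnan(b):
                if equal_nan and math.isnan(a) and math.isnan(b):
                    continue
                return False
            if abs(a - b) > tol:
                return False
    return True
```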

sumo.utils.close_logger(logger)

Remove all handlers of logger
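
With the standard logging module this amounts to closing and detaching each handler; a minimal sketch:

```python
import logging

def close_logger(logger):
    """Close and detach every handler registered on the logger."""
    for handler in list(logger.handlers):  # copy: we mutate the list while iterating
        handler.close()
        logger.removeHandler(handler)
```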

sumo.utils.docstring_formatter(*args, **kwargs)

Decorator allowing for printing variable values in docstrings
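
One common way to implement such a decorator is str.format applied to the wrapped function's docstring; a sketch (the placeholder name `methods` is illustrative, not necessarily sumo's actual usage, and the function must have a docstring):

```python
def docstring_formatter(*args, **kwargs):
    """Decorator that substitutes {placeholders} in a function's docstring."""
    def decorator(func):
        func.__doc__ = func.__doc__.format(*args, **kwargs)
        return func
    return decorator

@docstring_formatter(methods="['NMI', 'purity', 'ARI']")
def check_accuracy(cl, org, method='purity'):
    """Check clustering accuracy; supported methods: {methods}"""
```

This lets a module-level constant (e.g. the list of supported methods) appear in help() output without duplicating it by hand.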

sumo.utils.extract_max_value(h: numpy.ndarray)

Select clusters based on maximum value in feature matrix H for every sample/row

Args:
h (Numpy.ndarray): feature matrix from an optimization algorithm run, of shape (n, k), where ‘n’ is the number of nodes and ‘k’ is the number of clusters
Returns:
one dimensional array containing clusters ids for every node
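
Conceptually this is a row-wise argmax; a pure-Python sketch on a nested list stand-in for H (ties resolve to the lowest cluster id, as numpy's argmax does):

```python
def extract_max_value(h):
    """Assign each sample (row of h) to the cluster (column) with the largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in h]
```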
sumo.utils.extract_ncut(a: numpy.ndarray, k: int)

Select clusters using normalized cut based on graph similarity matrix

Args:
a (Numpy.ndarray): symmetric similarity matrix
k (int): number of clusters
Returns:
one dimensional array containing clusters ids for every node
sumo.utils.extract_spectral(h: numpy.ndarray, assign_labels: str = 'kmeans', n_neighbors: int = 10, n_clusters: int = None)

Select clusters using spectral clustering of feature matrix H

Args:
h (Numpy.ndarray): feature matrix from an optimization algorithm run, of shape (n, k), where ‘n’ is the number of nodes and ‘k’ is the number of clusters
assign_labels : {‘kmeans’, ‘discretize’}, strategy to use to assign labels in the embedding space
n_neighbors (int): number of neighbors to use when constructing the affinity matrix
n_clusters (int): number of clusters; if not set, use the number of columns of ‘h’
Returns:
one dimensional array containing clusters ids for every node
sumo.utils.filter_features_and_samples(data: pandas.DataFrame, drop_features: float = 0.1, drop_samples: float = 0.1)

Filter data frame features and samples

Args:
data (pandas.DataFrame): data frame (with samples in columns and features in rows)
drop_features (float): if the percentage of missing values for a feature exceeds this value, remove this feature
drop_samples (float): if the percentage of missing values for a sample (that remains after feature dropping) exceeds this value, remove this sample
Returns:
filtered data frame
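
The two-pass filtering described above (features first, then the samples that remain) can be sketched on plain nested lists with None for missing values; sumo's pandas-based version may differ in details:

```python
def filter_features_and_samples(data, drop_features=0.1, drop_samples=0.1):
    """data: list of rows (features); columns are samples; None marks a missing value."""
    # pass 1: drop features whose fraction of missing values exceeds the threshold
    rows = [r for r in data if r.count(None) / len(r) <= drop_features]
    if not rows:
        return []
    # pass 2: drop samples (columns) with too many missing values in the remaining rows
    keep = [j for j in range(len(rows[0]))
            if sum(r[j] is None for r in rows) / len(rows) <= drop_samples]
    return [[r[j] for j in keep] for r in rows]
```

Ordering matters: dropping noisy features first can rescue samples that would otherwise be removed.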
sumo.utils.is_standardized(a: numpy.ndarray, axis: int = 1, atol: float = 0.001)

Check if matrix values are standardized (have mean equal to 0 and standard deviation equal to 1)

Args:
a (Numpy.ndarray): feature matrix
axis (int): either 0 (column-wise standardization) or 1 (row-wise standardization)
atol (float): absolute tolerance
Returns:
is_standard (bool): True if data is standardized
mean (float): maximum and minimum mean of columns/rows
std (float): maximum and minimum standard deviation of columns/rows
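
A sketch of this check using population statistics on nested lists (sumo's choice of standard-deviation estimator, e.g. the ddof, is an assumption here):

```python
from statistics import fmean, pstdev

def is_standardized(a, axis=1, atol=1e-3):
    """Check that each row (axis=1) or column (axis=0) of a (list of lists)
    has mean ~0 and population standard deviation ~1, within atol."""
    vectors = a if axis == 1 else list(zip(*a))
    means = [fmean(v) for v in vectors]
    stds = [pstdev(v) for v in vectors]
    ok = all(abs(m) <= atol for m in means) and all(abs(s - 1) <= atol for s in stds)
    return ok, (max(means), min(means)), (max(stds), min(stds))
```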
sumo.utils.load_data_text(file_path: str, sample_names: int = None, feature_names: int = None, drop_features: float = 0.1, drop_samples: float = 0.1)

Loads data from text file (with samples in columns and features in rows) into pandas.DataFrame

Args:
file_path (str): path to the tab-delimited .txt file
sample_names (int): index of the row with sample names
feature_names (int): index of the column with feature names
drop_features (float): if the percentage of missing values for a feature exceeds this value, remove this feature
drop_samples (float): if the percentage of missing values for a sample (that remains after feature dropping) exceeds this value, remove this sample
Returns:
data (pandas.DataFrame): data frame loaded from file, with missing values removed
sumo.utils.load_npz(file_path: str)

Load data from .npz file

Args:
file_path (str): path to .npz file
Returns:
dictionary with arrays as values and their indices used during saving to .npz file as keys
sumo.utils.normalized_mutual_information(cl: numpy.ndarray, org: numpy.ndarray)

Clustering accuracy measure, which takes into account mutual information between two clusterings and entropy of each cluster
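
For illustration, NMI with the geometric-mean normalization can be computed directly from the joint label distribution; whether sumo uses this exact normalization is an assumption (the sketch also assumes both labelings contain at least two classes, so neither entropy is zero):

```python
from collections import Counter
from math import log

def normalized_mutual_information(cl, org):
    """NMI(cl, org) = I(cl; org) / sqrt(H(cl) * H(org))."""
    n = len(cl)
    p_cl, p_org = Counter(cl), Counter(org)
    joint = Counter(zip(cl, org))
    # mutual information from the joint and marginal counts
    mi = sum(c / n * log(n * c / (p_cl[u] * p_org[v]))
             for (u, v), c in joint.items())
    h_cl = -sum(c / n * log(c / n) for c in p_cl.values())
    h_org = -sum(c / n * log(c / n) for c in p_org.values())
    return mi / (h_cl * h_org) ** 0.5
```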

sumo.utils.plot_metric(x: list, y: list, xlabel='x', ylabel='y', title='', file_path: str = None, color='blue', allow_omit_xticks: bool = False)

Create plot of median metric values, with ribbon between min and max values for each x

sumo.utils.purity(cl: numpy.ndarray, org: numpy.ndarray)

Clustering accuracy measure representing the percentage of nodes classified correctly
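
Purity assigns each predicted cluster its majority true class and counts the matches; a minimal sketch:

```python
from collections import Counter

def purity(cl, org):
    """Fraction of nodes whose cluster's majority true class matches their own class."""
    overlap = Counter(zip(cl, org))
    best = {}
    for (cluster, _), count in overlap.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(cl)
```

Note that purity is not symmetric and is trivially 1.0 when every node is its own cluster, which is why it is usually reported alongside NMI or ARI.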

sumo.utils.save_arrays_to_npz(data: Union[dict, list], file_path: str)

Save numpy arrays to .npz file

Args:
data (dict/list): list of numpy arrays or dictionary with specified keywords for every array
file_path (str): path to the output file
sumo.utils.setup_logger(logger_name, level='INFO', log_file: str = None)

Create and configure logging object
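
A plausible minimal version using the standard logging module (the format string and handler choice are assumptions, not sumo's actual configuration):

```python
import logging

def setup_logger(logger_name, level='INFO', log_file=None):
    """Create a named logger writing to stderr, or to log_file when given."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(getattr(logging, level))
    handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Pairing this with close_logger above avoids duplicated log lines when a logger of the same name is configured more than once.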