Utilities
sumo.utils.adjusted_rand_index(cl: numpy.ndarray, org: numpy.ndarray)
Clustering accuracy measure calculated by considering all pairs of samples and counting pairs that are assigned to the same or different clusters in the predicted and true clusterings
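The pair-counting idea behind the adjusted Rand index can be sketched with plain NumPy. This is an illustrative sketch, not sumo's own implementation; the helper name `pair_confusion_ari` is hypothetical:

```python
import numpy as np

def pair_confusion_ari(cl, org):
    """Adjusted Rand index computed from the contingency table (sketch)."""
    classes, class_idx = np.unique(org, return_inverse=True)
    clusters, cluster_idx = np.unique(cl, return_inverse=True)
    # contingency table: rows = true classes, columns = predicted clusters
    cont = np.zeros((classes.size, clusters.size), dtype=np.int64)
    np.add.at(cont, (class_idx, cluster_idx), 1)
    comb2 = lambda x: x * (x - 1) // 2  # number of pairs within each count
    sum_ij = comb2(cont).sum()
    sum_a = comb2(cont.sum(axis=1)).sum()
    sum_b = comb2(cont.sum(axis=0)).sum()
    n_pairs = comb2(cl.size)
    expected = sum_a * sum_b / n_pairs       # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

Identical partitions score 1.0 regardless of how labels are permuted; independent partitions score near (or below) 0.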
sumo.utils.check_accuracy(cl: numpy.ndarray, org: numpy.ndarray, method='purity')
Check clustering accuracy
- Args:
  - cl (numpy.ndarray): one dimensional array containing computed cluster ids for every node
  - org (numpy.ndarray): one dimensional array containing true class ids for every node
  - method (str): accuracy assessment function from ['NMI', 'purity', 'ARI']
sumo.utils.check_categories(a: numpy.ndarray)
Check categories in data
sumo.utils.check_matrix_symmetry(m: numpy.ndarray, tol=1e-08, equal_nan=True)
Check symmetry of a numpy array, after removal of missing samples
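A minimal version of such a symmetry check can be written with `numpy.allclose`, which supports both the tolerance and the `equal_nan` behaviour described above (a sketch; sumo additionally removes missing samples first, which is not shown here):

```python
import numpy as np

def is_symmetric(m, tol=1e-8, equal_nan=True):
    """Return True if m is a square matrix equal to its transpose within tol."""
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        return False
    return np.allclose(m, m.T, atol=tol, equal_nan=equal_nan)

a = np.array([[1.0, 2.0], [2.0, 1.0]])   # symmetric
b = np.array([[1.0, 2.0], [3.0, 1.0]])   # not symmetric
```

With `equal_nan=True`, NaN entries that appear in mirrored positions do not break the check.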
sumo.utils.close_logger(logger)
Remove all handlers of logger
sumo.utils.docstring_formatter(*args, **kwargs)
Decorator allowing for printing variable values in docstrings
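One common way such a decorator works is by running `str.format` over the function's docstring; this sketch assumes that mechanism and is not sumo's own code:

```python
def docstring_formatter(*args, **kwargs):
    """Substitute the given values into the decorated function's docstring."""
    def decorator(func):
        func.__doc__ = func.__doc__.format(*args, **kwargs)
        return func
    return decorator

@docstring_formatter(methods="['NMI', 'purity', 'ARI']")
def assess(cl, org, method='purity'):
    """Assess accuracy; supported methods: {methods}"""
    return method
```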
sumo.utils.extract_max_value(h: numpy.ndarray)
Select clusters based on the maximum value in feature matrix H for every sample/row
- Args:
  - h (numpy.ndarray): feature matrix from an optimization algorithm run, of shape (n, k), where 'n' is the number of nodes and 'k' is the number of clusters
- Returns:
  - one dimensional array containing cluster ids for every node
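Selecting the per-row maximum reduces to a row-wise argmax over H, as this small sketch (not sumo's own code) shows:

```python
import numpy as np

# feature matrix H for 3 nodes and 2 clusters
h = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.7, 0.3]])

# cluster id of each node = column index of the row maximum
labels = np.argmax(h, axis=1)
```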
sumo.utils.extract_ncut(a: numpy.ndarray, k: int)
Select clusters using a normalized cut based on the graph similarity matrix
- Args:
  - a (numpy.ndarray): symmetric similarity matrix
  - k (int): number of clusters
- Returns:
  - one dimensional array containing cluster ids for every node
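A normalized-cut style partition of a precomputed similarity matrix can be realized with scikit-learn's spectral clustering; this is an illustrative substitute, not sumo's own solver:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# symmetric similarity matrix with two clear blocks: {0, 1} and {2, 3}
a = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.8],
              [0.1, 0.1, 0.8, 1.0]])

labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                            random_state=0).fit_predict(a)
```

Cluster ids themselves are arbitrary; only the grouping is meaningful.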
sumo.utils.extract_spectral(h: numpy.ndarray, assign_labels: str = 'kmeans', n_neighbors: int = 10, n_clusters: int = None)
Select clusters using spectral clustering of feature matrix H
- Args:
  - h (numpy.ndarray): feature matrix from an optimization algorithm run, of shape (n, k), where 'n' is the number of nodes and 'k' is the number of clusters
  - assign_labels: {'kmeans', 'discretize'}, strategy to use to assign labels in the embedding space
  - n_neighbors (int): number of neighbors to use when constructing the affinity matrix
  - n_clusters (int): number of clusters; if not set, use the number of columns of 'h'
- Returns:
  - one dimensional array containing cluster ids for every node
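The same parameters map directly onto scikit-learn's `SpectralClustering` with a nearest-neighbors affinity; this sketch assumes that mapping and is not sumo's own implementation:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# rows of H for 6 nodes, forming two well-separated groups
h = np.array([[1.00, 0.00], [0.95, 0.05], [0.90, 0.10],
              [0.10, 0.90], [0.05, 0.95], [0.00, 1.00]])

labels = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                            n_neighbors=3, assign_labels='kmeans',
                            random_state=0).fit_predict(h)
```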
sumo.utils.filter_features_and_samples(data: pandas.DataFrame, drop_features: float = 0.1, drop_samples: float = 0.1)
Filter data frame features and samples
- Args:
  - data (pandas.DataFrame): data frame (with samples in columns and features in rows)
  - drop_features (float): if the percentage of missing values for a feature exceeds this value, remove that feature
  - drop_samples (float): if the percentage of missing values for a sample (that remains after feature dropping) exceeds this value, remove that sample
- Returns:
  - filtered data frame
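The two-stage filtering (features first, then remaining samples) can be sketched in pandas as follows; the helper name `filter_frame` is hypothetical and this is not sumo's own code:

```python
import numpy as np
import pandas as pd

def filter_frame(data, drop_features=0.1, drop_samples=0.1):
    """Drop features (rows), then samples (columns), with too many NaNs."""
    data = data.loc[data.isna().mean(axis=1) <= drop_features]
    return data.loc[:, data.isna().mean(axis=0) <= drop_samples]

# samples in columns, features in rows; feature 'f2' is 50% missing
df = pd.DataFrame({'s1': [1.0, np.nan, 3.0],
                   's2': [4.0, 5.0, 6.0]},
                  index=['f1', 'f2', 'f3'])

filtered = filter_frame(df, drop_features=0.4, drop_samples=0.4)
```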
sumo.utils.is_standardized(a: numpy.ndarray, axis: int = 1, atol: float = 0.001)
Check if matrix values are standardized (have mean equal to 0 and standard deviation equal to 1)
- Args:
  - a (numpy.ndarray): feature matrix
  - axis: either 0 (column-wise standardization) or 1 (row-wise standardization)
  - atol (float): absolute tolerance
- Returns:
  - is_standard (bool): True if data is standardized
  - mean (float): maximum and minimum mean of columns/rows
  - std (float): maximum and minimum standard deviation of columns/rows
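The check amounts to comparing per-axis means and standard deviations against 0 and 1 within a tolerance, as in this sketch (illustrative, not sumo's implementation):

```python
import numpy as np

def standardized(a, axis=1, atol=1e-3):
    """Return (is_standard, (max_mean, min_mean), (max_std, min_std))."""
    means = a.mean(axis=axis)
    stds = a.std(axis=axis)
    ok = np.allclose(means, 0, atol=atol) and np.allclose(stds, 1, atol=atol)
    return ok, (means.max(), means.min()), (stds.max(), stds.min())

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 1000))
# row-wise standardization: subtract row mean, divide by row std
z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
```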
sumo.utils.load_data_text(file_path: str, sample_names: int = None, feature_names: int = None, drop_features: float = 0.1, drop_samples: float = 0.1)
Load data from a text file (with samples in columns and features in rows) into a pandas.DataFrame
- Args:
  - file_path (str): path to the tab-delimited .txt file
  - sample_names (int): index of the row with sample names
  - feature_names (int): index of the column with feature names
  - drop_features (float): if the percentage of missing values for a feature exceeds this value, remove that feature
  - drop_samples (float): if the percentage of missing values for a sample (that remains after feature dropping) exceeds this value, remove that sample
- Returns:
  - data (pandas.DataFrame): data frame loaded from file, with missing values removed
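Reading such a tab-delimited layout maps onto `pandas.read_csv` with `header` and `index_col`; the sketch below uses an in-memory buffer instead of a file path and is not sumo's own loader:

```python
import io
import pandas as pd

# tab-delimited text: first row = sample names, first column = feature names
text = "gene\ts1\ts2\nf1\t1.0\t2.0\nf2\t3.0\t4.0\n"

data = pd.read_csv(io.StringIO(text), sep='\t', header=0, index_col=0)
```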
sumo.utils.load_npz(file_path: str)
Load data from a .npz file
- Args:
  - file_path (str): path to the .npz file
- Returns:
  - dictionary with arrays as values, keyed by the indices used when saving to the .npz file
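Loading an .npz archive into a plain dictionary is a thin wrapper around `numpy.load`; this sketch uses an in-memory buffer in place of a file path and is not sumo's own code:

```python
import io
import numpy as np

# create an .npz archive in memory with two named arrays
buf = io.BytesIO()
np.savez(buf, w=np.eye(2), h=np.ones(3))
buf.seek(0)

# read it back into a {key: array} dictionary
with np.load(buf) as npz:
    arrays = {key: npz[key] for key in npz.files}
```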
sumo.utils.normalized_mutual_information(cl: numpy.ndarray, org: numpy.ndarray)
Clustering accuracy measure which takes into account the mutual information between two clusterings and the entropy of each cluster
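The measure can be sketched from the contingency table: mutual information divided by a normalization built from the two entropies. This sketch uses arithmetic-mean normalization, which may differ from sumo's choice, and is not sumo's own code:

```python
import numpy as np

def nmi(cl, org):
    """Normalized mutual information of two labelings (sketch)."""
    n = cl.size
    _, ci = np.unique(cl, return_inverse=True)
    _, oi = np.unique(org, return_inverse=True)
    cont = np.zeros((ci.max() + 1, oi.max() + 1))
    np.add.at(cont, (ci, oi), 1)
    p = cont / n                         # joint distribution
    pi, pj = p.sum(axis=1), p.sum(axis=0)  # marginals
    nz = p > 0
    mi = (p[nz] * np.log(p[nz] / np.outer(pi, pj)[nz])).sum()
    entropy = lambda q: -(q[q > 0] * np.log(q[q > 0])).sum()
    return mi / ((entropy(pi) + entropy(pj)) / 2)
```

Identical partitions score 1; independent partitions score 0.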
sumo.utils.plot_metric(x: list, y: list, xlabel='x', ylabel='y', title='', file_path: str = None, color='blue', allow_omit_xticks: bool = False)
Create a plot of median metric values, with a ribbon between the minimum and maximum values for each x
sumo.utils.purity(cl: numpy.ndarray, org: numpy.ndarray)
Clustering accuracy measure representing the percentage of the total number of nodes classified correctly
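Purity assigns each computed cluster its majority true class and counts the matching nodes; a sketch (illustrative, not sumo's implementation, assuming non-negative integer class ids):

```python
import numpy as np

def purity_score(cl, org):
    """Fraction of nodes belonging to the majority true class of their cluster."""
    total = 0
    for cluster in np.unique(cl):
        members = org[cl == cluster]
        total += np.bincount(members).max()  # size of the majority class
    return total / cl.size

cl = np.array([0, 0, 1, 1])   # computed clusters
org = np.array([0, 1, 1, 1])  # true classes: one node of cluster 0 is mislabeled
```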
sumo.utils.save_arrays_to_npz(data: Union[dict, list], file_path: str)
Save numpy arrays to a .npz file
- Args:
  - data (dict/list): list of numpy arrays, or dictionary with specified keywords for every array
  - file_path (str): optional path to the output file
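Handling both inputs amounts to turning a list into an index-keyed dictionary before calling `numpy.savez`; the positional-index keying is an assumption based on the `load_npz` description, and this is not sumo's own code:

```python
import io
import numpy as np

def save_arrays(data, target):
    """Save a list (keyed by position) or dict of arrays to an .npz target."""
    if isinstance(data, list):
        data = {str(i): arr for i, arr in enumerate(data)}
    np.savez(target, **data)

buf = io.BytesIO()
save_arrays([np.zeros(2), np.ones(3)], buf)
buf.seek(0)
loaded = dict(np.load(buf))
```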
sumo.utils.setup_logger(logger_name, level='INFO', log_file: str = None)
Create and configure a logging object
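A minimal pairing of `setup_logger` and the `close_logger` behaviour described above, using only the standard `logging` module (a sketch under assumed defaults such as the stream handler and format string, not sumo's own configuration):

```python
import logging
import sys

def setup_logger(logger_name, level='INFO', log_file=None):
    """Create a logger writing to stdout, or to log_file when given."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(getattr(logging, level))
    handler = (logging.FileHandler(log_file) if log_file
               else logging.StreamHandler(sys.stdout))
    handler.setFormatter(logging.Formatter('%(levelname)s - %(message)s'))
    logger.addHandler(handler)
    return logger

def close_logger(logger):
    """Remove (and close) all handlers of the logger."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)

log = setup_logger('sumo_demo')
```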