API Documentation
TopoPy - Topological constructs for Python.
TopoPy is a Python package for building approximate topological structures in arbitrary dimensions, using a neighborhood graph to approximate the local gradient.
-
class
topopy.
TopologicalObject
(graph=None, gradient='steepest', normalization=None, aggregator=None, debug=False) A base class for housing common interactions between Morse and Morse-Smale complexes, and Contour and Merge Trees.
Parameters: - graph (nglpy.Graph) – A graph object used for determining neighborhoods in gradient estimation
- gradient (str) – An optional string specifying the type of gradient estimator to use. Currently the only available option is ‘steepest’.
- normalization (str) – An optional string specifying whether the inputs/output should be scaled before computing. Currently, two modes are supported: ‘zscore’ and ‘feature’. ‘zscore’ will ensure the data has a mean of zero and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. ‘feature’ scales the data into the unit hypercube.
- aggregator (str) – An optional string that specifies what type of aggregation to do when duplicates are found in the domain space. Default value is None meaning the code will error if duplicates are identified.
- debug (bool) – An optional boolean flag for whether debugging output should be enabled.
- short_circuit (bool) – An optional boolean flag for whether the contour tree should be short circuited. Enabling this will speed up the processing by bypassing the fully augmented search and only focusing on partially augmented split and join trees
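The two normalization modes described above can be sketched in plain Python for a single column of data. This is an illustration of the arithmetic only, not TopoPy's implementation; the function names `zscore` and `feature_scale` are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math

def zscore(column):
    """Center a column on zero and scale it to unit standard deviation."""
    mean = sum(column) / len(column)
    # Population standard deviation (assumption: the document does not say
    # whether the population or sample form is used).
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]

def feature_scale(column):
    """Map a column linearly onto the interval [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = zscore(column)          # mean 0, standard deviation 1
f = feature_scale(column)   # smallest value maps to 0, largest to 1
```

Both helpers divide by zero on a constant column; a real implementation would need to handle that case.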
-
static
aggregate_duplicates
(X, Y, aggregator='mean', precision=16) A function that will attempt to collapse duplicates in domain space, X, by aggregating values over the range space, Y.
Parameters: - X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples
- Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X
- aggregator (str) – An optional string or callable object that specifies what type of aggregation to do when duplicates are found in the domain space. Default value is ‘mean’, meaning the code will calculate the mean range value for each group of duplicated samples.
- precision (int) – An optional positive integer specifying how many digits numbers should be rounded to in order to determine if they are unique or not.
Returns: A tuple where the first value is an m’-by-n array specifying the unique domain samples and the second value is an m’ vector specifying the associated range values. m’ <= m.
Return type: tuple(np.ndarray, np.array)
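The idea behind duplicate aggregation can be sketched without NumPy: round each sample to `precision` digits, group samples that become identical, and combine their responses. The function name `aggregate_duplicates_sketch` is hypothetical, and the output order (first appearance) is an assumption; the order the library returns is not stated here.

```python
def aggregate_duplicates_sketch(X, Y, aggregator=None, precision=16):
    """Collapse rows of X that agree after rounding, combining their Y values."""
    if aggregator is None:
        # Default mirrors the documented 'mean' behavior.
        aggregator = lambda values: sum(values) / len(values)
    groups = {}
    for row, y in zip(X, Y):
        # Rows that round to the same tuple are treated as duplicates.
        key = tuple(round(v, precision) for v in row)
        groups.setdefault(key, []).append(y)
    unique_X = [list(key) for key in groups]
    unique_Y = [aggregator(values) for values in groups.values()]
    return unique_X, unique_Y

X = [[0.0, 1.0], [0.5, 0.5], [0.0, 1.0], [1.0, 0.0]]
Y = [1.0, 2.0, 3.0, 4.0]
unique_X, unique_Y = aggregate_duplicates_sketch(X, Y)
max_X, max_Y = aggregate_duplicates_sketch(X, Y, aggregator=max)
```

Here the first and third samples coincide, so four samples collapse to three and their responses 1.0 and 3.0 are averaged (or, with `max`, the larger one is kept).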
-
build
(X, Y, w=None) Assigns data to this object and builds the requested topological structure.
Uses the internal graph given in the constructor to build a topological object on the passed-in data. Weights are currently ignored.
Parameters: - X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples
- Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X
- w (np.array) – An optional m vector of values specifying the weights associated to each of the m samples used. Default of None means all points will be equally weighted
Returns: Return type: None
-
check_duplicates
() Function to test whether duplicates exist in the input or output space.
If an aggregator function has been specified, domain space duplicates are consolidated using that function to generate a new range value for each shared point; otherwise, a ValueError is raised. A warning is issued if duplicates exist in the output space.
Returns: Return type: None
-
get_dimensionality
() Returns the dimensionality of the input space of the input data
Returns: Integer specifying the dimensionality of the input samples. Return type: int
-
get_neighbors
(idx) Returns a list of neighbors for the specified index
Parameters: idx (int) – An integer specifying the query point Returns: Integer list of neighbors indices Return type: list of int
-
get_normed_x
(rows=None, cols=None) Returns the normalized input data requested by the user.
Parameters: - rows (list of int) – A list of non-negative integers specifying the row indices to return
- cols (list of int) – A list of non-negative integers specifying the column indices to return
Returns: A matrix of floating point values specifying the normalized data values used in internal computations filtered by the two input parameters.
Return type: np.ndarray
-
get_sample_size
() Returns the number of samples in the input data
Returns: Integer specifying the number of samples. Return type: int
-
get_weights
(indices=None) Returns the weights requested by the user
Parameters: indices (list of int) – A list of non-negative integers specifying the row indices to return Returns: An array of floating point values specifying the weights associated to the input data rows filtered by the indices input parameter. Return type: np.array
-
get_x
(rows=None, cols=None) Returns the input data requested by the user
Parameters: - rows (list of int) – A list of non-negative integers specifying the row indices to return
- cols (list of int) – A list of non-negative integers specifying the column indices to return
Returns: A matrix of floating point values specifying the input data values filtered by the two input parameters.
Return type: np.ndarray
-
get_y
(indices=None) Returns the output data requested by the user
Parameters: indices (list of int) – A list of non-negative integers specifying the row indices to return Returns: An array of floating point values specifying the output data values filtered by the indices input parameter. Return type: np.array
-
load_data_and_build
(filename, delimiter=', ') Convenience function for directly working with a data file.
This opens a file, reads its contents into a NumPy array along with a list of dimension names, and builds the topological structure from that data.
Parameters: filename (str) – string representing the data file Returns: Return type: None
-
class
topopy.
MorseComplex
(graph=None, gradient='steepest', normalization=None, simplification='difference', aggregator=None, debug=False) A wrapper class for the C++ approximate Morse complex Object
Parameters: - graph (nglpy.Graph) – A graph object used for determining neighborhoods in gradient estimation
- gradient (str) – An optional string specifying the type of gradient estimator to use. Currently the only available option is ‘steepest’.
- normalization (str) – An optional string specifying whether the inputs/output should be scaled before computing. Currently, two modes are supported: ‘zscore’ and ‘feature’. ‘zscore’ will ensure the data has a mean of zero and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. ‘feature’ scales the data into the unit hypercube.
- simplification (str) – An optional string specifying how the simplification hierarchy is computed. Currently, three modes are supported: ‘difference’, ‘probability’, and ‘count’. ‘difference’ takes the function value difference between an extremum and its closest function-valued neighboring saddle (standard persistence simplification); ‘probability’ augments this value by multiplying by the probability of the extremum and its saddle; and ‘count’ orders the simplification by the size (number of points) of each manifold, so that smaller features are absorbed into neighboring larger features first.
- aggregator (str) – An optional string that specifies what type of aggregation to do when duplicates are found in the domain space. Default value is None meaning the code will error if duplicates are identified.
- debug (bool) – An optional boolean flag for whether debugging output should be enabled.
-
build
(X, Y, w=None) Assigns data to this object and builds the Morse Complex
Uses an internal graph given in the constructor to build a Morse complex on the passed in data. Weights are currently ignored.
Parameters: - X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples
- Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X
- w (np.array) – An optional m vector of values specifying the weights associated to each of the m samples used. Default of None means all points will be equally weighted
Returns: Return type: None
-
get_classification
(idx) Given an index, this function will report whether that sample is a local maximum or a regular point.
Parameters: idx (int) – A non-negative integer less than the sample size of the input data. Returns: A string specifying the classification type of the input sample: will be ‘maximum’ or ‘regular.’ Return type: str
-
get_current_labels
() Returns a list of tuples that specifies the extremum index labels associated to each input sample
Returns: a list of non-negative integers specifying the extremum-flow indices associated to each input sample at the current level of persistence Return type: list of tuple(int, int)
-
get_label
(indices=None) Returns the label indices requested by the user
Parameters: indices (list of int) – A list of non-negative integers specifying the row indices to return Returns: A list of integers specifying the extremum index of the specified rows. Return type: list of int
-
get_merge_sequence
() Returns a data structure holding the ordered merge sequence of extrema simplification.
Returns: dict of int – A dictionary of tuples where the key is the dying extremum and the tuple is the persistence, parent index, and saddle index associated with the dying index, in that order. Return type: tuple(float, int, int)
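A dictionary of this shape can be walked to find which extremum a sample's label resolves to at a given persistence level. The sketch below uses hand-written, hypothetical data rather than output from a real complex; the helper name `surviving_extremum`, the convention that the never-dying extremum is absent from the dictionary, and the non-strict comparison against the threshold are all assumptions.

```python
def surviving_extremum(merge_sequence, extremum, persistence):
    """Follow parent links until reaching an extremum that outlives `persistence`."""
    current = extremum
    while current in merge_sequence:
        death, parent, _saddle = merge_sequence[current]
        # Stop if this extremum survives the requested level, or if it is
        # recorded as its own parent (guards against an endless loop).
        if death > persistence or parent == current:
            break
        current = parent
    return current

# Hypothetical merge sequence: {dying index: (persistence, parent, saddle)}
merge_sequence = {
    7: (0.1, 3, 5),    # extremum 7 merges into 3 through saddle 5
    3: (0.4, 12, 9),   # extremum 3 merges into 12 through saddle 9
}

at_zero = surviving_extremum(merge_sequence, 7, 0.0)
at_mid = surviving_extremum(merge_sequence, 7, 0.2)
at_high = surviving_extremum(merge_sequence, 7, 0.5)
```

With this data, extremum 7 stands on its own at persistence 0.0, is absorbed into 3 at 0.2, and ends up in 12 at 0.5.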
-
get_partitions
(persistence=None) Returns the partitioned data based on a specified persistence level
Parameters: persistence (float) – A floating point value specifying the size of the smallest feature we want to track. Default = None means consider all features. Returns: dict of int – A dictionary of lists where each key is an integer specifying the index of the extremum. Each entry will hold a list of indices specifying points that are associated to this extremum. Return type: list of int
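To make the shape of such a partition dictionary concrete, the following sketch labels each sample by the local maximum reached when repeatedly stepping to the highest-valued neighbor. It is a simplified stand-in for the library's C++ routine: the name `steepest_ascent_partitions` is hypothetical, the neighbor graph is a hand-built 1D chain, and the distance weighting a true steepest-gradient estimate would apply is ignored.

```python
def steepest_ascent_partitions(neighbors, Y):
    """Group sample indices by the local maximum their ascent path reaches."""
    def climb(i):
        while True:
            # Highest-valued candidate among i itself and its neighbors.
            best = max([i] + neighbors[i], key=lambda j: Y[j])
            if best == i:
                return i  # no neighbor is higher: i is a local maximum
            i = best

    partitions = {}
    for i in range(len(Y)):
        partitions.setdefault(climb(i), []).append(i)
    return partitions

# Hypothetical data: seven samples on a line, each linked to its neighbors.
Y = [0.0, 1.0, 0.5, 0.2, 0.8, 2.0, 1.5]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(Y)]
             for i in range(len(Y))}
partitions = steepest_ascent_partitions(neighbors, Y)
```

The result maps each local maximum index to the list of sample indices that flow to it, which is the same key-to-index-list layout described above.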
-
get_persistence
() Retrieves the persistence simplification level being used for this complex
Returns: Floating point value specifying the current persistence setting Return type: float
-
get_sample_size
(key=None) Returns the number of samples in the input data
Parameters: key (int) – An optional integer specifying a max id used for determining which partition size should be returned. If not specified then the size of the entire data set will be returned. Returns: An integer specifying the number of samples. Return type: int
-
save
(filename=None) Saves a constructed Morse Complex in a JSON file
Parameters: filename (str) – A filename for storing the hierarchical merging of features and the base level partitions of the data Returns: Return type: None
-
class
topopy.
MorseSmaleComplex
(graph=None, gradient='steepest', normalization=None, simplification='difference', aggregator=None, debug=False) A wrapper class for the C++ approximate Morse-Smale complex Object
Parameters: - graph (nglpy.Graph) – A graph object used for determining neighborhoods in gradient estimation
- gradient (str) – An optional string specifying the type of gradient estimator to use. Currently the only available option is ‘steepest’.
- normalization (str) – An optional string specifying whether the inputs/output should be scaled before computing. Currently, two modes are supported: ‘zscore’ and ‘feature’. ‘zscore’ will ensure the data has a mean of zero and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. ‘feature’ scales the data into the unit hypercube.
- simplification (str) – An optional string specifying how the simplification hierarchy is computed. Currently, three modes are supported: ‘difference’, ‘probability’, and ‘count’. ‘difference’ takes the function value difference between an extremum and its closest function-valued neighboring saddle (standard persistence simplification); ‘probability’ augments this value by multiplying by the probability of the extremum and its saddle; and ‘count’ orders the simplification by the size (number of points) of each manifold, so that smaller features are absorbed into neighboring larger features first.
- aggregator (str) – An optional string that specifies what type of aggregation to do when duplicates are found in the domain space. Default value is None meaning the code will error if duplicates are identified.
- debug (bool) – An optional boolean flag for whether debugging output should be enabled.
-
build
(X, Y, w=None) Assigns data to this object and builds the Morse-Smale Complex
Uses an internal graph given in the constructor to build a Morse-Smale complex on the passed in data. Weights are currently ignored.
Parameters: - X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples
- Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X
- w (np.array) – An optional m vector of values specifying the weights associated to each of the m samples used. Default of None means all points will be equally weighted
Returns: Return type: None
-
get_classification
(idx) Given an index, this function will report whether that sample is a local minimum, a local maximum, or a regular point.
Parameters: idx (int) – A non-negative integer less than the sample size of the input data. Returns: A string specifying the classification type of the input sample: will be ‘maximum,’ ‘minimum,’ or ‘regular.’ Return type: str
-
get_current_labels
() Returns a list of tuples that specifies the min-max index labels associated to each input sample
Returns: a list of non-negative integer tuples specifying the min-max index labels associated to each input sample at the current level of persistence Return type: list of tuple(int, int)
-
get_label
(indices=None) Returns the label pair indices requested by the user
Parameters: indices (list of int) – A list of non-negative integers specifying the row indices to return Returns: A list of integer 2-tuples specifying the minimum and maximum index of the specified rows, respectively. Return type: list of tuple(int, int)
-
get_merge_sequence
() Returns a data structure holding the ordered merge sequence of extrema simplification.
Returns: dict of int – A dictionary of tuples where the key is the dying extremum and the tuple is the persistence, parent index, and saddle index associated with the dying index, in that order. Return type: tuple(float, int, int)
-
get_partitions
(persistence=None) Returns the partitioned data based on a specified persistence level
Parameters: persistence (float) – A floating point value specifying the size of the smallest feature we want to track. Default = None means consider all features. Returns: dict of tuple(int,int) – A dictionary of lists where each key is a min-max tuple specifying the index of the minimum and maximum, respectively. Each entry will hold a list of indices specifying points that are associated to this min-max pair. Return type: list of int
-
get_persistence
() Retrieves the persistence simplification level being used for this complex
Returns: Floating point value specifying the current persistence setting Return type: float
-
get_sample_size
(key=None) Returns the number of samples in the input data
Parameters: key (int) – An optional integer specifying a max id used for determining which partition size should be returned. If not specified then the size of the entire data set will be returned. Returns: An integer specifying the number of samples. Return type: int
-
get_stable_manifolds
(persistence=None) Returns the data partitioned by maximum at a specified persistence level
Parameters: persistence (float) – A floating point value specifying the size of the smallest feature we want to track. Default = None means consider all features. Returns: dict of int – A dictionary of lists where each key is an integer specifying the index of the maximum. Each entry will hold a list of indices specifying points that are associated to this maximum. Return type: list of int
-
get_unstable_manifolds
(persistence=None) Returns the data partitioned by minimum at a specified persistence level
Parameters: persistence (float) – A floating point value specifying the size of the smallest feature we want to track. Default = None means consider all features. Returns: dict of int – A dictionary of lists where each key is an integer specifying the index of the minimum. Each entry will hold a list of indices specifying points that are associated to this minimum. Return type: list of int
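In the usual Morse-Smale construction, the min-max cells are the intersections of the per-maximum and per-minimum partitions. The sketch below shows that relationship on hand-written dictionaries shaped like the two return values described above; the helper name `morse_smale_cells` and the data are hypothetical, and whether the library derives its min-max partitions exactly this way is an assumption.

```python
def morse_smale_cells(stable, unstable):
    """Intersect per-maximum and per-minimum partitions into (min, max) cells."""
    # Invert the per-maximum dictionary: sample index -> maximum index.
    max_of = {i: mx for mx, idxs in stable.items() for i in idxs}
    cells = {}
    for mn, idxs in unstable.items():
        for i in idxs:
            cells.setdefault((mn, max_of[i]), []).append(i)
    return cells

# Hypothetical partitions of seven samples.
stable = {1: [0, 1, 2], 5: [3, 4, 5, 6]}        # keyed by maximum index
unstable = {0: [0, 1], 3: [2, 3, 4, 5], 6: [6]}  # keyed by minimum index
cells = morse_smale_cells(stable, unstable)
```

Each key of `cells` is a (minimum, maximum) tuple, matching the key layout documented for `get_partitions` on this class.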
-
save
(filename=None) Saves a constructed Morse-Smale Complex in a JSON file
Parameters: filename (str) – A filename for storing the hierarchical merging of features and the base level partitions of the data Returns: Return type: None
-
class
topopy.
MergeTree
(graph=None, gradient='steepest', normalization=None, aggregator=None, debug=False) A wrapper class for the C++ merge tree data structure.
Parameters: - graph (nglpy.Graph) – A graph object used for determining neighborhoods in gradient estimation
- gradient (str) – An optional string specifying the type of gradient estimator to use. Currently the only available option is ‘steepest’.
- normalization (str) – An optional string specifying whether the inputs/output should be scaled before computing. Currently, two modes are supported: ‘zscore’ and ‘feature’. ‘zscore’ will ensure the data has a mean of zero and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. ‘feature’ scales the data into the unit hypercube.
- aggregator (str) – An optional string that specifies what type of aggregation to do when duplicates are found in the domain space. Default value is None meaning the code will error if duplicates are identified.
- debug (bool) – An optional boolean flag for whether debugging output should be enabled.
-
build
(X, Y, w=None) Assigns data to this object and builds the Merge Tree.
Uses an internal graph given in the constructor to build a merge tree on the passed in data. Weights are currently ignored.
Parameters: - X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples
- Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X
- w (np.array) – An optional m vector of values specifying the weights associated to each of the m samples used. Default of None means all points will be equally weighted
Returns: Return type: None
-
class
topopy.
ContourTree
(graph=None, gradient='steepest', normalization=None, aggregator=None, debug=False, short_circuit=True) A class for computing a contour tree from two merge trees
Parameters: - graph (nglpy.Graph) – A graph object used for determining neighborhoods in gradient estimation
- gradient (str) – An optional string specifying the type of gradient estimator to use. Currently the only available option is ‘steepest’.
- normalization (str) – An optional string specifying whether the inputs/output should be scaled before computing. Currently, two modes are supported: ‘zscore’ and ‘feature’. ‘zscore’ will ensure the data has a mean of zero and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. ‘feature’ scales the data into the unit hypercube.
- aggregator (str) – An optional string that specifies what type of aggregation to do when duplicates are found in the domain space. Default value is None meaning the code will error if duplicates are identified.
- debug (bool) – An optional boolean flag for whether debugging output should be enabled.
- short_circuit (bool) – An optional boolean flag for whether the contour tree should be short circuited. Enabling this will speed up the processing by bypassing the fully augmented search and only focusing on partially augmented split and join trees
-
build
(X, Y, w=None) Assigns data to this object and builds the Contour Tree
Uses an internal graph given in the constructor to build a contour tree on the passed in data. Weights are currently ignored.
Parameters: - X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples
- Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X
- w (np.array) – An optional m vector of values specifying the weights associated to each of the m samples used. Default of None means all points will be equally weighted
Returns: Return type: None
-
get_seeds
(threshold) Returns a list of seed points for isosurface extraction given a threshold value
Parameters: threshold (float) – The isovalue for which we want to identify seed points for isosurface extraction Returns: A list of integers representing seed points in the data held by this object. There will be one seed point for each connected component of the isosurface defined by the given threshold value. Return type: list of int
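The "one seed per connected component" property follows from a standard fact about contour trees: every tree arc whose value range spans the isovalue corresponds to exactly one component of the isosurface. The sketch below finds those arcs on a small hand-built tree. It is not the library's algorithm; the helper name `crossing_arcs`, the edge-list representation, and the half-open interval convention are assumptions, and an actual seed would still have to be chosen from the samples along each returned arc.

```python
def crossing_arcs(edges, values, threshold):
    """Return the contour tree arcs whose value range spans `threshold`."""
    arcs = []
    for u, v in edges:
        lo, hi = (u, v) if values[u] <= values[v] else (v, u)
        # Half-open interval: an isovalue that equals an arc's lower
        # endpoint counts for that arc; one that equals its upper
        # endpoint does not.
        if values[lo] <= threshold < values[hi]:
            arcs.append((lo, hi))
    return arcs

# Hypothetical tree: two minima (0, 1) join at saddle 2, which connects to
# saddle 3, which splits into two maxima (4, 5).
values = {0: 0.0, 1: 0.3, 2: 1.0, 3: 1.5, 4: 2.0, 5: 1.8}
edges = [(0, 2), (1, 2), (2, 3), (3, 4), (3, 5)]

low = crossing_arcs(edges, values, 0.5)
mid = crossing_arcs(edges, values, 1.2)
high = crossing_arcs(edges, values, 1.9)
```

At isovalue 0.5 two arcs are crossed, so the isosurface has two components; at 1.2 and 1.9 only one arc is crossed.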