API Documentation

TopoPy - Topological constructs for Python.

TopoPy is a Python package for building approximate topological structures in arbitrary dimensions, using a neighborhood graph to approximate the local gradient.

class topopy.TopologicalObject(graph=None, gradient='steepest', normalization=None, aggregator=None, debug=False)

A base class for housing common interactions between Morse and Morse-Smale complexes, and Contour and Merge Trees.

Parameters:

graph (nglpy.Graph) – A graph object used for determining neighborhoods in gradient estimation
gradient (str) – An optional string specifying the type of gradient estimator to use. Currently the only available option is ‘steepest’.
normalization (str) – An optional string specifying whether the inputs/output should be scaled before computing. Currently, two modes are supported: ‘zscore’ and ‘feature’. ‘zscore’ will ensure the data has a mean of zero and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. ‘feature’ scales the data into the unit hypercube.
aggregator (str) – An optional string that specifies what type of aggregation to do when duplicates are found in the domain space. Default value is None meaning the code will error if duplicates are identified.
debug (bool) – An optional boolean flag for whether debugging output should be enabled.
short_circuit (bool) – An optional boolean flag for whether the contour tree should be short circuited. Enabling this will speed up the processing by bypassing the fully augmented search and only focusing on partially augmented split and join trees
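
The two normalization modes described above can be sketched for a single column of data in plain Python. This is an illustration only, not TopoPy's implementation: the helper names zscore and feature_scale are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def zscore(column):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = fmean(column)
    sigma = pstdev(column)  # assumption: population, not sample, deviation
    return [(v - mu) / sigma for v in column]


def feature_scale(column):
    """Map values linearly onto [0, 1], one axis of the unit hypercube."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


column = [2.0, 4.0, 6.0, 8.0]
z = zscore(column)         # mean 0, standard deviation 1
f = feature_scale(column)  # smallest value maps to 0.0, largest to 1.0
```

Both helpers divide by zero on a constant column, so a real implementation would need to guard against that case.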

static aggregate_duplicates(X, Y, aggregator='mean', precision=16)

A function that will attempt to collapse duplicates in the domain space, X, by aggregating values over the range space, Y.

Parameters:

X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples.
Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X.
aggregator (str or callable) – An optional string or callable object that specifies what type of aggregation to do when duplicates are found in the domain space. The default value, ‘mean’, calculates the mean range value over each group of duplicated samples.
precision (int) – An optional positive integer specifying how many digits numbers should be rounded to in order to determine whether they are unique.

Returns:	A tuple where the first value is an m’-by-n array specifying the unique domain samples and the second value is an m’ vector specifying the associated range values. m’ <= m.
Return type:	tuple(np.ndarray, np.array)
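
The idea of collapsing duplicates can be sketched with the standard library alone. The function below is a hypothetical stand-in, not the TopoPy method: it works on lists rather than NumPy arrays, accepts only a callable aggregator, and returns the unique samples in order of first appearance, which the real function may not do.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_duplicates_sketch(X, Y, aggregator=fmean, precision=16):
    """Group rows of X that match after rounding and aggregate their Y values."""
    groups = defaultdict(list)
    for row, y in zip(X, Y):
        # Rows that agree to `precision` digits share one dictionary key.
        key = tuple(round(v, precision) for v in row)
        groups[key].append(y)
    unique_X = [list(key) for key in groups]
    unique_Y = [aggregator(values) for values in groups.values()]
    return unique_X, unique_Y


X = [[0.0, 1.0], [0.0, 1.0], [2.0, 3.0]]
Y = [1.0, 3.0, 5.0]
unique_X, unique_Y = aggregate_duplicates_sketch(X, Y)
```

Here the first two rows are identical, so their responses 1.0 and 3.0 are averaged into a single value and the result has two samples instead of three.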

build(X, Y, w=None)

Assigns data to this object and builds the requested topological structure.

Uses an internal graph given in the constructor to build a topological object on the passed in data. Weights are currently ignored.

Parameters:

X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples.
Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X.
w (np.array) – An optional m vector of values specifying the weights associated to each of the m samples used. Default of None means all points will be equally weighted.

Returns:
Return type:	None

check_duplicates()

Function to test whether duplicates exist in the input or output space.

If an aggregator function has been specified, duplicates in the domain space are consolidated by using that function to generate a new range value for each shared point. Otherwise, a ValueError is raised. A warning is issued if duplicates exist in the output space.

Returns:
Return type:	None

get_dimensionality()

Returns the dimensionality of the input space.

Returns:	Integer specifying the dimensionality of the input samples.
Return type:	int

get_neighbors(idx)

Returns a list of neighbors for the specified index.

Parameters:	idx (int) – An integer specifying the query point
Returns:	A list of integer neighbor indices
Return type:	list of int

get_normed_x(rows=None, cols=None)

Returns the normalized input data requested by the user.

Parameters:

rows (list of int) – A list of non-negative integers specifying the row indices to return.
cols (list of int) – A list of non-negative integers specifying the column indices to return.

Returns:	A matrix of floating point values specifying the normalized data values used in internal computations, filtered by the two input parameters.
Return type:	np.ndarray

get_sample_size()

Returns the number of samples in the input data

Returns:	Integer specifying the number of samples.
Return type:	int

get_weights(indices=None)

Returns the weights requested by the user

Parameters:	indices (list of int) – A list of non-negative integers specifying the row indices to return
Returns:	An array of floating point values specifying the weights associated to the input data rows filtered by the indices input parameter.
Return type:	np.array

get_x(rows=None, cols=None)

Returns the input data requested by the user.

Parameters:

rows (list of int) – A list of non-negative integers specifying the row indices to return.
cols (list of int) – A list of non-negative integers specifying the column indices to return.

Returns:	A matrix of floating point values specifying the input data values filtered by the two input parameters.
Return type:	np.ndarray
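
Row and column filtering of this kind can be illustrated on a list of lists. The helper select_sketch is hypothetical, and treating None as "return everything" is an assumption based on the default argument values shown in the signature.

```python
def select_sketch(X, rows=None, cols=None):
    """Return the requested rows and columns of X; None keeps every index."""
    row_ids = range(len(X)) if rows is None else rows
    result = []
    for r in row_ids:
        row = X[r]
        result.append(list(row) if cols is None else [row[c] for c in cols])
    return result


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
subset = select_sketch(X, rows=[0, 2], cols=[1])  # rows 0 and 2, column 1
everything = select_sketch(X)                      # no filtering
```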

get_y(indices=None)

Returns the output data requested by the user

Parameters:	indices (list of int) – A list of non-negative integers specifying the row indices to return
Returns:	An array of floating point values specifying the output data values filtered by the indices input parameter.
Return type:	np.array

load_data_and_build(filename, delimiter=', ')

Convenience function for directly working with a data file.

Opens the file, reads its contents into a NumPy array along with a list of dimension names, and builds the topological structure from that data.

Parameters:

filename (str) – A string specifying the path of the data file.
delimiter (str) – An optional string specifying the separator between values in the file.

Returns:
Return type:	None

reset()

Empties all internal storage containers

Returns:
Return type:	None

class topopy.MorseComplex(graph=None, gradient='steepest', normalization=None, simplification='difference', aggregator=None, debug=False)

A wrapper class for the C++ approximate Morse complex Object

Parameters:

graph (nglpy.Graph) – A graph object used for determining neighborhoods in gradient estimation
gradient (str) – An optional string specifying the type of gradient estimator to use. Currently the only available option is ‘steepest’.
normalization (str) – An optional string specifying whether the inputs/output should be scaled before computing. Currently, two modes are supported: ‘zscore’ and ‘feature’. ‘zscore’ will ensure the data has a mean of zero and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. ‘feature’ scales the data into the unit hypercube.
simplification (str) – An optional string specifying how the simplification hierarchy is computed. Currently, three modes are supported: ‘difference’, ‘probability’ and ‘count’. ‘difference’ will take the function value difference of the extremum and its closest function valued neighboring saddle (standard persistence simplification), ‘probability’ will augment this value by multiplying the probability of the extremum and its saddle, and ‘count’ will order the simplification by the size (number of points) in each manifold such that smaller features will be absorbed into neighboring larger features first.
aggregator (str) – An optional string that specifies what type of aggregation to do when duplicates are found in the domain space. Default value is None meaning the code will error if duplicates are identified.
debug (bool) – An optional boolean flag for whether debugging output should be enabled.

build(X, Y, w=None)

Assigns data to this object and builds the Morse Complex.

Uses an internal graph given in the constructor to build a Morse complex on the passed in data. Weights are currently ignored.

Parameters:

X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples.
Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X.
w (np.array) – An optional m vector of values specifying the weights associated to each of the m samples used. Default of None means all points will be equally weighted.

Returns:
Return type:	None

get_classification(idx)

Given an index, this function will report whether that sample is a local maximum or a regular point.

Parameters:	idx (int) – A non-negative integer less than the sample size of the input data.
Returns:	A string specifying the classification type of the input sample: either ‘maximum’ or ‘regular’.
Return type:	str
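
As a rough illustration of what such a classification means on a neighborhood graph, the sketch below calls a sample a maximum when none of its graph neighbors has a larger value. The helper classify_sketch, the toy neighbors dictionary, and this particular test are assumptions made for illustration, not the library's internal logic.

```python
def classify_sketch(idx, Y, neighbors):
    """Return 'maximum' if no neighbor of idx has a larger value, else 'regular'."""
    if all(Y[j] <= Y[idx] for j in neighbors[idx]):
        return "maximum"
    return "regular"


# Five samples on a line, each connected to its immediate neighbors.
Y = [0.0, 2.0, 1.0, 3.0, 0.5]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
kinds = [classify_sketch(i, Y, neighbors) for i in range(len(Y))]
```

Samples 1 and 3 sit above both of their neighbors, so they are the two local maxima of this toy data set.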

get_current_labels()

Returns a list of tuples that specifies the extremum index labels associated to each input sample

Returns:	a list of non-negative integers specifying the extremum-flow indices associated to each input sample at the current level of persistence
Return type:	list of tuple(int, int)

get_label(indices=None)

Returns the label indices requested by the user

Parameters:	indices (list of int) – A list of non-negative integers specifying the row indices to return
Returns:	A list of integers specifying the extremum index of the specified rows.
Return type:	list of int
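
An extremum label of this kind can be thought of as the maximum reached by repeatedly stepping to the steepest uphill neighbor. The following is a minimal sketch of that idea under stated assumptions: the function name, the toy graph, and the slope definition (value difference divided by Euclidean distance) are illustrative choices, and distinct sample locations are assumed.

```python
import math


def steepest_ascent_labels(X, Y, neighbors):
    """Label every sample with the index of the maximum its ascent path reaches."""

    def uphill(i):
        # Pick the neighbor with the largest positive slope, or stay put.
        best, best_slope = i, 0.0
        for j in neighbors[i]:
            slope = (Y[j] - Y[i]) / math.dist(X[i], X[j])
            if slope > best_slope:
                best, best_slope = j, slope
        return best

    labels = []
    for i in range(len(Y)):
        current = i
        while True:
            nxt = uphill(current)
            if nxt == current:  # no uphill neighbor: a local maximum
                break
            current = nxt
        labels.append(current)
    return labels


X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
Y = [0.0, 2.0, 1.0, 3.0, 0.5]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels = steepest_ascent_labels(X, Y, neighbors)
```

Each step strictly increases the function value, so every path terminates at a local maximum.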

get_merge_sequence()

Returns a data structure holding the ordered merge sequence of extrema simplification.

Returns:	A dictionary in which each key is the index of a dying extremum and each value is a tuple holding the persistence, the parent index, and the saddle index associated with that extremum, in that order.
Return type:	dict of int to tuple(float, int, int)
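
Given a dictionary shaped like the one described above, one plausible way to find which extremum a feature belongs to at a chosen persistence level is to follow parent links while the recorded persistence is below that level. The sketch below uses made-up data; the function name, the strict comparison, and the convention that a root extremum lists itself as its parent are assumptions.

```python
def surviving_extremum(idx, merge_sequence, persistence):
    """Follow parent links until reaching an extremum that survives the level."""
    while idx in merge_sequence:
        level, parent, _saddle = merge_sequence[idx]
        if level >= persistence or parent == idx:
            break  # this extremum survives, or it is the root
        idx = parent
    return idx


# Hypothetical merge sequence: key -> (persistence, parent index, saddle index)
merge_sequence = {
    7: (0.5, 1, 6),
    1: (1.0, 3, 2),
    3: (3.0, 3, 3),
}
at_fine_level = surviving_extremum(7, merge_sequence, 0.25)
at_mid_level = surviving_extremum(7, merge_sequence, 0.75)
at_coarse_level = surviving_extremum(7, merge_sequence, 2.0)
```

At a low level extremum 7 stands on its own; raising the level merges it first into extremum 1 and then into extremum 3.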

get_partitions(persistence=None)

Returns the partitioned data based on a specified persistence level.

Parameters:	persistence (float) – A floating point value specifying the size of the smallest feature we want to track. Default = None means consider all features.
Returns:	A dictionary of lists where each key is an integer specifying the index of an extremum. Each entry holds a list of indices specifying the points associated with that extremum.
Return type:	dict of int to list of int
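
The shape of the returned dictionary can be illustrated by grouping sample indices by their extremum label. The label list below is made up, and partitions_from_labels is a hypothetical helper rather than part of TopoPy.

```python
from collections import defaultdict


def partitions_from_labels(labels):
    """Group sample indices by the extremum index each sample is labeled with."""
    partitions = defaultdict(list)
    for sample, extremum in enumerate(labels):
        partitions[extremum].append(sample)
    return dict(partitions)


# Hypothetical labels: samples 0-1 flow to maximum 1, samples 2-4 to maximum 3.
labels = [1, 1, 3, 3, 3]
partitions = partitions_from_labels(labels)
```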

get_persistence()

Retrieves the persistence simplification level being used for this complex.

Returns:	Floating point value specifying the current persistence setting
Return type:	float

get_sample_size(key=None)

Returns the number of samples in the input data

Parameters:	key (int) – An optional integer specifying a max id used for determining which partition size should be returned. If not specified then the size of the entire data set will be returned.
Returns:	An integer specifying the number of samples.
Return type:	int

reset()

Empties all internal storage containers

Returns:
Return type:	None

save(filename=None)

Saves a constructed Morse complex to a JSON file.

Parameters:	filename (str) – A filename for storing the hierarchical merging of features and the base level partitions of the data
Returns:
Return type:	None

set_persistence(p)

Sets the persistence simplification level to be used for representing this complex.

Parameters:	p (float) – A floating point value specifying the internally held size of the smallest feature we want to track.
Returns:
Return type:	None

to_json()

Writes the complete Morse complex merge hierarchy to a string

Returns:	A string storing the entire merge hierarchy of all maxima
Return type:	str

class topopy.MorseSmaleComplex(graph=None, gradient='steepest', normalization=None, simplification='difference', aggregator=None, debug=False)

A wrapper class for the C++ approximate Morse-Smale complex Object

Parameters:

graph (nglpy.Graph) – A graph object used for determining neighborhoods in gradient estimation
gradient (str) – An optional string specifying the type of gradient estimator to use. Currently the only available option is ‘steepest’.
normalization (str) – An optional string specifying whether the inputs/output should be scaled before computing. Currently, two modes are supported: ‘zscore’ and ‘feature’. ‘zscore’ will ensure the data has a mean of zero and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. ‘feature’ scales the data into the unit hypercube.
simplification (str) – An optional string specifying how the simplification hierarchy is computed. Currently, three modes are supported: ‘difference’, ‘probability’ and ‘count’. ‘difference’ will take the function value difference of the extremum and its closest function valued neighboring saddle (standard persistence simplification), ‘probability’ will augment this value by multiplying the probability of the extremum and its saddle, and ‘count’ will order the simplification by the size (number of points) in each manifold such that smaller features will be absorbed into neighboring larger features first.
aggregator (str) – An optional string that specifies what type of aggregation to do when duplicates are found in the domain space. Default value is None meaning the code will error if duplicates are identified.
debug (bool) – An optional boolean flag for whether debugging output should be enabled.

build(X, Y, w=None)

Assigns data to this object and builds the Morse-Smale Complex.

Uses an internal graph given in the constructor to build a Morse-Smale complex on the passed in data. Weights are currently ignored.

Parameters:

X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples.
Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X.
w (np.array) – An optional m vector of values specifying the weights associated to each of the m samples used. Default of None means all points will be equally weighted.

Returns:
Return type:	None

get_classification(idx)

Given an index, this function will report whether that sample is a local minimum, a local maximum, or a regular point.

Parameters:	idx (int) – A non-negative integer less than the sample size of the input data.
Returns:	A string specifying the classification type of the input sample: one of ‘maximum’, ‘minimum’, or ‘regular’.
Return type:	str

get_current_labels()

Returns a list of tuples that specifies the min-max index labels associated to each input sample

Returns:	a list of non-negative integer tuples specifying the min-max index labels associated to each input sample at the current level of persistence
Return type:	list of tuple(int, int)

get_label(indices=None)

Returns the label pair indices requested by the user

Parameters:	indices (list of int) – A list of non-negative integers specifying the row indices to return
Returns:	A list of integer 2-tuples specifying the minimum and maximum index of the specified rows, respectively.
Return type:	list of tuple(int, int)

get_merge_sequence()

Returns a data structure holding the ordered merge sequence of extrema simplification.

Returns:	A dictionary in which each key is the index of a dying extremum and each value is a tuple holding the persistence, the parent index, and the saddle index associated with that extremum, in that order.
Return type:	dict of int to tuple(float, int, int)

get_partitions(persistence=None)

Returns the partitioned data based on a specified persistence level.

Parameters:	persistence (float) – A floating point value specifying the size of the smallest feature we want to track. Default = None means consider all features.
Returns:	A dictionary of lists where each key is a min-max tuple specifying the index of the minimum and maximum, respectively. Each entry holds a list of indices specifying the points associated with this min-max pair.
Return type:	dict of tuple(int, int) to list of int

get_persistence()

Retrieves the persistence simplification level being used for this complex.

Returns:	Floating point value specifying the current persistence setting
Return type:	float

get_sample_size(key=None)

Returns the number of samples in the input data

Parameters:	key (int) – An optional integer specifying a max id used for determining which partition size should be returned. If not specified then the size of the entire data set will be returned.
Returns:	An integer specifying the number of samples.
Return type:	int

get_stable_manifolds(persistence=None)

Returns the data partitioned by maximum at a specified persistence level.

Parameters:	persistence (float) – A floating point value specifying the size of the smallest feature we want to track. Default = None means consider all features.
Returns:	A dictionary of lists where each key is an integer specifying the index of a maximum. Each entry holds a list of indices specifying the points associated with that maximum.
Return type:	dict of int to list of int

get_unstable_manifolds(persistence=None)

Returns the data partitioned by minimum at a specified persistence level.

Parameters:	persistence (float) – A floating point value specifying the size of the smallest feature we want to track. Default = None means consider all features.
Returns:	A dictionary of lists where each key is an integer specifying the index of a minimum. Each entry holds a list of indices specifying the points associated with that minimum.
Return type:	dict of int to list of int
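
One way to picture how the three partition views relate is to regroup a min-max partition dictionary by its maximum index and by its minimum index. The sketch below follows this page's naming (stable manifolds keyed by maximum, unstable manifolds keyed by minimum) and uses made-up data; whether the library derives its manifolds this way is an assumption.

```python
from collections import defaultdict


def split_manifolds(partitions):
    """Regroup {(min_idx, max_idx): members} by maximum and by minimum."""
    by_max = defaultdict(list)
    by_min = defaultdict(list)
    for (min_idx, max_idx), members in partitions.items():
        by_max[max_idx].extend(members)
        by_min[min_idx].extend(members)
    stable = {key: sorted(members) for key, members in by_max.items()}
    unstable = {key: sorted(members) for key, members in by_min.items()}
    return stable, unstable


# Hypothetical Morse-Smale partitions keyed by (minimum, maximum).
partitions = {(0, 2): [0, 1, 2], (4, 2): [3, 4], (4, 6): [5, 6]}
stable, unstable = split_manifolds(partitions)
```

Cells that share a maximum are pooled into one entry of the first dictionary, and cells that share a minimum into one entry of the second.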

reset()

Empties all internal storage containers

Returns:
Return type:	None

save(filename=None)

Saves a constructed Morse-Smale complex to a JSON file.

Parameters:	filename (str) – A filename for storing the hierarchical merging of features and the base level partitions of the data
Returns:
Return type:	None

set_persistence(p)

Sets the persistence simplification level to be used for representing this complex.

Parameters:	p (float) – A floating point value specifying the internally held size of the smallest feature we want to track.
Returns:
Return type:	None

to_json()

Writes the complete Morse-Smale complex merge hierarchy to a string

Returns:	A string storing the entire merge hierarchy of all minima and maxima
Return type:	str

class topopy.MergeTree(graph=None, gradient='steepest', normalization=None, aggregator=None, debug=False)

A wrapper class for the C++ merge tree data structure.

Parameters:

graph (nglpy.Graph) – A graph object used for determining neighborhoods in gradient estimation
gradient (str) – An optional string specifying the type of gradient estimator to use. Currently the only available option is ‘steepest’.
normalization (str) – An optional string specifying whether the inputs/output should be scaled before computing. Currently, two modes are supported: ‘zscore’ and ‘feature’. ‘zscore’ will ensure the data has a mean of zero and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. ‘feature’ scales the data into the unit hypercube.
aggregator (str) – An optional string that specifies what type of aggregation to do when duplicates are found in the domain space. Default value is None meaning the code will error if duplicates are identified.
debug (bool) – An optional boolean flag for whether debugging output should be enabled.

build(X, Y, w=None)

Assigns data to this object and builds the Merge Tree.

Uses an internal graph given in the constructor to build a merge tree on the passed in data. Weights are currently ignored.

Parameters:

X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples.
Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X.
w (np.array) – An optional m vector of values specifying the weights associated to each of the m samples used. Default of None means all points will be equally weighted.

Returns:
Return type:	None
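
For readers unfamiliar with merge trees, the sketch below shows the textbook union-find sweep on a neighborhood graph: samples are visited from the highest value to the lowest, a sample with no previously visited neighbor starts a new component (a maximum), and a sample that joins two or more components is a merge saddle. This is a generic illustration that assumes distinct function values; it is not the C++ implementation that this class wraps, and the function name is hypothetical.

```python
def merge_tree_events(Y, neighbors):
    """Sweep from high to low values, recording maxima and merge saddles."""
    order = sorted(range(len(Y)), key=lambda i: Y[i], reverse=True)
    parent = {}  # union-find forest over the samples visited so far

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    maxima, saddles = [], []
    for v in order:
        parent[v] = v
        # Components already present among the neighbors of v.
        roots = {find(u) for u in neighbors[v] if u in parent}
        if not roots:
            maxima.append(v)   # a new component is born at v
        elif len(roots) > 1:
            saddles.append(v)  # v merges several components
        for r in roots:
            parent[r] = v      # v becomes the representative
    return maxima, saddles


Y = [0.0, 2.0, 1.0, 3.0, 0.5]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
maxima, saddles = merge_tree_events(Y, neighbors)
```

Sweeping in the opposite direction, from low to high, records minima and their merge points instead.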

class topopy.ContourTree(graph=None, gradient='steepest', normalization=None, aggregator=None, debug=False, short_circuit=True)

A class for computing a contour tree from two merge trees

Parameters:

graph (nglpy.Graph) – A graph object used for determining neighborhoods in gradient estimation
gradient (str) – An optional string specifying the type of gradient estimator to use. Currently the only available option is ‘steepest’.
normalization (str) – An optional string specifying whether the inputs/output should be scaled before computing. Currently, two modes are supported: ‘zscore’ and ‘feature’. ‘zscore’ will ensure the data has a mean of zero and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. ‘feature’ scales the data into the unit hypercube.
aggregator (str) – An optional string that specifies what type of aggregation to do when duplicates are found in the domain space. Default value is None meaning the code will error if duplicates are identified.
debug (bool) – An optional boolean flag for whether debugging output should be enabled.
short_circuit (bool) – An optional boolean flag for whether the contour tree should be short circuited. Enabling this will speed up the processing by bypassing the fully augmented search and only focusing on partially augmented split and join trees

build(X, Y, w=None)

Assigns data to this object and builds the Contour Tree.

Uses an internal graph given in the constructor to build a contour tree on the passed in data. Weights are currently ignored.

Parameters:

X (np.ndarray) – An m-by-n array of values specifying m n-dimensional samples.
Y (np.array) – An m vector of values specifying the output responses corresponding to the m samples specified by X.
w (np.array) – An optional m vector of values specifying the weights associated to each of the m samples used. Default of None means all points will be equally weighted.

Returns:
Return type:	None

get_seeds(threshold)

Returns a list of seed points for isosurface extraction given a threshold value

Parameters:	threshold (float) – The isovalue for which we want to identify seed points for isosurface extraction
Returns:	A list of integers representing seed points in the data held by this object. There will be one seed point for each connected component of the isosurface defined by the given threshold value.
Return type:	list of int
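
The one-seed-per-component property follows from a standard fact about contour trees: each arc whose value range spans the isovalue corresponds to exactly one connected component of the isosurface. The sketch below illustrates that fact on a hand-written tree. The arc list, the function name, and the choice of the arc's upper endpoint as the representative are assumptions; how get_seeds itself picks its seed points is not described on this page.

```python
def seeds_from_arcs(arcs, Y, threshold):
    """Return one representative vertex per arc whose range spans the threshold."""
    seeds = []
    for lower, upper in arcs:
        if Y[lower] <= threshold < Y[upper]:
            seeds.append(upper)
    return seeds


# Hypothetical contour tree: a minimum (0), a saddle (2), and two maxima (1, 3).
Y = [0.0, 3.0, 1.0, 4.0]
arcs = [(0, 2), (2, 1), (2, 3)]        # (lower vertex, upper vertex)
high = seeds_from_arcs(arcs, Y, 2.0)   # above the saddle: two components
low = seeds_from_arcs(arcs, Y, 0.5)    # below the saddle: one component
```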

reset()

Empties all internal storage containers

Returns:
Return type:	None