opttscluster module
OptTSCluster class
- class tscluster.opttscluster.OptTSCluster(n_clusters: int, scheme: str = 'z1c0', *, n_allow_assignment_change: None | int = None, use_sum_distance: bool = False, warm_start: bool = True, use_MILP_centroid: bool = True, random_state: None | int = None)
Class for optimal time-series clustering. Throughout this documentation and the code, ‘z’ refers to cluster centers and ‘c’ to label assignments. Calling the constructor creates an OptTSCluster object.
- Parameters:
- n_clusters: int
number of clusters
- scheme: {‘z0c0’, ‘z0c1’, ‘z1c0’, ‘z1c1’}, default=’z1c0’
- The scheme to use for tsclustering. Could be one of:
‘z0c0’ means fixed center, fixed assignment
‘z0c1’ means fixed center, changing assignment
‘z1c0’ means changing center, fixed assignment
‘z1c1’ means changing center, changing assignment
Scheme needs to be a dynamic label assignment scheme (either ‘z1c1’ or ‘z0c1’) when using constrained cluster change (either with n_allow_assignment_change or lagrangian_multiplier)
- n_allow_assignment_change: int or None, default=None
total number of label changes to allow
- use_sum_distance: bool, default=False
Whether to use the sum of distances as the objective. This is the sum of the distances between the points in a time series and their centroids.
- warm_start: bool, default=True
Whether to use k-means to initialize the centroids (Z) and their assignments (C).
- use_MILP_centroid: bool, default=True
If True, the cluster_centers_ attribute holds the cluster centers obtained from the MILP solution; otherwise it holds the average of the data points per time step.
- random_state: int or None, default=None
Set the random seed used when initializing with k-means or when initializing samples when using constraint generation.
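As a rough illustration of the use_MILP_centroid=False case, the sketch below averages the data points assigned to each cluster at each time step. It is a plain-Python approximation under stated assumptions, not the library's implementation: the helper name mean_centers_per_timestep, the nested-list TNF layout, and the choice to return None for an empty cluster are all hypothetical.

```python
def mean_centers_per_timestep(X, labels, n_clusters):
    """Average the members of each cluster at each time step.

    X is a nested list in TNF order, so X[t][i][f] is feature f of
    entity i at time step t. labels[i][t] is the cluster of entity i
    at time step t. Returns centers[t][j] as a list of F means, or
    None when cluster j has no members at time t (an assumption made
    for this sketch only).
    """
    T, N, F = len(X), len(X[0]), len(X[0][0])
    centers = []
    for t in range(T):
        step = []
        for j in range(n_clusters):
            members = [X[t][i] for i in range(N) if labels[i][t] == j]
            if members:
                step.append(
                    [sum(m[f] for m in members) / len(members) for f in range(F)]
                )
            else:
                step.append(None)
        centers.append(step)
    return centers


# Two time steps, three entities, one feature.
X = [
    [[0.0], [2.0], [10.0]],  # t = 0
    [[1.0], [9.0], [11.0]],  # t = 1
]
# Entity 1 moves from cluster 0 to cluster 1 at t = 1.
labels = [[0, 0], [0, 1], [1, 1]]
centers = mean_centers_per_timestep(X, labels, n_clusters=2)
```

The result is indexed as centers[t][j][f], which mirrors the T x k x F layout described for changing-center schemes.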
- Attributes:
cluster_centers_ returns the cluster centers. If the scheme uses fixed centers, returns a k x F 2D array, where k is the number of clusters and F is the number of features. If the scheme uses changing centers, returns a T x k x F 3D array, where T is the number of time steps, k is the number of clusters and F is the number of features.
fitted_data_shape_ returns a tuple of the shape of the fitted data in TNF format, e.g. (T, N, F), where T, N, and F are the number of time steps, entities, and features respectively.
labels_ returns the assignment labels. Values are integers in the range [0, k-1], where k is the number of clusters. If the scheme uses fixed assignment, returns a 1D array of size N, where N is the number of entities. A value of j at the i-th index means that entity i is assigned to the j-th cluster at all time steps. If the scheme uses changing assignment, returns an N x T 2D array, where N is the number of entities and T is the number of time steps. A value of j at the i-th row and t-th column means that entity i is assigned to the j-th cluster at the t-th time step.
label_dict_ returns a dictionary of the labels whose keys are ‘T’, ‘N’, and ‘F’ (denoting time steps, entities, and features respectively). The value of each key is a list of the names/labels along that axis, as described for the label_dict parameter of fit.
n_changes_ returns the total number of label changes.
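The shape rules above can be summarized in a few lines. The helper below is a hypothetical illustration (expected_shapes is not part of tscluster); it only reads the digit after ‘z’ and the digit after ‘c’ in the scheme string, following the descriptions of cluster_centers_ and labels_.

```python
def expected_shapes(scheme, T, N, F, k):
    """Return (centers_shape, labels_shape) for a scheme such as 'z1c0'.

    Hypothetical helper: '1' after 'z' means changing centers (T x k x F),
    '0' means fixed centers (k x F). '1' after 'c' means changing
    assignment (N x T), '0' means fixed assignment (N,).
    """
    if scheme not in {"z0c0", "z0c1", "z1c0", "z1c1"}:
        raise ValueError(f"unknown scheme: {scheme!r}")
    centers_shape = (T, k, F) if scheme[1] == "1" else (k, F)
    labels_shape = (N, T) if scheme[3] == "1" else (N,)
    return centers_shape, labels_shape


# 4 time steps, 10 entities, 3 features, 2 clusters.
default_scheme = expected_shapes("z1c0", T=4, N=10, F=3, k=2)
dynamic_labels = expected_shapes("z0c1", T=4, N=10, F=3, k=2)
```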
Methods
fit(X[, label_dict, verbose, print_to]) Method for fitting the model by solving the MILP model.
get_dynamic_entities() Returns the dynamic entities and their number of changes.
get_index_of_label(labels[, axis]) Function to return the integer indexes of some given labelled items in self.label_dict_.
get_label_of_index(indexes[, axis]) Function to return the labels of some given integer indexes as labelled in self.label_dict_.
get_model_size(X) Method to return the size of the model as a tuple of (v, c).
get_named_cluster_centers([label_dict]) Method to return the cluster centers with custom names of time steps and features.
get_named_labels([label_dict]) Method to return a data frame of the label assignments with custom names of time steps and entities.
set_label_dict(value) Method to manually set the label_dict_.
- fit(X: npt.NDArray[np.float64], label_dict: dict | None = None, verbose: bool = True, print_to: TextIO = sys.stdout, **kwargs) OptTSCluster
Method for fitting the model by solving the MILP model.
- Parameters:
- X: numpy array
Input time series data. Should be a 3-dimensional array in TNF format.
- label_dict: dict, default=None
- A dictionary of the labels of X. Keys should be ‘T’, ‘N’, and ‘F’ (denoting time steps, entities, and features respectively). The value of each key is a list such that the value of key:
‘T’ is a list of names/labels of each time step used as index of each dataframe during fit. Default is range(0, T), where T is the number of time steps in the fitted data.
‘N’ is a list of names/labels of each entity used as index of the dataframe. Default is range(0, N), where N is the number of entities/observations in the fitted data.
‘F’ is a list of names/labels of each feature used as column of each dataframe. Default is range(0, F), where F is the number of features in the fitted data.
The data_loader function from tscluster.preprocessing.utils can help in getting the label_dict of a dataset.
- verbose: bool, default=True
If True, some model training information will be printed out. Set to False to suppress printouts.
- print_to: TextIO, default=sys.stdout
An object with a write method to write the model’s printout information during training. Default is standard output.
- Returns:
- self
The fitted OptTSCluster object.
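To make the label_dict layout concrete, the sketch below builds the default described above (range(0, T), range(0, N), range(0, F)) from a TNF shape and checks that a custom dictionary has one label per position on each axis. Both helper names are hypothetical and written for illustration only; whether fit performs a similar length check is not stated here.

```python
def default_label_dict(shape):
    """Build the default labels for data of shape (T, N, F)."""
    T, N, F = shape
    return {"T": list(range(T)), "N": list(range(N)), "F": list(range(F))}


def label_dict_matches(label_dict, shape):
    """Return True if each axis has exactly one label per position."""
    return all(
        len(label_dict[key]) == size
        for key, size in zip(("T", "N", "F"), shape)
    )


shape = (2, 3, 1)  # 2 time steps, 3 entities, 1 feature
defaults = default_label_dict(shape)

# Custom labels for the same data (example values only).
custom = {
    "T": ["2019", "2020"],
    "N": ["a", "b", "c"],
    "F": ["temperature"],
}
custom_ok = label_dict_matches(custom, shape)
```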
- get_dynamic_entities() Tuple[List[int64], List[int64]]
returns the dynamic entities and their number of changes. Both lists are sorted by the number of cluster changes in descending order.
- Returns:
- dynamic entities: list
a 1-D array of the indexes of the entities that change cluster at least once.
- number of changes: list
a 1-D array of the number of changes for each dynamic entity such that the i-th element is the number of cluster changes for the i-th dynamic entity.
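The following sketch shows one way the two returned lists could be derived from an N x T label matrix. It assumes a change is counted whenever an entity's label differs between consecutive time steps; the function name dynamic_entities and the tie-breaking order are hypothetical and may differ from the library's behavior.

```python
def dynamic_entities(labels):
    """Return (entity_indexes, change_counts), sorted by count descending.

    labels[i][t] is the cluster of entity i at time step t. Entities
    with zero changes are left out. Ties keep their original entity
    order because Python's sort is stable.
    """
    changes = [
        sum(1 for t in range(1, len(row)) if row[t] != row[t - 1])
        for row in labels
    ]
    dynamic = [(i, c) for i, c in enumerate(changes) if c > 0]
    dynamic.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in dynamic], [c for _, c in dynamic]


labels = [
    [0, 0, 0],  # entity 0 never changes
    [0, 1, 0],  # entity 1 changes twice
    [1, 1, 0],  # entity 2 changes once
]
entities, counts = dynamic_entities(labels)
total_changes = sum(counts)
```

Under the same assumption, summing the counts gives the total number of label changes across all entities.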
- get_index_of_label(labels: List[str], axis: str = 'N') List[int]
function to return the integer indexes of some given labelled items in self.label_dict_. The indexes are assumed to be 0-indexed.
- Parameters:
- labels: list
a list of the label(s) whose integer indexes should be returned.
- axis: str, default=’N’
can be any of {‘T’, ‘N’, ‘F’}.
If ‘T’, the values in the labels parameter are interpreted as time labels (as stored in self.label_dict_[‘T’]).
If ‘N’, the values in the labels parameter are interpreted as entity labels (as stored in self.label_dict_[‘N’]).
If ‘F’, the values in the labels parameter are interpreted as feature labels (as stored in self.label_dict_[‘F’]).
- Returns:
- list
a list of the integer indexes of the labels in the given axis dimension.
- get_label_of_index(indexes: List[int], axis: str = 'N') List[str]
function to return the labels of some given integer indexes as labelled in self.label_dict_. The indexes are assumed to be 0-indexed.
- Parameters:
- indexes: list
a list of the index(es) whose labels should be returned.
- axis: str, default=’N’
can be any of {‘T’, ‘N’, ‘F’}.
If ‘T’, the values in the indexes parameter are interpreted as the time indexes whose labels (as stored in self.label_dict_[‘T’]) should be returned.
If ‘N’, the values in the indexes parameter are interpreted as the entity indexes whose labels (as stored in self.label_dict_[‘N’]) should be returned.
If ‘F’, the values in the indexes parameter are interpreted as the feature indexes whose labels (as stored in self.label_dict_[‘F’]) should be returned.
- Returns:
- list
a list of the labels of the given integer indexes in the given axis dimension.
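The two lookups are inverses of each other over a label_dict. The sketch below mimics them with plain functions on a dictionary; index_of_label and label_of_index are hypothetical stand-ins written for illustration, and the label values are made up.

```python
def index_of_label(label_dict, labels, axis="N"):
    """Map labels on the given axis to their 0-based positions."""
    position = {label: i for i, label in enumerate(label_dict[axis])}
    return [position[label] for label in labels]


def label_of_index(label_dict, indexes, axis="N"):
    """Map 0-based positions on the given axis back to their labels."""
    return [label_dict[axis][i] for i in indexes]


label_dict = {
    "T": ["2019", "2020"],
    "N": ["Toronto", "Ottawa", "Montreal"],
    "F": ["temperature"],
}

entity_indexes = index_of_label(label_dict, ["Montreal", "Toronto"])
time_labels = label_of_index(label_dict, [1], axis="T")
round_trip = label_of_index(label_dict, entity_indexes)
```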
- get_model_size(X: ndarray[Any, dtype[float64]]) Tuple
Method to return the size of the model as a tuple of (v, c), where v is the number of variables and c is the number of constraints.
- Parameters:
- X: numpy array
Input time series data. Should be a 3-dimensional array in TNF format.
- Returns:
- number of variables
The number of variables in the model
- number of constraints
The number of constraints
- get_named_cluster_centers(label_dict: dict | None = None) List[pd.DataFrame]
Method to return the cluster centers with custom names of time steps and features.
- Parameters:
- label_dict: dict, default=None
a dictionary whose keys are ‘T’, ‘N’, and ‘F’ (denoting time steps, entities, and features respectively). The value of each key is a list such that the value of key:
‘T’ is a list of names/labels of each time step to be used as index of each dataframe. If None, range(0, T) is used, where T is the number of time steps in the fitted data.
‘N’ is a list of names/labels of each entity to be used as index of the dataframe. If None, range(0, N) is used, where N is the number of entities/observations in the fitted data.
‘F’ is a list of names/labels of each feature to be used as column of each dataframe. If None, range(0, F) is used, where F is the number of features in the fitted data.
If label_dict is None, the result of self.label_dict_ is used.
- Returns:
- list
A list of k pandas DataFrames, where k is the number of clusters. The i-th dataframe in the list is a T x F dataframe of the values of the cluster centers of the i-th cluster.
- get_named_labels(label_dict: dict | None = None) pd.DataFrame
Method to return a data frame of the label assignments with custom names of time steps and entities.
- Parameters:
- label_dict: dict, default=None
a dictionary whose keys are ‘T’, ‘N’, and ‘F’ (denoting time steps, entities, and features respectively). The value of each key is a list such that the value of key:
‘T’ is a list of names/labels of each time step to be used as index of each dataframe. If None, range(0, T) is used, where T is the number of time steps in the fitted data.
‘N’ is a list of names/labels of each entity to be used as index of the dataframe. If None, range(0, N) is used, where N is the number of entities/observations in the fitted data.
‘F’ is a list of names/labels of each feature to be used as column of each dataframe. If None, range(0, F) is used, where F is the number of features in the fitted data.
If label_dict is None, the result of self.label_dict_ is used.
- Returns:
- pd.DataFrame
A pandas DataFrame with shape (N, T). The value in the n-th row and t-th column is an integer indicating the cluster assignment of the n-th entity/observation at time t.
- set_label_dict(value: dict) None
Method to manually set the label_dict_.
- Parameters:
- value: dict
the value to set as label_dict_. Should be a dict with all of ‘T’, ‘N’, and ‘F’ (case sensitive, denoting time steps, entities, and features respectively) as keys. The value of each key is a list of labels for that key in the data. If your data has no labels for one of the keys, set its value to None.
- Returns:
- dict
a dictionary whose keys are ‘T’, ‘N’, and ‘F’; and values are lists of the labels of each key.