opttscluster module

OptTSCluster class

class tscluster.opttscluster.OptTSCluster(n_clusters: int, scheme: str = 'z1c0', *, n_allow_assignment_change: None | int = None, use_sum_distance: bool = False, warm_start: bool = True, use_MILP_centroid: bool = True, random_state: None | int = None)

Class for optimal time-series clustering. Throughout this documentation and the code, ‘z’ refers to cluster centers and ‘c’ to label assignments. Calling the constructor creates an OptTSCluster object.

Parameters:
n_clusters: int

number of clusters

scheme: {‘z0c0’, ‘z0c1’, ‘z1c0’, ‘z1c1’}, default=‘z1c0’
The clustering scheme to use. One of:
  • ‘z0c0’ means fixed center, fixed assignment

  • ‘z0c1’ means fixed center, changing assignment

  • ‘z1c0’ means changing center, fixed assignment

  • ‘z1c1’ means changing center, changing assignment

The scheme must use dynamic label assignment (either ‘z1c1’ or ‘z0c1’) when constraining cluster changes (with either n_allow_assignment_change or lagrangian_multiplier).

n_allow_assignment_change: int or None, default=None

total number of label changes to allow

use_sum_distance: bool, default=False

Whether to use the sum of distances to the cluster centers as the objective, i.e. the sum of the distances between the points in each time series and their centroids.

warm_start: bool, default=True

Whether to use k-means to initialize the centroids (Z) and their assignments (C).

use_MILP_centroid: bool, default=True

If True, the cluster_centers_ attribute holds the cluster centers obtained from the MILP solution; otherwise it holds the average of the data points per time step.

random_state: int or None, default=None

Set the random seed used when initializing with k-means or when initializing samples when using constraint generation.
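
To make the use_sum_distance and use_MILP_centroid descriptions concrete, here is a minimal NumPy sketch of "the average of the datapoints per timestep" and of a sum-of-distance value. It is an illustration, not the library's code: the helper names average_centers and sum_of_distances are hypothetical, the labels are assumed to be an N x T array, and the Euclidean norm is an assumption (this page does not say which distance the model uses).

```python
import numpy as np

def average_centers(X, labels, n_clusters):
    """X: (T, N, F) data; labels: (N, T) integer labels. Returns (T, k, F)."""
    T, N, F = X.shape
    # NaN marks a cluster that has no members at a given time step.
    centers = np.full((T, n_clusters, F), np.nan)
    for t in range(T):
        for j in range(n_clusters):
            members = labels[:, t] == j
            if members.any():
                centers[t, j] = X[t, members].mean(axis=0)
    return centers

def sum_of_distances(X, centers, labels):
    """Sum over all (t, n) of the distance from X[t, n] to its assigned center."""
    T = X.shape[0]
    # For each time step t and entity n, pick the center of the assigned cluster.
    assigned = centers[np.arange(T)[:, None], labels.T]  # shape (T, N, F)
    # Euclidean distance is an assumption made for this illustration.
    return float(np.linalg.norm(X - assigned, axis=2).sum())

X = np.array([[[0.0], [2.0], [10.0]],
              [[1.0], [3.0], [11.0]]])       # T=2, N=3, F=1
labels = np.array([[0, 0], [0, 0], [1, 1]])  # entities 0 and 1 share cluster 0
centers = average_centers(X, labels, n_clusters=2)
objective = sum_of_distances(X, centers, labels)
```

In this toy case cluster 0 averages to 1.0 and 2.0 at the two time steps, and each of its two members sits 1.0 away from it, so the summed distance is 4.0.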

Attributes:
cluster_centers_

returns the cluster centers. If the scheme uses fixed centers, returns a k x F 2D array, where k is the number of clusters and F is the number of features. If the scheme uses changing centers, returns a T x k x F 3D array, where T is the number of time steps, k is the number of clusters, and F is the number of features.

fitted_data_shape_

returns a tuple of the shape of the fitted data in TNF format, e.g. (T, N, F), where T, N, and F are the number of time steps, entities, and features respectively.

labels_

returns the assignment labels. Values are integers in the range [0, k-1], where k is the number of clusters. If the scheme uses fixed assignment, returns a 1D array of size N, where N is the number of entities; a value of j at the i-th index means that entity i is assigned to the j-th cluster at all time steps. If the scheme uses changing assignment, returns an N x T 2D array, where T is the number of time steps; a value of j at the i-th row and t-th column means that entity i is assigned to the j-th cluster at the t-th time step.

label_dict_

returns a dictionary of the labels whose keys are ‘T’, ‘N’, and ‘F’ (for time steps, entities, and features respectively). The value of each key is a list of the names/labels along that axis.

n_changes_

returns the total number of label changes
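
The shape rules stated for cluster_centers_ and labels_ can be summarized in a small helper. This is a sketch derived only from the attribute descriptions above; expected_shapes is a hypothetical name and is not part of tscluster.

```python
def expected_shapes(scheme, T, N, F, k):
    """Return (cluster_centers_ shape, labels_ shape) for a scheme string."""
    if scheme not in {"z0c0", "z0c1", "z1c0", "z1c1"}:
        raise ValueError(f"unknown scheme: {scheme!r}")
    changing_centers = scheme[1] == "1"     # the digit after 'z'
    changing_assignment = scheme[3] == "1"  # the digit after 'c'
    centers_shape = (T, k, F) if changing_centers else (k, F)
    labels_shape = (N, T) if changing_assignment else (N,)
    return centers_shape, labels_shape

shapes = {s: expected_shapes(s, T=4, N=10, F=3, k=2)
          for s in ("z0c0", "z0c1", "z1c0", "z1c1")}
```

For the default scheme ‘z1c0’ with these sizes, the helper gives a (4, 2, 3) center array and a label vector of length 10.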

Methods

fit(X[, label_dict, verbose, print_to])

Method for fitting the model by solving the MILP model.

get_dynamic_entities()

returns the dynamic entities and their number of changes.

get_index_of_label(labels[, axis])

function to return the integer indexes of some given labelled items in self.label_dict_.

get_label_of_index(indexes[, axis])

function to return the labels of some given integer indexes as labelled in self.label_dict_.

get_model_size(X)

Method to return the size of the model as a tuple of (v, c).

get_named_cluster_centers([label_dict])

Method to return the cluster centers with custom names of time steps and features.

get_named_labels([label_dict])

Method to return a data frame of the label assignments with custom names of time steps and entities.

set_label_dict(value)

Method to manually set the label_dict_.

fit(X: npt.NDArray[np.float64], label_dict: dict | None = None, verbose: bool = True, print_to: TextIO = sys.stdout, **kwargs) → OptTSCluster

Method for fitting the model by solving the MILP model.

Parameters:
X: numpy array

Input time series data. Should be a 3-dimensional array in TNF format.

label_dict: dict, default=None
A dictionary of the labels of X. Keys should be ‘T’, ‘N’, and ‘F’ (for time steps, entities, and features respectively). The value of each key is a list such that the value of key:
  • ‘T’ is a list of names/labels of each time step, used as the index of each dataframe during fit. Default is range(0, T), where T is the number of time steps in the fitted data.

  • ‘N’ is a list of names/labels of each entity, used as the index of the dataframe. Default is range(0, N), where N is the number of entities/observations in the fitted data.

  • ‘F’ is a list of names/labels of each feature, used as the columns of each dataframe. Default is range(0, F), where F is the number of features in the fitted data.

The data_loader function from tscluster.preprocessing.utils can help in getting the label_dict of a dataset.

verbose: bool, default=True

If True, some model training information will be printed out. Set to False to suppress printouts.

print_to: TextIO, default=sys.stdout

An object with a write method to write model’s printout information during training. Default is standard output.

Returns:
self

The fitted OptTSCluster object.
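
A minimal usage sketch for fit, assuming tscluster and whatever solver backend it depends on are installed. The toy data, the label names, and the helper names make_toy_data and fit_fixed_assignment are invented for illustration; only the constructor and fit arguments come from the signatures on this page. The import sits inside the function so that the data-building part runs on its own.

```python
import numpy as np

def make_toy_data(T=4, N=6, F=2, seed=0):
    """Build a (T, N, F) array with two well-separated groups of entities."""
    rng = np.random.default_rng(seed)
    level = np.where(np.arange(N) < N // 2, 0.0, 5.0)  # low group, high group
    return level[None, :, None] + rng.normal(scale=0.1, size=(T, N, F))

def fit_fixed_assignment(X, label_dict):
    # Imported lazily: requires the tscluster package (assumed installed).
    from tscluster.opttscluster import OptTSCluster
    model = OptTSCluster(n_clusters=2, scheme="z1c0", random_state=0)
    # fit is documented to return the fitted object itself.
    return model.fit(X, label_dict=label_dict, verbose=False)

X = make_toy_data()
T, N, F = X.shape
# Hypothetical names for each axis; any lists of the right length would do.
label_dict = {
    "T": [f"t{t}" for t in range(T)],
    "N": [f"entity_{n}" for n in range(N)],
    "F": ["f0", "f1"],
}
```

Calling fit_fixed_assignment(X, label_dict) would then return the fitted model; under ‘z1c0’ its labels_ attribute is described above as a 1D array of size N.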

get_dynamic_entities() → Tuple[List[int64], List[int64]]

returns the dynamic entities and their number of changes. Both lists are sorted by the number of cluster changes in descending order.

Returns:
dynamic entities: list

a 1-D array of the indexes of the entities that change cluster at least once.

number of changes: list

a 1-D array of the number of changes for each dynamic entity, such that the i-th element is the number of cluster changes for the i-th dynamic entity.
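
The same quantities can be sketched directly from an N x T label array. The helper dynamic_entities below is hypothetical and rests on one assumption: a "change" is counted whenever an entity's label differs between two consecutive time steps.

```python
import numpy as np

def dynamic_entities(labels):
    """labels: (N, T) integer array. Returns (entity indexes, change counts)."""
    labels = np.asarray(labels)
    # Count, per entity, how often the label differs from the previous time step.
    changes = (labels[:, 1:] != labels[:, :-1]).sum(axis=1)
    dynamic = np.flatnonzero(changes > 0)
    # Sort by number of changes, descending; ties keep their original order.
    order = np.argsort(-changes[dynamic], kind="stable")
    return dynamic[order].tolist(), changes[dynamic][order].tolist()

labels = np.array([[0, 0, 0],   # never changes
                   [1, 1, 0],   # one change
                   [0, 1, 0]])  # two changes
entities, counts = dynamic_entities(labels)
total_changes = sum(counts)
```

Here entity 2 comes first with two changes, followed by entity 1 with one. Under the same assumption, the sum of the counts corresponds to what n_changes_ describes.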

get_index_of_label(labels: List[str], axis: str = 'N') → List[int]

function to return the integer indexes of some given labelled items in self.label_dict_. The indexes are assumed to be 0-indexed.

Parameters:
labels: list

a list of the label(s) whose integer indexes should be returned.

axis: str, default=‘N’

Can be any of {‘T’, ‘N’, ‘F’}:
  • If ‘T’, the values in the labels parameter are interpreted as time labels (as stored in self.label_dict_[‘T’]).

  • If ‘N’, the values in the labels parameter are interpreted as entity labels (as stored in self.label_dict_[‘N’]).

  • If ‘F’, the values in the labels parameter are interpreted as feature labels (as stored in self.label_dict_[‘F’]).

Returns:
list

a list of the integer indexes of the labels in the given axis dimension.
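
Conceptually this is a lookup from label to position within one axis of the label dictionary. A plain-Python sketch of that idea follows; index_of_label and the example labels are hypothetical, and the library's own handling of unknown labels may differ.

```python
def index_of_label(label_dict, labels, axis="N"):
    """Map each label to its 0-based position along the given axis."""
    position = {label: i for i, label in enumerate(label_dict[axis])}
    # An unknown label raises KeyError in this sketch.
    return [position[label] for label in labels]

label_dict = {"T": [2019, 2020], "N": ["a", "b", "c"], "F": ["x"]}
entity_idx = index_of_label(label_dict, ["c", "a"])        # axis defaults to 'N'
time_idx = index_of_label(label_dict, [2020], axis="T")
```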

get_label_of_index(indexes: List[int], axis: str = 'N') → List[str]

function to return the labels of some given integer indexes as labelled in self.label_dict_. The indexes are assumed to be 0-indexed.

Parameters:
indexes: list

a list of the index(es) whose labels should be returned.

axis: str, default=‘N’

Can be any of {‘T’, ‘N’, ‘F’}:
  • If ‘T’, the values in the indexes parameter are interpreted as the time indexes whose labels (as stored in self.label_dict_[‘T’]) should be returned.

  • If ‘N’, the values in the indexes parameter are interpreted as the entity indexes whose labels (as stored in self.label_dict_[‘N’]) should be returned.

  • If ‘F’, the values in the indexes parameter are interpreted as the feature indexes whose labels (as stored in self.label_dict_[‘F’]) should be returned.

Returns:
list

a list of the labels of the given integer indexes in the given axis dimension.
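
The reverse direction is a positional read from the chosen axis. Again a plain-Python sketch with a hypothetical helper name and made-up labels, not the library's implementation:

```python
def label_of_index(label_dict, indexes, axis="N"):
    """Return the label stored at each 0-based index along the given axis."""
    axis_labels = label_dict[axis]
    return [axis_labels[i] for i in indexes]

label_dict = {"T": [2019, 2020], "N": ["a", "b", "c"], "F": ["x"]}
years = label_of_index(label_dict, [1, 0], axis="T")
names = label_of_index(label_dict, [0, 2])  # axis defaults to 'N'
```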

get_model_size(X: ndarray[Any, dtype[float64]]) → Tuple

Method to return the size of the model as a tuple of (v, c), where v is the number of variables and c is the number of constraints.

Parameters:
X: numpy array

Input time series data. Should be a 3-dimensional array in TNF format.

Returns:
number of variables

The number of variables in the model.

number of constraints

The number of constraints in the model.

get_named_cluster_centers(label_dict: dict | None = None) → List[pd.DataFrame]

Method to return the cluster centers with custom names of time steps and features.

Parameters:
label_dict: dict, default=None

a dictionary whose keys are ‘T’, ‘N’, and ‘F’ (for time steps, entities, and features respectively). The value of each key is a list such that the value of key:
  • ‘T’ is a list of names/labels of each time step, to be used as the index of each dataframe. If None, range(0, T) is used, where T is the number of time steps in the fitted data.

  • ‘N’ is a list of names/labels of each entity, to be used as the index of the dataframe. If None, range(0, N) is used, where N is the number of entities/observations in the fitted data.

  • ‘F’ is a list of names/labels of each feature, to be used as the columns of each dataframe. If None, range(0, F) is used, where F is the number of features in the fitted data.

If label_dict is None, the result of self.label_dict_ is used.

Returns:
list

A list of k pandas DataFrames, where k is the number of clusters. The i-th dataframe in the list is a T x F dataframe of the values of the cluster centers of the i-th cluster.
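
The documented return shape (k dataframes, each T x F) can be reproduced from a T x k x F center array with pandas. This sketch covers only the changing-center case and uses a hypothetical helper name; how the library presents fixed k x F centers is not described here.

```python
import numpy as np
import pandas as pd

def named_cluster_centers(centers, label_dict):
    """centers: (T, k, F) array. Returns a list of k DataFrames of shape (T, F)."""
    T, k, F = centers.shape
    return [
        pd.DataFrame(centers[:, j, :], index=label_dict["T"], columns=label_dict["F"])
        for j in range(k)
    ]

centers = np.arange(12, dtype=float).reshape(2, 3, 2)  # T=2, k=3, F=2
frames = named_cluster_centers(
    centers, {"T": [2019, 2020], "N": None, "F": ["x", "y"]}
)
value = frames[1].loc[2020, "y"]  # cluster 1, time step 2020, feature 'y'
```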

get_named_labels(label_dict: dict | None = None) → pd.DataFrame

Method to return a data frame of the label assignments with custom names of time steps and entities.

Parameters:
label_dict: dict, default=None

a dictionary whose keys are ‘T’, ‘N’, and ‘F’ (for time steps, entities, and features respectively). The value of each key is a list such that the value of key:
  • ‘T’ is a list of names/labels of each time step, to be used as the index of each dataframe. If None, range(0, T) is used, where T is the number of time steps in the fitted data.

  • ‘N’ is a list of names/labels of each entity, to be used as the index of the dataframe. If None, range(0, N) is used, where N is the number of entities/observations in the fitted data.

  • ‘F’ is a list of names/labels of each feature, to be used as the columns of each dataframe. If None, range(0, F) is used, where F is the number of features in the fitted data.

If label_dict is None, the result of self.label_dict_ is used.

Returns:
pd.DataFrame

A pandas DataFrame with shape (N, T). The value in the n-th row and t-th column is an integer indicating the cluster assignment of the n-th entity/observation at time t.
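
A sketch of how an (N, T) labelled frame could be assembled from a label array. The helper name is hypothetical, and repeating a 1D fixed-assignment vector across all time steps is an assumption made so that the result has the documented (N, T) shape.

```python
import numpy as np
import pandas as pd

def named_labels(labels, label_dict):
    """labels: (N,) or (N, T) integer array. Returns an (N, T) DataFrame."""
    labels = np.asarray(labels)
    T = len(label_dict["T"])
    if labels.ndim == 1:
        # Assumption: a fixed assignment is the same label at every time step.
        labels = np.repeat(labels[:, None], T, axis=1)
    return pd.DataFrame(labels, index=label_dict["N"], columns=label_dict["T"])

label_dict = {"T": [2019, 2020], "N": ["a", "b", "c"], "F": ["x"]}
df = named_labels([0, 1, 1], label_dict)
assigned = int(df.loc["b", 2020])  # cluster of entity 'b' at time step 2020
```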

set_label_dict(value: dict) → None

Method to manually set the label_dict_.

Parameters:
value: dict

the value to set as label_dict_. Should be a dict with all of ‘T’, ‘N’, and ‘F’ (case sensitive; for time steps, entities, and features respectively) as keys. The value of each key is a list of labels for that key in the data. If your data has no labels for a given key, set that key's value to None.

Returns:
dict

a dictionary whose keys are ‘T’, ‘N’, and ‘F’; and values are lists of the labels of each key.