greedytscluster module

GreedyTSCluster class

class tscluster.greedytscluster.GreedyTSCluster(n_clusters: int, scheme: str = 'z1c0', *, n_allow_assignment_change: None | int = None, random_state: None | int = None, initialization: str = 'kmeans++')

Bases: TSCluster, TSClusterInterface

(Under development) Class for the Maxima Minimation (MM) algorithm (a.k.a. greedy algorithm) for time-series clustering. Throughout this doc and code, ‘z’ refers to cluster centers and ‘c’ to label assignment. This creates a GreedyTSCluster object.

Parameters:
n_clusters: int

The number of clusters to generate.

scheme: {‘z0c0’, ‘z0c1’, ‘z1c0’, ‘z1c1’}, default=‘z1c0’
The scheme to use for time-series clustering. One of:
  • ‘z0c0’ means fixed center, fixed assignment

  • ‘z0c1’ means fixed center, changing assignment

  • ‘z1c0’ means changing center, fixed assignment

  • ‘z1c1’ means changing center, changing assignment

The scheme must be a dynamic label assignment scheme (either ‘z1c1’ or ‘z0c1’) when using constrained cluster change (i.e. when n_allow_assignment_change is set).
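The scheme naming and the constraint above can be illustrated with a small sketch. `parse_scheme` and `VALID_SCHEMES` are hypothetical names introduced here for illustration only; they are not part of tscluster, and the library's own validation may differ.

```python
# Hypothetical helper, not part of tscluster: decodes a scheme string
# into (dynamic_centers, dynamic_assignment) and checks the constraint
# that n_allow_assignment_change only makes sense for 'c1' schemes.
VALID_SCHEMES = {"z0c0", "z0c1", "z1c0", "z1c1"}


def parse_scheme(scheme, n_allow_assignment_change=None):
    if scheme not in VALID_SCHEMES:
        raise ValueError(
            f"scheme must be one of {sorted(VALID_SCHEMES)}, got {scheme!r}"
        )
    dynamic_centers = scheme[1] == "1"     # 'z1..' -> centers change over time
    dynamic_assignment = scheme[3] == "1"  # '..c1' -> labels may change over time
    if n_allow_assignment_change is not None and not dynamic_assignment:
        raise ValueError(
            "n_allow_assignment_change requires a 'c1' scheme ('z0c1' or 'z1c1')"
        )
    return dynamic_centers, dynamic_assignment


print(parse_scheme("z1c0"))     # (True, False)
print(parse_scheme("z0c1", 5))  # (False, True)
```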

n_allow_assignment_change: int or None, default=None

Penalty added to changing assignments over time for ‘c1’ schemes.

random_state: int or None, default=None

Random seed for reproducibility.

initialization: str, default=‘kmeans++’

Method to initialize cluster centers. Must be one of {‘kmeans++’, ‘random’}.

Attributes:
cluster_centers_

Cluster centers learned by the model.

fitted_data_shape_

Shape of the data the model was fit on.

label_dict_

Dictionary of labels with keys ‘T’, ‘N’, and ‘F’ (the time-step, entity, and feature axes respectively). The value of each key is a list of labels for that axis.

labels_

Cluster labels for each sample at each time step.

n_changes_

Total number of label changes.

Methods

fit(X[, label_dict, verbose, print_to, max_iter])

Fit the temporal clustering model using greedy optimization.

get_dynamic_entities()

Return the dynamic entities and the number of label changes for each.

get_index_of_label(labels[, axis])

Return the integer indexes of the given labels, as stored in self.label_dict_.

get_label_of_index(indexes[, axis])

Return the labels of the given integer indexes, as stored in self.label_dict_.

get_named_cluster_centers([label_dict])

Method to return the cluster centers with custom names of time steps and features.

get_named_labels([label_dict])

Method to return a data frame of the label assignments with custom names of time steps and entities.

set_label_dict(value)

Method to manually set the label_dict_.
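The label lookups summarized above can be pictured with a minimal sketch. The two functions below are hypothetical stand-ins written for illustration, not the library's implementation; the example labels and the string-valued `axis` argument are assumptions.

```python
# Hypothetical label_dict: one list of labels per axis (assumed example data).
label_dict = {
    "T": ["2019", "2020", "2021"],   # time-step labels
    "N": ["Toronto", "Ottawa"],      # entity labels
    "F": ["population", "income"],   # feature labels
}


def index_of_label(label_dict, labels, axis="N"):
    """Stand-in sketch: map labels on one axis to integer positions."""
    positions = {label: i for i, label in enumerate(label_dict[axis])}
    return [positions[label] for label in labels]


def label_of_index(label_dict, indexes, axis="N"):
    """Stand-in sketch: map integer positions on one axis back to labels."""
    return [label_dict[axis][i] for i in indexes]


print(index_of_label(label_dict, ["Ottawa"]))            # [1]
print(label_of_index(label_dict, [0, 2], axis="T"))      # ['2019', '2021']
```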

property cluster_centers_

Cluster centers learned by the model.

Returns:
np.ndarray of shape (T, K, F)

The cluster centroids for each cluster k and time t.
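To make the (T, K, F) indexing concrete, here is a minimal pure-Python sketch that computes per-time-step cluster means from data of shape (T, N, F) and labels of shape (N, T). It illustrates the indexing conventions only: `mean_centers` is a hypothetical helper, and the centers the greedy algorithm actually learns may be computed differently.

```python
def mean_centers(X, labels, n_clusters):
    """Per-time-step cluster means (illustrative only, not the library's method).

    X is a nested list of shape (T, N, F); labels is a nested list of shape (N, T).
    Returns a nested list of shape (T, K, F).
    """
    T, N, F = len(X), len(X[0]), len(X[0][0])
    centers = [[[0.0] * F for _ in range(n_clusters)] for _ in range(T)]
    for t in range(T):
        for k in range(n_clusters):
            members = [X[t][i] for i in range(N) if labels[i][t] == k]
            if members:  # empty clusters are left at zero in this sketch
                centers[t][k] = [
                    sum(m[f] for m in members) / len(members) for f in range(F)
                ]
    return centers


X = [
    [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]],  # t = 0
    [[1.0, 1.0], [3.0, 3.0], [12.0, 12.0]],  # t = 1
]
labels = [[0, 0], [0, 0], [1, 1]]  # shape (N, T): entity i, time t

centers = mean_centers(X, labels, n_clusters=2)
print(centers[0])  # [[1.0, 1.0], [10.0, 10.0]]
print(centers[1])  # [[2.0, 2.0], [12.0, 12.0]]
```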

fit(X: npt.NDArray[np.float64], label_dict: dict | None = None, verbose: bool = True, print_to: TextIO = sys.stdout, max_iter: int = 1000, **kwargs) → GreedyTSCluster

Fit the temporal clustering model using greedy optimization.

Parameters:
X: np.ndarray of shape (T, N, F)

The input time series data, where T is the number of time steps, N is the number of samples, and F is the number of features.

label_dict: dict, optional

Optional dictionary of axis labels used for interpretability.

verbose: bool, default=True

If True, print progress and diagnostic information during fitting.

print_to: TextIO, default=sys.stdout

File-like stream to output verbose logs.

max_iter: int, default=1000

Maximum number of optimization iterations.

Returns:
self: GreedyTSCluster

The fitted model instance.
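A possible usage pattern, based only on the signatures documented above, is sketched below. It assumes tscluster is installed and that the class is importable as `tscluster.greedytscluster.GreedyTSCluster`; since the class is marked as under development, this has not been verified against a released version. `fit_and_capture_log` is a hypothetical wrapper name.

```python
import io


def fit_and_capture_log(X, n_clusters=2):
    """Fit on X of shape (T, N, F) and return the model plus its verbose log.

    Assumes the constructor and fit() signatures shown in this document.
    """
    # Imported inside the function so this sketch loads without tscluster.
    from tscluster.greedytscluster import GreedyTSCluster

    log = io.StringIO()  # any file-like text stream can be passed as print_to
    model = GreedyTSCluster(n_clusters=n_clusters, scheme="z1c0", random_state=0)
    model.fit(X, verbose=True, print_to=log, max_iter=100)
    return model, log.getvalue()
```

Passing an `io.StringIO` as `print_to` keeps the progress output out of the console while still making it available for inspection afterwards.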

property fitted_data_shape_: Tuple[int, int, int]

Shape of the data the model was fit on.

Returns:
tuple of int

Tuple (T, N, F) corresponding to time, samples, and features.

property labels_

Cluster labels for each sample at each time step.

Returns:
np.ndarray of shape (N, T)

The cluster assignment for each sample and time.
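Given a labels array of shape (N, T), label changes can be counted as in the sketch below. It assumes that a change is counted whenever an entity's label differs between two consecutive time steps, and that a dynamic entity is one with at least one change; `count_changes` is a hypothetical helper, and the library's own n_changes_ and get_dynamic_entities() may use a different definition.

```python
def count_changes(labels):
    """Number of label changes per entity; labels is a nested list of shape (N, T)."""
    return [sum(1 for a, b in zip(row, row[1:]) if a != b) for row in labels]


labels = [
    [0, 0, 0, 0],  # entity 0: no change
    [0, 1, 1, 0],  # entity 1: two changes
    [2, 2, 1, 1],  # entity 2: one change
]

per_entity = count_changes(labels)
dynamic = [i for i, c in enumerate(per_entity) if c > 0]

print(per_entity)       # [0, 2, 1]
print(sum(per_entity))  # 3
print(dynamic)          # [1, 2]
```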