metrics module

tscluster.metrics.inertia(X, cluster_centers, labels, ord=2)

Calculates the inertia score

This calculates the sum of the distances between all points and their assigned cluster centers across all time steps. See Notes.

Parameters:
X : numpy array

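Input time series data. Should be a 3-D array in TNF format, i.e. of shape (T, N, F), where T is the number of time steps, N is the number of entities, and F is the number of features.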

cluster_centers : numpy array

If a 3-D array, it is expected to be in TNF format, where N is the number of clusters rather than the number of entities, i.e. one set of cluster centers per time step. If a 2-D array, it is interpreted as a K x F array, where K is the number of clusters and F is the number of features. The 2-D form is suitable for clustering with fixed cluster centers.

labels : numpy array

It is expected to be a 2-D array of shape (N, T), where N is the number of entities and T is the number of time steps. The value in the i-th row and t-th column is the label (cluster index) that entity i was assigned to at time t. If a 1-D array, it is interpreted as an array of length N, where the i-th element is the cluster that the i-th entity was assigned to across all time steps. The 1-D form is suitable for fixed-assignment clustering.

ord : int, default 2

The order of the distance metric to use: 1 gives the L1 distance, 2 gives the L2 distance, and so on.
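The fixed forms of cluster_centers (K x F) and labels (length N) can be reconciled with their per-time-step forms by repeating them across the T time steps. The following is a minimal NumPy sketch under that assumption; the helper names expand_centers and expand_labels are hypothetical and not part of tscluster, which may handle these cases differently internally.

```python
import numpy as np


def expand_centers(cluster_centers: np.ndarray, n_timesteps: int) -> np.ndarray:
    """Return cluster centers as a T x K x F array (hypothetical helper).

    A 3-D input is assumed to already be T x K x F and is returned unchanged.
    A 2-D K x F input (fixed centers) is repeated along a new leading time axis.
    """
    if cluster_centers.ndim == 3:
        return cluster_centers
    if cluster_centers.ndim == 2:
        # broadcast_to returns a read-only view; copy it into a regular array.
        return np.broadcast_to(
            cluster_centers, (n_timesteps, *cluster_centers.shape)
        ).copy()
    raise ValueError("cluster_centers must be a 2-D or 3-D array")


def expand_labels(labels: np.ndarray, n_timesteps: int) -> np.ndarray:
    """Return labels as an N x T array (hypothetical helper).

    A 2-D input is assumed to already be N x T and is returned unchanged.
    A 1-D length-N input (fixed assignment) is repeated across all time steps.
    """
    if labels.ndim == 2:
        return labels
    if labels.ndim == 1:
        return np.repeat(labels[:, np.newaxis], n_timesteps, axis=1)
    raise ValueError("labels must be a 1-D or 2-D array")


# Two fixed centers with two features, and three entities with fixed labels.
centers_3d = expand_centers(np.array([[0.0, 0.0], [5.0, 5.0]]), n_timesteps=4)
labels_2d = expand_labels(np.array([0, 1, 0]), n_timesteps=4)
print(centers_3d.shape)  # (4, 2, 2)
print(labels_2d.shape)   # (3, 4)
print(labels_2d[1])      # [1 1 1 1]
```

Expanding to the per-time-step layout lets a single code path serve both the fixed and the time-varying cases, at the cost of materializing T copies of the centers or labels.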

Returns:
float

The inertia value.

See also

max_dist

Calculates the maximum distance

Notes

The inertia is calculated as:

\[\sum_{t=1}^{T} \sum_{i=1}^{N} D(X_{ti}, Z_t) \]

where T and N are the numbers of time steps and entities, respectively; D is a distance function (e.g., the \(L_1\) or \(L_2\) distance, as selected by ord); \(X_{ti} \in \mathbf{R}^f\) is the feature vector of entity i at time t; f is the number of features; and \(Z_t \in \mathbf{R}^f\) is the cluster center to which \(X_{ti}\) is assigned at time t.
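The formula above can be checked with a minimal NumPy sketch. It assumes that D is the vector norm of order ord applied to \(X_{ti} - Z_t\) (computed here with numpy.linalg.norm), that X has shape (T, N, F), and that fixed centers and fixed labels are repeated across time as described under Parameters. The function name inertia_sketch is hypothetical; this is an illustration of the formula, not tscluster's implementation.

```python
import numpy as np


def inertia_sketch(X, cluster_centers, labels, ord=2):
    """Sum over t and i of D(X_ti, Z_t), with D the ord-norm of the difference.

    X is T x N x F. cluster_centers is T x K x F or K x F (fixed centers).
    labels is N x T or length N (fixed assignment).
    """
    T = X.shape[0]
    if cluster_centers.ndim == 2:  # fixed centers: K x F -> T x K x F view
        cluster_centers = np.broadcast_to(cluster_centers, (T, *cluster_centers.shape))
    if labels.ndim == 1:  # fixed assignment: N -> N x T
        labels = np.repeat(labels[:, np.newaxis], T, axis=1)
    # Pair the time index t (shape T x 1) with labels.T (shape T x N) to pick
    # Z_t for every (t, i); the result has shape T x N x F.
    assigned = cluster_centers[np.arange(T)[:, np.newaxis], labels.T]
    dists = np.linalg.norm(X - assigned, ord=ord, axis=-1)  # T x N distances
    return float(dists.sum())


# T = 2 time steps, N = 3 entities, F = 2 features.
X = np.array([
    [[0.0, 0.0], [6.0, 4.0], [0.0, 2.0]],  # t = 0
    [[1.0, 0.0], [3.0, 5.0], [0.0, 2.0]],  # t = 1
])
centers = np.array([[0.0, 0.0], [3.0, 0.0]])  # K = 2 fixed centers
labels = np.array([0, 1, 0])                  # fixed assignment

print(inertia_sketch(X, centers, labels))         # 15.0
print(inertia_sketch(X, centers, labels, ord=1))  # 17.0
```

Each term can be verified by hand: for example, entity 1 at t = 0 sits at (6, 4) and is assigned to the center (3, 0), so its L2 distance is 5 and its L1 distance is 7.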

tscluster.metrics.max_dist(X: ndarray[Any, dtype[float64]], cluster_centers: ndarray[Any, dtype[float64]], labels: ndarray[Any, dtype[int64]], ord: int = 2) → float64

Calculates the max_dist score

This calculates the maximum distance between any point and its assigned cluster center across all time steps. See Notes.

Parameters:
X : numpy array

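Input time series data. Should be a 3-D array in TNF format, i.e. of shape (T, N, F), where T is the number of time steps, N is the number of entities, and F is the number of features.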

cluster_centers : numpy array

If a 3-D array, it is expected to be in TNF format, where N is the number of clusters rather than the number of entities, i.e. one set of cluster centers per time step. If a 2-D array, it is interpreted as a K x F array, where K is the number of clusters and F is the number of features. The 2-D form is suitable for clustering with fixed cluster centers.

labels : numpy array

It is expected to be a 2-D array of shape (N, T), where N is the number of entities and T is the number of time steps. The value in the i-th row and t-th column is the label (cluster index) that entity i was assigned to at time t. If a 1-D array, it is interpreted as an array of length N, where the i-th element is the cluster that the i-th entity was assigned to across all time steps. The 1-D form is suitable for fixed-assignment clustering.

ord : int, default 2

The order of the distance metric to use: 1 gives the L1 distance, 2 gives the L2 distance, and so on.

Returns:
float

The maximum distance value.

See also

inertia

Calculates the inertia score

Notes

The max_dist is calculated as:

\[\max_{1 \le t \le T,\; 1 \le i \le N} D(X_{ti}, Z_t) \]

where T and N are the numbers of time steps and entities, respectively; D is a distance function (e.g., the \(L_1\) or \(L_2\) distance, as selected by ord); \(X_{ti} \in \mathbf{R}^f\) is the feature vector of entity i at time t; f is the number of features; and \(Z_t \in \mathbf{R}^f\) is the cluster center to which \(X_{ti}\) is assigned at time t.
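Assuming D is the vector norm of order ord of \(X_{ti} - Z_t\) (computed via numpy.linalg.norm), max_dist replaces the double sum used for the inertia with a single maximum over all time steps and entities. The sketch below is hypothetical (max_dist_sketch is not a tscluster function) and, for brevity, only accepts the per-time-step layouts: cluster_centers of shape (T, K, F) and labels of shape (N, T).

```python
import numpy as np


def max_dist_sketch(X, cluster_centers, labels, ord=2):
    """Largest D(X_ti, Z_t) over all t and i (hypothetical helper).

    X is T x N x F, cluster_centers is T x K x F, labels is N x T.
    """
    T = X.shape[0]
    # Pick the assigned center Z_t for every (t, i); result is T x N x F.
    assigned = cluster_centers[np.arange(T)[:, np.newaxis], labels.T]
    return float(np.linalg.norm(X - assigned, ord=ord, axis=-1).max())


# T = 3 time steps, N = 2 entities, F = 2 features.
X = np.array([
    [[0.0, 0.0], [3.0, 4.0]],  # t = 0
    [[0.0, 1.0], [3.0, 0.0]],  # t = 1
    [[1.0, 0.0], [3.0, 1.0]],  # t = 2
])
centers = np.tile([[0.0, 0.0], [3.0, 0.0]], (3, 1, 1))  # same K = 2 centers at each t
labels = np.array([[0, 0, 0],   # entity 0 stays in cluster 0
                   [0, 1, 1]])  # entity 1 moves to cluster 1 at t = 1

print(max_dist_sketch(X, centers, labels))         # 5.0
print(max_dist_sketch(X, centers, labels, ord=1))  # 7.0
```

The maximum comes from entity 1 at t = 0, which is still assigned to the center (0, 0) while located at (3, 4). Unlike the inertia, this score is driven entirely by the single worst-fitting point.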