metrics module
- tscluster.metrics.inertia(X, cluster_centers, labels, ord=2)
Calculates the inertia score
This calculates the sum of the distances between every point and its assigned cluster center, across all time steps. See Notes.
- Parameters:
- X : numpy array
Input time series data. Should be a 3-dimensional array in TNF format (T time steps, N entities, F features).
- cluster_centers : numpy array
If a 3-D array, it is expected to be in TNF format, where N here is the number of clusters. If a 2-D array, it is interpreted as a K x F array, where K is the number of clusters and F is the number of features; this form suits clustering with fixed cluster centers.
- labels : numpy array
Expected to be a 2-D array of shape (N, T), where N is the number of entities and T is the number of time steps. The value in the i-th row and t-th column is the label (cluster index) that entity i was assigned to at time t. If a 1-D array, it is interpreted as an array of length N, where the i-th element is the cluster that the i-th entity was assigned to across all time steps; this form suits fixed-assignment clustering.
- ord : int, default=2
The order of the norm used as the distance metric: 1 is the L1 distance, 2 is the L2 distance, etc.
- Returns:
- float
The inertia value.
See also
max_dist : Calculates the maximum distance
Notes
The inertia is calculated as:
\[\sum_{t=1}^{T} \sum_{i=1}^{N} D(X_{ti}, Z_t)\]
where T and N are the number of time steps and entities respectively, D is a distance function (or metric, e.g. \(L_1\) distance, \(L_2\) distance, etc.), \(X_{ti} \in \mathbf{R}^f\) is the feature vector of entity i at time t, f is the number of features, and \(Z_t \in \mathbf{R}^f\) is the cluster center that \(X_{ti}\) is assigned to at time t.
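The formula above can be sketched in plain NumPy. This is a minimal illustration of the definition, not tscluster's own implementation: the name `inertia_sketch` is hypothetical, and the sketch assumes 3-D `cluster_centers` of shape (T, K, F) and 2-D `labels` of shape (N, T), with `ord` passed to `numpy.linalg.norm` as the vector-norm order.

```python
import numpy as np


def inertia_sketch(X, cluster_centers, labels, ord=2):
    """Hypothetical reference version of the inertia formula.

    Assumed shapes: X is (T, N, F), cluster_centers is (T, K, F),
    labels is (N, T) with integer cluster indices.
    """
    X = np.asarray(X, dtype=float)
    cluster_centers = np.asarray(cluster_centers, dtype=float)
    labels = np.asarray(labels)

    T = X.shape[0]
    # For every time step t and entity i, pick the center the entity is
    # assigned to: cluster_centers[t, labels[i, t]]. Result is (T, N, F).
    t_idx = np.arange(T)[:, None]
    assigned = cluster_centers[t_idx, labels.T]

    # D(X_ti, Z_t) for every (t, i) pair, then the double sum.
    dists = np.linalg.norm(X - assigned, ord=ord, axis=-1)
    return float(dists.sum())


# Toy data: T=2 time steps, N=3 entities, F=2 features, K=2 clusters.
X_demo = np.array([
    [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0]],
    [[3.0, 4.0], [0.0, 0.0], [4.0, 3.0]],
])
centers_demo = np.array([
    [[0.0, 0.0], [4.0, 0.0]],
    [[0.0, 0.0], [4.0, 0.0]],
])
labels_demo = np.array([[0, 0], [0, 0], [1, 1]])

print(inertia_sketch(X_demo, centers_demo, labels_demo))         # L2: 2 + 5 + 3 = 10.0
print(inertia_sketch(X_demo, centers_demo, labels_demo, ord=1))  # L1: 2 + 7 + 3 = 12.0
```

Note that, following the formula as written, the distances are summed without squaring.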
- tscluster.metrics.max_dist(X: ndarray[Any, dtype[float64]], cluster_centers: ndarray[Any, dtype[float64]], labels: ndarray[Any, dtype[int64]], ord: int = 2) -> float64
Calculates the max_dist score
This calculates the maximum of the distances between every point and its assigned cluster center, across all time steps. See Notes.
- Parameters:
- X : numpy array
Input time series data. Should be a 3-dimensional array in TNF format (T time steps, N entities, F features).
- cluster_centers : numpy array
If a 3-D array, it is expected to be in TNF format, where N here is the number of clusters. If a 2-D array, it is interpreted as a K x F array, where K is the number of clusters and F is the number of features; this form suits clustering with fixed cluster centers.
- labels : numpy array
Expected to be a 2-D array of shape (N, T), where N is the number of entities and T is the number of time steps. The value in the i-th row and t-th column is the label (cluster index) that entity i was assigned to at time t. If a 1-D array, it is interpreted as an array of length N, where the i-th element is the cluster that the i-th entity was assigned to across all time steps; this form suits fixed-assignment clustering.
- ord : int, default=2
The order of the norm used as the distance metric: 1 is the L1 distance, 2 is the L2 distance, etc.
- Returns:
- float
The max distance value.
See also
inertia : Calculates the inertia score
Notes
The max_dist is calculated as:
\[\max_{t, i} D(X_{ti}, Z_t)\]
where the maximum is taken over all time steps t and entities i, D is a distance function (or metric, e.g. \(L_1\) distance, \(L_2\) distance, etc.), \(X_{ti} \in \mathbf{R}^f\) is the feature vector of entity i at time t, f is the number of features, and \(Z_t \in \mathbf{R}^f\) is the cluster center that \(X_{ti}\) is assigned to at time t.
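As an illustration of the definition above, here is a minimal NumPy sketch that also covers the fixed-center (2-D `cluster_centers`) and fixed-assignment (1-D `labels`) input forms described under Parameters. The name `max_dist_sketch` is hypothetical and the broadcasting approach is an assumption for demonstration; it is not taken from tscluster's source.

```python
import numpy as np


def max_dist_sketch(X, cluster_centers, labels, ord=2):
    """Hypothetical reference version of the max_dist formula.

    Assumed shapes: X is (T, N, F); cluster_centers is (T, K, F) or (K, F);
    labels is (N, T) or (N,), holding integer cluster indices.
    """
    X = np.asarray(X, dtype=float)
    cluster_centers = np.asarray(cluster_centers, dtype=float)
    labels = np.asarray(labels)
    T, N, _ = X.shape

    if cluster_centers.ndim == 2:
        # Fixed centers: reuse the same (K, F) centers at every time step.
        cluster_centers = np.broadcast_to(
            cluster_centers, (T,) + cluster_centers.shape
        )
    if labels.ndim == 1:
        # Fixed assignment: each entity keeps one label for all time steps.
        labels = np.broadcast_to(labels[:, None], (N, T))

    # Center assigned to entity i at time t, shape (T, N, F).
    assigned = cluster_centers[np.arange(T)[:, None], labels.T]
    dists = np.linalg.norm(X - assigned, ord=ord, axis=-1)
    return float(dists.max())


# Toy data: T=2 time steps, N=3 entities, F=2 features, K=2 fixed clusters.
X_demo = np.array([
    [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0]],
    [[3.0, 4.0], [0.0, 0.0], [4.0, 3.0]],
])
fixed_centers = np.array([[0.0, 0.0], [4.0, 0.0]])  # shape (K, F)
fixed_labels = np.array([0, 0, 1])                  # shape (N,)

print(max_dist_sketch(X_demo, fixed_centers, fixed_labels))         # 5.0
print(max_dist_sketch(X_demo, fixed_centers, fixed_labels, ord=1))  # 7.0
```

In this toy data the largest distance comes from entity 0 at the second time step, which sits at (3, 4) while its center is at the origin.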