 module ClusteringAPI
 
-# Use the README as the module docs
-@doc let
-    path = joinpath(dirname(@__DIR__), "README.md")
-    include_dependency(path)
-    read(path, String)
-end ClusteringAPI
-
 export ClusteringAlgorithm, ClusteringResults
-export cluster, cluster_number, cluster_labels
+export cluster, cluster_number, cluster_labels, cluster_probs
 
 abstract type ClusteringAlgorithm end
 abstract type ClusteringResults end
@@ -18,29 +11,33 @@ abstract type ClusteringResults end
 
 Cluster input `data` according to the algorithm specified by `ca`.
 All options related to the algorithm are given as keyword arguments when
-constructing `ca`. The input data can be specified two ways:
+constructing `ca`.
 
-- as a (d, m) matrix, with d the dimension of the data points and m the amount of
-  data points (i.e., each column is a data point).
-- as a length-m vector of length-d vectors (i.e., each inner vector is a data point).
+The input data are a length-m vector of length-d vectors.
+"Vector" is meant here in the generalized sense, i.e., any object on which
+a distance can be defined. Some clustering algorithms may allow alternative
+data input types for better performance.
 
+The output is always a subtype of `ClusteringResults` that can be further queried.
 The cluster labels are always the
-positive integers `1:n` with `n::Int` the number of created clusters.
+positive integers `1:n` with `n::Int` the number of created clusters.
+Data points that could not be clustered (e.g., outliers or noise)
+are assigned negative integers, typically just `-1`.
 
-The output is always a subtype of `ClusteringResults`,
-which always extends the following two methods:
+`ClusteringResults` subtypes always implement the following functions:
 
+- `cluster_labels(cr)` returns a length-m vector `labels::Vector{Int}` containing
+  the cluster labels (mostly in `1:n`; unclustered points may have negative labels).
+- `cluster_probs(cr)` returns `probs`, a length-m vector of length-`n` vectors
+  containing the "probabilities" or "scores" of each point belonging to each of
+  the created clusters (used with fuzzy clustering algorithms).
 - `cluster_number(cr)` returns `n`.
-- `cluster_labels(cr)` returns `labels::Vector{Int}` a length-m vector of labels
-  mapping each data point to each cluster (`1:n`).
-
-and always includes `ca` in the field `algorithm`.
 
 Other algorithm-related output can be obtained as a field of the result type,
-or other specific functions of the result type.
-This is described in the individual algorithm implementations.
+or by using other specific functions of the result type.
+This is described in the docstrings of the individual algorithm implementations.
 """
-function cluster(ca::ClusteringAlgorithm, data::AbstractMatrix)
+function cluster(ca::ClusteringAlgorithm, data)
     throw(ArgumentError("No implementation for `cluster` for $(typeof(ca))."))
 end
 
 Return the number of created clusters in the output of [`cluster`](@ref).
 """
 function cluster_number(cr::ClusteringResults)
-    return length(Set(cluster_labels(cr))) # fastest way to count unique elements
+    return count(>(0), Set(cluster_labels(cr))) # count unique positive labels; noise labels are negative
 end
 
 """
-    cluster_labels(cr::ClusteringResults) → labels::Vector{Int}
+    cluster_labels(cr::ClusteringResults) → labels::Vector{Int}
 
 Return the cluster labels of the data points used in [`cluster`](@ref).
 """
 function cluster_labels(cr::ClusteringResults)
     return cr.labels # typically there
 end
 
+"""
+    cluster_probs(cr::ClusteringResults) → probs::Vector{Vector{Real}}
+
+Return the cluster probabilities of the data points used in [`cluster`](@ref).
+They are length-`n` vectors containing the "probabilities" or "scores" of each point
+belonging to each of the created clusters (used with fuzzy clustering algorithms).
+"""
+function cluster_probs(cr::ClusteringResults)
+    return cr.probs # typically there
+end
+
 # two helper functions for agnostic input data type
 """
     input_data_size(data) → (d, m)
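The updated `cluster` docstring describes a small contract: an algorithm type subtyping `ClusteringAlgorithm`, a results type subtyping `ClusteringResults` that stores its `labels`, negative labels for unclustered points, and a `cluster_number` that counts only positive labels. The sketch below illustrates that contract under stated assumptions. It does not load the package; it re-declares a minimal stand-in module (`MiniClusteringAPI`) mirroring the functions shown in this diff. The `ThresholdClustering` algorithm, its `threshold` keyword, and the `ThresholdResults` type are hypothetical, invented only for illustration. Data points are plain numbers, which count as "vectors" in the generalized sense because `abs(x - y)` is a distance.

```julia
# Minimal stand-in for the post-commit interface (NOT the real ClusteringAPI package).
module MiniClusteringAPI

export ClusteringAlgorithm, ClusteringResults
export cluster, cluster_number, cluster_labels

abstract type ClusteringAlgorithm end
abstract type ClusteringResults end

# Fallback: each algorithm must add its own `cluster` method.
function cluster(ca::ClusteringAlgorithm, data)
    throw(ArgumentError("No implementation for `cluster` for $(typeof(ca))."))
end

# Default accessor: assumes the results type has a `labels` field.
cluster_labels(cr::ClusteringResults) = cr.labels

# Count unique positive labels; negative (noise) labels are ignored.
cluster_number(cr::ClusteringResults) = count(>(0), Set(cluster_labels(cr)))

end # module

using .MiniClusteringAPI

# Hypothetical toy algorithm on 1-D points (plain numbers, distance |x - y|):
# sorted points start a new group whenever the gap to the previous point
# exceeds `threshold`; groups with a single member are labeled noise (-1).
Base.@kwdef struct ThresholdClustering <: ClusteringAlgorithm
    threshold::Float64 = 1.0
end

# Hypothetical results type; extra fields such as `algorithm` are allowed.
struct ThresholdResults <: ClusteringResults
    labels::Vector{Int}
    algorithm::ThresholdClustering
end

function MiniClusteringAPI.cluster(ca::ThresholdClustering, data::AbstractVector{<:Real})
    order = sortperm(data)
    group = zeros(Int, length(data))
    current = 1
    for (k, i) in enumerate(order)
        # Compare with the previous point in sorted order.
        if k > 1 && data[i] - data[order[k - 1]] > ca.threshold
            current += 1
        end
        group[i] = current
    end
    # Group sizes, used to detect singleton (noise) groups.
    sizes = Dict{Int,Int}()
    for g in group
        sizes[g] = get(sizes, g, 0) + 1
    end
    # Relabel so that the non-noise groups get contiguous labels 1:n,
    # numbered in order of first appearance in `data`.
    relabel = Dict{Int,Int}()
    labels = similar(group)
    for (i, g) in enumerate(group)
        labels[i] = sizes[g] == 1 ? -1 : get!(relabel, g, length(relabel) + 1)
    end
    return ThresholdResults(labels, ca)
end

# Usage
data = [0.0, 0.2, 0.4, 5.0, 5.3, 20.0]
cr = cluster(ThresholdClustering(threshold = 1.0), data)
cluster_labels(cr)  # [1, 1, 1, 2, 2, -1]
cluster_number(cr)  # 2 (the noise label -1 is not counted)
```

Relabeling through a dictionary keeps the positive labels contiguous (`1:n`), which the label-counting rule in `cluster_number` implicitly relies on.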
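The commit documents `cluster_probs` mainly for fuzzy clustering. For a hard (non-fuzzy) algorithm, one plausible way to produce output of the documented shape (a length-m vector of length-`n` vectors) is to one-hot encode the labels. The helper `onehot_probs` below is hypothetical and not part of this commit or the package, and giving noise points an all-zero score vector is an assumption of this sketch.

```julia
"""
    onehot_probs(labels) -> Vector{Vector{Float64}}

Hypothetical helper (not part of the commit): convert hard cluster labels into
`cluster_probs`-style output, a length-m vector of length-n score vectors,
where n is the number of unique positive labels (assumed to be exactly 1:n).
Noise points (negative labels) get an all-zero score vector.
"""
function onehot_probs(labels::AbstractVector{<:Integer})
    n = count(>(0), Set(labels))  # same counting rule as `cluster_number`
    probs = [zeros(Float64, n) for _ in labels]
    for (i, label) in enumerate(labels)
        if label > 0
            probs[i][label] = 1.0  # full membership in the assigned cluster
        end
    end
    return probs
end

onehot_probs([1, 2, -1, 1])  # [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
```

The sketch uses the concrete element type `Float64` rather than the abstract `Real` named in the docstring, since a `Vector{Real}` stores its elements boxed and is slower to work with in Julia.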