Commit abd32fc

more
1 parent ad1362d commit abd32fc

28 files changed, with 145 additions and 78 deletions.

.devel/sphinx/bibliography.bib

Lines changed: 14 additions & 12 deletions
@@ -1,21 +1,23 @@
-@article{nca,
-author = {M. Gagolewski},
-title = {Normalised clustering accuracy: {A}n asymmetric external cluster validity measure},
-journal = {Journal of Classification},
-year = {2024},
-url = {https://link.springer.com/content/pdf/10.1007/s00357-024-09482-2.pdf},
-doi = {10.1007/s00357-024-09482-2},
-note = {in press}
-}
-
 @article{cvimst,
 author = {M. Gagolewski and A. Cena and M. Bartoszuk and L. Brzozowski},
 title = {Clustering with minimum spanning trees: {H}ow good can it be?},
 journal = {Journal of Classification},
-year = {2024},
+year = {2025},
+volume = {42},
+pages = {90--112},
 url = {https://link.springer.com/content/pdf/10.1007/s00357-024-09483-1.pdf},
 doi = {10.1007/s00357-024-09483-1},
-note = {in press}
+}
+
+@article{nca,
+author = {M. Gagolewski},
+title = {Normalised clustering accuracy: {A}n asymmetric external cluster validity measure},
+journal = {Journal of Classification},
+year = {2025},
+volume = {42},
+pages = {2--30},
+url = {https://link.springer.com/content/pdf/10.1007/s00357-024-09482-2.pdf},
+doi = {10.1007/s00357-024-09482-2},
 }

 @article{clustering_benchmarks,

.devel/sphinx/weave/data-v1.Rmd

Lines changed: 7 additions & 0 deletions
@@ -90,3 +90,10 @@ cat(readLines("include-dataset-browser.js"), sep="\n")
 ```
 </script>
 ::::
+
+
+::::{important}
+As a courtesy, **please cite** the original source as well as the current project
+{cite}`clustering_benchmarks` as well as mention {cite}`clustering_data_v1`
+which gives the exact version and URL of the dataset suite. Thank you.
+::::

.devel/sphinx/weave/data-v1.md

Lines changed: 7 additions & 0 deletions
@@ -217,3 +217,10 @@ window.onhashchange = locationHashChanged;
 locationHashChanged();
 </script>
 ::::
+
+
+::::{important}
+As a courtesy, **please cite** the original source as well as the current project
+{cite}`clustering_benchmarks` as well as mention {cite}`clustering_data_v1`
+which gives the exact version and URL of the dataset suite. Thank you.
+::::

.devel/sphinx/weave/how-to-access.Rmd

Lines changed: 7 additions & 3 deletions
@@ -113,9 +113,10 @@ former can be called from within the latter.

 ## Julia

-Very similar to Python and R the datasets can be accessed in Julia using the [*CSV.jl*](https://csv.juliadata.org) package.
+Very similar to Python and R the datasets can be accessed
+in Julia using the [*CSV.jl*](https://csv.juliadata.org) package.

-```{julia}
+```julia
 using CSV

 base_name = joinpath("~", "Projects", "clustering-data-v1", "wut", "smile")
@@ -128,5 +129,8 @@ labels = CSV.read(base_name * ".labels0.gz", CSV.Tables.matrix; header=false, de
 ::::{todo}
 Contributions are welcome: Describe how to load
 the datasets and benchmark results
-in GNU Octave, Scilab, Julia, Mathematica, ... (🚧 help needed 🚧)
+in GNU Octave, Scilab, Mathematica, ... (🚧 help needed 🚧)
+
+Thanks to [Torsten Stöter](https://github.com/tstoeter) for contributing
+the Julia code.
 ::::

.devel/sphinx/weave/how-to-access.md

Lines changed: 19 additions & 1 deletion
@@ -135,9 +135,27 @@ former can be called from within the latter.



+## Julia
+
+Very similar to Python and R the datasets can be accessed
+in Julia using the [*CSV.jl*](https://csv.juliadata.org) package.
+
+
+```julia
+using CSV
+
+base_name = joinpath("~", "Projects", "clustering-data-v1", "wut", "smile")
+base_name = expanduser(base_name)
+data = CSV.read(base_name * ".data.gz", CSV.Tables.matrix; header=false, delim=' ')
+labels = CSV.read(base_name * ".labels0.gz", CSV.Tables.matrix; header=false, delim=' ')
+```
+

 ::::{todo}
 Contributions are welcome: Describe how to load
 the datasets and benchmark results
-in GNU Octave, Scilab, Julia, Mathematica, ... (🚧 help needed 🚧)
+in GNU Octave, Scilab, Mathematica, ... (🚧 help needed 🚧)
+
+Thanks to [Torsten Stöter](https://github.com/tstoeter) for contributing
+the Julia code.
 ::::
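
For comparison with the Julia snippet added in this file, the same two files can be read in Python using only the standard library. This is a sketch, not the project's documented loader: `load_matrix` and `load_dataset` are hypothetical helper names, and it assumes the files are gzip-compressed, whitespace-delimited text without a header row (as the `header=false, delim=' '` arguments in the Julia call suggest) and that the label file holds one integer per line.

```python
import gzip
import os


def load_matrix(path):
    """Read a gzip-compressed, whitespace-delimited text file into rows of floats."""
    rows = []
    with gzip.open(path, "rt") as f:  # "rt": decompress and decode as text
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                rows.append([float(x) for x in fields])
    return rows


def load_dataset(base_name):
    """Return (data, labels) for a base path such as ~/Projects/.../wut/smile."""
    # Mirrors the expanduser() call in the Julia code.
    base_name = os.path.expanduser(base_name)
    data = load_matrix(base_name + ".data.gz")
    # Assumption: one integer label per line in the label file.
    labels = [int(row[0]) for row in load_matrix(base_name + ".labels0.gz")]
    return data, labels
```

Calling `load_dataset(os.path.join("~", "Projects", "clustering-data-v1", "wut", "smile"))` would then correspond to the Julia example, provided the suite has been downloaded to that location.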

.devel/sphinx/weave/suite-v1.Rmd

Lines changed: 8 additions & 6 deletions
@@ -44,7 +44,7 @@ def summarise_battery(battery):
 (sec:suite-v1)=
 # Benchmark Suite (v`r VERSION`)

-We have compiled **a large suite of benchmark datasets**.
+We have compiled, curated, and polished **a large suite of benchmark datasets**.
 For reproducibility, the datasets and label vectors are **versioned**.


@@ -77,7 +77,7 @@ index *g*, where $g=0$ means that all clusters consist of the same number
 of points.


-::::{important}
+::::{note}
 The versioned **snapshots of the suite** are available for download at:
 <https://github.com/gagolews/clustering-data-v1/releases/tag/v`r VERSION`>.

@@ -107,9 +107,11 @@ each dataset is accompanied by a text file specifying more details thereon
 (e.g., the literature references that we are asked to cite).


-As a courtesy, **please cite** also the current project
+::::{important}
+As a courtesy, **please cite** the original source as well as the current project
 {cite}`clustering_benchmarks` as well as mention {cite}`clustering_data_v1`
 which gives the exact version and URL of the dataset suite. Thank you.
+::::


 There is some inherent overlap between the original databases.
@@ -189,7 +191,7 @@ summarise_battery("sipu")
 (sec:battery-fcps)=
 ## `fcps`

-9 datasets from the *Fundamental Clustering Problem Suite*
+Nine datasets from the *Fundamental Clustering Problem Suite*
 proposed by A. Ultsch {cite}`fcps` from the Marburg University,
 Germany.

@@ -214,7 +216,7 @@ summarise_battery("fcps")
 (sec:battery-graves)=
 ## `graves`

-10 *synthetic data sets* discussed by D. Graves and W. Pedrycz
+Ten *synthetic data sets* discussed by D. Graves and W. Pedrycz
 in {cite}`graves`.

 The dataset consist of 200–1050 observations in 2 dimensions.
@@ -272,7 +274,7 @@ summarise_battery("other")
 (sec:battery-uci)=
 ## `uci`

-A selection of 8 high-dimensional datasets available through the UCI
+A selection of eight high-dimensional datasets available through the UCI
 (University of California, Irvine)
 [Machine Learning Repository](http://archive.ics.uci.edu/ml/) {cite}`uci`.
 Some of them were considered for benchmark purposes

.devel/sphinx/weave/suite-v1.md

Lines changed: 8 additions & 6 deletions
@@ -10,7 +10,7 @@
 (sec:suite-v1)=
 # Benchmark Suite (v1.1.0)

-We have compiled **a large suite of benchmark datasets**.
+We have compiled, curated, and polished **a large suite of benchmark datasets**.
 For reproducibility, the datasets and label vectors are **versioned**.


@@ -43,7 +43,7 @@ index *g*, where $g=0$ means that all clusters consist of the same number
 of points.


-::::{important}
+::::{note}
 The versioned **snapshots of the suite** are available for download at:
 <https://github.com/gagolews/clustering-data-v1/releases/tag/v1.1.0>.

@@ -70,9 +70,11 @@ each dataset is accompanied by a text file specifying more details thereon
 (e.g., the literature references that we are asked to cite).


-As a courtesy, **please cite** also the current project
+::::{important}
+As a courtesy, **please cite** the original source as well as the current project
 {cite}`clustering_benchmarks` as well as mention {cite}`clustering_data_v1`
 which gives the exact version and URL of the dataset suite. Thank you.
+::::


 There is some inherent overlap between the original databases.
@@ -203,7 +205,7 @@ We excluded the `DIM`-sets as they are too easy for most algorithms.
 (sec:battery-fcps)=
 ## `fcps`

-9 datasets from the *Fundamental Clustering Problem Suite*
+Nine datasets from the *Fundamental Clustering Problem Suite*
 proposed by A. Ultsch {cite}`fcps` from the Marburg University,
 Germany.

@@ -238,7 +240,7 @@ see also {cite}`ThrunUltsch2020:fcps`.
 (sec:battery-graves)=
 ## `graves`

-10 *synthetic data sets* discussed by D. Graves and W. Pedrycz
+Ten *synthetic data sets* discussed by D. Graves and W. Pedrycz
 in {cite}`graves`.

 The dataset consist of 200–1050 observations in 2 dimensions.
@@ -316,7 +318,7 @@ Datasets from multiple sources:
 (sec:battery-uci)=
 ## `uci`

-A selection of 8 high-dimensional datasets available through the UCI
+A selection of eight high-dimensional datasets available through the UCI
 (University of California, Irvine)
 [Machine Learning Repository](http://archive.ics.uci.edu/ml/) {cite}`uci`.
 Some of them were considered for benchmark purposes

docs/clustbench-documentation.html

Lines changed: 1 addition & 1 deletion
@@ -895,7 +895,7 @@ <h1>Documentation<a class="headerlink" href="#documentation" title="Link to this
 Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
 Built with <a href="https://sphinx-doc.org/">Sphinx</a>
 and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
-Last updated on 2025-04-08T13:43:38+0200.
+Last updated on 2025-05-21T11:51:24+0200.
 This site will never display any ads: it is a non-profit project.
 It does not collect any data.
 </div>

docs/genindex.html

Lines changed: 1 addition & 1 deletion
@@ -452,7 +452,7 @@ <h2>T</h2>
 Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
 Built with <a href="https://sphinx-doc.org/">Sphinx</a>
 and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
-Last updated on 2025-04-08T13:43:38+0200.
+Last updated on 2025-05-21T11:51:24+0200.
 This site will never display any ads: it is a non-profit project.
 It does not collect any data.
 </div>

docs/index.html

Lines changed: 1 addition & 1 deletion
@@ -431,7 +431,7 @@ <h1>A Framework for Benchmarking Clustering Algorithms<a class="headerlink" href
 Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
 Built with <a href="https://sphinx-doc.org/">Sphinx</a>
 and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
-Last updated on 2025-04-08T13:43:38+0200.
+Last updated on 2025-05-21T11:51:24+0200.
 This site will never display any ads: it is a non-profit project.
 It does not collect any data.
 </div>
