Commit e66302f

Merge pull request #331 from zevv/histogram
v1.5.0-rc1
2 parents ffb83e6 + dc9c8c8 commit e66302f

18 files changed, +762 -83 lines changed

ChangeLog

+22-1
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,25 @@
1-
1+
1.5.0-rc1 (2024-09-03)
2+
- new: added support for tkrzw backend DB and made it the default
3+
- this DB is newer and under active support compared to
4+
TokyoCabinet, KyotoCabinet, etc.
5+
- also supports really large filesystems. Big thanks to
6+
stuartthebruce for testing and debugging (Issue #300)
7+
- new: added support for tracking topN largest files in
8+
filesystem. (Issue #284)
9+
- new: added '-T' option to duc index to change maximum default number of
10+
topN files to track.
11+
- new: added '-B' option to duc index to change number of histogram buckets.
12+
- new: added 'topn' command to show topN files stored in
13+
DB. Defaults to 10 currently.
14+
- new: added 't' key in duc ui to toggle between regular and topN
15+
display mode. Initial support.
16+
- new: added histogram report of filesizes found during
17+
indexing to 'duc info'. (Issue #284)
18+
- new: added 'H' or '--histogram' option to duc info
19+
- new: added 'duc histogram' command (Issue #284)
20+
- needs work still, especially CGI, UI and GUI output.
21+
- fix:
22+
223
1.4.5 (2022-07-29)
324

425
- new: added '-u' option to duc index to index by username
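
Note on the topN entries above: the ChangeLog describes tracking the N largest files seen while indexing. The following is only a minimal sketch of one way to keep such a list, assuming a small fixed-size array held in descending order of size. The names `topn`, `topn_entry`, `topn_add` and `TOPN_MAX` are hypothetical and are not taken from Duc's sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOPN_MAX 3        /* hypothetical cap, for illustration only */
#define TOPN_NAME_LEN 64  /* hypothetical maximum stored name length */

struct topn_entry {
    char name[TOPN_NAME_LEN];
    size_t size;
};

struct topn {
    struct topn_entry e[TOPN_MAX]; /* kept sorted, largest first */
    size_t count;                  /* number of valid entries */
};

/* Record (name, size) if it belongs among the TOPN_MAX largest entries. */
static void topn_add(struct topn *t, const char *name, size_t size)
{
    size_t pos;

    assert(t != NULL && name != NULL);

    /* List is full and the new size does not beat the smallest kept entry. */
    if (t->count == TOPN_MAX && size <= t->e[TOPN_MAX - 1].size)
        return;

    /* Find the first slot holding a smaller size than the new one. */
    for (pos = 0; pos < t->count; pos++)
        if (size > t->e[pos].size)
            break;

    /* Shift smaller entries down one slot; the last one falls off if full. */
    size_t last = (t->count < TOPN_MAX) ? t->count : TOPN_MAX - 1;
    for (size_t i = last; i > pos; i--)
        t->e[i] = t->e[i - 1];

    snprintf(t->e[pos].name, sizeof t->e[pos].name, "%s", name);
    t->e[pos].size = size;
    if (t->count < TOPN_MAX)
        t->count++;
}
```

A caller would zero-initialise a `struct topn`, call `topn_add()` once per file during the scan, and read `e[0..count-1]` afterwards. Insertion is O(N) per file, which is reasonable for a small N such as the default of 10 mentioned above; a min-heap would be the usual alternative for large N.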

INSTALL

+83-39
@@ -16,20 +16,49 @@ Generate the configure script when it is not available (cloned git repo):
1616
To get the required dependencies on Debian or Ubuntu, run:
1717

1818
$ sudo apt-get install libncursesw5-dev libcairo2-dev libpango1.0-dev \
19-
build-essential libtokyocabinet-dev
19+
build-essential libtkrzw-dev
2020

21+
On Debian 11 (bullseye), you need to have the following line in your
22+
/etc/apt/sources.list file:
2123

22-
On RHEL or CentOS systems, you need to do:
24+
deb http://deb.debian.org/debian bullseye-backports main
25+
26+
Then you would do:
27+
28+
$ sudo apt update
29+
30+
$ sudo apt-get install libncursesw5-dev libcairo2-dev libpango1.0-dev \
31+
build-essential libtkrzw-dev tkrzw-doc tkrzw-utils
32+
33+
34+
On older RHEL or CentOS systems, you need to do:
2335

2436
$ sudo yum install pango-devel cairo-devel tokyocabinet-devel
2537

2638

27-
Duc comes with various user interfaces and a number of backends for database
28-
access and graph drawing. You can choose which options should be used with the
29-
./configure script to build Duc to fit best in your environment.
39+
RHEL 8 & 9 / Rocky Linux 8 & 9 / Alma Linux 8 & 9
40+
41+
Install epel-release & update
42+
43+
$ sudo yum install epel-release
44+
$ sudo yum update
45+
46+
Install tkrzw and other packages:
47+
48+
$ sudo yum install tkrzw tkrzw-devel tkrzw-doc tkrzw-libs pango-devel cairo-devel tokyocabinet-devel
49+
50+
51+
Configuration Options
52+
---------------------
53+
54+
Duc comes with support for various user interfaces and a number of
55+
backends for database access and graph drawing. You can choose which
56+
options should be used with the ./configure script to build Duc to fit
57+
best in your environment.
3058

3159
This document describes the various options which can be passed to the
32-
./configure script, and the impact these options have on Duc functionality.
60+
./configure script, and the impact these options have on Duc
61+
functionality. The output of ./configure --help is the definitive source.
3362

3463

3564
User interfaces
@@ -38,7 +67,7 @@ User interfaces
3867
Duc comes with the following user interfaces:
3968

4069
- Command line interface (duc ls): This user interface has no external
41-
dependencies and is always enabled
70+
dependencies and is always enabled.
4271

4372
- Ncurses console interface (duc ui): an interactive console interface, which
4473
depends on ncurses or ncursesw. This user interface is enabled by default. If
@@ -59,51 +88,64 @@ Duc comes with the following user interfaces:
5988
--enable-opengl --disable-x11
6089

6190

62-
Database backend
63-
----------------
91+
Database backends
92+
-----------------
6493

6594
Duc supports various key-value database backends:
6695
- Tokyo Cabinet: tokyocabinet
6796
- LevelDB: leveldb
6897
- Sqlite3: sqlite3
6998
- Lightning Memory-Mapped Database: lmdb
7099
- Kyoto Cabinet: kyotocabinet
100+
- Tkrzw: tkrzw (default as of v1.5.0)
71101

72-
Duc uses Tokyo Cabinet by default: the performance is acceptable and generates
73-
in the smallest database size.
102+
Duc now uses Tkrzw by default: the performance is acceptable and it
103+
handles extremely large databases of volumes with terabytes of storage
104+
and millions of files.
74105

75106
--with-db-backend=ARG
76107

77-
If your system supports none of the above, contact the author to see if we can
78-
add your favourite backend.
108+
If your system supports none of the above, contact the authors to see
109+
if we can add your favourite backend.
110+
111+
Please note: Not all database formats can be shared between machines
112+
with different architectures. Notably, Tokyo Cabinet is built with
113+
non-standard options which break compatibility with other linux
114+
distributions, even on the same architecture [1]. If you are planning
115+
to share databases between different platforms (index machine A,
116+
display on machine B) we recommend using the sqlite3 backend.
79117

80-
Please note: Not all database formats can be shared between machines with
81-
different architectures. Notably, Tokyo Cabinet is built with non-standard
82-
options which break compatibility with other linux distributions, even on the
83-
same architecture [1]. If you are planning to share databases between different
84-
platforms (index machine A, display on machine B) we recommend using the
85-
sqlite3 backend.
118+
Note: Tokyo Cabinet, Kyoto Cabinet, LevelDB and LMDB are all being
119+
deprecated in future versions because of the lack of development and
120+
support for these libraries, especially for super large volumes to be
121+
indexed.
86122

87123
1. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=667979
88124

89-
When picking a backend you probably need to choose between speed, size and
90-
robustness. Some measurements on my system of a 372G directory with 1.6M files:
91-
92-
----------------------------------
93-
Database Run time Db size
94-
(s) (kB)
95-
----------------------------------
96-
tokyocabinet [*] 8.4 19.2
97-
leveldb 7.1 31.5
98-
sqlite3 13.5 71.1
99-
lmdb 5.9 78.7
100-
kyotocabinet 8.3 26.7
101-
----------------------------------
102-
103-
[*] Tokyo Cabinet currenty is the default used by Duc because of the good
104-
compression and reasonable performance. A problem is that Tokyo Cabinet is not
105-
very stable and can create corrupt databases when interrupting the indexing. If
106-
this is a problem for you, choose a different db backend.
125+
When picking a backend you probably need to choose between speed, size
126+
and robustness. Some (out of date) measurements on a system with a
127+
372G directory containing 1.6M files:
128+
129+
----------------------------------
130+
Database Run time Db size
131+
(s) (kB)
132+
----------------------------------
133+
tokyocabinet 8.4 19.2
134+
leveldb 7.1 31.5
135+
sqlite3 13.5 71.1
136+
lmdb 5.9 78.7
137+
kyotocabinet 8.3 26.7
138+
tkrzw [*] ??? ???
139+
----------------------------------
140+
141+
142+
[*] Tkrzw is currently the default used by Duc because of its active
143+
development, good compression and reasonable performance.
144+
145+
Tokyo Cabinet is not very stable and can create corrupt databases when
146+
interrupting the indexing. If this is a problem for you, choose a
147+
different db backend.
148+
107149

108150
Graphics
109151
--------
@@ -137,7 +179,8 @@ embedded systems not all graphics libraries are available.
137179
Testing
138180
-------
139181

140-
Duc comes with a rudimentary test harness which can be run with
182+
Duc comes with a rudimentary test harness which can be run at the top
183+
level directory with:
141184

142185
./test.sh
143186

@@ -148,5 +191,6 @@ If you have valgrind and you want to run the tests using it do:
148191
It will complain if you try this and valgrind isn't installed. The
149192
test harness still needs work and more tests, but should hopefully
150193
help keep us from re-introducing bugs as they are fixed and checked
151-
for.
194+
for. We would love to see more tests and a better harness, patches
195+
welcome!
152196

Makefile.am

+6-2
@@ -7,6 +7,7 @@ duc_SOURCES := \
77
src/libduc/db.c \
88
src/libduc/db.h \
99
src/libduc/db-tokyo.c \
10+
src/libduc/db-tkrzw.c \
1011
src/libduc/db-kyoto.c \
1112
src/libduc/db-leveldb.c \
1213
src/libduc/db-sqlite3.c \
@@ -44,9 +45,11 @@ duc_SOURCES += \
4445
src/duc/cmd-guigl.c \
4546
src/duc/cmd.h \
4647
src/duc/cmd.h \
48+
src/duc/cmd-histogram.c \
4749
src/duc/cmd-index.c \
4850
src/duc/cmd-info.c \
4951
src/duc/cmd-ls.c \
52+
src/duc/cmd-topn.c \
5053
src/duc/cmd-ui.c \
5154
src/duc/cmd-xml.c \
5255
src/duc/cmd-json.c \
@@ -56,11 +59,12 @@ duc_SOURCES += \
5659

5760

5861
AM_CFLAGS := @CAIRO_CFLAGS@ @PANGO_CFLAGS@ @PANGOCAIRO_CFLAGS@
59-
AM_CFLAGS += @TC_CFLAGS@ @SQLITE3_CFLAGS@ @GLFW3_CFLAGS@ @LMDB_CFLAGS@ @KC_CFLAGS@
62+
AM_CFLAGS += @TC_CFLAGS@ @SQLITE3_CFLAGS@ @GLFW3_CFLAGS@ @LMDB_CFLAGS@
63+
AM_CFLAGS += @KC_CFLAGS@ @TKRZW_CFLAGS@
6064
AM_CFLAGS += -Isrc/libduc -Isrc/libduc-graph -Isrc/glad
6165

6266
duc_LDADD := @CAIRO_LIBS@ @PANGO_LIBS@ @PANGOCAIRO_LIBS@
63-
duc_LDADD += @TC_LIBS@ @SQLITE3_LIBS@ @GLFW3_LIBS@ @LMDB_LIBS@ @KC_LIBS@
67+
duc_LDADD += @TC_LIBS@ @SQLITE3_LIBS@ @GLFW3_LIBS@ @LMDB_LIBS@ @KC_LIBS@ @TKRZW_LIBS@
6468

6569
man1_MANS = \
6670
doc/duc.1

TODO.md

+6
@@ -5,6 +5,12 @@ I'm still keeping the requests here for future reference or for if I ever get
55
bored and have a lot of time on my hands. Anybody is free to pick and implement
66
any of these tasks, of course!
77

8+
### Edit database to remove path(s) from Index and do all cleanup
9+
10+
This should be a simple change to add, though it does require some hackery to
11+
remove entries from the records[] array in the DB. Needs thought. Currently
12+
only solution would be to index to a totally new DB file with only the path(s)
13+
you want.
814

915
### Show increase since last index or time period
1016

configure.ac

+14-4
@@ -7,7 +7,7 @@
77

88
AC_PREREQ([2.13])
99

10-
AC_INIT([duc], [1.4.5], [[email protected]])
10+
AC_INIT([duc], [1.5.0-rc1], [[email protected]])
1111

1212
LIB_CURRENT=1
1313
LIB_REVISION=0
@@ -60,8 +60,8 @@ AC_ARG_ENABLE(
6060

6161
AC_ARG_WITH(
6262
[db-backend],
63-
[AS_HELP_STRING([--with-db-backend], [select database backend (tokyocabinet,leveldb,sqlite3,lmdb,kyotocabinet) @<:@default=tokyocabinet@:>@])], ,
64-
[with_db_backend="tokyocabinet"]
63+
[AS_HELP_STRING([--with-db-backend], [select database backend (tokyocabinet,leveldb,sqlite3,lmdb,kyotocabinet,tkrzw) @<:@default=tkrzw@:>@])], ,
64+
[with_db_backend="tkrzw"]
6565
)
6666

6767
AC_MSG_RESULT([Selected backend ${with_db_backend}])
@@ -75,6 +75,16 @@ case "${with_db_backend}" in
7575
PKG_CHECK_MODULES([TC], [tokyocabinet])
7676
AC_DEFINE([ENABLE_TOKYOCABINET], [1], [Enable tokyocabinet db backend])
7777
;;
78+
tkrzw)
79+
LDFLAGS="$outer_LDFLAGS -ltkrzw"
80+
AC_CHECK_LIB(tkrzw, tkrzw_get_last_status,
81+
[
82+
TKRZW_LIBS="-ltkrzw"
83+
AC_DEFINE([ENABLE_TKRZW], [1], [Enable tkrzw db backend])
84+
], [ AC_MSG_ERROR(Unable to find tkrzw) ])
85+
AC_SUBST([TKRZW_LIBS])
86+
AC_SUBST([TKRZW_CFLAGS])
87+
;;
7888
leveldb)
7989
AC_CHECK_LIB([leveldb], [leveldb_open])
8090
AC_DEFINE([ENABLE_LEVELDB], [1], [Enable leveldb db backend])
@@ -98,7 +108,7 @@ case "${with_db_backend}" in
98108
AC_DEFINE([ENABLE_KYOTOCABINET], [1], [Enable kyotocabinet db backend])
99109
;;
100110
*)
101-
AC_MSG_ERROR([Unsupported db-backend])
111+
AC_MSG_ERROR([Unsupported db-backend "${with_db_backend}"])
102112
esac
103113

104114
AC_DEFINE_UNQUOTED(DB_BACKEND, ["${with_db_backend}"], [Database backend])
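
Note on the DB_BACKEND define above: AC_DEFINE_UNQUOTED writes the selected backend name into config.h as a string macro. The following is a minimal sketch of how C code could consume it, assuming the macro expands to a plain string literal such as "tkrzw". The helpers `compiled_backend` and `backend_is_compiled_in` are hypothetical, and the fallback define exists only so the sketch compiles without a generated config.h.

```c
#include <assert.h>
#include <string.h>

/* Normally supplied by config.h; this fallback is only for the sketch. */
#ifndef DB_BACKEND
#define DB_BACKEND "tkrzw"
#endif

/* Hypothetical helper: name of the backend selected at configure time. */
static const char *compiled_backend(void)
{
    return DB_BACKEND;
}

/* Hypothetical helper: non-zero if `name` matches the compiled-in backend. */
static int backend_is_compiled_in(const char *name)
{
    assert(name != NULL);
    return strcmp(name, compiled_backend()) == 0;
}
```

Only one backend is compiled in per build, so a check like this can do no more than report a mismatch. It cannot switch backends at run time.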

src/duc/cmd-gui.c

+1-1
@@ -235,7 +235,7 @@ int gui_main(duc *duc, int argc, char *argv[])
235235

236236
int r = duc_open(duc, opt_database, DUC_OPEN_RO);
237237
if(r != DUC_OK) {
238-
duc_log(duc, DUC_LOG_FTL, "%s", duc_strerror(duc));
238+
//duc_log(duc, DUC_LOG_FTL, "%s", duc_strerror(duc));
239239
return -1;
240240
}
241241

src/duc/cmd-histogram.c

+88
@@ -0,0 +1,88 @@
1+
2+
#include "config.h"
3+
4+
#include <limits.h>
#include <locale.h>
5+
#include <stdlib.h>
6+
#include <stdio.h>
7+
#include <string.h>
8+
#include <time.h>
9+
#include <errno.h>
10+
#include <sys/types.h>
11+
#include <sys/stat.h>
12+
#include <unistd.h>
13+
#include <math.h>
14+
15+
#include "cmd.h"
16+
#include "duc.h"
17+
18+
static bool opt_apparent = false;
19+
static bool opt_base = false;
20+
static bool opt_bytes = false;
21+
static char *opt_database = NULL;
22+
23+
static int histogram_db(duc *duc, char *file)
24+
{
25+
struct duc_index_report *report;
26+
duc_size_type st = opt_apparent ? DUC_SIZE_TYPE_APPARENT : DUC_SIZE_TYPE_ACTUAL;
27+
int i = 0;
28+
29+
int r = duc_open(duc, file, DUC_OPEN_RO);
30+
if(r != DUC_OK) {
31+
duc_log(duc, DUC_LOG_FTL, "%s", duc_strerror(duc));
32+
return -1;
33+
}
34+
35+
while(( report = duc_get_report(duc, i)) != NULL) {
36+
37+
printf("Path: %s\n%3s %10s %10s\n",report->path,"Bkt","Size","Count");
38+
39+
size_t count;
40+
size_t bucket_size = 0;
41+
char pretty[32];
42+
setlocale(LC_NUMERIC, "");
43+
for (int i=0; i < report->histogram_buckets; i++) {
44+
count = report->histogram[i];
45+
bucket_size = pow(2, i);
46+
int ret = humanize(bucket_size, 0, 1024, pretty, sizeof pretty);
47+
printf("%3d %10s %'10zu\n",i, pretty, count);
48+
}
49+
50+
duc_index_report_free(report);
51+
printf("\n");
52+
i++;
53+
}
54+
55+
duc_close(duc);
56+
57+
return 0;
58+
}
59+
60+
61+
static int histogram_main(duc *duc, int argc, char **argv)
62+
{
63+
return(histogram_db(duc, opt_database));
64+
}
65+
66+
67+
static struct ducrc_option options[] = {
68+
{ &opt_apparent, "apparent", 'a', DUCRC_TYPE_BOOL, "show apparent instead of actual file size" },
69+
{ &opt_bytes, "bytes", 'b', DUCRC_TYPE_BOOL, "show bucket size in exact number of bytes" },
70+
{ &opt_database, "database", 'd', DUCRC_TYPE_STRING, "select database file to use [~/.duc.db]" },
71+
{ &opt_base, "base10", 't', DUCRC_TYPE_BOOL, "show histogram with base 10 bucket spacing instead of the default base 2" },
72+
{ NULL }
73+
};
74+
75+
76+
struct cmd cmd_histogram = {
77+
.name = "histogram",
78+
.descr_short = "Dump histogram of file sizes found.",
79+
.usage = "[options]",
80+
.main = histogram_main,
81+
.options = options,
82+
};
83+
84+
85+
/*
86+
* End
87+
*/
88+
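
Note on the bucket sizes printed by histogram_db() above: the command labels bucket i with a size of 2^i. How the indexer assigns files to buckets is not shown in this diff, so the following is only a sketch under one assumption: bucket i counts files whose size lies in [2^i, 2^(i+1)), sizes 0 and 1 land in bucket 0, and the last bucket absorbs anything larger. The names `bucket_for_size`, `bucket_floor` and `histogram_print` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Map a file size to a power-of-two bucket index in [0, nbuckets). */
static size_t bucket_for_size(size_t size, size_t nbuckets)
{
    size_t b = 0;

    assert(nbuckets > 0);

    /* Halve until the size drops to 1 or the last bucket is reached. */
    while (size > 1 && b + 1 < nbuckets) {
        size >>= 1;
        b++;
    }
    return b;
}

/* Lower bound of bucket b, using an integer shift instead of pow().
 * Valid only while b is smaller than the bit width of size_t. */
static size_t bucket_floor(size_t b)
{
    return (size_t)1 << b;
}

/* Print one line per bucket; size_t values are printed with %zu. */
static void histogram_print(const size_t *hist, size_t nbuckets)
{
    assert(hist != NULL);
    for (size_t b = 0; b < nbuckets; b++)
        printf("%3zu %10zu %10zu\n", b, bucket_floor(b), hist[b]);
}
```

The shift keeps the bucket bound in integer arithmetic, which avoids the double-to-size_t conversion that pow(2, i) involves.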
