<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=utf-8">
<TITLE>smallfile distributed I/O benchmark | Red Hat Intranet</TITLE>
<META NAME="GENERATOR" CONTENT="LibreOffice 4.1.3.2 (Linux)">
<META NAME="AUTHOR" CONTENT="anonymous">
<META NAME="CREATED" CONTENT="0;0">
<META NAME="CHANGEDBY" CONTENT="Ben England">
<META NAME="CHANGED" CONTENT="20140709;95907050185250">
<META NAME="category-departments" CONTENT="Engineering">
<META NAME="category-keywords" CONTENT="filesystem">
<META NAME="category-offices" CONTENT="Westford, MA">
<META NAME="category-wiki-page-type" CONTENT="Misc">
<META NAME="modified" CONTENT="2012-06-08 11:49:32">
<META NAME="status" CONTENT="1">
<META NAME="type" CONTENT="wiki_page">
<STYLE TYPE="text/css">
<!--
@page { margin: 0.79in }
P { color: #000000; font-family: "Liberation Serif", "Times New Roman", serif; font-size: 11pt; line-height: 138% }
H1 { margin-top: 0.1in; margin-bottom: 0in; border: none; padding: 0in; color: #000000 }
H1.western { font-family: "Liberation Sans", "Lucida Grande", "Helvetica", sans-serif; font-size: 22pt }
H1.cjk { font-family: "Liberation Sans", "Lucida Grande", "Helvetica", sans-serif }
H1.ctl { font-family: "Liberation Sans", "Lucida Grande", "Helvetica", sans-serif }
H2 { margin-top: 0.1in; margin-bottom: 0in; border: none; padding: 0in; color: #000000; font-family: "Liberation Sans", "Lucida Grande", "Helvetica", sans-serif; font-size: 10pt; font-weight: normal; line-height: 130% }
PRE { color: #000000 }
PRE.cjk { font-family: "WenQuanYi Zen Hei", monospace }
PRE.ctl { font-family: "Lohit Devanagari", monospace }
H3 { color: #000000 }
H3.western { font-family: "Albany", sans-serif }
H3.cjk { font-family: "WenQuanYi Zen Hei Sharp" }
H3.ctl { font-family: "Lohit Devanagari" }
P.sdfootnote { margin-left: 0.24in; text-indent: -0.24in; margin-bottom: 0in; font-size: 10pt; line-height: 100% }
A:link { color: #003399; text-decoration: none }
A:visited { color: #000000 }
A.sdfootnoteanc { font-size: 57% }
-->
</STYLE>
</HEAD>
<BODY LANG="en-US" TEXT="#000000" LINK="#003399" VLINK="#000000" DIR="LTR">
<FORM ACTION="/wiki/smallfile-distributed-io-benchmark" METHOD="POST" ENCTYPE="multipart/form-data">
<INPUT TYPE=HIDDEN NAME="form_build_id" VALUE="form-5c8f33b57ec17d3fc622343ca04cc920">
<INPUT TYPE=HIDDEN NAME="form_token" VALUE="79eae801c0b52735704736049baaa583">
<INPUT TYPE=HIDDEN NAME="form_id" VALUE="subscriptions_ui_node_form">
</FORM>
<FORM ACTION="/comment/reply/71422" METHOD="POST">
<INPUT TYPE=HIDDEN NAME="form_build_id" VALUE="form-5d38f8d7377558dd0b9728158589d8f4">
<INPUT TYPE=HIDDEN NAME="form_token" VALUE="d88d096c160d9b2ea61f976935dd016c">
<INPUT TYPE=HIDDEN NAME="form_id" VALUE="comment_form">
</FORM>
<DIV ID="content-wrapper" DIR="LTR" STYLE="background: #dfe1e4">
<P><BR><BR>
</P>
<DIV ID="inner-wrap" DIR="LTR">
<P><BR><BR>
</P>
<DIV ID="center" DIR="LTR">
<P><BR><BR>
</P>
<DIV ID="tabs-wrapper" DIR="LTR">
<H1 CLASS="western" STYLE="margin-top: 0in; margin-bottom: 0.2in">
smallfile distributed I/O benchmark</H1>
</DIV>
<DIV ID="node-71422" DIR="LTR">
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"> This
page describes the <STRONG>smallfile</STRONG> benchmark program.
It is a python-based small-file distributed POSIX workload
generator which can be used to quickly measure performance for a
variety of metadata-intensive workloads across an entire
cluster. It has no dependencies on any specific
filesystem or implementation. It is intended to
complement use of the iozone benchmark for measuring performance of
large-file workloads, and borrows certain concepts from iozone
and Ric Wheeler's fs_mark. It was developed by
Ben England starting in March 2009, and is now open-source.
Here's an example of the kind of data that can be generated with
it:<IMG SRC="default_files/glusterfs-smallfile-2.jpg" NAME="graphics2" ALIGN=BOTTOM WIDTH=669 HEIGHT=541 BORDER=0></P>
<DIV ID="Table of Contents1" DIR="LTR">
<P><BR><BR>
</P>
<DIV ID="Table of Contents1_Head" DIR="LTR">
<P STYLE="margin-top: 0.17in; line-height: 100%; page-break-after: avoid">
<FONT FACE="Albany, sans-serif"><FONT SIZE=4 STYLE="font-size: 16pt"><B>Table
of Contents</B></FONT></FONT></P>
</DIV>
<P STYLE="margin-left: 0.2in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__123_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>Use
with distributed filesystems</FONT></FONT></A></P>
<P STYLE="margin-left: 0.2in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__125_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>Use
with non-networked filesystems</FONT></FONT></A></P>
<P STYLE="margin-left: 0.2in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__127_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>Use
of subdirectories</FONT></FONT></A></P>
<P STYLE="margin-left: 0.2in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__129_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>Sharing
directories across threads</FONT></FONT></A></P>
<P STYLE="margin-left: 0.2in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__131_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>Hashing
files into directory tree</FONT></FONT></A></P>
<P STYLE="margin-left: 0.2in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__133_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>Random
file size distribution option</FONT></FONT></A></P>
<P STYLE="margin-left: 0.2in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__135_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>Avoiding
caching effects</FONT></FONT></A></P>
<P STYLE="margin-left: 0.2in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__137_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>Use
of --pause in multi-thread tests</FONT></FONT></A></P>
<P STYLE="margin-left: 0.2in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__139_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>How
to measure asynchronous file copy performance</FONT></FONT></A></P>
<P STYLE="margin-left: 0.2in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__141_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>Response
time collection</FONT></FONT></A></P>
<P STYLE="margin-left: 0.2in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__143_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>Synchronization</FONT></FONT></A></P>
<P STYLE="margin-left: 0.39in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__145_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>How
test parameters are transmitted to worker threads</FONT></FONT></A></P>
<P STYLE="margin-left: 0.39in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__147_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>How
remote worker threads are launched</FONT></FONT></A></P>
<P STYLE="margin-left: 0.39in; margin-bottom: 0in; line-height: 100%">
<A HREF="#__RefHeading__149_1677170542"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3>How
results are returned to master process</FONT></FONT></A></P>
</DIV>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<H1 CLASS="western" STYLE="margin-top: 0in; margin-bottom: 0.2in">
What it can do</H1>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">Capabilities
include:</P>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">can
manage workload generator processes on multiple hosts</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">calculates
aggregate throughput for the entire set of hosts</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">can
start and stop all workload generator processes at approximately
the same time (necessary for accurate aggregate throughput
measurement)</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">useful
for generating "pure" workloads (for example,
just creates, deletes, or setattrs)</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">easy
to extend to new workload types</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">provides
a CLI for scripted use, but the workload generator is separate
from the CLI, so it is possible to develop a GUI for it</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">supports
either fixed or random-exponential file sizes</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">can
capture response-time data in .csv format, and provides a utility to
reduce this data to statistics</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">supports
Windows (different launching method, see below)</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">writes
a unique data pattern in all files, and verifies data read
against this pattern</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">can
write a random data pattern that is incompressible</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">can
measure the time required for files to appear in a directory tree
(useful for asynchronous replication tests)</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">in
multi-host tests, can force all clients to read files written by
a different client</P>
</UL>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">Both
python 2.7 and python 3 are supported. Limited support is
available for pypy (JIT compilation).</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<H1 CLASS="western" STYLE="margin-top: 0in; margin-bottom: 0.2in">
Restrictions</H1>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">For
a multi-host test, ALL hosts <EM>must provide access to the same
shared directory</EM></P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><EM><SPAN STYLE="font-variant: normal"><SPAN STYLE="font-style: normal">does
not support mixed workloads (mixture of different operation
types)<A CLASS="sdfootnoteanc" NAME="sdfootnote1anc" HREF="#sdfootnote1sym"><SUP>1</SUP></A></SPAN></SPAN></EM></P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">is
not accurate with a memory-resident filesystem on a single host
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">requires
all hosts to have the same DNS domain name (we plan to remove this
restriction)
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">does
not support HTTP access (can use ssbench or cosbench for Swift
testing)</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">does
not support mixture of Windows and non-Windows clients at
present</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">For
POSIX-like operating systems, we have only tested with Linux,
but there is a high probability that it would work with Apple OS
and most other UNIX-like operating systems – we just don't
have the time to test them.</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">We
have only tested on Windows XP and Windows 7 so far, and cannot
guarantee that other Windows versions will work, although any
release after Windows XP is likely to be ok.</P>
</UL>
<H1 CLASS="western" STYLE="margin-top: 0in; margin-bottom: 0.2in">
How to run</H1>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">You
must have password-less ssh access between the test driver node
and the workload generator hosts if you want to run a distributed
(multi-host) test.</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">You
must use a directory visible to all participating hosts to run a
distributed test.</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">To see
what parameters are supported by smallfile_cli.py, run "python
smallfile_cli.py -h". Boolean true/false parameters can be
set to either Y (true) or N (false). Every command consists of a
sequence of parameter name-value pairs with the format <B>--name
value</B>.</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">The
parameters are:</P>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--operation</B>
-- operation name, one of the following:
</P>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">create
-- create a file and write data to it
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">append
-- open an existing file and append data to it
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">delete
-- delete a file
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">rename
-- rename a file
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">delete_renamed
-- delete a file that had previously been renamed
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">read
-- read an existing file
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">stat
-- just read metadata from an existing file
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">chmod
-- change protection mask for file
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">setxattr
-- set extended attribute values in each file
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">getxattr
-- read extended attribute values in each file
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">symlink
-- create a symlink pointing to each file (create must be run
beforehand)
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">mkdir
-- create a subdirectory with 1 file in it
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">rmdir
-- remove a subdirectory and its 1 file
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">readdir
-- scan directories only, don't read files or their metadata</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">ls-l
-- scan directories and read basic file metadata</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">cleanup
-- delete any pre-existing files from a previous run
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">swift-put
-- simulates OpenStack Swift behavior when doing a PUT operation</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">swift-get
-- simulates OpenStack Swift behavior for each GET operation.
</P>
</UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--top
-- </B><SPAN STYLE="font-weight: normal">top-level directory;
all file accesses are done inside this directory tree. If you
wish to use multiple mountpoints, provide a list of top-level
directories separated by commas (no whitespace).</SPAN></P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--host-set</B>
<SPAN STYLE="font-weight: normal">-- comma-separated set of
hosts used for this test, no domain names allowed. Default:
non-distributed test.</SPAN></P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--files</B>
-- how many files should each thread process? </P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--threads</B>
-- how many workload generator threads should each
smallfile_cli.py process create?
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--file-size</B>
-- total amount of data accessed per file, in KB. If zero then
no reads or writes are performed.
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--file-size-distribution</B>
-- the only supported value today is <B>exponential</B>. <SPAN STYLE="font-weight: normal">Default:
</SPAN>fixed file size.</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--record-size</B>
-- record size in KB, how much data is transferred in a single
read or write system call. If 0 then it is set to the
minimum of the file size and a 1-MB record-size limit. Default: 0</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--files-per-dir</B>
-- maximum number of files contained in any one directory.
Default: 200</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--dirs-per-dir</B>
-- maximum number of subdirectories contained in any one
directory. Default: 20</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--hash-into-dirs</B>
-- if Y then assign the next file to a directory using a hash
function, otherwise assign the next --files-per-dir files to the next
directory. Default: N</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--permute-host-dirs</B>
-- if Y then have each host process a different subdirectory
tree than it otherwise would (see below for directory tree
structure). Default: N</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--same-dir</B>
-- if Y then threads will share a single directory. Default: N</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--network-sync-dir</B>
-- no need to specify this unless you run a multi-host test and
the <B>--top</B> parameter points to a non-shared directory
(see discussion below). Default: <B>network_shared</B>
subdirectory under the <B>--top</B> dir.</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--xattr-size</B>
-- size of extended attribute value in bytes (names begin with
'user.smallfile-')
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--xattr-count</B>
-- number of extended attributes per file
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--prefix</B>
-- a string prefix to prepend to file names (so they don't collide
with previous runs, for example)
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--suffix</B>
-- a string suffix to append to file names (so they don't collide
with previous runs, for example)
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--incompressible</B>
-- (default N) if Y then generate a pure-random file that will
not be compressible (useful for tests where an intermediate network
or file copy utility attempts to compress data)</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--record-ctime-size</B>
-- default N; if Y then label each created file with an xattr
containing its creation time and file size. This will be used
by the <B>--await-create</B> operation to compute performance of
asynchronous file replication/copy.</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--finish</B>
-- if Y, thread will complete all requested file operations even
if measurement has finished. Default: Y</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--stonewall</B>
-- if Y then thread will measure throughput as soon as it
detects that another thread has finished. Default: N</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--verify-read</B>
-- <SPAN STYLE="font-weight: normal">if Y then smallfile
will verify that read data is correct. Default: Y</SPAN></P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--response-times</B>
-- <SPAN STYLE="font-weight: normal">if Y then save the response
time for each file operation in an rsptimes*.csv file in the
shared network directory. The record format is </SPAN><FONT FACE="Courier 10 Pitch"><SPAN STYLE="font-weight: normal">operation-type,
start-time, response-time</SPAN></FONT><SPAN STYLE="font-weight: normal">.
The operation type is included so that you can run different
workloads at the same time and easily merge the data from these
runs. The start-time field is the time the file operation
started, with microsecond resolution. The response-time field
is the file operation's duration, also with microsecond resolution.</SPAN></P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--remote-pgm-dir</B>
-- no need to specify this unless the smallfile software
lives in a different directory on the target hosts than on the
test-driver host.
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--pause</B>
-- integer (microseconds) each thread will wait before
starting the next file. Default: 0</P>
</UL>
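<P STYLE="margin-bottom: 0in; border: none; padding: 0in">The
--response-times records described above can be reduced to summary
statistics with a few lines of python. The snippet below is an
illustrative sketch, not the bundled reduction utility; the sample
records are made up, but follow the documented operation-type,
start-time, response-time format.</P>

```python
import csv
import io
import statistics

# Made-up rsptimes-style records (operation-type, start-time, response-time);
# in a real run these would come from the rsptimes*.csv files in the
# shared network directory.
sample = """\
create,0.000001,0.004210
create,0.004300,0.003950
create,0.008400,0.005125
"""

rows = list(csv.reader(io.StringIO(sample)))
rsp = [float(r[2]) for r in rows]          # response-time is the third field
print("samples:", len(rsp))
print("mean response time: %.6f sec" % statistics.mean(rsp))
print("max response time:  %.6f sec" % max(rsp))
```

<P STYLE="margin-bottom: 0in; border: none; padding: 0in">Because each
record carries its operation type, records from concurrent runs of
different workloads can be concatenated and then grouped by the first
field before computing statistics.</P>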
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">So for
example, if you want to run <STRONG>smallfile_cli.py</STRONG> on
1 host with 8 threads each creating 2 GB of 1-MB files, you can
use these options:</P>
<PRE CLASS="western" STYLE="margin-bottom: 0.2in; border: none; padding: 0in"><STRONG> </STRONG><STRONG><FONT SIZE=3># python smallfile_cli.py --operation create --threads 8 --file-size 1024 --files 2048 --top /mnt/gfs/smf</FONT></STRONG></PRE><P STYLE="margin-bottom: 0in; border: none; padding: 0in">
To run a 4-host test doing same thing:</P>
<PRE CLASS="western" STYLE="border: none; padding: 0in"><STRONG> </STRONG><STRONG><FONT SIZE=3># python smallfile_cli.py --operation create --threads 8 --file-size 1024 --files 2048 --top /mnt/gfs/smf \</FONT></STRONG>
<STRONG> </STRONG><STRONG><FONT SIZE=3>--host-set host1,host2,host3,host4</FONT></STRONG> </PRE><P STYLE="margin-bottom: 0in; border: none; padding: 0in">
Errors encountered by worker threads will be saved in
<B>/var/tmp/invoke-N.log</B> where <B>N</B> is the thread number.
After each test, a summary of thread results is displayed, and
overall test results are aggregated for you, in three ways:</P>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><I>files/sec</I>
-- the only metric relevant to all tests</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><I>IOPS</I>
-- application I/O operations per second, the rate at which the
benchmark performed reads/writes</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><I>MB/s</I>
-- megabytes/sec, the rate at which the application transferred data</P>
</UL>
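<P STYLE="margin-bottom: 0in; border: none; padding: 0in">The
relationship between these three aggregated metrics can be illustrated
with simple arithmetic. The numbers below are hypothetical: they reuse
the 8-thread example above with a made-up elapsed time, and assume one
I/O per file because the record size equals the file size.</P>

```python
# Hypothetical run: 8 threads x 2048 files of 1024 KB each,
# completing in a made-up elapsed time of 100 seconds.
total_files = 8 * 2048
file_size_kb = 1024
record_size_kb = 1024                      # one read/write per file
elapsed_sec = 100.0

files_per_sec = total_files / elapsed_sec
iops = total_files * (file_size_kb // record_size_kb) / elapsed_sec
mb_per_sec = total_files * file_size_kb / 1024.0 / elapsed_sec

print(files_per_sec, iops, mb_per_sec)     # 163.84 163.84 163.84
```

<P STYLE="margin-bottom: 0in; border: none; padding: 0in">With a
smaller record size the IOPS figure rises while MB/s stays the same,
since each file then requires more system calls to transfer the same
amount of data.</P>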
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">Users
should never need to run <STRONG>smallfile.py</STRONG> -- this is
the python class which implements the workload generator.
However, developers can run this module to invoke its unit tests:</P>
<PRE CLASS="western" STYLE="margin-bottom: 0.2in; border: none; padding: 0in"><STRONG> </STRONG><STRONG><FONT SIZE=3># python smallfile.py </FONT></STRONG></PRE><P STYLE="margin-bottom: 0in; border: none; padding: 0in">
To run just one unit test, run:
<PRE CLASS="western" STYLE="margin-bottom: 0.2in; border: none; padding: 0in"><STRONG> </STRONG><STRONG><FONT SIZE=3># python -m unittest smallfile.Test.test_c3_Symlink</FONT></STRONG></PRE><P STYLE="margin-bottom: 0in; line-height: 100%">
<BR>
</P>
<P STYLE="margin-bottom: 0in; line-height: 100%"><BR>
</P>
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in; font-weight: normal"><A NAME="__RefHeading__123_1677170542"></A><A NAME="__RefHeading__236_244684570"></A>
<FONT SIZE=4 STYLE="font-size: 16pt">Use with distributed
filesystems</FONT></H2>
<P STYLE="margin-bottom: 0.2in">To measure the performance of a
distributed filesystem, it is necessary to have multiple hosts
simultaneously applying workload. The --host-set parameter lets
you specify a comma-separated list of hosts to use.
</P>
<P STYLE="margin-bottom: 0.2in">For any distributed filesystem
test, there must be a single directory, shared across all
hosts (both test driver and worker hosts), that can be used to
pass test parameters, pass back results, and coordinate activity
across the hosts. This is referred to below as the “shared
directory”. By default this is the
<B>network_shared/</B> subdirectory of the --top directory, but
you can override this default by specifying the <B>--network-sync-dir</B>
directory parameter; see the next section for why this is useful.</P>
<P STYLE="margin-bottom: 0.2in">Some distributed filesystems,
such as NFS and Gluster, have relaxed, eventual-consistency
caching of directories; this causes problems for the shared
directory. To work around this, you can use a separate
NFS mountpoint exported from a Linux NFS server, mounted with the
option <FONT FACE="Courier 10 Pitch">actimeo=1</FONT> (to limit
how long NFS will cache directory entries and metadata).
You then reference this mountpoint using the <B>--network-sync-dir</B>
option of <B>smallfile</B>. For example:</P>
<P STYLE="margin-bottom: 0.2in"><BR><BR>
</P>
<P STYLE="margin-left: 0.49in; margin-bottom: 0in"><FONT SIZE=3>#
<B>mount -t nfs -o actimeo=1</B>
<I>your-linux-server</I>:/<I>your/nfs/export</I> <B>/mnt/nfs</B></FONT></P>
<P STYLE="margin-left: 0.49in; margin-bottom: 0in"><FONT SIZE=3>#
<B>./smallfile_cli.py --top</B> <I>/your/distributed/filesystem</I>
<B>--network-sync-dir /mnt/nfs/smf-shared</B></FONT>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><SPAN STYLE="font-weight: normal">For
non-Windows tests, the user must set up password-less ssh between
the test driver and each worker host. If security is an issue, a non-root
username can be used throughout, since smallfile requires no
special privileges. Edit the </SPAN><B>$HOME/.ssh/authorized_keys</B>
<SPAN STYLE="font-weight: normal">file to contain the public key
of the account on the test driver. The test driver will bypass
the .ssh/known_hosts file by using the </SPAN><B>-o
StrictHostKeyChecking=no</B> <SPAN STYLE="font-weight: normal">option
of the </SPAN><B>ssh</B> <SPAN STYLE="font-weight: normal">command.</SPAN></P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><SPAN STYLE="font-weight: normal">For
Windows tests, each worker host must be running the</SPAN>
<B>launch_smf_host.py</B> <SPAN STYLE="font-weight: normal">program
that polls the shared network directory for a file that contains
the command to launch </SPAN><B>smallfile_remote.py</B> <SPAN STYLE="font-weight: normal">in
the same way that would happen with ssh on non-Windows tests. The
command-line parameters are:</SPAN></P>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--shared
shared-directory</B> -- <SPAN STYLE="font-weight: normal">this
must point at the directory shared by all smallfile hosts.
Normally this is the </SPAN><B>network_shared</B> <SPAN STYLE="font-weight: normal">subdirectory
of the </SPAN><B>--top</B> <SPAN STYLE="font-weight: normal">directory,
but it could be the </SPAN><B>--network-sync-dir</B> <SPAN STYLE="font-weight: normal">directory
if that is specified.</SPAN></P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>--as-host
host-name</B> -- <SPAN STYLE="font-weight: normal">specify what
hostname identifier will be used for this host. Why not just ask
the host what name to use? Hosts can have multiple network
interfaces, and therefore can have multiple host names; in some
cases we want to use IP addresses instead.</SPAN></P>
</UL>
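<P STYLE="margin-bottom: 0in; border: none; padding: 0in">The polling
mechanism described above can be sketched as follows. This is a
simplified illustration of the idea, not the actual
launch_smf_host.py program, and the launch-file naming here is
hypothetical.</P>

```python
import os
import time

def poll_for_launch(shared_dir, as_host, timeout=2.0, interval=0.1):
    """Poll the shared directory for a file containing a command to run.

    Sketch only: the real launch_smf_host.py has its own file naming
    and protocol; 'launch.<as-host>' here is a hypothetical name.
    """
    launch_file = os.path.join(shared_dir, "launch.%s" % as_host)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(launch_file):
            with open(launch_file) as f:
                cmd = f.read().strip()
            os.unlink(launch_file)          # consume the launch request
            return cmd                      # caller would then execute this
        time.sleep(interval)
    return None                             # timed out with no request
```

<P STYLE="margin-bottom: 0in; border: none; padding: 0in">Polling a
shared directory avoids any need for an ssh daemon on the Windows
hosts: the only requirement is that every host can read and write the
same shared directory.</P>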
<P STYLE="margin-bottom: 0in; border: none; padding: 0in; font-weight: normal">
An example of how to start a Windows test with this method
follows, using actual DOS prompt syntax. Something like the first
command must be run on every host participating in the test
before the test is actually started.</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><FONT FACE="Courier 10 Pitch"><FONT SIZE=2 STYLE="font-size: 9pt"><SPAN STYLE="font-weight: normal">>
start python launch_smf_host.py --shared z:\smf\network_shared
--as-host gprfc023</SPAN></FONT></FONT></P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><FONT FACE="Courier 10 Pitch"><FONT SIZE=2 STYLE="font-size: 9pt"><SPAN STYLE="font-weight: normal">>
python smallfile_cli.py --top z:\smf
--host-set gprfc023</SPAN></FONT></FONT></P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in; font-weight: normal"><A NAME="__RefHeading__125_1677170542"></A><A NAME="__RefHeading__238_244684570"></A>
<FONT SIZE=4 STYLE="font-size: 16pt">Use with non-networked
filesystems</FONT></H2>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">There
are cases where you want to use a distributed filesystem test on
host-local filesystems. One such example is virtualization, where
the “local” filesystem is really layered on a virtual disk
image which may be stored in a network filesystem. The benchmark
needs to share certain files across hosts to return results and
synchronize threads. In such a case, you specify the
--<B>network-sync-dir</B> <I>directory-pathname</I> parameter
to have the benchmark use a directory in some shared filesystem
external to the test directory (specified with the --<B>top</B>
parameter). If this parameter is not specified, the shared
directory defaults to the <B>network_shared</B> subdirectory
underneath the directory specified with the --<B>top</B>
parameter.</P>
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in; border: none; padding: 0in">
<BR><BR>
</H2>
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in"><A NAME="__RefHeading__127_1677170542"></A><A NAME="__RefHeading__112_244684570"></A>
<FONT SIZE=4 STYLE="font-size: 16pt">Use of subdirectories</FONT></H2>
<P>Before a test even starts, the smallfile benchmark ensures
that the directories needed by that test already exist (there is
a specific operation type for testing performance of subdirectory
creation and deletion). If the top directory (specified by the
--top parameter) is <B>T</B>, then the top per-thread directory is
<B>T</B>/host/d<B>TT</B> where <B>TT</B> is a 2-digit thread
number and “host” is the hostname. If the test is not a
distributed test, the host is whatever host the benchmark
command was issued on; otherwise it is each of the hosts
specified by the --host-set parameter. The first F files (where
F is the value of the --files-per-dir parameter) are placed in
this top per-thread directory. If the test uses more than F
files/thread, then at least one subdirectory from the first level
of subdirectories must be used; these subdirectories have
paths of the form T/host/dTT/dNNN where NNN is the subdirectory
number. Suppose the value of the --subdirs-per-dir parameter is
D. Then there are at most D subdirectories of the top per-thread
directory. If the test requires more than F(D+1) files per
thread, then a second level of subdirectories must be
created, with pathnames like T/host/dTT/dNNN/dMMM. This process
of adding subdirectories continues until there
are enough subdirectories to hold all the files. The purpose
of this approach is to simulate a mixture of directories and
files without requiring the user to specify how many levels of
directories are needed.</P>
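<P STYLE="margin-bottom: 0in; border: none; padding: 0in">The
placement rule above can be sketched in a few lines of Python.
This is an illustrative model only, not smallfile's actual code:
the function name and the base-D encoding of deeper directory
levels are our own simplification.</P>

```python
# Illustrative sketch of the placement rule described above, NOT
# smallfile's actual implementation.  F = --files-per-dir value,
# D = --subdirs-per-dir value; deeper levels are encoded here as
# base-D "digits", one per directory level.
def dir_for_file(n, files_per_dir, subdirs_per_dir):
    """Return the directory (relative to the top per-thread
    directory) that holds file number n."""
    if n < files_per_dir:
        return "."                   # first F files live at the top
    slot = (n - files_per_dir) // files_per_dir
    parts = []
    while True:
        parts.append("d%03d" % (slot % subdirs_per_dir))
        slot //= subdirs_per_dir
        if slot == 0:
            break
    return "/".join(reversed(parts))
```

<P STYLE="margin-bottom: 0in; border: none; padding: 0in">With
--files-per-dir 100 and --subdirs-per-dir 10, files 0-99 land in
the top per-thread directory, file 100 lands in d000, and file
1100 lands in a second-level directory, matching the dNNN/dMMM
pattern above.</P>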
<P>The use of multiple mountpoints is supported. This feature is
useful for testing NFS and other network filesystems.</P>
<P>Note that the test harness does not have to scan the
directories to figure out which files to read or write – it
simply generates the filename sequence itself. If you want to
test directory scanning speed, use <B>readdir</B> or <B>ls-l</B>
operations.
</P>
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in"><A NAME="__RefHeading__129_1677170542"></A><A NAME="__RefHeading__114_244684570"></A>
<FONT SIZE=4 STYLE="font-size: 16pt">Sharing directories across
threads</FONT></H2>
<P>Some applications require that many threads, possibly spread
across many host machines, need to share a set of directories.
The <B>--same-dir</B> parameter makes it possible for the
benchmark to test this situation. By default this parameter is
set to N, which means each thread has its own non-overlapping
directory tree. This setting provides the best performance and
scalability. However, if the user sets this parameter to Y, then
the top per-thread directory for all threads will be <B>T</B> instead
of <B>T/host/dTT</B> as described in the preceding section.</P>
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in"><A NAME="__RefHeading__131_1677170542"></A><A NAME="__RefHeading__116_244684570"></A>
<FONT SIZE=4 STYLE="font-size: 16pt">Hashing files into directory
tree</FONT></H2>
<P><FONT SIZE=2 STYLE="font-size: 11pt">For applications which
create very large numbers of small files (millions for example),
it is impossible or at the very least impractical to place them
all in the same directory, whether or not the filesystem supports
so many files in a single directory. There are two ways which
applications can use to solve this problem:</FONT></P>
<UL>
<LI><P><FONT SIZE=2 STYLE="font-size: 11pt">insert files into 1
directory at a time – can create I/O and lock contention for
the directory metadata</FONT></P>
<LI><P><FONT SIZE=2 STYLE="font-size: 11pt">insert files into
many directories at the same time – relieves I/O and lock
contention for directory metadata, but increases the amount of
metadata caching needed to avoid cache misses</FONT></P>
</UL>
<P><FONT SIZE=2 STYLE="font-size: 11pt">The --<B>hash-into-dirs</B>
parameter is intended to enable simulation of this latter mode of
operation. By default, the value of this parameter is N, and in
this case a smallfile thread will access directories sequentially,
one at a time. In other words, the first F (where F = value of the
--<B>files-per-dir</B> parameter) files will be assigned to the
top per-thread directory, then the next F files will be assigned
to the next per-thread directory, and so on. However, if the
--<B>hash-into-dirs</B> parameter is set to Y, then the number
of the file being accessed by the thread will be hashed into the
set of directories that are being used by this thread. </FONT>
</P>
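<P>The contrast between the two modes can be sketched as follows.
This is an illustrative model; the function names and the use of an
MD5 hash are our own choices, not smallfile's internals.</P>

```python
import hashlib

# Illustrative contrast of the two placement modes described above
# (names are ours, not smallfile's).  F = --files-per-dir value.
def sequential_dir_index(file_num, files_per_dir):
    # --hash-into-dirs N (default): fill one directory at a time
    return file_num // files_per_dir

def hashed_dir_index(file_num, num_dirs):
    # --hash-into-dirs Y: hash the file number into the whole set of
    # directories, so many directories receive inserts concurrently
    digest = hashlib.md5(str(file_num).encode("ascii")).hexdigest()
    return int(digest, 16) % num_dirs
```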
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in"><A NAME="__RefHeading__133_1677170542"></A><A NAME="__RefHeading__118_244684570"></A>
<FONT SIZE=4 STYLE="font-size: 16pt">Random file size
distribution option</FONT></H2>
<P STYLE="margin-bottom: 0.2in"><FONT SIZE=2 STYLE="font-size: 11pt">In
real life, users don't create files that all have the same size.
Typically there is a file size distribution with a majority of
small files and a lesser number of larger files. This benchmark
supports use of the random exponential distribution to
approximate that behavior. If you specify </FONT>
</P>
<P STYLE="margin-left: 0.79in; margin-bottom: 0.2in"><FONT FACE="Courier 10 Pitch"><FONT SIZE=2 STYLE="font-size: 11pt">--file-size-distribution
<B>exponential</B> --file-size <B>S</B></FONT></FONT>
</P>
<P STYLE="margin-bottom: 0.2in"><FONT SIZE=2 STYLE="font-size: 11pt">The
meaning of the --<B>file-size</B> parameter changes to the
<I>maximum</I> file size (<B>S</B> KB), and the mean file size
becomes <B>S</B>/8. All file sizes are rounded down to the
nearest kilobyte boundary, and the smallest allowed file size is
1 KB. When this option is used, the smallfile benchmark saves the
seed for each thread's random number generator object in a <B>.seed</B>
file stored in the <B>TMPDIR</B> directory (typically <B>/var/tmp</B>).
This allows the file reader to recreate the sequence of random
numbers used by the file writer to generate file sizes, so that
the reader knows exactly how big each file should be without
asking the file system for this information. The append operation
works in the same way. All other operations are metadata
operations and do not require that the file size be known in
advance.</FONT></P>
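<P STYLE="margin-bottom: 0.2in"><FONT SIZE=2 STYLE="font-size: 11pt">The
size rule above can be sketched as follows. This is our own
simplification of the behavior described, not smallfile's actual
code; the function name is illustrative.</FONT></P>

```python
import random

# Sketch of the size rule described above (a simplification, not
# smallfile's actual code): exponential distribution with mean S/8,
# capped at the maximum S, rounded down to a whole KB, minimum 1 KB.
# Reusing the saved seed lets the reader regenerate the exact size
# sequence the writer produced.
def file_sizes_kb(max_size_kb, count, seed):
    rng = random.Random(seed)     # seed would come from the .seed file
    mean = max_size_kb / 8.0
    sizes = []
    for _ in range(count):
        size_kb = int(min(rng.expovariate(1.0 / mean), max_size_kb))
        sizes.append(max(size_kb, 1))
    return sizes
```

<P STYLE="margin-bottom: 0.2in"><FONT SIZE=2 STYLE="font-size: 11pt">Because
the generator is seeded identically, a reader calling this with the
writer's saved seed gets the same size for every file without
consulting the filesystem.</FONT></P>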
<P STYLE="margin-bottom: 0.2in"><BR><BR>
</P>
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in"><A NAME="__RefHeading__135_1677170542"></A><A NAME="__RefHeading__120_244684570"></A>
<FONT SIZE=4 STYLE="font-size: 16pt">Avoiding caching effects</FONT></H2>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><A NAME="Avoiding_caching_effects"></A>
There are two types of caching effects that we wish to avoid:
data caching and metadata caching. If the average object
size is sufficiently large, we need only be concerned about data
caching effects. In order to avoid data caching effects
during a large-object read test, the Linux buffer cache on all
servers must be cleared. In part this is done using the command
"echo 1 &gt; /proc/sys/vm/drop_caches" on all hosts.
However, gluster has its own internal caches. To
evict all prior data from the cache, the simplest method is to
just use iozone to write a large amount of data into some files
in the gluster filesystem, then delete them. For example,
if the gluster 3.2 server caches 1 GB of data then the amount of
data written should be roughly 2 GB/server and the number of
files used should be roughly 8 times the number of servers.
Use of many separate files ensures that this cache eviction data
is spread across all servers approximately equally.</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in; border: none; padding: 0in"><A NAME="__RefHeading__137_1677170542"></A><A NAME="__RefHeading__240_244684570"></A>
<FONT SIZE=4 STYLE="font-size: 16pt">Use of --pause in
multi-thread tests</FONT></H2>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">In some
filesystems, the first thread that starts running will be
operating at memory speed (example: NFS writes) and can easily
finish before other threads have a chance to get started.
This immediately invalidates the test. To make this less
likely, it is possible to insert a per-file delay into each
thread with the --pause option so that the other threads have a
chance to participate in the test during the measurement
interval. It is preferable to run a longer test
instead, because in some cases you might otherwise restrict
throughput unintentionally. But if you know that your
throughput upper bound is X files/sec and you have N threads
running, then your per-thread throughput should be no more than
X/N files/sec, meaning each thread needs at least N/X seconds per
file; a reasonable pause would be something like 3N/X seconds.
For example, if you know that you cannot do better than 100000
files/sec and you have 20 threads running, try a
3 &times; 20/100000 sec = 600 microsecond pause. Verify
that this isn't affecting throughput by reducing the pause and
running a longer test.</P>
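<P STYLE="margin-bottom: 0in; border: none; padding: 0in">The rule
of thumb above can be written as a one-line helper (the function
name is ours, for illustration only):</P>

```python
# Sketch of the rule of thumb above: with an estimated aggregate
# throughput ceiling X (files/sec) and N threads, a per-file pause
# of roughly 3N/X seconds keeps the fastest thread busy long enough
# for the other threads to join the measurement interval.
def suggested_pause_microsec(max_files_per_sec, num_threads):
    return 3.0 * num_threads / max_files_per_sec * 1_000_000
```

<P STYLE="margin-bottom: 0in; border: none; padding: 0in">For the
example above, suggested_pause_microsec(100000, 20) gives 600
microseconds.</P>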
<H2 STYLE="margin-bottom: 0.2in; font-weight: normal"><A NAME="__RefHeading__139_1677170542"></A>
<FONT SIZE=4>How to measure asynchronous file copy performance</FONT></H2>
<P STYLE="margin-bottom: 0in; font-weight: normal; line-height: 100%">
<FONT FACE="Liberation Serif, serif"><FONT SIZE=3>When we want to
measure performance of an asynchronous file copy (example:
Gluster geo-replication), we can use smallfile to create the
original directory tree, but then we can use the new await-create
operation type to wait for files to appear at the file copy
destination. To do this, we need to specify a separate network
sync directory. So for example, to create the original directory
tree, we could use a command like:</FONT></FONT></P>
<P STYLE="margin-bottom: 0in; line-height: 100%"><BR>
</P>
<P STYLE="margin-bottom: 0in; line-height: 100%"><FONT FACE="Courier 10 Pitch"><FONT SIZE=3><SPAN STYLE="font-weight: normal">./smallfile_cli.py
--top /mnt/glusterfs-master/smf \</SPAN></FONT></FONT></P>
<P STYLE="margin-bottom: 0in; line-height: 100%"><FONT FACE="Courier 10 Pitch"><FONT SIZE=3><SPAN STYLE="font-weight: normal">--threads
16 --files 2000 --file-size 1024 \</SPAN></FONT></FONT></P>
<P STYLE="margin-bottom: 0in; font-weight: normal; line-height: 100%">
<FONT FACE="Courier 10 Pitch"><FONT SIZE=3>--operation create
--incompressible Y --record-ctime-size Y --response-times Y</FONT></FONT></P>
<P STYLE="margin-bottom: 0in; line-height: 100%"><BR>
</P>
<P STYLE="margin-bottom: 0in; line-height: 100%"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3><SPAN STYLE="font-weight: normal">Suppose
that this mountpoint is connected to a Gluster “master”
volume which is being geo-replicated to a “slave” volume in a
remote site asynchronously. We can measure the performance of
this process using a command like this, where
/mnt/glusterfs-slave is a read-only mountpoint accessing the
slave volume.</SPAN></FONT></FONT></P>
<P STYLE="margin-bottom: 0in; line-height: 100%"><BR>
</P>
<P STYLE="margin-bottom: 0in; font-weight: normal; line-height: 100%">
<FONT FACE="Courier 10 Pitch"><FONT SIZE=3>./smallfile_cli.py
--top /mnt/glusterfs-slave/smf \</FONT></FONT></P>
<P STYLE="margin-bottom: 0in; font-weight: normal; line-height: 100%">
<FONT FACE="Courier 10 Pitch"><FONT SIZE=3>--threads 16 --files
2000 --file-size 1024 \</FONT></FONT></P>
<P STYLE="margin-bottom: 0in; font-weight: normal; line-height: 100%">
<FONT FACE="Courier 10 Pitch"><FONT SIZE=3>--operation
await-create --incompressible Y --response-times Y \</FONT></FONT></P>
<P STYLE="margin-bottom: 0in; font-weight: normal; line-height: 100%">
<FONT FACE="Courier 10 Pitch"><FONT SIZE=3>--network-sync-dir
/tmp/other</FONT></FONT></P>
<P STYLE="margin-bottom: 0in; line-height: 100%"><BR>
</P>
<P STYLE="margin-bottom: 0in; font-weight: normal; line-height: 100%">
<FONT FACE="Liberation Serif, serif"><FONT SIZE=3>Requirements:</FONT></FONT></P>
<UL>
<LI><P STYLE="margin-bottom: 0in; font-weight: normal; line-height: 100%">
<FONT FACE="Liberation Serif, serif"><FONT SIZE=3>The parameters
controlling file sizes, directory tree, and number of files must
match in the two commands.</FONT></FONT></P>
<LI><P STYLE="margin-bottom: 0in; font-weight: normal; line-height: 100%">
<FONT FACE="Liberation Serif, serif"><FONT SIZE=3>The
--<B>incompressible</B> option must be set if you want to avoid the
situation where the async copy software compresses the data and
thereby appears to exceed the network bandwidth.</FONT></FONT></P>
<LI><P STYLE="margin-bottom: 0in; line-height: 100%"><FONT FACE="Liberation Serif, serif"><FONT SIZE=3><SPAN STYLE="font-weight: normal">The
first command must use the </SPAN>--<B>record-ctime-size Y</B>
<SPAN STYLE="font-weight: normal">option so that the
await-create operation knows when the original file was created
and how big it was. </SPAN></FONT></FONT>
</P>
</UL>
<P STYLE="margin-bottom: 0in; line-height: 100%"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in; line-height: 100%">
<FONT FACE="Liberation Serif, serif"><FONT SIZE=3><SPAN STYLE="font-weight: normal">How
does this work? The first command records information in a
user-defined xattr for each file so that the second command's
</SPAN><B>await-create</B> <SPAN STYLE="font-weight: normal">operation
can calculate the time required to copy the file (recorded
as a “response time”) and can confirm that the entire
file reached the destination.</SPAN></FONT></FONT></P>
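<P STYLE="margin-bottom: 0in; border: none; padding: 0in; line-height: 100%">
A rough model of this handshake is sketched below. smallfile keeps
the record in a user-defined xattr; for portability this sketch
stores it in a sidecar file instead, and the ".meta" suffix and
function names are made up for illustration.</P>

```python
import os
import time

# Rough model of the await-create handshake described above.
# smallfile records the data in a user-defined xattr; this sketch
# uses a sidecar file so it runs on any filesystem (the ".meta"
# suffix and the function names are illustrative, not smallfile's).
def record_ctime_size(path, expected_size):
    # writer side: remember when the file was created and how big it is
    with open(path + ".meta", "w") as f:
        f.write("%f,%d" % (time.time(), expected_size))

def await_create(path, poll_interval=0.1):
    # reader side: poll until the whole file has arrived, then report
    # elapsed time since creation as the copy "response time"
    while True:
        with open(path + ".meta") as f:
            ctime_s, size_s = f.read().split(",")
        if os.path.exists(path) and os.path.getsize(path) == int(size_s):
            return time.time() - float(ctime_s)
        time.sleep(poll_interval)
```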
<P STYLE="margin-bottom: 0in; border: none; padding: 0in; line-height: 100%">
<BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in; line-height: 100%">
<FONT FACE="Liberation Serif, serif"><FONT SIZE=3><SPAN STYLE="font-weight: normal">WARNING:
the --verify-read option is not supported with the --await-create
operation, so smallfile is not yet able to verify that the
contents of the files are correct, only that the file size is
correct.</SPAN></FONT></FONT></P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in; line-height: 100%">
<BR>
</P>
<H1 CLASS="western" STYLE="margin-top: 0in; margin-bottom: 0.2in">
Results</H1>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">All
tests display a "files/sec" result. If the test
performs reads or writes, then a "MB/sec" data transfer
rate and an "IOPS" result (i.e. total read or write
calls/sec) are also displayed. Each thread participating in
the test keeps track of total number of files and I/O requests
that it processes during the test measurement interval.
These results are rolled up per host if it is a single-host
test. For a multi-host test, the per-thread results for
each host are saved in a file within the --top directory, and the
test master then reads in all of the saved results from its
slaves to compute the aggregate result across all client hosts.
The percentage of requested files which were processed in the
measurement interval is also displayed, and if the number is
lower than a threshold (default 70%) then an error is raised.</P>
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in; border: none; padding: 0in"><A NAME="__RefHeading__141_1677170542"></A><A NAME="__RefHeading__122_244684570"></A>
Response time collection</H2>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">Response
times for operations on each file are saved per thread in .csv
form. For example, you can turn these into an X-Y
scatterplot to see how response time varies over
time:<BR> </P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><STRONG><FONT SIZE=4>#
python smallfile_cli.py --response-times Y</FONT></STRONG><BR><STRONG><FONT SIZE=4>#
ls -ltr /var/tmp/rsptimes*.csv</FONT></STRONG></P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">You
should see one .csv file per thread. These files are in a format
that can be loaded into any spreadsheet application, such as
Excel, and graphed. An X-Y scatterplot can be useful for seeing
how response time changes over the course of the test.</P>
<H1 CLASS="western" STYLE="margin-top: 0in; margin-bottom: 0.2in">
Comparable Benchmarks</H1>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">There
are many existing performance test benchmarks, and I have tried
just about all the ones that I've heard of. Here are the ones I
have looked at; I'm sure there are many more that I failed to
include here.</P>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>Bonnie++</B>
-- works well for a single host, but you cannot generate load
from multiple hosts because the benchmark will not synchronize
its activities, so different phases of the benchmark will be
running at the same time, whether you want them to or not.
</P>
</UL>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>iozone</B>
-- this is a great tool for large-file testing, but it can only
do 1 file/thread in its current form.
</P>
</UL>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>postmark</B>
-- works fine for a single client, not as useful for
multi-client tests
</P>
</UL>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>grinder</B>
-- has not to date been useful for filesystem testing, though it
works well for web services testing.
</P>
</UL>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>JMeter</B>
– has been used successfully by others in the past.
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>fs_mark</B>
-- Ric Wheeler's filesystem benchmark, is very good at creating
files
</P>
</UL>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>fio</B>
-- Linux test tool -- broader coverage of Linux system calls
particularly around async. and direct I/O</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>diskperf</B>
– open-source tool that generates limited small-file workloads
for a single host.</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>dbench</B>
– developed by samba team</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in"><B>SPECsfs</B>
– not open-source, but <B>netmist</B> workload generator is
another distributed workload generator (configured similarly to
iozone) but with a wider range of workloads.
</P>
</UL>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"> </P>
<H1 CLASS="western" STYLE="margin-top: 0in; margin-bottom: 0.2in">
Design principles</H1>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">A
cluster-aware test tool ideally should:</P>
<UL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">start
threads on all hosts at same time
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">stop
measurement of throughput for all threads at the same time
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">be
easy to use in all file system environments
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">be
highly portable and be trivial to install
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">have
very low overhead
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">not
require threads to synchronize (be embarrassingly parallel)
</P>
</UL>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">Although
there may be some useful tests that involve thread
synchronization or contention, we don't want the tool to
<I>force</I> thread synchronization or contention for resources.
In order to run prolonged small-file tests (which is a
requirement for scalability to very large clusters), each thread
has to be able to use more than one directory. Since
some filesystems perform very differently as the files/directory
ratio increases, and most applications and users do not rely on
having huge file/directory ratios, this is also important for
testing the filesystem with a realistic use case. This
benchmark does something similar to Ric Wheeler's fs_mark
benchmark with multiple directory levels. It
imposes no hard limit on how many directories can be
used or how deep the directory tree can go. Instead, it
creates directories according to these constraints:</P>
<OL>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">files
(and directories) are placed as close to the root of the
directory hierarchy as possible
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">no
directory contains more than the number of files specified in
the --files-per-dir test parameter
</P>
<LI><P STYLE="margin-bottom: 0in; border: none; padding: 0in">no
directory contains more than the number of subdirectories
specified in the --dirs-per-dir test parameter
</P>
</OL>
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in"><BR><BR>
</H2>
<H2 STYLE="margin-top: 0in; margin-bottom: 0.2in"><A NAME="__RefHeading__143_1677170542"></A><A NAME="__RefHeading__124_244684570"></A><A NAME="Synchronization"></A>
<FONT SIZE=4 STYLE="font-size: 16pt">Synchronization</FONT></H2>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">A
single directory is used to synchronize the threads and hosts.
This may seem problematic, but we assume here that the file
system is not very busy when the test is run (otherwise why would
you run a load test on it?), so a file created by one
thread will quickly become visible to the others.
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">If it's
a single-host test, any directory is sharable amongst threads,
but in a multi-host test only a directory shared by all
participating hosts can be used. If the --<B>top</B> test
directory is in a network-accessible file system (NFS or
Gluster, for example), then the synchronization directory
defaults to the network_shared subdirectory and need
not be specified. If the --<B>top</B> directory is in a
host-local filesystem, then the --<B>network-sync-dir</B> option
must be used to specify the synchronization directory. When a
network directory is used, change propagation between hosts
cannot be assumed to occur in under two seconds.
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">We use
the concept of a "starting gate" -- each thread does
all preparation for the test, then waits for a special file, the
"starting gate", to appear in the shared area. When a
thread arrives at the starting gate, it announces its arrival by
creating a filename with the host and thread ID embedded in it.
When all threads have arrived, the controlling process will see
all the expected "thread ready" files, and will then
create the <B>starting gate</B> file. When the starting gate is
seen, the thread pauses for a couple of seconds, then commences
generating workload. This initial pause reduces time required for
all threads to see the starting gate, thereby minimizing chance
of some threads being unable to start on time. Synchronous thread
startup reduces the "warmup time" of the system
significantly.</P>
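<P STYLE="margin-bottom: 0in; border: none; padding: 0in">The
starting-gate protocol above can be sketched as follows. The file
names and function names here are illustrative, not smallfile's
actual ones.</P>

```python
import glob
import os
import time

# Sketch of the starting-gate protocol described above; file names
# are illustrative, not smallfile's actual ones.
def worker_arrive(sync_dir, host, thread_id):
    # announce arrival with a per-thread "ready" file
    name = "ready.%s.%02d" % (host, thread_id)
    open(os.path.join(sync_dir, name), "w").close()

def master_release(sync_dir, expected_threads):
    # wait until every thread has checked in, then open the gate
    while len(glob.glob(os.path.join(sync_dir, "ready.*"))) < expected_threads:
        time.sleep(0.1)
    open(os.path.join(sync_dir, "starting_gate"), "w").close()

def worker_wait_for_gate(sync_dir, settle_seconds=2):
    # spin until the gate appears, then pause briefly so slower
    # threads also see the gate before any workload starts
    while not os.path.exists(os.path.join(sync_dir, "starting_gate")):
        time.sleep(0.1)
    time.sleep(settle_seconds)
```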
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">We also
need a checkered flag (borrowing from the car racing metaphor).
Once the test starts, each thread looks for a <B>stonewall</B>
file in the synchronization directory. If this file exists, the
thread stops measuring throughput at that point (though by
default it continues to perform the requested number of
operations). Consequently, throughput measurements for each
thread may be added to obtain an accurate aggregate throughput
number. This practice is sometimes called "stonewalling" in the
performance testing world.</P>
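<P STYLE="margin-bottom: 0in; border: none; padding: 0in">A
minimal sketch of stonewalling follows; the function names and the
structure of the loop are our own illustration, not smallfile's
actual code.</P>

```python
import os

# Sketch of stonewalling as described above (names illustrative):
# the first thread to finish raises the stonewall; a thread that
# sees it stops *counting* files but keeps doing the work so it
# does not distort the load seen by the remaining threads.
def run_thread(sync_dir, do_one_file, total_files):
    stonewall = os.path.join(sync_dir, "stonewall")
    measured = 0
    measuring = True
    for _ in range(total_files):
        do_one_file()
        if measuring and os.path.exists(stonewall):
            measuring = False            # stop measuring, keep running
        elif measuring:
            measured += 1
    open(stonewall, "w").close()         # finished: signal the others
    return measured
```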
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">Synchronization
operations <I>in theory</I> do not require the worker threads to
read the synchronization directory. For distributed tests, the
test driver host has to check whether the various per-host
synchronization files exist, but this does not require a readdir
operation. The test driver does this check in such a way that the
number of file lookups is only slightly more than the number of
hosts, and this does not require reading the entire directory,
only doing a set of lookup operations on individual files, so
it's <I>O(n)</I> scalable as well.</P>
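<P STYLE="margin-bottom: 0in; border: none; padding: 0in">The
per-host check can be sketched like this; the file-name pattern
and function name are illustrative, not smallfile's actual ones.</P>

```python
import os

# Sketch of the O(n) check described above: one existence lookup
# per expected host file, with no directory listing (the file-name
# pattern is illustrative, not smallfile's actual one).
def all_hosts_ready(sync_dir, hosts, phase):
    for host in hosts:
        token = os.path.join(sync_dir, "%s.%s.ready" % (phase, host))
        if not os.path.exists(token):    # a lookup, not a readdir
            return False
    return True
```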
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">The bad
news is that some filesystems do not synchronize directories
quickly without an explicit readdir() operation, so we are at
present doing os.listdir() as a workaround -- this may have to be
revisited for very large tests.</P>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in"><BR>
</P>
<H3 CLASS="western" STYLE="font-weight: normal"><A NAME="__RefHeading__145_1677170542"></A><A NAME="__RefHeading__126_244684570"></A>
How test parameters are transmitted to worker threads</H3>
<P STYLE="margin-bottom: 0in; border: none; padding: 0in">The
results of the command line parse are saved in a <B>smf_test_params</B>
object and stored in a python pickle file, which is a
representation independent of CPU architecture or operating
system. The file is placed in the shared network directory.
Remote worker processes are invoked via the <B>smallfile_remote.py</B>