<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge" >
<title>birdben</title>
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
<meta property="og:type" content="website">
<meta property="og:title" content="birdben">
<meta property="og:url" content="https://github.com/birdben/index.html">
<meta property="og:site_name" content="birdben">
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="birdben">
<link rel="alternate" href="/atom.xml" title="birdben" type="application/atom+xml">
<link rel="icon" href="/images/favicon.ico">
<link rel="stylesheet" href="/css/style.css">
<script type="text/javascript">
var cnzz_protocol = (("https:" == document.location.protocol) ? " https://" : " http://");document.write(unescape("%3Cspan id='cnzz_stat_icon_1260188951'%3E%3C/span%3E%3Cscript src='" + cnzz_protocol + "s4.cnzz.com/z_stat.php%3Fid%3D1260188951' type='text/javascript'%3E%3C/script%3E"));
</script>
</head>
<body>
<div id="container">
<div class="left-col">
<div class="overlay"></div>
<div class="intrude-less">
<header id="header" class="inner">
<a href="/" class="profilepic">
<img lazy-src="/images/logo.png" class="js-avatar">
</a>
<hgroup>
<h1 class="header-author"><a href="/">birdben</a></h1>
</hgroup>
<div class="switch-btn">
<div class="icon">
<div class="icon-ctn">
<div class="icon-wrap icon-house" data-idx="0">
<div class="birdhouse"></div>
<div class="birdhouse_holes"></div>
</div>
<div class="icon-wrap icon-ribbon hide" data-idx="1">
<div class="ribbon"></div>
</div>
<div class="icon-wrap icon-link hide" data-idx="2">
<div class="loopback_l"></div>
<div class="loopback_r"></div>
</div>
</div>
</div>
<div class="tips-box hide">
<div class="tips-arrow"></div>
<ul class="tips-inner">
<li>Menu</li>
<li>Tags</li>
<li>Links</li>
</ul>
</div>
</div>
<div class="switch-area">
<div class="switch-wrap">
<section class="switch-part switch-part1">
<nav class="header-menu">
<ul>
<li><a href="/">Home</a></li>
<li><a href="/archives">All Posts</a></li>
</ul>
</nav>
<nav class="header-nav">
<div class="social">
<a class="github" target="_blank" href="https://github.com/birdben" title="github">github</a>
<a class="weibo" target="_blank" href="#" title="weibo">weibo</a>
</div>
</nav>
</section>
<section class="switch-part switch-part2">
<div class="widget tagcloud" id="js-tagcloud">
<a href="/tags/AWK/" style="font-size: 10.83px;">AWK</a> <a href="/tags/Akka/" style="font-size: 10.83px;">Akka</a> <a href="/tags/Dockerfile/" style="font-size: 20px;">Dockerfile</a> <a href="/tags/Docker命令/" style="font-size: 19.17px;">Docker命令</a> <a href="/tags/Docker环境/" style="font-size: 15px;">Docker环境</a> <a href="/tags/ELK/" style="font-size: 16.67px;">ELK</a> <a href="/tags/ElasticSearch/" style="font-size: 10.83px;">ElasticSearch</a> <a href="/tags/Elasticsearch/" style="font-size: 12.5px;">Elasticsearch</a> <a href="/tags/Flume/" style="font-size: 17.5px;">Flume</a> <a href="/tags/Git命令/" style="font-size: 13.33px;">Git命令</a> <a href="/tags/Go/" style="font-size: 14.17px;">Go</a> <a href="/tags/HBase/" style="font-size: 10px;">HBase</a> <a href="/tags/HDFS/" style="font-size: 18.33px;">HDFS</a> <a href="/tags/Hadoop/" style="font-size: 10px;">Hadoop</a> <a href="/tags/Hadoop原理架构体系/" style="font-size: 13.33px;">Hadoop原理架构体系</a> <a href="/tags/Hive/" style="font-size: 16.67px;">Hive</a> <a href="/tags/JVM/" style="font-size: 11.67px;">JVM</a> <a href="/tags/Java-Web,Socket,Python/" style="font-size: 10px;">Java Web,Socket,Python</a> <a href="/tags/Jenkins环境/" style="font-size: 10px;">Jenkins环境</a> <a href="/tags/Kafka/" style="font-size: 15.83px;">Kafka</a> <a href="/tags/Kibana/" style="font-size: 14.17px;">Kibana</a> <a href="/tags/Linux命令/" style="font-size: 12.5px;">Linux命令</a> <a href="/tags/Logstash/" style="font-size: 15.83px;">Logstash</a> <a href="/tags/Mac/" style="font-size: 10px;">Mac</a> <a href="/tags/MapReduce/" style="font-size: 11.67px;">MapReduce</a> <a href="/tags/Maven配置/" style="font-size: 11.67px;">Maven配置</a> <a href="/tags/MongoDB/" style="font-size: 11.67px;">MongoDB</a> <a href="/tags/MySQL/" style="font-size: 10px;">MySQL</a> <a href="/tags/Nginx/" style="font-size: 10px;">Nginx</a> <a href="/tags/Redis/" style="font-size: 10px;">Redis</a> <a href="/tags/Shadowsocks/" style="font-size: 10px;">Shadowsocks</a> <a 
href="/tags/Shell/" style="font-size: 16.67px;">Shell</a> <a href="/tags/Spring/" style="font-size: 10.83px;">Spring</a> <a href="/tags/Storm/" style="font-size: 12.5px;">Storm</a> <a href="/tags/Zookeeper/" style="font-size: 12.5px;">Zookeeper</a> <a href="/tags/其他/" style="font-size: 10px;">其他</a>
</div>
</section>
<section class="switch-part switch-part3">
<div id="js-friends">
<a target="_blank" class="main-nav-link switch-friends-link" href="http://blog.csdn.net/birdben">My CSDN blog</a>
</div>
</section>
</div>
</div>
</header>
</div>
</div>
<div class="mid-col">
<nav id="mobile-nav">
<div class="overlay">
<div class="slider-trigger"></div>
<h1 class="header-author js-mobile-header hide">birdben</h1>
</div>
<div class="intrude-less">
<header id="header" class="inner">
<div class="profilepic">
<img lazy-src="/images/logo.png" class="js-avatar">
</div>
<hgroup>
<h1 class="header-author">birdben</h1>
</hgroup>
<nav class="header-menu">
<ul>
<li><a href="/">Home</a></li>
<li><a href="/archives">All Posts</a></li>
<div class="clearfix"></div>
</ul>
</nav>
<nav class="header-nav">
<div class="social">
<a class="github" target="_blank" href="https://github.com/birdben" title="github">github</a>
<a class="weibo" target="_blank" href="#" title="weibo">weibo</a>
</div>
</nav>
</header>
</div>
</nav>
<div class="body-wrap">
<article id="post-Docker/Docker实战(三十)Dockerfile最佳实践总结" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="/2017/05/07/Docker/Docker实战(三十)Dockerfile最佳实践总结/" class="article-date">
<time datetime="2017-05-07T06:52:44.000Z" itemprop="datePublished">2017-05-07</time>
</a>
</div>
<div class="article-inner">
<input type="hidden" class="isFancy" />
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="/2017/05/07/Docker/Docker实战(三十)Dockerfile最佳实践总结/">Docker in Practice (30): A Summary of Dockerfile Best Practices</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
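<p>While refactoring my Docker images, I drew on many of the Dockerfile tips and recommendations available online. This post is mainly a translation of the official recommendations for writing Dockerfiles, followed by a summary of other tips collected from around the web.</p>
<h3 id="一般准则和建议"><a href="#一般准则和建议" class="headerlink" title="一般准则和建议"></a>General guidelines and recommendations</h3><h4 id="容器应该是”短暂的”"><a href="#容器应该是”短暂的”" class="headerlink" title="容器应该是”短暂的”"></a>Containers should be "ephemeral"</h4><p>The container produced by the image your Dockerfile defines should be as ephemeral as possible. By "ephemeral" we mean that it can be stopped and destroyed, and a new one built and put in place with an absolute minimum of setup and configuration.</p>
<h4 id="使用-dockerignore"><a href="#使用-dockerignore" class="headerlink" title="使用.dockerignore"></a>Use .dockerignore</h4><p>In most cases it is best to put each Dockerfile in an empty directory and then add only the files needed to build it. To improve build performance, you can also exclude files and directories by adding a .dockerignore file to that directory.</p>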
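<p>As a minimal sketch, a .dockerignore for a typical project might look like the following. The entries are hypothetical examples rather than anything from the official guide; list whatever your own build does not need.</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line"># Version-control metadata is rarely needed inside the image</div><div class="line">.git</div><div class="line"># Local dependency and build output directories (hypothetical names)</div><div class="line">node_modules</div><div class="line">build</div><div class="line"># Log and editor swap files in the context root</div><div class="line">*.log</div><div class="line">*.swp</div></pre></td></tr></table></figure>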
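<h4 id="避免安装不必要的安装包"><a href="#避免安装不必要的安装包" class="headerlink" title="避免安装不必要的安装包"></a>Avoid installing unnecessary packages</h4><p>To reduce complexity, dependencies, file sizes, and build times, avoid installing extra or unnecessary packages.</p>
<h4 id="每个容器应该只有一个进程-“one-process-per-container”"><a href="#每个容器应该只有一个进程-“one-process-per-container”" class="headerlink" title="每个容器应该只有一个进程 “one process per container”"></a>Each container should run only one process ("one process per container")</h4><p>Decoupling an application into multiple containers makes it much easier to scale horizontally and to reuse containers. For example, a web application stack might consist of three separate containers, each with its own image, managing the web application, the database, and an in-memory cache in a decoupled manner.</p>
<p>If containers depend on each other, use Docker container networks to let them communicate.</p>
<h4 id="最小化镜像的层数"><a href="#最小化镜像的层数" class="headerlink" title="最小化镜像的层数"></a>Minimize the number of layers</h4><p>Strike a balance between the readability of the Dockerfile and keeping the number of layers low. Think carefully before introducing a new layer.</p>
<h4 id="排序多行参数"><a href="#排序多行参数" class="headerlink" title="排序多行参数"></a>Sort multi-line arguments</h4><p>Whenever possible, sort multi-line arguments alphanumerically, for example a list of packages to install. This helps you avoid duplicate packages and makes the list easier to update. It also makes pull requests easier to read and review. Adding a space before the backslash (\) helps as well.</p>
<h4 id="构建缓存"><a href="#构建缓存" class="headerlink" title="构建缓存"></a>Build cache</h4><p>While building an image, Docker steps through the instructions in your Dockerfile in the order specified. As each instruction is examined, Docker looks in its cache for an existing image it can reuse rather than creating a new (duplicate) image. If you do not want to use the cache at all, pass the --no-cache=true option to docker build.</p>
<p>If you do let Docker use its cache, however, it is important to understand when it will and will not find a matching image. The basic rules Docker follows are:</p>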
<ul>
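<li><p>Starting from a base image that is already in the cache, the next instruction is compared against all child images derived from that base image to see whether one of them was built with exactly the same instruction. If not, the cache is invalidated.</p>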
</li>
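<li><p>In most cases, simply comparing the instruction in the Dockerfile with one of the child images is sufficient. Some instructions, however, require more examination and explanation.</p>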
</li>
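<li><p>For the ADD and COPY instructions, the contents of the files in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the files are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the files, such as the contents or metadata, the cache is invalidated.</p>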
</li>
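<li><p>Aside from the ADD and COPY commands, cache checking does not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command, the files updated in the container are not examined to determine whether a cache hit exists. In that case, only the command string itself is used to find a match.</p>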
</li>
</ul>
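<p>Once the cache is invalidated, all subsequent Dockerfile commands generate new images and the cache is not used.</p>
<h3 id="Dockerfile的一些建议"><a href="#Dockerfile的一些建议" class="headerlink" title="Dockerfile的一些建议"></a>Recommendations for Dockerfile instructions</h3><h4 id="FROM"><a href="#FROM" class="headerlink" title="FROM"></a>FROM</h4><p>Whenever possible, use current official repositories as the basis for your image. The official guide recommends the Debian image because it is very tightly controlled and kept minimal (about 150 MB at the time of writing) while still being a full distribution.</p>
<h4 id="LABEL"><a href="#LABEL" class="headerlink" title="LABEL"></a>LABEL</h4><p>Adding labels to an image helps organize image information by project. I have not used this myself yet.</p>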
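<p>For illustration only, labels could be attached as in the sketch below. The keys and values are made up for this example. Images labeled this way can then be filtered, for instance with docker images --filter "label=vendor=ACME Incorporated".</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line"># Hypothetical label keys and values; a single LABEL instruction adds a single layer</div><div class="line">LABEL vendor="ACME Incorporated" \</div><div class="line">      com.example.release-date="2017-05-07" \</div><div class="line">      com.example.version="0.0.1-beta"</div></pre></td></tr></table></figure>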
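<h4 id="RUN"><a href="#RUN" class="headerlink" title="RUN"></a>RUN</h4><p>Split long or complex RUN statements across multiple lines separated by backslashes.</p>
<p>Avoid RUN apt-get upgrade or dist-upgrade, because many of the "essential" packages from the base image cannot be upgraded inside an unprivileged container. If a specific package, foo, needs updating, use apt-get install -y foo to update it.</p>
<p>Always combine RUN apt-get update with apt-get install in the same RUN statement. For example:</p>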
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">RUN apt-get update && apt-get install -y \</div><div class="line"> package-bar \</div><div class="line"> package-baz \</div><div class="line"> package-foo</div></pre></td></tr></table></figure>
<p>Using apt-get update alone in a RUN statement causes caching issues, and subsequent apt-get install instructions can fail. For example, say you have this Dockerfile:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">FROM ubuntu:14.04</div><div class="line">RUN apt-get update</div><div class="line">RUN apt-get install -y curl</div></pre></td></tr></table></figure>
<p>After the image is built, all layers are in the Docker cache. Suppose you later modify apt-get install by adding an extra package:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">FROM ubuntu:14.04</div><div class="line">RUN apt-get update</div><div class="line">RUN apt-get install -y curl nginx</div></pre></td></tr></table></figure>
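<p>Docker sees the RUN apt-get update instruction as identical in both versions and reuses the cache from the earlier build. As a result, apt-get update is not executed, because the build uses the cached layer. Because apt-get update did not run, your build may install outdated versions of the curl and nginx packages.</p>
<p>Using RUN apt-get update && apt-get install -y ensures that your Dockerfile installs the latest package versions with no further coding or manual intervention. This technique is known as "cache busting". You can also achieve cache busting by specifying a package version. This is known as version pinning, for example:</p>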
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">RUN apt-get update && apt-get install -y \</div><div class="line"> package-bar \</div><div class="line"> package-baz \</div><div class="line"> package-foo=1.3.*</div></pre></td></tr></table></figure>
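<p>Version pinning forces the build to retrieve a particular version regardless of what is in the cache. This technique can also reduce failures caused by unanticipated changes in required packages.</p>
<p>If the image previously used an older version, specifying the new one busts the cache for apt-get update and ensures the new version is installed. Listing packages one per line also prevents mistakes such as duplicated packages.</p>
<p>Below is a complete RUN instruction that demonstrates all of the apt-get recommendations.</p>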
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div></pre></td><td class="code"><pre><div class="line">RUN apt-get update && apt-get install -y \</div><div class="line"> aufs-tools \</div><div class="line"> automake \</div><div class="line"> build-essential \</div><div class="line"> curl \</div><div class="line"> dpkg-sig \</div><div class="line"> libcap-dev \</div><div class="line"> libsqlite3-dev \</div><div class="line"> mercurial \</div><div class="line"> reprepro \</div><div class="line"> ruby1.9.1 \</div><div class="line"> ruby1.9.1-dev \</div><div class="line"> s3cmd=1.1.* \</div><div class="line"> && rm -rf /var/lib/apt/lists/*</div></pre></td></tr></table></figure>
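<p>In addition, cleaning up the apt cache by removing /var/lib/apt/lists reduces the image size, because the apt cache is then not stored in a layer. Since the RUN statement starts with apt-get update, the package cache is always refreshed before apt-get install.</p>
<p>Note: the official Debian and Ubuntu images automatically run apt-get clean, so an explicit call is not required.</p>
<h4 id="使用管道pipes"><a href="#使用管道pipes" class="headerlink" title="使用管道pipes"></a>Using pipes</h4><p>Some RUN commands depend on the ability to pipe the output of one command into another using the pipe character (|), as in the following example:</p>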
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">RUN wget -O - https://some.site | wc -l > /number</div></pre></td></tr></table></figure>
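<p>Docker executes these commands with the /bin/sh -c interpreter, which only evaluates the exit code of the last operation in the pipe to determine success. In the example above, the build step succeeds and produces a new image as long as the wc -l command succeeds, even if the wget command fails.</p>
<p>If you want the command to fail when any stage of the pipe fails, prepend set -o pipefail && so that an unexpected error prevents the build from inadvertently succeeding. For example:</p>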
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">RUN set -o pipefail && wget -O - https://some.site | wc -l > /number</div></pre></td></tr></table></figure>
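<p>Note: not all shells support the -o pipefail option. In such cases (for example the dash shell, which is the default shell on Debian-based images), consider using the exec form of RUN to explicitly choose a shell that does support pipefail. For example:</p>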
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">RUN ["/bin/bash", "-c", "set -o pipefail && wget -O - https://some.site | wc -l > /number"]</div></pre></td></tr></table></figure>
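<h4 id="CMD"><a href="#CMD" class="headerlink" title="CMD"></a>CMD</h4><p>The CMD instruction should be used to run the software contained in your image, along with any arguments. CMD should almost always be used in the form CMD ["executable", "param1", "param2"…]. Thus, if the image is for a service such as Apache or Rails, you would run something like CMD ["apache2","-DFOREGROUND"]. Indeed, this form of the instruction is recommended for any service-based image.</p>
<p>In most other cases, CMD should be given an interactive shell such as bash, python, or perl. For example, CMD ["perl", "-de0"], CMD ["python"], or CMD ["php", "-a"]. Using this form means that when you execute something like docker run -it python, you are dropped into a usable shell. CMD should rarely be used in the form CMD ["param", "param"] in conjunction with ENTRYPOINT, unless you and your users are already quite familiar with how ENTRYPOINT works.</p>
<h4 id="EXPOSE"><a href="#EXPOSE" class="headerlink" title="EXPOSE"></a>EXPOSE</h4><p>The EXPOSE instruction indicates the ports on which a container listens for connections. Consequently, you should use the common, traditional port for your application. For example, an image containing the Apache web server would use EXPOSE 80, while an image containing MongoDB would use EXPOSE 27017, and so on.</p>
<p>For external access, your users can execute docker run with a flag indicating how to map the specified port to the port of their choice. For container linking, Docker provides environment variables (for example MYSQL_PORT_3306_TCP) for the path from the recipient container back to the source.</p>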
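<p>A minimal sketch of how the two halves fit together, assuming a Debian-based image that installs Apache from the distribution packages. The base image tag, the image name my-apache, and host port 8080 are illustrative choices, not part of the official guide.</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line">FROM debian:stable-slim</div><div class="line">RUN apt-get update && apt-get install -y apache2 \</div><div class="line">    && rm -rf /var/lib/apt/lists/*</div><div class="line"># Document the conventional HTTP port the service listens on</div><div class="line">EXPOSE 80</div><div class="line">CMD ["apache2ctl", "-D", "FOREGROUND"]</div><div class="line"></div><div class="line"># The user chooses the host port at run time, for example:</div><div class="line">#   docker run -d -p 8080:80 my-apache</div></pre></td></tr></table></figure>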
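<h4 id="ENV"><a href="#ENV" class="headerlink" title="ENV"></a>ENV</h4><p>To make new software easier to run, you can use ENV to update the PATH environment variable for the software your container installs. For example, ENV PATH /usr/local/nginx/bin:$PATH ensures that CMD ["nginx"] just works.</p>
<p>The ENV instruction is also useful for providing required environment variables specific to the services you want to containerize, such as PGDATA for Postgres.</p>
<p>Lastly, ENV can also be used to set commonly used version numbers so that version bumps are easier to maintain, as in the following example:</p>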
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">ENV PG_MAJOR 9.3</div><div class="line">ENV PG_VERSION 9.3.4</div><div class="line">RUN curl -SL http://example.com/postgres-$PG_VERSION.tar.xz | tar -xJC /usr/src/postgress && ...</div><div class="line">ENV PATH /usr/local/postgres-$PG_MAJOR/bin:$PATH</div></pre></td></tr></table></figure>
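<p>Similar to having constants in a program (as opposed to hard-coding values), this approach lets you change a single ENV instruction to bump the version of the software in your container.</p>
<h4 id="ADD-or-COPY"><a href="#ADD-or-COPY" class="headerlink" title="ADD or COPY"></a>ADD or COPY</h4><p>Although ADD and COPY are functionally similar, COPY is generally preferred because it is more transparent than ADD. COPY only supports the basic copying of local files into the container, while ADD has some features (such as local-only tar extraction and remote URL support) that are not immediately obvious. Consequently, the best use for ADD is auto-extracting a local tar file into the image, as in ADD rootfs.tar.xz /.</p>
<p>If several Dockerfile steps use different files from your context, COPY them individually rather than all at once. This ensures that each step's build cache is invalidated (forcing the step to re-run) only when the files that step specifically requires change.</p>
<p>For example:</p>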
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">COPY requirements.txt /tmp/</div><div class="line">RUN pip install --requirement /tmp/requirements.txt</div><div class="line">COPY . /tmp/</div></pre></td></tr></table></figure>
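<p>This results in fewer cache invalidations for the RUN step than if COPY . /tmp/ were placed before it.</p>
<p>Because image size matters, using ADD to fetch packages from remote URLs is strongly discouraged; use curl or wget instead. That way you can delete the files you no longer need after they have been extracted, without adding another layer to the image. For example, you should avoid doing this:</p>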
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">ADD http://example.com/big.tar.xz /usr/src/things/</div><div class="line">RUN tar -xJf /usr/src/things/big.tar.xz -C /usr/src/things</div><div class="line">RUN make -C /usr/src/things all</div></pre></td></tr></table></figure>
<p>And instead do this:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">RUN mkdir -p /usr/src/things \</div><div class="line"> && curl -SL http://example.com/big.tar.xz \</div><div class="line"> | tar -xJC /usr/src/things \</div><div class="line"> && make -C /usr/src/things all</div></pre></td></tr></table></figure>
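<p>For other items (files, directories) that do not require ADD's tar auto-extraction capability, always use COPY.</p>
<h4 id="ENTRYPOINT"><a href="#ENTRYPOINT" class="headerlink" title="ENTRYPOINT"></a>ENTRYPOINT</h4><p>The best use for ENTRYPOINT is to set the image's main command, allowing the image to be run as though it were that command (and then use CMD as the default flags).</p>
<p>Let's start with an example of an image for the command-line tool s3cmd:</p>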
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">ENTRYPOINT ["s3cmd"]</div><div class="line">CMD ["--help"]</div></pre></td></tr></table></figure>
<p>Now the image can be run like this to show the command's help:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ docker run s3cmd</div></pre></td></tr></table></figure>
<p>Or with the right parameters to execute a command:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ docker run s3cmd ls s3://mybucket</div></pre></td></tr></table></figure>
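<p>This is useful because the image name can double as a reference to the binary, as shown in the command above.</p>
<p>The ENTRYPOINT instruction can also be combined with a helper script, allowing it to function in a similar way to the command above even when starting the tool requires more than one step.</p>
<p>For example, the Postgres Official Image uses the following script as its ENTRYPOINT:</p>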
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div></pre></td><td class="code"><pre><div class="line">#!/bin/bash</div><div class="line">set -e</div><div class="line"></div><div class="line">if [ "$1" = 'postgres' ]; then</div><div class="line"> chown -R postgres "$PGDATA"</div><div class="line"></div><div class="line"> if [ -z "$(ls -A "$PGDATA")" ]; then</div><div class="line"> gosu postgres initdb</div><div class="line"> fi</div><div class="line"></div><div class="line"> exec gosu postgres "$@"</div><div class="line">fi</div><div class="line"></div><div class="line">exec "$@"</div></pre></td></tr></table></figure>
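<p>Note: this script uses the exec Bash command so that the final running application becomes the container's PID 1. This allows the application to receive any Unix signals sent to the container. See the ENTRYPOINT reference for more details.</p>
<p>The helper script is copied into the container and run via ENTRYPOINT on container start:</p>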
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">COPY ./docker-entrypoint.sh /</div><div class="line">ENTRYPOINT ["/docker-entrypoint.sh"]</div></pre></td></tr></table></figure>
<p>This script allows the user to interact with Postgres in several ways.</p>
<p>It can simply start Postgres:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ docker run postgres</div></pre></td></tr></table></figure>
<p>Or it can be used to run Postgres and pass the --help parameter to the server:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ docker run postgres postgres --help</div></pre></td></tr></table></figure>
<p>Lastly, it can also be used to start a totally different tool, such as Bash:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ docker run --rm -it postgres bash</div></pre></td></tr></table></figure>
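<h4 id="Volume"><a href="#Volume" class="headerlink" title="Volume"></a>Volume</h4><p>The VOLUME instruction should be used to expose any database storage area, configuration storage, or files and folders created by your Docker container.</p>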
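<p>As a small sketch, a database image might declare its data directory as a volume. The path below is the conventional Postgres data directory and is used here only as an example.</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line"># Keep the data directory in one variable so the two instructions cannot drift apart</div><div class="line">ENV PGDATA /var/lib/postgresql/data</div><div class="line"># Data written here lives in a volume rather than in the container's writable layer</div><div class="line">VOLUME /var/lib/postgresql/data</div></pre></td></tr></table></figure>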
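<h4 id="User"><a href="#User" class="headerlink" title="User"></a>User</h4><p>If a service can run without privileges, use USER to change to a non-root user. Start by creating the user and group in the Dockerfile with something like RUN groupadd -r postgres && useradd -r -g postgres postgres.</p>
<p>Note: users and groups in an image get a non-deterministic UID/GID, because the "next" UID/GID is assigned regardless of image rebuilds. So, if it is critical, you should assign an explicit UID/GID.</p>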
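<p>A sketch of assigning explicit IDs, assuming the value 999 is free in the base image. The number itself is an arbitrary choice for this example.</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line"># Fixed GID and UID (999 is an assumed, arbitrary value) stay stable across rebuilds</div><div class="line">RUN groupadd -r -g 999 postgres && useradd -r -u 999 -g postgres postgres</div><div class="line"># Later instructions and the container's main process run as this user</div><div class="line">USER postgres</div></pre></td></tr></table></figure>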
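<p>Avoid installing or using sudo, since it has unpredictable TTY and signal-forwarding behavior that can cause more problems than it solves. If you absolutely need functionality similar to sudo (for example, initializing the daemon as root but running it as non-root), you can use "gosu".</p>
<p>Lastly, to reduce layers and complexity, avoid switching USER back and forth frequently.</p>
<h4 id="WORKDIR"><a href="#WORKDIR" class="headerlink" title="WORKDIR"></a>WORKDIR</h4><p>For clarity and reliability, always use absolute paths for WORKDIR. Also, use WORKDIR instead of proliferating instructions like RUN cd … && do-something, which are hard to read, troubleshoot, and maintain.</p>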
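<p>A short sketch of the difference, using a hypothetical /usr/src/app directory and a hypothetical make target:</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line"># Harder to follow: the directory change lasts only for this one RUN</div><div class="line"># RUN cd /usr/src/app && make all</div><div class="line"></div><div class="line"># Clearer: WORKDIR applies to every following RUN, CMD, ENTRYPOINT, COPY and ADD</div><div class="line">WORKDIR /usr/src/app</div><div class="line">COPY . .</div><div class="line">RUN make all</div></pre></td></tr></table></figure>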
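<h4 id="ONBUILD"><a href="#ONBUILD" class="headerlink" title="ONBUILD"></a>ONBUILD</h4><p>An ONBUILD command executes after the current Dockerfile build completes. ONBUILD executes in any child image derived FROM the current image. Think of the ONBUILD command as an instruction the parent Dockerfile gives to the child Dockerfile.</p>
<p>A Docker build executes ONBUILD commands before any command in the child Dockerfile.</p>
<p>ONBUILD is useful for images that are going to be built FROM a given image. For example, you would use ONBUILD for a language stack image that builds arbitrary user software written in that language within the Dockerfile, as you can see in Ruby's ONBUILD variants.</p>
<p>Images built with ONBUILD should get a separate tag, for example ruby:1.9-onbuild or ruby:2.0-onbuild.</p>
<p>Be careful when putting ADD or COPY in ONBUILD. The "onbuild" image fails if the new build's context is missing the resource being added.</p>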
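<p>The parent and child relationship can be sketched as two Dockerfiles. This is an illustration loosely modeled on a Ruby language-stack image; the application layout (a Gemfile, a Gemfile.lock and an app.rb) and the tag my-ruby:onbuild are assumptions made for the example.</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line"># Parent Dockerfile, built and tagged as the hypothetical my-ruby:onbuild</div><div class="line">FROM ruby:2.0</div><div class="line">WORKDIR /usr/src/app</div><div class="line"># These triggers do nothing now; they run at the start of every child build</div><div class="line">ONBUILD COPY Gemfile Gemfile.lock /usr/src/app/</div><div class="line">ONBUILD RUN bundle install</div><div class="line">ONBUILD COPY . /usr/src/app</div><div class="line"></div><div class="line"># Child Dockerfile, kept in the application's own directory</div><div class="line"># FROM my-ruby:onbuild</div><div class="line"># CMD ["ruby", "app.rb"]</div></pre></td></tr></table></figure>
<p>If the child's build context has no Gemfile or Gemfile.lock, the first trigger fails, which is the failure mode described above.</p>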
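<h3 id="其他建议汇总"><a href="#其他建议汇总" class="headerlink" title="其他建议汇总"></a>Other recommendations</h3><h4 id="移除构建依赖"><a href="#移除构建依赖" class="headerlink" title="移除构建依赖"></a>Remove build dependencies</h4><p>The official guide mentions this too, just without much emphasis. If you build from source, your image usually ends up far larger than it needs to be. Where possible, install the build tools, build the software, and then remove the build tools, all in a single RUN instruction. This keeps the image small.</p>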
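<p>A sketch of the pattern on a Debian-based image. The package list, the download URL and the assumption that the tarball unpacks into a single top-level directory are all illustrative.</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line"># Install build tools, build, and purge the tools again within one layer</div><div class="line">RUN buildDeps='gcc libc6-dev make' \</div><div class="line">    && apt-get update \</div><div class="line">    && apt-get install -y --no-install-recommends $buildDeps ca-certificates curl \</div><div class="line">    && mkdir -p /usr/src/thing \</div><div class="line">    && curl -SL http://example.com/thing.tar.gz | tar -xzC /usr/src/thing --strip-components=1 \</div><div class="line">    && make -C /usr/src/thing install \</div><div class="line">    && apt-get purge -y --auto-remove $buildDeps \</div><div class="line">    && rm -rf /usr/src/thing /var/lib/apt/lists/*</div></pre></td></tr></table></figure>
<p>Because everything happens in a single RUN, the compilers and sources never end up in any layer of the final image.</p>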
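<h4 id="选择gosu"><a href="#选择gosu" class="headerlink" title="选择gosu"></a>Prefer gosu</h4><p>The gosu utility is commonly used in the scripts invoked by the ENTRYPOINT instruction in the Dockerfiles of official images. It is a simple sudo-like tool that takes a user and a command and runs the command as that user, while avoiding sudo's awkward TTY and signal-forwarding behavior.</p>
<h4 id="不要在-Dockerfile-中修改文件的权限"><a href="#不要在-Dockerfile-中修改文件的权限" class="headerlink" title="不要在 Dockerfile 中修改文件的权限"></a>Do not change file permissions in the Dockerfile</h4><p>Docker images are layered, and every modification adds a new layer; changing file or directory permissions is no exception. Changing the permissions of large files or directories copies those files into the new layer, which can easily make the image very large.</p>
<p>The fix is simple: either set the permissions and ownership of the files before adding them in the Dockerfile, or make the changes in the container startup script (the entrypoint).</p>
<p>Here I also followed the way some official images on Docker Hub are written.</p>
<h4 id="apt-get注意点"><a href="#apt-get注意点" class="headerlink" title="apt-get注意点"></a>apt-get pitfalls</h4><p>The first is running apt-get upgrade, which updates all packages to their latest versions. Avoid it because it undermines the repeatability and consistency of Dockerfile builds.</p>
<p>The second is running apt-get update and apt-get install on separate lines. Avoid this because the apt-get update line gets cached during the build and is then not re-executed each time you need to run apt-get install. Instead, run apt-get update on the same line as the packages to be installed, so that they are updated correctly.</p>
<h4 id="使用docker-exec而不是sshd"><a href="#使用docker-exec而不是sshd" class="headerlink" title="使用docker exec而不是sshd"></a>Use docker exec instead of sshd</h4><p>When you need to get into a container, use the docker exec command rather than installing a separate sshd.</p>
<p>References:</p>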
<ul>
<li><a href="https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/" target="_blank" rel="external">https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/</a></li>
<li><a href="http://www.cnblogs.com/vikings-blog/p/4337152.html" target="_blank" rel="external">http://www.cnblogs.com/vikings-blog/p/4337152.html</a></li>
<li><a href="http://www.oschina.net/translate/6-dockerfile-tips-official-images" target="_blank" rel="external">http://www.oschina.net/translate/6-dockerfile-tips-official-images</a></li>
<li><a href="http://cizixs.com/2017/03/28/dockerfile-best-practice" target="_blank" rel="external">http://cizixs.com/2017/03/28/dockerfile-best-practice</a></li>
</ul>
</div>
<div class="article-info article-info-index">
<div class="article-tag tagcloud">
<ul class="article-tag-list"><li class="article-tag-list-item"><a class="article-tag-list-link" href="/tags/Dockerfile/">Dockerfile</a></li></ul>
</div>
<div class="article-category tagcloud">
<a class="article-category-link" href="/categories/Docker/">Docker</a>
</div>
<div class="clearfix"></div>
</div>
</div>
</article>
<article id="post-Docker/Docker实战(二十九)DockerCompose搭建ELK集成环境问题汇总" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="/2017/05/06/Docker/Docker实战(二十九)DockerCompose搭建ELK集成环境问题汇总/" class="article-date">
<time datetime="2017-05-06T10:17:17.000Z" itemprop="datePublished">2017-05-06</time>
</a>
</div>
<div class="article-inner">
<input type="hidden" class="isFancy" />
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="/2017/05/06/Docker/Docker实战(二十九)DockerCompose搭建ELK集成环境问题汇总/">Docker in Practice (29): Problems Encountered Building an Integrated ELK Environment with Docker Compose</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
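<p>Today I am recording the pitfalls I ran into while building an integrated ELK environment with docker-compose. Without further ado, let's go through them.</p>
<h3 id="docker网络冲突"><a href="#docker网络冲突" class="headerlink" title="docker网络冲突"></a>Docker network conflict</h3><p>After editing the docker-compose.yml file for ELK, I tried to start it and ran into a network conflict: the error says that the gateway address "172.18.0.1" is already in use.</p>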
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div></pre></td><td class="code"><pre><div class="line">$ docker-compose up -d</div><div class="line">Creating network "5x_elk_net" with driver "bridge"</div><div class="line">ERROR: failed to allocate gateway (172.18.0.1): Address already in use</div><div class="line"></div><div class="line"># List the current Docker networks; the following ones already exist</div><div class="line">$ docker network ls</div><div class="line"></div><div class="line">NETWORK ID NAME DRIVER SCOPE</div><div class="line">27822b9fb5c5 bridge bridge local</div><div class="line">08b6d63e27d2 host host local</div><div class="line">dd0874e0e097 none null local</div><div class="line">bacc9a64bb83 test_default bridge local</div><div class="line"></div><div class="line"># Inspect each network in turn: network bacc9a64bb83 is already using "172.18.0.1"</div><div class="line">$ docker network inspect bacc9a64bb83</div><div class="line"></div><div 
class="line">[</div><div class="line"> {</div><div class="line"> "Name": "test_default",</div><div class="line"> "Id": "bacc9a64bb8323b2e53b1c85b4643061d38699227492f9174855202b6900252a",</div><div class="line"> "Created": "2017-04-21T10:29:37.26843596Z",</div><div class="line"> "Scope": "local",</div><div class="line"> "Driver": "bridge",</div><div class="line"> "EnableIPv6": false,</div><div class="line"> "IPAM": {</div><div class="line"> "Driver": "default",</div><div class="line"> "Options": null,</div><div class="line"> "Config": [</div><div class="line"> {</div><div class="line"> "Subnet": "172.18.0.0/16",</div><div class="line"> "Gateway": "172.18.0.1"</div><div class="line"> }</div><div class="line"> ]</div><div class="line"> },</div><div class="line"> "Internal": false,</div><div class="line"> "Attachable": false,</div><div class="line"> "Containers": {},</div><div class="line"> "Options": {},</div><div class="line"> "Labels": {}</div><div class="line"> }</div><div class="line">]</div></pre></td></tr></table></figure>
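<p>Once the conflict has been located, it is easy to fix. There are two options:</p>
<ol>
<li>Delete the existing network</li>
<li>Change the subnet that docker-compose uses</li>
</ol>
<p>Because I still need the containers on that existing network, I chose to change the docker-compose network instead.</p>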
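<p>The conflict check above can also be done before running docker-compose. The following is a minimal sketch using Python's standard <code>ipaddress</code> module. The subnet of test_default comes from the <code>docker network inspect</code> output above; the replacement subnet 172.20.0.0/16 is an assumption inferred from the 172.20.0.x addresses used later in this post, and <code>find_conflicts</code> is a hypothetical helper name.</p>

```python
import ipaddress


def find_conflicts(candidate, existing):
    """Return the names of existing networks whose subnet overlaps `candidate`."""
    wanted = ipaddress.ip_network(candidate)
    return [
        name
        for name, cidr in existing.items()
        if wanted.overlaps(ipaddress.ip_network(cidr))
    ]


# Subnet reported by `docker network inspect` for test_default.
existing = {"test_default": "172.18.0.0/16"}

# The subnet docker-compose tried to allocate collides with test_default.
print(find_conflicts("172.18.0.0/16", existing))  # ['test_default']

# ASSUMPTION: 172.20.0.0/16 stands in for the replacement subnet.
print(find_conflicts("172.20.0.0/16", existing))  # []
```

<p>In practice the <code>existing</code> mapping would be filled from the Subnet values of every network listed by <code>docker network ls</code>.</p>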
<h3 id="logstash-output-elasticsearch插件的host配置不支持特殊符号"><a href="#logstash-output-elasticsearch插件的host配置不支持特殊符号" class="headerlink" title="logstash-output-elasticsearch插件的host配置不支持特殊符号"></a>The hosts setting of the logstash-output-elasticsearch plugin rejects special characters</h3><p>Below is my docker-compose.yml (abridged for brevity; only the ES container configuration is shown as an example).</p>
<p>Link to the docker-compose.yml file:</p>
<ul>
<li><a href="https://github.com/birdben/birdDocker/blob/v2/elk/5.x/docker-compose.yml">https://github.com/birdben/birdDocker/blob/v2/elk/5.x/docker-compose.yml</a></li>
</ul>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div></pre></td><td class="code"><pre><div class="line">version: '2'</div><div class="line">services:</div><div class="line"></div><div class="line"> ...</div><div class="line"></div><div class="line"> elasticsearch:</div><div class="line"> # Image used for this container</div><div class="line"> image: birdben/elasticsearch_5.x:v2</div><div class="line"> restart: always</div><div class="line"> # Name of this container</div><div class="line"> container_name: elasticsearch_5.x</div><div class="line"> networks:</div><div class="line"> elk_net:</div><div class="line"> # IP address of this container</div><div class="line"> ipv4_address: 172.20.0.5</div><div class="line"> # Extra host entries for this container</div><div class="line"> extra_hosts:</div><div class="line"> - "filebeat:172.20.0.2"</div><div class="line"> - "redis:172.20.0.3"</div><div class="line"> - "logstash:172.20.0.4"</div><div class="line"> - "elasticsearch:172.20.0.5"</div><div class="line"> - "kibana:172.20.0.6"</div><div class="line"> # Volume mounts for this container</div><div class="line"> volumes:</div><div class="line"> - 
/Users/yunyu/workspace_git/birdDocker/elk/5.x/volumes/elasticsearch/data:/usr/share/elasticsearch/data</div><div class="line"> - /Users/yunyu/workspace_git/birdDocker/elk/5.x/volumes/elasticsearch/config:/usr/share/elasticsearch/config</div><div class="line"> - /Users/yunyu/workspace_git/birdDocker/elk/5.x/volumes/elasticsearch/logs:/usr/share/elasticsearch/logs</div><div class="line"> # Port mappings exposed by this container</div><div class="line"> ports:</div><div class="line"> - "9200:9200"</div><div class="line"> - "9300:9300"</div><div class="line"></div><div class="line"> ...</div></pre></td></tr></table></figure>
<p>If the ES container's container_name is set to "elasticsearch_5.x", Logstash has to reference the ES server as "elasticsearch_5.x" in the hosts setting of logstash.conf, as follows:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div></pre></td><td class="code"><pre><div class="line">...</div><div class="line"></div><div class="line">output {</div><div class="line"> stdout {</div><div class="line"> codec => rubydebug</div><div class="line"> }</div><div class="line"> elasticsearch {</div><div class="line"> codec => "json"</div><div class="line"> hosts => ["elasticsearch_5.x:9200"]</div><div class="line"> index => "logstash-%{+YYYY.MM.dd}"</div><div class="line"> document_type => "%{type}"</div><div class="line"> workers => 1</div><div class="line"> flush_size => 20000</div><div class="line"> idle_flush_time => 10</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<p>After Logstash starts, it reports the following error:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">Sending Logstash's logs to /usr/share/logstash/logs which is now configured via log4j2.properties</div><div class="line">[2017-05-06T11:09:25,349][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.queue", :path=>"/usr/share/logstash/data/queue"}</div><div class="line">[2017-05-06T11:09:25,382][INFO ][logstash.agent ] No persistent UUID file found. Generating new UUID {:uuid=>"e241d497-58e2-46de-9213-95088242255a", :path=>"/usr/share/logstash/data/uuid"}</div><div class="line">[2017-05-06T11:09:25,771][ERROR][logstash.agent ] Cannot load an invalid configuration {:reason=>"bad URI(is not URI?): elasticsearch_5.x:9200"}</div></pre></td></tr></table></figure>
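<p>The message says that "elasticsearch_5.x" is not a valid URI. After changing both container_name in docker-compose.yml and the hosts setting in logstash.conf to "elasticsearch5x", the error no longer appears. My conclusion is that the logstash-output-elasticsearch plugin validates hosts strictly and does not accept certain special characters; the most likely culprit here is the underscore, which is not a legal character in a hostname.</p>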
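<p>To catch this kind of name before Logstash does, a container name can be checked against the usual hostname rules. The sketch below is an illustration only: it applies the RFC 1123 label rules (letters, digits and hyphens) and is not the validation the plugin itself performs; <code>is_valid_hostname</code> is a hypothetical helper name.</p>

```python
import re

# One DNS label under RFC 1123: letters, digits and hyphens,
# 1 to 63 characters, not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_hostname(host):
    """Return True if every dot-separated label of `host` is a legal DNS label."""
    if not host or len(host) > 253:
        return False
    return all(_LABEL.match(label) is not None for label in host.split("."))


for name in ("elasticsearch_5.x", "elasticsearch5x", "elasticsearch-5x"):
    print(name, is_valid_hostname(name))
# elasticsearch_5.x False
# elasticsearch5x True
# elasticsearch-5x True
```

<p>The underscore is what makes the first name fail; a hyphen would have been accepted by this check.</p>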
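<h3 id="docker-compose配置的容器无法全部正常启动"><a href="#docker-compose配置的容器无法全部正常启动" class="headerlink" title="docker-compose配置的容器无法全部正常启动"></a>Not all containers defined in docker-compose start correctly</h3><p>Starting the ELK 2.x services with docker-compose worked without problems, but after switching to ELK 5.x, Filebeat, Logstash, Redis and Kibana all ran normally while the ES container exited on its own shortly after starting.<br>The ES log showed nothing unusual, and the ES 5.x container worked fine when started on its own.</p>
<p>Having run out of other ideas, I tried keeping only the ES container in docker-compose.yml, and that ran without problems. I then added the other containers back one at a time and found that whenever the Logstash 5.x and ES 5.x containers were started together, the ES container exited as described above.</p>
<p>I then went through the ES log files again more carefully and found a difference: the log of the failing ES run contains additional GC entries.</p>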
<ul>
<li>ES log from the failing run</li>
</ul>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div></pre></td><td class="code"><pre><div class="line">[2017-05-06T10:36:30,804][INFO ][o.e.n.Node ] [node-1] initialized</div><div class="line">[2017-05-06T10:36:30,805][INFO ][o.e.n.Node ] [node-1] starting ...</div><div class="line">[2017-05-06T10:36:31,162][WARN ][i.n.u.i.MacAddressUtil ] Failed to find a usable hardware address from the network interfaces; using random bytes: 37:86:d0:ae:ee:3d:71:88</div><div class="line">[2017-05-06T10:36:31,437][INFO ][o.e.t.TransportService ] [node-1] publish_address {172.20.0.5:9300}, bound_addresses {[::]:9300}</div><div class="line">[2017-05-06T10:36:31,457][INFO ][o.e.b.BootstrapChecks ] [node-1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks</div><div class="line">[2017-05-06T10:36:33,545][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][2][2] duration [1.4s], collections [1]/[1.6s], total [1.4s]/[1.7s], memory [284.7mb]->[51.7mb]/[1.9gb], all_pools {[young] [266.2mb]->[11.2mb]/[266.2mb]}{[survivor] [18.4mb]->[32mb]/[33.2mb]}{[old] [0b]->[8.4mb]/[1.6gb]}</div><div class="line">[2017-05-06T10:36:33,560][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][2] overhead, spent [1.4s] collecting in the last [1.6s]</div><div class="line">[2017-05-06T10:36:50,112][INFO ][o.e.n.Node ] [node-1] initializing ...</div><div class="line">[2017-05-06T10:36:50,400][INFO ][o.e.e.NodeEnvironment ] [node-1] using [1] data paths, mounts [[/usr/share/elasticsearch/data (osxfs)]], net usable_space [6.1gb], net total_space [232.6gb], spins? 
[possibly], types [fuse.osxfs]</div><div class="line">[2017-05-06T10:36:50,401][INFO ][o.e.e.NodeEnvironment ] [node-1] heap size [1.9gb], compressed ordinary object pointers [true]</div><div class="line">[2017-05-06T10:36:50,415][INFO ][o.e.n.Node ] [node-1] node name [node-1], node ID [x7vSjbIKSdeUbHcAjXWPCw]</div><div class="line">[2017-05-06T10:36:50,417][INFO ][o.e.n.Node ] [node-1] version[5.3.1], pid[1], build[5f9cf58/2017-04-17T15:52:53.846Z], OS[Linux/4.9.13-moby/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_121/25.121-b13]</div></pre></td></tr></table></figure>
<ul>
<li>ES log from a healthy run</li>
</ul>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div></pre></td><td class="code"><pre><div class="line">[2017-05-06T09:35:52,233][INFO ][o.e.n.Node ] [node-1] initialized</div><div class="line">[2017-05-06T09:35:52,238][INFO ][o.e.n.Node ] [node-1] starting ...</div><div class="line">[2017-05-06T09:35:52,408][WARN ][i.n.u.i.MacAddressUtil ] Failed to find a usable hardware address from the network interfaces; using random bytes: 4a:ab:e0:6f:82:87:b0:e5</div><div class="line">[2017-05-06T09:35:52,569][INFO ][o.e.t.TransportService ] [node-1] publish_address {172.20.0.5:9300}, bound_addresses {[::]:9300}</div><div class="line">[2017-05-06T09:35:52,592][INFO ][o.e.b.BootstrapChecks ] [node-1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks</div><div class="line">[2017-05-06T09:35:55,713][INFO ][o.e.c.s.ClusterService ] [node-1] new_master {node-1}{Z0Yoi2zfTl237aiVzEoOug}{N4z8452FTc-SArP7hh7h-g}{172.20.0.5}{172.20.0.5:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)</div><div class="line">[2017-05-06T09:35:55,779][INFO ][o.e.g.GatewayService ] [node-1] recovered [0] indices into cluster_state</div><div class="line">[2017-05-06T09:35:55,790][INFO ][o.e.h.n.Netty4HttpServerTransport] [node-1] publish_address {172.20.0.5:9200}, bound_addresses {[::]:9200}</div><div class="line">[2017-05-06T09:35:55,822][INFO ][o.e.n.Node ] [node-1] started</div><div class="line">[2017-05-06T09:35:58,039][INFO ][o.e.c.m.MetaDataCreateIndexService] [node-1] [logstash-2017.05.06] creating index, cause [auto(bulk api)], templates [logstash], shards [5]/[1], mappings [_default_]</div><div 
class="line">[2017-05-06T09:35:58,663][INFO ][o.e.c.m.MetaDataMappingService] [node-1] [logstash-2017.05.06/h2c9vcE2TaCdXgYxSjh0IA] create_mapping [log]</div><div class="line">[2017-05-06T09:36:01,108][INFO ][o.e.c.m.MetaDataCreateIndexService] [node-1] [.kibana] creating index, cause [api], templates [], shards [1]/[1], mappings [server, config]</div><div class="line">[2017-05-06T09:36:25,753][WARN ][o.e.c.r.a.DiskThresholdMonitor] [node-1] high disk watermark [90%] exceeded on [Z0Yoi2zfTl237aiVzEoOug][node-1][/usr/share/elasticsearch/data/nodes/0] free: 6.1gb[2.6%], shards will be relocated away from this node</div></pre></td></tr></table></figure>
<p>Based on the analysis above, I suspected that the memory allocated to my Docker service was not enough to run this many containers. I later found that, compared with 2.x, ES and Logstash 5.x ship an additional jvm.options file that holds their JVM settings, including the heap size. After reducing the heap size of both ES and Logstash in that file and starting with docker-compose again, the ES 5.x container started normally.</p>
<p>ES jvm.options</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line">-Xms2g</div><div class="line">-Xmx2g</div><div class="line"></div><div class="line"># changed to</div><div class="line"></div><div class="line">-Xms1g</div><div class="line">-Xmx1g</div></pre></td></tr></table></figure>
<p>Logstash jvm.options</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line">-Xms256m</div><div class="line">-Xmx1g</div><div class="line"></div><div class="line"># changed to</div><div class="line"></div><div class="line">-Xms256m</div><div class="line">-Xmx256m</div></pre></td></tr></table></figure>
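<p>A rough way to see why the original settings could not work is to add up the maximum heap sizes and compare them with the memory available to Docker. The sketch below does this arithmetic for the -Xmx values shown above. The 2 GiB figure for the Docker VM is an assumption for illustration (the post does not state the actual value), and <code>parse_heap</code> and <code>total_max_heap</code> are hypothetical helper names.</p>

```python
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_heap(flag):
    """Convert a JVM flag such as '-Xmx2g' or '-Xms256m' to a size in bytes."""
    value = flag[4:].lower()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)  # a plain number means bytes


def total_max_heap(flags):
    """Sum the maximum heap (-Xmx) over all given JVM flags."""
    return sum(parse_heap(f) for f in flags if f.startswith("-Xmx"))


before = ["-Xmx2g", "-Xmx1g"]    # ES and Logstash before the change
after = ["-Xmx1g", "-Xmx256m"]   # ES and Logstash after the change
docker_memory = 2 * 1024 ** 3    # ASSUMPTION: 2 GiB available to Docker

for label, flags in (("before", before), ("after", after)):
    total = total_max_heap(flags)
    verdict = "fits" if total < docker_memory else "does not fit"
    print(label, total // 1024 ** 2, "MiB", verdict)
# before 3072 MiB does not fit
# after 1280 MiB fits
```

<p>This only counts heap; each JVM also needs memory outside the heap, and the other containers need memory too, so the real headroom is smaller than the numbers suggest.</p>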
<p>References:</p>
<ul>
<li><a href="http://www.open-open.com/lib/view/open1451606865542.html" target="_blank" rel="external">http://www.open-open.com/lib/view/open1451606865542.html</a></li>
<li><a href="https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/400">https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/400</a></li>
</ul>
</div>
<div class="article-info article-info-index">
<div class="article-tag tagcloud">
<ul class="article-tag-list"><li class="article-tag-list-item"><a class="article-tag-list-link" href="/tags/Docker环境/">Docker环境</a></li></ul>
</div>
<div class="article-category tagcloud">
<a class="article-category-link" href="/categories/Docker/">Docker</a>
</div>
<div class="clearfix"></div>
</div>
</div>
</article>
<article id="post-Docker/Docker实战(二十八)Docker的Volume挂载权限" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="/2017/05/02/Docker/Docker实战(二十八)Docker的Volume挂载权限/" class="article-date">
<time datetime="2017-05-02T13:46:56.000Z" itemprop="datePublished">2017-05-02</time>
</a>
</div>
<div class="article-inner">
<input type="hidden" class="isFancy" />
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="/2017/05/02/Docker/Docker实战(二十八)Docker的Volume挂载权限/">Docker in Practice (28): Volume Mount Permissions in Docker</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
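<p>While refactoring my Docker images recently, I ran into permission problems with files in mounted volumes. The tests here use the official Elasticsearch image.</p>
<p>The official Elasticsearch Dockerfile and entrypoint script:</p>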
<ul>
<li><a href="https://github.com/docker-library/elasticsearch/blob/35d99e915d909688807c507a59a2c06039ac92b2/5/Dockerfile">https://github.com/docker-library/elasticsearch/blob/35d99e915d909688807c507a59a2c06039ac92b2/5/Dockerfile</a></li>
<li><a href="https://github.com/docker-library/elasticsearch/blob/35d99e915d909688807c507a59a2c06039ac92b2/5/docker-entrypoint.sh">https://github.com/docker-library/elasticsearch/blob/35d99e915d909688807c507a59a2c06039ac92b2/5/docker-entrypoint.sh</a></li>
</ul>
<p>When building my own ES image I used the official Elasticsearch Dockerfile as a reference, and one thing puzzled me: why do both the Dockerfile and docker-entrypoint.sh run chown on the two directories below? Would a single chown in the Dockerfile not be enough? And what goes wrong if chown is not run at all?</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data</div><div class="line">chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/logs</div></pre></td></tr></table></figure>
<p>With these questions in mind, I ran the experiments below: I built images locally from modified copies of the official Elasticsearch Dockerfile and compared them with the official image pulled from the registry.</p>
<p>First, pull the official Elasticsearch Docker image:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div></pre></td><td class="code"><pre><div class="line"># Pull the official Elasticsearch Docker image</div><div class="line">$ docker pull elasticsearch:5.3.1</div><div class="line"></div><div class="line"># Run the Elasticsearch container and mount the data directory</div><div class="line">$ docker run -d -v /Users/yunyu/workspace_git/birdDocker/elasticsearch/official_5/data:/usr/share/elasticsearch/data --name elasticsearch_official_5x elasticsearch:5.3.1</div><div class="line"></div><div class="line"># Enter the container</div><div class="line">$ docker exec -it elasticsearch_official_5x /bin/bash</div><div class="line"></div><div class="line"># Check the ownership of /usr/share/elasticsearch inside the container</div><div class="line">$ ls -lh /usr/share/elasticsearch</div><div class="line">total 228K</div><div class="line">-rw-r--r-- 1 root root 190K Apr 17 15:55 NOTICE.txt</div><div class="line">-rw-r--r-- 1 root root 9.4K Apr 17 15:55 README.textile</div><div class="line">drwxr-xr-x 2 root root 4.0K Apr 27 00:01 bin</div><div class="line">drwxr-xr-x 1 elasticsearch elasticsearch 4.0K Apr 27 00:01 config</div><div class="line">drwxr-xr-x 3 elasticsearch elasticsearch 102 May 4 06:20 
data</div><div class="line">drwxr-xr-x 2 root root 4.0K Apr 27 00:01 lib</div><div class="line">drwxr-xr-x 1 elasticsearch elasticsearch 4.0K Apr 27 00:01 logs</div><div class="line">drwxr-xr-x 12 root root 4.0K Apr 27 00:01 modules</div><div class="line">drwxr-xr-x 2 root root 4.0K Apr 17 15:55 plugins</div><div class="line"></div><div class="line"># Check the ownership of /usr/share/elasticsearch/data inside the container</div><div class="line">$ ls -lh /usr/share/elasticsearch/data/</div><div class="line">total 0</div><div class="line">drwxr-xr-x 3 root root 102 May 4 06:20 nodes</div><div class="line"></div><div class="line"># Check the Elasticsearch process</div><div class="line">$ ps -ef | grep elasticsearch</div><div class="line">elastic+ 1 0 16 06:19 ? 00:00:17 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-5.3.1.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch</div><div class="line">root 105 94 0 06:21 ? 00:00:00 grep elasticsearch</div></pre></td></tr></table></figure>
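<p>Running the official Elasticsearch container raised a new question: the Dockerfile runs chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data to make the elasticsearch user the owner of that directory, so why are the files and directories under /usr/share/elasticsearch/data nevertheless owned by root? Let us set this question aside for now; an explanation follows later. First, we continue with the experiments.</p>
<p>Note: for each attempt below, delete the mounted directory on the host first, so that the result is not affected by the chown that docker-entrypoint.sh performs on the directory's owner.</p>
<p>Before running the attempts below, we need to modify config/log4j2.properties so that Elasticsearch writes its log to a log file. (Elasticsearch 5.x uses log4j2 and by default only logs to the console, which differs from Elasticsearch 2.x.)</p>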
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div></pre></td><td class="code"><pre><div class="line">status = error</div><div class="line"></div><div class="line">appender.console.type = Console</div><div class="line">appender.console.name = console</div><div class="line">appender.console.layout.type = PatternLayout</div><div class="line">appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n</div><div class="line"></div><div class="line">appender.rolling.type = RollingFile</div><div class="line">appender.rolling.name = rolling</div><div class="line">appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log</div><div class="line">appender.rolling.layout.type = PatternLayout</div><div class="line">appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %.10000m%n</div><div class="line">appender.rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}.log</div><div class="line">appender.rolling.policies.type = Policies</div><div class="line">appender.rolling.policies.time.type = TimeBasedTriggeringPolicy</div><div class="line">appender.rolling.policies.time.interval = 1</div><div class="line">appender.rolling.policies.time.modulate = true</div><div class="line"></div><div class="line">rootLogger.level = info</div><div class="line">rootLogger.appenderRef.console.ref = console</div><div 
class="line">rootLogger.appenderRef.all.ref = rolling</div></pre></td></tr></table></figure>
<h3 id="尝试一:删掉Dockerfile和docker-entrypoint-sh的chown语句"><a href="#尝试一:删掉Dockerfile和docker-entrypoint-sh的chown语句" class="headerlink" title="尝试一:删掉Dockerfile和docker-entrypoint.sh的chown语句"></a>Attempt 1: remove the chown statements from both the Dockerfile and docker-entrypoint.sh</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div></pre></td><td class="code"><pre><div class="line"># Build the modified Elasticsearch Docker image</div><div class="line">$ docker build -t "birdben/elasticsearch:5.3.1" .</div><div class="line"></div><div class="line"># Run the Elasticsearch container and mount the data directory</div><div class="line">$ docker run -itd -v /Users/yunyu/workspace_git/birdDocker/elasticsearch/me/data:/usr/share/elasticsearch/data --name elasticsearch_me_5x birdben/elasticsearch:5.3.1</div><div class="line"></div><div class="line"># Check the container logs</div><div class="line">$ docker logs 4353faea17cb</div><div class="line">2017-05-04 06:07:39,353 main ERROR Unable 
to create file /usr/share/elasticsearch/logs/elasticsearch.log java.io.IOException: Permission denied</div><div class="line"> at java.io.UnixFileSystem.createFileExclusively(Native Method)</div><div class="line"> at java.io.File.createNewFile(File.java:1012)</div><div class="line"> at org.apache.logging.log4j.core.appender.rolling.RollingFileManager$RollingFileManagerFactory.createManager(RollingFileManager.java:463)</div><div class="line"> at org.apache.logging.log4j.core.appender.rolling.RollingFileManager$RollingFileManagerFactory.createManager(RollingFileManager.java:445)</div><div class="line"> at org.apache.logging.log4j.core.appender.AbstractManager.getManager(AbstractManager.java:112)</div><div class="line"> ...</div><div class="line"></div><div class="line"># Check the ownership of /usr/share/elasticsearch inside the container</div><div class="line">root@4353faea17cb:/usr/share/elasticsearch# ls -lh</div><div class="line">total 228K</div><div class="line">-rw-r--r-- 1 root root 190K Apr 17 15:55 NOTICE.txt</div><div class="line">-rw-r--r-- 1 root root 9.4K Apr 17 15:55 README.textile</div><div class="line">drwxr-xr-x 2 root root 4.0K May 4 06:05 bin</div><div class="line">drwxr-xr-x 1 root root 4.0K May 4 06:06 config</div><div class="line">drwxr-xr-x 3 root root 102 May 4 06:07 data</div><div class="line">drwxr-xr-x 2 root root 4.0K May 4 06:05 lib</div><div class="line">drwxr-xr-x 2 root root 4.0K May 4 06:05 logs</div><div class="line">drwxr-xr-x 12 root root 4.0K May 4 06:05 modules</div><div class="line">drwxr-xr-x 2 root root 4.0K Apr 17 15:55 plugins</div><div class="line"></div><div class="line"># Check the ownership of /usr/share/elasticsearch/data inside the container</div><div class="line">$ ls -lh /usr/share/elasticsearch/data/</div><div class="line">total 0</div><div class="line">drwxr-xr-x 3 root root 102 May 4 06:33 nodes</div><div class="line"></div><div class="line"># Check /usr/share/elasticsearch/logs inside the container (no log file, because of the permission error above)</div><div class="line">$ ls -lh 
/usr/share/elasticsearch/logs/</div><div class="line">total 0</div><div class="line"></div><div class="line"># Check the Elasticsearch process</div><div class="line">$ ps -ef | grep elasticsearch</div><div class="line">elastic+ 1 0 11 06:33 ? 00:00:19 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-5.3.1.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch</div><div class="line">root 106 92 0 06:36 ? 00:00:00 grep elasticsearch</div></pre></td></tr></table></figure>
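<p>Here the elasticsearch process runs as the elasticsearch user, while both /usr/share/elasticsearch/data and /usr/share/elasticsearch/logs are owned by root. The process therefore has no permission to create the elasticsearch.log file under /usr/share/elasticsearch/logs, which is easy to understand.</p>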
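<p>The permission failure on the logs directory follows directly from the mode bits in the listing. The sketch below is a simplified model of the classic POSIX owner/group/other check; it ignores ACLs, capabilities and filesystem-specific behaviour of mounted volumes. The uid and gid 1000 for the elasticsearch user are illustrative assumptions, and <code>can_write</code> is a hypothetical helper name.</p>

```python
import stat


def can_write(mode, owner_uid, owner_gid, uid, gids):
    """Simplified POSIX check: may the user `uid` create files in a directory?"""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif owner_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    # Creating a file needs both write and search (execute) on the directory.
    return (mode & needed) == needed


DIR_MODE = 0o755        # drwxr-xr-x, as shown by ls -lh
ES_UID = ES_GID = 1000  # ASSUMPTION: illustrative ids for the elasticsearch user

# logs owned by root:root, process running as elasticsearch
print(can_write(DIR_MODE, 0, 0, ES_UID, [ES_GID]))            # False

# logs owned by elasticsearch:elasticsearch after chown
print(can_write(DIR_MODE, ES_UID, ES_GID, ES_UID, [ES_GID]))  # True
```

<p>With mode 755 only the owner has the write bit, so the chown on the logs directory decides whether the elasticsearch process can create its log file.</p>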
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line"># Create a document</div><div class="line">$ curl -XPOST 'http://127.0.0.1:9200/user/1/1' -d '{"name":"birdben"}'</div><div class="line"></div><div class="line"># List the index files</div><div class="line">$ ls -lh /usr/share/elasticsearch/data/nodes/0/indices/ycOyM3onRbeZAPlYMVze_w/0/index/</div><div class="line">total 4.0K</div><div class="line">-rw-r--r-- 1 root root 130 May 4 06:38 segments_1</div><div class="line">-rw-r--r-- 1 root root 0 May 4 06:38 write.lock</div></pre></td></tr></table></figure>
<p>As the listing shows, the index files created for the new document are also owned by root.</p>
<h3 id="尝试二:只删掉docker-entrypoint-sh的chown语句"><a href="#尝试二:只删掉docker-entrypoint-sh的chown语句" class="headerlink" title="尝试二:只删掉docker-entrypoint.sh的chown语句"></a>Attempt 2: remove only the chown statements in docker-entrypoint.sh</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div></pre></td><td class="code"><pre><div class="line"># Build the modified Elasticsearch Docker image</div><div class="line">$ docker build -t "birdben/elasticsearch:5.3.1" .</div><div class="line"></div><div class="line"># Run the Elasticsearch container and mount the data directory</div><div class="line">$ docker run -itd -v /Users/yunyu/workspace_git/birdDocker/elasticsearch/me/data:/usr/share/elasticsearch/data --name elasticsearch_me_5x birdben/elasticsearch:5.3.1</div><div class="line"></div><div class="line"># Check the container logs: no problems</div><div class="line">$ docker logs 6db67f60ed6e</div><div class="line"></div><div class="line"># Check the ownership of /usr/share/elasticsearch inside the container</div><div class="line">root@6db67f60ed6e:/usr/share/elasticsearch# ls -lh</div><div class="line">total 228K</div><div class="line">-rw-r--r-- 1 root root 190K Apr 17 
15:55 NOTICE.txt</div><div class="line">-rw-r--r-- 1 root root 9.4K Apr 17 15:55 README.textile</div><div class="line">drwxr-xr-x 2 root root 4.0K May 4 07:02 bin</div><div class="line">drwxr-xr-x 1 elasticsearch elasticsearch 4.0K May 4 07:02 config</div><div class="line">drwxr-xr-x 3 root root 102 May 4 09:16 data</div><div class="line">drwxr-xr-x 2 root root 4.0K May 4 07:02 lib</div><div class="line">drwxr-xr-x 1 elasticsearch elasticsearch 4.0K May 4 09:16 logs</div><div class="line">drwxr-xr-x 12 root root 4.0K May 4 07:02 modules</div><div class="line">drwxr-xr-x 2 root root 4.0K Apr 17 15:55 plugins</div><div class="line"></div><div class="line"># Check the ownership of /usr/share/elasticsearch/data inside the container</div><div class="line">$ ls -lh /usr/share/elasticsearch/data/</div><div class="line">total 0</div><div class="line">drwxr-xr-x 3 root root 102 May 4 06:33 nodes</div><div class="line"></div><div class="line"># Check /usr/share/elasticsearch/logs inside the container (the log file now exists; the data directory is still owned by root while config and logs are owned by elasticsearch, because after the Dockerfile's chown the data directory is mounted as a volume, and the owner of a mounted directory becomes root; this explains why data belongs to root and config and logs belong to elasticsearch)</div><div class="line">$ ls -lh /usr/share/elasticsearch/logs/</div><div class="line">total 8.0K</div><div class="line">-rw-r--r-- 1 elasticsearch elasticsearch 5.3K May 4 09:19 elasticsearch.log</div><div class="line"></div><div class="line"># Check the Elasticsearch process</div><div class="line">$ ps -ef | grep elasticsearch</div><div class="line">elastic+ 1 0 4 09:16 ? 
00:00:20 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-5.3.1.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch</div><div class="line">root 104 94 0 09:23 ? 00:00:00 grep elasticsearch</div></pre></td></tr></table></figure>
<p>Here the elasticsearch process again runs as the elasticsearch user. /usr/share/elasticsearch/data is owned by root, while /usr/share/elasticsearch/config and /usr/share/elasticsearch/logs are owned by elasticsearch, so the process now has permission to create elasticsearch.log under /usr/share/elasticsearch/logs. My guess is that because /usr/share/elasticsearch/logs is not mounted from the host, the logs directory and the elasticsearch.log file created in it both belong to the elasticsearch user (since the Dockerfile runs chown on the logs directory).</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line"># Create a document</div><div class="line">$ curl -XPOST 'http://127.0.0.1:9200/user/1/1' -d '{"name":"birdben"}'</div><div class="line"></div><div class="line"># List the index files</div><div class="line">$ ls -lh /usr/share/elasticsearch/data/nodes/0/indices/meAhtSJXRl-cKzoqQZifBQ/0/index/</div><div class="line">total 4.0K</div><div class="line">-rw-r--r-- 1 root root 130 May 4 09:27 segments_1</div><div class="line">-rw-r--r-- 1 root root 0 May 4 09:27 write.lock</div></pre></td></tr></table></figure>
<p>As the listing shows, the index files created for the new document are still owned by root.</p>
<p>Next, let us verify the guess above: /usr/share/elasticsearch/logs is not mounted from the host, so the logs directory and the elasticsearch.log created in it belong to elasticsearch rather than root. By the same reasoning, if I mount /usr/share/elasticsearch/logs from the host, the logs directory and the elasticsearch.log created in it should be owned by root rather than elasticsearch. (This assumes docker run uses the default root user for -u, not the elasticsearch user.)</p>
<h3 id="尝试三:只删掉docker-entrypoint-sh的chown语句,然后挂载-usr-share-elasticsearch-logs目录"><a href="#尝试三:只删掉docker-entrypoint-sh的chown语句,然后挂载-usr-share-elasticsearch-logs目录" class="headerlink" title="尝试三:只删掉docker-entrypoint.sh的chown语句,然后挂载/usr/share/elasticsearch/logs目录"></a>尝试三:只删掉docker-entrypoint.sh的chown语句,然后挂载/usr/share/elasticsearch/logs目录</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div></pre></td><td class="code"><pre><div class="line"># 运行Elasticsearch的Docker容器,并且挂载对应的data目录</div><div class="line">$ docker run -itd -v /Users/yunyu/workspace_git/birdDocker/elasticsearch/me/data:/usr/share/elasticsearch/data -v /Users/yunyu/workspace_git/birdDocker/elasticsearch/me/logs:/usr/share/elasticsearch/logs --name elasticsearch_me_5x birdben/elasticsearch:5.3.1</div><div class="line"></div><div class="line"># 查看Docker容器的日志,没有问题</div><div class="line">$ docker logs 17734a3549ad</div><div class="line"></div><div class="line"># 查看Docker容器内/usr/share/elasticsearch目录的权限(果然和推测的一样,logs目录的所属用户变成了root)</div><div class="line">root@17734a3549ad:/usr/share/elasticsearch# ls -lh</div><div class="line">total 224K</div><div 
class="line">-rw-r--r-- 1 root root 190K Apr 17 15:55 NOTICE.txt</div><div class="line">-rw-r--r-- 1 root root 9.4K Apr 17 15:55 README.textile</div><div class="line">drwxr-xr-x 2 root root 4.0K May 4 07:02 bin</div><div class="line">drwxr-xr-x 1 elasticsearch elasticsearch 4.0K May 4 07:02 config</div><div class="line">drwxr-xr-x 3 root root 102 May 4 09:37 data</div><div class="line">drwxr-xr-x 2 root root 4.0K May 4 07:02 lib</div><div class="line">drwxr-xr-x 3 root root 102 May 4 09:37 logs</div><div class="line">drwxr-xr-x 12 root root 4.0K May 4 07:02 modules</div><div class="line">drwxr-xr-x 2 root root 4.0K Apr 17 15:55 plugins</div><div class="line"></div><div class="line"># 查看Docker容器内/usr/share/elasticsearch/data目录的权限(不变,和之前一样)</div><div class="line">$ ls -lh /usr/share/elasticsearch/data/</div><div class="line">total 0</div><div class="line">drwxr-xr-x 3 root root 102 May 4 06:33 nodes</div><div class="line"></div><div class="line"># 查看Docker容器内/usr/share/elasticsearch/logs目录的权限(这里也和推测的一样,logs目录下创建的elasticsearch.log日志文件也属于root用户)</div><div class="line">$ ls -lh /usr/share/elasticsearch/logs/</div><div class="line">total 8.0K</div><div class="line">-rw-r--r-- 1 root root 4.3K May 4 09:39 elasticsearch.log</div><div class="line"></div><div class="line"># 查看Elasticsearch进程(不变,和之前一样)</div><div class="line">$ ps -ef | grep elasticsearch</div><div class="line">elastic+ 1 0 9 09:37 ? 
00:00:18 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-5.3.1.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch</div><div class="line">root 106 94 0 09:40 ? 00:00:00 grep elasticsearch</div></pre></td></tr></table></figure>
<p>Creating a document gives the same result as before (omitted).</p>
<p>From the attempts above, we can draw the following conclusions:</p>
<ul>
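<li>Even if the Dockerfile uses chown to change a directory's owner, once that directory is mounted to the host its owner is changed back to root.</li>
<li>If the Dockerfile does not run chown, a process started as the elasticsearch user cannot access directories owned by root (except for mounted directories, whose owner is changed to root by the mount).</li>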
</ul>
<p>One thing I left implicit in the previous attempts: all of them started the container as the root user.</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"># No -u or --user option is given here, so the container starts as root by default</div><div class="line">$ docker run -itd -v /Users/yunyu/workspace_git/birdDocker/elasticsearch/me/data:/usr/share/elasticsearch/data --name elasticsearch_me_5x birdben/elasticsearch:5.3.1</div></pre></td></tr></table></figure>
<p>That is why docker-entrypoint.sh has an if condition that checks whether the container was started as root: "$(id -u)" = '0'. If it was, the script uses gosu to switch to the elasticsearch user before starting the elasticsearch process. Before that, it also runs chown to change the owner of /usr/share/elasticsearch/data and /usr/share/elasticsearch/logs to elasticsearch. Recall that in Attempt 1 we removed the chown from both the Dockerfile and docker-entrypoint.sh, which is why the elasticsearch process could not write logs into the root-owned logs directory.</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div></pre></td><td class="code"><pre><div class="line">#!/bin/bash</div><div class="line"></div><div class="line">set -e</div><div class="line"></div><div class="line"># Add elasticsearch as command if needed</div><div class="line">if [ "${1:0:1}" = '-' ]; then</div><div class="line"> set -- elasticsearch "$@"</div><div class="line">fi</div><div class="line"></div><div class="line"># Drop root privileges if we are running elasticsearch</div><div class="line"># allow the container to be started with `--user`</div><div class="line">if [ "$1" = 'elasticsearch' -a "$(id -u)" = '0' ]; then</div><div class="line"> # Change the ownership of user-mutable directories to elasticsearch</div><div class="line"> for path in \</div><div class="line"> /usr/share/elasticsearch/data \</div><div class="line"> /usr/share/elasticsearch/logs \</div><div class="line"> ; do</div><div class="line"> chown -R elasticsearch:elasticsearch "$path"</div><div class="line"> done</div><div class="line"> </div><div class="line"> set -- gosu elasticsearch "$@"</div><div class="line"> #exec gosu elasticsearch "$BASH_SOURCE" "$@"</div><div class="line">fi</div><div class="line"></div><div class="line"># As argument is not related to 
elasticsearch,</div><div class="line"># then assume that user wants to run his own process,</div><div class="line"># for example a `bash` shell to explore this image</div><div class="line">exec "$@"</div></pre></td></tr></table></figure>
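<p>The branch taken by the script above can be sketched as a small function. This is only an illustration: describe_entrypoint_branch is a hypothetical helper name, the uid 1000 is an arbitrary example value, and the function only prints the decision without running chown or gosu.</p>

```shell
# Hypothetical helper: prints which branch an entrypoint like the one above
# would take for a given first argument and numeric uid.
# It never runs chown or gosu; it only describes the decision.
describe_entrypoint_branch() {
  cmd="$1"
  uid="$2"
  if [ "$cmd" = 'elasticsearch' ] && [ "$uid" = '0' ]; then
    # Started as root with the elasticsearch command:
    # fix ownership first, then drop privileges.
    echo "chown data and logs, then gosu elasticsearch"
  else
    # Started with --user, or running some other command such as bash.
    echo "exec as uid $uid without chown"
  fi
}

describe_entrypoint_branch elasticsearch 0
describe_entrypoint_branch elasticsearch 1000
describe_entrypoint_branch bash 0
```

<p>Only the first call reaches the chown branch, which matches the observation that chown in docker-entrypoint.sh runs only when the container is started as root.</p>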
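<p>Some readers may wonder: if we remove the chown from the Dockerfile but keep the chown in docker-entrypoint.sh, will the result be the same as in Attempt 2? Can ES still write its log file normally? Let's try it.</p>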
<h3 id="尝试四:只删掉Dockerfile的chown语句,然后挂载-usr-share-elasticsearch-logs目录"><a href="#尝试四:只删掉Dockerfile的chown语句,然后挂载-usr-share-elasticsearch-logs目录" class="headerlink" title="尝试四:只删掉Dockerfile的chown语句,然后挂载/usr/share/elasticsearch/logs目录"></a>尝试四:只删掉Dockerfile的chown语句,然后挂载/usr/share/elasticsearch/logs目录</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div></pre></td><td class="code"><pre><div class="line"># 构建修改后的Elasticsearch的Docker镜像</div><div class="line">$ docker build -t "birdben/elasticsearch:5.3.1" .</div><div class="line"></div><div class="line"># 运行Elasticsearch的Docker容器,并且挂载对应的data目录</div><div class="line">$ docker run -itd -v /Users/yunyu/workspace_git/birdDocker/elasticsearch/me/data:/usr/share/elasticsearch/data -v /Users/yunyu/workspace_git/birdDocker/elasticsearch/me/logs:/usr/share/elasticsearch/logs --name elasticsearch_me_5x birdben/elasticsearch:5.3.1</div><div class="line"></div><div class="line"># 查看Docker容器的日志,没有问题</div><div class="line">$ docker logs ca6eb8e80593</div><div class="line"></div><div class="line"># 
查看Docker容器内/usr/share/elasticsearch目录的权限(因为这里是在docker-entrypoint.sh对data和logs进行chown,所以只有data和logs所属elasticsearch用户)</div><div class="line">root@ca6eb8e80593:/usr/share/elasticsearch# ls -lh</div><div class="line">total 224K</div><div class="line">-rw-r--r-- 1 root root 190K Apr 17 15:55 NOTICE.txt</div><div class="line">-rw-r--r-- 1 root root 9.4K Apr 17 15:55 README.textile</div><div class="line">drwxr-xr-x 2 root root 4.0K May 4 10:54 bin</div><div class="line">drwxr-xr-x 1 root root 4.0K May 4 10:54 config</div><div class="line">drwxr-xr-x 3 elasticsearch elasticsearch 102 May 4 10:58 data</div><div class="line">drwxr-xr-x 2 root root 4.0K May 4 10:54 lib</div><div class="line">drwxr-xr-x 3 elasticsearch elasticsearch 102 May 4 10:58 logs</div><div class="line">drwxr-xr-x 12 root root 4.0K May 4 10:54 modules</div><div class="line">drwxr-xr-x 2 root root 4.0K Apr 17 15:55 plugins</div><div class="line"></div><div class="line"># 查看Docker容器内/usr/share/elasticsearch/data目录的权限</div><div class="line">$ ls -lh /usr/share/elasticsearch/data/</div><div class="line">total 0</div><div class="line">drwxr-xr-x 3 root root 102 May 4 10:58 nodes</div><div class="line"></div><div class="line"># 查看Docker容器内/usr/share/elasticsearch/logs目录的权限</div><div class="line">$ ls -lh /usr/share/elasticsearch/logs/</div><div class="line">total 4.0K</div><div class="line">-rw-r--r-- 1 root root 3.9K May 4 10:59 elasticsearch.log</div><div class="line"></div><div class="line"># 查看Elasticsearch进程</div><div class="line">$ ps -ef | grep elasticsearch</div><div class="line">elastic+ 1 0 4 10:58 ? 
00:00:21 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-5.3.1.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch</div><div class="line">root 107 96 0 11:06 ? 00:00:00 grep elasticsearch</div></pre></td></tr></table></figure>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line"># Create a document</div><div class="line">$ curl -XPOST 'http://127.0.0.1:9200/user/1/1' -d '{"name":"birdben"}'</div><div class="line"></div><div class="line"># List the index files</div><div class="line">$ ls -lh /usr/share/elasticsearch/data/nodes/0/indices/meAhtSJXRl-cKzoqQZifBQ/0/index/</div><div class="line">total 4.0K</div><div class="line">-rw-r--r-- 1 root root 130 May 4 11:07 segments_1</div><div class="line">-rw-r--r-- 1 root root 0 May 4 11:07 write.lock</div></pre></td></tr></table></figure>
<p>Here the data and logs directories are both owned by elasticsearch, yet the files inside them are owned by root. Why?</p>
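<p>Because docker run was invoked without -u or --user, the container starts as root by default, and files and directories created in a volume carry the same numeric uid:gid as the user that created them inside the container (root). If you add a user inside the container whose uid:gid matches the intended owner on the host and run as that user (here, elasticsearch), files and directories created in the volume will carry that user's numeric uid:gid instead.</p>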
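<p>So the chown in docker-entrypoint.sh also matters, but it only runs when the container is started as root (docker run -u root). When the container is started as the elasticsearch user (docker run -u elasticsearch), that chown is skipped, so in that case the chown has to be done in the Dockerfile beforehand.</p>
<p>In summary:</p>
<p>A directory mounted as a volume is owned by root by default. Unless it is chowned to another user, files and directories created in the volume carry the same numeric uid:gid as the user that created them inside the container.</p>
<p>References:</p>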
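<p>The uid:gid rule can be observed without Docker at all. The following is a minimal sketch, assuming a POSIX shell and an ordinary local filesystem; workdir and created_here are hypothetical names. It creates a file and compares its numeric owner with the uid of the process that created it.</p>

```shell
# Create a file in a scratch directory and compare its numeric owner
# with the uid of the current process.
workdir=$(mktemp -d)
touch "$workdir/created_here"

# ls -ln prints the numeric uid in column 3 (and the gid in column 4).
file_uid=$(ls -ln "$workdir/created_here" | awk '{print $3}')
process_uid=$(id -u)

if [ "$file_uid" = "$process_uid" ]; then
  echo "file uid matches the uid of the creating process"
fi

rm -rf "$workdir"
```

<p>Inside a container the same rule applies: what ends up on the volume is the numeric uid of the writing process, regardless of which user name that number maps to on the host.</p>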
<ul>
<li><a href="https://yq.aliyun.com/articles/53990" target="_blank" rel="external">https://yq.aliyun.com/articles/53990</a></li>
<li><a href="https://github.com/moby/moby/issues/3124">https://github.com/moby/moby/issues/3124</a></li>
</ul>
</div>
<div class="article-info article-info-index">
<div class="article-tag tagcloud">
<ul class="article-tag-list"><li class="article-tag-list-item"><a class="article-tag-list-link" href="/tags/Dockerfile/">Dockerfile</a></li><li class="article-tag-list-item"><a class="article-tag-list-link" href="/tags/Docker命令/">Docker命令</a></li></ul>
</div>
<div class="article-category tagcloud">
<a class="article-category-link" href="/categories/Docker/">Docker</a>
</div>
<div class="clearfix"></div>
</div>
</div>
</article>
<article id="post-Docker/Docker实战(二十七)Docker容器之间的通信" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="/2017/05/02/Docker/Docker实战(二十七)Docker容器之间的通信/" class="article-date">
<time datetime="2017-05-02T06:00:38.000Z" itemprop="datePublished">2017-05-02</time>
</a>
</div>
<div class="article-inner">
<input type="hidden" class="isFancy" />
<header class="article-header">
<h1 itemprop="name">
        <a class="article-title" href="/2017/05/02/Docker/Docker实战(二十七)Docker容器之间的通信/">Docker in Practice (27): Communication Between Docker Containers</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
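<p>While reworking some Docker images I wrote a while ago, I realized I had never really used Docker properly: I did not even know how containers communicate with each other. My old approach was to install every component into a single container, so inter-container communication never came up. Docker, however, recommends running one process per container, so I refactored those images. This post summarizes how Docker containers communicate.</p>
<h3 id="Docker的网络模式"><a href="#Docker的网络模式" class="headerlink" title="Docker的网络模式"></a>Docker network modes</h3><p>Docker currently supports the following 5 network modes:</p>
<p>When creating a container with docker run, the --net option selects the container's network mode.</p>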
<ul>
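<li>host mode: specified with --net=host. The container shares the host's network and does not get its own network namespace. Host resources such as D-Bus are exposed to the container, which is a security risk.</li>
<li>container mode: specified with --net=container:NAME_or_ID. The container shares the network of another container instance.</li>
<li>none mode: specified with --net=none. No network is set up, so the container has no configured network interface; the user can configure one manually.</li>
<li>bridge mode: specified with --net=bridge; this is the default. The Docker engine creates a veth pair, attaches one end to the container as eth0, and attaches the other end to the specified bridge (for example docker0). Containers on the same host are therefore connected to the same bridge and can reach each other. When the container is created, an SNAT rule is also added automatically for traffic from the container to the outside. If the user publishes ports with -p or -P, the corresponding port-mapping rules are created as well.</li>
<li>custom mode: uses a user-defined network created with docker network create. Several network drivers are supported out of the box, so users can create bridge networks or overlay networks as needed.</li>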
</ul>
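<p>The default is bridge mode with the network address 172.17.0.0/16. Containers on the same host can communicate with each other, but not across hosts.</p>
<h4 id="host模式"><a href="#host模式" class="headerlink" title="host模式"></a>host mode</h4><p>A container started in host mode does not get its own Network Namespace; it shares the host's Network Namespace instead. The container does not create a virtual network interface or configure its own IP, and uses the host's IP and ports.</p>
<h4 id="container模式"><a href="#container模式" class="headerlink" title="container模式"></a>container mode</h4><p>In this mode the new container shares a Network Namespace with an existing container rather than with the host. The new container does not create its own network interface or configure its own IP; it shares the IP, port range, and so on with the specified container. Apart from networking, the two containers remain isolated, including their file systems and process lists. Processes in the two containers can communicate over the lo interface.</p>
<h4 id="none模式"><a href="#none模式" class="headerlink" title="none模式"></a>none mode</h4><p>This mode differs from the previous two. The container gets its own Network Namespace, but Docker performs no network configuration for it. In other words, the container has no network interface, IP, or routes, and we have to add an interface and configure an IP ourselves.</p>
<h4 id="bridge模式"><a href="#bridge模式" class="headerlink" title="bridge模式"></a>bridge mode</h4><p>bridge mode is Docker's default network setting. It gives every container its own Network Namespace, sets up an IP, and connects the containers on a host to a virtual bridge.</p>
<p>When the Docker daemon starts, it creates a virtual bridge named docker0 on the host, and containers started on that host are attached to it. A virtual bridge works much like a physical switch, so all containers on the host are connected to one layer-2 network through it.</p>
<p>Next, the containers need IP addresses. From the private ranges defined in RFC 1918, Docker picks an address and subnet that differ from the host's and assigns them to docker0. Each container attached to docker0 then takes an unused IP from that subnet. For example, Docker typically uses 172.17.0.0/16 and assigns 172.17.42.1/16 to the docker0 bridge. (docker0 is visible in the output of ifconfig on the host; think of it as the bridge's management interface, used on the host as a virtual network interface.)</p>
<p>When a container is created, a veth pair is created along with it (a packet sent into one interface of the pair comes out of the other). One end of the pair sits inside the container as eth0; the other end stays on the host, is attached to the docker0 bridge, and has a name starting with veth (for example vethAQI2QT). This lets the host talk to the container and lets containers talk to each other. In effect, Docker creates a virtual shared network between the host and all of its containers.</p>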
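<p>To make the subnet membership concrete, here is a minimal sketch that checks whether an address falls inside 172.17.0.0/16. in_default_bridge_subnet is a hypothetical helper; it relies on a simple string pattern, which is sufficient for a /16 boundary but is not a general CIDR validator.</p>

```shell
# Hypothetical helper: succeeds when an IPv4 address lies in 172.17.0.0/16.
# A /16 fixes the first two octets, so a pattern match on "172.17." is enough
# for this sketch. It does not validate that the remaining octets are numeric.
in_default_bridge_subnet() {
  case "$1" in
    172.17.*.*) return 0 ;;
    *)          return 1 ;;
  esac
}

for ip in 172.17.0.2 172.17.42.1 10.10.1.129; do
  if in_default_bridge_subnet "$ip"; then
    echo "$ip is inside 172.17.0.0/16"
  else
    echo "$ip is outside 172.17.0.0/16"
  fi
done
```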
<p><img src="http://img.blog.csdn.net/20170502213752341?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvYmlyZGJlbg==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="Docker bridge模式"></p>
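<h3 id="同主机不同容器之间通信"><a href="#同主机不同容器之间通信" class="headerlink" title="同主机不同容器之间通信"></a>Communication between containers on the same host</h3><p>Communication between containers on the same host mainly relies on Docker's bridge mode. The bridge interface lives on a single Docker host, and it is the mechanism behind all three connection methods described below.</p>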
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line">$ ifconfig docker0</div><div class="line">docker0 Link encap:Ethernet HWaddr 56:84:7a:fe:97:99 </div><div class="line"> inet addr:172.17.42.1 Bcast:0.0.0.0 Mask:255.255.0.0</div><div class="line"> UP BROADCAST MULTICAST MTU:1500 Metric:1</div><div class="line"> RX packets:0 errors:0 dropped:0 overruns:0 frame:0</div><div class="line"> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0</div><div class="line"> collisions:0 txqueuelen:0 </div><div class="line"> RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)</div></pre></td></tr></table></figure>
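<h4 id="连接方式"><a href="#连接方式" class="headerlink" title="连接方式"></a>Connection methods</h4><ul>
<li>Method 1: communicate using the container's IP address. This hard-codes IP addresses, which makes migration awkward, and the address may change after a container restart unless a fixed IP is used.</li>
<li>Method 2: communicate using the host's IP plus a port published by the container. This is limited: only processes listening on the published ports are reachable.</li>
<li>Method 3: communicate using the container name through Docker's link mechanism. A link lets one container reach another by name, which makes it easy for containers to discover each other and to pass connection information between them securely. The name acts as an alias that is easy to remember and use. Even if the container restarts and its address changes, the connection between the two containers is not affected.</li>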
</ul>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"># Show a container's internal IP</div><div class="line">$ docker inspect --format='{{.NetworkSettings.IPAddress}}' $CONTAINER_ID</div></pre></td></tr></table></figure>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line"># Elasticsearch container</div><div class="line">$ docker inspect --format='{{.NetworkSettings.IPAddress}}' 4d5e7a1058de</div><div class="line">172.17.0.2</div><div class="line"></div><div class="line"># Kibana container</div><div class="line">$ docker inspect --format='{{.NetworkSettings.IPAddress}}' 4f26e64bfe82</div><div class="line">172.17.0.4</div></pre></td></tr></table></figure>
<p>Method 1: communicate using the container's IP address</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line"># Enter the Kibana container</div><div class="line">$ docker exec -it 4f26e64bfe82 /bin/bash</div><div class="line"></div><div class="line"># From the Kibana container, access the ES service using the ES container's IP address</div><div class="line">$ curl -XGET 'http://172.17.0.2:9200/_cat/health?pretty'</div><div class="line">1493707223 06:40:23 ben-es yellow 1 1 11 11 0 0 11 0 - 50.0%</div></pre></td></tr></table></figure>
<p>Method 2: communicate using the host's IP plus the port published by the container</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line"># Enter the Kibana container</div><div class="line">$ docker exec -it 4f26e64bfe82 /bin/bash</div><div class="line"></div><div class="line"># From the Kibana container, access the ES service using the host's IP address (my host's IP here is 10.10.1.129)</div><div class="line">$ curl -XGET 'http://10.10.1.129:9200/_cat/health?pretty'</div><div class="line">1493707223 06:40:23 ben-es yellow 1 1 11 11 0 0 11 0 - 50.0%</div></pre></td></tr></table></figure>
<p>Method 3: communicate using Docker's link mechanism</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div></pre></td><td class="code"><pre><div class="line"># Start the ES container first, using --name to name it elasticsearch_2.x_yunyu</div><div class="line">$ docker run -itd -p 9200:9200 -p 9300:9300 --name elasticsearch_2.x_yunyu birdben/elasticsearch_2.x:v2</div><div class="line"></div><div class="line"># Start the Kibana container, using --link to link it to the ES container by name: elasticsearch_2.x_yunyu</div><div class="line">$ docker run -itd -p 5601:5601 --link elasticsearch_2.x_yunyu --name kibana_4.x_yunyu birdben/kibana_4.x:v2</div><div class="line"></div><div class="line"># List the running containers</div><div class="line">$ docker ps</div><div class="line">CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS</div><div class="line">4f26e64bfe82 birdben/kibana_4.x:v2 "docker-entrypoint..." 25 hours ago Up 15 minutes 0.0.0.0:5601->5601/tcp kibana_4.x_yunyu</div><div class="line">4d5e7a1058de birdben/elasticsearch_2.x:v2 "docker-entrypoint..." 26 hours ago Up 19 hours 0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp elasticsearch_2.x_yunyu</div><div class="line"></div><div class="line"># From the Kibana container, access the ES service using the linked container name</div><div class="line">$ curl -XGET 'http://elasticsearch_2.x_yunyu:9200/_cat/health?pretty'</div><div class="line">1493707223 06:40:23 ben-es yellow 1 1 11 11 0 0 11 0 - 50.0%</div></pre></td></tr></table></figure>
<p>Under the hood, the --link mechanism simply adds a name-resolution entry for the ES container to the /etc/hosts file of the linking container. With that entry in place, the target container can be reached without using its IP. In addition, when the target container restarts, Docker updates /etc/hosts, so there is no need to worry about the name failing to resolve after the IP address changes.</p>
<p>/etc/hosts in the ES container</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line">127.0.0.1 localhost</div><div class="line">::1 localhost ip6-localhost ip6-loopback</div><div class="line">fe00::0 ip6-localnet</div><div class="line">ff00::0 ip6-mcastprefix</div><div class="line">ff02::1 ip6-allnodes</div><div class="line">ff02::2 ip6-allrouters</div><div class="line">172.17.0.2 4d5e7a1058de</div></pre></td></tr></table></figure>
<p>/etc/hosts in the Kibana container (it contains the entry added by --link)</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line">127.0.0.1 localhost</div><div class="line">::1 localhost ip6-localhost ip6-loopback</div><div class="line">fe00::0 ip6-localnet</div><div class="line">ff00::0 ip6-mcastprefix</div><div class="line">ff02::1 ip6-allnodes</div><div class="line">ff02::2 ip6-allrouters</div><div class="line">172.17.0.2 elasticsearch_2.x_yunyu 4d5e7a1058de</div><div class="line">172.17.0.4 4f26e64bfe82</div></pre></td></tr></table></figure>
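<p>The effect of such a hosts entry can be sketched with a small lookup over a hosts-format file. This is only an illustration of how a name maps to an IP: lookup_host is a hypothetical helper, the sample file reuses the addresses and names shown above, and a real resolver does more than this.</p>

```shell
# Write a small hosts-format file using the entries shown above.
# The temporary file is left in place so the helper can be reused.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
127.0.0.1 localhost
172.17.0.2 elasticsearch_2.x_yunyu 4d5e7a1058de
172.17.0.4 4f26e64bfe82
EOF

# Hypothetical helper: print the IP of the first line that lists the given name.
# Field 1 is the IP; every following field is a name or alias for it.
lookup_host() {
  awk -v name="$1" '{
    for (i = 2; i <= NF; i++) {
      if ($i == name) { print $1; exit }
    }
  }' "$hosts_file"
}

lookup_host elasticsearch_2.x_yunyu
lookup_host 4f26e64bfe82
```

<p>Both the container name and the container ID on the same line resolve to the same address, which is why either can be used in the URL.</p>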
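<p>Since Docker introduced its newer networking features, the link mechanism has become somewhat redundant. For compatibility with earlier versions, --link still behaves the same way on the default network. The newer networking also ships with an embedded DNS server, but it only takes effect on user-defined networks.</p>
<h3 id="跨主机不同容器之间通信"><a href="#跨主机不同容器之间通信" class="headerlink" title="跨主机不同容器之间通信"></a>Communication between containers on different hosts</h3><p>(To be continued)</p>
<h3 id="使用DockerCompose"><a href="#使用DockerCompose" class="headerlink" title="使用DockerCompose"></a>Using Docker Compose</h3><p>(To be continued)</p>
<p>References:</p>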
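<p>For comparison, the user-defined network approach might look like the following. This is a dry-run sketch that only prints the commands instead of executing them; es_net, es, and kibana are hypothetical names, and the image tags are the ones used earlier in this post.</p>

```shell
# Dry-run sketch: the docker commands are printed, not executed.
# es_net, es and kibana are hypothetical names chosen for illustration.
plan_user_defined_network() {
  # 1. Create a user-defined bridge network.
  echo "docker network create es_net"
  # 2. Attach both containers to it; on a user-defined network,
  #    containers can resolve each other by container name.
  echo "docker run -d --network es_net --name es birdben/elasticsearch_2.x:v2"
  echo "docker run -d --network es_net --name kibana -p 5601:5601 birdben/kibana_4.x:v2"
}

plan_user_defined_network
```

<p>With this layout the Kibana container would reach ES at http://es:9200 by name, without --link.</p>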
<ul>
<li><a href="https://jiajially.gitbooks.io/dockerguide/content/chapter_network_pro/index.html" target="_blank" rel="external">https://jiajially.gitbooks.io/dockerguide/content/chapter_network_pro/index.html</a></li>
<li><a href="https://opskumu.gitbooks.io/docker/content/chapter6.html" target="_blank" rel="external">https://opskumu.gitbooks.io/docker/content/chapter6.html</a></li>
<li><a href="http://tonybai.com/2016/01/15/understanding-container-networking-on-single-host/" target="_blank" rel="external">http://tonybai.com/2016/01/15/understanding-container-networking-on-single-host/</a></li>
<li><a href="https://yq.aliyun.com/articles/55912" target="_blank" rel="external">https://yq.aliyun.com/articles/55912</a></li>
<li><a href="https://yq.aliyun.com/articles/30345?spm=5176.100239.blogcont40494.28.FnfzAV" target="_blank" rel="external">https://yq.aliyun.com/articles/30345?spm=5176.100239.blogcont40494.28.FnfzAV</a></li>
<li><a href="http://int32bit.me/2016/05/10/Docker%E5%AE%9E%E7%8E%B0%E8%B7%A8%E4%B8%BB%E6%9C%BA%E9%80%9A%E4%BF%A1/" target="_blank" rel="external">http://int32bit.me/2016/05/10/Docker%E5%AE%9E%E7%8E%B0%E8%B7%A8%E4%B8%BB%E6%9C%BA%E9%80%9A%E4%BF%A1/</a></li>
</ul>
</div>
<div class="article-info article-info-index">
<div class="article-tag tagcloud">
<ul class="article-tag-list"><li class="article-tag-list-item"><a class="article-tag-list-link" href="/tags/Dockerfile/">Dockerfile</a></li><li class="article-tag-list-item"><a class="article-tag-list-link" href="/tags/Docker命令/">Docker命令</a></li></ul>
</div>
<div class="article-category tagcloud">
<a class="article-category-link" href="/categories/Docker/">Docker</a>
</div>
<div class="clearfix"></div>
</div>
</div>
</article>
<article id="post-Shell/AWK学习(二)" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="/2017/03/11/Shell/AWK学习(二)/" class="article-date">
<time datetime="2017-03-11T06:09:05.000Z" itemprop="datePublished">2017-03-11</time>
</a>
</div>
<div class="article-inner">
<input type="hidden" class="isFancy" />
<header class="article-header">
<h1 itemprop="name">
        <a class="article-title" href="/2017/03/11/Shell/AWK学习(二)/">Learning AWK (Part 2)</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
<h3 id="awk用法"><a href="#awk用法" class="headerlink" title="awk用法"></a>awk usage</h3><p>Note: standard awk works without installing gawk, but the gawk extensions require gawk to be installed first.</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"># Ubuntu</div><div class="line">$ sudo apt-get install gawk</div><div class="line"></div><div class="line"># Mac</div><div class="line">$ brew install gawk</div></pre></td></tr></table></figure>
<h3 id="awk命令行格式"><a href="#awk命令行格式" class="headerlink" title="awk命令行格式"></a>awk command-line format</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"># Form 1: give the rules directly on the awk command line</div><div class="line">awk [options] 'program' file ...</div><div class="line"></div><div class="line"># Form 2: name an awk script file that contains the rules</div><div class="line">awk [options] -f program-file file ...</div></pre></td></tr></table></figure>
<p>Form 1: give the rules directly on the awk command line</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">awk 'BEGIN {print "HEAD1\tHEAD2\tHEAD3\tHEAD4\n"} {print} END {print "END1\tEND2\tEND3\tEND4\n"}' test.txt</div></pre></td></tr></table></figure>
<p>Form 2: name an awk script file that contains the rules</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">awk -f command.awk marks.txt</div></pre></td></tr></table></figure>
<p>command.awk</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">BEGIN {print "HEAD1\tHEAD2\tHEAD3\tHEAD4\n"} {print} END {print "END1\tEND2\tEND3\tEND4\n"}</div></pre></td></tr></table></figure>
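<p>The script-file form can be tried end to end with a throwaway directory. The following is a minimal sketch, assuming any POSIX awk; the file contents and the shortened two-column header are made up for illustration.</p>

```shell
# Build a scratch input file and an awk script file, then run awk -f on them.
workdir=$(mktemp -d)
printf 'a\t1\nb\t2\n' > "$workdir/test.txt"

cat > "$workdir/command.awk" <<'EOF'
# Runs once before any input is read.
BEGIN { print "HEAD1\tHEAD2" }
# Runs for every input line; print with no arguments prints the whole line.
{ print }
# Runs once after all input has been read.
END { print "END1\tEND2" }
EOF

result=$(awk -f "$workdir/command.awk" "$workdir/test.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```

<p>The script file contains only the awk program; the input file is passed on the command line after the -f argument.</p>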
<h3 id="awk结构"><a href="#awk结构" class="headerlink" title="awk结构"></a>awk structure</h3><p>An awk program consists of a series of pattern {action} statements or function definitions.</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div></pre></td><td class="code"><pre><div class="line"># Actions must be enclosed in {}</div><div class="line">$ awk 'BEGIN {print "start"} {print} END {print "end"}' test.txt</div><div class="line"></div><div class="line"># BEGIN rule(s): the opening brace must be on the same line as BEGIN</div><div class="line">BEGIN {</div><div class="line"> print "start"</div><div class="line">}</div><div class="line"></div><div class="line"># Rule(s)</div><div class="line">{</div><div class="line"> # $0 is the implicit argument: the whole current line</div><div class="line"> print $0</div><div class="line">}</div><div class="line"></div><div class="line"># END rule(s): the opening brace must be on the same line as END</div><div class="line">END {</div><div class="line"> print "end"</div><div class="line">}</div></pre></td></tr></table></figure>
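<h3 id="awk原理"><a href="#awk原理" class="headerlink" title="awk原理"></a>How awk works</h3><p>1). awk scans the file line by line, from the first line to the last, looking for lines that match a given pattern and performing the requested operation on them.<br>2). The basic structure of awk consists of pattern matching (to find the lines to process) and processing (the action).<br> pattern {action}</p>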
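<h1 id="提示:awk读取文件内容的每一行时,将对比改行是否与给定的模式相匹配,如果匹配则执行处理过程,否则对该行不做任何处理。"><a href="#提示:awk读取文件内容的每一行时,将对比改行是否与给定的模式相匹配,如果匹配则执行处理过程,否则对该行不做任何处理。" class="headerlink" title="提示:awk读取文件内容的每一行时,将对比改行是否与给定的模式相匹配,如果匹配则执行处理过程,否则对该行不做任何处理。"></a>Note: as awk reads each line of input, it checks whether that line matches the given pattern; if it does, the action runs, otherwise nothing is done for that line.</h1><p>If no action is given, matching lines are written to standard output, i.e. the default action is print;<br>if no pattern is given, every line matches.<br>3). awk has two special patterns, BEGIN and END: BEGIN rules run before any input is read, and END rules run after all input has been read.</p>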
<h3 id="标准awk选项"><a href="#标准awk选项" class="headerlink" title="标准awk选项"></a>Standard awk options</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line"># -v : assigns a value to a variable before the program starts running; in gawk the result can be inspected with --dump-variables[=file]</div><div class="line">$ awk -v bird=birdben 'BEGIN {print "bird=" bird}'</div><div class="line">bird=birdben</div></pre></td></tr></table></figure>
<h3 id="标准awk内置变量"><a href="#标准awk内置变量" class="headerlink" title="标准awk内置变量"></a>Standard awk built-in variables</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div><div class="line">58</div><div class="line">59</div><div class="line">60</div><div class="line">61</div><div class="line">62</div><div class="line">63</div><div class="line">64</div><div class="line">65</div><div class="line">66</div><div class="line">67</div><div class="line">68</div><div class="line">69</div><div class="line">70</div><div 
class="line">71</div><div class="line">72</div><div class="line">73</div><div class="line">74</div><div class="line">75</div><div class="line">76</div><div class="line">77</div><div class="line">78</div><div class="line">79</div><div class="line">80</div><div class="line">81</div><div class="line">82</div><div class="line">83</div><div class="line">84</div><div class="line">85</div><div class="line">86</div><div class="line">87</div><div class="line">88</div><div class="line">89</div><div class="line">90</div><div class="line">91</div><div class="line">92</div><div class="line">93</div><div class="line">94</div><div class="line">95</div><div class="line">96</div><div class="line">97</div><div class="line">98</div><div class="line">99</div><div class="line">100</div><div class="line">101</div><div class="line">102</div><div class="line">103</div><div class="line">104</div><div class="line">105</div><div class="line">106</div><div class="line">107</div><div class="line">108</div><div class="line">109</div><div class="line">110</div><div class="line">111</div><div class="line">112</div><div class="line">113</div><div class="line">114</div><div class="line">115</div><div class="line">116</div><div class="line">117</div><div class="line">118</div><div class="line">119</div><div class="line">120</div><div class="line">121</div><div class="line">122</div><div class="line">123</div><div class="line">124</div><div class="line">125</div><div class="line">126</div><div class="line">127</div><div class="line">128</div><div class="line">129</div><div class="line">130</div><div class="line">131</div><div class="line">132</div><div class="line">133</div><div class="line">134</div><div class="line">135</div><div class="line">136</div><div class="line">137</div><div class="line">138</div><div class="line">139</div><div class="line">140</div><div class="line">141</div><div class="line">142</div><div class="line">143</div><div class="line">144</div><div class="line">145</div><div 
class="line">146</div><div class="line">147</div><div class="line">148</div><div class="line">149</div><div class="line">150</div><div class="line">151</div></pre></td><td class="code"><pre><div class="line"># ARGC : number of command-line arguments passed to awk</div><div class="line">$ awk 'BEGIN {print "ARGC=" ARGC}'</div><div class="line">ARGC=1</div><div class="line"></div><div class="line">$ awk 'BEGIN {print "ARGC=" ARGC}' test1 test2</div><div class="line">ARGC=3</div><div class="line"></div><div class="line"># ARGV : array holding the command-line arguments, indexed from 0 to ARGC - 1.</div><div class="line">$ awk 'BEGIN {print "ARGV[0]=" ARGV[0]}'</div><div class="line">ARGV[0]=awk</div><div class="line"></div><div class="line">$ awk 'BEGIN {print "ARGV[1]=" ARGV[1] "\t" "ARGV[2]=" ARGV[2]}' test1 test2</div><div class="line">ARGV[1]=test1 ARGV[2]=test2</div><div class="line"></div><div class="line"># Loop over the ARGV array and print each argument</div><div class="line">$ awk 'BEGIN {</div><div class="line">    for (i = 0; i <= ARGC - 1; i++) {</div><div class="line">        print "ARGV[" i "] = " ARGV[i]</div><div class="line">        printf "ARGV[%d] = %s\n", i, ARGV[i]</div><div class="line">    }</div><div class="line">}' test1 test2</div><div class="line">ARGV[0] = awk</div><div class="line">ARGV[0] = awk</div><div class="line">ARGV[1] = test1</div><div class="line">ARGV[1] = test1</div><div class="line">ARGV[2] = test2</div><div class="line">ARGV[2] = test2</div><div class="line"></div><div class="line"># A side note on the difference between print and printf</div><div class="line"># print writes its arguments unformatted and appends a newline automatically</div><div class="line"># printf writes formatted output and does not append a newline</div><div class="line"></div><div class="line"># The printf call above omits the parentheses; the fully parenthesized form is as follows</div><div class="line">$ awk 'BEGIN {</div><div class="line">    for (i = 0; i <= ARGC - 1; i++) {</div><div class="line">        print "ARGV[" i "] = " ARGV[i]</div><div class="line">        printf("ARGV[%d] = %s\n", i, ARGV[i])</div><div class="line">    }</div><div class="line">}' test1 test2</div><div class="line">ARGV[0] = awk</div><div class="line">ARGV[0] = 
awk</div><div class="line">ARGV[1] = test1</div><div class="line">ARGV[1] = test1</div><div class="line">ARGV[2] = test2</div><div class="line">ARGV[2] = test2</div><div class="line"></div><div class="line"># CONVFMT : format used when converting numbers to strings; the default is %.6g</div><div class="line">$ awk 'BEGIN { print "Conversion Format = " CONVFMT }'</div><div class="line">Conversion Format = %.6g</div><div class="line"></div><div class="line"># ENVIRON : associative array of environment variables; look up a variable's value by its name (key-value access).</div><div class="line">$ awk 'BEGIN { print ENVIRON["JAVA_HOME"] }'</div><div class="line">/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home</div><div class="line"></div><div class="line"># FILENAME : name of the file currently being read.</div><div class="line"># Note: read it in END (or in a regular rule), not in BEGIN. In the BEGIN block awk has not yet started reading test.txt, so FILENAME is still undefined.</div><div class="line">$ awk 'END {print "FILENAME = " FILENAME}' test.txt</div><div class="line">FILENAME = test.txt</div><div class="line"></div><div class="line"># FS : input field separator; the default is a single space. The -F command-line option changes it.</div><div class="line">$ awk 'BEGIN {print "FS = " FS}' | cat -vte</div><div class="line">FS = $</div><div class="line"></div><div class="line"># NF : number of fields in the current input record. (A field is one of the pieces the current line is split into by the separator; in the example below, One, Two and Three are fields.)</div><div class="line"># Print the number of fields on each line</div><div class="line">$ echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk '{print "NF = " NF}'</div><div class="line">NF = 2</div><div class="line">NF = 3</div><div class="line">NF = 4</div><div class="line"></div><div class="line"># Print only the lines that have more than 2 fields</div><div class="line">$ echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk 'NF > 2'</div><div class="line">$ echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk 'NF > 2 {print}'</div><div class="line">One Two Three</div><div class="line">One Two Three Four</div><div class="line"></div><div class="line"># NR : number of records read so far.</div><div class="line"># Print the record number of each line, i.e. the current line number</div><div class="line">$ echo -e "One Two\nOne Two Three\nOne Two Three 
Four" | awk '{print "NR = " NR}'</div><div class="line">NR = 1</div><div class="line">NR = 2</div><div class="line">NR = 3</div><div class="line"></div><div class="line"># Print only the lines whose record number is greater than 2</div><div class="line">$ echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk 'NR > 2 {print}'</div><div class="line">One Two Three Four</div><div class="line"></div><div class="line"># FNR : like NR, but relative to the current file, which matters when several input files are processed. FNR restarts from 0 each time a new file is opened, so the first record of every file has FNR = 1.</div><div class="line">$ awk '{print "FNR = " FNR "\t" "NR = " NR}' test.txt test1.txt</div><div class="line">FNR = 1 NR = 1</div><div class="line">FNR = 2 NR = 2</div><div class="line">FNR = 3 NR = 3</div><div class="line">FNR = 4 NR = 4</div><div class="line">FNR = 5 NR = 5</div><div class="line">FNR = 1 NR = 6</div><div class="line">FNR = 2 NR = 7</div><div class="line">FNR = 3 NR = 8</div><div class="line">FNR = 4 NR = 9</div><div class="line">FNR = 5 NR = 10</div><div class="line"></div><div class="line"># OFMT : output format for numbers; the default is %.6g.</div><div class="line">$ awk 'BEGIN {print "OFMT = " OFMT}'</div><div class="line">OFMT = %.6g</div><div class="line"></div><div class="line"># OFS : output field separator; the default is a single space.</div><div class="line">$ awk 'BEGIN {print "OFS = " OFS}' | cat -vte</div><div class="line"></div><div class="line"># Here ^I is the separator used in our test.txt: the tab character \t</div><div class="line">$ awk 'BEGIN {print "OFS = " OFS} {print}' test.txt | cat -vte</div><div class="line">OFS = $</div><div class="line">1^Ibirdben^I^Ibejing^I^I28$</div><div class="line">2^Ierhuo^I^Ishanghai^I30$</div><div class="line">3^Izhangsan^Ishanghai^I20$</div><div class="line">4^Ilisi^I^Ishenzhen^I25$</div><div class="line">5^Iwangwu^I^Ibeijing^I^I28$</div><div class="line"></div><div class="line"># ORS : output record (line) separator; the default is a newline.</div><div class="line">$ awk 'BEGIN {print "ORS = " ORS}' | cat -vte</div><div class="line">ORS = $</div><div class="line">$</div><div class="line"></div><div class="line"># RLENGTH : length of the string matched by the match function. awk's match 
function searches the input string for a given pattern.</div><div class="line">$ awk 'BEGIN { if (match("One Two Three", "re")) { print RLENGTH } }'</div><div class="line"></div><div class="line"># RS : input record separator; the default is a newline.</div><div class="line">$ awk 'BEGIN {print "RS = " RS}' | cat -vte</div><div class="line">RS = $</div><div class="line">$</div><div class="line"></div><div class="line"># RSTART : position of the first character of the string matched by the match function, counting from 1.</div><div class="line">$ awk 'BEGIN { if (match("One Two Three", "Thre")) { print RSTART } }'</div><div class="line"></div><div class="line"># SUBSEP : separator for array subscripts; the default is \034.</div><div class="line">$ awk 'BEGIN { print "SUBSEP = " SUBSEP }' | cat -vte</div><div class="line">SUBSEP = ^\$</div><div class="line"></div><div class="line"># $0 : the entire input record.</div><div class="line">$ awk '{print $0}' test.txt</div><div class="line">1 birdben bejing 28</div><div class="line">2 erhuo shanghai 30</div><div class="line">3 zhangsan shanghai 20</div><div class="line">4 lisi shenzhen 25</div><div class="line">5 wangwu beijing 28</div><div class="line"></div><div class="line"># $n : the n-th field of the current input record; fields are separated by FS.</div><div class="line">$ awk '{print $1 "\t" $2}' test.txt</div><div class="line">1 birdben</div><div class="line">2 erhuo</div><div class="line">3 zhangsan</div><div class="line">4 lisi</div><div class="line">5 wangwu</div></pre></td></tr></table></figure>
<h3 id="gawk内置变量"><a href="#gawk内置变量" class="headerlink" title="gawk内置变量"></a>gawk built-in variables</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div></pre></td><td class="code"><pre><div class="line"># ARGIND : index in ARGV of the file currently being processed.</div><div class="line">$ gawk '{</div><div class="line">    print "ARGIND = " ARGIND "\t" "FileName = " ARGV[ARGIND]</div><div class="line">}' test.txt test1.txt</div><div class="line">ARGIND = 1 FileName = test.txt</div><div class="line">ARGIND = 1 FileName = test.txt</div><div class="line">ARGIND = 1 FileName = test.txt</div><div class="line">ARGIND = 1 FileName = test.txt</div><div class="line">ARGIND = 1 FileName = test.txt</div><div class="line">ARGIND = 2 FileName = test1.txt</div><div class="line">ARGIND = 2 FileName = test1.txt</div><div class="line">ARGIND = 2 FileName = test1.txt</div><div class="line">ARGIND = 2 FileName = test1.txt</div><div class="line">ARGIND = 2 FileName = test1.txt</div><div class="line"></div><div class="line"># IGNORECASE : when this variable is set, gawk matches case-insensitively.</div><div class="line">$ gawk 'BEGIN{IGNORECASE=1} /BIRDBEN/' test.txt</div><div class="line">1 birdben bejing 28</div><div class="line"></div><div class="line"># LINT : provides dynamic control of the --lint option from within a gawk program. When it is set, gawk prints lint warnings. Assigning it the string value fatal turns all lint warnings into fatal errors, just like --lint=fatal.</div><div class="line"># 
With LINT set, gawk checks the program and prints messages according to the chosen level</div><div class="line">$ gawk 'BEGIN {LINT=1; a}'</div><div class="line">gawk: cmd. line:1: warning: reference to uninitialized variable `a'</div><div class="line">gawk: cmd. line:1: warning: statement has no effect</div></pre></td></tr></table></figure>
<h3 id="gawk选项"><a href="#gawk选项" class="headerlink" title="gawk选项"></a>gawk options</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div></pre></td><td class="code"><pre><div class="line"># --dump-variables[=file] : writes a sorted list of global variables and their final values to a file; the default file is awkvars.out</div><div class="line">$ gawk -v bird=birdben --dump-variables=bird_var.out 'BEGIN {print "bird="bird}'</div><div class="line">bird=birdben</div><div class="line"></div><div class="line">$ cat bird_var.out</div><div class="line">ARGC: 1</div><div class="line">ARGIND: 0</div><div class="line">ARGV: array, 1 elements</div><div class="line">BINMODE: 0</div><div class="line">CONVFMT: "%.6g"</div><div class="line">ENVIRON: array, 36 elements</div><div class="line">ERRNO: ""</div><div class="line">FIELDWIDTHS: 
""</div><div class="line">FILENAME: ""</div><div class="line">FNR: 0</div><div class="line">FPAT: "[^[:space:]]+"</div><div class="line">FS: " "</div><div class="line">FUNCTAB: array, 41 elements</div><div class="line">IGNORECASE: 0</div><div class="line">LINT: 0</div><div class="line">NF: 0</div><div class="line">NR: 0</div><div class="line">OFMT: "%.6g"</div><div class="line">OFS: " "</div><div class="line">ORS: "\n"</div><div class="line">PREC: 53</div><div class="line">PROCINFO: array, 31 elements</div><div class="line">RLENGTH: 0</div><div class="line">ROUNDMODE: "N"</div><div class="line">RS: "\n"</div><div class="line">RSTART: 0</div><div class="line">RT: ""</div><div class="line">SUBSEP: "\034"</div><div class="line">SYMTAB: array, 29 elements</div><div class="line">TEXTDOMAIN: "messages"</div><div class="line">bird: "birdben"</div><div class="line"></div><div class="line"># --profile[=file] : writes a pretty-printed version of the program to a file; the default file is awkprof.out</div><div class="line">$ gawk --profile=bird_profile.out -v bird=birdben 'BEGIN {print "bird="bird}'</div><div class="line">bird=birdben</div><div class="line"></div><div class="line">$ cat bird_profile.out</div><div class="line"># gawk profile, created Sat Mar 4 13:25:49 2017</div><div class="line"># BEGIN rule(s)</div><div class="line">BEGIN {</div><div class="line"> 1 print "bird=" bird</div><div class="line">}</div></pre></td></tr></table></figure>
<h3 id="awk条件判断"><a href="#awk条件判断" class="headerlink" title="awk条件判断"></a>awk conditionals</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"># if-else conditional</div><div class="line">if (condition)</div><div class="line">    action-1</div><div class="line">else</div><div class="line">    action-2</div></pre></td></tr></table></figure>
<h3 id="awk循环用法"><a href="#awk循环用法" class="headerlink" title="awk循环用法"></a>awk loops</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div></pre></td><td class="code"><pre><div class="line"># for loop syntax</div><div class="line">for (initialisation; condition; increment/decrement)</div><div class="line">    action</div><div class="line"></div><div class="line"># while loop syntax</div><div class="line">while (condition)</div><div class="line">    action</div><div class="line"></div><div class="line"># do-while loop syntax</div><div class="line">do</div><div class="line">    action</div><div class="line">while (condition)</div><div class="line"></div><div class="line"># break : terminates the enclosing loop.</div><div class="line"></div><div class="line"># continue : skips the rest of the current iteration and moves straight on to the next one.</div><div class="line"></div><div class="line"># exit : stops execution of the script.</div><div class="line"></div><div class="line"># next : skips all remaining patterns and actions for the current line and moves on to the next input line, avoiding needless work. Usually used together with if-else.</div></pre></td></tr></table></figure>
<h3 id="awk内置函数"><a href="#awk内置函数" class="headerlink" title="awk内置函数"></a>awk built-in functions</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div></pre></td><td class="code"><pre><div class="line"># String functions (asort, asorti and strtonum are gawk extensions)</div><div class="line">asort(arr [, d [, how] ]) : sorts the values of arr using gawk's usual rules for comparing values, then replaces the indices of the sorted values with sequential integers starting at 1.</div><div class="line">asorti(arr [, d [, how] ]) : behaves much like asort; the difference is that asort sorts the array's values, whereas asorti sorts its indices.</div><div class="line">gsub(regex, sub, string) : gsub stands for global substitution. It replaces every match of regex with sub. The third argument, string, is optional and defaults to $0, meaning the whole input record is searched.</div><div class="line">index(str, sub) : checks whether sub is a substring of str. If it is, returns the position in str where sub starts; otherwise returns 0. Character positions in str are counted from 1.</div><div class="line">length(str) : returns the length of the string.</div><div class="line">match(str, regex) : returns the position in str of the first, longest match of the regular expression, or 0 if nothing matches.</div><div class="line">split(str, arr, regex) : splits str using the regular expression regex and stores the pieces in the array arr. If regex is omitted, FS is used.</div><div class="line">sprintf(format, expr-list) : builds a string from expr-list according to format and returns it.</div><div class="line">strtonum(str) : converts the string str to a number. A string starting with 0 is treated as octal; one starting with 0x or 0X is treated as hexadecimal; anything else is treated as a decimal (possibly floating-point) number.</div><div class="line">sub(regex, sub, string) : performs a single substitution, replacing the first match of regex with sub. The third argument is optional and defaults to $0.</div><div class="line">substr(str, start, l) : returns the substring of str that starts at character start and is l characters long. If l is omitted, returns the rest of str from character start onward.</div><div 
class="line">tolower(str) : returns a copy of str with all uppercase letters converted to lowercase. Note that str itself is not modified.</div><div class="line">toupper(str) : returns a copy of str with all lowercase letters converted to uppercase. Note that str itself is not modified.</div><div class="line"></div><div class="line"># Time functions (gawk extensions)</div><div class="line">systime() : returns the number of seconds since the Epoch (on POSIX systems, the Epoch is 1970-01-01 00:00:00 UTC).</div><div class="line">mktime(datespec) : converts the string datespec into a timestamp like the one returned by systime. The datespec format is YYYY MM DD HH MM SS.</div><div class="line">strftime([format [, timestamp[, utc-flag]]]) : formats the timestamp according to format.</div></pre></td></tr></table></figure>
<h3 id="awk基本用法"><a href="#awk基本用法" class="headerlink" title="awk基本用法"></a>Basic awk usage</h3><p>Contents of the sample file test.txt</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line">1 birdben bejing 28</div><div class="line">2 erhuo shanghai 30</div><div class="line">3 zhangsan shanghai 20</div><div class="line">4 lisi shenzhen 25</div><div class="line">5 wangwu beijing 28</div></pre></td></tr></table></figure>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div></pre></td><td class="code"><pre><div class="line"># Print the contents of test.txt</div><div class="line">$ awk '{print}' test.txt</div><div class="line">$ awk '{print $0}' test.txt</div><div class="line">1 birdben bejing 28</div><div class="line">2 erhuo shanghai 30</div><div class="line">3 zhangsan shanghai 20</div><div class="line">4 lisi shenzhen 25</div><div class="line">5 wangwu beijing 28</div><div class="line"></div><div class="line"># Print selected columns of test.txt</div><div class="line">$ awk '{print $2}' test.txt</div><div class="line">birdben</div><div 
class="line">erhuo</div><div class="line">zhangsan</div><div class="line">lisi</div><div class="line">wangwu</div><div class="line"></div><div class="line">$ awk '{print $1 "\t" $2}' test.txt</div><div class="line">1 birdben</div><div class="line">2 erhuo</div><div class="line">3 zhangsan</div><div class="line">4 lisi</div><div class="line">5 wangwu</div><div class="line"></div><div class="line"># Print the lines of test.txt that match a pattern; the two forms below are equivalent</div><div class="line">$ awk '/birdben/' test.txt</div><div class="line">$ awk '/birdben/ {print}' test.txt</div><div class="line">1 birdben bejing 28</div><div class="line"></div><div class="line"># Print selected columns of the matching lines in test.txt</div><div class="line">$ awk '/birdben/ {print $1 "\t" $2}' test.txt</div><div class="line">1 birdben</div><div class="line"></div><div class="line"># Print the number of matching lines in test.txt at the end</div><div class="line">$ awk '/birdben/{++matchCount} END {print "matchCount="matchCount}' test.txt</div><div class="line">matchCount=1</div><div class="line"></div><div class="line"># Add a header row, then print the file</div><div class="line">$ awk 'BEGIN {printf "No\tName\t\tCity\t\tAge\n"} {print}' test.txt</div><div class="line">$ awk 'BEGIN {print "No\tName\t\tCity\t\tAge"} {print}' test.txt</div><div class="line">No Name City Age</div><div class="line">1 birdben bejing 28</div><div class="line">2 erhuo shanghai 30</div><div class="line">3 zhangsan shanghai 20</div><div class="line">4 lisi shenzhen 25</div><div class="line">5 wangwu beijing 28</div><div class="line"></div><div class="line"># Print the lines that are longer than 20 characters</div><div class="line">$ awk 'length($0) > 20' test.txt</div><div class="line">$ awk 'length($0) > 20 {print}' test.txt</div><div class="line">1 birdben bejing 28</div><div class="line">3 zhangsan shanghai 20</div><div class="line">5 wangwu beijing 28</div></pre></td></tr></table></figure>
<p>References:</p>
<ul>
<li><a href="http://wiki.jikexueyuan.com/project/awk/" target="_blank" rel="external">http://wiki.jikexueyuan.com/project/awk/</a></li>
</ul>
</div>
<div class="article-info article-info-index">
<div class="article-tag tagcloud">
<ul class="article-tag-list"><li class="article-tag-list-item"><a class="article-tag-list-link" href="/tags/AWK/">AWK</a></li></ul>
</div>
<div class="article-category tagcloud">
<a class="article-category-link" href="/categories/Shell/">Shell</a>
</div>
<div class="clearfix"></div>
</div>
</div>
</article>
<article id="post-Java/Python和Java服务器通信实现的理解和比较" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="/2017/03/05/Java/Python和Java服务器通信实现的理解和比较/" class="article-date">
<time datetime="2017-03-05T10:41:38.000Z" itemprop="datePublished">2017-03-05</time>
</a>
</div>
<div class="article-inner">
<input type="hidden" class="isFancy" />
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="/2017/03/05/Java/Python和Java服务器通信实现的理解和比较/">Understanding and comparing how Python and Java servers communicate</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
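<h3 id="Python的WSGI和Java的Servlet-API"><a href="#Python的WSGI和Java的Servlet-API" class="headerlink" title="Python的WSGI和Java的Servlet API"></a>Python's WSGI and Java's Servlet API</h3><h4 id="Python的WSGI"><a href="#Python的WSGI" class="headerlink" title="Python的WSGI"></a>Python's WSGI</h4><p>While learning to write web servers in Python recently, I came across the concept of WSGI. The Python Web Server Gateway Interface (WSGI) is an interface between Python applications or frameworks and web servers. It has been widely adopted and has largely achieved its portability goal. There is no single official implementation of WSGI, because WSGI is more of a protocol: any application that follows it can run on any server that follows it, and vice versa.</p>
<p>Without WSGI, the Python web framework you choose would restrict which web servers you can use.</p>
<p>In other words, you could only use server and framework combinations that happen to work together, rather than the server or framework you actually want.</p>
<p>So how can you use the server of your choice with several different web frameworks, without modifying either the server code or the framework code? The Python Web Server Gateway Interface (WSGI) was created to solve exactly this problem.</p>
<p>WSGI lets developers choose the web framework and the web server independently of each other. You can mix and match servers and frameworks and pick the combination that fits your needs: for example, you can run a Django, Flask or Pyramid application on Gunicorn, Nginx/uWSGI or Waitress. This free mixing works precisely because both the servers and the frameworks support WSGI.</p>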
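<h4 id="Java的Servlet-API"><a href="#Java的Servlet-API" class="headerlink" title="Java的Servlet API"></a>Java's Servlet API</h4><p>The same situation can be described by analogy with Java:</p>
<p>Without the Java Servlet API, the Java web container you choose (an implementation built on Java socket programming) would restrict which Java web frameworks you can use. With no Servlet API, SpringMVC might define its own SpringMVCHttpRequest and SpringMVCHttpResponse, and Struts2 its own Struts2HttpRequest and Struts2HttpResponse; if Tomcat supported only SpringMVC's API, then choosing Tomcat would mean writing all server-side code with SpringMVC.</p>
<p>In other words, you could only use a server (Tomcat) and framework (SpringMVC) combination that happens to work, rather than the server or framework you want (for example, switching to Tomcat + Struts2).</p>
<p>Note: this assumes the Java Servlet API does not exist. SpringMVC and Struts2 would then each have to implement their own servlet-like wrappers around HttpRequest and HttpResponse, so moving from SpringMVC to Struts2 would mean rewriting almost all of the server-side code. To solve this problem, Java introduced the Java Servlet API: every web framework uses this API to talk to the Java web server (for example Tomcat), while complex concerns such as network connection management are left to the Java web server, which implements them with Java socket programming.</p>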
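<h4 id="详细说说Python的WSGI"><a href="#详细说说Python的WSGI" class="headerlink" title="详细说说Python的WSGI"></a>WSGI in more detail</h4><p>In Python web development, the server side can be split into two parts: the server program and the application. The former receives and parses client requests; the latter handles the actual logic. To make applications easier to write, common functionality is packaged into web frameworks such as Django, Flask and Tornado. Each framework has its own development style, but whatever the style, the resulting application must cooperate with a server program to serve users. Without a common interface, the server program would have to provide different support for each framework. That is bad for both sides: a server has to support every framework, and a framework's applications can only run on the servers that support that framework.</p>
<p>This is where standardization matters. Once a standard is defined, any server that supports it can work with any framework that supports it, and each side implements the standard independently. A server can then work with more frameworks, and a framework can run on more servers.</p>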
<ul>
<li>Server side:</li>
</ul>
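<p>The server must pass the contents of the iterable to the client. The iterable yields bytestrings, and the server must finish transmitting each bytestring before requesting the next one.</p>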
<ul>
<li>Application side:</li>
</ul>
<p>Each time a client request arrives, the server program calls the application we wrote and returns the result to the client.</p>
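<p>The division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: hello_app and call_app are hypothetical names that do not come from any library, and the "server" here simply collects the bytestrings in memory where a real one would write them to a socket.</p>

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Hypothetical WSGI application: returns an iterable of bytestrings."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello, ", b"WSGI!\n"]


def call_app(app, environ):
    """Play the server's role for one request and collect what the app produces."""
    captured = {}
    chunks = []

    def start_response(status, headers, exc_info=None):
        # The application reports the status line and headers through this callback.
        captured["status"] = status
        captured["headers"] = headers
        return chunks.append  # legacy write() callable, unused by hello_app

    result = app(environ, start_response)
    try:
        for chunk in result:
            # A real server would send each bytestring to the client here,
            # finishing one before asking the iterable for the next.
            chunks.append(chunk)
    finally:
        # Servers are expected to call close() on the iterable if it has one.
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], b"".join(chunks)


environ = {}
setup_testing_defaults(environ)  # fill in a plausible GET request for testing
status, headers, body = call_app(hello_app, environ)
print(status, body)
```

<p>Because the application only ever sees environ and start_response, the same hello_app could be handed unchanged to any WSGI server.</p>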
<p>Summary:</p>
<ul>
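<li>The Web Server Gateway Interface is the uniform interface for writing web applications in Python.</li>
<li>No matter how complex a web application is, its entry point is a single WSGI handler function.</li>
<li>Writing a web application means writing a WSGI handler function whose main job is to let users browse and modify data interactively, generate dynamic web content, and respond to each HTTP request.</li>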
</ul>
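<p>To make the "single entry point" idea concrete, here is a minimal sketch of one WSGI handler function that dispatches on the request path. The routes, the function name application and the request helper are all assumptions made up for this illustration; real frameworks do the same kind of dispatch with far more machinery.</p>

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Single entry point: every request arrives here, whatever the URL."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        status, text = "200 OK", "index page"
    elif path.startswith("/hello/"):
        name = path[len("/hello/"):]
        status, text = "200 OK", "hello, " + name
    else:
        status, text = "404 Not Found", "no such page"
    body = text.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def request(path):
    """Hypothetical helper: invoke the application without a real server."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # keeps PATH_INFO, fills in the other keys
    seen = []
    chunks = application(environ, lambda status, headers: seen.append(status))
    return seen[0], b"".join(chunks)


print(request("/"))
print(request("/hello/bird"))
print(request("/missing"))
```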
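<h4 id="实现Python的Web应用程序能被访问的方式"><a href="#实现Python的Web应用程序能被访问的方式" class="headerlink" title="实现Python的Web应用程序能被访问的方式"></a>Ways to make a Python web application reachable</h4><p>For a program written in Python to be reachable on the web, you also need an HTTP server that supports Python, that is, an HTTP server that implements the server side of the WSGI protocol. There are several options:</p>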
<ul>
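<li>Implement an HTTP server yourself with Python socket programming.</li>
<li>Use an open-source HTTP server that supports Python (such as uWSGI, wsgiref or Mod_WSGI). General-purpose HTTP servers such as Nginx, Apache or Lighttpd need a separately installed module or plugin that provides WSGI server support.</li>
<li>Use the HTTP server built into an open-source Python web framework such as Flask or Django (for example Django's bundled WSGI server, which is generally used for testing).</li>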
</ul>
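<p>As a rough idea of what the first option involves, the sketch below uses a raw socket to accept one connection, builds a very small environ, calls a WSGI application and writes the response. It is a toy under stated assumptions: serve_one and app are hypothetical names, only a handful of the environ keys a real server must provide are filled in, and there is no error handling, request body support or concurrency.</p>

```python
import http.client
import socket
import threading


def app(environ, start_response):
    """Hypothetical WSGI application used to exercise the toy server."""
    body = ("you asked for " + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve_one(listener, application):
    """Accept a single connection, run the WSGI app once, send the reply."""
    conn, _addr = listener.accept()
    with conn:
        # Read until the blank line that ends the request headers.
        raw = b""
        while b"\r\n\r\n" not in raw:
            data = conn.recv(1024)
            if not data:
                break
            raw += data
        request_line = raw.split(b"\r\n", 1)[0].decode("iso-8859-1")
        method, target, _version = request_line.split(" ", 2)
        path, _, query = target.partition("?")
        # Only a few of the keys a complete server would provide.
        environ = {
            "REQUEST_METHOD": method,
            "PATH_INFO": path,
            "QUERY_STRING": query,
            "SERVER_PROTOCOL": "HTTP/1.0",
            "wsgi.url_scheme": "http",
        }
        response = {}

        def start_response(status, headers, exc_info=None):
            response["status"] = status
            response["headers"] = headers

        chunks = list(application(environ, start_response))
        head = "HTTP/1.0 %s\r\n" % response["status"]
        head += "".join("%s: %s\r\n" % pair for pair in response["headers"])
        conn.sendall(head.encode("iso-8859-1") + b"\r\n" + b"".join(chunks))


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

worker = threading.Thread(target=serve_one, args=(listener, app))
worker.start()

client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
client.request("GET", "/ping?x=1")
reply = client.getresponse()
status, payload = reply.status, reply.read()
client.close()
worker.join()
listener.close()
print(status, payload)
```

<p>Everything above the call to application is the kind of work that Gunicorn, uWSGI or wsgiref take off the developer's hands.</p>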
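<h5 id="Python标准库对WSGI的实现"><a href="#Python标准库对WSGI的实现" class="headerlink" title="Python标准库对WSGI的实现"></a>The WSGI implementation in the Python standard library</h5><p>wsgiref is the reference implementation of WSGI in the Python standard library. Its simple_server module implements a simple HTTP server.</p>
<p>simple_server.py in the wsgiref source illustrates the division of labor described above: the server's main job is to accept requests from the client and hand each one to RequestHandlerClass, which processes it and sends the result back to the client.</p>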
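<p>A minimal sketch of wsgiref.simple_server in use follows. make_server, WSGIRequestHandler and handle_request are part of the standard library; demo_app and QuietHandler are names invented for this example, and the server is deliberately limited to a single request so that the script terminates on its own.</p>

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


def demo_app(environ, start_response):
    """Hypothetical WSGI application served by wsgiref."""
    body = b"served by wsgiref\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


class QuietHandler(WSGIRequestHandler):
    """Request handler that suppresses the per-request log line."""

    def log_message(self, format, *args):
        pass


# Port 0 asks the operating system for any free port.
httpd = make_server("127.0.0.1", 0, demo_app, handler_class=QuietHandler)
port = httpd.server_port

# handle_request() serves exactly one request and then returns.
worker = threading.Thread(target=httpd.handle_request)
worker.start()

client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
client.request("GET", "/")
reply = client.getresponse()
status, payload = reply.status, reply.read()
client.close()
worker.join()
httpd.server_close()
print(status, payload)
```

<p>For a long-running development server, httpd.serve_forever() would replace the single handle_request() call.</p>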
<h5 id="uWSGI服务器"><a href="#uWSGI服务器" class="headerlink" title="uWSGI服务器"></a>The uWSGI server</h5><p>uWSGI is a web server that implements the WSGI, uwsgi and HTTP protocols, among others. Note that uwsgi is a wire protocol, whereas uWSGI is a web server that implements both the uwsgi protocol and WSGI.</p>
<h5 id="Django框架内置的WSGI-Server服务器"><a href="#Django框架内置的WSGI-Server服务器" class="headerlink" title="Django框架内置的WSGI Server服务器"></a>The WSGI server built into Django</h5><p>Django's WSGIServer inherits from wsgiref.simple_server.WSGIServer, and its WSGIRequestHandler inherits from wsgiref.simple_server.WSGIRequestHandler.</p>
<p>The application mentioned earlier is, in Django, usually a django.core.handlers.wsgi.WSGIHandler object. WSGIHandler inherits from django.core.handlers.base.BaseHandler, which contains Django's core request-handling logic. It creates a WSGIRequest instance, and WSGIRequest in turn inherits from http.HttpRequest.</p>
<h4 id="Python和Java的类比"><a href="#Python和Java的类比" class="headerlink" title="Python和Java的类比"></a>Comparing Python and Java</h4><h5 id="Python和Java的服务器结构"><a href="#Python和Java的服务器结构" class="headerlink" title="Python和Java的服务器结构"></a>Server architecture in Python and Java</h5><ul>
<li>Standalone WSGI server (which also implements HTTP server functionality) + Python web application<ul>
<li>For example: Gunicorn or uWSGI + Django or Flask</li>
</ul>
</li>
<li>Standalone Servlet engine (a Java application server, which also implements HTTP server functionality) + Java web application<ul>
<li>For example: Jetty or Tomcat + SpringMVC or Struts2</li>
</ul>
</li>
</ul>
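<h5 id="Python和Java服务器共同点"><a href="#Python和Java服务器共同点" class="headerlink" title="Python和Java服务器共同点"></a>What Python and Java servers have in common</h5><ul>
<li><p>WSGI servers (for example Gunicorn and uWSGI)</p>
<ul>
<li>Every WSGI server has internal components that create and manage socket connections.</li>
<li>Every WSGI server implements HTTP server functionality: it accepts HTTP requests and returns dynamic web content after the Python web application has processed them.</li>
</ul>
</li>
<li><p>Java application servers (Jetty and Tomcat)</p>
<ul>
<li>Every Java application server has an internal Connector component that creates and manages socket connections.</li>
<li>Every Java application server implements HTTP server functionality: it accepts HTTP requests and returns dynamic web content after the Java web application has processed them.</li>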
</ul>
</li>
</ul>
<p>References:</p>
<ul>
<li><a href="http://baike.baidu.com/link?url=xqzfWOAE6U5ZuymeXiaX6VPtfoiYWjtjcfa1uejqEdy0oXlgO8KCra3tu0vU-4k6m9L6BV9l9P4RSDXWBQOjTq" target="_blank" rel="external">http://baike.baidu.com/link?url=xqzfWOAE6U5ZuymeXiaX6VPtfoiYWjtjcfa1uejqEdy0oXlgO8KCra3tu0vU-4k6m9L6BV9l9P4RSDXWBQOjTq</a></li>
<li><a href="http://www.kaimingwan.com/post/python/zi-ji-dong-shou-kai-fa-ge-webfu-wu-qi-er" target="_blank" rel="external">http://www.kaimingwan.com/post/python/zi-ji-dong-shou-kai-fa-ge-webfu-wu-qi-er</a></li>
<li><a href="http://blog.csdn.net/on_1y/article/details/18803563" target="_blank" rel="external">http://blog.csdn.net/on_1y/article/details/18803563</a></li>
<li><a href="http://blog.csdn.net/on_1y/article/details/18818081" target="_blank" rel="external">http://blog.csdn.net/on_1y/article/details/18818081</a></li>
<li><a href="http://www.nowamagic.net/academy/detail/1330328" target="_blank" rel="external">http://www.nowamagic.net/academy/detail/1330328</a></li>
<li><a href="http://www.ibm.com/developerworks/cn/java/j-lo-tomcat1/" target="_blank" rel="external">http://www.ibm.com/developerworks/cn/java/j-lo-tomcat1/</a></li>
</ul>
</div>
<div class="article-info article-info-index">
<div class="article-tag tagcloud">
<ul class="article-tag-list"><li class="article-tag-list-item"><a class="article-tag-list-link" href="/tags/Java-Web,Socket,Python/">Java Web,Socket,Python</a></li></ul>
</div>
<div class="article-category tagcloud">
<a class="article-category-link" href="/categories/Java,Python/">Java,Python</a>
</div>
<div class="clearfix"></div>
</div>
</div>
</article>
<article id="post-Logstash/Logstash学习(八)Logstash的metrics告警使用" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="/2017/02/15/Logstash/Logstash学习(八)Logstash的metrics告警使用/" class="article-date">
<time datetime="2017-02-15T09:53:16.000Z" itemprop="datePublished">2017-02-15</time>
</a>
</div>
<div class="article-inner">
<input type="hidden" class="isFancy" />
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="/2017/02/15/Logstash/Logstash学习(八)Logstash的metrics告警使用/">Learning Logstash (8): Alerting with Logstash metrics</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
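<p>To improve the stability of our system, we recently needed to add alerting on error logs to the log collection pipeline. This post mainly uses Logstash's built-in metrics feature. In the filter stage, Logstash can classify logs by certain fields and periodically emit a metrics event with the counts; if the count for some class of logs falls outside the normal range, we raise an alert. Here we use the simplest alerting method: sending an email.</p>
<p>Java log format</p>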
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line">2017-02-14 14:36:40 [ INFO] - com.yunyu.birdben.task.RiskTask -RiskTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:41 [ INFO] - com.yunyu.birdben.task.RiskTask -RiskTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:42 [ INFO] - com.yunyu.birdben.task.RiskTask -RiskTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:43 [ INFO] - com.yunyu.birdben.task.RiskTask -RiskTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:44 [ INFO] - com.yunyu.birdben.task.RiskTask -RiskTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:45 [ INFO] - com.yunyu.birdben.task.RiskTask -RiskTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:46 [ INFO] - com.yunyu.birdben.task.OtherTask -OtherTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:47 [ INFO] - com.yunyu.birdben.task.OtherTask -OtherTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:48 [ INFO] - com.yunyu.birdben.task.OtherTask -OtherTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:49 [ INFO] - com.yunyu.birdben.task.OtherTask -OtherTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:50 [ INFO] - com.yunyu.birdben.task.OtherTask -OtherTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:51 [ INFO] - com.yunyu.birdben.task.OtherTask -OtherTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:52 [ INFO] - com.yunyu.birdben.task.OtherTask -OtherTask.java(97) -This is a log message</div><div class="line">2017-02-14 14:36:53 [ INFO] - com.yunyu.birdben.task.OtherTask -OtherTask.java(97) -This is a log message</div></pre></td></tr></table></figure>
<p>Grok patterns</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line">JAVA_TIMESTAMP %{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}</div><div class="line">JAVA_LOGS %{JAVA_TIMESTAMP:timestamp} \[ %{DATA:level}\] - %{DATA:class_name} -%{DATA:file_name}\.java\(%{DATA:line}\) -%{GREEDYDATA:msg}</div></pre></td></tr></table></figure>
<p>The logs come from two files, RiskTask.java and OtherTask.java. Our requirement: if not a single RiskTask log line appears within 5 minutes, send an alert email.</p>
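<p>To check what the Grok pattern should extract from one of these lines, the sketch below mirrors JAVA_LOGS with a plain Python regular expression. It is only an approximation written for this post (the regular expression, FIELDS, and parse_line are illustrative names, not part of Logstash), but it yields the same six fields.</p>

```python
import re

# Field names in the order the capture groups appear, matching the Grok pattern.
FIELDS = ("timestamp", "level", "class_name", "file_name", "line", "msg")

JAVA_LOGS = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "  # JAVA_TIMESTAMP
    r"\[ (.*?)\] - "                           # level, e.g. INFO
    r"(.*?) -"                                 # class_name
    r"(.*?)\.java\((.*?)\) -"                  # file_name and line
    r"(.*)"                                    # msg (rest of the line)
)


def parse_line(text):
    # Returns a dict of the extracted fields, or None if the line does not match.
    match = JAVA_LOGS.match(text)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))
```

<p>For the sample lines above, file_name comes out as RiskTask or OtherTask, which is the value the meter is keyed on later in this post.</p>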
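<p>Three new plugins are used here: metrics, ruby, and email.</p>
<ul>
<li>metrics: periodically aggregates counts and generates metrics events</li>
<li>ruby: uses Ruby code to define the condition under which a metrics event is discarded</li>
<li>email: sends the alert email</li>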
</ul>
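<h4 id="metrics插件"><a href="#metrics插件" class="headerlink" title="metrics插件"></a>The metrics plugin</h4><p>Metrics are flushed every 5 seconds by default, or at the interval set by flush_interval. Metrics appear as new events in the event stream and pass through any filters that come after the metrics filter, as well as the outputs.</p>
<p>In general, you will want to add a tag to your metrics and have the output explicitly look for that tag.</p>
<p>The flushed event includes every meter and timer metric in the following form:</p>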
<ul>
<li>meter: meter metrics</li>
</ul>
<p>meter => [ "event_%{field_name}" ]</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line">"[event_%{field_name}][count]" - total number of events</div><div class="line">"[event_%{field_name}][rate_1m]" - per-second event rate over a 1-minute sliding window</div><div class="line">"[event_%{field_name}][rate_5m]" - per-second event rate over a 5-minute sliding window</div><div class="line">"[event_%{field_name}][rate_15m]" - per-second event rate over a 15-minute sliding window</div></pre></td></tr></table></figure>
<ul>
<li>timer: timer metrics</li>
</ul>
<p>timer => [ "thing", "%{duration}" ]</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line">"[thing][count]" - total number of events</div><div class="line">"[thing][rate_1m]" - per-second event rate over a 1-minute sliding window</div><div class="line">"[thing][rate_5m]" - per-second event rate over a 5-minute sliding window</div><div class="line">"[thing][rate_15m]" - per-second event rate over a 15-minute sliding window</div><div class="line">"[thing][min]" - minimum value seen for this metric</div><div class="line">"[thing][max]" - maximum value seen for this metric</div><div class="line">"[thing][stddev]" - standard deviation of this metric</div><div class="line">"[thing][mean]" - mean value of this metric</div><div class="line">"[thing][pXX]" - XXth percentile of this metric (see the percentiles option)</div></pre></td></tr></table></figure>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line">metrics {</div><div class="line">    # Name of the field that stores the meter data; field_name is a field parsed out by the Grok pattern above</div><div class="line">    meter => [ "event_%{field_name}" ]</div><div class="line">    # Tag the metrics event so it can be told apart from ordinary log events</div><div class="line">    add_tag => [ "metric" ]</div><div class="line">    # Emit a metrics event every 5 minutes (use a smaller value when testing)</div><div class="line">    flush_interval => 300</div><div class="line">    # Reset the counters every flush_interval + 1 seconds (use a smaller value when testing)</div><div class="line">    # Note: the plugin documentation says clear_interval should be a multiple of 5s</div><div class="line">    clear_interval => 301</div><div class="line">    # Only count messages that are at most 10 seconds old, to avoid skew from delayed events</div><div class="line">    ignore_older_than => 10</div><div class="line">}</div></pre></td></tr></table></figure>
<h4 id="ruby插件"><a href="#ruby插件" class="headerlink" title="ruby插件"></a>The ruby plugin</h4><figure class="highlight plain"><table><tr><td class="code"><pre><div class="line">if "metric" in [tags] {</div><div class="line">    ruby {</div><div class="line">        # code defines the filtering rule for metrics events: under what condition a metrics event is dropped</div><div class="line">        # If code is empty, no metrics event is cancelled, so every metrics event reaches the outputs (elasticsearch/stdout/email). To avoid alerting on every metrics event, add Ruby code here that decides when to cancel one</div><div class="line">        # code => ""</div><div class="line">        # If fewer than 100 logs with status_code 500 were counted, ignore this event (send nothing)</div><div class="line">        # On Logstash 5.x and later the event API is event.get('[event_500][count]') instead</div><div class="line">        code => "event.cancel if event['event_500']['count'] < 100"</div><div class="line">    }</div><div class="line">}</div></pre></td></tr></table></figure>
<h4 id="email插件"><a href="#email插件" class="headerlink" title="email插件"></a>The email plugin</h4><figure class="highlight plain"><table><tr><td class="code"><pre><div class="line"># In test environments, consider commenting out the email output, otherwise the inbox fills up quickly</div><div class="line">email {</div><div class="line">    # SMTP server address</div><div class="line">    address => "smtpdm.aliyun.com"</div><div class="line">    # Sender account (email address)</div><div class="line">    username => "service@post.XXX.com"</div><div class="line">    # Sender account password</div><div class="line">    password => "123456"</div><div class="line">    # From address</div><div class="line">    from => "service@post.XXX.com"</div><div class="line">    # Recipient address</div><div class="line">    to => "birdben@XXX.com"</div><div class="line">    # Email subject</div><div class="line">    subject => "Alert: risk-control task did not run"</div><div class="line">    # Email body</div><div class="line">    htmlbody => "Alert: com.yunyu.birdben.task.RiskTask did not run"</div><div class="line">}</div></pre></td></tr></table></figure>
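<p>To summarize how I understand metrics to work:</p>
<p>Once metrics is defined in the configuration file, Logstash automatically creates a metrics event every flush_interval seconds. You can think of a metrics event as a new log entry that Logstash itself generates. It contains a field named event_%{field_name} (for example event_A or event_B, where field_name is resolved from the result of Grok parsing), and that field has four sub-fields:</p>
<ul>
<li>"[event_%{field_name}][count]" - total number of events</li>
<li>"[event_%{field_name}][rate_1m]" - per-second event rate over a 1-minute sliding window</li>
<li>"[event_%{field_name}][rate_5m]" - per-second event rate over a 5-minute sliding window</li>
<li>"[event_%{field_name}][rate_15m]" - per-second event rate over a 15-minute sliding window</li>
</ul>
<p>Using count (the total number of events), we can tell how many event_%{field_name} logs arrived during each flush_interval. For example, if field_name is status_code, the counters are event_200, event_302, event_400, event_500, and so on. event_200.count is then the number of events with status_code 200 within each flush_interval window (provided clear_interval resets the counters), and likewise for the others. If the metrics event is saved to an ES index, the stored document looks roughly like this:</p>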
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line">"_source": {</div><div class="line"></div><div class="line">    "@version": "1",</div><div class="line">    "@timestamp": "2017-02-15T11:06:37.402Z",</div><div class="line">    "message": "hadoop1",</div><div class="line">    "event_500": {</div><div class="line">        "count": 17,</div><div class="line">        "rate_1m": 3.4,</div><div class="line">        "rate_5m": 3.4,</div><div class="line">        "rate_15m": 3.4</div><div class="line">    },</div><div class="line">    "event_302": {</div><div class="line">        "count": 1074,</div><div class="line">        "rate_1m": 197.62554026237865,</div><div class="line">        "rate_5m": 211.24966828088344,</div><div class="line">        "rate_15m": 213.60997535145182</div><div class="line">    },</div><div class="line">    "event_200": {</div><div class="line">        "count": 10483,</div><div class="line">        "rate_1m": 982.4,</div><div class="line">        "rate_5m": 982.4,</div><div class="line">        "rate_15m": 982.4</div><div class="line">    },</div><div class="line">    "tags": [</div><div class="line">        "metric"</div><div class="line">    ]</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure>
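<p>The counting behavior described above can be modeled with a small Python class. This is a toy model written for this post, not the plugin's implementation: MeterWindow and its methods are hypothetical names, and the rate_1m, rate_5m, and rate_15m fields are left out to keep the sketch short.</p>

```python
from collections import Counter


class MeterWindow:
    """Toy model of counting events per key and flushing them as one event."""

    def __init__(self, tag="metric"):
        self.tag = tag
        self.counts = Counter()

    def mark(self, field_value):
        # Mirrors meter => [ "event_%{field_name}" ]: one counter per value.
        self.counts["event_" + str(field_value)] += 1

    def flush(self):
        # Build the synthetic metrics event; only keys seen so far appear.
        event = {name: {"count": n} for name, n in self.counts.items()}
        event["tags"] = [self.tag]
        return event

    def clear(self):
        # Corresponds to clear_interval: counters start again from zero.
        self.counts.clear()
```

<p>Marking status codes 200, 200, and 500 and then calling flush() gives an event with event_200.count equal to 2 and event_500.count equal to 1, plus the metric tag. After clear(), a key that receives no further events does not appear in the next flushed event at all in this model.</p>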
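<p>The metrics event is given a metric tag so that it can be told apart from ordinary business logs. The later stages (ruby processing, sending email, and storing to ES) all check whether tags contains metric to decide whether an event is a metrics event. Only metrics events go through the ruby filter for conditional cancelling, and only metrics events trigger the email; they are not saved to the ES index.</p>
<p>I recommend using count rather than rate as the decision criterion: count is better suited to checking how many logs were collected, whereas rate is better suited to checking how fast they arrive.</p>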
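<p>For the RiskTask requirement, a count-based decision could look like the following Python sketch. It is not Logstash code. It assumes the meter is keyed on the Grok file_name field, so the counter is named event_RiskTask, and it assumes that a meter which saw no events in the window is missing from the metrics event; both are assumptions to verify against your own pipeline. should_alert is a hypothetical helper name.</p>

```python
def should_alert(metrics_event, meter_name="event_RiskTask"):
    # Only events tagged "metric" are metrics events; ignore everything else.
    if "metric" not in metrics_event.get("tags", []):
        return False
    meter = metrics_event.get(meter_name)
    # Assumption: a meter that never fired in the window is simply absent.
    if meter is None:
        return True
    # Alert when no RiskTask log line was counted in the window.
    return meter.get("count", 0) == 0
```

<p>The point of handling the missing key explicitly is that an expression like the one in the ruby filter above would fail on a missing field instead of treating it as a zero count.</p>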
<p>References:</p>
<ul>
<li><a href="https://www.elastic.co/guide/en/logstash/2.3/plugins-filters-metrics.html#plugins-filters-metrics-clear_interval" target="_blank" rel="external">https://www.elastic.co/guide/en/logstash/2.3/plugins-filters-metrics.html#plugins-filters-metrics-clear_interval</a></li>
<li><a href="https://www.elastic.co/guide/en/logstash/2.3/plugins-outputs-email.html#plugins-outputs-email-address" target="_blank" rel="external">https://www.elastic.co/guide/en/logstash/2.3/plugins-outputs-email.html#plugins-outputs-email-address</a></li>
<li><a href="https://www.elastic.co/guide/en/logstash/2.3/plugins-filters-ruby.html#plugins-filters-ruby-code" target="_blank" rel="external">https://www.elastic.co/guide/en/logstash/2.3/plugins-filters-ruby.html#plugins-filters-ruby-code</a></li>
<li><a href="http://chenlinux.com/2013/07/11/howto-filter-count-in-logstash/" target="_blank" rel="external">http://chenlinux.com/2013/07/11/howto-filter-count-in-logstash/</a></li>
<li><a href="http://xiaorui.cc/2015/04/16/使用kibana4的新功能metric做数据聚合/" target="_blank" rel="external">http://xiaorui.cc/2015/04/16/使用kibana4的新功能metric做数据聚合/</a></li>
<li><a href="https://developer.rackspace.com/blog/using-logstash-to-push-metrics-to-graphite/" target="_blank" rel="external">https://developer.rackspace.com/blog/using-logstash-to-push-metrics-to-graphite/</a></li>
</ul>
</div>
<div class="article-info article-info-index">
<div class="article-tag tagcloud">
<ul class="article-tag-list"><li class="article-tag-list-item"><a class="article-tag-list-link" href="/tags/Logstash/">Logstash</a></li></ul>
</div>
<div class="article-category tagcloud">
<a class="article-category-link" href="/categories/Log/">Log</a>
</div>
<div class="clearfix"></div>
</div>
</div>
</article>
<article id="post-Git/Git思维导图(转)" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="/2017/02/13/Git/Git思维导图(转)/" class="article-date">
<time datetime="2017-02-13T08:11:33.000Z" itemprop="datePublished">2017-02-13</time>
</a>
</div>
<div class="article-inner">
<input type="hidden" class="isFancy" />