Merge pull request #10 from gaochao1/pr/9
v3.20
freedomkk-qfeng committed Mar 15, 2016
2 parents b654e2d + 172df7d commit daa0c10
Showing 10 changed files with 124 additions and 44 deletions.
30 changes: 30 additions & 0 deletions ChangeLog.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
# Changelog #
## 3.2.0 ##
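#### New features ####
1. Added collection of interface broadcast packet counts
* IfHCInBroadcastPkts
* IfHCOutBroadcastPkts

2. Added collection of interface multicast packet counts
* IfHCInMulticastPkts
* IfHCOutMulticastPkts

3. Added collection of interface status
* IfOperStatus (1 up, 2 down, 3 testing, 4 unknown, 5 dormant, 6 notPresent, 7 lowerLayerDown)

4. Built in CPU and memory OIDs and calculation methods for more switch models (some Ruijie, Juniper, Huawei, and H3C models, among others).

PS: Although interface collection runs concurrently, enabling too many metrics can still slow down SNMP collection, especially on switches that respond slowly to SNMP, such as Huawei. Choose carefully and enable only what you need.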

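#### Improvements ####
1. Fixed out-of-order interface collection: results are now handled correctly even when gosnmp returns them out of order. All tested Huawei versions are now collected with gosnmp (v5.13, v5.70, v3.10).
2. Panic messages printed to the log should now include the specific IP address.
3. Interface traffic is now reported in bits by default.
4. Removed the hostname and ip options from the default configuration file to avoid ambiguity; they served no purpose anyway.
5. Changed the default HTTP port to 1989 to avoid a conflict with the agent's port.

PS: The commented-out code at line 151 of funcs/swifstat.go prints the detailed ifstat output in debug mode. If the data collected from a switch looks inaccurate, enable this code to troubleshoot.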

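#### Bug fixes ####
1. Fixed a bug where, under concurrent pings, an unreachable IP address would occasionally be reported as reachable. (Strange, right? It did happen in my environment.) The fix replaces the ping probe with [go-fastping](https://github.com/tatsushid/go-fastping), enabled through the fastPingMode configuration option.
2. Fixed collection problems caused by Cisco ASA-5585 versions 9.1 and 9.2 using different CPU and memory OIDs. (What a pain!) The OID should now be chosen according to the device's version number.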
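The IfOperStatus codes listed in the changelog can be turned into readable names when inspecting collected values. The following is a minimal sketch; `operStatusName` is a hypothetical helper and is not part of swcollector.

```go
package main

import "fmt"

// operStatusNames maps ifOperStatus codes to the names listed in the changelog.
var operStatusNames = map[int]string{
	1: "up",
	2: "down",
	3: "testing",
	4: "unknown",
	5: "dormant",
	6: "notPresent",
	7: "lowerLayerDown",
}

// operStatusName is a hypothetical helper (not taken from swcollector).
// It returns the name for a known code and a marker string otherwise.
func operStatusName(code int) string {
	if name, ok := operStatusNames[code]; ok {
		return name
	}
	return fmt.Sprintf("invalid(%d)", code)
}

func main() {
	for _, code := range []int{1, 2, 7, 9} {
		fmt.Println(code, operStatusName(code))
	}
}
```

Since the metric is reported as a gauge, an alert on any value other than 1 would be one way to detect a port that is not up.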
33 changes: 24 additions & 9 deletions README.md
@@ -11,12 +11,18 @@

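* CPU utilization
* Memory utilization
* Ping latency
* Ping latency (returns the latency normally and -1 on timeout; can be used for liveness alerting)
* Network connection count (when monitoring firewalls; currently only Cisco ASA is supported)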
* IfHCInOctets
* IfHCOutOctets
* IfHCInUcastPkts
* IfHCOutUcastPkts
* IfHCInBroadcastPkts
* IfHCOutBroadcastPkts
* IfHCInMulticastPkts
* IfHCOutMulticastPkts
* IfOperStatus (interface status: 1 up, 2 down, 3 testing, 4 unknown, 5 dormant, 6 notPresent, 7 lowerLayerDown)


The CPU and memory OIDs are private and may differ by device vendor and OS version. Devices tested so far:

@@ -27,7 +33,13 @@ The CPU and memory OIDs are private and may differ by device vendor and OS version. Devices tes
* Cisco ASA (Version 9)
* Ruijie 10G Routing Switch
* Huawei VRP(Version 8)
* Huawei VRP(Version 5.20)
* Huawei VRP(Version 5.120)
* Huawei VRP(Version 5.130)
* Huawei VRP(Version 5.70)
* Juniper JUNOS(Version 10)
* H3C(Version 5)
* H3C(Version 5.20)
* H3C(Version 7)

##Install from source
@@ -48,12 +60,11 @@ swcollector must be deployed on a server that has SNMP access to the switches.

Ping probing uses Go's native ICMP implementation, so swcollector must run with root privileges.

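Huawei switches report timeout or out-of-order errors with Go's native SNMP implementation. The temporary workaround is to check the device model before querying interface traffic over SNMP and, for Huawei devices, call the snmpwalk command to collect the data.
Cisco IOS XR also has some problems with native SNMP, so snmpwalk is used for it as well
Therefore, to monitor the switch models above, the snmpwalk command must be installed on the monitoring probe server
Some switches time out with Go's native SNMP implementation. The temporary workaround is to check the device model before querying interface traffic over SNMP and, for some of these devices (certain Huawei devices and Cisco IOS XR), call the snmpwalk command to collect the data.
It is therefore best to also install the snmpwalk command on the monitoring probe server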


##Configuration
#Configuration

For the configuration file, refer to cfg.example.json: rename it to cfg.json and replace the IPs in it with the ones you actually use.

@@ -67,13 +78,17 @@ Switch configuration options:
"172.16.114.233"
],
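"pingTimeout":300, #Ping timeout, in milliseconds
"pingRetry":3, #Number of ping probe retries
"pingRetry":4, #Number of ping probe retries
"community":"public", #SNMP community string
"snmpTimeout":1000, #SNMP timeout, in milliseconds
"snmpTimeout":2000, #SNMP timeout, in milliseconds
"snmpRetry":5, #Number of SNMP retries
"ignoreIface": ["Nu","NU","Vlan","Vl"], #Interfaces to ignore; e.g. Nu matches interfaces whose ifName is *Nu*
"ignoreIface": ["Nu","NU","Vlan","Vl","LoopBack"], #Interfaces to ignore; e.g. Nu matches interfaces whose ifName is *Nu*
"ignorePkt": true, #Do not collect IfHCInUcastPkts and IfHCOutUcastPkts
"displayByBit": false, #When true, traffic is reported in bits; when false, in bytes.
"ignoreBroadcastPkt": true, #Do not collect IfHCInBroadcastPkts and IfHCOutBroadcastPkts
"ignoreMulticastPkt": true, #Do not collect IfHCInMulticastPkts and IfHCOutMulticastPkts
"ignoreOperStatus": true, #Do not collect IfOperStatus
"displayByBit": true, #When true, traffic is reported in bits; when false, in bytes.
"fastPingMode": false, #Whether to enable fastPing mode. When enabled, ping is more efficient, and it avoids the small chance, under high concurrency, of successfully pinging the address of a switch that is down. However, fastPing may be filtered by firewalls.
"limitConcur": 1000 #Limit on the number of concurrent SNMP requests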
}
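
The `ignoreIface` comment above describes `Nu` as matching any ifName of the form `*Nu*`. A minimal sketch of that rule follows, assuming a case-sensitive substring match; the matching code in swcollector itself is not shown in this diff, and `ifaceIgnored` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// ifaceIgnored reports whether ifName contains any of the ignoreIface patterns.
// Assumption: patterns are plain case-sensitive substrings, as the README
// comment ("Nu matches *Nu*") suggests.
func ifaceIgnored(ifName string, patterns []string) bool {
	for _, p := range patterns {
		if strings.Contains(ifName, p) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"Nu", "NU", "Vlan", "Vl", "LoopBack"}
	for _, name := range []string{"GigabitEthernet0/0/1", "Vlanif10", "NULL0", "LoopBack0"} {
		fmt.Println(name, ifaceIgnored(name, patterns))
	}
}
```

Under this assumption, both `Nu` and `NU` are needed in the list because the match is case-sensitive.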

12 changes: 7 additions & 5 deletions cfg.example.json
@@ -1,7 +1,5 @@
{
"debug": true,
"hostname": "",
"ip": "",
"switch":{
"enabled": true,
"ipRange":[
@@ -10,13 +8,17 @@
"172.16.114.233"
],
"pingTimeout":300,
"pingRetry":3,
"pingRetry":4,
"community":"public",
"snmpTimeout":1000,
"snmpRetry":5,
"ignoreIface": ["Nu","NU","Vlan","Vl"],
"ignorePkt": true,
"displayByBit": false,
"ignoreBroadcastPkt": true,
"ignoreMulticastPkt": true,
"ignoreOperStatus": true,
"displayByBit": true,
"fastPingMode": false,
"limitConcur": 1000
},
"heartbeat": {
@@ -33,6 +35,6 @@
},
"http": {
"enabled": true,
"listen": ":1988"
"listen": ":1989"
}
}
5 changes: 3 additions & 2 deletions funcs/swconn.go
@@ -5,11 +5,12 @@ import (
"github.com/gaochao1/swcollector/g"
"github.com/open-falcon/common/model"
"log"
"strings"
"time"
)

type SwConn struct {
Ip string
Ip string
ConnectionStat int
}

@@ -34,7 +35,7 @@ func ConnMetrics() (L []*model.MetricValue) {
func connMetrics(ip string, ch chan SwConn) {
var swConn SwConn
vendor, _ := sw.SysVendor(ip, community, snmpTimeout)
if vendor != "Cisco_ASA"{
if !strings.Contains(vendor, "Cisco_ASA") {
ch <- swConn
return
}
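
The hunk above swaps an exact comparison for `strings.Contains`. The sketch below illustrates the difference in behavior. The vendor string `"Cisco_ASA_9_1"` is a made-up example of a version-qualified value; the strings that `sw.SysVendor` actually returns are not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// isCiscoASA mirrors the new check: any vendor string containing
// "Cisco_ASA" is treated as an ASA device.
func isCiscoASA(vendor string) bool {
	return strings.Contains(vendor, "Cisco_ASA")
}

func main() {
	// "Cisco_ASA_9_1" is a hypothetical version-qualified vendor string.
	for _, v := range []string{"Cisco_ASA", "Cisco_ASA_9_1", "Huawei"} {
		fmt.Printf("%-14s exact=%-5v contains=%v\n", v, v == "Cisco_ASA", isCiscoASA(v))
	}
}
```

With an exact comparison, a version-qualified vendor string would be skipped; the substring check keeps collecting connection counts for it.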
49 changes: 36 additions & 13 deletions funcs/swifstat.go
@@ -25,24 +25,34 @@ var (
pingTimeout int
pingRetry int

community string
snmpTimeout int
snmpRetry int

ignoreIface []string
ignorePkt bool
community string
snmpTimeout int
snmpRetry int
displayByBit bool
ignoreIface []string
ignorePkt bool
ignoreBroadcastPkt bool
ignoreMulticastPkt bool
ignoreOperStatus bool
fastPingMode bool
)

func initVariable() {
pingTimeout = g.Config().Switch.PingTimeout
fastPingMode = g.Config().Switch.FastPingMode
pingRetry = g.Config().Switch.PingRetry

community = g.Config().Switch.Community
snmpTimeout = g.Config().Switch.SnmpTimeout
snmpRetry = g.Config().Switch.SnmpRetry

displayByBit = g.Config().Switch.DisplayByBit

ignoreIface = g.Config().Switch.IgnoreIface
ignorePkt = g.Config().Switch.IgnorePkt
ignoreOperStatus = g.Config().Switch.IgnoreOperStatus
ignoreBroadcastPkt = g.Config().Switch.IgnoreBroadcastPkt
ignoreMulticastPkt = g.Config().Switch.IgnoreMulticastPkt
}

func AllSwitchIp() (allIp []string) {
@@ -101,16 +111,28 @@ func swIfMetrics() (L []*model.MetricValue) {
ifNameTag := "ifName=" + ifStat.IfName
ifIndexTag := "ifIndex=" + strconv.Itoa(ifStat.IfIndex)
ip := chIfStat.Ip
if g.Config().Switch.DisplayByBit == true {
if ignoreOperStatus == false {
L = append(L, GaugeValueIp(ifStat.TS, ip, "switch.if.OperStatus", ifStat.IfOperStatus, ifNameTag, ifIndexTag))
}
if ignoreBroadcastPkt == false {
L = append(L, CounterValueIp(ifStat.TS, ip, "switch.if.InBroadcastPkt", ifStat.IfHCInBroadcastPkts, ifNameTag, ifIndexTag))
L = append(L, CounterValueIp(ifStat.TS, ip, "switch.if.OutBroadcastPkt", ifStat.IfHCOutBroadcastPkts, ifNameTag, ifIndexTag))
}
if ignoreMulticastPkt == false {
L = append(L, CounterValueIp(ifStat.TS, ip, "switch.if.InMulticastPkt", ifStat.IfHCInMulticastPkts, ifNameTag, ifIndexTag))
L = append(L, CounterValueIp(ifStat.TS, ip, "switch.if.OutMulticastPkt", ifStat.IfHCOutMulticastPkts, ifNameTag, ifIndexTag))
}

if displayByBit == true {
L = append(L, CounterValueIp(ifStat.TS, ip, "switch.if.In", 8*ifStat.IfHCInOctets, ifNameTag, ifIndexTag))
L = append(L, CounterValueIp(ifStat.TS, ip, "switch.if.Out", 8*ifStat.IfHCOutOctets, ifNameTag, ifIndexTag))
}else{
} else {
L = append(L, CounterValueIp(ifStat.TS, ip, "switch.if.In", ifStat.IfHCInOctets, ifNameTag, ifIndexTag))
L = append(L, CounterValueIp(ifStat.TS, ip, "switch.if.Out", ifStat.IfHCOutOctets, ifNameTag, ifIndexTag))

}
//If IgnorePkt is false, collect the packet counters
if g.Config().Switch.IgnorePkt == false {
if ignorePkt == false {
L = append(L, CounterValueIp(ifStat.TS, ip, "switch.if.InPkts", ifStat.IfHCInUcastPkts, ifNameTag, ifIndexTag))
L = append(L, CounterValueIp(ifStat.TS, ip, "switch.if.OutPkts", ifStat.IfHCOutUcastPkts, ifNameTag, ifIndexTag))
}
@@ -126,6 +148,7 @@ func swIfMetrics() (L []*model.MetricValue) {
for i, v := range AliveIp {
log.Println("AliveIp:", i, v)
}
//log.Println(L)
}

return
@@ -134,7 +157,7 @@ func swIfMetrics() (L []*model.MetricValue) {
func pingCheck(ip string) bool {
var pingResult bool
for i := 0; i < pingRetry; i++ {
pingResult = sw.Ping(ip, pingTimeout)
pingResult = sw.Ping(ip, pingTimeout, fastPingMode)
if pingResult == true {
break
}
@@ -165,9 +188,9 @@ func coreSwIfMetrics(ip string, ch chan ChIfStat, limitCh chan bool) {

vendor, _ := sw.SysVendor(ip, community, snmpTimeout)
if vendor == "Huawei" || vendor == "Cisco_IOS_XR" {
ifList, err = sw.ListIfStatsSnmpWalk(ip, community, snmpTimeout*5, ignoreIface, snmpRetry, ignorePkt)
ifList, err = sw.ListIfStatsSnmpWalk(ip, community, snmpTimeout*5, ignoreIface, snmpRetry, ignorePkt, ignoreOperStatus, ignoreBroadcastPkt, ignoreMulticastPkt)
} else {
ifList, err = sw.ListIfStats(ip, community, snmpTimeout, ignoreIface, snmpRetry, ignorePkt)
ifList, err = sw.ListIfStats(ip, community, snmpTimeout, ignoreIface, snmpRetry, ignorePkt, ignoreOperStatus, ignoreBroadcastPkt, ignoreMulticastPkt)
}

if err != nil {
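The `swIfMetrics` changes above gate each group of metrics on an `ignore*` flag and multiply octet counters by 8 when `displayByBit` is set. The following standalone sketch shows that selection logic in isolation. The `ifFlags` type and both function names are hypothetical, and `uint64` is an assumed counter type; the metric names are the ones that appear in the diff.

```go
package main

import "fmt"

// ifFlags is a hypothetical bundle of the flags read in initVariable().
type ifFlags struct {
	IgnorePkt          bool
	IgnoreBroadcastPkt bool
	IgnoreMulticastPkt bool
	IgnoreOperStatus   bool
	DisplayByBit       bool
}

// enabledMetrics lists the metric names reported per interface,
// in the same order as the appends in swIfMetrics.
func enabledMetrics(f ifFlags) []string {
	var names []string
	if !f.IgnoreOperStatus {
		names = append(names, "switch.if.OperStatus")
	}
	if !f.IgnoreBroadcastPkt {
		names = append(names, "switch.if.InBroadcastPkt", "switch.if.OutBroadcastPkt")
	}
	if !f.IgnoreMulticastPkt {
		names = append(names, "switch.if.InMulticastPkt", "switch.if.OutMulticastPkt")
	}
	// Traffic counters are always reported.
	names = append(names, "switch.if.In", "switch.if.Out")
	if !f.IgnorePkt {
		names = append(names, "switch.if.InPkts", "switch.if.OutPkts")
	}
	return names
}

// trafficValue converts an octet counter to bits when byBit is true.
func trafficValue(octets uint64, byBit bool) uint64 {
	if byBit {
		return 8 * octets
	}
	return octets
}

func main() {
	// Values from cfg.example.json in this commit.
	defaults := ifFlags{
		IgnorePkt:          true,
		IgnoreBroadcastPkt: true,
		IgnoreMulticastPkt: true,
		IgnoreOperStatus:   true,
		DisplayByBit:       true,
	}
	fmt.Println(enabledMetrics(defaults))
	fmt.Println(trafficValue(100, defaults.DisplayByBit))
}
```

With the example configuration, only the two traffic counters are reported per interface; turning every `ignore*` flag off raises that to nine metrics per interface, which is the load the changelog warns about.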
18 changes: 11 additions & 7 deletions funcs/swping.go
@@ -33,18 +33,22 @@ func PingMetrics() (L []*model.MetricValue) {

func pingMetrics(ip string, ch chan SwPing) {
var swPing SwPing
timeout := g.Config().Switch.PingTimeout * 4

rtt, err := sw.PingRtt(ip, timeout)
timeout := g.Config().Switch.PingTimeout * g.Config().Switch.PingRetry
fastPingMode := g.Config().Switch.FastPingMode
rtt, err := sw.PingRtt(ip, timeout, fastPingMode)
if err != nil {
log.Println(ip, err)
swPing.Ip = ip
swPing.Ping = -1
ch <- swPing
}else{
swPing.Ip = ip
swPing.Ping = rtt
ch <- swPing
return
}
if g.Config().Debug {
log.Println(ip, rtt)
}
swPing.Ip = ip
swPing.Ping = rtt
ch <- swPing
return

}
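
The `pingMetrics` change above derives the overall timeout from `PingTimeout * PingRetry` and reports -1 when the probe fails. A minimal sketch of those two rules follows. The function names are hypothetical, and `float64` for the round-trip time is an assumption because the type of `SwPing.Ping` is not shown in this diff.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout stands in for whatever error a failed probe returns.
var errTimeout = errors.New("ping timeout")

// totalTimeoutMs mirrors PingTimeout * PingRetry from the diff.
func totalTimeoutMs(pingTimeoutMs, pingRetry int) int {
	return pingTimeoutMs * pingRetry
}

// pingValue returns the RTT on success and -1 on any probe error,
// so a liveness alert can trigger on the value -1.
func pingValue(rtt float64, err error) float64 {
	if err != nil {
		return -1
	}
	return rtt
}

func main() {
	// 300 ms and 4 retries are the values in cfg.example.json.
	fmt.Println(totalTimeoutMs(300, 4))
	fmt.Println(pingValue(1.25, nil))
	fmt.Println(pingValue(0, errTimeout))
}
```

Before this commit the multiplier was the constant 4; tying it to `pingRetry` means changing the retry count also changes how long the latency probe waits.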
2 changes: 1 addition & 1 deletion funcs/swsystem.go
@@ -14,7 +14,7 @@ type SwSystem struct {
Cpu int `json:"cpu"`
Mem int `json:"mem"`
Ping string `json:"ping"`
Conn int `json:"Conn"`
Conn int `json:"Conn"`
}

func SwSystemInfo() (swList []SwSystem) {
12 changes: 8 additions & 4 deletions g/cfg.go
@@ -20,10 +20,14 @@ type SwitchConfig struct {
SnmpTimeout int `json:"snmpTimeout"`
SnmpRetry int `json:"snmpRetry"`

IgnoreIface []string `json:"ignoreIface"`
IgnorePkt bool `json:"ignorePkt"`
DisplayByBit bool `json:"displayByBit"`
LimitConcur int `json:"limitConcur"`
IgnoreIface []string `json:"ignoreIface"`
IgnorePkt bool `json:"ignorePkt"`
IgnoreOperStatus bool `json:"ignoreOperStatus"`
IgnoreBroadcastPkt bool `json:"ignoreBroadcastPkt"`
IgnoreMulticastPkt bool `json:"ignoreMulticastPkt"`
DisplayByBit bool `json:"displayByBit"`
LimitConcur int `json:"limitConcur"`
FastPingMode bool `json:"fastPingMode"`
}
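
The new `SwitchConfig` fields are plain booleans decoded from JSON. The sketch below shows how such a struct behaves when decoded with `encoding/json`. It copies only the fields visible in this hunk, and `parseSwitchConfig` is a hypothetical helper, not the loader swcollector uses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SwitchConfig reproduces only the fields visible in the hunk above.
type SwitchConfig struct {
	SnmpTimeout        int      `json:"snmpTimeout"`
	SnmpRetry          int      `json:"snmpRetry"`
	IgnoreIface        []string `json:"ignoreIface"`
	IgnorePkt          bool     `json:"ignorePkt"`
	IgnoreOperStatus   bool     `json:"ignoreOperStatus"`
	IgnoreBroadcastPkt bool     `json:"ignoreBroadcastPkt"`
	IgnoreMulticastPkt bool     `json:"ignoreMulticastPkt"`
	DisplayByBit       bool     `json:"displayByBit"`
	LimitConcur        int      `json:"limitConcur"`
	FastPingMode       bool     `json:"fastPingMode"`
}

// sample uses values from cfg.example.json in this commit.
const sample = `{
	"snmpTimeout": 1000,
	"snmpRetry": 5,
	"ignoreIface": ["Nu", "NU", "Vlan", "Vl"],
	"ignorePkt": true,
	"ignoreBroadcastPkt": true,
	"ignoreMulticastPkt": true,
	"ignoreOperStatus": true,
	"displayByBit": true,
	"fastPingMode": false,
	"limitConcur": 1000
}`

// parseSwitchConfig is a hypothetical helper that decodes the switch section.
func parseSwitchConfig(data []byte) (SwitchConfig, error) {
	var c SwitchConfig
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	c, err := parseSwitchConfig([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```

One consequence of using plain `bool` fields: a key missing from the JSON decodes to `false`. If a configuration file written for an older version is decoded this way, the new `ignore*` flags would be `false`, which the collection code in this commit treats as "collect the metric".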

type HeartbeatConfig struct {
3 changes: 2 additions & 1 deletion g/const.go
@@ -9,7 +9,8 @@ import (
// 3.1.4: bugfix ignore configuration
// 3.1.5: more sw support, DisplayByBit cfg
// 3.1.6
// 3.2.0: more sw support, fix ping bug, add ifOperStatus,ifBroadcastPkt,ifMulticastPkt
const (
VERSION = "3.1.6"
VERSION = "3.2.0"
COLLECT_INTERVAL = time.Second
)
4 changes: 2 additions & 2 deletions public/index.html
@@ -26,7 +26,7 @@
<div class="nav-collapse">
<ul class="nav pull-right">
<li>
<a target="_blank" href="https://github.com/open-falcon">
<a target="_blank" href="https://github.com/gaochao1/swcollector">
<i class="lead icon-github-sign"></i>
<span class="lead">Contribute on GitHub</span>
</a>
@@ -134,7 +134,7 @@ <h3>
Patches, suggestions, and comments are welcome.
</div>
<div class="sfc-member">
Powered by <a href="http://ulricqin.com">UlricQin</a>
Powered by gaochao1 freedomkk-qfeng</a>
</div>
</footer>
