Install and configure Datadog plugin on ci.jenkins.io #3573

Closed
3 of 5 tasks
lemeurherve opened this issue May 11, 2023 · 20 comments

Comments

@lemeurherve
Member

lemeurherve commented May 11, 2023

Service(s)

ci.jenkins.io, Datadog

Summary

Since #2844, the ELK stack has been removed, along with the corresponding plugins on ci.jenkins.io.

We should gather logs from ci.jenkins.io again. To do that, we have to:

Potential improvements

Related: #2769

@lemeurherve
Member Author

The Datadog plugin is now installed and configured on ci.jenkins.io but nothing is shown on Datadog.

FTR, here is the result of datadog-agent status from ci.jenkins.io VM:

Details
root@ci:/# datadog-agent status
Getting the status from the agent.


===============
Agent (v7.44.0)
===============

  Status date: 2023-05-16 12:39:07.896 UTC (1684240747896)
  Agent start: 2023-05-08 08:27:29.656 UTC (1683534449656)
  Pid: 1286
  Go Version: go1.19.7
  Python Version: 3.8.16
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log File: /var/log/datadog/agent.log
  Log Level: info

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: 5.796ms
    System time: 2023-05-16 12:39:07.896 UTC (1684240747896)

  Host Info
  =========
    bootTime: 2023-05-08 08:27:20 UTC (1683534440000)
    hostId: 4b527c37-dea2-5e41-8cfd-51067ed3f9ac
    kernelArch: x86_64
    kernelVersion: 5.4.0-1107-azure
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 18.04
    procs: 185
    uptime: 13s
    virtualizationRole: host
    virtualizationSystem: kvm

  Hostnames
  =========
    host_aliases: [377c524b-a2de-415e-8cfd-51067ed3f9ac]
    hostname: ci.jenkins.io
    socket-fqdn: ci.internal.cloudapp.net.
    socket-hostname: ci
    hostname provider: configuration

  Metadata
  ========
    agent_version: 7.44.0
    cloud_provider: Azure
    config_apm_dd_url: 
    config_dd_url: 
    config_logs_dd_url: 
    config_logs_socks5_proxy_address: 
    config_no_proxy: []
    config_process_dd_url: 
    config_proxy_http: 
    config_proxy_https: 
    config_site: 
    feature_apm_enabled: false
    feature_cspm_enabled: false
    feature_cws_enabled: false
    feature_enable_http_stats_by_status_code: false
    feature_logs_enabled: true
    feature_networks_enabled: false
    feature_networks_http_enabled: false
    feature_networks_https_enabled: false
    feature_otlp_enabled: false
    feature_process_enabled: true
    feature_processes_container_enabled: false
    feature_usm_go_tls_enabled: false
    feature_usm_http2_enabled: false
    feature_usm_java_tls_enabled: false
    feature_usm_kafka_enabled: false
    flavor: agent
    hostname_source: configuration
    install_method_installer_version: datadog_module-3.20.0
    install_method_tool: puppet
    install_method_tool_version: puppet-6.28.0
    logs_transport: HTTP

=========
Collector
=========

  Running Checks
  ==============

    apache (4.2.0)
    --------------
      Instance ID: apache:18dab1b2ea051988 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/apache.yaml
      Total Runs: 47,086
      Metric Samples: Last Run: 25, Total: 1,168,950
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 47,086
      Average Execution Time : 7ms
      Last Execution Date : 2023-05-16 12:38:55 UTC (1684240735000)
      Last Successful Execution Date : 2023-05-16 11:16:55 UTC (1684235815000)
      metadata:
        version.major: 2
        version.minor: 4
        version.patch: 29
        version.raw: 2.4.29
        version.scheme: semver
      Error: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /server-status?auto (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f119811a880>: Failed to establish a new connection: [Errno 111] Connection refused'))
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
          conn = connection.create_connection(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py", line 95, in create_connection
          raise err
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py", line 85, in create_connection
          sock.connect(sa)
      ConnectionRefusedError: [Errno 111] Connection refused

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
          httplib_response = self._make_request(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 398, in _make_request
          conn.request(method, url, **httplib_request_kw)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 244, in request
          super(HTTPConnection, self).request(method, url, body=body, headers=headers)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1256, in request
          self._send_request(method, url, body, headers, encode_chunked)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1302, in _send_request
          self.endheaders(body, encode_chunked=encode_chunked)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1251, in endheaders
          self._send_output(message_body, encode_chunked=encode_chunked)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1011, in _send_output
          self.send(msg)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 951, in send
          self.connect()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 205, in connect
          conn = self._new_conn()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
          raise NewConnectionError(
      urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f119811a880>: Failed to establish a new connection: [Errno 111] Connection refused

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py", line 489, in send
          resp = conn.urlopen(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
          retries = retries.increment(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
          raise MaxRetryError(_pool, url, error or ResponseError(cause))
      urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /server-status?auto (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f119811a880>: Failed to establish a new connection: [Errno 111] Connection refused'))

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 1122, in run
          self.check(instance)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/apache/apache.py", line 90, in check
          r = self.http.get(url)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 355, in get
          return self._request('get', url, options)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 419, in _request
          response = self.make_request_aia_chasing(request_method, method, url, new_options, persist)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 425, in make_request_aia_chasing
          response = request_method(url, **new_options)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py", line 73, in get
          return request("get", url, params=params, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py", line 59, in request
          return session.request(method=method, url=url, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
          resp = self.send(prep, **send_kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py", line 701, in send
          r = adapter.send(request, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py", line 565, in send
          raise ConnectionError(e, request=request)
      requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /server-status?auto (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f119811a880>: Failed to establish a new connection: [Errno 111] Connection refused'))
      Instance ID: apache:4b41c883c989bee0 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/apache.d/conf.yaml
      Total Runs: 47,087
      Metric Samples: Last Run: 25, Total: 1,168,950
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 47,087
      Average Execution Time : 7ms
      Last Execution Date : 2023-05-16 12:39:03 UTC (1684240743000)
      Last Successful Execution Date : 2023-05-16 11:16:48 UTC (1684235808000)
      metadata:
        version.major: 2
        version.minor: 4
        version.patch: 29
        version.raw: 2.4.29
        version.scheme: semver
      Error: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /server-status?auto (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f11981263d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
          conn = connection.create_connection(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py", line 95, in create_connection
          raise err
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py", line 85, in create_connection
          sock.connect(sa)
      ConnectionRefusedError: [Errno 111] Connection refused

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
          httplib_response = self._make_request(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 398, in _make_request
          conn.request(method, url, **httplib_request_kw)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 244, in request
          super(HTTPConnection, self).request(method, url, body=body, headers=headers)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1256, in request
          self._send_request(method, url, body, headers, encode_chunked)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1302, in _send_request
          self.endheaders(body, encode_chunked=encode_chunked)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1251, in endheaders
          self._send_output(message_body, encode_chunked=encode_chunked)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1011, in _send_output
          self.send(msg)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 951, in send
          self.connect()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 205, in connect
          conn = self._new_conn()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
          raise NewConnectionError(
      urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f11981263d0>: Failed to establish a new connection: [Errno 111] Connection refused

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py", line 489, in send
          resp = conn.urlopen(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
          retries = retries.increment(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
          raise MaxRetryError(_pool, url, error or ResponseError(cause))
      urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /server-status?auto (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f11981263d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 1122, in run
          self.check(instance)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/apache/apache.py", line 90, in check
          r = self.http.get(url)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 355, in get
          return self._request('get', url, options)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 419, in _request
          response = self.make_request_aia_chasing(request_method, method, url, new_options, persist)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 425, in make_request_aia_chasing
          response = request_method(url, **new_options)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py", line 73, in get
          return request("get", url, params=params, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py", line 59, in request
          return session.request(method=method, url=url, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
          resp = self.send(prep, **send_kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py", line 701, in send
          r = adapter.send(request, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py", line 565, in send
          raise ConnectionError(e, request=request)
      requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /server-status?auto (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f11981263d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

    container
    ---------
      Instance ID: container [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/container.d/conf.yaml.default
      Total Runs: 47,086
      Metric Samples: Last Run: 26, Total: 1,224,236
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 1ms
      Last Execution Date : 2023-05-16 12:39:01 UTC (1684240741000)
      Last Successful Execution Date : 2023-05-16 12:39:01 UTC (1684240741000)


    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 47,086
      Metric Samples: Last Run: 9, Total: 423,767
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-05-16 12:38:53 UTC (1684240733000)
      Last Successful Execution Date : 2023-05-16 12:38:53 UTC (1684240733000)


    disk (4.9.0)
    ------------
      Instance ID: disk:4dd570d1e081f089 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml
      Total Runs: 47,086
      Metric Samples: Last Run: 116, Total: 5,461,976
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 7ms
      Last Execution Date : 2023-05-16 12:39:02 UTC (1684240742000)
      Last Successful Execution Date : 2023-05-16 12:39:02 UTC (1684240742000)


    docker
    ------
      Instance ID: docker [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/docker.d/conf.yaml
      Total Runs: 47,086
      Metric Samples: Last Run: 7, Total: 329,602
      Events: Last Run: 1, Total: 5
      Service Checks: Last Run: 1, Total: 47,086
      Average Execution Time : 4ms
      Last Execution Date : 2023-05-16 12:38:54 UTC (1684240734000)
      Last Successful Execution Date : 2023-05-16 12:38:54 UTC (1684240734000)


    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 47,086
      Metric Samples: Last Run: 5, Total: 235,430
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-05-16 12:39:00 UTC (1684240740000)
      Last Successful Execution Date : 2023-05-16 12:39:00 UTC (1684240740000)


    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 47,087
      Metric Samples: Last Run: 171, Total: 8,051,760
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-05-16 12:39:07 UTC (1684240747000)
      Last Successful Execution Date : 2023-05-16 12:39:07 UTC (1684240747000)


    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 47,086
      Metric Samples: Last Run: 6, Total: 282,516
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-05-16 12:38:59 UTC (1684240739000)
      Last Successful Execution Date : 2023-05-16 12:38:59 UTC (1684240739000)


    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 47,087
      Metric Samples: Last Run: 20, Total: 941,740
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-05-16 12:39:06 UTC (1684240746000)
      Last Successful Execution Date : 2023-05-16 12:39:06 UTC (1684240746000)


    network (2.9.3)
    ---------------
      Instance ID: network:4b0649b7e11f0772 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 47,086
      Metric Samples: Last Run: 110, Total: 5,179,460
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 3ms
      Last Execution Date : 2023-05-16 12:38:58 UTC (1684240738000)
      Last Successful Execution Date : 2023-05-16 12:38:58 UTC (1684240738000)


    ntp
    ---
      Instance ID: ntp:3c427a42a70bbf8 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 785
      Metric Samples: Last Run: 1, Total: 785
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 785
      Average Execution Time : 31ms
      Last Execution Date : 2023-05-16 12:27:33 UTC (1684240053000)
      Last Successful Execution Date : 2023-05-16 12:27:33 UTC (1684240053000)


    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 47,087
      Metric Samples: Last Run: 1, Total: 47,087
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-05-16 12:39:05 UTC (1684240745000)
      Last Successful Execution Date : 2023-05-16 12:39:05 UTC (1684240745000)

========
JMXFetch
========

  Information
  ==================
  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    Cluster: 0
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 0
    CustomResource: 0
    CustomResourceDefinition: 0
    DaemonSet: 0
    Deployment: 0
    Dropped: 0
    HighPriorityQueueFull: 0
    Ingress: 0
    Job: 0
    Namespace: 0
    Node: 0
    OrchestratorManifest: 0
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 0
    ServiceAccount: 0
    StatefulSet: 0
    VerticalPodAutoscaler: 0

  Transaction Successes
  =====================
    Total number: 99278
    Successes By Endpoint:
      check_run_v1: 47,086
      intake: 3,929
      metadata_v1: 1,177
      series_v2: 47,086

  On-disk storage
  ===============
    On-disk storage is disabled. Configure `forwarder_storage_max_size_in_bytes` to enable it.

  API Keys status
  ===============
    API key ending with 31d65: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 31d65

==========
Logs Agent
==========
    Reliable: Sending compressed logs in HTTPS to agent-http-intake.logs.datadoghq.com on port 443
    BytesSent: 3.877812375e+09
    EncodedBytesSent: 2.36565304e+08
    LogsProcessed: 6.648938e+06
    LogsSent: 6.648938e+06
    CoreAgentProcessOpenFiles: 30
    OSFileLimit: 4096

  apache
  ------
    - Type: file
      Path: /var/log/apache2/*.log
      Service: apache
      Source: apache
      Status: Error: open /var/log/apache2/access.log: permission denied
        5 files tailed out of 5 files matching
      Inputs:
        /var/log/apache2/other_vhosts_access.log
        /var/log/apache2/default_error.log  
      Bytes Read: 0   
      Pipeline Latency:
        Average Latency (ms): 0
        24h Average Latency (ms): 0
        Peak Latency (ms): 0
        24h Peak Latency (ms): 0
    - Type: file
      Path: /var/log/apache2/*/*.log.*
      Service: apache
      Source: apache
      Status: OK
        3 files tailed out of 3 files matching
      Inputs:
        /var/log/apache2/ci.jenkins.io/access.log.20230516000000
        /var/log/apache2/azure.ci.jenkins.io/access.log.20190318000000
        /var/log/apache2/ci.jenkins.io/error.log.20230516000000  
      Bytes Read: 172832109   
      Pipeline Latency:
        Average Latency (ms): 0
        24h Average Latency (ms): 0
        Peak Latency (ms): 0
        24h Peak Latency (ms): 11

  container_collect_all
  ---------------------
    - Type: docker
      Service: jenkins
      Source: jenkins
      Status: OK
        The log file tailer could not be made, falling back to socket
      Inputs:
        8000873e8100181ee2dca72dc39a30983a52b6aad91e589c37927c61400d08e9  
      Bytes Read: 974585481   
      Pipeline Latency:
        Average Latency (ms): 0
        24h Average Latency (ms): 0
        Peak Latency (ms): 39
        24h Peak Latency (ms): 39


=============
Process Agent
=============

  Version: 7.44.0
  Status date: 2023-05-16 12:39:07.9 UTC (1684240747900)
  Process Agent Start: 2023-05-08 08:27:30.038 UTC (1683534450038)
  Pid: 1292
  Go Version: go1.19.7
  Build arch: amd64
  Log Level: info
  Enabled Checks: [process rtprocess]
  Allocated Memory: 22,721,712 bytes
  Hostname: ci.jenkins.io
  System Probe Process Module Status: Not running

  =================
  Process Endpoints
  =================
    https://process.datadoghq.com - API Key ending with:
        - 31d65

  =========
  Collector
  =========
    Last collection time: 2023-05-16 12:39:02
    Docker socket: /var/run/docker.sock
    Number of processes: 55
    Number of containers: 1
    Process Queue length: 0
    RTProcess Queue length: 0
    Connections Queue length: 0
    Event Queue length: 0
    Pod Queue length: 0
    Process Bytes enqueued: 0
    RTProcess Bytes enqueued: 0
    Connections Bytes enqueued: 0
    Event Bytes enqueued: 0
    Pod Bytes enqueued: 0
    Drop Check Payloads: []

=========
APM Agent
=========

  Status: Not running or unreachable on localhost:5012.
  Error: Get "http://localhost:5012/debug/vars": dial tcp 127.0.0.1:5012: connect: connection refused

==========
Aggregator
==========
  Checks Metric Sample: 26,825,000
  Dogstatsd Metric Sample: 1,954,072
  Event: 6
  Events Flushed: 6
  Number Of Flushes: 47,086
  Series Flushed: 27,105,305
  Service Check: 707,865
  Service Checks Flushed: 754,945

=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 1,954,071
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 204,706,466
  Udp Packet Reading Errors: 0
  Udp Packets: 1,084,872
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 0
  Unterminated Metric Errors: 0

====
OTLP
====

  Status: Not enabled
  Collector status: Not running
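
Since the status dump above is long, here is a minimal sketch of how the failing check instances could be picked out of it automatically. It only relies on the `Instance ID: ... [STATE]` and `Configuration Source: ...` line shapes visible in the output above; the function name `failing_instances` and the `SAMPLE` excerpt are illustrative assumptions, not part of any Datadog tooling.

```python
import re

# Matches lines such as "      Instance ID: apache:18dab1b2ea051988 [ERROR]"
INSTANCE_RE = re.compile(r"^\s*Instance ID:\s*(\S+)\s*\[(\w+)\]\s*$")
# Matches lines such as "      Configuration Source: file:/etc/.../apache.yaml"
CONFIG_RE = re.compile(r"^\s*Configuration Source:\s*(\S+)\s*$")


def failing_instances(status_text):
    """Return (instance_id, config_source) pairs whose state is not OK."""
    failures = []
    pending = None  # id of a non-OK instance still waiting for its config line
    for line in status_text.splitlines():
        match = INSTANCE_RE.match(line)
        if match:
            instance_id, state = match.groups()
            pending = instance_id if state != "OK" else None
            continue
        match = CONFIG_RE.match(line)
        if match and pending is not None:
            failures.append((pending, match.group(1)))
            pending = None
    return failures


# Excerpt copied from the status output above (hypothetical helper input).
SAMPLE = """\
    apache (4.2.0)
    --------------
      Instance ID: apache:18dab1b2ea051988 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/apache.yaml
      Instance ID: apache:4b41c883c989bee0 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/apache.d/conf.yaml

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
"""

if __name__ == "__main__":
    for instance_id, source in failing_instances(SAMPLE):
        print(instance_id, source)
```

Run against the full output, this would list the two apache instances, which come from two different files (`conf.d/apache.yaml` and `conf.d/apache.d/conf.yaml`). That suggests the apache check may be configured twice, which is probably worth a separate look.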

I'm wondering whether the Datadog "site" setting in the agent config should point to jenkins.datadoghq.com instead of datadoghq.com; cf. its configuration:

cat /etc/datadog-agent/datadog.yaml
### MANAGED BY PUPPET


---
api_key: xxxxxxxxxxx31d65
dd_url: ''
site: datadoghq.com
cmd_port: 5001
hostname_fqdn: false
collect_ec2_tags: false
collect_gce_tags: false
confd_path: "/etc/datadog-agent/conf.d"
enable_metadata_collection: true
dogstatsd_port: 8125
dogstatsd_socket: ''
dogstatsd_non_local_traffic: false
log_file: "/var/log/datadog/agent.log"
log_level: info
tags: []
apm_config:
  enabled: false
  env: none
  apm_non_local_traffic: false
process_config:
  enabled: 'true'
  scrub_args: true
  custom_sensitive_words: []
logs_enabled: true
logs_config:
  container_collect_all: true
hostname: ci.jenkins.io
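
Two DogStatsD-related keys in this file look relevant to the connectivity question: `dogstatsd_port: 8125` and `dogstatsd_non_local_traffic: false`. As far as I understand the agent documentation, the second one means DogStatsD only accepts traffic coming from the host itself, which would exclude a process running in a container with its own network namespace. The sketch below extracts those two values; `top_level_settings` and `dogstatsd_summary` are hypothetical helper names, and the reader is deliberately naive (flat keys only, not a real YAML parser).

```python
def top_level_settings(yaml_text):
    """Collect top-level `key: value` pairs from a flat YAML document.

    Naive on purpose: nested keys, lists and document markers are skipped.
    """
    settings = {}
    for line in yaml_text.splitlines():
        if not line or line[0] in " #-" or ":" not in line:
            continue  # blanks, nested keys, comments, '---' markers
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def dogstatsd_summary(settings):
    """Summarise the DogStatsD listener settings, using the agent defaults
    (port 8125, local traffic only) when a key is absent."""
    port = int(settings.get("dogstatsd_port", "8125"))
    flag = settings.get("dogstatsd_non_local_traffic", "false").lower()
    return {"port": port, "accepts_non_local_traffic": flag == "true"}


# Excerpt of the configuration shown above.
SAMPLE = """\
### MANAGED BY PUPPET
---
site: datadoghq.com
dogstatsd_port: 8125
dogstatsd_socket: ''
dogstatsd_non_local_traffic: false
apm_config:
  enabled: false
"""

if __name__ == "__main__":
    print(dogstatsd_summary(top_level_settings(SAMPLE)))
```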

@dduportal
Contributor

dduportal commented May 16, 2023

The Datadog plugin is now installed and configured on ci.jenkins.io but nothing is shown on Datadog.

FTR, here is the result of datadog-agent status from ci.jenkins.io VM:
Details

I realize that the default value for the agent hostname is 127.0.0.1 as per https://github.com/jenkins-infra/jenkins-infra/blob/84de1e8e1a716be95db783a19a3460ef890bd65a/dist/profile/templates/jenkinscontroller/casc/datadog.yaml.erb#L7 (because https://github.com/jenkins-infra/jenkins-infra/pull/2836/files does not set up the targetHost value).

But the plugin runs inside the ci.jenkins.io controller container, while the Datadog agent runs on the host, so 127.0.0.1 as seen by the plugin is the container itself, not the host.
You can verify with a curl -v localhost:8125 from within the container: it won't work at all.

What are the logs from the controller? If it does not work, Jenkins will surely print some logs to guide you.
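
The reachability check suggested above can also be scripted, which makes it easy to run the same probe from inside the container and from the host and compare the results. This is a minimal sketch using only the Python standard library; `tcp_reachable` is a hypothetical helper name, and it only tests TCP, so it says nothing about UDP listeners.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # 127.0.0.1:8125 is the address and port discussed in this thread.
    print(tcp_reachable("127.0.0.1", 8125))
```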

@lemeurherve
Member Author

What are the logs from the controller?

Nothing related to Datadog in https://ci.jenkins.io/manage/log/all

@dduportal
Contributor

What are the logs from the controller?

Nothing related to Datadog in https://ci.jenkins.io/manage/log/all

I got the following on this page (it required a reload/restart of the controller, since this plugin is initialized at startup):

May 16, 2023 2:03:21 PM WARNING org.datadog.jenkins.plugins.datadog.traces.DatadogTraceBuildLogic finishBuildTrace
May 16, 2023 2:03:21 PM SEVERE org.datadog.jenkins.plugins.datadog.DatadogUtilities severe
Connection refused (Connection refused)

May 16, 2023 2:03:21 PM INFO org.datadog.jenkins.plugins.datadog.DatadogUtilities severe

Connection refused (Connection refused): java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
at java.base/java.net.Socket.connect(Socket.java:609)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:507)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:602)
at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:275)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:374)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:395)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1253)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1367)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1342)
at org.datadog.jenkins.plugins.datadog.transport.HttpSender.blockingSend(HttpSender.java:77)
at org.datadog.jenkins.plugins.datadog.transport.HttpSender.run(HttpSender.java:55)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

@dduportal
Contributor

Besides:

root@ci:~# grep -c 'use_dogstatsd' /etc/datadog-agent/datadog.yaml
0
root@ci:~# telnet 127.0.0.1 8125
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused

So not only can localhost:8125 not work from inside the container, but the DogStatsD agent is also not exposed on the host itself (ref. https://docs.datadoghq.com/developers/dogstatsd/?tab=hostagent).
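
One caveat, offered as an understanding rather than a certainty: DogStatsD listens on UDP by default, so a TCP probe such as telnet is expected to be refused even when the listener is up. The status output earlier in this thread also reports a non-zero "Udp Packets" count in its DogStatsD section. A UDP-based probe might therefore be more telling. The sketch below builds a datagram in the documented `<name>:<value>|<type>|#<tags>` form and sends it; the metric name `example.probe` and the helper names are hypothetical.

```python
import socket


def format_metric(name, value, metric_type="g", tags=()):
    """Build a DogStatsD datagram: <name>:<value>|<type>[|#tag1,tag2]."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload.encode("utf-8")


def send_metric(host, port, datagram):
    """Fire-and-forget UDP send; returns the number of bytes handed to the OS.

    UDP gives no delivery confirmation, so a successful return does not prove
    that an agent received the datagram.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(datagram, (host, port))


if __name__ == "__main__":
    datagram = format_metric("example.probe", 1, "g", ["source:manual"])
    send_metric("127.0.0.1", 8125, datagram)
```

Because the send is unacknowledged, the result would have to be confirmed on the agent side, for instance by watching whether the DogStatsD packet counters in `datadog-agent status` increase.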

@lemeurherve
Member Author

While working on #3558 I've noticed that the jenkins.package.available metric used in https://github.com/jenkins-infra/datadog/blob/main/monitor-packages.tf isn't collecting any event, hence its "no data" current state.

I think this issue could resolve that, to be checked on completion.

@dduportal
Contributor

While working on #3558 I've noticed that the jenkins.package.available metric used in https://github.com/jenkins-infra/datadog/blob/main/monitor-packages.tf isn't collecting any event, hence its "no data" current state.

I think this issue could resolve that, to be checked on completion.

What is the relationship with the ci.jenkins.io metric collection?

(the monitor you're pointing looks related to the PKG VM, hosting the packages and index)

@lemeurherve
Member Author

lemeurherve commented May 24, 2023

What is the relationship with the ci.jenkins.io metric collection?

The fact that all jenkins.* metrics aren't collecting anything, and that their "last configured" value is empty made me think of this relation.


@dduportal
Contributor

What is the relationship with the ci.jenkins.io metric collection?

The fact that all jenkins.* metrics aren't collecting anything, and that their "last configured" value is empty made me think of this relation.

🤔 fair, but jenkins.package.available looks like a custom metric. Not sure what it would mean in the context of the datadog plugin for Jenkins?

@lemeurherve
Member Author

You're right, this is a custom metric. It doesn't report anything though.

@lemeurherve
Member Author

lemeurherve commented Jun 9, 2023

Temporarily disabling Datadog plugin on ci.jenkins.io to avoid numerous connection error logs from it until we figure out how to enable communication between the Datadog agent on the VM host and the controller inside a container running on the VM host.


@lemeurherve
Member Author

It didn't go well: ci.jenkins.io was down for ~5 minutes (corresponding status incident: jenkins-infra/status#315)

Error log:

image

Fixed by deleting $JENKINS_HOME/plugins/datadog.jpi.disabled file in ci.jenkins.io volume.

I'll comment out the Datadog JCasC configuration, then disable the plugin again on ci.jenkins.io

@lemeurherve
Member Author

Datadog plugin temporarily disabled.

@lemeurherve
Member Author

While working on #3558 I've noticed that the jenkins.package.available metric used in jenkins-infra/datadog@main/monitor-packages.tf isn't collecting any event, hence its "no data" current state.

I think this issue could resolve that, to be checked on completion.

You're right, this is a custom metric. It doesn't report anything though.

Wasn't related to this issue, see #3608 (comment)

Note: the last data in this shared dashboard are from the 10th of May as seen in Datadog (https://jenkins.datadoghq.com/dashboard/fnh-4vm-t6k/latest-package-available), which would correspond to #3558 (probably related)

It was. Fixed in jenkins-infra/kubernetes-management#4056, the dashboard isn't empty anymore.

@dduportal
Contributor

$ nc -u 127.0.0.1 8125 -z -v
Connection to 127.0.0.1 8125 port [udp/*] succeeded!
  • From the container jenkins, same host, as root:
apt update
apt install netcat -y

## Fails as expected: No process is listening inside the container on the UDP/8125
# Neither on the container's loopback interface
nc -u 127.0.0.1 8125 -z -v
# OR on the container virtual ethernet interface
nc -u 172.17.0.2 8125 -z -v
  • Confirming the behavior by checking the process listening on the UDP/8125 ports:
root@ci:~# netstat -alnp | grep udp | grep 8125
udp        0      0 127.0.0.1:8125          0.0.0.0:*                           25918/agent         
udp        0      0 127.0.0.1:57636         127.0.0.1:8125          ESTABLISHED 25919/process-agent 

vs.

root@ci:~# docker exec -u root -ti  jenkins bash
root@e2d16f2ca5d3:/# netstat -alnp | grep udp | grep 8125
root@e2d16f2ca5d3:/#
  • The Datadog agent configuration needs to be set up to allow incoming UDP traffic:
    • https://github.com/DataDog/datadog-agent/blob/04300ae7b919f67d22c0a0a7634d12b5decb66d7/pkg/config/config_template.yaml#LL1947C3-L1947C30

    • Setting it to true (and systemctl restart agent) makes the agent listen on all interfaces, over both IPv4 and IPv6:

      $ docker exec -u root -ti  jenkins nc -u 172.17.0.1 8125 -z -v
      Connection to 172.17.0.1 8125 port [udp/*] succeeded!
      $ nc -u 127.0.0.1 8125 -z -v
      Connection to 127.0.0.1 8125 port [udp/*] succeeded!
      $ netstat -alnp | grep udp | grep 8125
      udp        0      0 127.0.0.1:50065         127.0.0.1:8125          ESTABLISHED 3649/process-agent  
      udp6       0      0 :::8125                 :::*                                3648/agent
      
    • But it is now available from the outside: we need a firewall rule

      ➜ nc -u 104.208.238.39 8125 -v -z
      Connection to 104.208.238.39 8125 port [udp/*] succeeded!
      ➜ nc -u 10.0.2.4 8125 -v -z      
      Connection to 10.0.2.4 8125 port [udp/*] succeeded!
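
To double-check the setting discussed above after editing the agent configuration, a small helper can read back the effective value from the file. This is only a sketch: it assumes the linked template line is the `dogstatsd_non_local_traffic` key (an inference from the link, not something stated in this thread), and the helper name `yaml_top_level_value` is hypothetical. It only handles simple, uncommented, top-level `key: value` lines and is not a YAML parser.

```shell
#!/usr/bin/env bash
# Print the value of an uncommented, top-level "key: value" line in a
# datadog.yaml-style file. If the key appears several times, the last one wins.
# Commented-out template lines ("# key: value") are ignored.
yaml_top_level_value() {
  local file="$1"
  local key="$2"
  sed -n "s/^${key}:[[:space:]]*\([^[:space:]#]*\).*/\1/p" "$file" | tail -n 1
}
```

For example, `yaml_top_level_value /etc/datadog-agent/datadog.yaml dogstatsd_non_local_traffic` prints `true` once the key is set and uncommented, and prints nothing if the key is absent or still commented out. The same helper can be pointed at `bind_host`.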
      

@dduportal
Contributor

@timja
Member

timja commented Jun 20, 2023

Add an NSG only allowing port 80, 443 and the inbound agent port in?

i.e. not a firewall rule to block it but a firewall rule to only allow required traffic

Let's also add a firewall rule (puppet, azure, both?)

I would prefer just Azure, it's a lot easier to reason about and debug issues imo.

@dduportal
Contributor

Update:

  • The Datadog agent accepts a custom bind_host (but only one). It is easier to restrict it to "0.0.0.0", which at least removes IPv6. This avoids allowing the full range of external incoming requests and is easier to block.

Add an NSG only allowing port 80, 443 and the inbound agent port in?

i.e. not a firewall rule to block it but a firewall rule to only allow required traffic

Let's also add a firewall rule (puppet, azure, both?)

I would prefer just Azure, it's a lot easier to reason about and debug issues imo.

Both should be required as they protect against different threats:

  • I agree with the NSG method, to ensure the virtual network takes care of requests
  • But adding local iptables rules is clearly needed: we want to allow only the loopback and the docker0 interface (and nothing else: not even a locally bound veth or a local bridge).

it's a lot easier to reason about and debug issues imo

Yes but it would remove a layer of protection.
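
As a rough illustration of the local rules described above (accept UDP/8125 on the loopback and docker0 only, drop it everywhere else), here is a sketch that prints the iptables commands without applying them. The function name `dogstatsd_fw_rules` is hypothetical, and the rules assume a plain INPUT chain with nothing earlier in the chain already accepting this traffic; the actual setup (Puppet-managed or otherwise) may look different.

```shell
#!/usr/bin/env bash
# Print (do not apply) iptables rules that accept a UDP port only on the
# listed interfaces and drop it on every other interface.
# Usage: dogstatsd_fw_rules <port> <iface> [<iface> ...]
dogstatsd_fw_rules() {
  local port="$1"
  local iface
  shift
  for iface in "$@"; do
    printf 'iptables -A INPUT -i %s -p udp --dport %s -j ACCEPT\n' "$iface" "$port"
  done
  # Final catch-all for the same port: anything not accepted above is dropped.
  printf 'iptables -A INPUT -p udp --dport %s -j DROP\n' "$port"
}

# Dry run for the case discussed here: loopback and docker0 only.
dogstatsd_fw_rules 8125 lo docker0
```

Printing the rules first makes them easy to review before running them as root; order matters, since the ACCEPT rules have to come before the DROP.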

@dduportal
Contributor

  • Datadog is now fully enabled and integrated with ci.jenkins.io
  • Metrics, build logs and traces are collected and visible within Datadog:
Screenshot 2023-06-22 at 19 22 55

Handing back over to @lemeurherve to finish the work on this issue

@lemeurherve
Member Author

Closing, potential improvements:
