[CrowdStrike]: Processing of different events can lead to identical _ids #13720

garethhumphriesgkc opened this issue Apr 29, 2025 · 1 comment · May be fixed by #13779
Assignees
Labels
Integration:crowdstrike CrowdStrike needs:triage Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations]

Comments

garethhumphriesgkc commented Apr 29, 2025

Integration Name

CrowdStrike [crowdstrike]

Dataset Name

crowdstrike.falcon

Integration Version

latest (1.63.0?)

Agent Version

8.15.3

Agent Output Type

elasticsearch

Elasticsearch Version

8.15.3

OS Version and Architecture

Ubuntu

Software/API Version

No response

Error Message

:response=>{"create"=>{"status"=>409, "error"=>{"type"=>"version_conflict_engine_exception", "reason"=>"[TAYxxxxxxxxxxxxxxxxxxxxxx5s=]: version conflict, document already exists (current version [1])"}}}

Event Original

No response

What did you do?

Installed the current pipeline and pointed the collected events at it.

What did you see?

Error message above

What did you expect to see?

No error, all events loaded to elasticsearch successfully

Anything else?

The crowdstrike/data_stream/falcon ingest pipeline calculates a document id based on a subset of fields:

  - fingerprint:
      fields:
        - '@timestamp'
        - crowdstrike.event.SessionId
        - crowdstrike.event.DetectId
        - crowdstrike.metadata.eventType
        - crowdstrike.metadata.customerIDString
      target_field: _id
      tag: fingerprint
      ignore_missing: true

We've found that under heavy load these fields are often not specific enough to uniquely identify a record: events frequently arrive in the same second with no difference in any of the above fields. Here is a heavily redacted example of two records that generate the same _id (lines that differ are marked with |).

{                                                                                            {
  "metadata": {                                                                                   "metadata": {
        "customerIDString": "56xxxxxxxxxxxxxxxxxxxxxxxxxxxx14",                                         "customerIDString": "56xxxxxxxxxxxxxxxxxxxxxxxxxxxx14",
        "offset": 476337828,                                                                |           "offset": 476337829,
        "eventType": "FirewallMatchEvent",                                                              "eventType": "FirewallMatchEvent",
        "eventCreationTime": 1745816290000,                                                             "eventCreationTime": 1745816290000,
        "version": "1.0"                                                                                "version": "1.0"
    },                                                                                              },
    "event": {                                                                                      "event": {
        "DeviceId": "40xxxxxxxxxxxxxxxxxxxxxxxxxxxxbb",                                                 "DeviceId": "40xxxxxxxxxxxxxxxxxxxxxxxxxxxxbb",
        "CustomerId": "56xxxxxxxxxxxxxxxxxxxxxxxxxxxx14",                                               "CustomerId": "56xxxxxxxxxxxxxxxxxxxxxxxxxxxx14",
        "Ipv": "ipv4",                                                                                  "Ipv": "ipv4",
        "ConnectionDirection": "1",                                                                     "ConnectionDirection": "1",
        "EventType": "FirewallRuleIP4Matched",                                                          "EventType": "FirewallRuleIP4Matched",
        "Flags": {                                                                                      "Flags": {
            "Audit": false,                                                                                 "Audit": false,
            "Log": true,                                                                                    "Log": true,
            "Monitor": true                                                                                 "Monitor": true
        },                                                                                              },
        "HostName": "xxx-xxx-xxnn",                                                                     "HostName": "xxx-xxx-xxnn",
        "ICMPCode": "",                                                                                 "ICMPCode": "",
        "ICMPType": "",                                                                                 "ICMPType": "",
        "LocalAddress": "nnn.nnn.nn.nn",                                                                "LocalAddress": "nnn.nnn.nn.nn",
        "LocalPort": "nnn0",                                                                |           "LocalPort": "nnn1",
        "MatchCount": 1,                                                                                "MatchCount": 1,
        "MatchCountSinceLastReport": 36,                                                                "MatchCountSinceLastReport": 36,
        "NetworkProfile": "1",                                                                          "NetworkProfile": "1",
        "PID": "nnnnnnnn9243",                                                              |           "PID": "nnnnnnnn7543",
        "PolicyName": "xxxxxxxx-xxxxxxx-xxnn",                                                          "PolicyName": "xxxxxxxx-xxxxxxx-xxnn",
        "PolicyID": "8fxxxxxxxxxxxxxxxxxxxxxxxxxxxx7f",                                                 "PolicyID": "8fxxxxxxxxxxxxxxxxxxxxxxxxxxxx7f",
        "Protocol": "6",                                                                                "Protocol": "6",
        "RemoteAddress": "nnn.nnn.nn.nnn",                                                              "RemoteAddress": "nnn.nnn.nn.nnn",
        "RemotePort": "nnnn9",                                                              |           "RemotePort": "nnnn8",
        "RuleAction": "1",                                                                              "RuleAction": "1",
        "RuleDescription": "xxxxxx xxx xxxxxxxx xx xxxxxxxxxxx xxx xxxxxxxx, xxxx xxx xxx   |           "RuleDescription": "xxxx xx xxx xxxx xxxxxxx xxx xxxxxxxx xx nn xxx-xxx-xxxnn, xx
        "RuleFamilyID": "96xxxxxxxxxxxxxxxxxxxxxxxxxxxx09",                                 |           "RuleFamilyID": "e5xxxxxxxxxxxxxxxxxxxxxxxxxxxx25",
        "RuleGroupName": "xxxxxxx-xxxxxxx-xxnn-x",                                                      "RuleGroupName": "xxxxxxx-xxxxxxx-xxnn-x",
        "RuleName": "xxxxx xxx xxx xxx xxxxx - xxxxx xxxxxxx xxxxx",                        |           "RuleName": "xxxxxxxx xxnn",
        "RuleId": "65xxxxxxxxxxxxxxx66",                                                    |           "RuleId": "27xxxxxxxxxxxxxxx10",
        "Status": "",                                                                                   "Status": "",
        "Timestamp": "2025-04-28T04:58:09Z",                                                            "Timestamp": "2025-04-28T04:58:09Z",
        "TreeID": "",                                                                                   "TreeID": "",
        "Platform": "windows"                                                                           "Platform": "windows"
    }                                                                                               }
}                                                                                            }

I propose including one or more additional fields in the fingerprint step to ensure the IDs are unique:

  • offset - This is the location within the log file at which the event starts, so it is guaranteed to be unique for each entry. It is sufficient in our case, but may not be a generic fix because it assumes all incoming data comes from a log file.
  • RuleId - Two different rules matched here, so this field would differentiate the entries above. However, the same rule may match twice within a second, so it may not be sufficient in all cases either.
  • PID - Not guaranteed to be unique, but it will often differ between events that are close in time. If no guaranteed surrogate key can be found, it may help reduce the chance of a collision.
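
One way to compare the candidates is to count how many sampled events would still share a key once each field is added. The sketch below does that on flattened records. The third event is hypothetical (the first rule matching again within the same second, with an invented offset), the flattened dotted-path layout is used only for brevity, and duplicate_count and event are illustrative names rather than anything from the integration.

```python
from collections import Counter

# Fields from the existing fingerprint list that are present in these events
# (SessionId and DetectId are absent, so ignore_missing skips them).
BASE_FIELDS = (
    "metadata.eventCreationTime",  # assumed stand-in for @timestamp
    "metadata.eventType",
    "metadata.customerIDString",
)


def duplicate_count(records, fields):
    """Count records whose selected field values repeat an earlier record's."""
    keys = Counter(
        tuple((f, r[f]) for f in fields if f in r) for r in records
    )
    return sum(n - 1 for n in keys.values())


def event(offset, pid, rule_id):
    """Flattened, cut-down event using the redacted values from the example."""
    return {
        "metadata.eventCreationTime": 1745816290000,
        "metadata.eventType": "FirewallMatchEvent",
        "metadata.customerIDString": "56xxxxxxxxxxxxxxxxxxxxxxxxxxxx14",
        "metadata.offset": offset,
        "event.PID": pid,
        "event.RuleId": rule_id,
    }


events = [
    event(476337828, "nnnnnnnn9243", "65xxxxxxxxxxxxxxx66"),
    event(476337829, "nnnnnnnn7543", "27xxxxxxxxxxxxxxx10"),
    # Hypothetical: the first rule matches again in the same second.
    event(476337830, "nnnnnnnn9243", "65xxxxxxxxxxxxxxx66"),
]

baseline = duplicate_count(events, BASE_FIELDS)
results = {
    extra: duplicate_count(events, BASE_FIELDS + (extra,))
    for extra in ("metadata.offset", "event.RuleId", "event.PID")
}

print(baseline)  # 2
print(results)   # {'metadata.offset': 0, 'event.RuleId': 1, 'event.PID': 1}
```

On this small sample only offset removes every duplicate. RuleId and PID each leave one, which matches the caveats listed for them above.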
@garethhumphriesgkc changed the title from "[CrowdStrike]: Processing of different canlead to identical _ids" to "[CrowdStrike]: Processing of different events can lead to identical _ids" on Apr 29, 2025
@andrewkroh added the labels Integration:crowdstrike (CrowdStrike) and Team:Security-Service Integrations (Security Service Integrations team [elastic/security-service-integrations]) on Apr 29, 2025
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@efd6 self-assigned this on May 5, 2025