
feat(cdp): segment destination mapping #31336


Merged
merged 105 commits into from
May 23, 2025

Conversation

@MarconLP (Member) commented Apr 17, 2025

Important

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Changes

PostHog / Segment view

PostHog

[Screenshot: PostHog view, 2025-04-17 10:56:29]

Segment

[Screenshot: Segment view, 2025-04-17 10:59:27]

Todos:

  • translate FQL to PostHog filters
  • ability to select an entire object (technically possible with a {event.properties} string field); for now, convert it to a string and use {event.properties}
    [Screenshot, 2025-04-17 11:08:33]
  • execute different actions depending on the mapping
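A minimal TypeScript sketch of the interim approach in the second to-do: a Segment-style `@path` directive that selects a whole object is rewritten into a single PostHog template string such as `{event.properties}`. The directive shape, the `$.` to `event.` prefix mapping, and the helpers `isPathDirective`, `pathToTemplate`, and `translateField` are assumptions for illustration, not the PR's code.

```typescript
// Hypothetical shape of a Segment-style mapping directive, e.g. { '@path': '$.properties' }.
type PathDirective = { '@path': string }

function isPathDirective(value: unknown): value is PathDirective {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as Record<string, unknown>)['@path'] === 'string'
  )
}

// Assumption: Segment's "$." root maps onto PostHog's "event." root.
// Only this simple prefix rewrite is handled here.
function pathToTemplate(directive: PathDirective): string {
  const path = directive['@path']
  if (!path.startsWith('$.')) {
    throw new Error(`Unsupported path: ${path}`)
  }
  return `{event.${path.slice(2)}}`
}

// Selecting an entire object becomes one template string (e.g. "{event.properties}")
// instead of one input per key; anything that is not a directive passes through.
function translateField(value: unknown): unknown {
  return isPathDirective(value) ? pathToTemplate(value) : value
}

console.log(translateField({ '@path': '$.properties' })) // {event.properties}
console.log(translateField({ '@path': '$.properties.revenue' })) // {event.properties.revenue}
console.log(translateField('static value')) // static value
```

Emitting one string for the whole object keeps the form simple until the "select entire object / individual keys" switcher exists; the trade-off is that individual keys can no longer be remapped.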

TODOs:

Refer to #32651 for the follow-up status.

Follow-up:

  • default values defined here are broken
  • debug log option for all (segment?) destinations
  • select entire object / individual keys switcher
  • make the hog_function_template endpoint paginated
  • do not use inline snapshots for bigger results
  • implement retries via Cyclotron

Does this work well for both Cloud and self-hosted?

How did you test this code?

greptile-apps bot (Contributor) left a comment

PR Summary

Here's my review of the changes for the Segment destination mapping functionality:

Added comprehensive Amplitude destination integration with extensive test coverage and utility functions for PostHog's CDP.

Key changes:

  • Implemented core Amplitude actions (logEvent, logPurchase, identifyUser, groupIdentifyUser) with proper type definitions and field mappings
  • Added robust user property handling with support for UTM tracking, referrer data, and user agent parsing
  • Created regional endpoint support for both North America and Europe with configurable batch/single event processing
  • Built timestamp conversion utilities and session ID formatting for proper event timing
  • Introduced comprehensive test coverage across all major components with edge case handling

Note: The template.ts file needs significant revision as it contains deprecated code, hardcoded values and inadequate error handling.

34 file(s) reviewed, 45 comment(s)

github-actions bot (Contributor) commented Apr 17, 2025

Size Change: +10 B (0%)

Total Size: 3.71 MB

Unchanged:

Filename                    Size      Change
frontend/dist/toolbar.js    3.71 MB   +10 B (0%)

compressed-size-action

Comment on lines 253 to 256
default: field.type !== 'object' || typeof field.default !== 'undefined' && ('@path' in field.default) ? translateInputs(field.default) : Object.fromEntries(Object.entries(field.properties ?? {}).map(([key, _]) => {
const defaultVal = field.default as Record<string, object> ?? {}
return [key, translateInputs(defaultVal[key])]
})),
Contributor

The default value logic has two potential issues:

  1. The condition typeof field.default !== 'undefined' && ('@path' in field.default) may cause runtime errors if field.default is undefined, as the second part would attempt to check a property on an undefined value.

  2. The type assertion field.default as Record<string, object> could be unsafe when field.default is undefined.

Consider using optional chaining and nullish coalescing to handle these cases more safely:

default: field.type !== 'object' || (field.default && '@path' in field.default) 
  ? translateInputs(field.default) 
  : Object.fromEntries(Object.entries(field.properties ?? {}).map(([key, _]) => {
      const defaultVal = field.default ? (field.default as Record<string, object>) : {};
      return [key, translateInputs(defaultVal[key])];
  })),

Spotted by Diamond
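On point 1: because `&&` short-circuits, `'@path' in field.default` is never evaluated when `field.default` is `undefined`. The inputs that do throw are `null` (since `typeof null === 'object'`) and primitives such as strings, because the `in` operator throws a `TypeError` when its right-hand operand is not an object. The suggested `field.default && ...` guards `null` but not a non-empty string. A minimal sketch with an explicit object check, where `FieldDefinition` is a simplified, hypothetical shape and `translateInputs` is stubbed as an identity function:

```typescript
// Simplified, hypothetical field definition used only for this example.
interface FieldDefinition {
  type: string
  default?: unknown
  properties?: Record<string, unknown>
}

// `in` throws a TypeError for null and for primitives, so require a real object first.
function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null
}

// Stub standing in for the PR's translateInputs; returns its input unchanged.
function translateInputs(value: unknown): unknown {
  return value
}

function resolveDefault(field: FieldDefinition): unknown {
  const fieldDefault = field.default
  if (field.type !== 'object' || (isObject(fieldDefault) && '@path' in fieldDefault)) {
    return translateInputs(fieldDefault)
  }
  // Per-key defaults: a missing, null, or primitive default becomes an empty object.
  const defaults: Record<string, unknown> = isObject(fieldDefault) ? fieldDefault : {}
  return Object.fromEntries(
    Object.keys(field.properties ?? {}).map((key) => [key, translateInputs(defaults[key])])
  )
}

// The original condition would throw here ('@path' in null); this returns { a: undefined, b: undefined }.
console.log(resolveDefault({ type: 'object', default: null, properties: { a: {}, b: {} } }))
```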


posthog-bot (Contributor)

📸 UI snapshots have been updated

1 snapshot change in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 7)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.


MarconLP enabled auto-merge (squash) May 19, 2025 22:04

Contributor

All of this code is untested. Please add tests.

Member Author

Added tests.


public async processInvocations(invocations: HogFunctionInvocation[]): Promise<HogFunctionInvocationResult[]> {
// Segment plugins fire fetch requests and so need to be run in true parallel
return await Promise.all(
Contributor

Fail-fast or fail-safe? If one plugin throws an error, Promise.all will reject immediately. Ask yourself:

  • Do you want all or nothing (fail-fast)? → Promise.all is fine.
  • Do you want to collect all results/errors? → Use Promise.allSettled.
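A minimal sketch of the two behaviors, using a hypothetical `invoke` helper that rejects for some inputs: `Promise.all` rejects as soon as any promise rejects and the other results are lost, while `Promise.allSettled` waits for every promise and reports each outcome.

```typescript
// Hypothetical stand-in for one plugin invocation; rejects when `shouldFail` is true.
async function invoke(id: number, shouldFail: boolean): Promise<string> {
  if (shouldFail) {
    throw new Error(`invocation ${id} failed`)
  }
  return `invocation ${id} ok`
}

// Fail-fast: rejects as soon as any invocation rejects; the other results are lost.
async function runAll(flags: boolean[]): Promise<string[]> {
  return await Promise.all(flags.map((fail, i) => invoke(i, fail)))
}

// Fail-safe: waits for every invocation and reports each outcome.
async function runAllSettled(flags: boolean[]): Promise<string[]> {
  const outcomes = await Promise.allSettled(flags.map((fail, i) => invoke(i, fail)))
  return outcomes.map((outcome) =>
    outcome.status === 'fulfilled' ? outcome.value : `error: ${(outcome.reason as Error).message}`
  )
}

runAll([false, true, false]).catch((e: Error) => console.log(e.message)) // invocation 1 failed
runAllSettled([false, true, false]).then((results) => console.log(results.join(' | ')))
// invocation 0 ok | error: invocation 1 failed | invocation 2 ok
```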

Member Author

@meikelmosby I'm not fully sure about this one. Technically, we want to capture all results and errors from all invocations, but wouldn't the same apply to the plugin worker?


Contributor

Jumping in: the executor code should only throw if there is a genuine programming error. In that case we have no choice but to crash out, so there isn't much difference between all and allSettled; we have no way to partially handle a batch.

Generally it would be good if we could partially handle a batch, but that's not how Kafka works, so this is an issue with our architecture more so than with this specific code path.
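A sketch of the pattern described above, assuming a simplified, hypothetical `InvocationResult` type and `executeDestination` helper: errors from an individual invocation are caught and recorded on its result, so `Promise.all` only rejects for something thrown outside that `try`/`catch`, such as a genuine programming error. With per-invocation errors captured this way, the choice between `Promise.all` and `Promise.allSettled` no longer matters for expected failures.

```typescript
// Simplified, hypothetical result type: a failed invocation is recorded as data.
interface InvocationResult {
  id: string
  error?: string
}

// Hypothetical destination call; rejects for ids starting with "bad".
async function executeDestination(id: string): Promise<void> {
  if (id.startsWith('bad')) {
    throw new Error(`destination rejected ${id}`)
  }
}

// Errors raised while executing one destination are caught and stored on its
// result, so a single failing invocation does not reject the whole batch.
async function processInvocation(id: string): Promise<InvocationResult> {
  try {
    await executeDestination(id)
    return { id }
  } catch (e) {
    return { id, error: e instanceof Error ? e.message : String(e) }
  }
}

// Promise.all now rejects only if something outside the try/catch above throws,
// e.g. a genuine programming error, which the batch cannot recover from anyway.
async function processInvocations(ids: string[]): Promise<InvocationResult[]> {
  return await Promise.all(ids.map((id) => processInvocation(id)))
}

processInvocations(['a', 'bad-1', 'b']).then((results) =>
  console.log(results.map((r) => r.error ?? 'ok').join(', '))
)
// ok, destination rejected bad-1, ok
```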

Contributor

Adding some tests would be good here as well.

Member Author

MarconLP, May 22, 2025

I think the tests for the executor cover this one as well.




benjackwhite (Contributor) left a comment

I think this is good to go. Some follow-up is still needed for sure, but let's iterate.

}
})

expect(invocationResults[0].logs).toMatchInlineSnapshot(`
Contributor

When you have really big snapshots like this, we should just use toMatchSnapshot; inline ones are good when there is a limited number of options. As it is, the file is so huge it's virtually unreadable.

pluginExecutionDuration.observe(performance.now() - start)
} catch (e) {
if (e instanceof RetryError) {
// NOTE: Schedule as a retry to cyclotron?
Contributor

Does this still need to be implemented?

Contributor

This can be left out of this PR, but it needs to be done as a follow-up ASAP.
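One possible shape for that follow-up, as a hedged sketch: when an invocation fails with a `RetryError`, compute the next attempt with exponential backoff and give up after a fixed number of attempts. `ScheduledRetry`, `planRetry`, `MAX_ATTEMPTS`, and `BASE_DELAY_MS` are hypothetical; how the result would actually be enqueued in Cyclotron is not covered, and `RetryError` is redefined locally only so the example is self-contained.

```typescript
// Local stand-in for the RetryError referenced in the PR, so the example compiles on its own.
class RetryError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'RetryError'
  }
}

// Hypothetical retry record; field names are illustrative, not Cyclotron's schema.
interface ScheduledRetry {
  invocationId: string
  attempt: number
  runAt: Date
}

const MAX_ATTEMPTS = 5 // assumption
const BASE_DELAY_MS = 1000 // assumption

// `attempt` is the number of attempts already made (1 after the first run).
// Delays double each time (1s, 2s, 4s, ...); no retry is planned once
// MAX_ATTEMPTS is reached or when the error is not a RetryError.
function planRetry(
  invocationId: string,
  attempt: number,
  error: unknown,
  now: Date
): ScheduledRetry | null {
  if (!(error instanceof RetryError) || attempt >= MAX_ATTEMPTS) {
    return null
  }
  const delayMs = BASE_DELAY_MS * 2 ** (attempt - 1)
  return { invocationId, attempt: attempt + 1, runAt: new Date(now.getTime() + delayMs) }
}

const retry = planRetry('inv-1', 2, new RetryError('rate limited'), new Date(0))
console.log(retry?.attempt, retry?.runAt.getTime()) // 3 2000
```

Returning a plain record rather than enqueueing directly keeps the backoff policy testable on its own, separate from whichever queue ends up storing it.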

MarconLP disabled auto-merge May 23, 2025 07:54
MarconLP dismissed meikelmosby’s stale review May 23, 2025 08:03

adjusted requested changes

MarconLP merged commit c3944dc into master May 23, 2025
93 of 96 checks passed
MarconLP deleted the external-destinations branch May 23, 2025 08:03
MarconLP mentioned this pull request May 26, 2025
Labels: None yet
Projects: None yet

4 participants