feat(ingestion) Adding vertexAI ingestion source (v2 - experiment and experiment run) #12836

Open
wants to merge 174 commits into
base: master

@ryota-cloud ryota-cloud commented Mar 10, 2025

  • This PR adds ingestion of experiments and experiment runs.
  • The main script (vertexai.py) had grown too long, so I refactored it and moved some utilities out into separate modules.
  • For review, the main diff is in the following part:
        # Fetch and Ingest Experiments
        yield from self._get_experiments_workunits()
        # Fetch and Ingest Experiment Runs
        yield from auto_workunit(self._get_experiment_runs_mcps())
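
For readers unfamiliar with the pattern in the snippet above, here is a minimal, self-contained sketch of how the two `yield from` calls compose. Everything in it is a hypothetical stand-in: `FakeMCP`, `FakeWorkUnit`, `fake_auto_workunit`, and `SketchSource` are not DataHub classes, and the real `auto_workunit` and the PR's helper methods may behave differently. The only point shown is that one method yields work units directly while the other yields proposals that are wrapped into work units first.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class FakeMCP:
    """Hypothetical stand-in for a metadata change proposal."""
    urn: str


@dataclass
class FakeWorkUnit:
    """Hypothetical stand-in for a metadata work unit."""
    id: str
    mcp: FakeMCP


def fake_auto_workunit(mcps: Iterable[FakeMCP]) -> Iterator[FakeWorkUnit]:
    # Assumed behavior: wrap each proposal in a work unit, lazily.
    for mcp in mcps:
        yield FakeWorkUnit(id=f"wu-{mcp.urn}", mcp=mcp)


class SketchSource:
    def __init__(self, experiments: List[str], runs: Dict[str, List[str]]):
        self.experiments = experiments
        self.runs = runs

    def _get_experiments_workunits(self) -> Iterator[FakeWorkUnit]:
        # Yields finished work units, so no wrapping is needed.
        for name in self.experiments:
            urn = f"experiment:{name}"
            yield FakeWorkUnit(id=f"wu-{urn}", mcp=FakeMCP(urn=urn))

    def _get_experiment_runs_mcps(self) -> Iterator[FakeMCP]:
        # Yields bare proposals; the caller wraps them.
        for name, run_names in self.runs.items():
            for run in run_names:
                yield FakeMCP(urn=f"run:{name}/{run}")

    def get_workunits(self) -> Iterator[FakeWorkUnit]:
        # Fetch and ingest experiments
        yield from self._get_experiments_workunits()
        # Fetch and ingest experiment runs
        yield from fake_auto_workunit(self._get_experiment_runs_mcps())


source = SketchSource(["exp-a"], {"exp-a": ["run-1", "run-2"]})
unit_ids = [wu.id for wu in source.get_workunits()]
```

Because both helpers are generators, nothing is fetched until the caller iterates, and experiments are always emitted before their runs.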

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@ryota-cloud ryota-cloud requested a review from hsheth2 March 18, 2025 06:25

@hsheth2 hsheth2 left a comment

Shared a couple more comments in Slack

name=mock_model.display_name,
description=mock_model.description,
created=TimeStampClass(
time=datetime_to_ts_millis(datetime.fromtimestamp(1647878400)),

Why go from a timestamp to a datetime object and back to a timestamp?
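
A small sketch of the point being raised, assuming the helper simply converts a datetime to epoch milliseconds. `to_ts_millis` below is a hypothetical stand-in written for this example, not the `datetime_to_ts_millis` used in the test, whose exact implementation is not shown in this thread.

```python
from datetime import datetime


def to_ts_millis(dt: datetime) -> int:
    # Hypothetical stand-in: convert a datetime to epoch milliseconds.
    return int(round(dt.timestamp() * 1000))


EPOCH_SECONDS = 1647878400  # the literal used in the test above

# Round trip: seconds -> naive local datetime -> milliseconds
round_trip = to_ts_millis(datetime.fromtimestamp(EPOCH_SECONDS))

# Direct: seconds -> milliseconds
direct = EPOCH_SECONDS * 1000
```

Under that assumption the two values are equal, so the test could use the integer directly.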

I'm not a huge fan of re-exporting types in __init__ files.

In general, __init__ files should probably be empty.

@ryota-cloud ryota-cloud Mar 18, 2025

This is for practical reasons: after the refactor, the main script moved into a module.

  • The ingestion job invocation failed without it.
  • I wanted to avoid a redundant import path, vertexai/vertexai..
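
The trade-off in this thread can be sketched with a throwaway package. The layout and names below (`demo_source`, `DemoSource`) are hypothetical and built in a temporary directory only so the example runs. It does not reproduce the PR's actual modules or the ingestion failure mentioned above. It shows the reviewer's preferred shape: an empty package init file, with callers importing through the full, doubled module path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout, created on the fly:
#   demo_source/
#     __init__.py      (empty)
#     demo_source.py   (defines DemoSource)
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_source"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "demo_source.py").write_text("class DemoSource:\n    name = 'demo'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# With an empty init file, callers spell out the full module path,
# which is the "doubled" form the author wanted to avoid.
from demo_source.demo_source import DemoSource

# The package itself does not re-export the class.
import demo_source

package_reexports = hasattr(demo_source, "DemoSource")
```

Re-exporting would instead add a line such as `from demo_source.demo_source import DemoSource` to the init file, which shortens imports at the cost of running that import whenever the package is loaded.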

@datahub-cyborg datahub-cyborg bot added the pending-submitter-response label and removed the needs-review label Mar 18, 2025
Labels
  • community-contribution: PR or Issue raised by member(s) of DataHub Community
  • devops: PR or Issue related to DataHub backend & deployment
  • ingestion: PR or Issue related to the ingestion of metadata
  • pending-submitter-response: Issue/request has been reviewed but requires a response from the submitter
  • product: PR or Issue related to the DataHub UI/UX
2 participants