
@edgarrmondragon (Member)

New streams for GitHub Actions:

  • workflows: All workflows in a repository.
  • workflow_runs: All workflow runs in a repository.
  • workflow_run_jobs: All jobs executed in a workflow run. For example, every combination of values in a matrix corresponds to a separate job.
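As an illustration only, the three streams map onto GitHub's Actions REST endpoints, with workflow_run_jobs as a child of workflow_runs keyed by the run ID. The sketch below is plain Python, not Singer SDK code. StreamDef, STREAMS, child_context, request_path and the {org}/{repo} placeholder names are hypothetical. The endpoint paths and the top-level response keys (workflows, workflow_runs, jobs) are the ones GitHub's REST API uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamDef:
    """Hypothetical description of one stream: endpoint path and where records live."""
    name: str
    path: str                  # URL template relative to https://api.github.com
    records_key: str           # top-level key in the JSON response that holds the records
    parent: Optional[str] = None


STREAMS = {
    s.name: s
    for s in [
        StreamDef("workflows", "/repos/{org}/{repo}/actions/workflows", "workflows"),
        StreamDef("workflow_runs", "/repos/{org}/{repo}/actions/runs", "workflow_runs"),
        # Child stream: one request per parent workflow run.
        StreamDef(
            "workflow_run_jobs",
            "/repos/{org}/{repo}/actions/runs/{run_id}/jobs",
            "jobs",
            parent="workflow_runs",
        ),
    ]
}


def child_context(org: str, repo: str, run: dict) -> dict:
    """Build the context a jobs request needs from a parent workflow run record."""
    return {"org": org, "repo": repo, "run_id": run["id"]}


def request_path(stream: str, context: dict) -> str:
    """Fill in a stream's URL template; str.format ignores unused context keys."""
    return STREAMS[stream].path.format(**context)


if __name__ == "__main__":
    ctx = child_context("my-org", "my-repo", {"id": 42, "name": "CI"})
    print(request_path("workflow_run_jobs", ctx))  # /repos/my-org/my-repo/actions/runs/42/jobs
```

Because the jobs endpoint is per run, the number of requests for workflow_run_jobs grows with the number of workflow runs, which is one reason rate limits come up quickly with these streams.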

@edgarrmondragon changed the title from "Feature/workflow streams" to "Workflow streams" on Oct 11, 2021
@aaronsteers (Contributor)

Exciting!

@edgarrmondragon (Member, Author)

@aaronsteers Now just gotta work around the rate limiting 🙃

{"message":"API rate limit exceeded for installation ID 19254008.","documentation_url":"https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting"}
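One common way to work around this, sketched here as an assumption rather than what the tap actually ended up doing, is to read GitHub's rate-limit headers and sleep until the quota resets. The function name and the choice to treat 403/429 responses carrying these headers as rate-limited are hypothetical. X-RateLimit-Remaining, X-RateLimit-Reset (a UTC epoch timestamp) and Retry-After are headers GitHub sends.

```python
import time
from typing import Mapping, Optional


def rate_limit_wait_seconds(
    status: int,
    headers: Mapping[str, str],
    now: Optional[float] = None,
) -> Optional[float]:
    """Return seconds to sleep before retrying, or None if not rate-limited.

    Assumes exact-case header names; an HTTP client's case-insensitive
    header mapping also works.
    """
    if status not in (403, 429):
        return None
    now = time.time() if now is None else now
    # Secondary rate limits may send Retry-After, in seconds.
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    # Primary rate limit: no requests left until the reset timestamp.
    if headers.get("X-RateLimit-Remaining") == "0" and "X-RateLimit-Reset" in headers:
        return max(0.0, float(headers["X-RateLimit-Reset"]) - now)
    # A 403 without these signals is some other error, not a rate limit.
    return None
```

A caller would sleep for the returned number of seconds and retry, and raise the original error when the function returns None.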

Comment on lines 1058 to 1060
def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    # Tracks whether this stream's SCHEMA message has already been written.
    self._schema_emitted = False

@edgarrmondragon (Member, Author) commented on Oct 11, 2021:

@aaronsteers I also made schema messages less frequent for the jobs stream. It was emitting one every time a new parent workflow run was synced, and some targets flush their buffered records when they receive a new schema for an existing table. That made those targets write lots of very small batches, which hurt performance.

Maybe this makes sense as a builtin option in the SDK?
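A minimal, self-contained sketch of the behavior described above, assuming a single child stream object is reused for every parent workflow run: the flag turns one SCHEMA message per parent run into one SCHEMA message in total. JobsStreamSketch, _write and the messages list are hypothetical stand-ins, not Singer SDK APIs. The message shapes follow the Singer spec.

```python
import json
from typing import Iterable, List


class JobsStreamSketch:
    """Hypothetical child stream: one instance, synced once per parent context."""

    name = "workflow_run_jobs"
    schema = {"type": "object", "properties": {"id": {"type": "integer"}}}

    def __init__(self) -> None:
        # Same idea as the snippet above: remember whether SCHEMA was already written.
        self._schema_emitted = False
        self.messages: List[str] = []  # stand-in for stdout

    def _write(self, message: dict) -> None:
        self.messages.append(json.dumps(message))

    def sync(self, context: dict, records: Iterable[dict]) -> None:
        # Write SCHEMA only for the first context, not once per parent run.
        if not self._schema_emitted:
            self._write({"type": "SCHEMA", "stream": self.name,
                         "schema": self.schema, "key_properties": ["id"]})
            self._schema_emitted = True
        for record in records:
            self._write({"type": "RECORD", "stream": self.name, "record": record})


if __name__ == "__main__":
    stream = JobsStreamSketch()
    for run_id in (1, 2, 3):  # three parent workflow runs
        stream.sync({"run_id": run_id}, [{"id": run_id * 10}])
    print([json.loads(m)["type"] for m in stream.messages])
    # ['SCHEMA', 'RECORD', 'RECORD', 'RECORD']
```

The trade-off raised in the replies below applies here too: a plain boolean cannot notice if the schema changes between parent runs.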

@aaronsteers (Contributor) commented on Oct 12, 2021:

Hi @edgarrmondragon, we did see target performance issues caused by flushing every time a schema message was received, in target-athena and other community-created targets. These were addressed at the target level within the SDK by adding a schema diff check before draining records, along with other changes that reduce how often those operations run.
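The target-side fix described here can be sketched as follows. SinkSketch and its methods are hypothetical and are not the Singer SDK's Sink class. The point is only that a repeated, identical SCHEMA message no longer forces a drain, so three parent runs produce one batch instead of three.

```python
from typing import List, Optional


class SinkSketch:
    """Hypothetical per-stream buffer in a target (not the Singer SDK's Sink class)."""

    def __init__(self) -> None:
        self.schema: Optional[dict] = None
        self.buffer: List[dict] = []
        self.batches_written: List[int] = []  # size of each flushed batch

    def drain(self) -> None:
        # Stand-in for writing the buffered records to the destination table.
        if self.buffer:
            self.batches_written.append(len(self.buffer))
            self.buffer = []

    def on_schema(self, schema: dict) -> None:
        # Schema diff check: an identical SCHEMA message does not force a drain.
        if schema != self.schema:
            self.drain()
            self.schema = schema

    def on_record(self, record: dict) -> None:
        self.buffer.append(record)


if __name__ == "__main__":
    sink = SinkSketch()
    schema = {"type": "object", "properties": {"id": {"type": "integer"}}}
    for run_id in (1, 2, 3):  # the tap repeats SCHEMA before each parent run's jobs
        sink.on_schema(schema)
        sink.on_record({"id": run_id})
    sink.drain()
    print(sink.batches_written)  # [3]; without the diff check it would be [1, 1, 1]
```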

And yes, I do think something like a _schema_emitted tracker would be good to add to the SDK at the tap level as well. However, we may need to compare schema messages against each other in case the schema has actually changed since the last iteration, and this may also need to be coordinated across multiple instances of the same stream class. (For parent-child streams, where this is the biggest issue, I believe the next child stream instance may be a brand new object of the same type.)
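A sketch of the refinement suggested here rests on two assumptions. First, the tracker lives on the class, so it is shared even if each parent record gets a brand new child object. Second, a schema counts as "changed" whenever its key-sorted JSON serialization differs. SchemaTracker, ChildStreamSketch and maybe_schema_message are hypothetical names, not SDK APIs.

```python
import json
from typing import Dict, Optional


class SchemaTracker:
    """Hypothetical record of the last SCHEMA emitted per stream name."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}  # stream name -> canonical JSON of last schema

    def should_emit(self, stream_name: str, schema: dict) -> bool:
        # sort_keys=True makes the comparison independent of dict key order.
        canonical = json.dumps(schema, sort_keys=True)
        if self._last.get(stream_name) == canonical:
            return False
        self._last[stream_name] = canonical
        return True


class ChildStreamSketch:
    """Hypothetical child stream; a new object may be created for each parent record."""

    name = "workflow_run_jobs"
    # Class attribute, so every instance of this class shares the same tracker.
    schema_tracker = SchemaTracker()

    def __init__(self, schema: dict) -> None:
        self.schema = schema

    def maybe_schema_message(self) -> Optional[dict]:
        # Emit SCHEMA for the first instance, and again only if the schema changed.
        if self.schema_tracker.should_emit(self.name, self.schema):
            return {"type": "SCHEMA", "stream": self.name,
                    "schema": self.schema, "key_properties": ["id"]}
        return None
```

With this sketch, two ChildStreamSketch objects built from the same schema produce a SCHEMA message only once, while an object built from a schema with an extra property produces a new one.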

@edgarrmondragon (Member, Author) commented on Oct 14, 2021:

@aaronsteers I agree that just ignoring repeated schemas is not ideal, since the schema may evolve from one message to the next 👍.

> For parent-child streams, where this is the biggest issue, I believe the next child stream instance may be a brand new object of the same type.

For the built-in parent-child functionality, child streams are actually a single instance created in Tap.discover_streams and then synced with multiple contexts, so tracking _schema_emitted on that stream object works.

@ericboucher (Contributor)

@edgarrmondragon is this still in progress? What's blocking?

@edgarrmondragon (Member, Author)

> @edgarrmondragon is this still in progress? What's blocking?

@ericboucher thanks for the ping. I had forgotten about this one, but last time the tests were failing due to rate limits.

I just rebased, and if the tests are ✅, this might be ready for review 😄

@edgarrmondragon marked this pull request as ready for review on March 11, 2022, 19:49
@sonarqubecloud

SonarCloud Quality Gate failed.

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No coverage information
22.5% Duplication

@ericboucher (Contributor)

Looks good to me! Same blocker as #93 for SonarCloud tho...

@ericboucher (Contributor)

@edgarrmondragon I think you can go ahead and merge :shipit:

@edgarrmondragon merged commit 025f713 into MeltanoLabs:main on Mar 30, 2022