Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Async for EMR Operators and Sensors #120

Closed
12 of 16 tasks
phanikumv opened this issue Mar 14, 2022 · 8 comments
Closed
12 of 16 tasks

Implement Async for EMR Operators and Sensors #120

phanikumv opened this issue Mar 14, 2022 · 8 comments
Assignees
Labels
area/async Deferrable/async operators pri/medium Medium priority
Milestone

Comments

@phanikumv
Copy link
Collaborator

phanikumv commented Mar 14, 2022

Please look at docs to see if there are other operators we should make Async (Principle - Only create Async operators for the “sync-version” of operators that do some level of polling; take more than a few seconds to complete) for the EMR

Implement async versions for the following operators (Aligned on descending order of priority)


Acceptance Criteria:

@phanikumv phanikumv added area/async Deferrable/async operators pri/medium Medium priority labels Mar 14, 2022
@sunank200
Copy link
Collaborator

Both EmrCreateJobFlowOperator and EmrAddStepsOperator does not require the async implementation as an Operator.

If you look at the boto3 documentation for run_job_flow, it instantaneously returns the response with JobFlowId and ClusterArn. EmrCreateJobFlowOperator just takes the job_flow_overrides as an argument and creates an EMR JobFlow using job_flow_overrides and it is not a blocking call (it returns the JobFlowId instantaneously) The sync implementation of the operator does not care about the status of the EMR JobFlow and it's not tracking it. It just instantiates the creation of EMR Job but does not track the status of it.

Similarly in the EmrAddStepsOperator, it just adds the adds steps to an existing EMR job_flow and returns the step-ids using boto3 add_job_flow_steps but this operators does not track the status of the steps. So it does not require async implementation.

@kaxil @phanikumv

@phanikumv
Copy link
Collaborator Author

Yes I agree, I think only the below sensors will take a while to run and will need async implementation

  • EmrContainerSensor
  • EmrJobFlowSensor
  • EmrStepSensor

@kaxil
Copy link
Collaborator

kaxil commented Mar 16, 2022

@kaxil
Copy link
Collaborator

kaxil commented Mar 21, 2022

Is this completed, I don't see a PR for the following

  • EmrContainerSensor
  • EmrJobFlowSensor
  • EmrStepSensor

@sunank200
Copy link
Collaborator

I am working on SparkSqlOperator as Phani had mentioned it has a higher priority.

@phanikumv phanikumv changed the title Implement Async for EMR Operators Implement Async for EMR Operators and Sensors Mar 22, 2022
@phanikumv
Copy link
Collaborator Author

Thats correct @kaxil , I requested Ankit and Bharani to switch over to implementing native Spark operators as part of #119 .

@pankajastro
Copy link
Contributor

I have started working EmrStepSensor

@pankajastro
Copy link
Contributor

PR opened EmrStepSensor #184

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/async Deferrable/async operators pri/medium Medium priority
Projects
None yet
Development

No branches or pull requests

5 participants