
add jira-ticket parameter to paasta spark-run #4073

Merged
merged 6 commits into master
May 30, 2025

Conversation

sidsatyelp
Member

@sidsatyelp sidsatyelp commented May 28, 2025

What this change does

To attribute infrastructure costs to feature development more accurately, we are changing how adhoc Spark jobs are submitted. Users will need to pass an additional parameter, --jira-ticket, to paasta spark-run. Its value should be the top-level Jira ticket used to track your project. This lets us aggregate the entire project's Spark development costs over time.

This takes effect only after the flag spark.yelp.jira_ticket.enabled is set to true in srv-configs and a release containing these changes is included in paasta and spark-tools. Before enabling it, we will notify Spark users at Yelp.
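
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `build_parser`, `validate_jira_ticket`, the `enforcement_enabled` argument (standing in for the value of spark.yelp.jira_ticket.enabled read from srv-configs), and the ticket format check are all hypothetical.

```python
import argparse
import re
from typing import Optional

# Assumption: a ticket looks like an uppercase project key, a dash, and a
# number (e.g. PROJ-123). The real check, if any, may differ.
JIRA_TICKET_PATTERN = re.compile(r"^[A-Z][A-Z0-9]*-\d+$")


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser exposing only the options relevant to this sketch."""
    parser = argparse.ArgumentParser(prog="spark-run-sketch")
    parser.add_argument(
        "--jira-ticket",
        default=None,
        help="Top-level Jira ticket tracking the project, e.g. --jira-ticket=PROJ-123.",
    )
    parser.add_argument("--cmd", default=None, help="Command to run.")
    return parser


def validate_jira_ticket(ticket: Optional[str], enforcement_enabled: bool) -> Optional[str]:
    """Return the ticket if acceptable; raise ValueError if it is required but unusable.

    `enforcement_enabled` is a stand-in for the srv-configs flag. While the
    flag is off, a missing ticket is tolerated and the job may launch as before.
    """
    if not enforcement_enabled:
        return ticket
    if ticket is None:
        raise ValueError("--jira-ticket must be passed for all adhoc jobs")
    if not JIRA_TICKET_PATTERN.match(ticket):
        raise ValueError(f"{ticket!r} does not look like a Jira ticket (e.g. PROJ-123)")
    return ticket
```

With the flag on, a call such as `validate_jira_ticket(None, True)` raises and the job is not launched; with the flag off, the same call returns `None` and nothing changes for users.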

Related PRs and documentation

https://github.yelpcorp.com/sysgit/srv-configs/pull/44570
Yelp/service_configuration_lib#161
A new release of service_configuration_lib is used here.
https://github.yelpcorp.com/python-packages/spark_tools/pull/136
https://yelpwiki.yelpcorp.com/spaces/AML/pages/402885641/--jira-ticket+in+spark-run

Testing

# Shouldn't launch because we don't provide jira_ticket
paasta spark-run --aws-profile=dev-cloud-economics --cmd "spark-submit /code/integration_tests/s3.py"
# output: https://fluffy.yelpcorp.com/i/XDdDpJB7vrXLvcqVNkLBxvBdtTFl44RX.html

# Should launch because we provide jira_ticket
paasta spark-run --aws-profile=dev-cloud-economics --jira-ticket=ABC-123 --cmd "spark-submit /code/integration_tests/s3.py"
# output: https://fluffy.yelpcorp.com/i/1VCTvzjk0wK6mbC97KF8JPRqt2WmsXTd.html

The jira_ticket label is being added to executors:
https://app.cloudzero.com/explorer?activeCostType=real_cost&granularity=hourly&partitions=k8slabel%3Aspark_yelp_com_jira_ticket&dateRange=Last%2024%20Hours&k8slabel%3Aspark_yelp_com_user=sids&showRightFlyout=filters
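
One way the ticket could reach the executors as a Kubernetes label is through Spark's `spark.kubernetes.executor.label.[LabelName]` configuration. The sketch below is an illustration under assumptions: the label name `spark.yelp.com/jira_ticket` is inferred from the CloudZero dimension `spark_yelp_com_jira_ticket` in the URL above, and `add_jira_ticket_label` is a hypothetical helper, not a function from paasta or service_configuration_lib.

```python
from typing import Dict, Optional

# Assumption: label name inferred from the CloudZero dimension
# "k8slabel:spark_yelp_com_jira_ticket"; the real name may differ.
LABEL_NAME = "spark.yelp.com/jira_ticket"


def add_jira_ticket_label(spark_conf: Dict[str, str], ticket: Optional[str]) -> Dict[str, str]:
    """Return a copy of spark_conf with the executor label set when a ticket is given."""
    updated = dict(spark_conf)  # leave the caller's dict untouched
    if ticket:
        # Spark applies spark.kubernetes.executor.label.<name> as a pod label
        # on each executor it launches on Kubernetes.
        updated[f"spark.kubernetes.executor.label.{LABEL_NAME}"] = ticket
    return updated
```

For example, `add_jira_ticket_label({}, "ABC-123")` yields a conf with a single key, `spark.kubernetes.executor.label.spark.yelp.com/jira_ticket`, set to `ABC-123`.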

Tech Spec
Jira Ticket

@sidsatyelp sidsatyelp changed the title from "add jira-ticket parameter to spark-run" to "add jira-ticket parameter to paasta spark-run" May 28, 2025
@sidsatyelp sidsatyelp marked this pull request as ready for review May 29, 2025 19:13
Member

@nemacysts nemacysts left a comment

from past experience with blocking workflows that require tickets, i'm a little skeptical of the data quality we'll get from this over something coarser (but more automatic) like team-based allocation based on the launching user, but i'm happy to be proven wrong :)

(note: ML Compute owns spark-run in its entirety - you don't need CI shipit for this PR since this doesn't touch any CODEOWNER'd files)

"The top level jira ticket used to track the project that this spark-job is related to. "
"eg: --jira-ticket=PROJ-123. "
"Must be passed for all adhoc jobs. "
"See https://yelpwiki.yelpcorp.com/spaces/AML/pages/402885641. "
Member

probably worth replacing this with a y/ link :)

@sidsatyelp sidsatyelp merged commit cc9e83f into master May 30, 2025
19 of 20 checks passed

5 participants