[Donation Proposal]: (Mini project) Transcripting #2623

Open
moxious opened this issue Mar 20, 2025 · 2 comments
Labels
area/donation Donation Proposal

Comments

@moxious

moxious commented Mar 20, 2025

Description

On the developer advocacy team at Grafana, we use a simple set of scripts to fetch all of our YouTube transcripts and check the text of what's said in the videos into a repo. You can see that in the repository linked under "Repository" below.

We do this for a number of reasons:

  1. We use it to write first drafts when we're working on some new docs; basically, interview engineers for an hour, and then use the rich back & forth as grist for a blog post, a new docs page, whatever
  2. We have "expert Q&A LLM answer agents" which can be fed everything that the technical experts said on video. This expands the reach of automatic Q&A techniques.

It would not be that hard to contribute this code to OTel for its YouTube channel if that's desirable. What you'd get is a small set of Python scripts & instructions, plus a directory of the community's choosing where all the transcripts would be checked in (effectively the output of the scripts).
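To give a feel for what "the output of the scripts" could look like, here is a minimal sketch, not the actual Grafana code. It assumes the transcript text has already been fetched, and only shows the step that writes one Markdown file per video. The function names and the `# title` header are hypothetical; the file name pattern (publish timestamp plus a slug of the title) is modeled on the file used in the Mini POC below.

```python
import re
from pathlib import Path


def transcript_filename(published_at: str, title: str) -> str:
    """Build '<publish timestamp>-<slugified title>.md' (hypothetical naming helper)."""
    # Lowercase the title and collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{published_at}-{slug}.md"


def write_transcript(out_dir: Path, published_at: str, title: str, segments: list[str]) -> Path:
    """Write one video's transcript segments to a Markdown file and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Note: timestamps contain ':' which is fine on Linux/macOS but not valid on Windows.
    path = out_dir / transcript_filename(published_at, title)
    # Drop empty segments and put each remaining segment on its own line.
    body = "\n".join(s.strip() for s in segments if s.strip())
    path.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
    return path
```

Keeping one plain-text file per video means ordinary tools (git diff, grep, an LLM CLI) work on the transcripts without any extra tooling.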

This makes video "greppable" for the community, and enables any downstream LLM approaches the community may want to use.
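As a rough illustration of "greppable", the sketch below searches a directory of transcript files for a term and reports where it was said. It assumes one `.md` file per video in a single directory; the function name and the directory layout are assumptions for the example, not part of the existing scripts. Plain `grep -ri` over the directory would do the same job.

```python
from pathlib import Path


def grep_transcripts(root: Path, term: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for each line containing term, ignoring case."""
    needle = term.lower()
    hits = []
    # Sort so results come back in a stable, file-name order.
    for path in sorted(root.glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle in line.lower():
                hits.append((path.name, lineno, line.strip()))
    return hits
```

For example, `grep_transcripts(Path("otel"), "clock drift")` would list every transcript line in the `otel` directory that mentions clock drift, along with the video file it came from.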

Mini POC

I took this really cool OTel video (thanks @reese-lee!) and pulled its transcript

$ cat otel/2023-09-28T04\:00\:36Z-otel-end-user-discussions-amer-january-2023.md | aichat --prompt "Please create a list of questions that are raised in this video transcript; omitting the answers"

1. How do you deal with helping in languages you're not an expert in?
2. How do you define an enabler in a company?
3. Have other companies experienced this struggle with language expertise?
4. Do you create champions in those languages you're not familiar with?
5. Have those people you worked with come back with questions or seek information on their own?
6. How long did it take for individuals to get up to speed with observability concepts?
7. Is auto-instrumentation a good approach to help people get started with OpenTelemetry?
8. What strategies can improve understanding of the purpose and value of OpenTelemetry within a team?
9. Would having real-world case documentation around OpenTelemetry be helpful?
10. Should there be a periodic forum for pointed Q&A with experts in the community?
11. How are you dealing with clock/time drift in data?
12. Is it useful to create a processor that detects data points from the future?
13. What are best practices for bifurcating data in a pipeline using filters and conditions?
14. What would be the advantage of using a router solution versus a filter processor for data separation?
15. How do you scale collector deployments correctly?
16. When should the configuration settings like the number of consumers be modified for optimal performance?
17. How many collectors should be scaled horizontally to meet traffic demands?
18. Is using a number of consumers setting purely for non-scalable deployments?

Benefits to the OpenTelemetry community

Better content reuse & google-ability of technical resources OTel is already publishing; a faster process for improving docs & writing blog posts for everyone in the community.

Reasons for donation

I don't know if "donation" is the best concept here. The code itself is really not complicated; this is just an offer to put it in place if people like the idea. I've found it very useful, it's frankly not that hard, and it doesn't impact any other components of the ecosystem. It's just focused on getting more value out of the good stuff people are already doing.

Repository

https://github.com/grafana/developer-advocacy/

Existing usage

(Covered above)

Maintenance

As new videos are published, the script needs to be periodically re-run, followed by git commit + git push

Actual script maintenance is minimal unless the YouTube API changes
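The re-run + commit + push step could be wrapped in a small helper and put on a schedule. The sketch below is one possible shape for that, assuming the transcripts live in a single directory inside a git checkout; the function name, the commit message, and the injectable `run` parameter (there only to make the helper testable) are all hypothetical. It only uses standard git subcommands.

```python
import subprocess
from pathlib import Path
from typing import Callable


def publish_transcripts(
    repo: Path,
    transcript_dir: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Stage, commit and push transcript changes; return False if nothing changed."""
    run(["git", "add", "--", transcript_dir], cwd=repo, check=True)
    # `git diff --cached --quiet` exits 0 when nothing is staged and 1 when something is.
    staged = run(["git", "diff", "--cached", "--quiet", "--", transcript_dir], cwd=repo)
    if staged.returncode == 0:
        return False
    # Limit the commit to the transcript directory so unrelated staged work is left alone.
    run(["git", "commit", "-m", "Update video transcripts", "--", transcript_dir], cwd=repo, check=True)
    run(["git", "push"], cwd=repo, check=True)
    return True
```

Running this after the fetch script from a cron job or CI schedule would keep the checked-in transcripts current without anyone having to remember to push.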

Licenses

I'm the author of this code, and I think I can clear it to be licensed Apache 2.0, but I need to verify this if/when the proposal is accepted

Trademarks

N/A

Other notes

No response

@tedsuo
Contributor

tedsuo commented Mar 20, 2025

Thanks @moxious! On a related note, something like this could also be useful for meeting summaries. OTel has a lot of Zoom meetings. I don't think we should feed those into an answer agent, but it would be nice to have high quality meeting summaries. We've been testing the Zoom summary feature at the GC meeting and so far it's been pretty underwhelming.

@moxious
Author

moxious commented Mar 20, 2025

If desirable, I've verified we can do this under Apache 2.0

@svrnm svrnm added the area/donation Donation Proposal label Mar 25, 2025

3 participants