[Donation Proposal]: (Mini project) Transcripting #2623

moxious · 2025-03-20T15:13:21Z

Description

At Grafana, in developer advocacy, we use a simple set of scripts to fetch all of our YouTube transcripts and check the text of what's said in each video into a repo. You can see that here

We do this for a number of reasons:

We use them to write first drafts of new docs: basically, we interview engineers for an hour and then use the rich back-and-forth as grist for a blog post, a new docs page, or whatever else is needed.
We have "expert Q&A" LLM answer agents that can be fed everything the technical experts said on video, which expands the reach of automatic Q&A techniques like this.

It would not be hard to contribute this code to OTel for its YouTube channel, if that's desirable. The community would get a small set of Python scripts and instructions, plus a directory of its choosing where all the transcripts (effectively the scripts' output) would be checked in.
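As an illustration of the output side of such scripts, here is a minimal Python sketch. It is not the Grafana code: it assumes the transcript has already been fetched as a list of segments shaped like `{"start": seconds, "text": str}`, and the function names `slugify`, `transcript_filename`, and `render_markdown` are hypothetical. The file name format (UTC publish timestamp followed by a title slug) is modeled on the file used in the Mini POC.

```python
import re
from datetime import datetime, timezone


def slugify(title: str) -> str:
    """Lower-case the title and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def transcript_filename(published: datetime, title: str) -> str:
    """Build '<UTC timestamp>-<slug>.md', mirroring the file name in the Mini POC.

    `published` should be timezone-aware; a naive datetime would be treated as local time.
    """
    stamp = published.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{stamp}-{slugify(title)}.md"


def render_markdown(title: str, url: str, segments: list[dict]) -> str:
    """Render transcript segments (assumed shape: {'start': seconds, 'text': str}) as Markdown."""
    lines = [f"# {title}", "", f"Source: {url}", ""]
    for seg in segments:
        # Prefix each segment with an [MM:SS] offset so readers can jump back to the video.
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['text'].strip()}")
    return "\n".join(lines) + "\n"
```

Keeping the timestamp first means a plain directory listing sorts transcripts chronologically.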

Checking the transcripts in makes video "greppable" for the community and enables any downstream LLM approaches the community may want to use.
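Once the transcripts are plain Markdown files, `grep -rni "clock drift" otel/` already works. For downstream scripts (for example, picking passages to hand to an LLM), the same search can be done in Python. This is a minimal sketch assuming one `.md` file per video in a single directory (`otel/` in the Mini POC); `grep_text` and `grep_transcripts` are hypothetical names.

```python
import sys
from pathlib import Path


def grep_text(name: str, text: str, term: str) -> list[tuple[str, int, str]]:
    """Return (file name, 1-based line number, line) for each case-insensitive match."""
    needle = term.lower()
    return [
        (name, lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


def grep_transcripts(root: str, term: str) -> list[tuple[str, int, str]]:
    """Search every Markdown transcript directly under `root`, in file-name order."""
    hits: list[tuple[str, int, str]] = []
    for path in sorted(Path(root).glob("*.md")):
        hits.extend(grep_text(path.name, path.read_text(encoding="utf-8"), term))
    return hits


if __name__ == "__main__":
    # Usage: python grep_transcripts.py <transcript-dir> <term>
    for name, lineno, line in grep_transcripts(sys.argv[1], sys.argv[2]):
        print(f"{name}:{lineno}: {line}")
```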

Mini POC

I took this really cool OTel video (thanks @reese-lee!) and pulled its transcript

$ cat otel/2023-09-28T04\:00\:36Z-otel-end-user-discussions-amer-january-2023.md | aichat --prompt "Please create a list of questions that are raised in this video transcript; omitting the answers"

1. How do you deal with helping in languages you're not an expert in?
2. How do you define an enabler in a company?
3. Have other companies experienced this struggle with language expertise?
4. Do you create champions in those languages you're not familiar with?
5. Have those people you worked with come back with questions or seek information on their own?
6. How long did it take for individuals to get up to speed with observability concepts?
7. Is auto-instrumentation a good approach to help people get started with OpenTelemetry?
8. What strategies can improve understanding of the purpose and value of OpenTelemetry within a team?
9. Would having real-world case documentation around OpenTelemetry be helpful?
10. Should there be a periodic forum for pointed Q&A with experts in the community?
11. How are you dealing with clock/time drift in data?
12. Is it useful to create a processor that detects data points from the future?
13. What are best practices for bifurcating data in a pipeline using filters and conditions?
14. What would be the advantage of using a router solution versus a filter processor for data separation?
15. How do you scale collector deployments correctly?
16. When should the configuration settings like the number of consumers be modified for optimal performance?
17. How many collectors should be scaled horizontally to meet traffic demands?
18. Is using a number of consumers setting purely for non-scalable deployments?

Benefits to the OpenTelemetry community

Better reuse and Google-ability of the technical content OTel is already publishing, and a faster process for improving docs and writing blog posts for everyone in the community.

Reasons for donation

I don't know if "donation" is the best concept here; the code itself is really not complicated. This is just an offer to put it in place if people like the idea. I've found it very useful, it's frankly not hard to run, and it doesn't affect any other components of the ecosystem. It's just focused on getting more value out of the good work people are already doing.

Repository

https://github.com/grafana/developer-advocacy/

Existing usage

(Covered above)

Maintenance

As new videos are published, the script needs to be re-run periodically, followed by a git commit and git push.

Actual script maintenance is minimal unless the YouTube API changes
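The periodic re-run could be automated along these lines. This is a minimal sketch, not the existing scripts: it assumes each video's transcript file name can be computed up front (the `{video_id: file_name}` mapping is an assumption), that the fetch step itself happens elsewhere, and that it runs inside a clone with push access. `missing_transcripts` and `commit_and_push` are hypothetical names; the git subcommands are standard.

```python
import subprocess
from pathlib import Path


def missing_transcripts(out_dir: str, wanted: dict[str, str]) -> dict[str, str]:
    """Filter {video_id: file_name} down to videos whose transcript file is not yet on disk."""
    existing = {p.name for p in Path(out_dir).glob("*.md")}
    return {vid: name for vid, name in wanted.items() if name not in existing}


def commit_and_push(out_dir: str, message: str) -> bool:
    """Stage the transcript directory; commit and push only if something changed."""
    subprocess.run(["git", "add", out_dir], check=True)
    # `git diff --cached --quiet` exits 0 when nothing is staged and 1 when there are staged changes.
    staged = subprocess.run(["git", "diff", "--cached", "--quiet", "--", out_dir])
    if staged.returncode == 0:
        return False  # nothing new this run
    subprocess.run(["git", "commit", "-m", message], check=True)
    subprocess.run(["git", "push"], check=True)
    return True
```

Skipping the commit when nothing is staged keeps a scheduled run from failing, since `git commit` exits non-zero when there is nothing to commit.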

Licenses

I'm the author of this code. I believe I can clear it for licensing under Apache 2.0, but I need to verify this if/when the proposal is accepted.

Trademarks

N/A

Other notes

No response


tedsuo · 2025-03-20T15:59:25Z

Thanks @moxious! On a related note, something like this could also be useful for meeting summaries. OTel has a lot of Zoom meetings. I don't think we should feed those into an answer agent, but it would be nice to have high-quality meeting summaries. We've been testing the Zoom summary feature at the GC meeting, and so far it's been pretty underwhelming.

moxious · 2025-03-20T20:55:29Z

If desirable: I've verified that we can license this under Apache 2.0.

svrnm added the area/donation Donation Proposal label Mar 25, 2025
