Add RubyLLM.transcribe method. #97

Open

keithrbennett wants to merge 4 commits into crmne:main from keithrbennett:dedicated-transcription-interface

Contributor

keithrbennett commented Apr 4, 2025

Add a top-level transcribe method to match chat, embed, and paint. Addresses #92.


          Add RubyLLM.transcribe method.

736d337

keithrbennett marked this pull request as draft

April 4, 2025 16:41

keithrbennett added 3 commits

April 5, 2025 17:21


          Add unit tests and associated VCR files

b0bf924


          Add transcribe method to OpenAI provider.rb.

Add Transcription module to Providers::OpenAI.
Model now defaults to whisper-1 and does not require a prompt.
Add transcription docs.
Make transcription model default a config option.
Enhance Content.mime_type_for to support audio content types.
Simplify RubyLLM unit tests to check only that the correct methods are called, without LLM access.


          Merge branch 'main' into dedicated-transcription-interface

096ffd3

# Conflicts:
#	lib/ruby_llm/configuration.rb

keithrbennett marked this pull request as ready for review

April 9, 2025 16:01

Contributor Author

keithrbennett commented Apr 9, 2025

@crmne I believe this PR is ready. It took a surprising amount of code and time to implement, but I think it adds value beyond a simple transcribe method and is a good addition to the code base. I hope you think so too! :)

schappim commented Apr 12, 2025

@crmne any chance of getting this merged? 🙏

crmne requested changes

Owner

crmne left a comment

This PR doesn't follow the RubyLLM design in several ways:

- Take a look at RubyLLM::Image and RubyLLM::Embedding.
- It only implements transcription for OpenAI.
- The OpenAI implementation doesn't follow how providers are implemented in RubyLLM.
- There are extra changes that have nothing to do with this PR (vibe coded?).
- There are no VCR cassettes.

I'd be happy to merge it if you are willing to improve it.
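For comparison with the pattern the review points to, here is a minimal, self-contained sketch of a provider-agnostic entry point: a small result class whose class method picks a provider for the requested model and delegates to it, with the top-level method as a thin wrapper. Everything in it (`SketchLLM`, `Transcription.transcribe`, the `providers:` and `config:` keywords, `supports?`, `default_transcription_model`) is hypothetical and is not RubyLLM's actual `Image` or `Embedding` API; the injected `providers:` list stands in for whatever model registry the library uses.

```ruby
# Hypothetical sketch of a provider-agnostic transcription entry point.
# Names are illustrative only; they are not RubyLLM's real API.
module SketchLLM
  Config = Struct.new(:default_transcription_model, keyword_init: true)

  # Result object, analogous in spirit to an Embedding or Image result.
  class Transcription
    attr_reader :text, :model

    def initialize(text:, model:)
      @text = text
      @model = model
    end

    # Pick the first provider that supports the model and delegate to it,
    # so no provider-specific code lives at this level.
    def self.transcribe(path, providers:, config:, model: nil)
      model ||= config.default_transcription_model
      provider = providers.find { |p| p.supports?(model) }
      raise ArgumentError, "no provider supports #{model}" unless provider

      new(text: provider.transcribe(path, model: model), model: model)
    end
  end

  # Thin top-level wrapper, mirroring chat/embed/paint.
  def self.transcribe(path, **options)
    Transcription.transcribe(path, **options)
  end
end

# Stand-in provider; a real one would call its API here.
WhisperOnlyProvider = Object.new

def WhisperOnlyProvider.supports?(model)
  model == 'whisper-1'
end

def WhisperOnlyProvider.transcribe(path, model:)
  "[#{model}] transcript of #{path}"
end

config = SketchLLM::Config.new(default_transcription_model: 'whisper-1')
result = SketchLLM.transcribe('interview.wav', providers: [WhisperOnlyProvider], config: config)
puts result.text # prints [whisper-1] transcript of interview.wav
```

Because the class method only asks "which provider supports this model?", adding a second provider means implementing `transcribe` in that provider, not touching the entry point.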

lib/ruby_llm/providers/openai/transcription.rb

Comment on lines +29 to +31

+                      def api_base
+                        'https://api.openai.com/v1'
+                      end

Owner

crmne Jun 11, 2025

Please remove this. It's at the top level of the provider module.

lib/ruby_llm/providers/openai/transcription.rb

Comment on lines +25 to +27

+                      def transcription_url
+                        "#{api_base}/audio/transcriptions"
+                      end

Owner

crmne Jun 11, 2025

Please remove api_base from this method. It should only contain the path, not the host.
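The review comments on `api_base` and `transcription_url` split the URL in two: the host and version prefix live once at the provider level, and each capability method returns only a relative path. A minimal sketch using the standard `URI` library; `ExampleOpenAI` and `full_url` are hypothetical, and it assumes the joining happens in shared connection code rather than in the capability module.

```ruby
require 'uri'

# Hypothetical provider module: the host and version prefix are defined here, once.
module ExampleOpenAI
  module_function

  def api_base
    'https://api.openai.com/v1'
  end

  # Capability-specific methods return only the path, never the host.
  def transcription_url
    'audio/transcriptions'
  end

  # Resolve a relative path against the base. The trailing slash matters:
  # without it, URI.join replaces the last segment and drops "/v1".
  def full_url(path)
    base = api_base.end_with?('/') ? api_base : "#{api_base}/"
    URI.join(base, path).to_s
  end
end

puts ExampleOpenAI.full_url(ExampleOpenAI.transcription_url)
# prints https://api.openai.com/v1/audio/transcriptions
```

The trailing-slash normalization is the detail that bites: `URI.join('https://api.openai.com/v1', 'audio/transcriptions')` resolves to `https://api.openai.com/audio/transcriptions`, silently losing the version prefix.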

lib/ruby_llm/providers/openai/transcription.rb

Comment on lines +12 to +15

+                      def self.extended(base)
+                        # module_function causes the 'transcribe' method to be private, but we need it to be public
+                        base.public_class_method :transcribe
+                      end

Owner

crmne Jun 11, 2025

No need for this. Simply move your transcribe above module_function.
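Ruby's `module_function` with no arguments applies to every method defined after it: each becomes a private instance method plus a public singleton method on the module itself. A module mixed in with `extend` therefore publicly exposes only the methods defined before that call, which is why moving `transcribe` above it makes the `self.extended` hook unnecessary. A minimal sketch with hypothetical `Transcription` and `FakeProvider` modules, assuming (as the hook in the diff implies) that the provider `extend`s its capability modules:

```ruby
# Hypothetical capability module illustrating module_function placement.
module Transcription
  # Defined BEFORE module_function: an ordinary public instance method, so it
  # becomes a public singleton method on any module that extends Transcription.
  def transcribe(path)
    "POST #{transcription_url} (#{path})"
  end

  module_function

  # Defined AFTER module_function: a private instance method (plus a public
  # copy on Transcription itself). Extenders can call it internally but do not
  # expose it publicly.
  def transcription_url
    'audio/transcriptions'
  end
end

# Stand-in for a provider module such as Providers::OpenAI.
module FakeProvider
  extend Transcription
end

puts FakeProvider.transcribe('interview.wav') # prints POST audio/transcriptions (interview.wav)
# FakeProvider.transcription_url would raise NoMethodError (private method).
```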

lib/ruby_llm/providers/openai/transcription.rb

+                  module OpenAI
+                    # Handles audio transcription functionality for the OpenAI API
+                    module Transcription
+                      # Helper methods as module_function

Owner

crmne Jun 11, 2025

Remove this comment.

lib/ruby_llm/providers/openai/transcription.rb

Comment on lines +39 to +52

+                      def post_multipart(url, payload)
+                        connection = Faraday.new(url: api_base) do |f|
+                          f.request :multipart
+                          f.request :url_encoded
+                          f.adapter Faraday.default_adapter
+                        end
+                        response = connection.post(url) do |req|
+                          req.headers.merge!(headers)
+                          req.body = payload
+                        end
+                        JSON.parse(response.body)
+                      end

Owner

crmne Jun 11, 2025

This really should not be in the Transcription module of the OpenAI provider. This is a generic method that should go in RubyLLM::Connection
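A generic multipart POST is transport plumbing rather than transcription logic. A minimal sketch of what moving it into a shared connection object could look like; `SketchConnection`, `build_multipart`, and the payload shape are hypothetical, and it uses Ruby's standard `Net::HTTP` in place of Faraday only so the example has no gem dependencies.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical shared connection: owns the base URL and auth headers so that
# capability modules (chat, embeddings, transcription, ...) only supply a
# relative path and a payload. Not RubyLLM's actual Connection class.
class SketchConnection
  def initialize(api_base, headers)
    # Normalize to a trailing slash so relative paths keep the version prefix.
    @api_base = api_base.end_with?('/') ? api_base : "#{api_base}/"
    @headers = headers
  end

  # Build (but do not send) a multipart POST request.
  # payload: array of [field_name, value] pairs; value may be a String or an IO such as a File.
  def build_multipart(path, payload)
    request = Net::HTTP::Post.new(URI.join(@api_base, path))
    @headers.each { |name, value| request[name] = value }
    request.set_form(payload, 'multipart/form-data')
    request
  end

  # Send the request and parse the JSON response body.
  def post_multipart(path, payload)
    request = build_multipart(path, payload)
    uri = request.uri
    response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.request(request)
    end
    JSON.parse(response.body)
  end
end

conn = SketchConnection.new('https://api.openai.com/v1', { 'Authorization' => 'Bearer YOUR_KEY' })
request = conn.build_multipart('audio/transcriptions', [['model', 'whisper-1']])
puts request.path # prints /v1/audio/transcriptions
```

With this split, a provider's transcription code passes a relative path such as `'audio/transcriptions'` and never handles hosts or auth headers itself.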

lib/ruby_llm/providers/openai/transcription.rb

Comment on lines +33 to +37

+                      def headers
+                        {
+                          'Authorization' => "Bearer #{RubyLLM.config.openai_api_key}"
+                        }
+                      end

Owner

crmne Jun 11, 2025

Why is this here?

lib/ruby_llm/content.rb

Comment on lines +37 to +64

+                  # Determine the MIME type based on file extension
+                  def self.mime_type_for(path) # rubocop:disable Metrics/CyclomaticComplexity, Metrics/MethodLength
+                    ext = File.extname(path).delete('.').downcase
+                    case ext
+                    when 'jpeg', 'jpg'
+                      'image/jpeg'
+                    when 'png'
+                      'image/png'
+                    when 'gif'
+                      'image/gif'
+                    when 'webp'
+                      'image/webp'
+                    when 'mpga', 'mp3', 'mpeg'
+                      'audio/mpeg'
+                    when 'm4a', 'mp4'
+                      'audio/mp4'
+                    when 'wav'
+                      'audio/wav'
+                    when 'ogg'
+                      'audio/ogg'
+                    when 'webm'
+                      'audio/webm'
+                    else
+                      # Default to the extension as the subtype
+                      "application/#{ext}"
+                    end
+                  end

Owner

crmne Jun 11, 2025

We now have a MimeType module.
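Whatever the shared MIME type module's interface is (it is not shown in this thread), a per-extension `case` like the one above is the kind of code a lookup table replaces, which also removes the need for the `rubocop:disable` comment. A minimal sketch with a hypothetical `EXTENSION_MIME_TYPES` constant; it falls back to `application/octet-stream` rather than `application/#{ext}`, which is a design choice, not the PR's behavior.

```ruby
# Hypothetical lookup table standing in for the hand-written case statement.
# Not RubyLLM's MimeType module; the mappings mirror the extensions in the diff.
EXTENSION_MIME_TYPES = {
  'jpg' => 'image/jpeg', 'jpeg' => 'image/jpeg',
  'png' => 'image/png', 'gif' => 'image/gif', 'webp' => 'image/webp',
  'mp3' => 'audio/mpeg', 'mpga' => 'audio/mpeg', 'mpeg' => 'audio/mpeg',
  'm4a' => 'audio/mp4', 'mp4' => 'audio/mp4',
  'wav' => 'audio/wav', 'ogg' => 'audio/ogg', 'webm' => 'audio/webm'
}.freeze

# Generic fallback for unknown or missing extensions.
DEFAULT_MIME_TYPE = 'application/octet-stream'

def mime_type_for(path)
  ext = File.extname(path).delete('.').downcase
  EXTENSION_MIME_TYPES.fetch(ext, DEFAULT_MIME_TYPE)
end

puts mime_type_for('interview.WAV') # prints audio/wav
```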

lib/ruby_llm/model_info.rb

Comment on lines -18 to +19

-                              :supports_json_mode, :input_price_per_million, :output_price_per_million, :type, :family
+                              :supports_json_mode, :input_price_per_million, :output_price_per_million,
+                              :type, :family

Owner

crmne Jun 11, 2025

Why did this change?

README.md

Comment on lines +52 to +54

		# Transcribe audio files
		RubyLLM.transcribe "interview.wav"

Owner

crmne Jun 11, 2025

This is in a very awkward spot.

crmne linked an issue that may be closed by this pull request: Add dedicated transcription interface for audio-to-text models #92 (Open)
