Add RubyLLM.transcribe method. #97

Open: wants to merge 4 commits into main

keithrbennett
Contributor

Add a top-level transcribe method to match chat, embed, and paint. Addresses #92.

@keithrbennett marked this pull request as draft on April 4, 2025 16:41
Add Transcription module to Providers::OpenAI.
Model now defaults to whisper-1 and does not need prompt.
Add transcription docs.
Make transcription model default a config option.
Enhance Content.mime_type_for to support audio content types.
Simplify RubyLLM unit tests to only test that correct methods are called, no LLM access.
# Conflicts:
#	lib/ruby_llm/configuration.rb
@keithrbennett marked this pull request as ready for review on April 9, 2025 16:01
@keithrbennett
Contributor Author

@crmne I believe this PR is ready. It took a surprising amount of code and time to implement, but I think it adds value beyond a simple transcribe method and is a good addition to the code base. I hope you think so too! :)

@schappim

@crmne any chance of getting this merged? 🙏

Owner

@crmne left a comment

This PR doesn't follow the RubyLLM design in multiple ways.

  1. Take a look at RubyLLM::Image and RubyLLM::Embedding.
  2. It only implements it for OpenAI.
  3. The OpenAI implementation doesn't follow how the providers are implemented in RubyLLM.
  4. There are extra changes that have nothing to do with this PR (vibe coded?).
  5. There are no VCR cassettes.

I'd be happy to merge it if you are willing to improve it.

Comment on lines +29 to +31
def api_base
  'https://api.openai.com/v1'
end
Owner

Please remove this. It's at the top level of the provider module.

Comment on lines +25 to +27
def transcription_url
  "#{api_base}/audio/transcriptions"
end
Owner

Please remove api_base from this method. It should only contain the path, not the host.
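
A minimal sketch of what this comment asks for, assuming the host is defined once at the provider level and joined with the path by whatever owns the connection. `PathOnlyProvider` and `full_url` are hypothetical names for illustration, not RubyLLM's actual API.

```ruby
require "uri"

# Hypothetical stand-in for a provider module; names are illustrative only.
module PathOnlyProvider
  module_function

  # The host is defined once, at the provider level.
  def api_base
    "https://api.openai.com/v1"
  end

  # The endpoint helper returns only the path, never the host.
  def transcription_url
    "audio/transcriptions"
  end

  # Assumed joining step: the connection layer combines base and path.
  def full_url(path)
    URI.join("#{api_base}/", path).to_s
  end
end

puts PathOnlyProvider.full_url(PathOnlyProvider.transcription_url)
```

Keeping the host out of `transcription_url` means a differently configured base URL would only need to change in one place.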

Comment on lines +12 to +15
def self.extended(base)
  # module_function causes the 'transcribe' method to be private, but we need it to be public
  base.public_class_method :transcribe
end
Owner

No need for this. Simply move your transcribe method above module_function.
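
The visibility rule behind this suggestion can be sketched in plain Ruby. The module and method names below (`SketchTranscription`, `ExtendingProvider`, `endpoint`) are hypothetical and only illustrate the mechanism, not the PR's actual code: a method defined above a bare `module_function` stays a public instance method, so it is public on any module that extends it, while methods defined below it become private when mixed in.

```ruby
module SketchTranscription
  # Defined above module_function: stays a public instance method,
  # so it is public on whatever module extends this one.
  def transcribe(path)
    "POST #{endpoint} (#{File.basename(path)})"
  end

  module_function

  # Defined below module_function: callable as SketchTranscription.endpoint,
  # but private on modules that extend SketchTranscription.
  def endpoint
    "audio/transcriptions"
  end
end

module ExtendingProvider
  extend SketchTranscription
end

puts ExtendingProvider.transcribe("interview.wav")
puts ExtendingProvider.respond_to?(:endpoint) # private, so respond_to? is false
```

With this ordering, the `self.extended` hook and `public_class_method` call are no longer needed.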

module OpenAI
  # Handles audio transcription functionality for the OpenAI API
  module Transcription
    # Helper methods as module_function
Owner

Remove this comment.

Comment on lines +39 to +52
def post_multipart(url, payload)
  connection = Faraday.new(url: api_base) do |f|
    f.request :multipart
    f.request :url_encoded
    f.adapter Faraday.default_adapter
  end

  response = connection.post(url) do |req|
    req.headers.merge!(headers)
    req.body = payload
  end

  JSON.parse(response.body)
end
Owner

This really should not be in the Transcription module of the OpenAI provider. This is a generic method that should go in RubyLLM::Connection.
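
One way to read this comment: the multipart helper takes the base URL, path, and headers as arguments, so nothing in it is specific to OpenAI or to transcription. The sketch below is an assumption about that shape, not the actual RubyLLM::Connection API. It uses Net::HTTP from the standard library instead of Faraday so that it runs without extra gems, and it only builds the request rather than sending it. `SketchConnection`, `build_multipart_post`, and the placeholder key are hypothetical.

```ruby
require "net/http"
require "uri"

# Hypothetical generic helper; not RubyLLM::Connection's actual API.
module SketchConnection
  module_function

  # Builds (but does not send) a multipart POST for any provider.
  # fields is an array of [name, value] pairs; File/IO values become file parts.
  def build_multipart_post(api_base, path, headers, fields)
    uri = URI.join("#{api_base}/", path)
    request = Net::HTTP::Post.new(uri)
    headers.each { |name, value| request[name] = value }
    request.set_form(fields, "multipart/form-data")
    request
  end
end

request = SketchConnection.build_multipart_post(
  "https://api.openai.com/v1",
  "audio/transcriptions",
  { "Authorization" => "Bearer PLACEHOLDER_KEY" }, # placeholder, not a real key
  [["model", "whisper-1"]]
)
puts request.path
```

A provider module would then supply only its path and payload, and the shared helper would handle the transport.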

Comment on lines +33 to +37
def headers
  {
    'Authorization' => "Bearer #{RubyLLM.config.openai_api_key}"
  }
end
Owner

Why is this here?

Comment on lines +37 to +64
# Determine the MIME type based on file extension
def self.mime_type_for(path) # rubocop:disable Metrics/CyclomaticComplexity, Metrics/MethodLength
  ext = File.extname(path).delete('.').downcase

  case ext
  when 'jpeg', 'jpg'
    'image/jpeg'
  when 'png'
    'image/png'
  when 'gif'
    'image/gif'
  when 'webp'
    'image/webp'
  when 'mpga', 'mp3', 'mpeg'
    'audio/mpeg'
  when 'm4a', 'mp4'
    'audio/mp4'
  when 'wav'
    'audio/wav'
  when 'ogg'
    'audio/ogg'
  when 'webm'
    'audio/webm'
  else
    # Default to the extension as the subtype
    "application/#{ext}"
  end
end
Owner

We now have a MimeType module.

Comment on lines -18 to +19
- :supports_json_mode, :input_price_per_million, :output_price_per_million, :type, :family
+ :supports_json_mode, :input_price_per_million, :output_price_per_million,
+ :type, :family
Owner

Why did this change?

Comment on lines +52 to +54
# Transcribe audio files
RubyLLM.transcribe "interview.wav"

Owner

This is in a very awkward spot.

@crmne linked an issue on Jun 11, 2025 that may be closed by this pull request
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add dedicated transcription interface for audio-to-text models
3 participants