-
-
Notifications
You must be signed in to change notification settings - Fork 160
Add RubyLLM.transcribe method. #97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add RubyLLM.transcribe method. #97
Conversation
Add Transcription module to Providers::OpenAI. Model now defaults to whisper-1 and does not need prompt. Add transcription docs. Make transcription model default a config option. Enhance Content.mime_type_for to support audio content types. Simplify RubyLLM unit tests to only test that correct methods are called, no LLM access.
# Conflicts: # lib/ruby_llm/configuration.rb
@crmne I believe this PR is ready. It took a surprising amount of code and time to implement, but I think it adds value beyond a simple transcribe method and is a good addition to the code base. I hope you think so too! :) |
@crmne any chance of getting this merged? 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR doesn't follow the RubyLLM design in multiple ways.
- Take a look at
RubyLLM::Image
andRubyLLM::Embedding
. - It only implements it for OpenAI.
- The OpenAI implementation doesn't follow how the providers are implemented in RubyLLM.
- There are extra changes that have nothing to do with this PR (vibe coded?)
- There are no VCR cassettes.
I'd be happy to merge it if you are willing to improve it.
def api_base | ||
'https://api.openai.com/v1' | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this. It's at the top level of the provider module.
def transcription_url | ||
"#{api_base}/audio/transcriptions" | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove api_base
from this method. It should only contain the path, not the host.
def self.extended(base) | ||
# module_function causes the 'transcribe' method to be private, but we need it to be public | ||
base.public_class_method :transcribe | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for this. simply move your transcribe above module_function
module OpenAI | ||
# Handles audio transcription functionality for the OpenAI API | ||
module Transcription | ||
# Helper methods as module_function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this comment.
def post_multipart(url, payload) | ||
connection = Faraday.new(url: api_base) do |f| | ||
f.request :multipart | ||
f.request :url_encoded | ||
f.adapter Faraday.default_adapter | ||
end | ||
|
||
response = connection.post(url) do |req| | ||
req.headers.merge!(headers) | ||
req.body = payload | ||
end | ||
|
||
JSON.parse(response.body) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really should not be in the Transcription module of the OpenAI provider. This is a generic method that should go in RubyLLM::Connection
def headers | ||
{ | ||
'Authorization' => "Bearer #{RubyLLM.config.openai_api_key}" | ||
} | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this here?
# Determine the MIME type based on file extension | ||
def self.mime_type_for(path) # rubocop:disable Metrics/CyclomaticComplexity, Metrics/MethodLength | ||
ext = File.extname(path).delete('.').downcase | ||
|
||
case ext | ||
when 'jpeg', 'jpg' | ||
'image/jpeg' | ||
when 'png' | ||
'image/png' | ||
when 'gif' | ||
'image/gif' | ||
when 'webp' | ||
'image/webp' | ||
when 'mgpa', 'mp3', 'mpeg' | ||
'audio/mpeg' | ||
when 'm4a', 'mp4' | ||
'audio/mp4' | ||
when 'wav' | ||
'audio/wav' | ||
when 'ogg' | ||
'audio/ogg' | ||
when 'webm' | ||
'audio/webm' | ||
else | ||
# Default to the extension as the subtype | ||
"application/#{ext}" | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now have a MimeType
module.
:supports_json_mode, :input_price_per_million, :output_price_per_million, :type, :family | ||
:supports_json_mode, :input_price_per_million, :output_price_per_million, | ||
:type, :family |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this change?
# Transcribe audio files | ||
RubyLLM.transcribe "interview.wav" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in a very awkward spot.
Add top level transcribe method to match chat, embed, paint. Addresses #92 .