[OPENAI] Support image edits with gpt-image-1 #152

Open
wants to merge 13 commits into
base: main
from

Conversation

sbounmy

@sbounmy sbounmy commented May 5, 2025

Still a draft, but I already use it in production in my app: https://github.com/sbounmy/hongbao_bitcoin

Usage

avatar = User.last.avatar
image = RubyLLM.edit(
  "Transform into ghibli style",
  model: "gpt-image-1",
  with: { image: [ ActiveStorage::Blob.service.path_for(avatar.key) ] }, # accepts a path or remote URL
  options: { size: '1024x1024', quality: 'medium' }
)

image.to_blob # image data to store
image.usage # {'input_tokens' => 362, 'input_tokens_details' => { 'image_tokens' => 323, 'text_tokens' => 39 }, 'output_tokens' => 4160, 'total_tokens' => 4522 }
image.total_cost # 0.17002
image.input_cost # 0.00362
image.output_cost # 0.1664
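
For readers checking the figures above, here is a minimal sketch of how the costs can be derived from the token usage. The per-million-token rates (10 USD for input, 40 USD for output) are assumptions inferred from the numbers in this example, not values read from the PR's code, and `image_costs` is a hypothetical helper name.

```ruby
# Assumed USD prices per 1M tokens, inferred from the example figures (not from the PR).
INPUT_PRICE_PER_MILLION  = 10.0
OUTPUT_PRICE_PER_MILLION = 40.0

# Hypothetical helper: turns a usage hash into input, output, and total cost.
def image_costs(usage)
  input  = usage.fetch("input_tokens")  * INPUT_PRICE_PER_MILLION  / 1_000_000
  output = usage.fetch("output_tokens") * OUTPUT_PRICE_PER_MILLION / 1_000_000
  { input: input.round(5), output: output.round(5), total: (input + output).round(5) }
end

usage = { "input_tokens" => 362, "output_tokens" => 4160, "total_tokens" => 4522 }
costs = image_costs(usage)
puts costs
```

Under these assumed rates, 362 input tokens and 4160 output tokens reproduce the input, output, and total costs listed in the usage example.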

Todo :

@sbounmy
Author

sbounmy commented May 8, 2025

issue #138

sbounmy added 4 commits May 12, 2025 10:56
…ith-image

* 'main' of github.com:crmne/ruby_llm: (24 commits)
  Enhance Rails guide with detailed persistence flow explanation and setup instructions
  Remove work-in-progress warning from models documentation generation
  Add validation considerations for Message model and update persistence flow documentation
  Add note about upcoming OpenAI headers support in v1.3.0
  Handle OpenAI organization and project IDs (crmne#162)
  Refactor acts_as_message and acts_as_tool_call methods to improve parameter handling and default values
  Remove reasoning section from available models documentation and rake task
  Remove debug logging for pricing in OpenRouter models
  Updated models page
  Fixed pricing parsing for OpenRouter
  Updated models
  Add warning about work in progress for Parsera integration in available models documentation
  Major refactoring of ModelInfo and Parsera API support for listing LLM capabilities and pricing.
  Fix inflector (crmne#159)
  Use foreign_key instead of to_s for acts_as methods (crmne#157)
  Fixes #embed fails when using default embedding model
  Add support for logging to file via configuration (crmne#148)
  Updated acts_as_* helpers to use canonical 'rails-style' foreign keys (crmne#151)
  refactor(media): streamline content formatting methods across providers
  Fixed Calling `chat.to_llm` keeps appending messages to the message array
  ...
@sbounmy
Author

sbounmy commented May 12, 2025

It would be great to have your feedback, @crmne.

I am struggling a bit with the capabilities for gpt-image-1 and the models.json generation.

@sbounmy sbounmy marked this pull request as ready for review May 12, 2025 09:36
@tpaulshippy tpaulshippy mentioned this pull request May 22, 2025
crmne pushed a commit that referenced this pull request May 22, 2025
Resolves #200 

Seems pretty simple to me. Borrowed the model passing idea from #152
@sbounmy
Author

sbounmy commented Jun 5, 2025

@crmne it would be great not to ghost PRs. At least point out what could be done better if you don't agree.

@crmne
Owner

crmne commented Jun 5, 2025

Thanks for the work on this. Image editing is definitely in scope, but I'd prefer extending the existing paint method rather than adding a separate edit method:

# Generate from scratch (current behavior)
RubyLLM.paint("a sunset over mountains")

# Edit existing image (new behavior)  
RubyLLM.paint("make it more vibrant", with: "path/to/image.png")

This keeps the API consistent with how chat.ask handles attachments.

On "ghosting": I respond when I can. This is unpaid work I do between running my business and other priorities. Characterizing my delayed responses as "ghosting" is inappropriate and creates a toxic environment.

I'll review this properly when I have time.

@sbounmy
Author

sbounmy commented Jun 5, 2025

@crmne that's the feeling I had; my bad if you were hurt by "ghosting".

As you know, we also contribute for free. When a PR doesn't get merged, we keep hitting conflicts as we try to catch up with the main branch, which is frustrating.

Regarding the PR: I wanted to use #paint, but the edit API call required different handling (multipart/form-data, etc.), so I thought a separate method might be cleaner. See #138.

Thanks for maintaining this gem

@crmne
Owner

crmne commented Jun 5, 2025

I appreciate the apology. The merge conflict frustration is understandable.

I still prefer extending paint rather than adding edit. The technical complexity (multipart vs JSON) should be hidden from users - that's an implementation detail. Having both methods for what's essentially the same operation (image generation) is confusing.

The API should be:

RubyLLM.paint("prompt") # generate
RubyLLM.paint("prompt", with: "path") # edit

This matches how chat.ask handles attachments and keeps the interface clean.
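
One way to picture the proposal above is a single entry point that picks the transport from the presence of `with:`. This is only a rough sketch under stated assumptions: `ImageRequest` and `build_paint_request` are hypothetical names, not RubyLLM internals, and the endpoint paths follow OpenAI's image generation (JSON) and image edit (multipart/form-data) routes.

```ruby
# Hypothetical request descriptor; not part of RubyLLM.
ImageRequest = Struct.new(:endpoint, :content_type, :payload, keyword_init: true)

# Hypothetical helper: callers only pass a prompt and, optionally, images.
# The JSON vs multipart choice stays an internal detail.
def build_paint_request(prompt, with: nil, model: "gpt-image-1")
  images = Array(with) # nil -> [], a single path -> [path], a list stays a list

  if images.empty?
    # No source image: plain generation, sent as JSON.
    ImageRequest.new(endpoint: "images/generations",
                     content_type: "application/json",
                     payload: { model: model, prompt: prompt })
  else
    # Source image(s) given: edit request, sent as multipart/form-data.
    ImageRequest.new(endpoint: "images/edits",
                     content_type: "multipart/form-data",
                     payload: { model: model, prompt: prompt, image: images })
  end
end

generate = build_paint_request("a sunset over mountains")
edit     = build_paint_request("make it more vibrant", with: "path/to/image.png")
puts generate.endpoint
puts edit.endpoint
```

The caller-facing signature stays identical for both cases, which is the point of the proposal; how the multipart body is actually encoded and sent is left out of this sketch.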
