1.5.0

@gabrielmbmb gabrielmbmb released this 17 Jan 08:28

✨ Release highlights

🖼️ Image Generation Support

We're excited to introduce ImageGenerationModel, a new abstraction for working with image generation models. This addition enables seamless integration with models that can transform text prompts into images.

Available Services

  • 🤗 InferenceEndpointsImageGeneration: Integration with Hugging Face's Inference Endpoints
  • OpenAIImageGeneration: Integration with OpenAI's DALL-E

Architecture

Just as LLMs are used by a Task, we've introduced ImageTask as a high-level abstraction for image generation workflows. ImageTask defines how a step should use an ImageGenerationModel to accomplish specific image generation tasks.

Our first implementation, the ImageGeneration task, provides a straightforward interface: given a text prompt, it generates the corresponding image, leveraging any of the supported image generation models.
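The task/model split described above can be illustrated with a small, self-contained sketch. The class names here are hypothetical stand-ins modeling the abstraction, not distilabel's actual classes, and the fake model lets the example run without any API keys:

```python
from dataclasses import dataclass
from typing import Protocol


class ImageGenerationModel(Protocol):
    """Anything that can turn a text prompt into image bytes."""

    def generate(self, prompt: str) -> bytes: ...


@dataclass
class FakeImageModel:
    """Stub backend so the sketch runs offline; a real backend
    (Inference Endpoints, DALL-E) would return actual PNG bytes."""

    def generate(self, prompt: str) -> bytes:
        return f"<image for: {prompt}>".encode()


@dataclass
class ImageGenerationTask:
    """Mirrors the split: the task owns the workflow, the model owns generation."""

    model: ImageGenerationModel

    def process(self, row: dict) -> dict:
        row["image"] = self.model.generate(row["prompt"])
        return row


task = ImageGenerationTask(model=FakeImageModel())
result = task.process({"prompt": "a watercolor fox"})
```

Swapping `FakeImageModel` for a real service integration is the only change needed to go from this sketch to actual generation, which is the point of keeping the model behind a common interface.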

We've also added a small tutorial on how to generate images using distilabel: distilabel - Tutorials - Image generation with distilabel

Images as inputs for LLMs

We've added initial support for providing images as input to an LLM through the new TextGenerationWithImage task. We've updated and tested InferenceEndpointsLLM and OpenAILLM with this new task, and we'll add image-as-input compatibility for others such as vLLM in upcoming releases.

Check the tutorial distilabel - Tutorials - Text generation with images in distilabel to get started!
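Under the hood, multimodal chat APIs typically accept images inline as base64 data URIs alongside the text prompt. A minimal, self-contained sketch of building such a message (the helper name is ours, not part of distilabel):

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style chat message mixing text and an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }


msg = image_message("Describe this picture.", b"\x89PNG...")
```

A task like TextGenerationWithImage can then hand this kind of message to any chat-capable LLM that understands multimodal content.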

💻 New MlxLLM integration

We've integrated the mlx-lm package through the new MlxLLM class, enabling native machine learning acceleration on Apple Silicon Macs. This integration speeds up synthetic data generation by leveraging MLX, a framework optimized specifically for the M-series chips.

New InstructionResponsePipeline template

We've started making changes so distilabel is easier to use from minute one. We'll be adding presets, or templates, that allow you to quickly get a pipeline with sensible preconfigured defaults for generating data for certain tasks. The first one we've worked on is the SFT, or Instruction Response tuning, pipeline, which you can use like:

from distilabel.pipeline import InstructionResponsePipeline

# Preconfigured defaults for generating instruction-response (SFT) data
pipeline = InstructionResponsePipeline()
distiset = pipeline.run()

Define load stages

We've added a way for users to define which steps of the pipeline should be loaded together, allowing for more efficient resource management and better control over the execution flow. This new feature is particularly useful in scenarios where resource-constrained environments limit the ability to execute all steps simultaneously, requiring steps to be executed in distinct stages.

We've added a detailed guide on how to use this feature: distilabel - How-to guides - Load groups and execution stages.
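The idea behind load stages can be illustrated with a toy scheduler (a sketch of the concept, not distilabel's actual implementation): each load group is loaded together, executed, and released before the next group starts, so peak resource usage is bounded by the largest group rather than the whole pipeline.

```python
def run_in_stages(steps: dict, load_groups: list[list[str]]) -> list[str]:
    """Toy scheduler: load and run each group of steps before the next begins."""
    log = []
    for group in load_groups:
        for name in group:  # load the whole group together
            log.append(f"load:{name}")
        for name in group:  # execute it, then its resources can be freed
            steps[name]()
            log.append(f"run:{name}")
    return log


steps = {"generate": lambda: None, "judge": lambda: None}
log = run_in_stages(steps, load_groups=[["generate"], ["judge"]])
```

With a single group containing both steps, both would be loaded simultaneously; splitting them into two groups trades some wall-clock time for a smaller memory footprint, which is exactly the trade-off load stages expose.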

What's Changed

New Contributors

Full Changelog: 1.4.2...1.5.0