1.5.0
✨ Release highlights
🖼️ Image Generation Support
We're excited to introduce `ImageGenerationModel`, a new abstraction for working with image generation models. This addition enables seamless integration with models that can transform text prompts into images.
Available Services
- 🤗 `InferenceEndpointsImageGeneration`: Integration with Hugging Face's Inference Endpoints
- `OpenAIImageGeneration`: Integration with OpenAI's DALL-E
Architecture
Just as `LLM`s are used by a `Task`, we've introduced `ImageTask` as a high-level abstraction for image generation workflows. `ImageTask` defines how a step should use an `ImageGenerationModel` to accomplish specific image generation tasks.
Our first implementation, the `ImageGeneration` task, provides a straightforward interface: given a text prompt, it generates the corresponding image, leveraging any of the supported image generation models.
We've also added a small tutorial on how to generate images using `distilabel`: distilabel - Tutorials - Image generation with distilabel.
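The `Task`/`LLM` vs. `ImageTask`/`ImageGenerationModel` parallel described above can be sketched in plain Python. This is an illustrative mock, not distilabel's actual API: the class and method names below (`generate`, `process`, the `FakeImageGenerationModel` backend) are assumptions chosen to show the delegation pattern only.

```python
# Illustrative sketch of the abstraction: an image task delegates
# prompt -> image generation to a pluggable model backend, mirroring
# how a Task uses an LLM. Names here are NOT distilabel's real API.
from dataclasses import dataclass


class ImageGenerationModel:
    """Base interface: turn a text prompt into an image reference."""

    def generate(self, prompt: str) -> str:
        raise NotImplementedError


class FakeImageGenerationModel(ImageGenerationModel):
    """Stand-in backend that returns a placeholder 'image' reference."""

    def generate(self, prompt: str) -> str:
        return f"image-for:{prompt}"


@dataclass
class ImageGenerationTask:
    """High-level task: for rows with a 'prompt', add an 'image' column."""

    model: ImageGenerationModel

    def process(self, rows: list) -> list:
        return [{**row, "image": self.model.generate(row["prompt"])} for row in rows]


task = ImageGenerationTask(model=FakeImageGenerationModel())
result = task.process([{"prompt": "a red fox"}])
```

Swapping `FakeImageGenerationModel` for a real backend (e.g. one of the services listed above) is the only change the task would need, which is the point of the abstraction.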
Images as inputs for LLMs
We've added initial support for providing images as input to an `LLM` through the new `TextGenerationWithImage` task. We've updated and tested `InferenceEndpointsLLM` and `OpenAILLM` with this new task, and we'll add image-as-input compatibility for others such as `vLLM` in the next releases.
Check the tutorial distilabel - Tutorials - Text generation with images in distilabel to get started!
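Under the hood, a task like this ultimately has to hand the backend a multimodal chat message. The helper below is an illustrative assumption (distilabel builds such payloads internally), but the message shape itself follows the OpenAI-compatible chat format, where `content` is a list of typed parts:

```python
# Sketch of an OpenAI-compatible multimodal chat message mixing text and
# an image reference. The helper function is hypothetical; the payload
# shape follows the OpenAI chat completions content-parts format.
def build_image_message(text: str, image_url: str) -> dict:
    """Return a single user message containing text plus an image URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


message = build_image_message(
    "What is in this image?",
    "https://example.com/cat.png",
)
```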
💻 New `MlxLLM` integration
We've integrated the mlx-lm package with the new `MlxLLM` class, enabling native machine learning acceleration on Apple Silicon Macs. This integration supercharges synthetic data generation by leveraging MLX's highly optimized framework, designed specifically for the M-series chips.
New `InstructionResponsePipeline` template
We've started making changes so `distilabel` is easier to use from minute one. We'll start adding presets or templates that allow you to quickly get a pipeline with sensible preconfigured defaults for generating data for certain tasks. The first task we've worked on is the SFT or Instruction Response tuning pipeline, which you can use like this:

```python
from distilabel.pipeline import InstructionResponsePipeline

pipeline = InstructionResponsePipeline()
distiset = pipeline.run()
```
Define load stages
We've added a way for users to define which steps of the pipeline should be loaded together, allowing for more efficient resource management and better control over the execution flow. This new feature is particularly useful in scenarios where resource-constrained environments limit the ability to execute all steps simultaneously, requiring steps to be executed in distinct stages.
We've added a detailed guide on how to use this feature: distilabel - How-to guides - Load groups and execution stages.
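Conceptually, a load group partitions the pipeline's steps into stages that are loaded, executed, and unloaded one after another instead of all at once. The pure-Python sketch below illustrates that execution model only; it is not distilabel's implementation, and the function name is hypothetical (the real entry point is the `load_groups` argument added to `run` in #1075):

```python
# Illustrative sketch of staged execution: each load group is loaded
# together, its steps run, and resources freed before the next group
# starts. This mimics the behavior described above, not the real code.
def run_in_stages(steps: dict, load_groups: list) -> list:
    """Execute step callables group by group, returning an event log."""
    log = []
    for group in load_groups:
        log += [f"load:{name}" for name in group]    # stage loads together
        for name in group:
            steps[name]()                            # run each step in the stage
            log.append(f"run:{name}")
        log += [f"unload:{name}" for name in group]  # free resources before next stage
    return log


steps = {"generate": lambda: None, "judge": lambda: None}
log = run_in_stages(steps, load_groups=[["generate"], ["judge"]])
```

With a single group containing every step you'd recover the old "load everything at once" behavior; splitting steps into separate groups trades wall-clock time for a smaller peak resource footprint.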
What's Changed
- Add common typing module by @plaguss in #1029
- docs: textcat tutorial by @sdiazlor in #949
- Add `task` decorator by @gabrielmbmb in #1028
- Update `docs` workflows to use `uv` by @gabrielmbmb in #1032
- fix: simplify prompt template `ArgillaLabeller` by @davidberenstein1957 in #1033
- Add `dataset_batch_size` argument by @gabrielmbmb in #1039
- Move all LLMs to distilabel.models by @plaguss in #1045
- Fix a tiny typo in `_Step` docstring by @sadra-barikbin in #1051
- docs: improve docs for `MinHashDedup` Step by @anakin87 in #1050
- Fix new response_format variable in openai api by @plaguss in #1053
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #1043
- Update `LLM.generate` output to include `statistics` by @plaguss in #1034
- Add example of structured output by @plaguss in #1061
- feat: implement basic SFT pipeline based on synthetic data generator by @burtenshaw in #1059
- fix: broken import in instruction by @burtenshaw in #1063
- Fix StepOutput type by @plaguss in #1072
- docs: update issue templates by @sdiazlor in #1074
- Update `unload` method from `vLLM` to properly free resources by @gabrielmbmb in #1077
- Add tasks to replicate Math-shepherd by @plaguss in #1052
- Add `load_groups` argument to `run` by @gabrielmbmb in #1075
- Add `TextGenerationWithImage` task by @plaguss in #1066
- Create columns with `LLM` returned extra keys by @gabrielmbmb in #1078
- Fix `vLLM` unload logic when model is `None` by @gabrielmbmb in #1080
- Fix `merge_distilabel_metadata` function when handling outputs from `Task` with `group_generations==True` by @gabrielmbmb in #1082
- chore: update base.py by @eltociear in #1085
- Add magpie support llama cpp ollama by @davidberenstein1957 in #1086
- Feat/954 llama cpp by @bikash119 in #1000
- fix import by replacing GeneratorOutput with GeneratorStepOutput by @davidberenstein1957 in #1093
- add mlx support by @davidberenstein1957 in #1089
- Support custom default headers in `OpenAILLM` class by @khulaifi95 in #1088
- fix/pip install messages by @davidberenstein1957 in #1095
- Fix handling empty list statistics by @gabrielmbmb in #1094
- update to outlines010 by @davidberenstein1957 in #1092
- update: search by match by @sdiazlor in #1096
- Add Legend to Component Gallery Icons by @ParagEkbote in #1090
- Image Language Models and `ImageGeneration` task by @plaguss in #1060
- Update `LLM`s to support prompt logprobs use-case by @gabrielmbmb in #1099
- 1.5.0 by @gabrielmbmb in #1100
New Contributors
- @sadra-barikbin made their first contribution in #1051
- @anakin87 made their first contribution in #1050
- @pre-commit-ci made their first contribution in #1043
- @eltociear made their first contribution in #1085
- @bikash119 made their first contribution in #1000
- @khulaifi95 made their first contribution in #1088
- @ParagEkbote made their first contribution in #1090
Full Changelog: 1.4.2...1.5.0