1.5.0
✨ Release highlights
🖼️ Image Generation Support
We're excited to introduce `ImageGenerationModel`, a new abstraction for working with image generation models. This addition enables seamless integration with models that can transform text prompts into images.
Available Services
- 🤗 `InferenceEndpointsImageGeneration`: Integration with Hugging Face's Inference Endpoints
- `OpenAIImageGeneration`: Integration with OpenAI's DALL-E
Architecture
Just as `LLM`s are used by a `Task`, we've introduced `ImageTask` as a high-level abstraction for image generation workflows. `ImageTask` defines how a step should use an `ImageGenerationModel` to accomplish specific image generation tasks.
Our first implementation, the `ImageGeneration` task, provides a straightforward interface: given a text prompt, it generates the corresponding image, leveraging any of the supported image generation models.
We've also added a small tutorial on how to generate images using `distilabel`: distilabel - Tutorials - Image generation with distilabel.
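The `Task`/`LLM` vs. `ImageTask`/`ImageGenerationModel` parallel described above can be sketched in plain Python. This is an illustrative mock, not distilabel's actual API: the class and method names below (`generate`, `process`, the `FakeImageGenerationModel` backend) are assumptions chosen to show the delegation pattern only.

```python
# Illustrative sketch of the abstraction: an image task delegates
# prompt -> image generation to a pluggable model backend, mirroring
# how a Task uses an LLM. Names here are NOT distilabel's real API.
from dataclasses import dataclass


class ImageGenerationModel:
    """Base interface: turn a text prompt into an image reference."""

    def generate(self, prompt: str) -> str:
        raise NotImplementedError


class FakeImageGenerationModel(ImageGenerationModel):
    """Stand-in backend that returns a placeholder 'image' reference."""

    def generate(self, prompt: str) -> str:
        return f"image-for:{prompt}"


@dataclass
class ImageGenerationTask:
    """High-level task: for rows with a 'prompt', add an 'image' column."""

    model: ImageGenerationModel

    def process(self, rows: list) -> list:
        return [{**row, "image": self.model.generate(row["prompt"])} for row in rows]


task = ImageGenerationTask(model=FakeImageGenerationModel())
result = task.process([{"prompt": "a red fox"}])
```

Swapping `FakeImageGenerationModel` for a real backend (e.g. one of the services listed above) is the only change the task would need, which is the point of the abstraction.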
Images as inputs for LLMs
We've added initial support for providing images as input to an `LLM` through the new `TextGenerationWithImage` task. We've updated and tested `InferenceEndpointsLLM` and `OpenAILLM` with this new task, and we'll add image-as-input compatibility for others such as `vLLM` in the next releases.
Check the tutorial distilabel - Tutorials - Text generation with images in distilabel to get started!
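Under the hood, a task like this ultimately has to hand the backend a multimodal chat message. The helper below is an illustrative assumption (distilabel builds such payloads internally), but the message shape itself follows the OpenAI-compatible chat format, where `content` is a list of typed parts:

```python
# Sketch of an OpenAI-compatible multimodal chat message mixing text and
# an image reference. The helper function is hypothetical; the payload
# shape follows the OpenAI chat completions content-parts format.
def build_image_message(text: str, image_url: str) -> dict:
    """Return a single user message containing text plus an image URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


message = build_image_message(
    "What is in this image?",
    "https://example.com/cat.png",
)
```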
💻 New `MlxLLM` integration
We've integrated the mlx-lm package with the new `MlxLLM` class, enabling native machine learning acceleration on Apple Silicon Macs. This integration supercharges synthetic data generation by leveraging MLX's highly optimized framework, designed specifically for the M-series chips.
New `InstructionResponsePipeline` template
We've started making changes so `distilabel` is easier to use from minute one. We'll start adding presets or templates that allow you to quickly get a pipeline with sensible preconfigured defaults for generating data for certain tasks. The first task we've worked on is the SFT or Instruction Response tuning pipeline, which you can use like this:

```python
from distilabel.pipeline import InstructionResponsePipeline

pipeline = InstructionResponsePipeline()
distiset = pipeline.run()
```
Define load stages
We've added a way for users to define which steps of the pipeline should be loaded together, allowing for more efficient resource management and better control over the execution flow. This new feature is particularly useful in scenarios where resource-constrained environments limit the ability to execute all steps simultaneously, requiring steps to be executed in distinct stages.
We've added a detailed guide on how to use this feature: distilabel - How-to guides - Load groups and execution stages.
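Conceptually, a load group partitions the pipeline's steps into stages that are loaded, executed, and unloaded one after another instead of all at once. The pure-Python sketch below illustrates that execution model only; it is not distilabel's implementation, and the function name is hypothetical (the real entry point is the `load_groups` argument added to `run` in #1075):

```python
# Illustrative sketch of staged execution: each load group is loaded
# together, its steps run, and resources freed before the next group
# starts. This mimics the behavior described above, not the real code.
def run_in_stages(steps: dict, load_groups: list) -> list:
    """Execute step callables group by group, returning an event log."""
    log = []
    for group in load_groups:
        log += [f"load:{name}" for name in group]    # stage loads together
        for name in group:
            steps[name]()                            # run each step in the stage
            log.append(f"run:{name}")
        log += [f"unload:{name}" for name in group]  # free resources before next stage
    return log


steps = {"generate": lambda: None, "judge": lambda: None}
log = run_in_stages(steps, load_groups=[["generate"], ["judge"]])
```

With a single group containing every step you'd recover the old "load everything at once" behavior; splitting steps into separate groups trades wall-clock time for a smaller peak resource footprint.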
What's Changed
- Add common typing module by @plaguss in #1029
- docs: textcat tutorial by @sdiazlor in #949
- Add `task` decorator by @gabrielmbmb in #1028
- Update `docs` workflows to use `uv` by @gabrielmbmb in #1032
- fix: simplify prompt template `ArgillaLabeller` by @davidberenstein1957 in #1033
- Add `dataset_batch_size` argument by @gabrielmbmb in #1039
- Move all LLMs to distilabel.models by @plaguss in #1045
- Fix a tiny typo in `_Step` docstring by @sadra-barikbin in #1051
- docs: improve docs for `MinHashDedup` Step by @anakin87 in #1050
- Fix new response_format variable in openai api by @plaguss in #1053
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #1043
- Update `LLM.generate` output to include `statistics` by @plaguss in #1034
- Add example of structured output by @plaguss in #1061
- feat: implement basic SFT pipeline based on synthetic data generator by @burtenshaw in #1059
- fix: broken import in instruction by @burtenshaw in #1063
- Fix StepOutput type by @plaguss in #1072
- docs: update issue templates by @sdiazlor in #1074
- Update `unload` method from `vLLM` to properly free resources by @gabrielmbmb in #1077
- Add tasks to replicate Math-shepherd by @plaguss in #1052
- Add `load_groups` argument to `run` by @gabrielmbmb in #1075
- Add `TextGenerationWithImage` task by @plaguss in #1066
- Create columns with `LLM` returned extra keys by @gabrielmbmb in #1078
- Fix `vLLM` unload logic when model is `None` by @gabrielmbmb in #1080
- Fix `merge_distilabel_metadata` function when handling outputs from `Task` with `group_generations==True` by @gabrielmbmb in #1082
- chore: update base.py by @eltociear in #1085
- Add magpie support llama cpp ollama by @davidberenstein1957 in #1086
- Feat/954 llama cpp by @bikash119 in #1000
- fix import by replacing GeneratorOutput with GeneratorStepOutput by @davidberenstein1957 in #1093
- add mlx support by @davidberenstein1957 in #1089
- Support custom default headers in `OpenAILLM` class by @khulaifi95 in #1088
- fix/pip install messages by @davidberenstein1957 in #1095
- Fix handling empty list statistics by @gabrielmbmb in #1094
- update to outlines010 by @davidberenstein1957 in #1092
- update: search by match by @sdiazlor in #1096
- Add Legend to Component Gallery Icons by @ParagEkbote in #1090
- Image Language Models and `ImageGeneration` task by @plaguss in #1060
- Update `LLM`s to support prompt logprobs use-case by @gabrielmbmb in #1099
- 1.5.0 by @gabrielmbmb in #1100
New Contributors
- @sadra-barikbin made their first contribution in #1051
- @anakin87 made their first contribution in #1050
- @pre-commit-ci made their first contribution in #1043
- @eltociear made their first contribution in #1085
- @bikash119 made their first contribution in #1000
- @khulaifi95 made their first contribution in #1088
- @ParagEkbote made their first contribution in #1090
Full Changelog: 1.4.2...1.5.0