Releases · argilla-io/distilabel
1.0.0
What's Changed
- Add `Step` abstract class and new `Pipeline` by @gabrielmbmb in #338
- Add runtime parameters validation by @gabrielmbmb in #345
- `Pipeline` local execution by @gabrielmbmb in #346
- Add `Task` (minimal implementation) by @alvarobartt in #347
- Refactor `_BatchManager` to have list of batches per step by @gabrielmbmb in #353
- Refactor getting parameters from `Step.process` method by @gabrielmbmb in #355
- Add `LLM`, `OpenAILLM`, `TransformersLLM`, and `LlamaCppLLM` by @alvarobartt in #354
- Fix `Task` and `TextGeneration` by @alvarobartt in #356
- Add `combine_dicts` function and `CombineColumns` class by @alvarobartt in #358
- Add `PushToHub` step and fix `typing` by @alvarobartt in #357
- Add serialization for the new components by @plaguss in #349
- Fix `OpenAILLM.api_key` due to `SecretStr` and `StepInput` wrong imports by @alvarobartt in #359
- Add `GlobalStep`, fix `_BatchManager`, and add `logging` by @alvarobartt in #362
- Migrate vLLM to the new API by @plaguss in #361
- Update `_BatchManager` to work with `GlobalStep`s and `input_batch_size` per step by @gabrielmbmb in #366
- Clean up outdated / unused files by @alvarobartt in #369
- Add `input_mappings` and `output_mappings` attributes by @gabrielmbmb in #367
- Move batching from `Task` to `LLM`, fix `vLLM.generate`, and add `DISTILABEL_LOG_LEVEL` by @alvarobartt in #371
- Improve runtime parameter definition by @gabrielmbmb in #372
- Add `AsyncOpenAI` and update `OpenAILLM` accordingly by @alvarobartt in #381
- Update serde by @gabrielmbmb in #382
- Add `MistralLLM` and add `generation_kwargs` as `RuntimeParameters` by @alvarobartt in #383
- Move `steps` out of `pipeline` by @gabrielmbmb in #384
- Add tests and docstring for `Task` and subclasses by @alvarobartt in #385
- Add `step` decorator by @gabrielmbmb in #387
- Add `input` propagation through `Task.process` by @alvarobartt in #399
- Improve `Pipeline` error handling by @gabrielmbmb in #400
- Fix `combine_dicts` and `StepInput` import in `PushToHub` by @alvarobartt in #401
- Improve `GlobalStep` error handling by @gabrielmbmb in #402
- Changed " to italics in the EvolInstruct tutorial where one " was missing by @ignacioct in #398
- Add `get_last_hidden_states` method and update `TransformersLLM` by @gabrielmbmb in #414
- docs: correct small typos in tutorial by @sdiazlor in #419
- docs: readme positioning by @davidberenstein1957 in #386
- Add `num_generations` and `group_generations` parameters to `Task` by @gabrielmbmb in #416
- Add `Argilla` and `PromptCompletionToArgilla` by @alvarobartt in #420
- Add `EvolInstruct` and `EvolInstructGenerator` tasks by @alvarobartt in #407
- Wrap optional `LLM` dependencies under `load` by @alvarobartt in #428
- Add `ComplexityScorer` task by @gabrielmbmb in #421
- Implement caching mechanism for the pipelines by @plaguss in #370
- Add method to `Pipeline` to handle keyboard interruptions via Ctrl+C by @plaguss in #406
- Add `GenerateEmbeddings` task by @gabrielmbmb in #427
- Add `api_key` within `LLM.load` and add `llm_kwargs` as `RuntimeParameter` by @alvarobartt in #432
- Add `GeneratorStep.process` validation in `DAG` and smaller fixes by @alvarobartt in #435
- Add `EvolComplexity` task by @davidberenstein1957 in #415
- Add `QualityScorer` task by @ignacioct in #425
- Add `CudaDevicePlacementMixin` class by @gabrielmbmb in #436
- Return `distiset` from `Pipeline.run` by @plaguss in #417
- Update README.md by @strickvl in #451
- Add `InferenceEndpointsLLM` by @alvarobartt in #439
- Fix `Distiset` after `PushToHub` and smaller fixes by @alvarobartt in #452
- Fix `Step.process_applying_mappings` by @alvarobartt in #453
- Add `AnyscaleLLM` by @davidberenstein1957 in #447
- Add general function to obtain schema for parquet writer by @plaguss in #454
- Add `TogetherLLM` by @davidberenstein1957 in #449
- Fix `LLM` subclasses based on `OpenAILLM` by @alvarobartt in #455
- Improve batching and caching by @gabrielmbmb in #457
- Add `EvolQuality` task by @davidberenstein1957 in #429
- Add `VertexAILLM` by @davidberenstein1957 in #445
- Add `use_cache` to `BasePipeline` by @plaguss in #463
- Add `AnthropicLLM` by @sdiazlor in #444
- Add `multiprocess` dependency by @gabrielmbmb in #467
- Add `UltraFeedback` by @alvarobartt in #464
- Add `OllamaLLM` by @davidberenstein1957 in #405
- Add `RuntimeParametersMixin` and `LLM` runtime parameters by @gabrielmbmb in #466
- Add `LiteLLM` by @davidberenstein1957 in #441
- Add CLI by @gabrielmbmb in #471
- Set `_batch_manager` to `None` after run by @gabrielmbmb in #473
- Add `create_distiset` function by @plaguss in #480
- Add `overload` to `step` decorator by @gabrielmbmb in #474
- Move `Enum` to `Dict[str, str]` to avoid serialization errors during caching by @plaguss in #482
- Include a dataset card and the `pipeline.yaml` on `Distiset.push_to_hub` by @plaguss in #479
- Add `PairRM` task for ranking responses by @plaguss in #450
- Update `_WriteBuffer` to write several parquet files by @gabrielmbmb in #483
- Extend `Argilla` integration: `TextGeneration`, `Preference`, and more by @alvarobartt in #472
- Add `DeitaFiltering` step by @gabrielmbmb in #481
- Add `InstructionBacktranslation` by @alvarobartt in #486
- Fix `huggingface_hub` `TextGenerationError` import by @Wauplin in #485
- Improve Azure OpenAI support by @BramVanroy in #461
- Add `SelfInstruct` task by @ignacioct in #456
- Use `QueueHandler` for `Pipeline` logging by @gabrielmbmb in #489
- Improve `_stop` and `logging` by @gabrielmbmb in #491
- Fix creating empty `Dataset` in `create_distiset` function by @gabrielmbmb in #492
- Add imports from `__init__` modules by @gabrielmbmb in #493
- Add `batch_size` and `input_batch_size` runtime parameters by @gabrielmbmb in #495
- Update serialization method of `_BatchManager` to write each step on its own file by @plaguss in #496
- Fix `asyncio` in `AsyncLLM` to use the running event loop if any by @alvarobartt in #501
- Added authentication header to allow private/gated dataset use by @bjoernpl in https://github.com/argilla-io/distila...
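Taken together, the 1.0.0 entries above introduce the new `Step`/`Task`/`LLM` architecture, in which a `Pipeline` connects steps into a DAG, runs them locally, and returns a `Distiset` from `Pipeline.run`. The snippet below is only a minimal sketch of how those pieces could fit together; the module paths, the `LoadDataFromDicts` step, and any argument names not listed above are assumptions rather than the documented 1.0.0 API.

```python
# Hypothetical sketch of the 1.0.0-style API described in the changelog above
# (Pipeline, Step, Task, LLM, Distiset). Names not mentioned in the changelog
# are assumptions and may differ from the released library.
from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

with Pipeline(name="demo-pipeline") as pipeline:
    # A generator step feeding instruction rows into the pipeline.
    load_data = LoadDataFromDicts(
        name="load_data",
        data=[{"instruction": "Explain what synthetic data is."}],
    )
    # A Task wrapping an LLM; OpenAILLM reads OPENAI_API_KEY from the environment.
    generate = TextGeneration(
        name="generate",
        llm=OpenAILLM(model="gpt-3.5-turbo"),
    )
    # Connect the steps into the pipeline DAG.
    load_data.connect(generate)

if __name__ == "__main__":
    # Pipeline.run returns a Distiset, which can be pushed to the Hugging Face Hub.
    distiset = pipeline.run()
    distiset.push_to_hub("my-org/demo-distiset")
```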
0.6.0
What's Changed
- Fix typo in docstring of `to_argilla`: `metrics_` to `metric_` by @burtenshaw in #334
- Implement a JSON-responding OpenAI LLM as `JSONOpenAILLM` by @burtenshaw in #331
- Add examples for the deita paper tasks by @plaguss in #329
- Add checkpoint strategy to automatically push to hub by @plaguss in #321
- docs: update tutorials avoid argilla installation error by @sdiazlor in #337
- Fix `CustomDataset.load_from_disk` with `str`/`Path` objects by @plaguss in #341
- Clarify number of generations produced when using `LLMPool` in docs by @davanstrien in #339
- Refactor `_build_dataset` piece for speed by @plaguss in #344
- Fix documentation and type variables in `CustomDataset` checkpoint methods by @plaguss in #342
- US spelling and other typo corrections on Distilabel tutorials by @ignacioct in #324
- docs: add a tutorial for EvolInstruct by @sdiazlor in #327
- Fix OpenAI API error with OpenAI-compatible providers by @jphme in #351
- Add fix for labels not returned by OpenAI API by @plaguss in #364
- Refactor model availability check in `is_serverless_endpoint_available` by @davanstrien in #363
New Contributors
- @burtenshaw made their first contribution in #334
- @jphme made their first contribution in #351
Full Changelog: 0.5.0...0.6.0
0.5.0
What's Changed
- fix: Correct import error by @plaguss in #279
- fix: Filter examples for which len generations != len ratings by @plaguss in #284
- feat: Add sentence-transformers support for the `to_argilla` method by @davidberenstein1957 in #262
- feat: Add text descriptives support to the `to_argilla` methods by @davidberenstein1957 in #271
- feat: Add `to_argilla` method to `EvolInstructTask` generated datasets by @plaguss in #291
- docs: Shorten titles of tutorials and update core example by @davidberenstein1957 in #289
- feat: Add new serialization strategy by @plaguss in #288
- feat: Review `OllamaLLM` and `TogetherInferenceLLM` by @alvarobartt in #305
- refactor: Remove metadata for ratings by @ignacioct in #303
- docs: Add missing VertexAI information within `README.md` and `docs/index.md` by @alvarobartt in #308
- feat: Add functionality to push tasks to the Hugging Face Hub and download them automatically by @plaguss in #297
- feat: Add `ComplexityScorer` and `QualityScorer` tasks from Deita by @plaguss in #302
- fix: Fix logging visualization of labeller pipelines by @plaguss in #310
- feat: Add `Improving Text Embeddings with LLMs` tutorial by @alvarobartt in #313
- feat: Add `EvolComplexity` and `EvolQuality` by @davidberenstein1957 in #299
- feat: Add `validate_prompts` method to LLMs to help validate the prompts by @plaguss in #314
- fix: Typo in "clean an existing preference dataset" by @sdiazlor in #312
- feat: Add new column for SFT fine-tuning with `prepare_dataset` by @plaguss in #309
- docs: Custom Task documentation by @ignacioct in #275
- refactor: Align the `LLM` subclasses args by @alvarobartt in #315
- feat: Include rationale of the model responses on `prepare_dataset` if available by @plaguss in #317
- feat: Add embedding tutorial to docs by @ignacioct in #319
- feat: Add `MistralAILLM` by @plaguss in #293
- feat: Use `ollama` Python client within `OllamaLLM` by @sdiazlor in #307
Full Changelog: 0.4.0...0.5.0
0.4.0
What's Changed
- docs: Notus end2end example for preference and instruction generation by @ignacioct in #145
- docs: binders anchors by @ignacioct in #235
- feat: Add support for dedicated and serverless inference endpoints via inference API by @philschmid in #238
- docs: Update links to arxiv landing pages rather than PDFs by @davanstrien in #249
- feat: add ETA to progress bar and fix not showing the progress bar if irrelevant by @ignacioct in #253
- feat: Add Evol instruct task by @plaguss in #237
- docs: rename `enable_checkpoints` to `checkpoint_strategy` by @davidberenstein1957 in #257
- feat: Fix progress bar and ETA by @ignacioct in #260
- fix: resolved error with the SelfInstruct `to_argilla` method by @plaguss in #265
- chore: Add extra check in `LLMPool` to ensure all the tasks share the same parent class by @plaguss in #266
- fix: fix for Notus tutorial after bug in record unwrap by @ignacioct in #267
- feat: add customizable criteria for query generation in `SelfInstructTask` by @ignacioct in #269
- docs: add a tutorial on "clean a DPO/preference dataset with distilabel" by @sdiazlor in #270
- feat: Add new functionality to binarize preference datasets directly from distilabel by @plaguss in #264
- feat: add support for the `ollama` API by @davidberenstein1957 in #250
New Contributors
- @philschmid made their first contribution in #238
- @davanstrien made their first contribution in #249
- @sdiazlor made their first contribution in #270
Full Changelog: 0.3.0...0.4.0
0.3.0
What's Changed
- Add `VertexAILLM` & `VertexAIEndpointLLM` classes by @gabrielmbmb in #204
- Add draft with social cards by @plaguss in #197
- Relax `LLMPool` check to match parent `Task` instead by @plaguss in #210
- Align `README.md` with `docs/` and minor fixes / improvements by @alvarobartt in #214
- Add `TogetherInferenceLLM` by @alvarobartt in #215
- Add checking valid `inputs` before calling `_generate` by @gabrielmbmb in #216
- Add `TogetherInferenceLLM` tests by @alvarobartt in #217
- Add Vertex AI `LLM`s documentation by @gabrielmbmb in #222
- Documentation review by @alvarobartt in #223
- Rename `for_text_quality` to `for_overall_quality` method in `UltraFeedbackTask` by @alvarobartt in #224
- Add Anyscale endpoints by @plaguss in #213
- Feature dataset checkpoint strategy by @plaguss in #194
- Fix `rating` parsing in `RatingToArgillaMixin.to_argilla_record` by @alvarobartt in #227
- Add badges to README by @plaguss in #226
- Fix badges by @dvsrepo in #228
- Update `LICENSE` and add `LICENSE_HEADER` by @davidberenstein1957 in #221
Full Changelog: 0.2.1...0.3.0
0.2.1
What's Changed
- Fix `PrometheusTask` could not be imported by @gabrielmbmb in #190
- Fix `LLM.return_futures` by @gabrielmbmb in #192
- Remove learn section from docs until developed by @plaguss in #188
- Add markdown to fields by default by @plaguss in #189
- Fix `PrometheusTask` and `UltraCMTask` could not be chained with `TextGenerationTask` by @gabrielmbmb in #195
- Add missing `use_markdown` for every field by @plaguss in #196
- Add `to_argilla_{dataset,record}` for `CritiqueTask` by @gabrielmbmb in #198
- Update `generate_prompt` in `Task` subclasses to always return `Prompt` by @alvarobartt in #199
- Add `CritiqueTask` documentation by @alvarobartt in #200
- Fix `UltraCMTask` scoring range and align `argilla` imports by @alvarobartt in #201
Full Changelog: 0.2.0...0.2.1
0.2.0
What's Changed
- adds accelerate example by @edbeeching in #141
- Add a dry-run when calling `Pipeline.generate` by @alvarobartt in #146
- Add Notus format in `Prompt.format_as` and update `examples/*.py` by @alvarobartt in #147
- Add `ProcessLLM` class by @gabrielmbmb in #151
- Adds `CritiqueTask`, `UltraCMTask` and more by @alvarobartt in #152
- docs: add `llama.cpp` to extras by @davidberenstein1957 in #154
- Fix `_build_dataset` as `processed_labels` were ignored by @plaguss in #158
- Add `to_argilla_{dataset,record}` methods in `TextGenerationTask` by @alvarobartt in #159
- Fix `UltraFeedbackTask.to_argilla_dataset` ratings values by @alvarobartt in #160
- Align `typing` and `typing_extensions` with supported Python versions by @alvarobartt in #161
- Add `LLMPool` class by @gabrielmbmb in #156
- Add missing `CritiqueTask` and `UltraCMTask` in `__init__` and move `argilla_utils` to `utils.argilla` by @alvarobartt in #162
- Add `test` workflow by @gabrielmbmb in #163
- Update `LLM` to return `Future[List[List[LLMOutput]]]` by @gabrielmbmb in #164
- Add `PrometheusTask` by @alvarobartt in #165
- Randomise generations order by @gabrielmbmb in #167
- Add custom `to_argilla_{dataset,record}` to `SelfInstructTask` by @alvarobartt in #169
- Fix `shuffle_before_labelling` and progress bar in `Pipeline.generate` by @alvarobartt in #170
- Replace `multiprocessing` with `multiprocess` by @gabrielmbmb in #171
- Refactor and improve docs by @plaguss in #134
- Fix `SelfInstructTask.{parse_output,to_argilla_record}` methods and `_build_dataset` by @alvarobartt in #172
- Fix `results` didn't have same order as `futures` by @gabrielmbmb in #173
- Remove unnecessary plugin by @plaguss in #174
- Add `{generation,labelling}_model` column as metadata in Argilla by @alvarobartt in #175
- Fix exporting model name to Argilla with `LLMPool` by @gabrielmbmb in #177
- Update docs to include info about `ProcessLLM` and `LLMPool` by @gabrielmbmb in #176
New Contributors
- @edbeeching made their first contribution in #141
- @davidberenstein1957 made their first contribution in #154
Full Changelog: 0.1.1...0.2.0
0.1.1
What's Changed
- Template for Documentation Issue created by @ignacioct in #128
- self.thread_pool_executor can be None, protecting it for print by @ignacioct in #129
- Use `do_sample` in `transformers` example by @dvsrepo in #138
- Fix `llama-cpp` and `hf-inference-endpoints` extras in `pyproject.toml` by @plaguss in #139
- Fix `llama_cpp_python` dependency check by @plaguss in #140
New Contributors
- @ignacioct made their first contribution in #128
- @plaguss made their first contribution in #139
Full Changelog: 0.1.0...0.1.1
0.1.0
Stable Release - v0.1.0
0.1.0rc2
distilabel 0.1.0rc2