Today, SDG has logic to validate a taxonomy, determine changed leaf nodes, download knowledge documents, convert knowledge documents, and format them into a dataset that is then used as input to our data generation pipelines.
Instead, we'd like SDG to focus solely on data generation and data mixing, so that it is usable with any input dataset rather than being tied to taxonomies or the broader InstructLab end-to-end workflows.
See the dev doc at https://github.com/instructlab/dev-docs/blob/main/docs/sdg/sdg-refactor.md for broader context. This issue tracks only moving the preprocessing sections described in that dev doc into the core repository. The dev doc is a living document, and we may need to adjust it as this work progresses.