
support two more calib datasets and fix embedding layer bug #653


Merged
13 commits merged into main from data_200k on Jul 10, 2025

Conversation

wenhuach21
Contributor

No description provided.

@wenhuach21 changed the title from "support ultrachat_200k dataset" to "support ultrachat_200k dataset and fix embedding layer bug" on Jul 9, 2025

Copilot AI left a comment


Pull Request Overview

This PR introduces support for the ultrachat_200k dataset, extends dataset registration to multiple aliases, and refines the embedding quantization logic.

  • Extend the register_dataset decorator to accept multiple dataset names and integrate ultrachat_200k (see the registration sketch below)
  • Import and standardize load_dataset usage across the existing dataset functions
  • Modify quantize_embedding_layer to return whether any layers were actually quantized
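
As context for the first bullet, here is a minimal sketch of how a multi-alias register_dataset decorator and an ultrachat_200k loader could look. The registry name, loader signature, and message-field handling below are assumptions for illustration, not the actual code in auto_round/calib_dataset.py.

    from datasets import load_dataset

    CALIB_DATASETS = {}  # illustrative stand-in for the real registry

    def register_dataset(*names):
        """Register one loader callable under several dataset aliases."""
        def decorator(fn):
            for name in names:
                CALIB_DATASETS[name] = fn
            return fn
        return decorator

    @register_dataset("ultrachat_200k", "HuggingFaceH4/ultrachat_200k")
    def get_ultrachat_200k(tokenizer, seqlen=2048, split="train_sft", nsamples=128):
        """Pull calibration text from ultrachat_200k (field names are assumptions)."""
        ds = load_dataset("HuggingFaceH4/ultrachat_200k", split=split)
        samples = []
        for example in ds:
            # Each record holds a list of chat messages; join their contents into one string.
            text = " ".join(msg["content"] for msg in example.get("messages", []))
            enc = tokenizer(text, truncation=True, max_length=seqlen, return_tensors="pt")
            samples.append(enc["input_ids"])
            if len(samples) >= nsamples:
                break
        return samples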

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File descriptions:
  • auto_round/utils.py: sort GGUF_CONFIG keys for deterministic ordering in _gguf_format
  • auto_round/calib_dataset.py: add the load_dataset import, multi-name registration, ultrachat_200k support, and fixes for hardcoded datasets
  • auto_round/autoround.py: introduce a to_quantize flag and change the return value of quantize_embedding_layer
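
The utils.py change is purely about determinism. A small sketch of the idea, where the contents of GGUF_CONFIG and the helper function are illustrative stand-ins rather than the real _gguf_format logic:

    # Illustrative stand-in for the real GGUF_CONFIG mapping in auto_round/utils.py.
    GGUF_CONFIG = {
        "gguf:q4_k_m": {"bits": 4},
        "gguf:q8_0": {"bits": 8},
        "gguf:q2_k_s": {"bits": 2},
    }

    def candidate_gguf_formats(bits):
        # sorted() pins the iteration order, so repeated runs return candidates
        # in the same order regardless of how the dict was populated.
        return [name for name in sorted(GGUF_CONFIG) if GGUF_CONFIG[name]["bits"] == bits]

    print(candidate_gguf_formats(4))  # ['gguf:q4_k_m']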
Comments suppressed due to low confidence (1)

auto_round/autoround.py:783

  • Changing the return value to to_quantize alters the previous always-True behavior. Downstream callers expecting True on completion may now misinterpret False as failure. Either update callers or restore the original return semantics and expose to_quantize via a separate API.
        return to_quantize
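
To make the concern concrete, here is a hedged sketch of the changed contract; only the to_quantize flag semantics come from the PR, while the quantization body and the example model are illustrative.

    import torch
    import torch.nn as nn

    def quantize_embedding_layer(model):
        """Return True only if at least one embedding layer was actually quantized."""
        to_quantize = False
        for module in model.modules():
            if isinstance(module, nn.Embedding):
                # Simplified stand-in for the real quantization: int8 round-trip in place.
                with torch.no_grad():
                    scale = module.weight.abs().max() / 127.0
                    module.weight.copy_((module.weight / scale).round().clamp(-128, 127) * scale)
                to_quantize = True
        return to_quantize

    # Caller side: False now means "nothing needed quantizing", not "quantization failed".
    model = nn.Sequential(nn.Linear(8, 8))  # no embedding layers, so the flag stays False
    assert quantize_embedding_layer(model) is False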

@wenhuach21 changed the title from "support ultrachat_200k dataset and fix embedding layer bug" to "support two more calib datasets and fix embedding layer bug" on Jul 10, 2025
@wenhuach21
Contributor Author

The lambada_openai evaluation appears to have some issues, possibly caused by an update to the datasets library, and the failure could not be reproduced locally. Merging for now; the problem will be addressed in a future fix.

wenhuach21 merged commit 7d72403 into main on Jul 10, 2025
6 of 7 checks passed
wenhuach21 deleted the data_200k branch on July 10, 2025 at 07:39