arxiv:2405.07863
Wei Xiong
weqweasdas
AI & ML interests
Machine learning, RLHF
Recent Activity
Updated a dataset about 9 hours ago: selfcorrexp/type1_and_type2_separate_pr
Updated a dataset about 9 hours ago: selfcorrexp/type1_and_halftype2_halftype3_and_halftype4_separate_pr
Updated a dataset about 9 hours ago: selfcorrexp/llama3_non_delete_rr40k_3ep_dpo_gen_augmath_1_type4
Organizations
models (23)
weqweasdas/zephyr-7b-dpo-full • Text Generation • 14
weqweasdas/zephyr-7b-gemma-dpo
weqweasdas/zephyr-7b-sft-full
weqweasdas/zephyr-7b-dpo-qlora
weqweasdas/gpt2-cpt-dutch • Text Generation • 70
weqweasdas/zephyr-7b-gemma-sft
weqweasdas/raft_baseline_zephyr_packing_model6_1_4_e6_weight085 • Text Generation • 8
weqweasdas/raft_baseline_zephyr_packing_model6_1_4_e6 • Text Generation • 8
weqweasdas/raft_baseline_zephyr_packing_model6 • Text Generation • 11
weqweasdas/raft_baseline_openchat_llama13b_model1 • Text Generation • 11
datasets (156)
weqweasdas/llama3_openmath_em_ep1_tmp07_with_lesscorr_orm_rewards_vllmexp • 5k • 6
weqweasdas/llama3_openmath_em_ep1_tmp10_with_lesscorr_orm_rewards_vllmexp • 5k • 6
weqweasdas/llama3_sft_w2r125k_r2r60k_r60k_ep3_tmp10_vllmexp • 5k • 6
weqweasdas/llama3_sft_balanced_rr60k_train_on_corr_ep3_full_testtmp07_vllmexp • 15k • 6
weqweasdas/llama3_sft_balanced_rr60k_train_on_corr_ep3_full_testtmp10_vllmexp • 15k • 5
weqweasdas/Hanning_Llama3-sft-less-corr-rr60k-3eptmp07_vllmexp • 5k • 5
weqweasdas/Hanning_Llama3-sft-less-corr-rr60k-3eptmp10_vllmexp • 5k • 7
weqweasdas/llama3_sft_balanced_rr60k_train_on_corr_ep3tmp07_vllmexp • 1k • 8
weqweasdas/llama3_sft_balanced_rr60k_train_on_corr_ep3tmp10_vllmexp • 1k • 9
weqweasdas/llama3_it_gen_tmp10_gold_tmpexp_prompt_tmp07_gen • 10k • 7