
Enhancing Human-Like Responses in Large Language Models

   | 🤗 Models   |    📊 Dataset   |    📄 Paper   |

🚀 Human-Like-Qwen2.5-7B-Instruct

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct, specifically optimized to generate more human-like and conversational responses.

The fine-tuning process employed both Low-Rank Adaptation (LoRA) and Direct Preference Optimization (DPO) to enhance natural language understanding, conversational coherence, and emotional intelligence in interactions.
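As a rough illustration of what DPO optimizes, the sketch below computes the standard per-pair DPO loss from summed log-probabilities. It is a minimal stand-alone example, not the training code used for this model: the function name `dpo_loss`, the log-probability values, and `beta=0.1` are all assumptions chosen for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed token log-probabilities."""
    # Log-ratios: how much more likely each response is under the
    # policy than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical log-probabilities for one preference pair.
untrained = dpo_loss(-40.0, -35.0, -40.0, -35.0)  # policy equals reference
improved = dpo_loss(-30.0, -45.0, -40.0, -35.0)   # policy favors the chosen answer
print(untrained, improved)
```

When the policy matches the reference, the loss equals log 2; it falls as the policy assigns relatively more probability to the preferred response.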

The process of creating this model is detailed in the research paper “Enhancing Human-Like Responses in Large Language Models”.

🛠️ Training Configuration

  • Base Model: Qwen2.5-7B-Instruct
  • Framework: Axolotl v0.4.1
  • Hardware: 2x NVIDIA A100 (80 GB) GPUs
  • Training Time: ~2 hours 15 minutes
  • Dataset: Synthetic dataset with ≈11,000 samples across 256 diverse topics
See axolotl config

axolotl version: 0.4.1

base_model: Qwen/Qwen2.5-7B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code: true

load_in_8bit: true
load_in_4bit: false
strict: false

chat_template: chatml
rl: dpo
datasets:
  - path: HumanLLMs/humanish-dpo-project
    type: chatml.prompt_pairs
    chat_template: chatml

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./humanish-qwen2.5-7b-instruct

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Humanish-DPO
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

hub_model_id: HumanLLMs/Humanish-Qwen2.5-7B-Instruct

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

save_safetensors: true
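
A few quantities follow directly from the settings above. The sketch below works them out; it assumes plain data parallelism with one process per GPU, and the step count is only an estimate because the exact train/validation split and rounding are not stated in this card.

```python
# Values copied from the config above.
micro_batch_size = 2
gradient_accumulation_steps = 8
lora_r = 8
lora_alpha = 4
val_set_size = 0.05

# Values taken from elsewhere in this card.
num_gpus = 2           # "2x NVIDIA A100"
dataset_size = 10_884  # sample count reported in the Dataset section

# Assumption: data parallelism, so each GPU processes its own micro-batches.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus

# LoRA scales the low-rank update by alpha / r.
lora_scale = lora_alpha / lora_r

# Rough estimate; the actual split and rounding may differ.
train_samples = round(dataset_size * (1 - val_set_size))
approx_steps = train_samples // effective_batch_size

print(effective_batch_size, lora_scale, approx_steps)
```

Under these assumptions the effective batch size is 32 preference pairs per optimizer step and the LoRA scale is 0.5, which gives on the order of a few hundred optimizer steps for the single epoch.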

πŸ’¬ Prompt Template

You can use the ChatML prompt template with this model:

ChatML

<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}<|im_end|>

This prompt template is available as a chat template, which means you can format messages using the tokenizer.apply_chat_template() method:

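from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HumanLLMs/Human-Like-Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"}
]
# return_dict=True returns input_ids and attention_mask, so ** unpacking works
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt")
model.generate(**gen_input)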
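
If the tokenizer is not at hand, the ChatML layout shown above can be built by hand. The helper below is a minimal sketch; `to_chatml` is a hypothetical name, and the tokenizer's own chat template remains the authoritative source, since it may add details (such as a default system prompt) that this sketch does not.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```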

🤖 Models

| Model | Download |
|---|---|
| Human-Like-Llama-3-8B-Instruct | 🤗 HuggingFace |
| Human-Like-Qwen-2.5-7B-Instruct | 🤗 HuggingFace |
| Human-Like-Mistral-Nemo-Instruct | 🤗 HuggingFace |

🎯 Benchmark Results

| Group | Model | Average | IFEval | BBH | MATH Lvl 5 | GPQA | MuSR | MMLU-PRO |
|---|---|---|---|---|---|---|---|---|
| Llama Models | Human-Like-Llama-3-8B-Instruct | 22.37 | 64.97 | 28.01 | 8.45 | 0.78 | 2.00 | 30.01 |
| | Llama-3-8B-Instruct | 23.57 | 74.08 | 28.24 | 8.68 | 1.23 | 1.60 | 29.60 |
| | Difference (Human-Like) | -1.20 | -9.11 | -0.23 | -0.23 | -0.45 | +0.40 | +0.41 |
| Qwen Models | Human-Like-Qwen-2.5-7B-Instruct | 26.66 | 72.84 | 34.48 | 0.00 | 6.49 | 8.42 | 37.76 |
| | Qwen-2.5-7B-Instruct | 26.86 | 75.85 | 34.89 | 0.00 | 5.48 | 8.45 | 36.52 |
| | Difference (Human-Like) | -0.20 | -3.01 | -0.41 | 0.00 | +1.01 | -0.03 | +1.24 |
| Mistral Models | Human-Like-Mistral-Nemo-Instruct | 22.88 | 54.51 | 32.70 | 7.62 | 5.03 | 9.39 | 28.00 |
| | Mistral-Nemo-Instruct | 23.53 | 63.80 | 29.68 | 5.89 | 5.37 | 8.48 | 27.97 |
| | Difference (Human-Like) | -0.65 | -9.29 | +3.02 | +1.73 | -0.34 | +0.91 | +0.03 |
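
The Average column appears to be the unweighted mean of the six benchmark scores, and each Difference row is the fine-tuned score minus the base score. The sketch below checks this for the Qwen rows, using only numbers from the table; the variable names are illustrative.

```python
benchmarks = ["IFEval", "BBH", "MATH Lvl 5", "GPQA", "MuSR", "MMLU-PRO"]

# Scores copied from the Qwen rows of the table above.
scores = {
    "Human-Like-Qwen-2.5-7B-Instruct": [72.84, 34.48, 0.00, 6.49, 8.42, 37.76],
    "Qwen-2.5-7B-Instruct": [75.85, 34.89, 0.00, 5.48, 8.45, 36.52],
}

# Unweighted mean over the six benchmarks.
averages = {model: sum(vals) / len(vals) for model, vals in scores.items()}

# Per-benchmark difference: fine-tuned minus base.
tuned = scores["Human-Like-Qwen-2.5-7B-Instruct"]
base = scores["Qwen-2.5-7B-Instruct"]
differences = {name: t - b for name, t, b in zip(benchmarks, tuned, base)}

for model, avg in averages.items():
    print(f"{model}: {avg:.3f}")
for name, diff in differences.items():
    print(f"{name}: {diff:+.2f}")
```

The computed means agree with the reported averages to within rounding, and the largest drop is on IFEval.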

📊 Dataset

The dataset used for fine-tuning was generated using LLaMA 3 models. The dataset includes 10,884 samples across 256 distinct topics such as technology, daily life, science, history, and arts. Each sample consists of:

  • Human-like responses: Natural, conversational answers mimicking human dialogue.
  • Formal responses: Structured and precise answers with a more formal tone.
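
For DPO training, each sample can be viewed as a preference pair. The record below is a hypothetical illustration, not an entry from the dataset: the field names `prompt`, `chosen`, and `rejected`, the example text, and the mapping of the human-like response to `chosen` are all assumptions based on common DPO conventions.

```python
# Hypothetical record; field names and text are illustrative only.
sample = {
    "prompt": "What's a good way to start learning to cook?",
    "chosen": "Honestly, just pick one dish you love and make it a few times!",
    "rejected": "To begin learning to cook, one should first acquire basic equipment.",
}

REQUIRED_FIELDS = ("prompt", "chosen", "rejected")

def is_valid_pair(record):
    """True if all fields are non-empty strings and the two responses differ."""
    for key in REQUIRED_FIELDS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            return False
    return record["chosen"] != record["rejected"]

print(is_valid_pair(sample))
```

A check like this is useful before training, because a pair with identical responses carries no preference signal.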

The dataset has been open-sourced and is available at:

More details on the dataset creation process can be found in the accompanying research paper.

📝 Citation

@misc{çalık2025enhancinghumanlikeresponseslarge,
      title={Enhancing Human-Like Responses in Large Language Models},
      author={Ethem Yağız Çalık and Talha Rüzgar Akkuş},
      year={2025},
      eprint={2501.05032},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.05032}, 
}