SentenceTransformer based on agentlans/deberta-v3-xsmall-zyda-2

This is a sentence-transformers model finetuned from agentlans/deberta-v3-xsmall-zyda-2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

It was finetuned in the same way as agentlans/deberta-v3-base-zyda-2-v2; however, its training loss is much higher, probably due to its smaller model size.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: agentlans/deberta-v3-xsmall-zyda-2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
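
The Pooling module averages the transformer's token embeddings (mean pooling) to produce the 384-dimensional sentence vector. Below is a minimal sketch of that step with plain transformers, assuming the checkpoint loads with AutoModel; mean_pool is an illustrative helper, not part of the library:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("agentlans/deberta-v3-xsmall-zyda-2-v2")
model = AutoModel.from_pretrained("agentlans/deberta-v3-xsmall-zyda-2-v2")

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings over real tokens only; padding is masked out.
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

batch = tokenizer(["An example sentence."], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    output = model(**batch)
embedding = mean_pool(output.last_hidden_state, batch["attention_mask"])
print(embedding.shape)  # torch.Size([1, 384])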

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference:

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("agentlans/deberta-v3-xsmall-zyda-2-v2")
# Run inference
sentences = [
    'The expansion of European colonies resulted in the dissemination of their cultural ideas and institutions to other regions.',
    'How long do dogs bleed during menstruation?',
    'The team added a second car for Thed Björk in 2006 , and was replaced by Richard Göransson in 2009 .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
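
Beyond pairwise scoring, the same embeddings support semantic search: encode a query and a corpus, then rank the corpus by cosine similarity. A small sketch (the query and corpus strings here are illustrative, not from the training data):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("agentlans/deberta-v3-xsmall-zyda-2-v2")

query = ["Which vaccines are safe for children?"]
corpus = [
    "Two rotavirus vaccines are safe and effective in children.",
    "The team added a second car for the 2006 season.",
    "European colonial expansion spread cultural institutions abroad.",
]

# Rank the corpus by cosine similarity to the query
scores = model.similarity(model.encode(query), model.encode(corpus))  # shape [1, 3]
best = scores.argmax().item()
print(corpus[best], float(scores[0, best]))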

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,079,040 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
             sentence_0           sentence_1           label
    type     string               string               float
    min      7 tokens             7 tokens             0.0
    mean     22.43 tokens         20.92 tokens         0.33
    max      104 tokens           77 tokens            1.0
  • Samples:
    1. sentence_0: Can attaching a CAR with cab companies such as OLA, Taxi for Sure, and Meru Cabs result in financial gain? What are the final returns after factoring in all practical earnings and expenses?
       sentence_1: A problem is regarded as inherently difficult if its solution requires significant resources, whatever the algorithm used.
       label: 0.0
    2. sentence_0: She was loaned the money with the specific aim of providing for the child's needs.
       sentence_1: The Army's training and doctrine command spokesperson, Maj. Mike Kenfield, stated that the program had been recognized for its role in reducing non-lethal operations and that there were plans to expand the team's reach beyond Iraq and Afghanistan.
       label: 0.0
    3. sentence_0: Two rotavirus vaccines against Rotavirus A infection are safe and effective in children : Rotarix by GlaxoSmithKline and RotaTeq by Merck .
       sentence_1: contact lists were wiped after the makers of the game enjoyed by .
       label: 0.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
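
For intuition, CoSENTLoss scores each (sentence_0, sentence_1) pair with pairwise cosine similarity, then penalizes every pair of batch examples whose similarity ordering contradicts the gold labels: loss = log(1 + sum exp(scale * (s_j - s_i))) over all pairs with labels[i] > labels[j]. A minimal sketch of that objective (not the library's internal implementation):

import torch

def cosent_loss(cos_scores, labels, scale=20.0):
    # cos_scores[k]: pairwise cosine similarity of the k-th example pair
    # labels[k]: gold similarity in [0, 1]
    s = cos_scores * scale
    diffs = s[None, :] - s[:, None]                   # diffs[i, j] = s[j] - s[i]
    diffs = diffs[labels[:, None] > labels[None, :]]  # pairs where labels[i] > labels[j]
    # Prepend 0 so logsumexp also covers the "+1" term in log(1 + sum(exp(...)))
    diffs = torch.cat([torch.zeros(1, device=s.device), diffs])
    return torch.logsumexp(diffs, dim=0)

When every higher-labeled pair also receives the higher cosine score, all exponent terms are strongly negative and the loss approaches log(1) = 0.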
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • multi_dataset_batch_sampler: round_robin
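
Put together, a comparable run could be launched with the SentenceTransformerTrainer API, as in the sketch below. The two placeholder rows stand in for the actual 1,079,040-pair training set, which is not published with this card:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("agentlans/deberta-v3-xsmall-zyda-2")

# Placeholder rows using the card's column names: sentence_0, sentence_1, label
train_dataset = Dataset.from_dict({
    "sentence_0": ["A cat sits on the mat.", "The weather is sunny."],
    "sentence_1": ["A cat is resting on a rug.", "Stocks fell sharply today."],
    "label": [0.9, 0.0],
})

args = SentenceTransformerTrainingArguments(
    output_dir="deberta-v3-xsmall-zyda-2-v2",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=5e-5,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=CoSENTLoss(model, scale=20.0),
)
trainer.train()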

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.0074 500 2.6583
0.0148 1000 1.5993
0.0222 1500 1.0375
0.0297 2000 0.8232
0.0371 2500 0.6996
0.0445 3000 0.6607
0.0519 3500 0.6087
0.0593 4000 0.5447
0.0667 4500 0.5691
0.0741 5000 0.5576
0.0816 5500 0.5405
0.0890 6000 0.4901
0.0964 6500 0.5432
0.1038 7000 0.4969
0.1112 7500 0.5058
0.1186 8000 0.4935
0.1260 8500 0.5072
0.1335 9000 0.4525
0.1409 9500 0.5121
0.1483 10000 0.5217
0.1557 10500 0.5012
0.1631 11000 0.4475
0.1705 11500 0.4788
0.1779 12000 0.4687
0.1853 12500 0.4651
0.1928 13000 0.4056
0.2002 13500 0.485
0.2076 14000 0.4738
0.2150 14500 0.4194
0.2224 15000 0.4522
0.2298 15500 0.5182
0.2372 16000 0.4746
0.2447 16500 0.4762
0.2521 17000 0.4804
0.2595 17500 0.4041
0.2669 18000 0.4
0.2743 18500 0.4459
0.2817 19000 0.4258
0.2891 19500 0.4218
0.2966 20000 0.4951
0.3040 20500 0.4687
0.3114 21000 0.446
0.3188 21500 0.5007
0.3262 22000 0.4506
0.3336 22500 0.4916
0.3410 23000 0.403
0.3485 23500 0.4527
0.3559 24000 0.4052
0.3633 24500 0.4387
0.3707 25000 0.4238
0.3781 25500 0.4208
0.3855 26000 0.4363
0.3929 26500 0.429
0.4004 27000 0.4837
0.4078 27500 0.4042
0.4152 28000 0.465
0.4226 28500 0.4259
0.4300 29000 0.4342
0.4374 29500 0.4521
0.4448 30000 0.397
0.4523 30500 0.4213
0.4597 31000 0.4309
0.4671 31500 0.473
0.4745 32000 0.4081
0.4819 32500 0.3937
0.4893 33000 0.4402
0.4967 33500 0.4685
0.5042 34000 0.4309
0.5116 34500 0.4349
0.5190 35000 0.4357
0.5264 35500 0.5066
0.5338 36000 0.4424
0.5412 36500 0.4532
0.5486 37000 0.4576
0.5560 37500 0.4634
0.5635 38000 0.4742
0.5709 38500 0.4565
0.5783 39000 0.4613
0.5857 39500 0.385
0.5931 40000 0.4613
0.6005 40500 0.4129
0.6079 41000 0.4066
0.6154 41500 0.4372
0.6228 42000 0.4426
0.6302 42500 0.4561
0.6376 43000 0.4557
0.6450 43500 0.4163
0.6524 44000 0.3948
0.6598 44500 0.4461
0.6673 45000 0.4717
0.6747 45500 0.3877
0.6821 46000 0.4421
0.6895 46500 0.4977
0.6969 47000 0.433
0.7043 47500 0.4292
0.7117 48000 0.4749
0.7192 48500 0.4418
0.7266 49000 0.4091
0.7340 49500 0.412
0.7414 50000 0.465
0.7488 50500 0.4649
0.7562 51000 0.4311
0.7636 51500 0.4238
0.7711 52000 0.4228
0.7785 52500 0.4491
0.7859 53000 0.4434
0.7933 53500 0.4364
0.8007 54000 0.435
0.8081 54500 0.4196
0.8155 55000 0.4866
0.8230 55500 0.4684
0.8304 56000 0.4264
0.8378 56500 0.4061
0.8452 57000 0.4813
0.8526 57500 0.4596
0.8600 58000 0.4602
0.8674 58500 0.4342
0.8749 59000 0.4358
0.8823 59500 0.4693
0.8897 60000 0.4794
0.8971 60500 0.4515
0.9045 61000 0.4574
0.9119 61500 0.388
0.9193 62000 0.408
0.9267 62500 0.4204
0.9342 63000 0.4001
0.9416 63500 0.4995
0.9490 64000 0.477
0.9564 64500 0.4395
0.9638 65000 0.4498
0.9712 65500 0.4893
0.9786 66000 0.4205
0.9861 66500 0.4511
0.9935 67000 0.4393
1.0009 67500 0.4694
1.0083 68000 0.4305
1.0157 68500 0.4272
1.0231 69000 0.3722
1.0305 69500 0.4147
1.0380 70000 0.4019
1.0454 70500 0.4306
1.0528 71000 0.4514
1.0602 71500 0.4377
1.0676 72000 0.4222
1.0750 72500 0.4682
1.0824 73000 0.4684
1.0899 73500 0.4234
1.0973 74000 0.4583
1.1047 74500 0.4659
1.1121 75000 0.4413
1.1195 75500 0.4591
1.1269 76000 0.4363
1.1343 76500 0.4202
1.1418 77000 0.4485
1.1492 77500 0.4817
1.1566 78000 0.4796
1.1640 78500 0.4041
1.1714 79000 0.3975
1.1788 79500 0.4199
1.1862 80000 0.4582
1.1937 80500 0.4115
1.2011 81000 0.4636
1.2085 81500 0.4611
1.2159 82000 0.4025
1.2233 82500 0.4725
1.2307 83000 0.4905
1.2381 83500 0.4346
1.2456 84000 0.4832
1.2530 84500 0.465
1.2604 85000 0.3884
1.2678 85500 0.4228
1.2752 86000 0.4086
1.2826 86500 0.4548
1.2900 87000 0.4022
1.2974 87500 0.5155
1.3049 88000 0.4158
1.3123 88500 0.4638
1.3197 89000 0.4645
1.3271 89500 0.4357
1.3345 90000 0.4144
1.3419 90500 0.412
1.3493 91000 0.3951
1.3568 91500 0.4384
1.3642 92000 0.4292
1.3716 92500 0.391
1.3790 93000 0.4262
1.3864 93500 0.4783
1.3938 94000 0.4474
1.4012 94500 0.4367
1.4087 95000 0.4055
1.4161 95500 0.4471
1.4235 96000 0.4472
1.4309 96500 0.4555
1.4383 97000 0.4854
1.4457 97500 0.389
1.4531 98000 0.4308
1.4606 98500 0.4565
1.4680 99000 0.4344
1.4754 99500 0.4332
1.4828 100000 0.4179
1.4902 100500 0.4546
1.4976 101000 0.4667
1.5050 101500 0.4418
1.5125 102000 0.4462
1.5199 102500 0.4841
1.5273 103000 0.4768
1.5347 103500 0.4072
1.5421 104000 0.453
1.5495 104500 0.4863
1.5569 105000 0.5193
1.5644 105500 0.4476
1.5718 106000 0.4141
1.5792 106500 0.4454
1.5866 107000 0.4072
1.5940 107500 0.4339
1.6014 108000 0.4519
1.6088 108500 0.4432
1.6163 109000 0.4408
1.6237 109500 0.4438
1.6311 110000 0.4188
1.6385 110500 0.4621
1.6459 111000 0.3997
1.6533 111500 0.3953
1.6607 112000 0.4459
1.6681 112500 0.4905
1.6756 113000 0.4067
1.6830 113500 0.4705
1.6904 114000 0.4883
1.6978 114500 0.4553
1.7052 115000 0.4644
1.7126 115500 0.4733
1.7200 116000 0.4591
1.7275 116500 0.4112
1.7349 117000 0.4354
1.7423 117500 0.4771
1.7497 118000 0.4418
1.7571 118500 0.4927
1.7645 119000 0.4273
1.7719 119500 0.4424
1.7794 120000 0.4979
1.7868 120500 0.4479
1.7942 121000 0.4344
1.8016 121500 0.4285
1.8090 122000 0.444
1.8164 122500 0.4389
1.8238 123000 0.4661
1.8313 123500 0.4203
1.8387 124000 0.4452
1.8461 124500 0.4731
1.8535 125000 0.4654
1.8609 125500 0.4802
1.8683 126000 0.445
1.8757 126500 0.4279
1.8832 127000 0.4832
1.8906 127500 0.4754
1.8980 128000 0.4675
1.9054 128500 0.4248
1.9128 129000 0.4189
1.9202 129500 0.4098
1.9276 130000 0.4308
1.9351 130500 0.4118
1.9425 131000 0.4508
1.9499 131500 0.4327
1.9573 132000 0.4557
1.9647 132500 0.4688
1.9721 133000 0.4743
1.9795 133500 0.4362
1.9870 134000 0.4782
1.9944 134500 0.4441
2.0018 135000 0.4344
2.0092 135500 0.4414
2.0166 136000 0.4432
2.0240 136500 0.3841
2.0314 137000 0.4706
2.0388 137500 0.455
2.0463 138000 0.4336
2.0537 138500 0.4215
2.0611 139000 0.4369
2.0685 139500 0.4539
2.0759 140000 0.4395
2.0833 140500 0.4303
2.0907 141000 0.4272
2.0982 141500 0.4857
2.1056 142000 0.4832
2.1130 142500 0.4579
2.1204 143000 0.4695
2.1278 143500 0.4174
2.1352 144000 0.4167
2.1426 144500 0.4766
2.1501 145000 0.4676
2.1575 145500 0.4878
2.1649 146000 0.4259
2.1723 146500 0.4185
2.1797 147000 0.4656
2.1871 147500 0.4278
2.1945 148000 0.4322
2.2020 148500 0.4321
2.2094 149000 0.439
2.2168 149500 0.4254
2.2242 150000 0.5099
2.2316 150500 0.4311
2.2390 151000 0.4404
2.2464 151500 0.4868
2.2539 152000 0.4572
2.2613 152500 0.3887
2.2687 153000 0.4222
2.2761 153500 0.4465
2.2835 154000 0.4298
2.2909 154500 0.4386
2.2983 155000 0.5101
2.3058 155500 0.4677
2.3132 156000 0.4299
2.3206 156500 0.4585
2.3280 157000 0.4335
2.3354 157500 0.4298
2.3428 158000 0.4167
2.3502 158500 0.4132
2.3577 159000 0.4135
2.3651 159500 0.4453
2.3725 160000 0.4093
2.3799 160500 0.4249
2.3873 161000 0.4968
2.3947 161500 0.4763
2.4021 162000 0.4496
2.4095 162500 0.452
2.4170 163000 0.4688
2.4244 163500 0.3847
2.4318 164000 0.4752
2.4392 164500 0.4463
2.4466 165000 0.3764
2.4540 165500 0.4515
2.4614 166000 0.4342
2.4689 166500 0.4163
2.4763 167000 0.4306
2.4837 167500 0.4131
2.4911 168000 0.4657
2.4985 168500 0.446
2.5059 169000 0.4342
2.5133 169500 0.4293
2.5208 170000 0.4388
2.5282 170500 0.4935
2.5356 171000 0.4124
2.5430 171500 0.4519
2.5504 172000 0.4886
2.5578 172500 0.4552
2.5652 173000 0.4628
2.5727 173500 0.4277
2.5801 174000 0.4048
2.5875 174500 0.434
2.5949 175000 0.43
2.6023 175500 0.4637
2.6097 176000 0.4151
2.6171 176500 0.4334
2.6246 177000 0.4592
2.6320 177500 0.4548
2.6394 178000 0.4622
2.6468 178500 0.3954
2.6542 179000 0.417
2.6616 179500 0.4429
2.6690 180000 0.4639
2.6765 180500 0.3764
2.6839 181000 0.4809
2.6913 181500 0.4518
2.6987 182000 0.4526
2.7061 182500 0.464
2.7135 183000 0.4487
2.7209 183500 0.4213
2.7284 184000 0.3954
2.7358 184500 0.4081
2.7432 185000 0.4707
2.7506 185500 0.4218
2.7580 186000 0.4552
2.7654 186500 0.4371
2.7728 187000 0.4286
2.7802 187500 0.4626
2.7877 188000 0.4075
2.7951 188500 0.4263
2.8025 189000 0.4215
2.8099 189500 0.428
2.8173 190000 0.4919
2.8247 190500 0.459
2.8321 191000 0.4122
2.8396 191500 0.4404
2.8470 192000 0.4358
2.8544 192500 0.472
2.8618 193000 0.4541
2.8692 193500 0.4378
2.8766 194000 0.4281
2.8840 194500 0.4745
2.8915 195000 0.4642
2.8989 195500 0.4637
2.9063 196000 0.4311
2.9137 196500 0.3999
2.9211 197000 0.4125
2.9285 197500 0.426
2.9359 198000 0.4357
2.9434 198500 0.4743
2.9508 199000 0.4519
2.9582 199500 0.4294
2.9656 200000 0.4603
2.9730 200500 0.4824
2.9804 201000 0.4003
2.9878 201500 0.4161
2.9953 202000 0.4853

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.0
  • Transformers: 4.43.3
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.33.0
  • Datasets: 3.2.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}