mt5-small-finetuned-Drishtants-summaries

This model is a fine-tuned version of google/mt5-small. The training dataset is not specified. It achieves the following results on the evaluation set (these match the final epoch, 40, in the training results table):

Loss: 1.8276
Rouge1: 0.3953
Rouge2: 0.2206
Rougel: 0.3789
Rougelsum: 0.3822
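
The card does not show how to run the model, so here is a minimal loading sketch. It assumes the checkpoint is published on the Hugging Face Hub as `ak2603/mt5-small-finetuned-Drishtants-summaries` and that `transformers`, `torch`, and `sentencepiece` are installed. The helper name `summarize`, the 512-token input cap, and the `max_new_tokens` default are illustrative choices, not settings documented by this card.

```python
# Assumed Hub repo id for this checkpoint.
MODEL_ID = "ak2603/mt5-small-finetuned-Drishtants-summaries"


def summarize(text: str, max_new_tokens: int = 64) -> str:
    """Generate a summary for `text` with the fine-tuned checkpoint.

    Hypothetical helper: the generation settings are illustrative defaults.
    """
    # Imported lazily so that defining the function does not require
    # transformers to be installed or the weights to be downloaded.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Truncate long inputs; 512 tokens is an assumed limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize("...")` downloads the weights on first use. For repeated calls it would be better to load the tokenizer and model once and reuse them.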

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5.6e-05
train_batch_size: 10
eval_batch_size: 10
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 40
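
To make the `linear` scheduler setting concrete, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (no warmup is listed above) and 520 total steps (the final step count in the training results table). The function name `linear_lr` is illustrative, and the formula follows the usual linear-decay-to-zero rule rather than anything stated in this card.

```python
def linear_lr(
    step: int,
    base_lr: float = 5.6e-05,   # learning_rate listed above
    total_steps: int = 520,     # final step in the training results table
    warmup_steps: int = 0,      # assumption: no warmup is listed
) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, midpoint, and end of training.
for s in (0, 260, 520):
    print(s, linear_lr(s))
```

Under these assumptions the rate starts at 5.6e-05, is halved by step 260 (epoch 20), and reaches zero at step 520.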

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum
24.1138	1.0	13	15.3479	0.0044	0.0	0.0043	0.0044
19.7323	2.0	26	13.7879	0.0044	0.0	0.0043	0.0044
18.329	3.0	39	11.7699	0.0042	0.0	0.0039	0.0042
15.8092	4.0	52	12.9758	0.0067	0.0	0.0064	0.0067
13.8072	5.0	65	8.1803	0.0048	0.0	0.0048	0.0048
11.9323	6.0	78	6.4151	0.0048	0.0	0.0048	0.0048
10.8486	7.0	91	5.3122	0.0067	0.0	0.0067	0.0067
10.2067	8.0	104	5.1497	0.0098	0.0	0.0097	0.0096
9.4972	9.0	117	4.9039	0.0136	0.0	0.0135	0.0132
8.4609	10.0	130	3.9617	0.0272	0.0013	0.0273	0.0269
7.2721	11.0	143	3.4252	0.0526	0.0093	0.0522	0.0492
5.943	12.0	156	3.1756	0.0746	0.0170	0.0640	0.0658
5.5122	13.0	169	2.9797	0.0649	0.0121	0.0610	0.0573
5.1628	14.0	182	2.8133	0.0818	0.0215	0.0738	0.0733
4.9023	15.0	195	2.6725	0.0798	0.0262	0.0767	0.0765
4.4493	16.0	208	2.5408	0.0924	0.0348	0.0881	0.0891
4.3145	17.0	221	2.4332	0.0914	0.0361	0.0796	0.0800
3.978	18.0	234	2.3434	0.0952	0.0422	0.0835	0.0843
3.9377	19.0	247	2.2749	0.1289	0.0617	0.1138	0.1137
3.6415	20.0	260	2.2123	0.1701	0.0698	0.1471	0.1451
3.4801	21.0	273	2.1490	0.1682	0.0758	0.1497	0.1480
3.5114	22.0	286	2.0997	0.1885	0.0858	0.1658	0.1662
3.3784	23.0	299	2.0567	0.1971	0.0931	0.1730	0.1729
3.2501	24.0	312	2.0291	0.1969	0.0952	0.1752	0.1753
3.208	25.0	325	2.0057	0.1959	0.0883	0.1746	0.1753
3.0992	26.0	338	1.9769	0.1984	0.0961	0.1759	0.1762
2.9069	27.0	351	1.9474	0.1938	0.0975	0.1734	0.1734
3.0772	28.0	364	1.9259	0.1897	0.0978	0.1714	0.1710
2.8778	29.0	377	1.9098	0.1766	0.0934	0.1584	0.1582
2.8723	30.0	390	1.8937	0.1752	0.0860	0.1551	0.1551
2.8102	31.0	403	1.8786	0.1808	0.0889	0.1610	0.1603
2.8453	32.0	416	1.8660	0.1971	0.0919	0.1745	0.1752
2.925	33.0	429	1.8544	0.2724	0.1441	0.2562	0.2564
2.8222	34.0	442	1.8468	0.3749	0.2099	0.3583	0.3592
2.7711	35.0	455	1.8414	0.3950	0.2216	0.3742	0.3785
2.8176	36.0	468	1.8367	0.3953	0.2206	0.3789	0.3822
2.7044	37.0	481	1.8321	0.3947	0.2201	0.3781	0.3817
2.7696	38.0	494	1.8295	0.3953	0.2206	0.3789	0.3822
2.6015	39.0	507	1.8281	0.3953	0.2206	0.3789	0.3822
2.6849	40.0	520	1.8276	0.3953	0.2206	0.3789	0.3822
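
The Step column also gives a rough bound on the size of the training set, which the card does not otherwise state. The arithmetic below assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept. None of these is confirmed by the card, so the result is an estimate only.

```python
import math

steps_per_epoch = 13    # the table shows step 13 at epoch 1.0
train_batch_size = 10   # from the training hyperparameters
num_epochs = 40

# Total optimizer steps; matches the final Step value in the table.
total_steps = steps_per_epoch * num_epochs

# With ceil(n / batch_size) == steps_per_epoch, n lies in this inclusive range.
min_examples = (steps_per_epoch - 1) * train_batch_size + 1
max_examples = steps_per_epoch * train_batch_size

# Sanity check: every size in the range yields exactly 13 steps per epoch.
assert all(
    math.ceil(n / train_batch_size) == steps_per_epoch
    for n in range(min_examples, max_examples + 1)
)

print(total_steps, min_examples, max_examples)
```

Under these assumptions the training split holds between 121 and 130 examples. A set that small may help explain why the ROUGE scores stay low for many epochs and then rise sharply late in training.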

Framework versions

Transformers 4.47.1
Pytorch 2.5.1+cu121
Datasets 3.2.0
Tokenizers 0.21.0
