<html><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<style>
.linenums {
list-style-type: none;
}
.formatted-line-numbers {
display: none;
}
.action-code-block {
display: none;
}
table {
border-collapse: collapse;
width: 100%;
}
table, th, td {
border: 1px solid black;
padding: 8px;
text-align: left;
}
</style>
</head><body><h1>Cheat Sheet: Generative AI Engineering and Fine-Tuning Transformers</h1><table>
<colgroup>
<col style="width:9%;">
<col style="width:57%;">
<col style="width:34%;">
</colgroup>
<thead>
<tr class="header">
<th>Package/Method</th>
<th>Description</th>
<th>Code example</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Positional encoding</td>
<td>Pivotal in transformers and sequence-to-sequence models: it injects information about the position of each element in a sequence, which the attention mechanism alone does not capture.
</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li></ol><ol class="linenums"><li class="L0"><span class="kwd">class</span><span class="pln"> </span><span class="typ">PositionalEncoding</span><span class="pun">(</span><span class="pln">nn</span><span class="pun">.</span><span class="typ">Module</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="str">"""</span></li><li class="L2"><span class="str"> https://pytorch.org/tutorials/beginner/transformer_tutorial.html</span></li><li class="L3"><span class="str"> """</span></li><li class="L4"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __init__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> d_model</span><span class="pun">,</span><span class="pln"> vocab_size</span><span class="pun">=</span><span class="lit">5000</span><span class="pun">,</span><span class="pln"> dropout</span><span class="pun">=</span><span class="lit">0.1</span><span class="pun">):</span></li><li class="L5"><span class="pln"> </span><span class="kwd">super</span><span class="pun">().</span><span class="pln">__init__</span><span class="pun">()</span></li><li class="L6"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">dropout </span><span class="pun">=</span><span class="pln"> nn</span><span class="pun">.</span><span class="typ">Dropout</span><span class="pun">(</span><span class="pln">p</span><span class="pun">=</span><span class="pln">dropout</span><span class="pun">)</span></li><li class="L7"><span class="pln"> pe </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">zeros</span><span class="pun">(</span><span class="pln">vocab_size</span><span class="pun">,</span><span class="pln"> d_model</span><span class="pun">)</span></li><li class="L8"><span class="pln"> position </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">arange</span><span class="pun">(</span><span class="lit">0</span><span class="pun">,</span><span class="pln"> vocab_size</span><span class="pun">,</span><span class="pln"> dtype</span><span class="pun">=</span><span class="pln">torch</span><span class="pun">.</span><span class="kwd">float</span><span class="pun">).</span><span class="pln">unsqueeze</span><span class="pun">(</span><span class="lit">1</span><span class="pun">)</span></li><li class="L9"><span class="pln"> div_term </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">exp</span><span class="pun">(</span></li><li class="L0"><span class="pln"> torch</span><span class="pun">.</span><span class="pln">arange</span><span class="pun">(</span><span class="lit">0</span><span class="pun">,</span><span class="pln"> d_model</span><span class="pun">,</span><span class="pln"> </span><span class="lit">2</span><span class="pun">).</span><span class="kwd">float</span><span class="pun">()</span></li><li class="L1"><span class="pln"> </span><span class="pun">*</span><span class="pln"> </span><span class="pun">(-</span><span class="pln">math</span><span class="pun">.</span><span class="pln">log</span><span class="pun">(</span><span 
class="lit">10000.0</span><span class="pun">)</span><span class="pln"> </span><span class="pun">/</span><span class="pln"> d_model</span><span class="pun">)</span></li><li class="L2"><span class="pln"> </span><span class="pun">)</span></li><li class="L3"><span class="pln"> pe</span><span class="pun">[:,</span><span class="pln"> </span><span class="lit">0</span><span class="pun">::</span><span class="lit">2</span><span class="pun">]</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">sin</span><span class="pun">(</span><span class="pln">position </span><span class="pun">*</span><span class="pln"> div_term</span><span class="pun">)</span></li><li class="L4"><span class="pln"> pe</span><span class="pun">[:,</span><span class="pln"> </span><span class="lit">1</span><span class="pun">::</span><span class="lit">2</span><span class="pun">]</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">cos</span><span class="pun">(</span><span class="pln">position </span><span class="pun">*</span><span class="pln"> div_term</span><span class="pun">)</span></li><li class="L5"><span class="pln"> pe </span><span class="pun">=</span><span class="pln"> pe</span><span class="pun">.</span><span class="pln">unsqueeze</span><span class="pun">(</span><span class="lit">0</span><span class="pun">)</span></li><li class="L6"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">register_buffer</span><span class="pun">(</span><span class="str">"pe"</span><span class="pun">,</span><span class="pln"> pe</span><span class="pun">)</span></li><li class="L7"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> forward</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> x</span><span class="pun">):</span></li><li class="L8"><span class="pln"> x </span><span class="pun">=</span><span class="pln"> x </span><span class="pun">+</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">pe</span><span class="pun">[:,</span><span class="pln"> </span><span class="pun">:</span><span class="pln"> x</span><span class="pun">.</span><span class="pln">size</span><span class="pun">(</span><span class="lit">1</span><span class="pun">),</span><span class="pln"> </span><span class="pun">:]</span></li><li class="L9"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">dropout</span><span class="pun">(</span><span class="pln">x</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-0">Copied!</span></button></pre></td>
</tr>
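<tr class="even">
<td>Positional encoding: usage sketch</td>
<td>A minimal usage sketch, not taken from the lab: it assumes the PositionalEncoding class defined above plus a toy nn.Embedding, and shows how positional information is added to a batch of embedded token IDs. The sizes (d_model of 100, a vocabulary of 20 tokens) are illustrative only.</td>
<td><pre class="prettyprint">
import math
import torch
import torch.nn as nn

# Illustrative sizes; the class above also needs `math` and `torch` in scope.
d_model = 100
toy_vocab_size = 20

embedding = nn.Embedding(toy_vocab_size, d_model)
pos_encoding = PositionalEncoding(d_model=d_model)  # class from the row above

# A batch of 2 sequences, each 6 tokens long.
token_ids = torch.randint(0, toy_vocab_size, (2, 6))
x = embedding(token_ids)   # shape: (batch, seq_len, d_model)
x = pos_encoding(x)        # adds the sin/cos positional signal, then dropout
print(x.shape)             # torch.Size([2, 6, 100])
</pre></td>
</tr>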
<tr class="even">
<td>Importing the IMDB data set</td>
<td>The IMDB data set contains movie reviews from the Internet Movie Database (IMDb) and is commonly used for binary sentiment classification tasks. It is a popular data set for training and testing natural language processing (NLP) models, particularly for sentiment analysis.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li></ol><ol class="linenums"><li class="L0"><span class="pln">urlopened </span><span class="pun">=</span><span class="pln"> urlopen</span><span class="pun">(</span><span class="str">'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/35t-FeC-2uN1ozOwPs7wFg.gz'</span><span class="pun">)</span></li><li class="L1"><span class="pln">tar </span><span class="pun">=</span><span class="pln"> tarfile</span><span class="pun">.</span><span class="pln">open</span><span class="pun">(</span><span class="pln">fileobj</span><span class="pun">=</span><span class="pln">io</span><span class="pun">.</span><span class="typ">BytesIO</span><span class="pun">(</span><span class="pln">urlopened</span><span class="pun">.</span><span class="pln">read</span><span class="pun">()))</span></li><li class="L2"><span class="pln">tempdir </span><span class="pun">=</span><span class="pln"> tempfile</span><span class="pun">.</span><span class="typ">TemporaryDirectory</span><span class="pun">()</span></li><li class="L3"><span class="pln">tar</span><span class="pun">.</span><span class="pln">extractall</span><span class="pun">(</span><span class="pln">tempdir</span><span class="pun">.</span><span class="pln">name</span><span class="pun">)</span></li><li class="L4"><span class="pln">tar</span><span class="pun">.</span><span class="pln">close</span><span class="pun">()</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-1">Copied!</span></button></pre></td>
</tr>
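<tr class="odd">
<td>Imports assumed by the IMDB download snippet</td>
<td>The download-and-extract snippet above relies on a few standard-library imports that are not shown. This sketch repeats the same steps with the imports made explicit (the URL is the one from the row above) and lists the extracted directory as a quick check.</td>
<td><pre class="prettyprint">
import io
import os
import tarfile
import tempfile
from urllib.request import urlopen

# Download the archive into memory and extract it into a temporary directory.
urlopened = urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/35t-FeC-2uN1ozOwPs7wFg.gz')
tar = tarfile.open(fileobj=io.BytesIO(urlopened.read()))
tempdir = tempfile.TemporaryDirectory()
tar.extractall(tempdir.name)
tar.close()

# Inspect what was extracted.
print(os.listdir(tempdir.name))
</pre></td>
</tr>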
<tr class="odd">
<td>IMDBDataset class to create iterators for the train and test datasets</td>
<td>Creates iterators for the training and test data sets. Building them involves several steps, such as loading the extracted review files, preprocessing the text, and exposing the samples through an indexable interface.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li></ol><ol class="linenums"><li class="L0"><span class="pln">root_dir </span><span class="pun">=</span><span class="pln"> tempdir</span><span class="pun">.</span><span class="pln">name </span><span class="pun">+</span><span class="pln"> </span><span class="str">'/'</span><span class="pln"> </span><span class="pun">+</span><span class="pln"> </span><span class="str">'imdb_dataset'</span></li><li class="L1"><span class="pln">train_iter </span><span class="pun">=</span><span class="pln"> </span><span class="typ">IMDBDataset</span><span class="pun">(</span><span class="pln">root_dir</span><span class="pun">=</span><span class="pln">root_dir</span><span class="pun">,</span><span class="pln"> train</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span><span class="pln"> </span><span class="com"># For training data</span></li><li class="L2"><span class="pln">test_iter </span><span class="pun">=</span><span class="pln"> </span><span class="typ">IMDBDataset</span><span class="pun">(</span><span class="pln">root_dir</span><span class="pun">=</span><span class="pln">root_dir</span><span class="pun">,</span><span class="pln"> train</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">)</span><span class="pln"> </span><span class="com"># For test data</span></li><li class="L3"><span class="pln">start</span><span class="pun">=</span><span class="pln">train_iter</span><span class="pun">.</span><span class="pln">pos_inx</span></li><li class="L4"><span class="kwd">for</span><span class="pln"> i </span><span class="kwd">in</span><span class="pln"> range</span><span class="pun">(-</span><span class="lit">10</span><span class="pun">,</span><span class="lit">10</span><span class="pun">):</span></li><li class="L5"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="pln">train_iter</span><span class="pun">[</span><span class="pln">start</span><span class="pun">+</span><span class="pln">i</span><span class="pun">])</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-2">Copied!</span></button></pre></td>
</tr>
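<tr class="even">
<td>IMDBDataset: sketch of a map-style implementation</td>
<td>The IMDBDataset class itself is defined in the lab and is not reproduced in this cheat sheet. The sketch below is a hypothetical stand-in, assuming the extracted archive contains train/neg, train/pos, test/neg, and test/pos folders of .txt reviews; the folder layout, label encoding, and the pos_inx attribute are assumptions made for illustration.</td>
<td><pre class="prettyprint">
import os
from torch.utils.data import Dataset

class IMDBDatasetSketch(Dataset):
    """Hypothetical stand-in for the lab's IMDBDataset."""
    def __init__(self, root_dir, train=True):
        split = "train" if train else "test"
        self.samples = []
        for label, sub in enumerate(["neg", "pos"]):  # 0 = negative, 1 = positive (assumed)
            folder = os.path.join(root_dir, split, sub)
            for fname in sorted(os.listdir(folder)):
                self.samples.append((label, os.path.join(folder, fname)))
        # Index of the first positive review, mirroring the `pos_inx` attribute used above.
        self.pos_inx = next(i for i, (label, _) in enumerate(self.samples) if label == 1)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        label, path = self.samples[idx]
        with open(path, encoding="utf-8") as f:
            return label, f.read()
</pre></td>
</tr>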
<tr class="even">
<td>GloVe embeddings</td>
<td>An unsupervised learning algorithm for obtaining vector representations of words. The GloVe model is trained on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations exhibit linear substructures of the word vector space.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li><li>21</li><li>22</li><li>23</li><li>24</li><li>25</li></ol><ol class="linenums"><li class="L0"><span class="kwd">class</span><span class="pln"> </span><span class="typ">GloVe_override</span><span class="pun">(</span><span class="typ">Vectors</span><span class="pun">):</span></li><li class="L1"><span class="pln"> url </span><span class="pun">=</span><span class="pln"> </span><span class="pun">{</span></li><li class="L2"><span class="pln"> </span><span class="str">"6B"</span><span class="pun">:</span><span class="pln"> </span><span class="str">"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/tQdezXocAJMBMPfUJx_iUg/glove-6B.zip"</span><span class="pun">,</span></li><li class="L3"><span class="pln"> </span><span class="pun">}</span></li><li class="L4"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __init__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> name</span><span class="pun">=</span><span class="str">"6B"</span><span class="pun">,</span><span class="pln"> dim</span><span class="pun">=</span><span class="lit">100</span><span class="pun">,</span><span class="pln"> </span><span class="pun">**</span><span class="pln">kwargs</span><span class="pun">)</span><span class="pln"> </span><span class="pun">-></span><span class="pln"> </span><span class="kwd">None</span><span class="pun">:</span></li><li class="L5"><span class="pln"> url </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">url</span><span class="pun">[</span><span class="pln">name</span><span class="pun">]</span></li><li class="L6"><span class="pln"> name </span><span class="pun">=</span><span class="pln"> </span><span class="str">"glove.{}.{}d.txt"</span><span class="pun">.</span><span class="pln">format</span><span class="pun">(</span><span class="pln">name</span><span class="pun">,</span><span class="pln"> str</span><span class="pun">(</span><span class="pln">dim</span><span class="pun">))</span></li><li class="L7"><span class="pln"> </span><span class="com">#name = "glove.{}/glove.{}.{}d.txt".format(name, name, str(dim))</span></li><li class="L8"><span class="pln"> </span><span class="kwd">super</span><span class="pun">(</span><span class="typ">GloVe_override</span><span class="pun">,</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">).</span><span class="pln">__init__</span><span class="pun">(</span><span class="pln">name</span><span class="pun">,</span><span class="pln"> url</span><span class="pun">=</span><span class="pln">url</span><span class="pun">,</span><span class="pln"> </span><span class="pun">**</span><span class="pln">kwargs</span><span class="pun">)</span></li><li class="L9"><span class="kwd">class</span><span class="pln"> </span><span class="typ">GloVe_override2</span><span class="pun">(</span><span class="typ">Vectors</span><span class="pun">):</span></li><li class="L0"><span class="pln"> url </span><span class="pun">=</span><span class="pln"> </span><span class="pun">{</span></li><li class="L1"><span class="pln"> </span><span class="str">"6B"</span><span class="pun">:</span><span class="pln"> 
</span><span class="str">"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/tQdezXocAJMBMPfUJx_iUg/glove-6B.zip"</span><span class="pun">,</span></li><li class="L2"><span class="pln"> </span><span class="pun">}</span></li><li class="L3"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __init__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> name</span><span class="pun">=</span><span class="str">"6B"</span><span class="pun">,</span><span class="pln"> dim</span><span class="pun">=</span><span class="lit">100</span><span class="pun">,</span><span class="pln"> </span><span class="pun">**</span><span class="pln">kwargs</span><span class="pun">)</span><span class="pln"> </span><span class="pun">-></span><span class="pln"> </span><span class="kwd">None</span><span class="pun">:</span></li><li class="L4"><span class="pln"> url </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">url</span><span class="pun">[</span><span class="pln">name</span><span class="pun">]</span></li><li class="L5"><span class="pln"> </span><span class="com">#name = "glove.{}.{}d.txt".format(name, str(dim))</span></li><li class="L6"><span class="pln"> name </span><span class="pun">=</span><span class="pln"> </span><span class="str">"glove.{}/glove.{}.{}d.txt"</span><span class="pun">.</span><span class="pln">format</span><span class="pun">(</span><span class="pln">name</span><span class="pun">,</span><span class="pln"> name</span><span class="pun">,</span><span class="pln"> str</span><span class="pun">(</span><span class="pln">dim</span><span class="pun">))</span></li><li class="L7"><span class="pln"> </span><span class="kwd">super</span><span class="pun">(</span><span class="typ">GloVe_override2</span><span class="pun">,</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">).</span><span class="pln">__init__</span><span class="pun">(</span><span class="pln">name</span><span class="pun">,</span><span class="pln"> url</span><span class="pun">=</span><span class="pln">url</span><span class="pun">,</span><span class="pln"> </span><span class="pun">**</span><span class="pln">kwargs</span><span class="pun">)</span></li><li class="L8"><span class="kwd">try</span><span class="pun">:</span></li><li class="L9"><span class="pln"> glove_embedding </span><span class="pun">=</span><span class="pln"> </span><span class="typ">GloVe_override</span><span class="pun">(</span><span class="pln">name</span><span class="pun">=</span><span class="str">"6B"</span><span class="pun">,</span><span class="pln"> dim</span><span class="pun">=</span><span class="lit">100</span><span class="pun">)</span></li><li class="L0"><span class="kwd">except</span><span class="pun">:</span></li><li class="L1"><span class="pln"> </span><span class="kwd">try</span><span class="pun">:</span></li><li class="L2"><span class="pln"> glove_embedding </span><span class="pun">=</span><span class="pln"> </span><span class="typ">GloVe_override2</span><span class="pun">(</span><span class="pln">name</span><span class="pun">=</span><span class="str">"6B"</span><span class="pun">,</span><span class="pln"> dim</span><span class="pun">=</span><span class="lit">100</span><span class="pun">)</span></li><li class="L3"><span class="pln"> </span><span class="kwd">except</span><span class="pun">:</span></li><li class="L4"><span class="pln"> glove_embedding </span><span 
class="pun">=</span><span class="pln"> </span><span class="typ">GloVe</span><span class="pun">(</span><span class="pln">name</span><span class="pun">=</span><span class="str">"6B"</span><span class="pun">,</span><span class="pln"> dim</span><span class="pun">=</span><span class="lit">100</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-3">Copied!</span></button></pre></td>
</tr>
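<tr class="odd">
<td>GloVe embeddings: similarity sketch</td>
<td>A short sketch, assuming glove_embedding was loaded as in the row above (100-dimensional 6B vectors): it looks up two word vectors and measures how close they are with cosine similarity, which is one way to see the linear structure of the embedding space.</td>
<td><pre class="prettyprint">
import torch
import torch.nn.functional as F

# Look up pretrained vectors for two related words.
vec_king = glove_embedding['king']    # tensor of shape (100,)
vec_queen = glove_embedding['queen']

# Cosine similarity between the two vectors (closer to 1 means more similar).
sim = F.cosine_similarity(vec_king.unsqueeze(0), vec_queen.unsqueeze(0)).item()
print(f"cosine(king, queen) = {sim:.3f}")
</pre></td>
</tr>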
<tr class="odd">
<td>Building vocabulary object from pretrained GloVe
word embedding model</td>
<td>Builds a vocabulary object that maps every token in the pretrained GloVe embedding model to an integer index, adds the special tokens &lt;unk&gt; and &lt;pad&gt;, and sets &lt;unk&gt; as the default index for out-of-vocabulary words.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li></ol><ol class="linenums"><li class="L0"><span class="kwd">from</span><span class="pln"> torchtext</span><span class="pun">.</span><span class="pln">vocab </span><span class="kwd">import</span><span class="pln"> </span><span class="typ">GloVe</span><span class="pun">,</span><span class="pln">vocab</span></li><li class="L1"><span class="com"># Build vocab from glove_vectors</span></li><li class="L2"><span class="pln">vocab </span><span class="pun">=</span><span class="pln"> vocab</span><span class="pun">(</span><span class="pln">glove_embedding </span><span class="pun">.</span><span class="pln">stoi</span><span class="pun">,</span><span class="pln"> </span><span class="lit">0</span><span class="pun">,</span><span class="pln">specials</span><span class="pun">=(</span><span class="str">'<unk>'</span><span class="pun">,</span><span class="pln"> </span><span class="str">'<pad>'</span><span class="pun">))</span></li><li class="L3"><span class="pln">vocab</span><span class="pun">.</span><span class="pln">set_default_index</span><span class="pun">(</span><span class="pln">vocab</span><span class="pun">[</span><span class="str">"<unk>"</span><span class="pun">])</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-4">Copied!</span></button></pre></td>
</tr>
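<tr class="even">
<td>Vocabulary lookup sketch</td>
<td>A quick sketch, assuming the vocab object built above: it maps a list of tokens to integer indices, with out-of-vocabulary tokens falling back to the &lt;unk&gt; index set as the default.</td>
<td><pre class="prettyprint">
# Map tokens to indices; unknown tokens resolve to the default (&lt;unk&gt;) index.
tokens = ["the", "movie", "was", "thrillinglybad"]
print(vocab(tokens))

# Indices reserved for the special tokens, and the total vocabulary size.
print(vocab['&lt;unk&gt;'], vocab['&lt;pad&gt;'])
print(len(vocab))
</pre></td>
</tr>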
<tr class="even">
<td>Convert the training and testing iterators to map-style data sets</td>
<td>Converts the training and testing iterators to map-style data sets, which support indexing and len() and can therefore be split and batched. The training data set will contain 95% of the samples in the original training set, while the validation data set will contain the remaining 5%. These data sets can be used for training and evaluating a machine-learning model for text classification on the IMDB data set; the final performance of the model is evaluated on the held-out test set.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li></ol><ol class="linenums"><li class="L0"><span class="pln">train_dataset </span><span class="pun">=</span><span class="pln"> to_map_style_dataset</span><span class="pun">(</span><span class="pln">train_iter</span><span class="pun">)</span></li><li class="L1"><span class="pln">test_dataset </span><span class="pun">=</span><span class="pln"> to_map_style_dataset</span><span class="pun">(</span><span class="pln">test_iter</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-5">Copied!</span></button></pre></td>
</tr>
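<tr class="odd">
<td>95%/5% train-validation split</td>
<td>A sketch of the split described above, assuming torch.utils.data.random_split and the map-style train_dataset and test_dataset created in this row; it produces the split_train_ and split_valid_ objects used by the data loaders further down.</td>
<td><pre class="prettyprint">
from torch.utils.data import random_split

# Use 95% of the original training set for training and the remaining 5% for validation.
num_train = int(len(train_dataset) * 0.95)
split_train_, split_valid_ = random_split(
    train_dataset, [num_train, len(train_dataset) - num_train]
)
print(len(split_train_), len(split_valid_), len(test_dataset))
</pre></td>
</tr>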
<tr class="odd">
<td>CUDA-compatible GPU</td>
<td>Checks whether a CUDA-compatible GPU is available in the system using PyTorch, a popular deep-learning framework. If a GPU is available, the device variable is assigned to "cuda" (CUDA is the parallel computing platform and application programming interface developed by NVIDIA); otherwise, it is assigned to "cpu", and the code runs on the CPU instead.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li></ol><ol class="linenums"><li class="L0"><span class="pln">device </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">device</span><span class="pun">(</span><span class="str">"cuda"</span><span class="pln"> </span><span class="kwd">if</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">cuda</span><span class="pun">.</span><span class="pln">is_available</span><span class="pun">()</span><span class="pln"> </span><span class="kwd">else</span><span class="pln"> </span><span class="str">"cpu"</span><span class="pun">)</span></li><li class="L1"><span class="pln">device</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-6">Copied!</span></button></pre></td>
</tr>
<tr class="even">
<td>collate_fn</td>
<td>A collate_fn is used in conjunction with data loaders to customize the way batches are created from individual samples. The collate_batch function processes a batch of (label, text) pairs: it applies the text_pipeline function to preprocess each text, converts the labels and token indices into PyTorch tensors, pads the text sequences to a common length with pad_sequence, and returns a tuple containing the label tensor and the padded text tensor. It also ensures that the returned tensors are moved to the specified device (GPU or CPU) for efficient computation.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li></ol><ol class="linenums"><li class="L0"><span class="kwd">from</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">nn</span><span class="pun">.</span><span class="pln">utils</span><span class="pun">.</span><span class="pln">rnn </span><span class="kwd">import</span><span class="pln"> pad_sequence</span></li><li class="L1"><span class="kwd">def</span><span class="pln"> collate_batch</span><span class="pun">(</span><span class="pln">batch</span><span class="pun">):</span></li><li class="L2"><span class="pln"> label_list</span><span class="pun">,</span><span class="pln"> text_list </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[],</span><span class="pln"> </span><span class="pun">[]</span></li><li class="L3"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> _label</span><span class="pun">,</span><span class="pln"> _text </span><span class="kwd">in</span><span class="pln"> batch</span><span class="pun">:</span></li><li class="L4"><span class="pln"> label_list</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">_label</span><span class="pun">)</span></li><li class="L5"><span class="pln"> text_list</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">tensor</span><span class="pun">(</span><span class="pln">text_pipeline</span><span class="pun">(</span><span class="pln">_text</span><span class="pun">),</span><span class="pln"> dtype</span><span class="pun">=</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">int64</span><span class="pun">))</span></li><li class="L6"><span class="pln"> label_list </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">tensor</span><span class="pun">(</span><span class="pln">label_list</span><span class="pun">,</span><span class="pln"> dtype</span><span class="pun">=</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">int64</span><span class="pun">)</span></li><li class="L7"><span class="pln"> text_list </span><span class="pun">=</span><span class="pln"> pad_sequence</span><span class="pun">(</span><span class="pln">text_list</span><span class="pun">,</span><span class="pln"> batch_first</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li><li class="L8"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> label_list</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">),</span><span class="pln"> text_list</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-7">Copied!</span></button></pre></td>
</tr>
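<tr class="odd">
<td>text_pipeline sketch</td>
<td>collate_batch assumes a text_pipeline function that is not shown in this cheat sheet. A common definition (an assumption; the lab's version may differ) tokenizes the raw review and maps each token to its vocabulary index.</td>
<td><pre class="prettyprint">
from torchtext.data.utils import get_tokenizer

tokenizer = get_tokenizer("basic_english")

def text_pipeline(text):
    # Tokenize, then convert tokens to indices; unknown tokens map to &lt;unk&gt;.
    return vocab(tokenizer(text))

print(text_pipeline("This movie was surprisingly good!"))
</pre></td>
</tr>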
<tr class="odd">
<td>Convert the data set objects to data loaders</td>
<td>Wraps the training, validation, and test data set objects in PyTorch DataLoader objects, specifying data loading parameters such as the batch size, shuffling, and the custom collate_fn used to build batches.</td>
<td><pre class="prettyprint">
BATCH_SIZE = 32
train_dataloader = DataLoader(
    split_train_, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch
)
valid_dataloader = DataLoader(
    split_valid_, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch
)
test_dataloader = DataLoader(
    test_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch
)
</pre></td>
</tr>
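<tr class="even">
<td>Inspecting a batch from the data loader</td>
<td>A quick sanity check, assuming the data loaders above: pulling one batch shows the shapes produced by collate_batch, namely one label per sample and token indices padded to the longest sequence in the batch.</td>
<td><pre class="prettyprint">
# Fetch a single batch and inspect its shapes.
labels, texts = next(iter(train_dataloader))
print(labels.shape)    # (batch_size,) : one label per sample
print(texts.shape)     # (batch_size, max_seq_len_in_batch) : padded token indices
print(labels.device, texts.device)   # both already moved to `device` by collate_batch
</pre></td>
</tr>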
<tr class="even">
<td>Predict function</td>
<td>The predict function takes in a text, a text pipeline, and a model
as inputs. It uses a pretrained model passed as a parameter to predict
the label of the text for text classification on the IMDB data set.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li></ol><ol class="linenums"><li class="L0"><span class="kwd">def</span><span class="pln"> predict</span><span class="pun">(</span><span class="pln">text</span><span class="pun">,</span><span class="pln"> text_pipeline</span><span class="pun">,</span><span class="pln"> model</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="kwd">with</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">no_grad</span><span class="pun">():</span></li><li class="L2"><span class="pln"> text </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">unsqueeze</span><span class="pun">(</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">tensor</span><span class="pun">(</span><span class="pln">text_pipeline</span><span class="pun">(</span><span class="pln">text</span><span class="pun">)),</span><span class="lit">0</span><span class="pun">).</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L3"><span class="pln"> model</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L4"><span class="pln"> output </span><span class="pun">=</span><span class="pln"> model</span><span class="pun">(</span><span class="pln">text</span><span class="pun">)</span></li><li class="L5"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> imdb_label</span><span class="pun">[</span><span class="pln">output</span><span class="pun">.</span><span class="pln">argmax</span><span class="pun">(</span><span class="lit">1</span><span class="pun">).</span><span class="pln">item</span><span class="pun">()]</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-9">Copied!</span></button></pre></td>
</tr>
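<tr class="odd">
<td>Predict function: usage sketch</td>
<td>predict refers to an imdb_label mapping and a trained model that are not defined in this cheat sheet. The mapping below is a plausible assumption for binary sentiment classification, and model stands for the classifier trained on the IMDB data set.</td>
<td><pre class="prettyprint">
# Assumed label mapping for the two IMDB classes.
imdb_label = {0: "negative", 1: "positive"}

review = "The plot was predictable and the acting was flat."
print(predict(review, text_pipeline, model))   # prints "negative" or "positive"
</pre></td>
</tr>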
<tr class="odd">
<td>Training function</td>
<td>Trains the model by iteratively updating its parameters to minimize the loss function, improving the model's performance on the given task. The function also records the cumulative training loss and the validation accuracy for each epoch, keeping track of the best validation accuracy seen so far.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li><li>21</li><li>22</li><li>23</li><li>24</li><li>25</li><li>26</li><li>27</li><li>28</li><li>29</li><li>30</li><li>31</li><li>32</li><li>33</li><li>34</li><li>35</li><li>36</li><li>37</li><li>38</li><li>39</li></ol><ol class="linenums"><li class="L0"><span class="kwd">def</span><span class="pln"> train_model</span><span class="pun">(</span><span class="pln">model</span><span class="pun">,</span><span class="pln"> optimizer</span><span class="pun">,</span><span class="pln"> criterion</span><span class="pun">,</span><span class="pln"> train_dataloader</span><span class="pun">,</span><span class="pln"> valid_dataloader</span><span class="pun">,</span><span class="pln"> epochs</span><span class="pun">=</span><span class="lit">1000</span><span class="pun">,</span><span class="pln"> save_dir</span><span class="pun">=</span><span class="str">""</span><span class="pun">,</span><span class="pln"> file_name</span><span class="pun">=</span><span class="kwd">None</span><span class="pun">):</span></li><li class="L1"><span class="pln"> cum_loss_list </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[]</span></li><li class="L2"><span class="pln"> acc_epoch </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[]</span></li><li class="L3"><span class="pln"> acc_old </span><span class="pun">=</span><span class="pln"> </span><span class="lit">0</span></li><li class="L4"><span class="pln"> model_path </span><span class="pun">=</span><span class="pln"> os</span><span class="pun">.</span><span class="pln">path</span><span class="pun">.</span><span class="pln">join</span><span class="pun">(</span><span class="pln">save_dir</span><span class="pun">,</span><span class="pln"> file_name</span><span class="pun">)</span></li><li class="L5"><span class="pln"> acc_dir </span><span class="pun">=</span><span class="pln"> os</span><span class="pun">.</span><span class="pln">path</span><span class="pun">.</span><span class="pln">join</span><span class="pun">(</span><span class="pln">save_dir</span><span class="pun">,</span><span class="pln"> os</span><span class="pun">.</span><span class="pln">path</span><span class="pun">.</span><span class="pln">splitext</span><span class="pun">(</span><span class="pln">file_name</span><span class="pun">)[</span><span class="lit">0</span><span class="pun">]</span><span class="pln"> </span><span class="pun">+</span><span class="pln"> </span><span class="str">"_acc"</span><span class="pun">)</span></li><li class="L6"><span class="pln"> loss_dir </span><span class="pun">=</span><span class="pln"> os</span><span class="pun">.</span><span class="pln">path</span><span class="pun">.</span><span class="pln">join</span><span class="pun">(</span><span class="pln">save_dir</span><span class="pun">,</span><span class="pln"> os</span><span class="pun">.</span><span class="pln">path</span><span class="pun">.</span><span class="pln">splitext</span><span class="pun">(</span><span class="pln">file_name</span><span class="pun">)[</span><span class="lit">0</span><span class="pun">]</span><span class="pln"> </span><span class="pun">+</span><span class="pln"> </span><span class="str">"_loss"</span><span class="pun">)</span></li><li class="L7"><span 
class="pln"> time_start </span><span class="pun">=</span><span class="pln"> time</span><span class="pun">.</span><span class="pln">time</span><span class="pun">()</span></li><li class="L8"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> epoch </span><span class="kwd">in</span><span class="pln"> tqdm</span><span class="pun">(</span><span class="pln">range</span><span class="pun">(</span><span class="lit">1</span><span class="pun">,</span><span class="pln"> epochs </span><span class="pun">+</span><span class="pln"> </span><span class="lit">1</span><span class="pun">)):</span></li><li class="L9"><span class="pln"> model</span><span class="pun">.</span><span class="pln">train</span><span class="pun">()</span></li><li class="L0"><span class="pln"> </span><span class="com">#print(model)</span></li><li class="L1"><span class="pln"> </span><span class="com">#for parm in model.parameters():</span></li><li class="L2"><span class="pln"> </span><span class="com"># print(parm.requires_grad)</span></li><li class="L3"><span class="pln"> cum_loss </span><span class="pun">=</span><span class="pln"> </span><span class="lit">0</span></li><li class="L4"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> idx</span><span class="pun">,</span><span class="pln"> </span><span class="pun">(</span><span class="pln">label</span><span class="pun">,</span><span class="pln"> text</span><span class="pun">)</span><span class="pln"> </span><span class="kwd">in</span><span class="pln"> enumerate</span><span class="pun">(</span><span class="pln">train_dataloader</span><span class="pun">):</span></li><li class="L5"><span class="pln"> optimizer</span><span class="pun">.</span><span class="pln">zero_grad</span><span class="pun">()</span></li><li class="L6"><span class="pln"> label</span><span class="pun">,</span><span class="pln"> text </span><span class="pun">=</span><span class="pln"> label</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">),</span><span class="pln"> text</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L7"><span class="pln"> predicted_label </span><span class="pun">=</span><span class="pln"> model</span><span class="pun">(</span><span class="pln">text</span><span class="pun">)</span></li><li class="L8"><span class="pln"> loss </span><span class="pun">=</span><span class="pln"> criterion</span><span class="pun">(</span><span class="pln">predicted_label</span><span class="pun">,</span><span class="pln"> label</span><span class="pun">)</span></li><li class="L9"><span class="pln"> loss</span><span class="pun">.</span><span class="pln">backward</span><span class="pun">()</span></li><li class="L0"><span class="pln"> </span><span class="com">#print(loss)</span></li><li class="L1"><span class="pln"> torch</span><span class="pun">.</span><span class="pln">nn</span><span class="pun">.</span><span class="pln">utils</span><span class="pun">.</span><span class="pln">clip_grad_norm_</span><span class="pun">(</span><span class="pln">model</span><span class="pun">.</span><span class="pln">parameters</span><span class="pun">(),</span><span class="pln"> </span><span class="lit">0.1</span><span class="pun">)</span></li><li class="L2"><span class="pln"> optimizer</span><span class="pun">.</span><span class="pln">step</span><span class="pun">()</span></li><li class="L3"><span class="pln"> 
cum_loss </span><span class="pun">+=</span><span class="pln"> loss</span><span class="pun">.</span><span class="pln">item</span><span class="pun">()</span></li><li class="L4"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="pln">f</span><span class="str">"Epoch {epoch}/{epochs} - Loss: {cum_loss}"</span><span class="pun">)</span></li><li class="L5"><span class="pln"> cum_loss_list</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">cum_loss</span><span class="pun">)</span></li><li class="L6"><span class="pln"> accu_val </span><span class="pun">=</span><span class="pln"> evaluate_no_tqdm</span><span class="pun">(</span><span class="pln">valid_dataloader</span><span class="pun">,</span><span class="pln">model</span><span class="pun">)</span></li><li class="L7"><span class="pln"> acc_epoch</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">accu_val</span><span class="pun">)</span></li><li class="L8"><span class="pln"> </span><span class="kwd">if</span><span class="pln"> model_path </span><span class="kwd">and</span><span class="pln"> accu_val </span><span class="pun">></span><span class="pln"> acc_old</span><span class="pun">:</span></li><li class="L9"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="pln">accu_val</span><span class="pun">)</span></li><li class="L0"><span class="pln"> acc_old </span><span class="pun">=</span><span class="pln"> accu_val</span></li><li class="L1"><span class="pln"> </span><span class="kwd">if</span><span class="pln"> save_dir </span><span class="kwd">is</span><span class="pln"> </span><span class="kwd">not</span><span class="pln"> </span><span class="kwd">None</span><span class="pun">:</span></li><li class="L2"><span class="pln"> </span><span class="kwd">pass</span></li><li class="L3"><span class="pln"> </span><span class="com">#print("save model epoch",epoch)</span></li><li class="L4"><span class="pln"> </span><span class="com">#torch.save(model.state_dict(), model_path)</span></li><li class="L5"><span class="pln"> </span><span class="com">#save_list_to_file(lst=acc_epoch, filename=acc_dir)</span></li><li class="L6"><span class="pln"> </span><span class="com">#save_list_to_file(lst=cum_loss_list, filename=loss_dir)</span></li><li class="L7"><span class="pln"> time_end </span><span class="pun">=</span><span class="pln"> time</span><span class="pun">.</span><span class="pln">time</span><span class="pun">()</span></li><li class="L8"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="pln">f</span><span class="str">"Training time: {time_end - time_start}"</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-10">Copied!</span></button></pre></td>
</tr>
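<tr class="even">
<td>evaluate_no_tqdm: accuracy helper sketch</td>
<td>train_model calls an evaluate_no_tqdm helper that is not shown here. The sketch below is one reasonable implementation (an assumption; the lab's version may differ): it computes classification accuracy over a data loader with gradients disabled.</td>
<td><pre class="prettyprint">
import torch

def evaluate_no_tqdm(dataloader, model):
    """Return the fraction of correctly classified samples in the data loader."""
    model.eval()
    total_acc, total_count = 0, 0
    with torch.no_grad():
        for label, text in dataloader:
            predicted_label = model(text)
            total_acc += (predicted_label.argmax(1) == label).sum().item()
            total_count += label.size(0)
    return total_acc / total_count
</pre></td>
</tr>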
<tr class="even">
<td>Fine-tune a model in the AG News data set</td>
<td>Fine-tunes a pretrained model on the AG News data set to categorize news articles into one of four categories: Sports, Business, Sci/Tech, or World. The cell sets up training a model from scratch on the AG News data set. If you want to train the model for 2 epochs on a smaller data set to demonstrate what the training process would look like, uncomment the part that says ### Uncomment to Train ### before running the cell. Training for 2 epochs on the reduced data set takes approximately 3 minutes.</td>
<td><pre class="prettyprint">
train_iter_ag_news = AG_NEWS(split="train")
num_class_ag_news = len(set([label for (label, text) in train_iter_ag_news]))
num_class_ag_news
# Split the dataset into training and testing iterators.
train_iter_ag_news, test_iter_ag_news = AG_NEWS()
# Convert the training and testing iterators to map-style datasets.
train_dataset_ag_news = to_map_style_dataset(train_iter_ag_news)
test_dataset_ag_news = to_map_style_dataset(test_iter_ag_news)
# Determine the number of samples to be used for training and validation (5% for validation).
num_train_ag_news = int(len(train_dataset_ag_news) * 0.95)
# Randomly split the training dataset into training and validation datasets using `random_split`.
# The training dataset will contain 95% of the samples, and the validation dataset will contain the remaining 5%.
split_train_ag_news_, split_valid_ag_news_ = random_split(train_dataset_ag_news, [num_train_ag_news, len(train_dataset_ag_news) - num_train_ag_news])
# Make the training set smaller to allow it to run fast as an example.
# IF YOU WANT TO TRAIN ON THE AG_NEWS DATASET, COMMENT OUT THE 2 LINES BELOW.
# HOWEVER, NOTE THAT TRAINING WILL TAKE A LONG TIME
num_train_ag_news = int(len(train_dataset_ag_news) * 0.05)
split_train_ag_news_, _ = random_split(split_train_ag_news_, [num_train_ag_news, len(split_train_ag_news_) - num_train_ag_news])
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
def label_pipeline(x):
    return int(x) - 1
from torch.nn.utils.rnn import pad_sequence
def collate_batch_ag_news(batch):
    label_list, text_list = [], []
    for _label, _text in batch:
        label_list.append(label_pipeline(_label))
        text_list.append(torch.tensor(text_pipeline(_text), dtype=torch.int64))
    label_list = torch.tensor(label_list, dtype=torch.int64)
    text_list = pad_sequence(text_list, batch_first=True)
    return label_list.to(device), text_list.to(device)
BATCH_SIZE = 32
train_dataloader_ag_news = DataLoader(
    split_train_ag_news_, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch_ag_news
)
valid_dataloader_ag_news = DataLoader(
    split_valid_ag_news_, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch_ag_news
)
test_dataloader_ag_news = DataLoader(
    test_dataset_ag_news, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch_ag_news
)
model_ag_news = Net(num_class=4, vocab_size=vocab_size).to(device)
model_ag_news.to(device)
'''
### Uncomment to Train ###
LR=1
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_ag_news.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1.0, gamma=0.1)
save_dir = ""
file_name = "model_AG News small1.pth"
train_model(model=model_ag_news, optimizer=optimizer, criterion=criterion, train_dataloader=train_dataloader_ag_news, valid_dataloader=valid_dataloader_ag_news, epochs=2, save_dir=save_dir, file_name=file_name)
'''
</pre></td>
</tr>
<tr class="odd">
<td>Cost and validation data accuracy for each epoch</td>
<td>Plots the cost and validation accuracy for each epoch of the pretrained
model, up to and including the epoch that yielded the highest accuracy. The
pretrained model reached over 90% accuracy on the AG News validation set. A
sketch of such a plot helper follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li></ol><ol class="linenums"><li class="L0"><span class="pln">acc_urlopened </span><span class="pun">=</span><span class="pln"> urlopen</span><span class="pun">(</span><span class="str">'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/bQk8mJu3Uct3I4JEsEtRnw/model-AG%20News%20small1-acc'</span><span class="pun">)</span></li><li class="L1"><span class="pln">loss_urlopened </span><span class="pun">=</span><span class="pln"> urlopen</span><span class="pun">(</span><span class="str">'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/KNQkqJWWwY_XfbFBRFhZNA/model-AG%20News%20small1-loss'</span><span class="pun">)</span></li><li class="L2"><span class="pln">acc_epoch </span><span class="pun">=</span><span class="pln"> pickle</span><span class="pun">.</span><span class="pln">load</span><span class="pun">(</span><span class="pln">acc_urlopened</span><span class="pun">)</span></li><li class="L3"><span class="pln">cum_loss_list </span><span class="pun">=</span><span class="pln"> pickle</span><span class="pun">.</span><span class="pln">load</span><span class="pun">(</span><span class="pln">loss_urlopened</span><span class="pun">)</span></li><li class="L4"><span class="pln">plot</span><span class="pun">(</span><span class="pln">cum_loss_list</span><span class="pun">,</span><span class="pln">acc_epoch</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-12">Copied!</span></button></pre></td>
</tr>
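<tr class="even">
<td>plot helper (sketch)</td>
<td>The plot helper called above is defined earlier in the document. The
sketch below, assuming matplotlib, shows one minimal way such a helper could
draw the cumulative loss and validation accuracy per epoch; the original
helper's exact styling may differ.</td>
<td><pre class="prettyprint">
import matplotlib.pyplot as plt

def plot(cum_loss_list, acc_epoch):
    # Draw cumulative training loss and validation accuracy side by side, one point per epoch.
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(range(1, len(cum_loss_list) + 1), cum_loss_list)
    ax1.set_xlabel("Epoch")
    ax1.set_ylabel("Cumulative loss")
    ax2.plot(range(1, len(acc_epoch) + 1), acc_epoch)
    ax2.set_xlabel("Epoch")
    ax2.set_ylabel("Validation accuracy")
    plt.tight_layout()
    plt.show()
</pre></td>
</tr>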
<tr class="even">
<td>Fine-tune the final layer</td>
<td>Fine-tuning the final output layer of a neural network is similar to
fine-tuning the whole model. Begin by loading the pretrained model you would
like to fine-tune; in this case, the same model pretrained on the AG News data
set. A sketch of freezing everything except the final layer follows in the
next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li></ol><ol class="linenums"><li class="L0"><span class="pln">urlopened </span><span class="pun">=</span><span class="pln"> urlopen</span><span class="pun">(</span><span class="str">'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/9c3Dh2O_jsYBShBuchUNlg/model-AG%20News%20small1.pth'</span><span class="pun">)</span></li><li class="L1"><span class="pln">model_fine2 </span><span class="pun">=</span><span class="pln"> </span><span class="typ">Net</span><span class="pun">(</span><span class="pln">vocab_size</span><span class="pun">=</span><span class="pln">vocab_size</span><span class="pun">,</span><span class="pln"> num_class</span><span class="pun">=</span><span class="lit">4</span><span class="pun">).</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L2"><span class="pln">model_fine2</span><span class="pun">.</span><span class="pln">load_state_dict</span><span class="pun">(</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">load</span><span class="pun">(</span><span class="pln">io</span><span class="pun">.</span><span class="typ">BytesIO</span><span class="pun">(</span><span class="pln">urlopened</span><span class="pun">.</span><span class="pln">read</span><span class="pun">()),</span><span class="pln"> map_location</span><span class="pun">=</span><span class="pln">device</span><span class="pun">))</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-13">Copied!</span></button></pre></td>
</tr>
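<tr class="odd">
<td>Freeze all but the final layer (sketch)</td>
<td>The snippet above only loads the pretrained weights. The sketch below
shows the "final layer only" part of the fine-tuning: it assumes the Net model
exposes its output layer through an attribute called fc, which is a
hypothetical name; substitute the actual attribute of your model.</td>
<td><pre class="prettyprint">
# Freeze every parameter, then unfreeze only the final classification layer.
# `fc` is a placeholder for the name of Net's output layer.
for param in model_fine2.parameters():
    param.requires_grad = False
for param in model_fine2.fc.parameters():
    param.requires_grad = True

# Give the optimizer only the parameters that are still trainable.
optimizer = torch.optim.SGD(
    (p for p in model_fine2.parameters() if p.requires_grad), lr=1
)
</pre></td>
</tr>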
<tr class="odd">
<td>Fine-tune on the full IMDB training set for 100 epochs</td>
<td>Loads and plots the stored cost and validation-accuracy curves for the
model fine-tuned on the full IMDB training set for 100 epochs, which
classifies movie reviews as positive or negative. A sketch of the
corresponding training call follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li></ol><ol class="linenums"><li class="L0"><span class="pln">acc_urlopened </span><span class="pun">=</span><span class="pln"> urlopen</span><span class="pun">(</span><span class="str">'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/UdR3ApQnxSeV2mrA0CbiLg/model-fine2-acc'</span><span class="pun">)</span></li><li class="L1"><span class="pln">loss_urlopened </span><span class="pun">=</span><span class="pln"> urlopen</span><span class="pun">(</span><span class="str">'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/rWGDIF-uL2dEngWcIo9teQ/model-fine2-loss'</span><span class="pun">)</span></li><li class="L2"><span class="pln">acc_epoch </span><span class="pun">=</span><span class="pln"> pickle</span><span class="pun">.</span><span class="pln">load</span><span class="pun">(</span><span class="pln">acc_urlopened</span><span class="pun">)</span></li><li class="L3"><span class="pln">cum_loss_list </span><span class="pun">=</span><span class="pln"> pickle</span><span class="pun">.</span><span class="pln">load</span><span class="pun">(</span><span class="pln">loss_urlopened</span><span class="pun">)</span></li><li class="L4"><span class="pln">plot</span><span class="pun">(</span><span class="pln">cum_loss_list</span><span class="pun">,</span><span class="pln">acc_epoch</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-14">Copied!</span></button></pre></td>
</tr>
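<tr class="even">
<td>Training call for the IMDB fine-tune (sketch)</td>
<td>The snippet above only downloads and plots stored training curves. The
sketch below shows what the corresponding training call could look like,
reusing the train_model helper and the IMDB train/validation dataloaders
defined elsewhere in this cheat sheet; the dataloader names and
hyperparameters here are assumptions, not values taken from the original
run.</td>
<td><pre class="prettyprint">
# Fine-tune the loaded model on the full IMDB training set for 100 epochs.
# Assumes model_fine2's output layer has already been adapted to the two IMDB classes.
LR = 1
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_fine2.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1.0, gamma=0.1)
train_model(model=model_fine2, optimizer=optimizer, criterion=criterion,
            train_dataloader=train_dataloader, valid_dataloader=valid_dataloader,
            epochs=100, save_dir="", file_name="model_fine2.pth")
</pre></td>
</tr>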
<tr class="even">
<td>Adapter model</td>
<td>FeatureAdapter is a neural network module that introduces a
low-dimensional bottleneck in a transformer architecture to allow
fine-tuning with fewer parameters. It compresses the original
high-dimensional embeddings into a lower dimension, applies a nonlinear
transformation, and then expands it back to the original dimension. This
process is followed by a residual connection that adds the transformed
output back to the original input to preserve information and promote
gradient flow. A sketch of attaching the adapter to an existing layer follows
in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li><li>21</li><li>22</li><li>23</li><li>24</li><li>25</li><li>26</li></ol><ol class="linenums"><li class="L0"><span class="kwd">class</span><span class="pln"> </span><span class="typ">FeatureAdapter</span><span class="pun">(</span><span class="pln">nn</span><span class="pun">.</span><span class="typ">Module</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="str">"""</span></li><li class="L2"><span class="str"> Attributes:</span></li><li class="L3"><span class="str"> size (int): The bottleneck dimension to which the embeddings are temporarily reduced.</span></li><li class="L4"><span class="str"> model_dim (int): The original dimension of the embeddings or features in the transformer model.</span></li><li class="L5"><span class="str"> """</span></li><li class="L6"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __init__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> bottleneck_size</span><span class="pun">=</span><span class="lit">50</span><span class="pun">,</span><span class="pln"> model_dim</span><span class="pun">=</span><span class="lit">100</span><span class="pun">):</span></li><li class="L7"><span class="pln"> </span><span class="kwd">super</span><span class="pun">().</span><span class="pln">__init__</span><span class="pun">()</span></li><li class="L8"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">bottleneck_transform </span><span class="pun">=</span><span class="pln"> nn</span><span class="pun">.</span><span class="typ">Sequential</span><span class="pun">(</span></li><li class="L9"><span class="pln"> nn</span><span class="pun">.</span><span class="typ">Linear</span><span class="pun">(</span><span class="pln">model_dim</span><span class="pun">,</span><span class="pln"> bottleneck_size</span><span class="pun">),</span><span class="pln"> </span><span class="com"># Down-project to a smaller dimension</span></li><li class="L0"><span class="pln"> nn</span><span class="pun">.</span><span class="typ">ReLU</span><span class="pun">(),</span><span class="pln"> </span><span class="com"># Apply non-linearity</span></li><li class="L1"><span class="pln"> nn</span><span class="pun">.</span><span class="typ">Linear</span><span class="pun">(</span><span class="pln">bottleneck_size</span><span class="pun">,</span><span class="pln"> model_dim</span><span class="pun">)</span><span class="pln"> </span><span class="com"># Up-project back to the original dimension</span></li><li class="L2"><span class="pln"> </span><span class="pun">)</span></li><li class="L3"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> forward</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> x</span><span class="pun">):</span></li><li class="L4"><span class="pln"> </span><span class="str">"""</span></li><li class="L5"><span class="str"> Forward pass of the FeatureAdapter. 
Applies the bottleneck transformation to the input</span></li><li class="L6"><span class="str"> tensor and adds a skip connection.</span></li><li class="L7"><span class="str"> Args:</span></li><li class="L8"><span class="str"> x (Tensor): Input tensor with shape (batch_size, seq_length, model_dim).</span></li><li class="L9"><span class="str"> Returns:</span></li><li class="L0"><span class="str"> Tensor: Output tensor after applying the adapter transformation and skip connection,</span></li><li class="L1"><span class="str"> maintaining the original input shape.</span></li><li class="L2"><span class="str"> """</span></li><li class="L3"><span class="pln"> transformed_features </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">bottleneck_transform</span><span class="pun">(</span><span class="pln">x</span><span class="pun">)</span><span class="pln"> </span><span class="com"># Transform features through the bottleneck</span></li><li class="L4"><span class="pln"> output_with_residual </span><span class="pun">=</span><span class="pln"> transformed_features </span><span class="pun">+</span><span class="pln"> x </span><span class="com"># Add the residual connection</span></li><li class="L5"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> output_with_residual</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-15">Copied!</span></button></pre></td>
</tr>
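<tr class="odd">
<td>Attaching the adapter (sketch)</td>
<td>How FeatureAdapter is inserted into a model is not shown above. One common
pattern, sketched below, is to wrap an existing layer so that the adapter runs
on its output; the wrapped attribute name in the usage comment is illustrative
only.</td>
<td><pre class="prettyprint">
import torch.nn as nn

class AdaptedLayer(nn.Module):
    # Wraps an existing layer and applies a FeatureAdapter to its output.
    def __init__(self, layer, model_dim=100, bottleneck_size=50):
        super().__init__()
        self.layer = layer
        self.adapter = FeatureAdapter(bottleneck_size=bottleneck_size, model_dim=model_dim)

    def forward(self, x):
        return self.adapter(self.layer(x))

# Usage sketch: replace a sub-module of an existing model with its adapted version,
# then train only the adapter parameters.
# model.some_layer = AdaptedLayer(model.some_layer)   # `some_layer` is a placeholder name
</pre></td>
</tr>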
<tr class="odd">
<td>Traverse the IMDB data set</td>
<td>This code snippet defines a custom PyTorch Dataset for the IMDB data set.
It collects the file paths of the negative and positive reviews, assigns label
0 to negative and 1 to positive, and returns a (label, review text) pair for
each index.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li></ol><ol class="linenums"><li class="L0"><span class="kwd">class</span><span class="pln"> </span><span class="typ">IMDBDataset</span><span class="pun">(</span><span class="typ">Dataset</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __init__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> root_dir</span><span class="pun">,</span><span class="pln"> train</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">):</span></li><li class="L2"><span class="pln"> </span><span class="str">"""</span></li><li class="L3"><span class="str"> root_dir: The base directory of the IMDB dataset.</span></li><li class="L4"><span class="str"> train: A boolean flag indicating whether to use training or test data.</span></li><li class="L5"><span class="str"> """</span></li><li class="L6"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">root_dir </span><span class="pun">=</span><span class="pln"> os</span><span class="pun">.</span><span class="pln">path</span><span class="pun">.</span><span class="pln">join</span><span class="pun">(</span><span class="pln">root_dir</span><span class="pun">,</span><span class="pln"> </span><span class="str">"train"</span><span class="pln"> </span><span class="kwd">if</span><span class="pln"> train </span><span class="kwd">else</span><span class="pln"> </span><span class="str">"test"</span><span class="pun">)</span></li><li class="L7"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">neg_files </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[</span><span class="pln">os</span><span class="pun">.</span><span class="pln">path</span><span class="pun">.</span><span class="pln">join</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">.</span><span class="pln">root_dir</span><span class="pun">,</span><span class="pln"> </span><span class="str">"neg"</span><span class="pun">,</span><span class="pln"> f</span><span class="pun">)</span><span class="pln"> </span><span class="kwd">for</span><span class="pln"> f </span><span class="kwd">in</span><span class="pln"> os</span><span class="pun">.</span><span class="pln">listdir</span><span class="pun">(</span><span class="pln">os</span><span class="pun">.</span><span class="pln">path</span><span class="pun">.</span><span class="pln">join</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">.</span><span class="pln">root_dir</span><span class="pun">,</span><span class="pln"> </span><span class="str">"neg"</span><span class="pun">))</span><span class="pln"> </span><span class="kwd">if</span><span class="pln"> f</span><span class="pun">.</span><span class="pln">endswith</span><span class="pun">(</span><span class="str">'.txt'</span><span class="pun">)]</span></li><li class="L8"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">pos_files </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[</span><span class="pln">os</span><span 
class="pun">.</span><span class="pln">path</span><span class="pun">.</span><span class="pln">join</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">.</span><span class="pln">root_dir</span><span class="pun">,</span><span class="pln"> </span><span class="str">"pos"</span><span class="pun">,</span><span class="pln"> f</span><span class="pun">)</span><span class="pln"> </span><span class="kwd">for</span><span class="pln"> f </span><span class="kwd">in</span><span class="pln"> os</span><span class="pun">.</span><span class="pln">listdir</span><span class="pun">(</span><span class="pln">os</span><span class="pun">.</span><span class="pln">path</span><span class="pun">.</span><span class="pln">join</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">.</span><span class="pln">root_dir</span><span class="pun">,</span><span class="pln"> </span><span class="str">"pos"</span><span class="pun">))</span><span class="pln"> </span><span class="kwd">if</span><span class="pln"> f</span><span class="pun">.</span><span class="pln">endswith</span><span class="pun">(</span><span class="str">'.txt'</span><span class="pun">)]</span></li><li class="L9"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">files </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">neg_files </span><span class="pun">+</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">pos_files</span></li><li class="L0"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">labels </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[</span><span class="lit">0</span><span class="pun">]</span><span class="pln"> </span><span class="pun">*</span><span class="pln"> len</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">.</span><span class="pln">neg_files</span><span class="pun">)</span><span class="pln"> </span><span class="pun">+</span><span class="pln"> </span><span class="pun">[</span><span class="lit">1</span><span class="pun">]</span><span class="pln"> </span><span class="pun">*</span><span class="pln"> len</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">.</span><span class="pln">pos_files</span><span class="pun">)</span></li><li class="L1"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">pos_inx</span><span class="pun">=</span><span class="pln">len</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">.</span><span class="pln">pos_files</span><span class="pun">)</span></li><li class="L2"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __len__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">):</span></li><li class="L3"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> len</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">.</span><span class="pln">files</span><span class="pun">)</span></li><li class="L4"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __getitem__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> idx</span><span class="pun">):</span></li><li class="L5"><span class="pln"> file_path </span><span 
class="pun">=</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">files</span><span class="pun">[</span><span class="pln">idx</span><span class="pun">]</span></li><li class="L6"><span class="pln"> label </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">labels</span><span class="pun">[</span><span class="pln">idx</span><span class="pun">]</span></li><li class="L7"><span class="pln"> </span><span class="kwd">with</span><span class="pln"> open</span><span class="pun">(</span><span class="pln">file_path</span><span class="pun">,</span><span class="pln"> </span><span class="str">'r'</span><span class="pun">,</span><span class="pln"> encoding</span><span class="pun">=</span><span class="str">'utf-8'</span><span class="pun">)</span><span class="pln"> </span><span class="kwd">as</span><span class="pln"> file</span><span class="pun">:</span></li><li class="L8"><span class="pln"> content </span><span class="pun">=</span><span class="pln"> file</span><span class="pun">.</span><span class="pln">read</span><span class="pun">()</span></li><li class="L9"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> label</span><span class="pun">,</span><span class="pln"> content</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-16">Copied!</span></button></pre></td>
</tr>
<tr class="even">
<td>Iterators to train and test data sets</td>
<td>This code snippet builds the path to the IMDB data set directory by
combining the temporary directory name with the subdirectory name. It then
sets up the training and testing data sets, retrieves the index stored in
pos_inx, and prints the training items at the indices surrounding it.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li></ol><ol class="linenums"><li class="L0"><span class="pln">root_dir </span><span class="pun">=</span><span class="pln"> tempdir</span><span class="pun">.</span><span class="pln">name </span><span class="pun">+</span><span class="pln"> </span><span class="str">'/'</span><span class="pln"> </span><span class="pun">+</span><span class="pln"> </span><span class="str">'imdb_dataset'</span></li><li class="L1"><span class="pln">train_iter </span><span class="pun">=</span><span class="pln"> </span><span class="typ">IMDBDataset</span><span class="pun">(</span><span class="pln">root_dir</span><span class="pun">=</span><span class="pln">root_dir</span><span class="pun">,</span><span class="pln"> train</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span><span class="pln"> </span><span class="com"># For training data</span></li><li class="L2"><span class="pln">test_iter </span><span class="pun">=</span><span class="pln"> </span><span class="typ">IMDBDataset</span><span class="pun">(</span><span class="pln">root_dir</span><span class="pun">=</span><span class="pln">root_dir</span><span class="pun">,</span><span class="pln"> train</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">)</span><span class="pln"> </span><span class="com"># For test data</span></li><li class="L3"><span class="pln">start</span><span class="pun">=</span><span class="pln">train_iter</span><span class="pun">.</span><span class="pln">pos_inx</span></li><li class="L4"><span class="kwd">for</span><span class="pln"> i </span><span class="kwd">in</span><span class="pln"> range</span><span class="pun">(-</span><span class="lit">10</span><span class="pun">,</span><span class="lit">10</span><span class="pun">):</span></li><li class="L5"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="pln">train_iter</span><span class="pun">[</span><span class="pln">start</span><span class="pun">+</span><span class="pln">i</span><span class="pun">])</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-17">Copied!</span></button></pre></td>
</tr>
<tr class="odd">
<td>yield_tokens function</td>
<td>Generates tokens from a collection of text samples. The code snippet
passes each text in 'data_iter' through the tokenizer and yields its tokens,
providing efficient, on-the-fly tokenization suitable for tasks such as
building a vocabulary or training machine learning models. A sketch of
building a vocabulary from this generator follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li></ol><ol class="linenums"><li class="L0"><span class="pln">tokenizer </span><span class="pun">=</span><span class="pln"> get_tokenizer</span><span class="pun">(</span><span class="str">"basic_english"</span><span class="pun">)</span></li><li class="L1"><span class="kwd">def</span><span class="pln"> yield_tokens</span><span class="pun">(</span><span class="pln">data_iter</span><span class="pun">):</span></li><li class="L2"><span class="pln"> </span><span class="str">"""Yield tokens for each data sample."""</span></li><li class="L3"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> _</span><span class="pun">,</span><span class="pln"> text </span><span class="kwd">in</span><span class="pln"> data_iter</span><span class="pun">:</span></li><li class="L4"><span class="pln"> </span><span class="kwd">yield</span><span class="pln"> tokenizer</span><span class="pun">(</span><span class="pln">text</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-18">Copied!</span></button></pre></td>
</tr>
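<tr class="even">
<td>Build the vocabulary (sketch)</td>
<td>A natural next step, assumed here from the standard torchtext workflow, is
to build the vocabulary used by text_pipeline and the model from this token
generator.</td>
<td><pre class="prettyprint">
from torchtext.vocab import build_vocab_from_iterator

# Build the vocabulary from the token generator and map unknown words to the special token.
vocab = build_vocab_from_iterator(yield_tokens(train_iter), specials=["&lt;unk&gt;"])
vocab.set_default_index(vocab["&lt;unk&gt;"])
vocab_size = len(vocab)
</pre></td>
</tr>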
<tr class="even">
<td>Load pretrained model and its evaluation on test data</td>
<td>This code snippet downloads a pretrained model from a URL, loads its
weights into the Net architecture, and evaluates it on the test data set to
assess its performance. A sketch of an evaluate helper consistent with this
call follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li></ol><ol class="linenums"><li class="L0"><span class="pln">urlopened </span><span class="pun">=</span><span class="pln"> urlopen</span><span class="pun">(</span><span class="str">'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/q66IH6a7lglkZ4haM6hB1w/model-IMDB%20dataset%20small2.pth'</span><span class="pun">)</span></li><li class="L1"><span class="pln">model_ </span><span class="pun">=</span><span class="pln"> </span><span class="typ">Net</span><span class="pun">(</span><span class="pln">vocab_size</span><span class="pun">=</span><span class="pln">vocab_size</span><span class="pun">,</span><span class="pln"> num_class</span><span class="pun">=</span><span class="lit">2</span><span class="pun">).</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L2"><span class="pln">model_</span><span class="pun">.</span><span class="pln">load_state_dict</span><span class="pun">(</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">load</span><span class="pun">(</span><span class="pln">io</span><span class="pun">.</span><span class="typ">BytesIO</span><span class="pun">(</span><span class="pln">urlopened</span><span class="pun">.</span><span class="pln">read</span><span class="pun">()),</span><span class="pln"> map_location</span><span class="pun">=</span><span class="pln">device</span><span class="pun">))</span></li><li class="L3"><span class="pln">evaluate</span><span class="pun">(</span><span class="pln">test_dataloader</span><span class="pun">,</span><span class="pln"> model_</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-19">Copied!</span></button></pre></td>
</tr>
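<tr class="odd">
<td>evaluate helper (sketch)</td>
<td>The evaluate helper called above is defined earlier in the document. The
sketch below shows an accuracy-style helper consistent with that call and with
the (label, text) batches produced by the collate function; treat it as
illustrative rather than the original implementation.</td>
<td><pre class="prettyprint">
def evaluate(dataloader, model):
    # Compute classification accuracy over a dataloader that yields (label, text) batches.
    model.eval()
    total_acc, total_count = 0, 0
    with torch.no_grad():
        for label, text in dataloader:
            predicted = model(text)
            total_acc += (predicted.argmax(1) == label).sum().item()
            total_count += label.size(0)
    return total_acc / total_count
</pre></td>
</tr>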
<tr class="odd">
<td>Loading the Hugging Face model</td>
<td>This code snippet instantiates a tokenizer from the pretrained
'bert-base-cased' model and downloads a pretrained model for the masked
language modeling (MLM) task. The commented lines show how to start training
from scratch by loading only the model configuration. A sketch of the masking
data collator typically paired with this model follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li></ol><ol class="linenums"><li class="L0"><span class="com"># Instantiate a tokenizer using the BERT base cased model</span></li><li class="L1"><span class="pln">tokenizer </span><span class="pun">=</span><span class="pln"> </span><span class="typ">AutoTokenizer</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="str">"bert-base-cased"</span><span class="pun">)</span></li><li class="L2"><span class="com"># Download pretrained model from huggingface.co and cache.</span></li><li class="L3"><span class="pln">model </span><span class="pun">=</span><span class="pln"> </span><span class="typ">BertForMaskedLM</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="str">'bert-base-cased'</span><span class="pun">)</span></li><li class="L4"><span class="com"># You can also start training from scratch by loading the model configuration</span></li><li class="L5"><span class="com"># config = AutoConfig.from_pretrained("google-bert/bert-base-cased")</span></li><li class="L6"><span class="com"># model = BertForMaskedLM.from_config(config)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-20">Copied!</span></button></pre></td>
</tr>
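<tr class="even">
<td>MLM data collator (sketch)</td>
<td>For the MLM objective, the tokenized text is usually paired with a masking
data collator. The sketch below uses the transformers
DataCollatorForLanguageModeling class; it is not part of the original snippet,
and the masking probability shown is the library's common default.</td>
<td><pre class="prettyprint">
from transformers import DataCollatorForLanguageModeling

# Randomly mask 15% of the input tokens so BertForMaskedLM learns to recover them.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
</pre></td>
</tr>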
<tr class="even">
<td>Training a BERT model for MLM task</td>
<td>This code snippet configures the training arguments and instantiates a
trainer for the specified model and data set. Ensure that 'SFTTrainer' is the
appropriate trainer class for the task and that the model is properly defined
before training; the call that launches training is sketched in the next
row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li></ol><ol class="linenums"><li class="L0"><span class="pln">training_args </span><span class="pun">=</span><span class="pln"> </span><span class="typ">TrainingArguments</span><span class="pun">(</span></li><li class="L1"><span class="pln"> output_dir</span><span class="pun">=</span><span class="str">"./trained_model"</span><span class="pun">,</span><span class="pln"> </span><span class="com"># Specify the output directory for the trained model</span></li><li class="L2"><span class="pln"> overwrite_output_dir</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,</span></li><li class="L3"><span class="pln"> do_eval</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">,</span></li><li class="L4"><span class="pln"> learning_rate</span><span class="pun">=</span><span class="lit">5e-5</span><span class="pun">,</span></li><li class="L5"><span class="pln"> num_train_epochs</span><span class="pun">=</span><span class="lit">1</span><span class="pun">,</span><span class="pln"> </span><span class="com"># Specify the number of training epochs</span></li><li class="L6"><span class="pln"> per_device_train_batch_size</span><span class="pun">=</span><span class="lit">2</span><span class="pun">,</span><span class="pln"> </span><span class="com"># Set the batch size for training</span></li><li class="L7"><span class="pln"> save_total_limit</span><span class="pun">=</span><span class="lit">2</span><span class="pun">,</span><span class="pln"> </span><span class="com"># Limit the total number of saved checkpoints</span></li><li class="L8"><span class="pln"> logging_steps </span><span class="pun">=</span><span class="pln"> </span><span class="lit">20</span></li><li class="L9"><span class="pun">)</span></li><li class="L0"><span class="pln">dataset </span><span class="pun">=</span><span class="pln"> load_dataset</span><span class="pun">(</span><span class="str">"imdb"</span><span class="pun">,</span><span class="pln"> split</span><span class="pun">=</span><span class="str">"train"</span><span class="pun">)</span></li><li class="L1"><span class="pln">trainer </span><span class="pun">=</span><span class="pln"> </span><span class="typ">SFTTrainer</span><span class="pun">(</span></li><li class="L2"><span class="pln"> model</span><span class="pun">,</span></li><li class="L3"><span class="pln"> args</span><span class="pun">=</span><span class="pln">training_args</span><span class="pun">,</span></li><li class="L4"><span class="pln"> train_dataset</span><span class="pun">=</span><span class="pln">dataset</span><span class="pun">,</span></li><li class="L5"><span class="pln"> dataset_text_field</span><span class="pun">=</span><span class="str">"text"</span><span class="pun">,</span></li><li class="L6"><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-21">Copied!</span></button></pre></td>
</tr>
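<tr class="odd">
<td>Launch training (sketch)</td>
<td>The snippet above only constructs the trainer. Training and saving are
launched separately, for example as sketched below with the trainer built
above.</td>
<td><pre class="prettyprint">
# Run fine-tuning and write the resulting checkpoint to the output directory.
trainer.train()
trainer.save_model("./trained_model")
</pre></td>
</tr>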
<tr class="odd">
<td>Load the model and tokenizer</td>
<td>Useful for tasks where you need to quickly classify the sentiment of
a piece of text with a pretrained, efficient transformer model. A sketch of
tokenizing an input sentence for this model follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li></ol><ol class="linenums"><li class="L0"><span class="pln">tokenizer </span><span class="pun">=</span><span class="pln"> </span><span class="typ">DistilBertTokenizer</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="str">"distilbert-base-uncased-finetuned-sst-2-english"</span><span class="pun">)</span></li><li class="L1"><span class="pln">model </span><span class="pun">=</span><span class="pln"> </span><span class="typ">DistilBertForSequenceClassification</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="str">"distilbert-base-uncased-finetuned-sst-2-english"</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-22">Copied!</span></button></pre></td>
</tr>
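<tr class="even">
<td>Tokenize the input (sketch)</td>
<td>The inference snippet in the next row expects a dictionary named inputs.
The sketch below shows a typical way to produce it; the sample sentence is
illustrative.</td>
<td><pre class="prettyprint">
# Tokenize a sample sentence into PyTorch tensors for the classification model.
inputs = tokenizer("This movie was surprisingly good!", return_tensors="pt")
</pre></td>
</tr>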
<tr class="even">
<td>torch.no_grad()</td>
<td>The torch.no_grad() context manager disables gradient calculation.
This reduces memory consumption and speeds up computation, as gradients
are unnecessary for inference (for example, when you are not training
the model). The **inputs syntax is used to unpack a dictionary of
keyword arguments in Python. A sketch of converting the resulting logits into
a label follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li></ol><ol class="linenums"><li class="L0"><span class="com"># Perform inference</span></li><li class="L1"><span class="kwd">with</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">no_grad</span><span class="pun">():</span></li><li class="L2"><span class="pln"> outputs </span><span class="pun">=</span><span class="pln"> model</span><span class="pun">(**</span><span class="pln">inputs</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-23">Copied!</span></button></pre></td>
</tr>
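<tr class="odd">
<td>From logits to label (sketch)</td>
<td>To turn the logits returned above into a sentiment label, a common
follow-up, sketched below, uses the id2label mapping stored in the model
configuration; it assumes a single input sentence.</td>
<td><pre class="prettyprint">
# Pick the highest-scoring class and map it to its human-readable label.
predicted_class_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class_id])
</pre></td>
</tr>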
<tr class="odd">
<td>GPT-2 tokenizer</td>
<td>Initializes the GPT-2 tokenizer from the pretrained model to handle
encoding and decoding. A sketch of encoding a prompt with this tokenizer
follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li></ol><ol class="linenums"><li class="L0"><span class="com"># Load the tokenizer and model</span></li><li class="L1"><span class="pln">tokenizer </span><span class="pun">=</span><span class="pln"> GPT2Tokenizer</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="str">"gpt2"</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-24">Copied!</span></button></pre></td>
</tr>
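<tr class="even">
<td>Encode a prompt (sketch)</td>
<td>The generation snippet further below reads inputs.input_ids and
inputs.attention_mask. The sketch shows a typical way to produce them; the
prompt text is illustrative.</td>
<td><pre class="prettyprint">
# Encode a prompt into input IDs and an attention mask for model.generate().
inputs = tokenizer("Once upon a time", return_tensors="pt")
</pre></td>
</tr>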
<tr class="even">
<td>Load GPT-2 model</td>
<td>This code snippet loads the pretrained GPT-2 language model, making it
ready for text generation and other language tasks.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li></ol><ol class="linenums"><li class="L0"><span class="com"># Load the tokenizer and model</span></li><li class="L1"><span class="pln">model </span><span class="pun">=</span><span class="pln"> GPT2LMHeadModel</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="str">"gpt2"</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-25">Copied!</span></button></pre></td>
</tr>
<tr class="odd">
<td>Generate text</td>
<td>This code snippet generates text sequences from the encoded input with
model.generate(); alternatively, as shown, a plain forward pass can be run
under torch.no_grad() so that no gradients are computed.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li></ol><ol class="linenums"><li class="L0"><span class="com"># Generate text</span></li><li class="L1"><span class="pln">output_ids </span><span class="pun">=</span><span class="pln"> model</span><span class="pun">.</span><span class="pln">generate</span><span class="pun">(</span></li><li class="L2"><span class="pln"> inputs</span><span class="pun">.</span><span class="pln">input_ids</span><span class="pun">,</span></li><li class="L3"><span class="pln"> attention_mask</span><span class="pun">=</span><span class="pln">inputs</span><span class="pun">.</span><span class="pln">attention_mask</span><span class="pun">,</span></li><li class="L4"><span class="pln"> pad_token_id</span><span class="pun">=</span><span class="pln">tokenizer</span><span class="pun">.</span><span class="pln">eos_token_id</span><span class="pun">,</span></li><li class="L5"><span class="pln"> max_length</span><span class="pun">=</span><span class="lit">50</span><span class="pun">,</span></li><li class="L6"><span class="pln"> num_return_sequences</span><span class="pun">=</span><span class="lit">1</span></li><li class="L7"><span class="pun">)</span></li><li class="L8"><span class="pln">output_ids</span></li><li class="L9"><span class="kwd">or</span></li><li class="L0"><span class="kwd">with</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">no_grad</span><span class="pun">():</span></li><li class="L1"><span class="pln"> outputs </span><span class="pun">=</span><span class="pln"> model</span><span class="pun">(**</span><span class="pln">inputs</span><span class="pun">)</span></li><li class="L2"><span class="pln">outputs</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-26">Copied!</span></button></pre></td>
</tr>
<tr class="even">
<td>Decode the generated text</td>
<td>This code snippet decodes the token IDs generated by the model into a
readable string and prints it.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li></ol><ol class="linenums"><li class="L0"><span class="com"># Decode the generated text</span></li><li class="L1"><span class="pln">generated_text </span><span class="pun">=</span><span class="pln"> tokenizer</span><span class="pun">.</span><span class="pln">decode</span><span class="pun">(</span><span class="pln">output_ids</span><span class="pun">[</span><span class="lit">0</span><span class="pun">],</span><span class="pln"> skip_special_tokens</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li><li class="L2"><span class="kwd">print</span><span class="pun">(</span><span class="pln">generated_text</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-27">Copied!</span></button></pre></td>
</tr>
<tr class="odd">
<td>Hugging Face pipeline() function</td>
<td>The pipeline() function from the Hugging Face transformers library
is a high-level API designed to simplify the usage of pretrained models
for various natural language processing (NLP) tasks. It abstracts the
complexities of model loading, tokenization, inference, and
post-processing, allowing users to perform complex NLP tasks with just a
few lines of code. A usage sketch follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li></ol><ol class="linenums"><li class="L0"><span class="pln">transformers</span><span class="pun">.</span><span class="pln">pipeline</span><span class="pun">(</span></li><li class="L1"><span class="pln"> task</span><span class="pun">:</span><span class="pln"> str</span><span class="pun">,</span></li><li class="L2"><span class="pln"> model</span><span class="pun">:</span><span class="pln"> </span><span class="typ">Optional</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">None</span><span class="pun">,</span></li><li class="L3"><span class="pln"> config</span><span class="pun">:</span><span class="pln"> </span><span class="typ">Optional</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">None</span><span class="pun">,</span></li><li class="L4"><span class="pln"> tokenizer</span><span class="pun">:</span><span class="pln"> </span><span class="typ">Optional</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">None</span><span class="pun">,</span></li><li class="L5"><span class="pln"> feature_extractor</span><span class="pun">:</span><span class="pln"> </span><span class="typ">Optional</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">None</span><span class="pun">,</span></li><li class="L6"><span class="pln"> framework</span><span class="pun">:</span><span class="pln"> </span><span class="typ">Optional</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">None</span><span class="pun">,</span></li><li class="L7"><span class="pln"> revision</span><span class="pun">:</span><span class="pln"> str </span><span class="pun">=</span><span class="pln"> </span><span class="str">'main'</span><span class="pun">,</span></li><li class="L8"><span class="pln"> use_fast</span><span class="pun">:</span><span class="pln"> </span><span class="kwd">bool</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">True</span><span class="pun">,</span></li><li class="L9"><span class="pln"> model_kwargs</span><span class="pun">:</span><span class="pln"> </span><span class="typ">Dict</span><span class="pun">[</span><span class="pln">str</span><span class="pun">,</span><span class="pln"> </span><span class="typ">Any</span><span class="pun">]</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">None</span><span class="pun">,</span></li><li class="L0"><span class="pln"> </span><span class="pun">**</span><span class="pln">kwargs</span></li><li class="L1"><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-28">Copied!</span></button></pre></td>
</tr>
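<tr class="even">
<td>pipeline() usage (sketch)</td>
<td>A typical call is sketched below for sentiment analysis; with no model
argument, the library falls back to its default checkpoint for the task, so
the exact model used is an assumption of this example.</td>
<td><pre class="prettyprint">
from transformers import pipeline

# Build a sentiment-analysis pipeline and classify a sample sentence.
classifier = pipeline("sentiment-analysis")
print(classifier("Fine-tuning transformers is easier than I expected."))
</pre></td>
</tr>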
<tr class="even">
<td>formatting_prompts_func_no_response function</td>
<td>These prompt functions generate formatted text prompts from the
instructions in the data set. formatting_prompts_func includes both the
instruction and its response, while formatting_prompts_func_no_response
includes only the instruction followed by an empty response header.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li></ol><ol class="linenums"><li class="L0"><span class="kwd">def</span><span class="pln"> formatting_prompts_func</span><span class="pun">(</span><span class="pln">mydataset</span><span class="pun">):</span></li><li class="L1"><span class="pln"> output_texts </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[]</span></li><li class="L2"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> i </span><span class="kwd">in</span><span class="pln"> range</span><span class="pun">(</span><span class="pln">len</span><span class="pun">(</span><span class="pln">mydataset</span><span class="pun">[</span><span class="str">'instruction'</span><span class="pun">])):</span></li><li class="L3"><span class="pln"> text </span><span class="pun">=</span><span class="pln"> </span><span class="pun">(</span></li><li class="L4"><span class="pln"> f</span><span class="str">"### Instruction:\n{mydataset['instruction'][i]}"</span></li><li class="L5"><span class="pln"> f</span><span class="str">"\n\n### Response:\n{mydataset['output'][i]}"</span></li><li class="L6"><span class="pln"> </span><span class="pun">)</span></li><li class="L7"><span class="pln"> output_texts</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">text</span><span class="pun">)</span></li><li class="L8"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> output_texts</span></li><li class="L9"><span class="kwd">def</span><span class="pln"> formatting_prompts_func_no_response</span><span class="pun">(</span><span class="pln">mydataset</span><span class="pun">):</span></li><li class="L0"><span class="pln"> output_texts </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[]</span></li><li class="L1"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> i </span><span class="kwd">in</span><span class="pln"> range</span><span class="pun">(</span><span class="pln">len</span><span class="pun">(</span><span class="pln">mydataset</span><span class="pun">[</span><span class="str">'instruction'</span><span class="pun">])):</span></li><li class="L2"><span class="pln"> text </span><span class="pun">=</span><span class="pln"> </span><span class="pun">(</span></li><li class="L3"><span class="pln"> f</span><span class="str">"### Instruction:\n{mydataset['instruction'][i]}"</span></li><li class="L4"><span class="pln"> f</span><span class="str">"\n\n### Response:\n"</span></li><li class="L5"><span class="pln"> </span><span class="pun">)</span></li><li class="L6"><span class="pln"> output_texts</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">text</span><span class="pun">)</span></li><li class="L7"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> output_texts</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-29">Copied!</span></button></pre></td>
</tr>
<tr class="odd">
<td>expected_outputs</td>
<td>Tokenize the instructions and the instructions_with_responses. Then, count
the number of tokens in each tokenized instruction and discard that many
tokens, minus one, from the beginning of the corresponding tokenized
instructions_with_responses vector; the offset of one accounts for the eos
token at the end of the instruction-only tokenization. Finally, decode the
resulting vector with the tokenizer, skipping special tokens, to obtain the
expected_output.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li></ol><ol class="linenums"><li class="L0"><span class="pln">expected_outputs </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[]</span></li><li class="L1"><span class="pln">instructions_with_responses </span><span class="pun">=</span><span class="pln"> formatting_prompts_func</span><span class="pun">(</span><span class="pln">test_dataset</span><span class="pun">)</span></li><li class="L2"><span class="pln">instructions </span><span class="pun">=</span><span class="pln"> formatting_prompts_func_no_response</span><span class="pun">(</span><span class="pln">test_dataset</span><span class="pun">)</span></li><li class="L3"><span class="kwd">for</span><span class="pln"> i </span><span class="kwd">in</span><span class="pln"> tqdm</span><span class="pun">(</span><span class="pln">range</span><span class="pun">(</span><span class="pln">len</span><span class="pun">(</span><span class="pln">instructions_with_responses</span><span class="pun">))):</span></li><li class="L4"><span class="pln"> tokenized_instruction_with_response </span><span class="pun">=</span><span class="pln"> tokenizer</span><span class="pun">(</span><span class="pln">instructions_with_responses</span><span class="pun">[</span><span class="pln">i</span><span class="pun">],</span><span class="pln"> return_tensors</span><span class="pun">=</span><span class="str">"pt"</span><span class="pun">,</span><span class="pln"> max_length</span><span class="pun">=</span><span class="lit">1024</span><span class="pun">,</span><span class="pln"> truncation</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,</span><span class="pln"> padding</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">)</span></li><li class="L5"><span class="pln"> tokenized_instruction </span><span class="pun">=</span><span class="pln"> tokenizer</span><span class="pun">(</span><span class="pln">instructions</span><span class="pun">[</span><span class="pln">i</span><span class="pun">],</span><span class="pln"> return_tensors</span><span class="pun">=</span><span class="str">"pt"</span><span class="pun">)</span></li><li class="L6"><span class="pln"> expected_output </span><span class="pun">=</span><span class="pln"> tokenizer</span><span class="pun">.</span><span class="pln">decode</span><span class="pun">(</span><span class="pln">tokenized_instruction_with_response</span><span class="pun">[</span><span class="str">'input_ids'</span><span class="pun">][</span><span class="lit">0</span><span class="pun">][</span><span class="pln">len</span><span class="pun">(</span><span class="pln">tokenized_instruction</span><span class="pun">[</span><span class="str">'input_ids'</span><span class="pun">][</span><span class="lit">0</span><span class="pun">])-</span><span class="lit">1</span><span class="pun">:],</span><span class="pln"> skip_special_tokens</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li><li class="L7"><span class="pln"> expected_outputs</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">expected_output</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" 
id="md-code-block-copy-30">Copied!</span></button></pre></td>
</tr>
<tr class="even">
<td>ListDataset</td>
<td>Inherits from Dataset and creates a torch Dataset from a list. This
class is then used to generate a Dataset object from instructions.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li></ol><ol class="linenums"><li class="L0"><span class="kwd">class</span><span class="pln"> </span><span class="typ">ListDataset</span><span class="pun">(</span><span class="typ">Dataset</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __init__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> original_list</span><span class="pun">):</span></li><li class="L2"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">original_list </span><span class="pun">=</span><span class="pln"> original_list</span></li><li class="L3"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __len__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">):</span></li><li class="L4"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> len</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">.</span><span class="pln">original_list</span><span class="pun">)</span></li><li class="L5"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __getitem__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> i</span><span class="pun">):</span></li><li class="L6"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">original_list</span><span class="pun">[</span><span class="pln">i</span><span class="pun">]</span></li><li class="L7"><span class="pln">instructions_torch </span><span class="pun">=</span><span class="pln"> </span><span class="typ">ListDataset</span><span class="pun">(</span><span class="pln">instructions</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-31">Copied!</span></button></pre></td>
</tr>
<tr class="odd">
<td>gen_pipeline</td>
<td>This code snippet builds a Hugging Face text-generation pipeline from the
model and tokenizer, configuring the device, batch size, maximum output
length, truncation, padding, and whether the prompt is returned together with
the generated text.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li></ol><ol class="linenums"><li class="L0"><span class="pln">gen_pipeline </span><span class="pun">=</span><span class="pln"> pipeline</span><span class="pun">(</span><span class="str">"text-generation"</span><span class="pun">,</span></li><li class="L1"><span class="pln"> model</span><span class="pun">=</span><span class="pln">model</span><span class="pun">,</span></li><li class="L2"><span class="pln"> tokenizer</span><span class="pun">=</span><span class="pln">tokenizer</span><span class="pun">,</span></li><li class="L3"><span class="pln"> device</span><span class="pun">=</span><span class="pln">device</span><span class="pun">,</span></li><li class="L4"><span class="pln"> batch_size</span><span class="pun">=</span><span class="lit">2</span><span class="pun">,</span></li><li class="L5"><span class="pln"> max_length</span><span class="pun">=</span><span class="lit">50</span><span class="pun">,</span></li><li class="L6"><span class="pln"> truncation</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,</span></li><li class="L7"><span class="pln"> padding</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">,</span></li><li class="L8"><span class="pln"> return_full_text</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-32">Copied!</span></button></pre></td>
</tr>
<tr class="even">
<td>torch.no_grad()</td>
<td>This code generates text from the given inputs using the pipeline while
conserving resources by limiting the number of records and the output length and by
disabling gradient calculations with torch.no_grad(); the base model's generated
text is collected into a list.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li></ol><ol class="linenums"><li class="L0"><span class="kwd">with</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">no_grad</span><span class="pun">():</span></li><li class="L1"><span class="pln"> </span><span class="com"># Due to resource limitation, only apply the function on 3 records using "instructions_torch[:10]"</span></li><li class="L2"><span class="pln"> pipeline_iterator</span><span class="pun">=</span><span class="pln"> gen_pipeline</span><span class="pun">(</span><span class="pln">instructions_torch</span><span class="pun">[:</span><span class="lit">3</span><span class="pun">],</span></li><li class="L3"><span class="pln"> max_length</span><span class="pun">=</span><span class="lit">50</span><span class="pun">,</span><span class="pln"> </span><span class="com"># this is set to 50 due to resource constraint, using a GPU, you can increase it to the length of your choice</span></li><li class="L4"><span class="pln"> num_beams</span><span class="pun">=</span><span class="lit">5</span><span class="pun">,</span></li><li class="L5"><span class="pln"> early_stopping</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,)</span></li><li class="L6"><span class="pln">generated_outputs_base </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[]</span></li><li class="L7"><span class="kwd">for</span><span class="pln"> text </span><span class="kwd">in</span><span class="pln"> pipeline_iterator</span><span class="pun">:</span></li><li class="L8"><span class="pln"> generated_outputs_base</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">text</span><span class="pun">[</span><span class="lit">0</span><span class="pun">][</span><span class="str">"generated_text"</span><span class="pun">])</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-33">Copied!</span></button></pre></td>
</tr>
<tr class="odd">
<td>SFTTrainer</td>
<td>This code snippet defines the training configuration with 'SFTConfig' (output
directory, epochs, batch sizes, maximum sequence length, and evaluation settings)
and then initializes the 'SFTTrainer' with the model, the train and evaluation
datasets, the formatting function, and the data collator. An illustrative sketch for
launching training follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li></ol><ol class="linenums"><li class="L0"><span class="pln">training_args </span><span class="pun">=</span><span class="pln"> </span><span class="typ">SFTConfig</span><span class="pun">(</span></li><li class="L1"><span class="pln"> output_dir</span><span class="pun">=</span><span class="str">"/tmp"</span><span class="pun">,</span></li><li class="L2"><span class="pln"> num_train_epochs</span><span class="pun">=</span><span class="lit">10</span><span class="pun">,</span></li><li class="L3"><span class="pln"> save_strategy</span><span class="pun">=</span><span class="str">"epoch"</span><span class="pun">,</span></li><li class="L4"><span class="pln"> fp16</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,</span></li><li class="L5"><span class="pln"> per_device_train_batch_size</span><span class="pun">=</span><span class="lit">2</span><span class="pun">,</span><span class="pln"> </span><span class="com"># Reduce batch size</span></li><li class="L6"><span class="pln"> per_device_eval_batch_size</span><span class="pun">=</span><span class="lit">2</span><span class="pun">,</span><span class="pln"> </span><span class="com"># Reduce batch size</span></li><li class="L7"><span class="pln"> max_seq_length</span><span class="pun">=</span><span class="lit">1024</span><span class="pun">,</span></li><li class="L8"><span class="pln"> do_eval</span><span class="pun">=</span><span class="kwd">True</span></li><li class="L9"><span class="pun">)</span></li><li class="L0"><span class="pln">trainer </span><span class="pun">=</span><span class="pln"> </span><span class="typ">SFTTrainer</span><span class="pun">(</span></li><li class="L1"><span class="pln"> model</span><span class="pun">,</span></li><li class="L2"><span class="pln"> train_dataset</span><span class="pun">=</span><span class="pln">train_dataset</span><span class="pun">,</span></li><li class="L3"><span class="pln"> eval_dataset</span><span class="pun">=</span><span class="pln">test_dataset</span><span class="pun">,</span></li><li class="L4"><span class="pln"> formatting_func</span><span class="pun">=</span><span class="pln">formatting_prompts_func</span><span class="pun">,</span></li><li class="L5"><span class="pln"> args</span><span class="pun">=</span><span class="pln">training_args</span><span class="pun">,</span></li><li class="L6"><span class="pln"> packing</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">,</span></li><li class="L7"><span class="pln"> data_collator</span><span class="pun">=</span><span class="pln">collator</span><span class="pun">,</span></li><li class="L8"><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-34">Copied!</span></button></pre></td>
</tr>
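<tr class="even">
<td>SFTTrainer usage (illustrative)</td>
<td>A minimal sketch, not from the original lab, showing how the 'trainer'
configured in the previous row would typically be used: 'trainer.train()' runs the
supervised fine-tuning loop, and 'trainer.save_model()' writes the fine-tuned
weights; the output path shown here is hypothetical.</td>
<td><pre class="prettyprint"><code># Assumes `trainer` was built as in the previous row
trainer.train()                        # run supervised fine-tuning
trainer.save_model("/tmp/sft_model")   # hypothetical output directory
</code></pre></td>
</tr>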
<tr class="even">
<td>torch.no_grad()</td>
<td>This code snippet generates text sequences with the pipeline and collects the
outputs of the LoRA fine-tuned model into a list. Wrapping the call in
torch.no_grad() disables gradient computation, which reduces memory usage and speeds
up inference.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li></ol><ol class="linenums"><li class="L0"><span class="kwd">with</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">no_grad</span><span class="pun">():</span></li><li class="L1"><span class="pln"> </span><span class="com"># Due to resource limitation, only apply the function on 3 records using "instructions_torch[:10]"</span></li><li class="L2"><span class="pln"> pipeline_iterator</span><span class="pun">=</span><span class="pln"> gen_pipeline</span><span class="pun">(</span><span class="pln">instructions_torch</span><span class="pun">[:</span><span class="lit">3</span><span class="pun">],</span></li><li class="L3"><span class="pln"> max_length</span><span class="pun">=</span><span class="lit">50</span><span class="pun">,</span><span class="pln"> </span><span class="com"># this is set to 50 due to resource constraint, using a GPU, you can increase it to the length of your choice</span></li><li class="L4"><span class="pln"> num_beams</span><span class="pun">=</span><span class="lit">5</span><span class="pun">,</span></li><li class="L5"><span class="pln"> early_stopping</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,)</span></li><li class="L6"><span class="pln">generated_outputs_lora </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[]</span></li><li class="L7"><span class="kwd">for</span><span class="pln"> text </span><span class="kwd">in</span><span class="pln"> pipeline_iterator</span><span class="pun">:</span></li><li class="L8"><span class="pln"> generated_outputs_lora</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">text</span><span class="pun">[</span><span class="lit">0</span><span class="pun">][</span><span class="str">"generated_text"</span><span class="pun">])</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-35">Copied!</span></button></pre></td>
</tr>
<tr class="odd">
<td>load_summarize_chain</td>
<td>This code snippet uses the LangChain library to load a summarization chain with
a specific language model and the "stuff" chain type, then applies the chain to web
data and prints the resulting summary. An illustrative way to obtain 'web_data' is
sketched in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li></ol><ol class="linenums"><li class="L0"><span class="kwd">from</span><span class="pln"> langchain</span><span class="pun">.</span><span class="pln">chains</span><span class="pun">.</span><span class="pln">summarize </span><span class="kwd">import</span><span class="pln"> load_summarize_chain</span></li><li class="L1"><span class="pln">chain </span><span class="pun">=</span><span class="pln"> load_summarize_chain</span><span class="pun">(</span><span class="pln">llm</span><span class="pun">=</span><span class="pln">mixtral_llm</span><span class="pun">,</span><span class="pln"> chain_type</span><span class="pun">=</span><span class="str">"stuff"</span><span class="pun">,</span><span class="pln"> verbose</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">)</span></li><li class="L2"><span class="pln">response </span><span class="pun">=</span><span class="pln"> chain</span><span class="pun">.</span><span class="pln">invoke</span><span class="pun">(</span><span class="pln">web_data</span><span class="pun">)</span></li><li class="L3"><span class="kwd">print</span><span class="pun">(</span><span class="pln">response</span><span class="pun">[</span><span class="str">'output_text'</span><span class="pun">])</span><span class="pln">n</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-36">Copied!</span></button></pre></td>
</tr>
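<tr class="even">
<td>Loading web_data (illustrative)</td>
<td>A minimal sketch, assuming 'web_data' in the previous row comes from a LangChain
document loader such as 'WebBaseLoader'; the URL is only an example. The loader
returns a list of Document objects that the summarization chain can consume.</td>
<td><pre class="prettyprint"><code>from langchain_community.document_loaders import WebBaseLoader

# Example URL only; replace with the page you want to summarize
loader = WebBaseLoader("https://www.ibm.com/topics/artificial-intelligence")
web_data = loader.load()   # list of Document objects
print(len(web_data))
</code></pre></td>
</tr>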
<tr class="even">
<td>TextClassifier</td>
<td>Represents a simple text classifier that uses an embedding layer initialized
from pretrained GloVe vectors, a hidden linear layer with a ReLU activation, and an
output linear layer. The constructor takes the following arguments:
num_classes: The number of classes to classify.
freeze: Whether to freeze the embedding layer.
An illustrative forward pass is shown in the next row.
</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li></ol><ol class="linenums"><li class="L0"><span class="kwd">from</span><span class="pln"> torch </span><span class="kwd">import</span><span class="pln"> nn</span></li><li class="L1"><span class="kwd">class</span><span class="pln"> </span><span class="typ">TextClassifier</span><span class="pun">(</span><span class="pln">nn</span><span class="pun">.</span><span class="typ">Module</span><span class="pun">):</span></li><li class="L2"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __init__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> num_classes</span><span class="pun">,</span><span class="pln">freeze</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">):</span></li><li class="L3"><span class="pln"> </span><span class="kwd">super</span><span class="pun">(</span><span class="typ">TextClassifier</span><span class="pun">,</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">).</span><span class="pln">__init__</span><span class="pun">()</span></li><li class="L4"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">embedding </span><span class="pun">=</span><span class="pln"> nn</span><span class="pun">.</span><span class="typ">Embedding</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="pln">glove_embedding</span><span class="pun">.</span><span class="pln">vectors</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">),</span><span class="pln">freeze</span><span class="pun">=</span><span class="pln">freeze</span><span class="pun">)</span></li><li class="L5"><span class="pln"> </span><span class="com"># An example of adding additional layers: A linear layer and a ReLU activation</span></li><li class="L6"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">fc1 </span><span class="pun">=</span><span class="pln"> nn</span><span class="pun">.</span><span class="typ">Linear</span><span class="pun">(</span><span class="pln">in_features</span><span class="pun">=</span><span class="lit">100</span><span class="pun">,</span><span class="pln"> out_features</span><span class="pun">=</span><span class="lit">128</span><span class="pun">)</span></li><li class="L7"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">relu </span><span class="pun">=</span><span class="pln"> nn</span><span class="pun">.</span><span class="typ">ReLU</span><span class="pun">()</span></li><li class="L8"><span class="pln"> </span><span class="com"># The output layer that gives the final probabilities for the classes</span></li><li class="L9"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">fc2 </span><span class="pun">=</span><span class="pln"> nn</span><span class="pun">.</span><span class="typ">Linear</span><span class="pun">(</span><span class="pln">in_features</span><span class="pun">=</span><span class="lit">128</span><span class="pun">,</span><span class="pln"> 
out_features</span><span class="pun">=</span><span class="pln">num_classes</span><span class="pun">)</span></li><li class="L0"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> forward</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> x</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="com"># Pass the input through the embedding layer</span></li><li class="L2"><span class="pln"> x </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">embedding</span><span class="pun">(</span><span class="pln">x</span><span class="pun">)</span></li><li class="L3"><span class="pln"> </span><span class="com"># Here you can use a simple mean pooling</span></li><li class="L4"><span class="pln"> x </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">mean</span><span class="pun">(</span><span class="pln">x</span><span class="pun">,</span><span class="pln"> dim</span><span class="pun">=</span><span class="lit">1</span><span class="pun">)</span></li><li class="L5"><span class="pln"> </span><span class="com"># Pass the pooled embeddings through the additional layers</span></li><li class="L6"><span class="pln"> x </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">fc1</span><span class="pun">(</span><span class="pln">x</span><span class="pun">)</span></li><li class="L7"><span class="pln"> x </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">relu</span><span class="pun">(</span><span class="pln">x</span><span class="pun">)</span></li><li class="L8"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">fc2</span><span class="pun">(</span><span class="pln">x</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-37">Copied!</span></button></pre></td>
</tr>
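<tr class="odd">
<td>TextClassifier usage (illustrative)</td>
<td>A minimal sketch, not from the original lab, that instantiates the
'TextClassifier' defined above and runs a forward pass on a dummy batch of token
indices. It assumes 'glove_embedding' and 'device' are defined as elsewhere in this
cheat sheet.</td>
<td><pre class="prettyprint"><code>import torch

model = TextClassifier(num_classes=4, freeze=True).to(device)
# Dummy batch: 2 sequences of 10 token indices drawn from the GloVe vocabulary
dummy_batch = torch.randint(0, glove_embedding.vectors.shape[0], (2, 10)).to(device)
logits = model(dummy_batch)
print(logits.shape)   # torch.Size([2, 4])
</code></pre></td>
</tr>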
<tr class="odd">
<td>Train the model</td>
<td>This code snippet defines a function to train a machine learning model using
PyTorch. The function trains the model over a specified number of epochs, tracks the
cumulative loss per epoch, evaluates accuracy on the validation data set, and keeps
track of the best result. An illustrative call appears in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li><li>21</li><li>22</li><li>23</li><li>24</li><li>25</li><li>26</li></ol><ol class="linenums"><li class="L0"><span class="kwd">def</span><span class="pln"> train_model</span><span class="pun">(</span><span class="pln">model</span><span class="pun">,</span><span class="pln"> optimizer</span><span class="pun">,</span><span class="pln"> criterion</span><span class="pun">,</span><span class="pln"> train_dataloader</span><span class="pun">,</span><span class="pln"> valid_dataloader</span><span class="pun">,</span><span class="pln"> epochs</span><span class="pun">=</span><span class="lit">100</span><span class="pun">,</span><span class="pln"> model_name</span><span class="pun">=</span><span class="str">"my_modeldrop"</span><span class="pun">):</span></li><li class="L1"><span class="pln"> cum_loss_list </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[]</span></li><li class="L2"><span class="pln"> acc_epoch </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[]</span></li><li class="L3"><span class="pln"> best_acc </span><span class="pun">=</span><span class="pln"> </span><span class="lit">0</span></li><li class="L4"><span class="pln"> file_name </span><span class="pun">=</span><span class="pln"> model_name</span></li><li class="L5"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> epoch </span><span class="kwd">in</span><span class="pln"> tqdm</span><span class="pun">(</span><span class="pln">range</span><span class="pun">(</span><span class="lit">1</span><span class="pun">,</span><span class="pln"> epochs </span><span class="pun">+</span><span class="pln"> </span><span class="lit">1</span><span class="pun">)):</span></li><li class="L6"><span class="pln"> model</span><span class="pun">.</span><span class="pln">train</span><span class="pun">()</span></li><li class="L7"><span class="pln"> cum_loss </span><span class="pun">=</span><span class="pln"> </span><span class="lit">0</span></li><li class="L8"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> _</span><span class="pun">,</span><span class="pln"> </span><span class="pun">(</span><span class="pln">label</span><span class="pun">,</span><span class="pln"> text</span><span class="pun">)</span><span class="pln"> </span><span class="kwd">in</span><span class="pln"> enumerate</span><span class="pun">(</span><span class="pln">train_dataloader</span><span class="pun">):</span></li><li class="L9"><span class="pln"> optimizer</span><span class="pun">.</span><span class="pln">zero_grad</span><span class="pun">()</span></li><li class="L0"><span class="pln"> predicted_label </span><span class="pun">=</span><span class="pln"> model</span><span class="pun">(</span><span class="pln">text</span><span class="pun">)</span></li><li class="L1"><span class="pln"> loss </span><span class="pun">=</span><span class="pln"> criterion</span><span class="pun">(</span><span class="pln">predicted_label</span><span class="pun">,</span><span class="pln"> label</span><span class="pun">)</span></li><li class="L2"><span class="pln"> loss</span><span class="pun">.</span><span class="pln">backward</span><span class="pun">()</span></li><li class="L3"><span class="pln"> 
torch</span><span class="pun">.</span><span class="pln">nn</span><span class="pun">.</span><span class="pln">utils</span><span class="pun">.</span><span class="pln">clip_grad_norm_</span><span class="pun">(</span><span class="pln">model</span><span class="pun">.</span><span class="pln">parameters</span><span class="pun">(),</span><span class="pln"> </span><span class="lit">0.1</span><span class="pun">)</span></li><li class="L4"><span class="pln"> optimizer</span><span class="pun">.</span><span class="pln">step</span><span class="pun">()</span></li><li class="L5"><span class="pln"> cum_loss </span><span class="pun">+=</span><span class="pln"> loss</span><span class="pun">.</span><span class="pln">item</span><span class="pun">()</span></li><li class="L6"><span class="pln"> </span><span class="com">#print("Loss:", cum_loss)</span></li><li class="L7"><span class="pln"> cum_loss_list</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">cum_loss</span><span class="pun">)</span></li><li class="L8"><span class="pln"> acc_val </span><span class="pun">=</span><span class="pln"> evaluate</span><span class="pun">(</span><span class="pln">valid_dataloader</span><span class="pun">,</span><span class="pln"> model</span><span class="pun">,</span><span class="pln"> device</span><span class="pun">)</span></li><li class="L9"><span class="pln"> acc_epoch</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">acc_val</span><span class="pun">)</span></li><li class="L0"><span class="pln"> </span><span class="kwd">if</span><span class="pln"> acc_val </span><span class="pun">></span><span class="pln"> best_acc</span><span class="pun">:</span></li><li class="L1"><span class="pln"> best_acc </span><span class="pun">=</span><span class="pln"> acc_val</span></li><li class="L2"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="pln">f</span><span class="str">"New best accuracy: {acc_val:.4f}"</span><span class="pun">)</span></li><li class="L3"><span class="pln"> </span><span class="com">#torch.save(model.state_dict(), f"{model_name}.pth")</span></li><li class="L4"><span class="pln"> </span><span class="com">#save_list_to_file(cum_loss_list, f"{model_name}_loss.pkl")</span></li><li class="L5"><span class="pln"> </span><span class="com">#save_list_to_file(acc_epoch, f"{model_name}_acc.pkl")</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-38">Copied!</span></button></pre></td>
</tr>
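<tr class="even">
<td>Calling train_model (illustrative)</td>
<td>A minimal sketch, not from the original lab, showing how the 'train_model'
function above might be called. It assumes the model, the train and validation
dataloaders, and the 'evaluate' helper used inside the function are already defined;
the learning rate, epoch count, and model name are arbitrary.</td>
<td><pre class="prettyprint"><code>import torch

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # arbitrary learning rate
train_model(model, optimizer, criterion,
            train_dataloader, valid_dataloader,
            epochs=10, model_name="baseline_classifier")    # hypothetical name
</code></pre></td>
</tr>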
<tr class="even">
<td>def plot_matrix_and_subspace(F)</td>
<td>This code snippet visualizes the column vectors of a matrix F in 3D space and,
when F has exactly two columns, also plots the plane (subspace) they span. An
example call is shown in the row below.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li><li>21</li><li>22</li></ol><ol class="linenums"><li class="L0"><span class="kwd">def</span><span class="pln"> plot_matrix_and_subspace</span><span class="pun">(</span><span class="pln">F</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="kwd">assert</span><span class="pln"> F</span><span class="pun">.</span><span class="pln">shape</span><span class="pun">[</span><span class="lit">0</span><span class="pun">]</span><span class="pln"> </span><span class="pun">==</span><span class="pln"> </span><span class="lit">3</span><span class="pun">,</span><span class="pln"> </span><span class="str">"Matrix F must have rows equal to 3 for 3D visualization."</span></li><li class="L2"><span class="pln"> ax </span><span class="pun">=</span><span class="pln"> plt</span><span class="pun">.</span><span class="pln">figure</span><span class="pun">().</span><span class="pln">add_subplot</span><span class="pun">(</span><span class="pln">projection</span><span class="pun">=</span><span class="str">'3d'</span><span class="pun">)</span></li><li class="L3"><span class="pln"> </span><span class="com"># Plot each column vector of F as a point and line from the origin</span></li><li class="L4"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> i </span><span class="kwd">in</span><span class="pln"> range</span><span class="pun">(</span><span class="pln">F</span><span class="pun">.</span><span class="pln">shape</span><span class="pun">[</span><span class="lit">1</span><span class="pun">]):</span></li><li class="L5"><span class="pln"> ax</span><span class="pun">.</span><span class="pln">quiver</span><span class="pun">(</span><span class="lit">0</span><span class="pun">,</span><span class="pln"> </span><span class="lit">0</span><span class="pun">,</span><span class="pln"> </span><span class="lit">0</span><span class="pun">,</span><span class="pln"> F</span><span class="pun">[</span><span class="lit">0</span><span class="pun">,</span><span class="pln"> i</span><span class="pun">],</span><span class="pln"> F</span><span class="pun">[</span><span class="lit">1</span><span class="pun">,</span><span class="pln"> i</span><span class="pun">],</span><span class="pln"> F</span><span class="pun">[</span><span class="lit">2</span><span class="pun">,</span><span class="pln"> i</span><span class="pun">],</span><span class="pln"> color</span><span class="pun">=</span><span class="str">'blue'</span><span class="pun">,</span><span class="pln"> arrow_length_ratio</span><span class="pun">=</span><span class="lit">0.1</span><span class="pun">,</span><span class="pln"> label</span><span class="pun">=</span><span class="pln">f</span><span class="str">'Column {i+1}'</span><span class="pun">)</span></li><li class="L6"><span class="pln"> </span><span class="kwd">if</span><span class="pln"> F</span><span class="pun">.</span><span class="pln">shape</span><span class="pun">[</span><span class="lit">1</span><span class="pun">]</span><span class="pln"> </span><span class="pun">==</span><span class="pln"> </span><span class="lit">2</span><span class="pun">:</span></li><li class="L7"><span class="pln"> </span><span class="com"># Calculate the normal to the plane spanned by the columns of F if 
they are exactly two</span></li><li class="L8"><span class="pln"> normal_vector </span><span class="pun">=</span><span class="pln"> np</span><span class="pun">.</span><span class="pln">cross</span><span class="pun">(</span><span class="pln">F</span><span class="pun">[:,</span><span class="pln"> </span><span class="lit">0</span><span class="pun">],</span><span class="pln"> F</span><span class="pun">[:,</span><span class="pln"> </span><span class="lit">1</span><span class="pun">])</span></li><li class="L9"><span class="pln"> </span><span class="com"># Plot the plane</span></li><li class="L0"><span class="pln"> xx</span><span class="pun">,</span><span class="pln"> yy </span><span class="pun">=</span><span class="pln"> np</span><span class="pun">.</span><span class="pln">meshgrid</span><span class="pun">(</span><span class="pln">np</span><span class="pun">.</span><span class="pln">linspace</span><span class="pun">(-</span><span class="lit">3</span><span class="pun">,</span><span class="pln"> </span><span class="lit">3</span><span class="pun">,</span><span class="pln"> </span><span class="lit">10</span><span class="pun">),</span><span class="pln"> np</span><span class="pun">.</span><span class="pln">linspace</span><span class="pun">(-</span><span class="lit">3</span><span class="pun">,</span><span class="pln"> </span><span class="lit">3</span><span class="pun">,</span><span class="pln"> </span><span class="lit">10</span><span class="pun">))</span></li><li class="L1"><span class="pln"> zz </span><span class="pun">=</span><span class="pln"> </span><span class="pun">(-</span><span class="pln">normal_vector</span><span class="pun">[</span><span class="lit">0</span><span class="pun">]</span><span class="pln"> </span><span class="pun">*</span><span class="pln"> xx </span><span class="pun">-</span><span class="pln"> normal_vector</span><span class="pun">[</span><span class="lit">1</span><span class="pun">]</span><span class="pln"> </span><span class="pun">*</span><span class="pln"> yy</span><span class="pun">)</span><span class="pln"> </span><span class="pun">/</span><span class="pln"> normal_vector</span><span class="pun">[</span><span class="lit">2</span><span class="pun">]</span><span class="pln"> </span><span class="kwd">if</span><span class="pln"> normal_vector</span><span class="pun">[</span><span class="lit">2</span><span class="pun">]</span><span class="pln"> </span><span class="pun">!=</span><span class="pln"> </span><span class="lit">0</span><span class="pln"> </span><span class="kwd">else</span><span class="pln"> </span><span class="lit">0</span></li><li class="L2"><span class="pln"> ax</span><span class="pun">.</span><span class="pln">plot_surface</span><span class="pun">(</span><span class="pln">xx</span><span class="pun">,</span><span class="pln"> yy</span><span class="pun">,</span><span class="pln"> zz</span><span class="pun">,</span><span class="pln"> alpha</span><span class="pun">=</span><span class="lit">0.5</span><span class="pun">,</span><span class="pln"> color</span><span class="pun">=</span><span class="str">'green'</span><span class="pun">,</span><span class="pln"> label</span><span class="pun">=</span><span class="str">'Spanned Plane'</span><span class="pun">)</span></li><li class="L3"><span class="pln"> </span><span class="com"># Set plot limits and labels</span></li><li class="L4"><span class="pln"> ax</span><span class="pun">.</span><span class="pln">set_xlim</span><span class="pun">([-</span><span class="lit">3</span><span class="pun">,</span><span class="pln"> 
</span><span class="lit">3</span><span class="pun">])</span></li><li class="L5"><span class="pln"> ax</span><span class="pun">.</span><span class="pln">set_ylim</span><span class="pun">([-</span><span class="lit">3</span><span class="pun">,</span><span class="pln"> </span><span class="lit">3</span><span class="pun">])</span></li><li class="L6"><span class="pln"> ax</span><span class="pun">.</span><span class="pln">set_zlim</span><span class="pun">([-</span><span class="lit">3</span><span class="pun">,</span><span class="pln"> </span><span class="lit">3</span><span class="pun">])</span></li><li class="L7"><span class="pln"> ax</span><span class="pun">.</span><span class="pln">set_xlabel</span><span class="pun">(</span><span class="str">'$x_{1}$'</span><span class="pun">)</span></li><li class="L8"><span class="pln"> ax</span><span class="pun">.</span><span class="pln">set_ylabel</span><span class="pun">(</span><span class="str">'$x_{2}$'</span><span class="pun">)</span></li><li class="L9"><span class="pln"> ax</span><span class="pun">.</span><span class="pln">set_zlabel</span><span class="pun">(</span><span class="str">'$x_{3}$'</span><span class="pun">)</span></li><li class="L0"><span class="pln"> </span><span class="com">#ax.legend()</span></li><li class="L1"><span class="pln"> plt</span><span class="pun">.</span><span class="pln">show</span><span class="pun">()</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-39">Copied!</span></button></pre></td>
</tr>
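<tr class="odd">
<td>Calling plot_matrix_and_subspace (illustrative)</td>
<td>A minimal sketch, not from the original lab, that calls the plotting helper on a
3x2 matrix; because the matrix has exactly two columns, the green plane spanned by
them is drawn as well. It assumes numpy and matplotlib are imported as np and plt.</td>
<td><pre class="prettyprint"><code>import numpy as np

# Two linearly independent 3-D column vectors
F = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
plot_matrix_and_subspace(F)   # plots the vectors and the plane they span
</code></pre></td>
</tr>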
<tr class="odd">
<td>nn.Parameter</td>
<td>This code defines the low-rank matrices A and B of the 'LoRALayer' module as
learnable parameters with 'nn.Parameter'. The layer computes alpha * (x @ A @ B) and
is used as an adapter alongside existing layers in a simple neural network. A
standalone usage sketch follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li></ol><ol class="linenums"><li class="L0"><span class="kwd">class</span><span class="pln"> </span><span class="typ">LoRALayer</span><span class="pun">(</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">nn</span><span class="pun">.</span><span class="typ">Module</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __init__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> in_dim</span><span class="pun">,</span><span class="pln"> out_dim</span><span class="pun">,</span><span class="pln"> rank</span><span class="pun">,</span><span class="pln"> alpha</span><span class="pun">):</span></li><li class="L2"><span class="pln"> </span><span class="kwd">super</span><span class="pun">().</span><span class="pln">__init__</span><span class="pun">()</span></li><li class="L3"><span class="pln"> std_dev </span><span class="pun">=</span><span class="pln"> </span><span class="lit">1</span><span class="pln"> </span><span class="pun">/</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">sqrt</span><span class="pun">(</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">tensor</span><span class="pun">(</span><span class="pln">rank</span><span class="pun">).</span><span class="kwd">float</span><span class="pun">())</span></li><li class="L4"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">A </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">nn</span><span class="pun">.</span><span class="typ">Parameter</span><span class="pun">(</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">randn</span><span class="pun">(</span><span class="pln">in_dim</span><span class="pun">,</span><span class="pln"> rank</span><span class="pun">)</span><span class="pln"> </span><span class="pun">*</span><span class="pln"> std_dev</span><span class="pun">)</span></li><li class="L5"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">B </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">nn</span><span class="pun">.</span><span class="typ">Parameter</span><span class="pun">(</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">zeros</span><span class="pun">(</span><span class="pln">rank</span><span class="pun">,</span><span class="pln"> out_dim</span><span class="pun">))</span></li><li class="L6"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">alpha </span><span class="pun">=</span><span class="pln"> alpha</span></li><li class="L7"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> forward</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> x</span><span class="pun">):</span></li><li class="L8"><span class="pln"> x </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">alpha </span><span class="pun">*</span><span class="pln"> </span><span class="pun">(</span><span class="pln">x 
</span><span class="pun">@</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">A </span><span class="pun">@</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">B</span><span class="pun">)</span></li><li class="L9"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> x</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-40">Copied!</span></button></pre></td>
</tr>
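<tr class="even">
<td>LoRALayer usage (illustrative)</td>
<td>A minimal sketch, not from the original lab, that exercises the 'LoRALayer'
defined above on its own. Because B is initialized to zeros, the adapter's output is
zero before any training, which is the standard LoRA initialization; the dimensions,
rank, and alpha are arbitrary.</td>
<td><pre class="prettyprint"><code>import torch

lora = LoRALayer(in_dim=128, out_dim=64, rank=4, alpha=1.0)
x = torch.randn(8, 128)                 # batch of 8 feature vectors
out = lora(x)
print(out.shape)                        # torch.Size([8, 64])
print(torch.allclose(out, torch.zeros_like(out)))   # True: B starts at zero
</code></pre></td>
</tr>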
<tr class="even">
<td>LinearWithLoRA class</td>
<td>This code snippet defines the 'LinearWithLoRA' wrapper, which combines an
existing linear layer with a 'LoRALayer' adapter; the forward pass returns the sum
of the original linear output and the low-rank LoRA output. An illustrative wrapping
of an nn.Linear layer follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li></ol><ol class="linenums"><li class="L0"><span class="kwd">class</span><span class="pln"> </span><span class="typ">LinearWithLoRA</span><span class="pun">(</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">nn</span><span class="pun">.</span><span class="typ">Module</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> __init__</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> linear</span><span class="pun">,</span><span class="pln"> rank</span><span class="pun">,</span><span class="pln"> alpha</span><span class="pun">):</span></li><li class="L2"><span class="pln"> </span><span class="kwd">super</span><span class="pun">().</span><span class="pln">__init__</span><span class="pun">()</span></li><li class="L3"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">linear </span><span class="pun">=</span><span class="pln"> linear</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L4"><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">lora </span><span class="pun">=</span><span class="pln"> </span><span class="typ">LoRALayer</span><span class="pun">(</span></li><li class="L5"><span class="pln"> linear</span><span class="pun">.</span><span class="pln">in_features</span><span class="pun">,</span><span class="pln"> linear</span><span class="pun">.</span><span class="pln">out_features</span><span class="pun">,</span><span class="pln"> rank</span><span class="pun">,</span><span class="pln"> alpha</span></li><li class="L6"><span class="pln"> </span><span class="pun">).</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L7"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> forward</span><span class="pun">(</span><span class="kwd">self</span><span class="pun">,</span><span class="pln"> x</span><span class="pun">):</span></li><li class="L8"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">linear</span><span class="pun">(</span><span class="pln">x</span><span class="pun">)</span><span class="pln"> </span><span class="pun">+</span><span class="pln"> </span><span class="kwd">self</span><span class="pun">.</span><span class="pln">lora</span><span class="pun">(</span><span class="pln">x</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-41">Copied!</span></button></pre></td>
</tr>
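<tr class="odd">
<td>LinearWithLoRA usage (illustrative)</td>
<td>A minimal sketch, not from the original lab, that wraps a frozen nn.Linear layer
with the 'LinearWithLoRA' class above and counts the trainable parameters; only the
adapter's A and B matrices remain trainable. It assumes 'device' is defined; the
layer sizes, rank, and alpha are arbitrary.</td>
<td><pre class="prettyprint"><code>import torch

base = torch.nn.Linear(128, 64).to(device)
base.weight.requires_grad = False       # freeze the original weights
base.bias.requires_grad = False
layer = LinearWithLoRA(base, rank=4, alpha=1.0)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)                        # 128*4 + 4*64 = 768 adapter parameters
</code></pre></td>
</tr>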
<tr class="odd">
<td>Applying LoRA</td>
<td>To fine-tune with LoRA, first load a pretrained TextClassifier model, load its
pretrained state from a file, and then disable gradient updates for all its
parameters to prevent further training. Here, you load a model that was pretrained
on the AG NEWS data set, which has 4 classes, so you initialize the model with
num_classes set to 4. Moreover, the pretrained AG NEWS model was trained with the
embedding layer unfrozen, so you initialize the model with freeze=False. Although
you are initializing the model with layers unfrozen and the wrong number of classes
for your task, you will make modifications to the model later that correct this. A
sketch that verifies the freezing follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li></ol><ol class="linenums"><li class="L0"><span class="kwd">from</span><span class="pln"> urllib</span><span class="pun">.</span><span class="pln">request </span><span class="kwd">import</span><span class="pln"> urlopen</span></li><li class="L1"><span class="kwd">import</span><span class="pln"> io</span></li><li class="L2"><span class="pln">model_lora</span><span class="pun">=</span><span class="typ">TextClassifier</span><span class="pun">(</span><span class="pln">num_classes</span><span class="pun">=</span><span class="lit">4</span><span class="pun">,</span><span class="pln">freeze</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">)</span></li><li class="L3"><span class="pln">model_lora</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L4"><span class="pln">urlopened </span><span class="pun">=</span><span class="pln"> urlopen</span><span class="pun">(</span><span class="str">'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/uGC04Pom651hQs1XrZ0NsQ/my-model-freeze-false.pth'</span><span class="pun">)</span></li><li class="L5"><span class="pln">stream </span><span class="pun">=</span><span class="pln"> io</span><span class="pun">.</span><span class="typ">BytesIO</span><span class="pun">(</span><span class="pln">urlopened</span><span class="pun">.</span><span class="pln">read</span><span class="pun">())</span></li><li class="L6"><span class="pln">state_dict </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">load</span><span class="pun">(</span><span class="pln">stream</span><span class="pun">,</span><span class="pln"> map_location</span><span class="pun">=</span><span class="pln">device</span><span class="pun">)</span></li><li class="L7"><span class="pln">model_lora</span><span class="pun">.</span><span class="pln">load_state_dict</span><span class="pun">(</span><span class="pln">state_dict</span><span class="pun">)</span></li><li class="L8"><span class="com"># Here, you freeze all layers:</span></li><li class="L9"><span class="kwd">for</span><span class="pln"> parm </span><span class="kwd">in</span><span class="pln"> model_lora</span><span class="pun">.</span><span class="pln">parameters</span><span class="pun">():</span></li><li class="L0"><span class="pln"> parm</span><span class="pun">.</span><span class="pln">requires_grad</span><span class="pun">=</span><span class="kwd">False</span></li><li class="L1"><span class="pln">model_lora</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-42">Copied!</span></button></pre></td>
</tr>
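<tr class="even">
<td>Verifying the freeze (illustrative)</td>
<td>A minimal sketch, not from the original lab, that checks the effect of the code
above by counting trainable versus total parameters of 'model_lora'; immediately
after freezing, the trainable count should be 0, and it becomes nonzero again once
the LoRA adapter and the new output head are attached.</td>
<td><pre class="prettyprint"><code>total = sum(p.numel() for p in model_lora.parameters())
trainable = sum(p.numel() for p in model_lora.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable} / {total}")   # trainable should be 0 here
</code></pre></td>
</tr>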
<tr class="even">
<td>Select rank and alpha</td>
<td>The given code snippet evaluates the performance of the text classification
model with varying configurations of 'LoRALayer'. It sweeps combinations of the rank
and alpha hyperparameters, trains the model for each combination, and records the
resulting validation accuracy. A sketch for summarizing the recorded results follows
in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li><li>21</li><li>22</li><li>23</li><li>24</li><li>25</li><li>26</li><li>27</li><li>28</li><li>29</li><li>30</li><li>31</li><li>32</li><li>33</li><li>34</li><li>35</li><li>36</li><li>37</li><li>38</li></ol><ol class="linenums"><li class="L0"><span class="pln">ranks </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[</span><span class="lit">1</span><span class="pun">,</span><span class="pln"> </span><span class="lit">2</span><span class="pun">,</span><span class="pln"> </span><span class="lit">5</span><span class="pun">,</span><span class="pln"> </span><span class="lit">10</span><span class="pun">]</span></li><li class="L1"><span class="pln">alphas </span><span class="pun">=</span><span class="pln"> </span><span class="pun">[</span><span class="lit">0.1</span><span class="pun">,</span><span class="pln"> </span><span class="lit">0.5</span><span class="pun">,</span><span class="pln"> </span><span class="lit">1.0</span><span class="pun">,</span><span class="pln"> </span><span class="lit">2.0</span><span class="pun">,</span><span class="pln"> </span><span class="lit">5.0</span><span class="pun">]</span></li><li class="L2"><span class="pln">results</span><span class="pun">=[]</span></li><li class="L3"><span class="pln">accuracy_old</span><span class="pun">=</span><span class="lit">0</span></li><li class="L4"><span class="com"># Loop over each combination of 'r' and 'alpha'</span></li><li class="L5"><span class="kwd">for</span><span class="pln"> r </span><span class="kwd">in</span><span class="pln"> ranks</span><span class="pun">:</span></li><li class="L6"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> alpha </span><span class="kwd">in</span><span class="pln"> alphas</span><span class="pun">:</span></li><li class="L7"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="pln">f</span><span class="str">"Testing with rank = {r} and alpha = {alpha}"</span><span class="pun">)</span></li><li class="L8"><span class="pln"> model_name</span><span class="pun">=</span><span class="pln">f</span><span class="str">"model_lora_rank{r}_alpha{alpha}_AGtoIBDM_final_adam_"</span></li><li class="L9"><span class="pln"> model_lora</span><span class="pun">=</span><span class="typ">TextClassifier</span><span class="pun">(</span><span class="pln">num_classes</span><span class="pun">=</span><span class="lit">4</span><span class="pun">,</span><span class="pln">freeze</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">)</span></li><li class="L0"><span class="pln"> model_lora</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L1"><span class="pln"> urlopened </span><span class="pun">=</span><span class="pln"> urlopen</span><span class="pun">(</span><span class="str">'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/uGC04Pom651hQs1XrZ0NsQ/my-model-freeze-false.pth'</span><span class="pun">)</span></li><li class="L2"><span class="pln"> stream </span><span class="pun">=</span><span class="pln"> io</span><span class="pun">.</span><span class="typ">BytesIO</span><span 
class="pun">(</span><span class="pln">urlopened</span><span class="pun">.</span><span class="pln">read</span><span class="pun">())</span></li><li class="L3"><span class="pln"> state_dict </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">load</span><span class="pun">(</span><span class="pln">stream</span><span class="pun">,</span><span class="pln"> map_location</span><span class="pun">=</span><span class="pln">device</span><span class="pun">)</span></li><li class="L4"><span class="pln"> model_lora</span><span class="pun">.</span><span class="pln">load_state_dict</span><span class="pun">(</span><span class="pln">state_dict</span><span class="pun">)</span></li><li class="L5"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> parm </span><span class="kwd">in</span><span class="pln"> model_lora</span><span class="pun">.</span><span class="pln">parameters</span><span class="pun">():</span></li><li class="L6"><span class="pln"> parm</span><span class="pun">.</span><span class="pln">requires_grad</span><span class="pun">=</span><span class="kwd">False</span></li><li class="L7"><span class="pln"> model_lora</span><span class="pun">.</span><span class="pln">fc2</span><span class="pun">=</span><span class="pln">nn</span><span class="pun">.</span><span class="typ">Linear</span><span class="pun">(</span><span class="pln">in_features</span><span class="pun">=</span><span class="lit">128</span><span class="pun">,</span><span class="pln"> out_features</span><span class="pun">=</span><span class="lit">2</span><span class="pun">,</span><span class="pln"> bias</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li><li class="L8"><span class="pln"> model_lora</span><span class="pun">.</span><span class="pln">fc1</span><span class="pun">=</span><span class="typ">LinearWithLoRA</span><span class="pun">(</span><span class="pln">model_lora</span><span class="pun">.</span><span class="pln">fc1</span><span class="pun">,</span><span class="pln">rank</span><span class="pun">=</span><span class="pln">r</span><span class="pun">,</span><span class="pln"> alpha</span><span class="pun">=</span><span class="pln">alpha </span><span class="pun">)</span></li><li class="L9"><span class="pln"> optimizer </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">optim</span><span class="pun">.</span><span class="typ">Adam</span><span class="pun">(</span><span class="pln">model_lora</span><span class="pun">.</span><span class="pln">parameters</span><span class="pun">(),</span><span class="pln"> lr</span><span class="pun">=</span><span class="pln">LR</span><span class="pun">)</span></li><li class="L0"><span class="pln"> scheduler </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">optim</span><span class="pun">.</span><span class="pln">lr_scheduler</span><span class="pun">.</span><span class="typ">ExponentialLR</span><span class="pun">(</span><span class="pln">optimizer</span><span class="pun">,</span><span class="pln"> gamma</span><span class="pun">=</span><span class="lit">0.1</span><span class="pun">)</span></li><li class="L1"><span class="pln"> model_lora</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L2"><span class="pln"> train_model</span><span class="pun">(</span><span 
class="pln">model_lora</span><span class="pun">,</span><span class="pln"> optimizer</span><span class="pun">,</span><span class="pln"> criterion</span><span class="pun">,</span><span class="pln"> train_dataloader</span><span class="pun">,</span><span class="pln"> valid_dataloader</span><span class="pun">,</span><span class="pln"> epochs</span><span class="pun">=</span><span class="lit">300</span><span class="pun">,</span><span class="pln"> model_name</span><span class="pun">=</span><span class="pln">model_name</span><span class="pun">)</span></li><li class="L3"><span class="pln"> accuracy</span><span class="pun">=</span><span class="pln">evaluate</span><span class="pun">(</span><span class="pln">valid_dataloader </span><span class="pun">,</span><span class="pln"> model_lora</span><span class="pun">,</span><span class="pln"> device</span><span class="pun">)</span></li><li class="L4"><span class="pln"> result </span><span class="pun">=</span><span class="pln"> </span><span class="pun">{</span></li><li class="L5"><span class="pln"> </span><span class="str">'rank'</span><span class="pun">:</span><span class="pln"> r</span><span class="pun">,</span></li><li class="L6"><span class="pln"> </span><span class="str">'alpha'</span><span class="pun">:</span><span class="pln"> alpha</span><span class="pun">,</span></li><li class="L7"><span class="pln"> </span><span class="str">'accuracy'</span><span class="pun">:</span><span class="pln">accuracy</span></li><li class="L8"><span class="pln"> </span><span class="pun">}</span></li><li class="L9"><span class="pln"> </span><span class="com"># Append the dictionary to the results list</span></li><li class="L0"><span class="pln"> results</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">result</span><span class="pun">)</span></li><li class="L1"><span class="pln"> </span><span class="kwd">if</span><span class="pln"> accuracy</span><span class="pun">></span><span class="pln">accuracy_old</span><span class="pun">:</span></li><li class="L2"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="pln">f</span><span class="str">"Testing with rank = {r} and alpha = {alpha}"</span><span class="pun">)</span></li><li class="L3"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="pln">f</span><span class="str">"accuracy: {accuracy} accuracy_old: {accuracy_old}"</span><span class="pln"> </span><span class="pun">)</span></li><li class="L4"><span class="pln"> accuracy_old</span><span class="pun">=</span><span class="pln">accuracy</span></li><li class="L5"><span class="pln"> torch</span><span class="pun">.</span><span class="pln">save</span><span class="pun">(</span><span class="pln">model</span><span class="pun">.</span><span class="pln">state_dict</span><span class="pun">(),</span><span class="pln"> f</span><span class="str">"{model_name}.pth"</span><span class="pun">)</span></li><li class="L6"><span class="pln"> save_list_to_file</span><span class="pun">(</span><span class="pln">cum_loss_list</span><span class="pun">,</span><span class="pln"> f</span><span class="str">"{model_name}_loss.pkl"</span><span class="pun">)</span></li><li class="L7"><span class="pln"> save_list_to_file</span><span class="pun">(</span><span class="pln">acc_epoch</span><span class="pun">,</span><span class="pln"> f</span><span class="str">"{model_name}_acc.pkl"</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block 
copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-43">Copied!</span></button></pre></td>
</tr>
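<tr class="odd">
<td>Summarizing the sweep (illustrative)</td>
<td>A minimal sketch, not from the original lab, that sorts the 'results' list built
by the sweep above and prints the best rank/alpha combinations by validation
accuracy.</td>
<td><pre class="prettyprint"><code># Rank the tested configurations by validation accuracy
best = sorted(results, key=lambda r: r["accuracy"], reverse=True)
for r in best[:3]:
    print(f"rank={r['rank']}  alpha={r['alpha']}  accuracy={r['accuracy']:.4f}")
</code></pre></td>
</tr>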
<tr class="odd">
<td>Training setup for model_lora</td>
<td>Sets up the training components for the model, defining a learning
rate of 1, using cross-entropy loss as the criterion, optimizing with
stochastic gradient descent (SGD), and scheduling the learning rate to
decay by a factor of 0.1 at each epoch.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li></ol><ol class="linenums"><li class="L0"><span class="pln">LR</span><span class="pun">=</span><span class="lit">1</span></li><li class="L1"><span class="pln">criterion </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">nn</span><span class="pun">.</span><span class="typ">CrossEntropyLoss</span><span class="pun">()</span></li><li class="L2"><span class="pln">optimizer </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">optim</span><span class="pun">.</span><span class="pln">SGD</span><span class="pun">(</span><span class="pln">model_lora</span><span class="pun">.</span><span class="pln">parameters</span><span class="pun">(),</span><span class="pln"> lr</span><span class="pun">=</span><span class="pln">LR</span><span class="pun">)</span></li><li class="L3"><span class="pln">scheduler </span><span class="pun">=</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">optim</span><span class="pun">.</span><span class="pln">lr_scheduler</span><span class="pun">.</span><span class="typ">StepLR</span><span class="pun">(</span><span class="pln">optimizer</span><span class="pun">,</span><span class="pln"> </span><span class="lit">1.0</span><span class="pun">,</span><span class="pln"> gamma</span><span class="pun">=</span><span class="lit">0.1</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-44">Copied!</span></button></pre></td>
</tr>
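<tr class="even">
<td>Single training epoch (usage sketch)</td>
<td>A hedged usage sketch showing how the criterion, optimizer, and scheduler defined above would typically be applied for one epoch; the names model_lora, train_dataloader, and device, and the (texts, labels) batch layout, are assumptions rather than code from the original notebook.</td>
<td><pre class="prettyprint">
# Minimal single-epoch sketch (assumed names: model_lora, train_dataloader, device;
# the (texts, labels) batch layout is also an assumption).
model_lora.train()
for texts, labels in train_dataloader:
    optimizer.zero_grad()                  # clear gradients from the previous step
    logits = model_lora(texts.to(device))
    loss = criterion(logits, labels.to(device))
    loss.backward()                        # backpropagate
    optimizer.step()                       # SGD update with the LR defined above
scheduler.step()                           # decay the learning rate after the epoch
</pre></td>
</tr>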
<tr class="even">
<td>load_dataset</td>
<td>The IMDB data set is loaded with the load_dataset function from the Hugging Face datasets library, specifically the "train" split; the "text" column is then renamed to "review" and reviews shorter than 200 characters are filtered out. An optional subsetting sketch follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li></ol><ol class="linenums"><li class="L0"><span class="pln">dataset_name </span><span class="pun">=</span><span class="pln"> </span><span class="str">"imdb"</span></li><li class="L1"><span class="pln">ds </span><span class="pun">=</span><span class="pln"> load_dataset</span><span class="pun">(</span><span class="pln">dataset_name</span><span class="pun">,</span><span class="pln"> split </span><span class="pun">=</span><span class="pln"> </span><span class="str">"train"</span><span class="pun">)</span></li><li class="L2"><span class="pln">N </span><span class="pun">=</span><span class="pln"> </span><span class="lit">5</span></li><li class="L3"><span class="kwd">for</span><span class="pln"> sample </span><span class="kwd">in</span><span class="pln"> range</span><span class="pun">(</span><span class="pln">N</span><span class="pun">):</span></li><li class="L4"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="str">'text'</span><span class="pun">,</span><span class="pln">ds</span><span class="pun">[</span><span class="pln">sample</span><span class="pun">][</span><span class="str">'text'</span><span class="pun">])</span></li><li class="L5"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="str">'label'</span><span class="pun">,</span><span class="pln">ds</span><span class="pun">[</span><span class="pln">sample</span><span class="pun">][</span><span class="str">'label'</span><span class="pun">])</span></li><li class="L6"><span class="pln">ds </span><span class="pun">=</span><span class="pln"> ds</span><span class="pun">.</span><span class="pln">rename_columns</span><span class="pun">({</span><span class="str">"text"</span><span class="pun">:</span><span class="pln"> </span><span class="str">"review"</span><span class="pun">})</span></li><li class="L7"><span class="pln">ds</span></li><li class="L8"><span class="pln">ds </span><span class="pun">=</span><span class="pln"> ds</span><span class="pun">.</span><span class="pln">filter</span><span class="pun">(</span><span class="kwd">lambda</span><span class="pln"> x</span><span class="pun">:</span><span class="pln"> len</span><span class="pun">(</span><span class="pln">x</span><span class="pun">[</span><span class="str">"review"</span><span class="pun">])</span><span class="pln"> </span><span class="pun">></span><span class="pln"> </span><span class="lit">200</span><span class="pun">,</span><span class="pln"> batched</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-45">Copied!</span></button></pre></td>
</tr>
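<tr class="odd">
<td>Subsetting the data set (optional sketch)</td>
<td>An optional, illustrative follow-up: it shuffles the filtered IMDB data, keeps a small subset, and creates a train/validation split with methods from the Hugging Face datasets API; the sizes shown are assumptions.</td>
<td><pre class="prettyprint">
# Optional follow-up sketch: build a small shuffled subset and a train/validation split
# from the filtered IMDB data (sizes are illustrative; methods are from the datasets API).
small_ds = ds.shuffle(seed=42).select(range(1000))
splits = small_ds.train_test_split(test_size=0.2, seed=42)
train_ds, valid_ds = splits["train"], splits["test"]
print(train_ds)
print(valid_ds)
</pre></td>
</tr>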
<tr class="odd">
<td>build_dataset</td>
<td>Builds a data set object for use as input to PPOTrainer: it loads IMDB, filters out short reviews, and tokenizes each review truncated to a randomly sampled length. A hypothetical PPOTrainer wiring sketch follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li><li>21</li><li>22</li><li>23</li><li>24</li><li>25</li><li>26</li><li>27</li><li>28</li><li>29</li></ol><ol class="linenums"><li class="L0"><span class="kwd">del</span><span class="pun">(</span><span class="pln">ds</span><span class="pun">)</span></li><li class="L1"><span class="pln">dataset_name</span><span class="pun">=</span><span class="str">"imdb"</span></li><li class="L2"><span class="pln">ds </span><span class="pun">=</span><span class="pln"> load_dataset</span><span class="pun">(</span><span class="pln">dataset_name</span><span class="pun">,</span><span class="pln"> split</span><span class="pun">=</span><span class="str">"train"</span><span class="pun">)</span></li><li class="L3"><span class="pln">ds </span><span class="pun">=</span><span class="pln"> ds</span><span class="pun">.</span><span class="pln">rename_columns</span><span class="pun">({</span><span class="str">"text"</span><span class="pun">:</span><span class="pln"> </span><span class="str">"review"</span><span class="pun">})</span></li><li class="L4"><span class="kwd">def</span><span class="pln"> build_dataset</span><span class="pun">(</span><span class="pln">config</span><span class="pun">,</span><span class="pln"> dataset_name</span><span class="pun">=</span><span class="str">"imdb"</span><span class="pun">,</span><span class="pln"> input_min_text_length</span><span class="pun">=</span><span class="lit">2</span><span class="pun">,</span><span class="pln"> input_max_text_length</span><span class="pun">=</span><span class="lit">8</span><span class="pun">,</span><span class="pln">tokenizer</span><span class="pun">=</span><span class="pln">tokenizer</span><span class="pun">):</span></li><li class="L5"><span class="pln"> </span><span class="str">"""</span></li><li class="L6"><span class="str"> Build dataset for training. 
This builds the dataset from `load_dataset`, one should</span></li><li class="L7"><span class="str"> customize this function to train the model on its own dataset.</span></li><li class="L8"><span class="str"> Args:</span></li><li class="L9"><span class="str"> dataset_name (`str`):</span></li><li class="L0"><span class="str"> The name of the dataset to be loaded.</span></li><li class="L1"><span class="str"> Returns:</span></li><li class="L2"><span class="str"> dataloader (`torch.utils.data.DataLoader`):</span></li><li class="L3"><span class="str"> The dataloader for the dataset.</span></li><li class="L4"><span class="str"> """</span></li><li class="L5"><span class="pln"> tokenizer </span><span class="pun">=</span><span class="pln"> </span><span class="typ">AutoTokenizer</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="pln">config</span><span class="pun">.</span><span class="pln">model_name</span><span class="pun">)</span></li><li class="L6"><span class="pln"> tokenizer</span><span class="pun">.</span><span class="pln">pad_token </span><span class="pun">=</span><span class="pln"> tokenizer</span><span class="pun">.</span><span class="pln">eos_token</span></li><li class="L7"><span class="pln"> </span><span class="com"># load imdb with datasets</span></li><li class="L8"><span class="pln"> ds </span><span class="pun">=</span><span class="pln"> load_dataset</span><span class="pun">(</span><span class="pln">dataset_name</span><span class="pun">,</span><span class="pln"> split</span><span class="pun">=</span><span class="str">"train"</span><span class="pun">)</span></li><li class="L9"><span class="pln"> ds </span><span class="pun">=</span><span class="pln"> ds</span><span class="pun">.</span><span class="pln">rename_columns</span><span class="pun">({</span><span class="str">"text"</span><span class="pun">:</span><span class="pln"> </span><span class="str">"review"</span><span class="pun">})</span></li><li class="L0"><span class="pln"> ds </span><span class="pun">=</span><span class="pln"> ds</span><span class="pun">.</span><span class="pln">filter</span><span class="pun">(</span><span class="kwd">lambda</span><span class="pln"> x</span><span class="pun">:</span><span class="pln"> len</span><span class="pun">(</span><span class="pln">x</span><span class="pun">[</span><span class="str">"review"</span><span class="pun">])</span><span class="pln"> </span><span class="pun">></span><span class="pln"> </span><span class="lit">200</span><span class="pun">,</span><span class="pln"> batched</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">)</span></li><li class="L1"><span class="pln"> input_size </span><span class="pun">=</span><span class="pln"> </span><span class="typ">LengthSampler</span><span class="pun">(</span><span class="pln">input_min_text_length</span><span class="pun">,</span><span class="pln"> input_max_text_length</span><span class="pun">)</span></li><li class="L2"><span class="pln"> </span><span class="kwd">def</span><span class="pln"> tokenize</span><span class="pun">(</span><span class="pln">sample</span><span class="pun">):</span></li><li class="L3"><span class="pln"> sample</span><span class="pun">[</span><span class="str">"input_ids"</span><span class="pun">]</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> tokenizer</span><span class="pun">.</span><span class="pln">encode</span><span class="pun">(</span><span class="pln">sample</span><span class="pun">[</span><span 
class="str">"review"</span><span class="pun">])[:</span><span class="pln"> input_size</span><span class="pun">()]</span></li><li class="L4"><span class="pln"> sample</span><span class="pun">[</span><span class="str">"query"</span><span class="pun">]</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> tokenizer</span><span class="pun">.</span><span class="pln">decode</span><span class="pun">(</span><span class="pln">sample</span><span class="pun">[</span><span class="str">"input_ids"</span><span class="pun">])</span></li><li class="L5"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> sample</span></li><li class="L6"><span class="pln"> ds </span><span class="pun">=</span><span class="pln"> ds</span><span class="pun">.</span><span class="pln">map</span><span class="pun">(</span><span class="pln">tokenize</span><span class="pun">,</span><span class="pln"> batched</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">)</span></li><li class="L7"><span class="pln"> ds</span><span class="pun">.</span><span class="pln">set_format</span><span class="pun">(</span><span class="pln">type</span><span class="pun">=</span><span class="str">"torch"</span><span class="pun">)</span></li><li class="L8"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> ds</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-46">Copied!</span></button></pre></td>
</tr>
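<tr class="even">
<td>PPOTrainer setup (hypothetical sketch)</td>
<td>A hypothetical sketch of how the data set returned by build_dataset can be passed to trl's PPOTrainer, following the widely used trl sentiment-tuning example; the PPOConfig values, the collator, and the PPOTrainer signature are assumptions and may differ across trl versions.</td>
<td><pre class="prettyprint">
# Hypothetical usage sketch (follows the classic trl sentiment-tuning example; the trl API
# has changed across versions, so treat the signatures below as assumptions).
from transformers import AutoTokenizer
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead

config = PPOConfig(model_name="lvwerra/gpt2-imdb", learning_rate=1.41e-5)
dataset = build_dataset(config)

def collator(data):
    # keep each field as a plain Python list so queries stay variable-length
    return {key: [d[key] for d in data] for key in data[0]}

model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer,
                         dataset=dataset, data_collator=collator)
</pre></td>
</tr>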
<tr class="even">
<td>Text generation function</td>
<td>Tokenizes input text, generates a response, and decodes it.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li></ol><ol class="linenums"><li class="L0"><span class="pln">gen_kwargs </span><span class="pun">=</span><span class="pln"> </span><span class="pun">{</span><span class="str">"min_length"</span><span class="pun">:</span><span class="pln"> </span><span class="pun">-</span><span class="lit">1</span><span class="pun">,</span><span class="pln"> </span><span class="str">"top_k"</span><span class="pun">:</span><span class="pln"> </span><span class="lit">0.0</span><span class="pun">,</span><span class="pln"> </span><span class="str">"top_p"</span><span class="pun">:</span><span class="pln"> </span><span class="lit">1.0</span><span class="pun">,</span><span class="pln"> </span><span class="str">"do_sample"</span><span class="pun">:</span><span class="pln"> </span><span class="kwd">True</span><span class="pun">,</span><span class="pln"> </span><span class="str">"pad_token_id"</span><span class="pun">:</span><span class="pln"> tokenizer</span><span class="pun">.</span><span class="pln">eos_token_id</span><span class="pun">}</span></li><li class="L1"><span class="kwd">def</span><span class="pln"> generate_some_text</span><span class="pun">(</span><span class="pln">input_text</span><span class="pun">,</span><span class="pln">my_model</span><span class="pun">):</span></li><li class="L2"><span class="com"># Tokenize the input text</span></li><li class="L3"><span class="pln"> input_ids </span><span class="pun">=</span><span class="pln"> tokenizer</span><span class="pun">(</span><span class="pln">input_text</span><span class="pun">,</span><span class="pln"> return_tensors</span><span class="pun">=</span><span class="str">'pt'</span><span class="pun">).</span><span class="pln">input_ids</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L4"><span class="pln"> generated_ids </span><span class="pun">=</span><span class="pln"> my_model</span><span class="pun">.</span><span class="pln">generate</span><span class="pun">(</span><span class="pln">input_ids</span><span class="pun">,**</span><span class="pln">gen_kwargs </span><span class="pun">)</span></li><li class="L5"><span class="pln"> </span><span class="com"># Decode the generated text</span></li><li class="L6"><span class="pln"> generated_text_ </span><span class="pun">=</span><span class="pln"> tokenizer</span><span class="pun">.</span><span class="pln">decode</span><span class="pun">(</span><span class="pln">generated_ids</span><span class="pun">[</span><span class="lit">0</span><span class="pun">],</span><span class="pln"> skip_special_tokens</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li><li class="L7"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> generated_text_</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-47">Copied!</span></button></pre></td>
</tr>
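<tr class="odd">
<td>generate_some_text (usage sketch)</td>
<td>A brief usage sketch for the text generation function above; `model` is a placeholder for any causal language model that has already been moved to `device`.</td>
<td><pre class="prettyprint">
# Usage sketch: `model` stands for any causal language model already moved to `device`.
print(generate_some_text("This film was surprisingly", my_model=model))
</pre></td>
</tr>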
<tr class="odd">
<td>Tokenizing data</td>
<td>This code snippet instantiates a tokenizer from the BERT base cased checkpoint,
defines a 'tokenize_function' that applies padding and truncation, and maps it over the
data set in batches to produce tokenized inputs. The next row sketches how the tokenized
output can be wrapped in DataLoaders.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li></ol><ol class="linenums"><li class="L0"><span class="com"># Instantiate a tokenizer using the BERT base cased model</span></li><li class="L1"><span class="pln">tokenizer </span><span class="pun">=</span><span class="pln"> </span><span class="typ">AutoTokenizer</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="str">"bert-base-cased"</span><span class="pun">)</span></li><li class="L2"><span class="com"># Define a function to tokenize examples</span></li><li class="L3"><span class="kwd">def</span><span class="pln"> tokenize_function</span><span class="pun">(</span><span class="pln">examples</span><span class="pun">):</span></li><li class="L4"><span class="pln"> </span><span class="com"># Tokenize the text using the tokenizer</span></li><li class="L5"><span class="pln"> </span><span class="com"># Apply padding to ensure all sequences have the same length</span></li><li class="L6"><span class="pln"> </span><span class="com"># Apply truncation to limit the maximum sequence length</span></li><li class="L7"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> tokenizer</span><span class="pun">(</span><span class="pln">examples</span><span class="pun">[</span><span class="str">"text"</span><span class="pun">],</span><span class="pln"> padding</span><span class="pun">=</span><span class="str">"max_length"</span><span class="pun">,</span><span class="pln"> truncation</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li><li class="L8"><span class="com"># Apply the tokenize function to the dataset in batches</span></li><li class="L9"><span class="pln">tokenized_datasets </span><span class="pun">=</span><span class="pln"> dataset</span><span class="pun">.</span><span class="pln">map</span><span class="pun">(</span><span class="pln">tokenize_function</span><span class="pun">,</span><span class="pln"> batched</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-48">Copied!</span></button></pre></td>
</tr>
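<tr class="even">
<td>DataLoaders from tokenized data (sketch)</td>
<td>A hedged follow-up sketch that converts the tokenized dataset into PyTorch DataLoaders for the training loop in the next row; it assumes a DatasetDict with "train" and "test" splits and a "label" column, which is not shown in the original snippet.</td>
<td><pre class="prettyprint">
# Follow-up sketch: turn the tokenized dataset into PyTorch DataLoaders for the training
# loop in the next row (assumes a DatasetDict with "train"/"test" splits and a "label" column).
from torch.utils.data import DataLoader

tokenized_datasets = tokenized_datasets.remove_columns(["text"])      # drop the raw strings
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")                                # return PyTorch tensors

train_dataloader = DataLoader(tokenized_datasets["train"], shuffle=True, batch_size=8)
eval_dataloader = DataLoader(tokenized_datasets["test"], batch_size=8)
</pre></td>
</tr>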
<tr class="even">
<td>Training loop</td>
<td>The train_model function trains a model on data supplied through a dataloader. It sets
up a progress bar to monitor training visually and switches the model to training mode so
that behaviors such as dropout are active. For each epoch it processes the data in
batches: moving the batch to the correct device (such as a GPU), running a forward pass to
obtain the outputs and loss, backpropagating to compute gradients, updating the model's
parameters, stepping the learning-rate scheduler, and clearing the old gradients. After
training, it plots the average loss per epoch. The optimizer, scheduler, and step count it
relies on are sketched in the row after this one.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li><li>21</li><li>22</li><li>23</li><li>24</li><li>25</li><li>26</li><li>27</li><li>28</li><li>29</li><li>30</li><li>31</li><li>32</li><li>33</li><li>34</li><li>35</li></ol><ol class="linenums"><li class="L0"><span class="kwd">def</span><span class="pln"> train_model</span><span class="pun">(</span><span class="pln">model</span><span class="pun">,</span><span class="pln">tr_dataloader</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="com"># Create a progress bar to track the training progress</span></li><li class="L2"><span class="pln"> progress_bar </span><span class="pun">=</span><span class="pln"> tqdm</span><span class="pun">(</span><span class="pln">range</span><span class="pun">(</span><span class="pln">num_training_steps</span><span class="pun">))</span></li><li class="L3"><span class="pln"> </span><span class="com"># Set the model in training mode</span></li><li class="L4"><span class="pln"> model</span><span class="pun">.</span><span class="pln">train</span><span class="pun">()</span></li><li class="L5"><span class="pln"> tr_losses</span><span class="pun">=[]</span></li><li class="L6"><span class="pln"> </span><span class="com"># Training loop</span></li><li class="L7"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> epoch </span><span class="kwd">in</span><span class="pln"> range</span><span class="pun">(</span><span class="pln">num_epochs</span><span class="pun">):</span></li><li class="L8"><span class="pln"> total_loss </span><span class="pun">=</span><span class="pln"> </span><span class="lit">0</span></li><li class="L9"><span class="pln"> </span><span class="com"># Iterate over the training data batches</span></li><li class="L0"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> batch </span><span class="kwd">in</span><span class="pln"> tr_dataloader</span><span class="pun">:</span></li><li class="L1"><span class="pln"> </span><span class="com"># Move the batch to the appropriate device</span></li><li class="L2"><span class="pln"> batch </span><span class="pun">=</span><span class="pln"> </span><span class="pun">{</span><span class="pln">k</span><span class="pun">:</span><span class="pln"> v</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span><span class="pln"> </span><span class="kwd">for</span><span class="pln"> k</span><span class="pun">,</span><span class="pln"> v </span><span class="kwd">in</span><span class="pln"> batch</span><span class="pun">.</span><span class="pln">items</span><span class="pun">()}</span></li><li class="L3"><span class="pln"> </span><span class="com"># Forward pass through the model</span></li><li class="L4"><span class="pln"> outputs </span><span class="pun">=</span><span class="pln"> model</span><span class="pun">(**</span><span class="pln">batch</span><span class="pun">)</span></li><li class="L5"><span class="pln"> </span><span class="com"># Compute the loss</span></li><li class="L6"><span class="pln"> loss </span><span class="pun">=</span><span class="pln"> outputs</span><span class="pun">.</span><span class="pln">loss</span></li><li class="L7"><span class="pln"> 
</span><span class="com"># Backward pass (compute gradients)</span></li><li class="L8"><span class="pln"> loss</span><span class="pun">.</span><span class="pln">backward</span><span class="pun">()</span></li><li class="L9"><span class="pln"> total_loss </span><span class="pun">+=</span><span class="pln"> loss</span><span class="pun">.</span><span class="pln">item</span><span class="pun">()</span></li><li class="L0"><span class="pln"> </span><span class="com"># Update the model parameters</span></li><li class="L1"><span class="pln"> optimizer</span><span class="pun">.</span><span class="pln">step</span><span class="pun">()</span></li><li class="L2"><span class="pln"> </span><span class="com"># Update the learning rate scheduler</span></li><li class="L3"><span class="pln"> lr_scheduler</span><span class="pun">.</span><span class="pln">step</span><span class="pun">()</span></li><li class="L4"><span class="pln"> </span><span class="com"># Clear the gradients</span></li><li class="L5"><span class="pln"> optimizer</span><span class="pun">.</span><span class="pln">zero_grad</span><span class="pun">()</span></li><li class="L6"><span class="pln"> </span><span class="com"># Update the progress bar</span></li><li class="L7"><span class="pln"> progress_bar</span><span class="pun">.</span><span class="pln">update</span><span class="pun">(</span><span class="lit">1</span><span class="pun">)</span></li><li class="L8"><span class="pln"> tr_losses</span><span class="pun">.</span><span class="pln">append</span><span class="pun">(</span><span class="pln">total_loss</span><span class="pun">/</span><span class="pln">len</span><span class="pun">(</span><span class="pln">tr_dataloader</span><span class="pun">))</span></li><li class="L9"><span class="pln"> </span><span class="com">#plot loss</span></li><li class="L0"><span class="pln"> plt</span><span class="pun">.</span><span class="pln">plot</span><span class="pun">(</span><span class="pln">tr_losses</span><span class="pun">)</span></li><li class="L1"><span class="pln"> plt</span><span class="pun">.</span><span class="pln">title</span><span class="pun">(</span><span class="str">"Training loss"</span><span class="pun">)</span></li><li class="L2"><span class="pln"> plt</span><span class="pun">.</span><span class="pln">xlabel</span><span class="pun">(</span><span class="str">"Epoch"</span><span class="pun">)</span></li><li class="L3"><span class="pln"> plt</span><span class="pun">.</span><span class="pln">ylabel</span><span class="pun">(</span><span class="str">"Loss"</span><span class="pun">)</span></li><li class="L4"><span class="pln"> plt</span><span class="pun">.</span><span class="pln">show</span><span class="pun">()</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-49">Copied!</span></button></pre></td>
</tr>
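<tr class="odd">
<td>train_model prerequisites (sketch)</td>
<td>A sketch of the objects that train_model assumes already exist (num_epochs, optimizer, lr_scheduler, num_training_steps, and device); the concrete optimizer, schedule, and values shown are illustrative assumptions, not code from the original notebook.</td>
<td><pre class="prettyprint">
# Sketch of the objects train_model assumes to already exist (optimizer, scheduler, step
# count, device); the concrete values here are illustrative, not from the original notebook.
import torch
from torch.optim import AdamW
from transformers import get_scheduler

num_epochs = 3
optimizer = AdamW(model.parameters(), lr=5e-5)
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler("linear", optimizer=optimizer,
                             num_warmup_steps=0, num_training_steps=num_training_steps)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)

train_model(model, train_dataloader)
</pre></td>
</tr>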
<tr class="odd">
<td>evaluate_model function</td>
<td>Works like train_model but evaluates the model instead of training it. It processes
data in batches from a dataloader, sets the model to evaluation mode so that layers such
as dropout behave deterministically, and disables gradient calculation since no parameters
are updated. For each batch it computes predictions and updates an accuracy metric, and
after all batches it prints the overall accuracy. A brief usage sketch follows in the next
row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li><li>21</li><li>22</li></ol><ol class="linenums"><li class="L0"><span class="kwd">def</span><span class="pln"> evaluate_model</span><span class="pun">(</span><span class="pln">model</span><span class="pun">,</span><span class="pln"> evl_dataloader</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="com"># Create an instance of the Accuracy metric for multiclass classification with 5 classes</span></li><li class="L2"><span class="pln"> metric </span><span class="pun">=</span><span class="pln"> </span><span class="typ">Accuracy</span><span class="pun">(</span><span class="pln">task</span><span class="pun">=</span><span class="str">"multiclass"</span><span class="pun">,</span><span class="pln"> num_classes</span><span class="pun">=</span><span class="lit">5</span><span class="pun">).</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span></li><li class="L3"><span class="pln"> </span><span class="com"># Set the model in evaluation mode</span></li><li class="L4"><span class="pln"> model</span><span class="pun">.</span><span class="kwd">eval</span><span class="pun">()</span></li><li class="L5"><span class="pln"> </span><span class="com"># Disable gradient calculation during evaluation</span></li><li class="L6"><span class="pln"> </span><span class="kwd">with</span><span class="pln"> torch</span><span class="pun">.</span><span class="pln">no_grad</span><span class="pun">():</span></li><li class="L7"><span class="pln"> </span><span class="com"># Iterate over the evaluation data batches</span></li><li class="L8"><span class="pln"> </span><span class="kwd">for</span><span class="pln"> batch </span><span class="kwd">in</span><span class="pln"> evl_dataloader</span><span class="pun">:</span></li><li class="L9"><span class="pln"> </span><span class="com"># Move the batch to the appropriate device</span></li><li class="L0"><span class="pln"> batch </span><span class="pun">=</span><span class="pln"> </span><span class="pun">{</span><span class="pln">k</span><span class="pun">:</span><span class="pln"> v</span><span class="pun">.</span><span class="pln">to</span><span class="pun">(</span><span class="pln">device</span><span class="pun">)</span><span class="pln"> </span><span class="kwd">for</span><span class="pln"> k</span><span class="pun">,</span><span class="pln"> v </span><span class="kwd">in</span><span class="pln"> batch</span><span class="pun">.</span><span class="pln">items</span><span class="pun">()}</span></li><li class="L1"><span class="pln"> </span><span class="com"># Forward pass through the model</span></li><li class="L2"><span class="pln"> outputs </span><span class="pun">=</span><span class="pln"> model</span><span class="pun">(**</span><span class="pln">batch</span><span class="pun">)</span></li><li class="L3"><span class="pln"> </span><span class="com"># Get the predicted class labels</span></li><li class="L4"><span class="pln"> logits </span><span class="pun">=</span><span class="pln"> outputs</span><span class="pun">.</span><span class="pln">logits</span></li><li class="L5"><span class="pln"> predictions </span><span class="pun">=</span><span class="pln"> torch</span><span 
class="pun">.</span><span class="pln">argmax</span><span class="pun">(</span><span class="pln">logits</span><span class="pun">,</span><span class="pln"> dim</span><span class="pun">=-</span><span class="lit">1</span><span class="pun">)</span></li><li class="L6"><span class="pln"> </span><span class="com"># Accumulate the predictions and labels for the metric</span></li><li class="L7"><span class="pln"> metric</span><span class="pun">(</span><span class="pln">predictions</span><span class="pun">,</span><span class="pln"> batch</span><span class="pun">[</span><span class="str">"labels"</span><span class="pun">])</span></li><li class="L8"><span class="pln"> </span><span class="com"># Compute the accuracy</span></li><li class="L9"><span class="pln"> accuracy </span><span class="pun">=</span><span class="pln"> metric</span><span class="pun">.</span><span class="pln">compute</span><span class="pun">()</span></li><li class="L0"><span class="pln"> </span><span class="com"># Print the accuracy</span></li><li class="L1"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="str">"Accuracy:"</span><span class="pun">,</span><span class="pln"> accuracy</span><span class="pun">.</span><span class="pln">item</span><span class="pun">())</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-50">Copied!</span></button></pre></td>
</tr>
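<tr class="even">
<td>evaluate_model (usage sketch)</td>
<td>A brief usage sketch: the evaluate_model function above relies on torchmetrics' Accuracy class, so it must be imported before the function is called on a held-out dataloader.</td>
<td><pre class="prettyprint">
# Usage sketch: evaluate on a held-out dataloader after training. The snippet above
# assumes torchmetrics' Accuracy class has been imported.
from torchmetrics import Accuracy

evaluate_model(model, eval_dataloader)
</pre></td>
</tr>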
<tr class="even">
<td>llm_model</td>
<td>This code snippet defines a function 'llm_model' for generating text with Mistral AI's
'mixtral-8x7b-instruct-v01' model served through IBM watsonx.ai. The function lets callers
override the default generation parameters and handles the credentials, project ID, and
model setup before invoking the model on the prompt. A brief usage sketch follows in the
next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li><li>11</li><li>12</li><li>13</li><li>14</li><li>15</li><li>16</li><li>17</li><li>18</li><li>19</li><li>20</li><li>21</li><li>22</li><li>23</li><li>24</li><li>25</li><li>26</li><li>27</li><li>28</li><li>29</li><li>30</li><li>31</li></ol><ol class="linenums"><li class="L0"><span class="kwd">def</span><span class="pln"> llm_model</span><span class="pun">(</span><span class="pln">prompt_txt</span><span class="pun">,</span><span class="pln"> </span><span class="kwd">params</span><span class="pun">=</span><span class="kwd">None</span><span class="pun">):</span></li><li class="L1"><span class="pln"> model_id </span><span class="pun">=</span><span class="pln"> </span><span class="str">'mistralai/mixtral-8x7b-instruct-v01'</span></li><li class="L2"><span class="pln"> default_params </span><span class="pun">=</span><span class="pln"> </span><span class="pun">{</span></li><li class="L3"><span class="pln"> </span><span class="str">"max_new_tokens"</span><span class="pun">:</span><span class="pln"> </span><span class="lit">256</span><span class="pun">,</span></li><li class="L4"><span class="pln"> </span><span class="str">"min_new_tokens"</span><span class="pun">:</span><span class="pln"> </span><span class="lit">0</span><span class="pun">,</span></li><li class="L5"><span class="pln"> </span><span class="str">"temperature"</span><span class="pun">:</span><span class="pln"> </span><span class="lit">0.5</span><span class="pun">,</span></li><li class="L6"><span class="pln"> </span><span class="str">"top_p"</span><span class="pun">:</span><span class="pln"> </span><span class="lit">0.2</span><span class="pun">,</span></li><li class="L7"><span class="pln"> </span><span class="str">"top_k"</span><span class="pun">:</span><span class="pln"> </span><span class="lit">1</span></li><li class="L8"><span class="pln"> </span><span class="pun">}</span></li><li class="L9"><span class="pln"> </span><span class="kwd">if</span><span class="pln"> </span><span class="kwd">params</span><span class="pun">:</span></li><li class="L0"><span class="pln"> default_params</span><span class="pun">.</span><span class="pln">update</span><span class="pun">(</span><span class="kwd">params</span><span class="pun">)</span></li><li class="L1"><span class="pln"> parameters </span><span class="pun">=</span><span class="pln"> </span><span class="pun">{</span></li><li class="L2"><span class="pln"> </span><span class="typ">GenParams</span><span class="pun">.</span><span class="pln">MAX_NEW_TOKENS</span><span class="pun">:</span><span class="pln"> default_params</span><span class="pun">[</span><span class="str">"max_new_tokens"</span><span class="pun">],</span><span class="pln"> </span><span class="com"># this controls the maximum number of tokens in the generated output</span></li><li class="L3"><span class="pln"> </span><span class="typ">GenParams</span><span class="pun">.</span><span class="pln">MIN_NEW_TOKENS</span><span class="pun">:</span><span class="pln"> default_params</span><span class="pun">[</span><span class="str">"min_new_tokens"</span><span class="pun">],</span><span class="pln"> </span><span class="com"># this controls the minimum number of tokens in the generated output</span></li><li class="L4"><span class="pln"> </span><span class="typ">GenParams</span><span class="pun">.</span><span class="pln">TEMPERATURE</span><span 
class="pun">:</span><span class="pln"> default_params</span><span class="pun">[</span><span class="str">"temperature"</span><span class="pun">],</span><span class="pln"> </span><span class="com"># this randomness or creativity of the model's responses</span></li><li class="L5"><span class="pln"> </span><span class="typ">GenParams</span><span class="pun">.</span><span class="pln">TOP_P</span><span class="pun">:</span><span class="pln"> default_params</span><span class="pun">[</span><span class="str">"top_p"</span><span class="pun">],</span></li><li class="L6"><span class="pln"> </span><span class="typ">GenParams</span><span class="pun">.</span><span class="pln">TOP_K</span><span class="pun">:</span><span class="pln"> default_params</span><span class="pun">[</span><span class="str">"top_k"</span><span class="pun">]</span></li><li class="L7"><span class="pln"> </span><span class="pun">}</span></li><li class="L8"><span class="pln"> credentials </span><span class="pun">=</span><span class="pln"> </span><span class="pun">{</span></li><li class="L9"><span class="pln"> </span><span class="str">"url"</span><span class="pun">:</span><span class="pln"> </span><span class="str">"https://us-south.ml.cloud.ibm.com"</span></li><li class="L0"><span class="pln"> </span><span class="pun">}</span></li><li class="L1"><span class="pln"> project_id </span><span class="pun">=</span><span class="pln"> </span><span class="str">"skills-network"</span></li><li class="L2"><span class="pln"> model </span><span class="pun">=</span><span class="pln"> </span><span class="typ">Model</span><span class="pun">(</span></li><li class="L3"><span class="pln"> model_id</span><span class="pun">=</span><span class="pln">model_id</span><span class="pun">,</span></li><li class="L4"><span class="pln"> </span><span class="kwd">params</span><span class="pun">=</span><span class="pln">parameters</span><span class="pun">,</span></li><li class="L5"><span class="pln"> credentials</span><span class="pun">=</span><span class="pln">credentials</span><span class="pun">,</span></li><li class="L6"><span class="pln"> project_id</span><span class="pun">=</span><span class="pln">project_id</span></li><li class="L7"><span class="pln"> </span><span class="pun">)</span></li><li class="L8"><span class="pln"> mixtral_llm </span><span class="pun">=</span><span class="pln"> </span><span class="typ">WatsonxLLM</span><span class="pun">(</span><span class="pln">model</span><span class="pun">=</span><span class="pln">model</span><span class="pun">)</span></li><li class="L9"><span class="pln"> response </span><span class="pun">=</span><span class="pln"> mixtral_llm</span><span class="pun">.</span><span class="pln">invoke</span><span class="pun">(</span><span class="pln">prompt_txt</span><span class="pun">)</span></li><li class="L0"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> response</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-51">Copied!</span></button></pre></td>
</tr>
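<tr class="odd">
<td>llm_model (usage sketch)</td>
<td>A brief usage sketch for the llm_model function: the prompt text and the overridden generation parameters are illustrative assumptions.</td>
<td><pre class="prettyprint">
# Usage sketch: call llm_model with a prompt, optionally overriding generation parameters.
prompt = "Classify the sentiment of this review as positive or negative: 'I loved every minute of it.'"
print(llm_model(prompt))
print(llm_model(prompt, params={"max_new_tokens": 10, "temperature": 0.1}))
</pre></td>
</tr>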
<tr class="odd">
<td>class_names</td>
<td>This code snippet maps the numerical labels to their corresponding text descriptions
for the classification task. This mapping makes the model's numerical predictions easier
to present in a human-readable format.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li></ol><ol class="linenums"><li class="L0"><span class="pln">class_names </span><span class="pun">=</span><span class="pln"> </span><span class="pun">{</span><span class="lit">0</span><span class="pun">:</span><span class="pln"> </span><span class="str">"negative"</span><span class="pun">,</span><span class="pln"> </span><span class="lit">1</span><span class="pun">:</span><span class="pln"> </span><span class="str">"positive"</span><span class="pun">}</span></li><li class="L1"><span class="pln">class_names</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-52">Copied!</span></button></pre></td>
</tr>
<tr class="even">
<td>DistilBERT tokenizer</td>
<td>This code snippet uses 'AutoTokenizer' for preprocessing text data
for DistilBERT, a lighter version of BERT. It tokenizes input text into a
format suitable for model processing by converting words into token
IDs, handling special tokens, padding, and truncating sequences as
needed. </td>
<td><pre class="prettyprint linenums prettyprinted" style="padding-right: 42px;"><ol class="formatted-line-numbers"><li>1</li></ol><ol class="linenums"><li class="L0"><span class="pln">tokenizer </span><span class="pun">=</span><span class="pln"> </span><span class="typ">AutoTokenizer</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="str">"distilbert-base-uncased"</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block one-line"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-53">Copied!</span></button></pre></td>
</tr>
<tr class="odd">
<td>Tokenize input IDs</td>
<td>This code snippet tokenizes a sample of text data and inspects the resulting token
IDs, attention mask, and (if present) token type IDs for use in downstream natural
language processing (NLP) tasks.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li></ol><ol class="linenums"><li class="L0"><span class="pln">my_tokens</span><span class="pun">=</span><span class="pln">tokenizer</span><span class="pun">(</span><span class="pln">imdb</span><span class="pun">[</span><span class="str">'train'</span><span class="pun">][</span><span class="lit">0</span><span class="pun">][</span><span class="str">'text'</span><span class="pun">])</span></li><li class="L1"><span class="com"># Print the tokenized input IDs</span></li><li class="L2"><span class="kwd">print</span><span class="pun">(</span><span class="str">"Input IDs:"</span><span class="pun">,</span><span class="pln"> my_tokens</span><span class="pun">[</span><span class="str">'input_ids'</span><span class="pun">])</span></li><li class="L3"><span class="com"># Print the attention mask</span></li><li class="L4"><span class="kwd">print</span><span class="pun">(</span><span class="str">"Attention Mask:"</span><span class="pun">,</span><span class="pln"> my_tokens</span><span class="pun">[</span><span class="str">'attention_mask'</span><span class="pun">])</span></li><li class="L5"><span class="com"># If token_type_ids is present, print it</span></li><li class="L6"><span class="kwd">if</span><span class="pln"> </span><span class="str">'token_type_ids'</span><span class="pln"> </span><span class="kwd">in</span><span class="pln"> my_tokens</span><span class="pun">:</span></li><li class="L7"><span class="pln"> </span><span class="kwd">print</span><span class="pun">(</span><span class="str">"Token Type IDs:"</span><span class="pun">,</span><span class="pln"> my_tokens</span><span class="pun">[</span><span class="str">'token_type_ids'</span><span class="pun">])</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-54">Copied!</span></button></pre></td>
</tr>
<tr class="even">
<td>Preprocessing function tokenizer</td>
<td>This code snippet defines a preprocessing function that tokenizes text from the IMDB
data set with padding, truncation, and a maximum length of 512, and maps it over the small
and medium train/test splits to produce tokenized input IDs, attention masks, and (where
applicable) token type IDs. A hypothetical sketch of how these splits could be created
follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li></ol><ol class="linenums"><li class="L0"><span class="kwd">def</span><span class="pln"> preprocess_function</span><span class="pun">(</span><span class="pln">examples</span><span class="pun">):</span></li><li class="L1"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> tokenizer</span><span class="pun">(</span><span class="pln">examples</span><span class="pun">[</span><span class="str">"text"</span><span class="pun">],</span><span class="pln"> padding</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,</span><span class="pln"> truncation</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,</span><span class="pln"> max_length</span><span class="pun">=</span><span class="lit">512</span><span class="pun">)</span></li><li class="L2"><span class="pln">small_tokenized_train </span><span class="pun">=</span><span class="pln"> small_train_dataset</span><span class="pun">.</span><span class="pln">map</span><span class="pun">(</span><span class="pln">preprocess_function</span><span class="pun">,</span><span class="pln"> batched</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li><li class="L3"><span class="pln">small_tokenized_test </span><span class="pun">=</span><span class="pln"> small_test_dataset</span><span class="pun">.</span><span class="pln">map</span><span class="pun">(</span><span class="pln">preprocess_function</span><span class="pun">,</span><span class="pln"> batched</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li><li class="L4"><span class="pln">medium_tokenized_train </span><span class="pun">=</span><span class="pln"> medium_train_dataset</span><span class="pun">.</span><span class="pln">map</span><span class="pun">(</span><span class="pln">preprocess_function</span><span class="pun">,</span><span class="pln"> batched</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li><li class="L5"><span class="pln">medium_tokenized_test </span><span class="pun">=</span><span class="pln"> medium_test_dataset</span><span class="pun">.</span><span class="pln">map</span><span class="pun">(</span><span class="pln">preprocess_function</span><span class="pun">,</span><span class="pln"> batched</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-55">Copied!</span></button></pre></td>
</tr>
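<tr class="odd">
<td>Creating the small and medium splits (hypothetical sketch)</td>
<td>A hypothetical sketch of how the small and medium train/test splits used above could be created from the IMDB data set; the subset sizes are assumptions, not values from the original notebook.</td>
<td><pre class="prettyprint">
# Hypothetical sketch of how the small and medium splits used above could be built from
# IMDB (the subset sizes are assumptions, not values from the original notebook).
from datasets import load_dataset

imdb = load_dataset("imdb")
small_train_dataset = imdb["train"].shuffle(seed=42).select(range(500))
small_test_dataset = imdb["test"].shuffle(seed=42).select(range(500))
medium_train_dataset = imdb["train"].shuffle(seed=42).select(range(2000))
medium_test_dataset = imdb["test"].shuffle(seed=42).select(range(2000))
</pre></td>
</tr>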
<tr class="odd">
<td>compute_metrics function</td>
<td>Evaluates model performance using accuracy.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li></ol><ol class="linenums"><li class="L0"><span class="kwd">def</span><span class="pln"> compute_metrics</span><span class="pun">(</span><span class="pln">eval_pred</span><span class="pun">):</span></li><li class="L1"><span class="pln"> load_accuracy </span><span class="pun">=</span><span class="pln"> load_metric</span><span class="pun">(</span><span class="str">"accuracy"</span><span class="pun">,</span><span class="pln"> trust_remote_code</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li><li class="L2"><span class="pln"> logits</span><span class="pun">,</span><span class="pln"> labels </span><span class="pun">=</span><span class="pln"> eval_pred</span></li><li class="L3"><span class="pln"> predictions </span><span class="pun">=</span><span class="pln"> np</span><span class="pun">.</span><span class="pln">argmax</span><span class="pun">(</span><span class="pln">logits</span><span class="pun">,</span><span class="pln"> axis</span><span class="pun">=-</span><span class="lit">1</span><span class="pun">)</span></li><li class="L4"><span class="pln"> accuracy </span><span class="pun">=</span><span class="pln"> load_accuracy</span><span class="pun">.</span><span class="pln">compute</span><span class="pun">(</span><span class="pln">predictions</span><span class="pun">=</span><span class="pln">predictions</span><span class="pun">,</span><span class="pln"> references</span><span class="pun">=</span><span class="pln">labels</span><span class="pun">)[</span><span class="str">"accuracy"</span><span class="pun">]</span></li><li class="L5"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> </span><span class="pun">{</span><span class="str">"accuracy"</span><span class="pun">:</span><span class="pln"> accuracy</span><span class="pun">}</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-56">Copied!</span></button></pre></td>
</tr>
<tr class="even">
<td>Configure BitsAndBytes</td>
<td>Defines the quantization parameters.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li></ol><ol class="linenums"><li class="L0"><span class="pln">config_bnb </span><span class="pun">=</span><span class="pln"> </span><span class="typ">BitsAndBytesConfig</span><span class="pun">(</span></li><li class="L1"><span class="pln"> load_in_4bit</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,</span><span class="pln"> </span><span class="com"># quantize the model to 4-bits when you load it</span></li><li class="L2"><span class="pln"> bnb_4bit_quant_type</span><span class="pun">=</span><span class="str">"nf4"</span><span class="pun">,</span><span class="pln"> </span><span class="com"># use a special 4-bit data type for weights initialized from a normal distribution</span></li><li class="L3"><span class="pln"> bnb_4bit_use_double_quant</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,</span><span class="pln"> </span><span class="com"># nested quantization scheme to quantize the already quantized weights</span></li><li class="L4"><span class="pln"> bnb_4bit_compute_dtype</span><span class="pun">=</span><span class="pln">torch</span><span class="pun">.</span><span class="pln">bfloat16</span><span class="pun">,</span><span class="pln"> </span><span class="com"># use bfloat16 for faster computation</span></li><li class="L5"><span class="pln"> llm_int8_skip_modules</span><span class="pun">=[</span><span class="str">"classifier"</span><span class="pun">,</span><span class="pln"> </span><span class="str">"pre_classifier"</span><span class="pun">]</span><span class="pln"> </span><span class="com"># Don't convert the "classifier" and "pre_classifier" layers to 8-bit</span></li><li class="L6"><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-57">Copied!</span></button></pre></td>
</tr>
<tr class="odd">
<td>id2label</td>
<td>Maps IDs to text labels for the two classes in this problem.</td>
<td><pre class="prettyprint linenums prettyprinted" style="padding-right: 42px;"><ol class="formatted-line-numbers"><li>1</li></ol><ol class="linenums"><li class="L0"><span class="pln">id2label </span><span class="pun">=</span><span class="pln"> </span><span class="pun">{</span><span class="lit">0</span><span class="pun">:</span><span class="pln"> </span><span class="str">"NEGATIVE"</span><span class="pun">,</span><span class="pln"> </span><span class="lit">1</span><span class="pun">:</span><span class="pln"> </span><span class="str">"POSITIVE"</span><span class="pun">}</span></li></ol><button title="Copy" class="action-code-block copy-code-block one-line"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-58">Copied!</span></button></pre></td>
</tr>
<tr class="even">
<td>label2id</td>
<td>Swaps the keys and the values to map the text labels to the IDs.</td>
<td><pre class="prettyprint linenums prettyprinted" style="padding-right: 42px;"><ol class="formatted-line-numbers"><li>1</li></ol><ol class="linenums"><li class="L0"><span class="pln">label2id </span><span class="pun">=</span><span class="pln"> dict</span><span class="pun">((</span><span class="pln">v</span><span class="pun">,</span><span class="pln">k</span><span class="pun">)</span><span class="pln"> </span><span class="kwd">for</span><span class="pln"> k</span><span class="pun">,</span><span class="pln">v </span><span class="kwd">in</span><span class="pln"> id2label</span><span class="pun">.</span><span class="pln">items</span><span class="pun">())</span></li></ol><button title="Copy" class="action-code-block copy-code-block one-line"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-59">Copied!</span></button></pre></td>
</tr>
<tr class="odd">
<td>model_qlora</td>
<td>This code snippet creates a model called model_qlora for sequence classification by
loading DistilBERT with the id2label and label2id mappings, two output labels, and the
4-bit quantization settings defined in config_bnb. The next row sketches the usual
follow-up step of attaching LoRA adapters.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li></ol><ol class="linenums"><li class="L0"><span class="pln">model_qlora </span><span class="pun">=</span><span class="pln"> </span><span class="typ">AutoModelForSequenceClassification</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="str">"distilbert-base-uncased"</span><span class="pun">,</span></li><li class="L1"><span class="pln"> id2label</span><span class="pun">=</span><span class="pln">id2label</span><span class="pun">,</span></li><li class="L2"><span class="pln"> label2id</span><span class="pun">=</span><span class="pln">label2id</span><span class="pun">,</span></li><li class="L3"><span class="pln"> num_labels</span><span class="pun">=</span><span class="lit">2</span><span class="pun">,</span></li><li class="L4"><span class="pln"> quantization_config</span><span class="pun">=</span><span class="pln">config_bnb</span></li><li class="L5"><span class="pln"> </span><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-60">Copied!</span></button></pre></td>
</tr>
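<tr class="even">
<td>Attaching LoRA adapters for QLoRA (sketch)</td>
<td>A hedged follow-up sketch of the usual QLoRA step: preparing the 4-bit model for training and attaching LoRA adapters with the peft library; the LoRA hyperparameters and the DistilBERT target-module names are assumptions.</td>
<td><pre class="prettyprint">
# Follow-up sketch: prepare the 4-bit model for training and attach LoRA adapters with
# peft (hyperparameters and DistilBERT target-module names are assumptions).
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType

model_qlora = prepare_model_for_kbit_training(model_qlora)   # e.g., casts norm layers to float32
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],   # attention projections in DistilBERT
)
model_qlora = get_peft_model(model_qlora, peft_config)
model_qlora.print_trainable_parameters()
</pre></td>
</tr>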
<tr class="even">
<td>training_args</td>
<td>This code snippet initializes the training arguments: it specifies the output
directory for results, sets the number of training epochs to 10 and the learning rate to
2e-5, defines the batch sizes for training and evaluation, applies a weight decay of 0.01,
and runs evaluation at the end of each epoch. A Trainer wiring sketch follows in the next
row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li></ol><ol class="linenums"><li class="L0"><span class="pln">training_args </span><span class="pun">=</span><span class="pln"> </span><span class="typ">TrainingArguments</span><span class="pun">(</span></li><li class="L1"><span class="pln"> output_dir</span><span class="pun">=</span><span class="str">"./results_qlora"</span><span class="pun">,</span></li><li class="L2"><span class="pln"> num_train_epochs</span><span class="pun">=</span><span class="lit">10</span><span class="pun">,</span></li><li class="L3"><span class="pln"> per_device_train_batch_size</span><span class="pun">=</span><span class="lit">16</span><span class="pun">,</span></li><li class="L4"><span class="pln"> per_device_eval_batch_size</span><span class="pun">=</span><span class="lit">64</span><span class="pun">,</span></li><li class="L5"><span class="pln"> learning_rate</span><span class="pun">=</span><span class="lit">2e-5</span><span class="pun">,</span></li><li class="L6"><span class="pln"> evaluation_strategy</span><span class="pun">=</span><span class="str">"epoch"</span><span class="pun">,</span></li><li class="L7"><span class="pln"> weight_decay</span><span class="pun">=</span><span class="lit">0.01</span></li><li class="L8"><span class="pun">)</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-61">Copied!</span></button></pre></td>
</tr>
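<tr class="odd">
<td>Trainer setup (sketch)</td>
<td>A sketch that wires training_args together with the quantized model, the tokenized splits, and the compute_metrics function using the Hugging Face Trainer; the dataset variable names are assumptions carried over from the preprocessing rows above.</td>
<td><pre class="prettyprint">
# Sketch: wiring training_args together with the model, the tokenized splits, and the
# compute_metrics function above via the Hugging Face Trainer (dataset names are assumptions).
from transformers import Trainer

trainer = Trainer(
    model=model_qlora,
    args=training_args,
    train_dataset=small_tokenized_train,
    eval_dataset=small_tokenized_test,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
trainer.evaluate()
</pre></td>
</tr>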
<tr class="odd">
<td>text_to_emb</td>
<td>Converts a list of text strings into embeddings: the texts are batch-tokenized with padding and truncation, and the resulting token IDs and attention masks are passed to an aggregate_embeddings helper, a sketch of which follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li></ol><ol class="linenums"><li class="L0"><span class="kwd">def</span><span class="pln"> text_to_emb</span><span class="pun">(</span><span class="pln">list_of_text</span><span class="pun">,</span><span class="pln">max_input</span><span class="pun">=</span><span class="lit">512</span><span class="pun">):</span></li><li class="L1"><span class="pln"> data_token_index </span><span class="pun">=</span><span class="pln"> tokenizer</span><span class="pun">.</span><span class="pln">batch_encode_plus</span><span class="pun">(</span><span class="pln">list_of_text</span><span class="pun">,</span><span class="pln"> add_special_tokens</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,</span><span class="pln">padding</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,</span><span class="pln">truncation</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">,</span><span class="pln">max_length</span><span class="pun">=</span><span class="pln">max_input</span><span class="pun">)</span></li><li class="L2"><span class="pln"> question_embeddings</span><span class="pun">=</span><span class="pln">aggregate_embeddings</span><span class="pun">(</span><span class="pln">data_token_index</span><span class="pun">[</span><span class="str">'input_ids'</span><span class="pun">],</span><span class="pln"> data_token_index</span><span class="pun">[</span><span class="str">'attention_mask'</span><span class="pun">])</span></li><li class="L3"><span class="pln"> </span><span class="kwd">return</span><span class="pln"> question_embeddings</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-62">Copied!</span></button></pre></td>
</tr>
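<tr class="even">
<td>aggregate_embeddings (hypothetical helper)</td>
<td>text_to_emb calls an aggregate_embeddings helper that is not shown in the original snippet; this row gives one plausible implementation, mean pooling of a BERT-style encoder's final hidden states weighted by the attention mask, and assumes a global `model` such as one loaded with AutoModel.from_pretrained.</td>
<td><pre class="prettyprint">
# One plausible implementation of the aggregate_embeddings helper used by text_to_emb:
# mean pooling of a BERT-style encoder's final hidden states, weighted by the attention
# mask. `model` is assumed to be e.g. AutoModel.from_pretrained("bert-base-uncased").
import torch

def aggregate_embeddings(input_ids, attention_masks):
    ids = torch.tensor(input_ids)
    mask = torch.tensor(attention_masks)
    with torch.no_grad():
        hidden = model(input_ids=ids, attention_mask=mask).last_hidden_state
    weights = mask.unsqueeze(-1).float()            # [batch, seq_len, 1]
    summed = (hidden * weights).sum(dim=1)          # ignore padding positions
    counts = weights.sum(dim=1).clamp(min=1e-9)
    return summed / counts                          # mean-pooled sentence embeddings
</pre></td>
</tr>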
<tr class="even">
<td>model_name_or_path</td>
<td>This code snippet sets the model name to 'gpt2' and initializes the GPT-2 tokenizer
together with a GPT2ForSequenceClassification model that has a single output label. It
reuses the end-of-sequence token as the padding token and defines a maximum sequence
length of 1024. A scoring usage sketch follows in the next row.</td>
<td><pre class="prettyprint linenums prettyprinted" style=""><ol class="formatted-line-numbers"><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li><li>7</li><li>8</li><li>9</li><li>10</li></ol><ol class="linenums"><li class="L0"><span class="com"># Define the model name or path</span></li><li class="L1"><span class="pln">model_name_or_path </span><span class="pun">=</span><span class="pln"> </span><span class="str">"gpt2"</span></li><li class="L2"><span class="com"># Initialize tokenizer and model</span></li><li class="L3"><span class="pln">tokenizer </span><span class="pun">=</span><span class="pln"> GPT2Tokenizer</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="pln">model_name_or_path</span><span class="pun">,</span><span class="pln"> use_fast</span><span class="pun">=</span><span class="kwd">True</span><span class="pun">)</span></li><li class="L4"><span class="pln">model </span><span class="pun">=</span><span class="pln"> GPT2ForSequenceClassification</span><span class="pun">.</span><span class="pln">from_pretrained</span><span class="pun">(</span><span class="pln">model_name_or_path</span><span class="pun">,</span><span class="pln"> num_labels</span><span class="pun">=</span><span class="lit">1</span><span class="pun">)</span></li><li class="L5"><span class="com"># Add special tokens if necessary</span></li><li class="L6"><span class="pln">tokenizer</span><span class="pun">.</span><span class="pln">pad_token </span><span class="pun">=</span><span class="pln"> tokenizer</span><span class="pun">.</span><span class="pln">eos_token</span></li><li class="L7"><span class="pln">model</span><span class="pun">.</span><span class="pln">config</span><span class="pun">.</span><span class="pln">pad_token_id </span><span class="pun">=</span><span class="pln"> model</span><span class="pun">.</span><span class="pln">config</span><span class="pun">.</span><span class="pln">eos_token_id</span></li><li class="L8"><span class="com"># Define the maximum length</span></li><li class="L9"><span class="pln">max_length </span><span class="pun">=</span><span class="pln"> </span><span class="lit">1024</span></li></ol><button title="Copy" class="action-code-block copy-code-block multiple-lines"><i class="fa fa-copy" aria-hidden="true"></i><span class="popuptext" id="md-code-block-copy-63">Copied!</span></button></pre></td>
</tr>
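<tr class="odd">
<td>Scoring text with the GPT-2 classifier (usage sketch)</td>
<td>A brief usage sketch: with num_labels=1 the classification head outputs a single scalar per input, which is how such a model is typically used to score text (for example, as a reward model); the example text is illustrative.</td>
<td><pre class="prettyprint">
# Usage sketch: with num_labels=1 the classification head produces a single scalar per
# input, which is how such a model is typically used to score text (e.g., as a reward model).
import torch

text = "What a fantastic, heart-warming film."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
with torch.no_grad():
    score = model(**inputs).logits[0, 0].item()
print("score:", score)
</pre></td>
</tr>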
</tbody>
</table><footer>
<img src="render_files/ibmsn_footer.jpeg" alt="">
</footer></body></html>