[
{
"section": "Page 1",
"content": [
{
"type": "text",
"text": "Introduction to Theory of Computation\nAnil Maheshwari Michiel Smid\nSchool of Computer Science\nCarleton University\nOttawa\nCanada\nfanil,[email protected]\nAugust 29, 2024\n"
}
]
},
{
"section": "Page 2",
"content": [
{
"type": "text",
"text": "ii Contents"
}
]
},
{
"section": "Page 3",
"content": [
{
"type": "text",
"text": "Contents\nPreface vi\n1 Introduction 1\n1.1 Purpose and motivation 1\n1.1.1 Complexity theory 2\n1.1.2 Computability theory 2\n1.1.3 Automata theory 3\n1.1.4 This course 3\n1.2 Mathematical preliminaries 4\n1.3 Proof techniques 7\n1.3.1 Direct proofs 8\n1.3.2 Constructive proofs 9\n1.3.3 Nonconstructive proofs 10\n1.3.4 Proofs by contradiction 11\n1.3.5 The pigeon hole principle 12\n1.3.6 Proofs by induction 13\n1.3.7 More examples of proofs 15\nExercises 18\n2 Finite Automata and Regular Languages 21\n2.1 An example: Controlling a toll gate 21\n2.2 Deterministic finite automata 23\n2.2.1 A first example of a finite automaton 26\n2.2.2 A second example of a finite automaton 28\n2.2.3 A third example of a finite automaton 29\n2.3 Regular operations 31\n2.4 Nondeterministic finite automata 35\n2.4.1 A first example 35"
}
]
},
{
"section": "Page 4",
"content": [
{
"type": "text",
"text": "2.4.2 A second example 37\n2.4.3 A third example 38\n2.4.4 Definition of nondeterministic finite automaton 39\n2.5 Equivalence of DFAs and NFAs 41\n2.5.1 An example 44\n2.6 Closure under the regular operations 48\n2.7 Regular expressions 52\n2.8 Equivalence of regular expressions and regular languages 57\n2.8.1 Every regular expression describes a regular language 58\n2.8.2 Converting a DFA to a regular expression 61\n2.9 The pumping lemma and nonregular languages 68\n2.9.1 Applications of the pumping lemma 70\n2.10 Higman's Theorem 77\n2.10.1 Dickson's Theorem 77\n2.10.2 Proof of Higman's Theorem 78\nExercises 81\n3 Context-Free Languages 91\n3.1 Context-free grammars 91\n3.2 Examples of context-free grammars 94\n3.2.1 Properly nested parentheses 94\n3.2.2 A context-free grammar for a nonregular language 95\n3.2.3 A context-free grammar for the complement of a nonregular language 97\n3.2.4 A context-free grammar that verifies addition 98\n3.3 Regular languages are context-free 100\n3.3.1 An example 102\n3.4 Chomsky normal form 104\n3.4.1 An example 109\n3.5 Pushdown automata 112\n3.6 Examples of pushdown automata 116\n3.6.1 Properly nested parentheses 116\n3.6.2 Strings of the form $0^n 1^n$ 117\n3.6.3 Strings with $b$ in the middle 118\n3.7 Equivalence of pushdown automata and context-free grammars 120\n3.8 The pumping lemma for context-free languages 124\n3.8.1 Proof of the pumping lemma 125\n3.8.2 Applications of the pumping lemma 128"
}
]
},
{
"section": "Page 5",
"content": [
{
"type": "text",
"text": "Exercises 132\n4 Turing Machines and the Church-Turing Thesis 137\n4.1 Definition of a Turing machine 137\n4.2 Examples of Turing machines 141\n4.2.1 Accepting palindromes using one tape 141\n4.2.2 Accepting palindromes using two tapes 142\n4.2.3 Accepting $a^n b^n c^n$ using one tape 143\n4.2.4 Accepting $a^n b^n c^n$ using tape alphabet $\\{a, b, c, \\Box\\}$ 145\n4.2.5 Accepting $a^m b^n c^{mn}$ using one tape 147\n4.3 Multi-tape Turing machines 148\n4.4 The Church-Turing Thesis 151\nExercises 152\n5 Decidable and Undecidable Languages 157\n5.1 Decidability 157\n5.1.1 The language $A_{DFA}$ 158\n5.1.2 The language $A_{NFA}$ 159\n5.1.3 The language $A_{CFG}$ 160\n5.1.4 The language $A_{TM}$ 161\n5.1.5 The Halting Problem 163\n5.2 Countable sets 164\n5.2.1 The Halting Problem revisited 168\n5.3 Rice's Theorem 170\n5.3.1 Proof of Rice's Theorem 171\n5.4 Enumerability 173\n5.4.1 Hilbert's problem 174\n5.4.2 The language $A_{TM}$ 176\n5.5 Where does the term \"enumerable\" come from? 177\n5.6 Most languages are not enumerable 180\n5.6.1 The set of enumerable languages is countable 180\n5.6.2 The set of all languages is not countable 181\n5.6.3 There are languages that are not enumerable 183\n5.7 The relationship between decidable and enumerable languages 184\n5.8 A language $A$ such that both $A$ and $\\overline{A}$ are not enumerable 186\n5.8.1 $EQ_{TM}$ is not enumerable 186\n5.8.2 $\\overline{EQ_{TM}}$ is not enumerable 188\nExercises 189"
}
]
},
{
"section": "Page 6",
"content": [
{
"type": "text",
"text": "6 Complexity Theory 197\n6.1 The running time of algorithms 197\n6.2 The complexity class P 199\n6.2.1 Some examples 199\n6.3 The complexity class NP 202\n6.3.1 P is contained in NP 208\n6.3.2 Deciding NP-languages in exponential time 208\n6.3.3 Summary 211\n6.4 Non-deterministic algorithms 211\n6.5 NP-complete languages 213\n6.5.1 Two examples of reductions 215\n6.5.2 Definition of NP-completeness 220\n6.5.3 An NP-complete domino game 222\n6.5.4 Examples of NP-complete languages 231\nExercises 235\n7 Summary 239"
}
]
},
{
"section": "Page 7",
"content": [
{
"type": "text",
"text": "Preface\nThis is a free textbook for an undergraduate course on the Theory of Computation, which we have been teaching at Carleton University since 2002. Until the 2011/2012 academic year, this course was offered as a second-year course (COMP 2805) and was compulsory for all Computer Science students. Starting with the 2012/2013 academic year, the course has been downgraded to a third-year optional course (COMP 3803).\nWe have been developing this book since we started teaching this course. Currently, we cover most of the material from Chapters 2–5 during a 12-week term with three hours of classes per week.\nThe material from Chapter 6, on Complexity Theory, is taught in the third-year course COMP 3804 (Design and Analysis of Algorithms). In the early years of COMP 2805, we gave a two-lecture overview of Complexity Theory at the end of the term. Even though this overview has disappeared from the course, we decided to keep Chapter 6. This chapter has not been revised or modified for a long time.\nThe course as we teach it today has been influenced by the following two textbooks:\n• Introduction to the Theory of Computation (second edition), by Michael Sipser, Thomson Course Technology, Boston, 2006.\n• Einführung in die Theoretische Informatik, by Klaus Wagner, Springer-Verlag, Berlin, 1994.\nBesides reading this text, we recommend that you also take a look at these excellent textbooks, as well as one or more of the following ones:\n• Elements of the Theory of Computation (second edition), by Harry Lewis and Christos Papadimitriou, Prentice-Hall, 1998."
}
]
},
{
"section": "Page 8",
"content": [
{
"type": "text",
"text": "• Introduction to Languages and the Theory of Computation (third edition), by John Martin, McGraw-Hill, 2003.\n• Introduction to Automata Theory, Languages, and Computation (third edition), by John Hopcroft, Rajeev Motwani, Jeffrey Ullman, Addison Wesley, 2007.\nPlease let us know if you find errors, typos, simpler proofs, comments, omissions, or if you think that some parts of the book \"need improvement\"."
}
]
},
{
"section": "Page 9",
"content": [
{
"type": "text",
"text": "Chapter 1\nIntroduction\n1.1 Purpose and motivation\nThis course is on the Theory of Computation, which tries to answer the following questions:\n• What are the mathematical properties of computer hardware and software?\n• What is a computation and what is an algorithm? Can we give rigorous mathematical definitions of these notions?\n• What are the limitations of computers? Can \"everything\" be computed? (As we will see, the answer to this question is \"no\".)\nPurpose of the Theory of Computation: Develop formal mathematical models of computation that reflect real-world computers.\nThis field of research was started by mathematicians and logicians in the 1930s, when they were trying to understand the meaning of a \"computation\". A central question asked was whether all mathematical problems can be solved in a systematic way. The research that started in those days led to computers as we know them today.\nNowadays, the Theory of Computation can be divided into the following three areas: Complexity Theory, Computability Theory, and Automata Theory."
}
]
},
{
"section": "Page 10",
"content": [
{
"type": "text",
"text": "1.1.1 Complexity theory\nThe main question asked in this area is \"What makes some problems computationally hard and other problems easy?\"\nInformally, a problem is called \"easy\" if it is efficiently solvable. Examples of \"easy\" problems are (i) sorting a sequence of, say, 1,000,000 numbers, (ii) searching for a name in a telephone directory, and (iii) computing the fastest way to drive from Ottawa to Miami. On the other hand, a problem is called \"hard\" if it cannot be solved efficiently, or if we don't know whether it can be solved efficiently. Examples of \"hard\" problems are (i) time table scheduling for all courses at Carleton, (ii) factoring a 300-digit integer into its prime factors, and (iii) computing a layout for chips in VLSI.\nCentral Question in Complexity Theory: Classify problems according to their degree of \"difficulty\". Give a rigorous proof that problems that seem to be \"hard\" are really \"hard\".\n1.1.2 Computability theory\nIn the 1930s, Gödel, Turing, and Church discovered that some of the fundamental mathematical problems cannot be solved by a \"computer\". (This may sound strange, because computers were invented only in the 1940s.) An example of such a problem is \"Is an arbitrary mathematical statement true or false?\" To attack such a problem, we need formal definitions of the notions of\n• computer,\n• algorithm, and\n• computation.\nThe theoretical models that were proposed in order to understand solvable and unsolvable problems led to the development of real computers.\nCentral Question in Computability Theory: Classify problems as being solvable or unsolvable."
}
]
},
{
"section": "Page 11",
"content": [
{
"type": "text",
"text": "1.1.3 Automata theory\nAutomata Theory deals with definitions and properties of different types of \"computation models\". Examples of such models are:\n• Finite Automata. These are used in text processing, compilers, and hardware design.\n• Context-Free Grammars. These are used to define programming languages and in Artificial Intelligence.\n• Turing Machines. These form a simple abstract model of a \"real\" computer, such as your PC at home.\nCentral Question in Automata Theory: Do these models have the same power, or can one model solve more problems than the other?\n1.1.4 This course\nIn this course, we will study the last two areas in reverse order: We will start with Automata Theory, followed by Computability Theory. The first area, Complexity Theory, will be covered in COMP 3804.\nActually, before we start, we will review some mathematical proof techniques. As you may guess, this is a fairly theoretical course, with lots of definitions, theorems, and proofs. You may guess that this course is fun stuff for math lovers, but boring and irrelevant for others. You guessed wrong, and here are the reasons:\n1. This course is about the fundamental capabilities and limitations of computers. These topics form the core of computer science.\n2. It is about mathematical properties of computer hardware and software.\n3. This theory is very much relevant to practice, for example, in the design of new programming languages, compilers, string searching, pattern matching, computer security, artificial intelligence, etc.\n4. This course helps you to learn problem solving skills. Theory teaches you how to think, prove, argue, solve problems, express, and abstract."
}
]
},
{
"section": "Page 12",
"content": [
{
"type": "text",
"text": "5. This theory simplifies the complex computers to an abstract and simple mathematical model, and helps you to understand them better.\n6. This course is about rigorously analyzing capabilities and limitations of systems.\nWhere does this course fit in the Computer Science Curriculum at Carleton University? It is a theory course that is the third part in the series COMP 1805, COMP 2804, COMP 3803, COMP 3804, and COMP 4804. This course also widens your understanding of computers and will influence other courses including Compilers, Programming Languages, and Artificial Intelligence.\n1.2 Mathematical preliminaries\nThroughout this course, we will assume that you know the following mathematical concepts:\n1. A set is a collection of well-defined objects. Examples are (i) the set of all Dutch Olympic Gold Medallists, (ii) the set of all pubs in Ottawa, and (iii) the set of all even natural numbers.\n2. The set of natural numbers is $\\mathbb{N} = \\{1, 2, 3, \\ldots\\}$.\n3. The set of integers is $\\mathbb{Z} = \\{\\ldots, -3, -2, -1, 0, 1, 2, 3, \\ldots\\}$.\n4. The set of rational numbers is $\\mathbb{Q} = \\{m/n : m \\in \\mathbb{Z}, n \\in \\mathbb{Z}, n \\neq 0\\}$.\n5. The set of real numbers is denoted by $\\mathbb{R}$.\n6. If $A$ and $B$ are sets, then $A$ is a subset of $B$, written as $A \\subseteq B$, if every element of $A$ is also an element of $B$. For example, the set of even natural numbers is a subset of the set of all natural numbers. Every set $A$ is a subset of itself, i.e., $A \\subseteq A$. The empty set is a subset of every set $A$, i.e., $\\emptyset \\subseteq A$.\n7. If $B$ is a set, then the power set $\\mathcal{P}(B)$ of $B$ is defined to be the set of all subsets of $B$:\n$$\\mathcal{P}(B) = \\{A : A \\subseteq B\\}.$$\nObserve that $\\emptyset \\in \\mathcal{P}(B)$ and $B \\in \\mathcal{P}(B)$."
}
]
},
{
"section": "Page 13",
"content": [
{
"type": "text",
"text": "8. If $A$ and $B$ are two sets, then\n(a) their union is defined as\n$$A \\cup B = \\{x : x \\in A \\text{ or } x \\in B\\},$$\n(b) their intersection is defined as\n$$A \\cap B = \\{x : x \\in A \\text{ and } x \\in B\\},$$\n(c) their difference is defined as\n$$A \\setminus B = \\{x : x \\in A \\text{ and } x \\notin B\\},$$\n(d) the Cartesian product of $A$ and $B$ is defined as\n$$A \\times B = \\{(x, y) : x \\in A \\text{ and } y \\in B\\},$$\n(e) the complement of $A$ is defined as\n$$\\overline{A} = \\{x : x \\notin A\\}.$$\n9. A binary relation on two sets $A$ and $B$ is a subset of $A \\times B$.\n10. A function $f$ from $A$ to $B$, denoted by $f : A \\to B$, is a binary relation $R$ on $A$ and $B$ having the property that for each element $a \\in A$, there is exactly one ordered pair in $R$ whose first component is $a$. If $(a, b)$ is this pair, we will also say that $f(a) = b$, or $f$ maps $a$ to $b$, or the image of $a$ under $f$ is $b$. The set $A$ is called the domain of $f$, and the set\n$$\\{b \\in B : \\text{there is an } a \\in A \\text{ with } f(a) = b\\}$$\nis called the range of $f$.\n11. A function $f : A \\to B$ is one-to-one (or injective), if for any two distinct elements $a$ and $a'$ in $A$, we have $f(a) \\neq f(a')$. The function $f$ is onto (or surjective), if for each element $b \\in B$, there exists an element $a \\in A$, such that $f(a) = b$; in other words, the range of $f$ is equal to the set $B$. A function $f$ is a bijection, if $f$ is both injective and surjective.\n12. A binary relation $R \\subseteq A \\times A$ is an equivalence relation, if it satisfies the following three conditions:"
}
]
},
{
"section": "Page 14",
"content": [
{
"type": "text",
"text": "(a) $R$ is reflexive: For every element $a \\in A$, we have $(a, a) \\in R$.\n(b) $R$ is symmetric: For all $a$ and $b$ in $A$, if $(a, b) \\in R$, then also $(b, a) \\in R$.\n(c) $R$ is transitive: For all $a$, $b$, and $c$ in $A$, if $(a, b) \\in R$ and $(b, c) \\in R$, then also $(a, c) \\in R$.\n13. A graph $G = (V, E)$ is a pair consisting of a set $V$, whose elements are called vertices, and a set $E$, where each element of $E$ is a pair of distinct vertices. The elements of $E$ are called edges. The figure below shows some well-known graphs: $K_5$ (the complete graph on five vertices), $K_{3,3}$ (the complete bipartite graph on $2 \\times 3 = 6$ vertices), and the Petersen graph.\n[Figure: the graphs $K_5$, $K_{3,3}$, and the Petersen graph.]\nThe degree of a vertex $v$, denoted by $\\deg(v)$, is defined to be the number of edges that are incident on $v$.\nA path in a graph is a sequence of vertices in which every two consecutive vertices are connected by an edge. A path is a cycle, if it starts and ends at the same vertex. A simple path is a path without any repeated vertices. A graph is connected, if there is a path between every pair of vertices.\n14. In the context of strings, an alphabet is a finite set, whose elements are called symbols. Examples of alphabets are $\\Sigma = \\{0, 1\\}$ and $\\Sigma = \\{a, b, c, \\ldots, z\\}$.\n15. A string over an alphabet $\\Sigma$ is a finite sequence of symbols, where each symbol is an element of $\\Sigma$. The length of a string $w$, denoted by $|w|$, is the number of symbols contained in $w$. The empty string, denoted by"
}
]
},
{
"section": "Page 15",
"content": [
{
"type": "text",
"text": "$\\epsilon$, is the string having length zero. For example, if the alphabet $\\Sigma$ is equal to $\\{0, 1\\}$, then 10, 1000, 0, 101, and $\\epsilon$ are strings over $\\Sigma$, having lengths 2, 4, 1, 3, and 0, respectively.\n16. A language is a set of strings.\n17. The Boolean values are 1 and 0, which represent true and false, respectively. The basic Boolean operations include\n(a) negation (or NOT), represented by $\\neg$,\n(b) conjunction (or AND), represented by $\\wedge$,\n(c) disjunction (or OR), represented by $\\vee$,\n(d) exclusive-or (or XOR), represented by $\\oplus$,\n(e) equivalence, represented by $\\leftrightarrow$ or $\\Leftrightarrow$,\n(f) implication, represented by $\\rightarrow$ or $\\Rightarrow$.\nThe following table explains the meanings of these operations.\n| NOT | AND | OR | XOR | equivalence | implication |\n| --- | --- | --- | --- | --- | --- |\n| $\\neg 0 = 1$ | $0 \\wedge 0 = 0$ | $0 \\vee 0 = 0$ | $0 \\oplus 0 = 0$ | $0 \\leftrightarrow 0 = 1$ | $0 \\rightarrow 0 = 1$ |\n| $\\neg 1 = 0$ | $0 \\wedge 1 = 0$ | $0 \\vee 1 = 1$ | $0 \\oplus 1 = 1$ | $0 \\leftrightarrow 1 = 0$ | $0 \\rightarrow 1 = 1$ |\n| | $1 \\wedge 0 = 0$ | $1 \\vee 0 = 1$ | $1 \\oplus 0 = 1$ | $1 \\leftrightarrow 0 = 0$ | $1 \\rightarrow 0 = 0$ |\n| | $1 \\wedge 1 = 1$ | $1 \\vee 1 = 1$ | $1 \\oplus 1 = 0$ | $1 \\leftrightarrow 1 = 1$ | $1 \\rightarrow 1 = 1$ |\n1.3 Proof techniques\nIn mathematics, a theorem is a statement that is true. A proof is a sequence of mathematical statements that form an argument to show that a theorem is true. The statements in the proof of a theorem include axioms (assumptions about the underlying mathematical structures), hypotheses of the theorem to be proved, and previously proved theorems. The main question is \"How do we go about proving theorems?\" This question is similar to the question of how to solve a given problem. Of course, the answer is that finding proofs, or solving problems, is not easy; otherwise life would be dull! There is no specified way of coming up with a proof, but there are some generic strategies that could be of help. In this section, we review some of these strategies, which will be sufficient for this course. The best way to get a feeling of how to come up with a proof is by solving a large number of problems. Here are"
}
]
},
{
"section": "Page 16",
"content": [
{
"type": "text",
"text": "some useful tips. (You may take a look at the book How to Solve It, by G. Pólya.)\n1. Read and completely understand the statement of the theorem to be proved. Most often this is the hardest part.\n2. Sometimes, theorems contain theorems inside them. For example, \"Property $A$ if and only if property $B$\" requires showing two statements:\n(a) If property $A$ is true, then property $B$ is true ($A \\Rightarrow B$).\n(b) If property $B$ is true, then property $A$ is true ($B \\Rightarrow A$).\nAnother example is the theorem \"Set $A$ equals set $B$.\" To prove this, we need to prove that $A \\subseteq B$ and $B \\subseteq A$. That is, we need to show that each element of set $A$ is in set $B$, and that each element of set $B$ is in set $A$.\n3. Try to work out a few simple cases of the theorem just to get a grip on it (i.e., crack a few simple cases first).\n4. Try to write down the proof once you have it. This is to ensure the correctness of your proof. Often, mistakes are found at the time of writing.\n5. Finding proofs takes time; we do not come prewired to produce proofs. Be patient, think, express and write clearly, and try to be as precise as possible.\nIn the next sections, we will go through some of the proof strategies.\n1.3.1 Direct proofs\nAs the name suggests, in a direct proof of a theorem, we just approach the theorem directly.\nTheorem 1.3.1 If $n$ is an odd positive integer, then $n^2$ is odd as well."
}
]
},
{
"section": "Page 17",
"content": [
{
"type": "text",
"text": "Proof. An odd positive integer $n$ can be written as $n = 2k + 1$, for some integer $k \\geq 0$. Then\n$$n^2 = (2k + 1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1.$$\nSince $2(2k^2 + 2k)$ is even, and \"even plus one is odd\", we can conclude that $n^2$ is odd.\nTheorem 1.3.2 Let $G = (V, E)$ be a graph. Then the sum of the degrees of all vertices is an even integer, i.e.,\n$$\\sum_{v \\in V} \\deg(v)$$\nis even.\nProof. If you do not see the meaning of this statement, then first try it out for a few graphs. The reason why the statement holds is very simple: Each edge contributes 2 to the summation (because an edge is incident on exactly two distinct vertices).\nActually, the proof above proves the following theorem.\nTheorem 1.3.3 Let $G = (V, E)$ be a graph. Then the sum of the degrees of all vertices is equal to twice the number of edges, i.e.,\n$$\\sum_{v \\in V} \\deg(v) = 2|E|.$$\n1.3.2 Constructive proofs\nThis technique not only shows the existence of a certain object, it actually gives a method of creating it. Here is what a constructive proof looks like:\nTheorem 1.3.4 There exists an object with property $P$.\nProof. Here is the object: [...]\nAnd here is the proof that the object satisfies property $P$: [...]\nHere is an example of a constructive proof. A graph is called 3-regular, if each vertex has degree three."
}
]
},
{
"section": "Page 18",
"content": [
{
"type": "text",
"text": "Theorem 1.3.5 For every even integer $n \\geq 4$, there exists a 3-regular graph with $n$ vertices.\nProof. Define\n$$V = \\{0, 1, 2, \\ldots, n-1\\},$$\nand\n$$E = \\{\\{i, i+1\\} : 0 \\leq i \\leq n-2\\} \\cup \\{\\{n-1, 0\\}\\} \\cup \\{\\{i, i+n/2\\} : 0 \\leq i \\leq n/2-1\\}.$$\nThen the graph $G = (V, E)$ is 3-regular.\nConvince yourself that this graph is indeed 3-regular. It may help to draw the graph for, say, $n = 8$.\n1.3.3 Nonconstructive proofs\nIn a nonconstructive proof, we show that a certain object exists, without actually creating it. Here is an example of such a proof:\nTheorem 1.3.6 There exist irrational numbers $x$ and $y$ such that $x^y$ is rational.\nProof. There are two possible cases.\nCase 1: $\\sqrt{2}^{\\sqrt{2}} \\in \\mathbb{Q}$.\nIn this case, we take $x = y = \\sqrt{2}$. In Theorem 1.3.9 below, we will prove that $\\sqrt{2}$ is irrational.\nCase 2: $\\sqrt{2}^{\\sqrt{2}} \\notin \\mathbb{Q}$.\nIn this case, we take $x = \\sqrt{2}^{\\sqrt{2}}$ and $y = \\sqrt{2}$. Both numbers are irrational: $x$ by the assumption of this case, and $y$ by Theorem 1.3.9. Since\n$$x^y = \\left(\\sqrt{2}^{\\sqrt{2}}\\right)^{\\sqrt{2}} = \\sqrt{2}^{2} = 2,$$\nthe claim in the theorem follows.\nObserve that this proof indeed proves the theorem, but it does not give an example of a pair of irrational numbers $x$ and $y$ such that $x^y$ is rational."
}
]
},
{
"section": "Page 19",
"content": [
{
"type": "text",
"text": "1.3.4 Proofs by contradiction\nThis is what a proof by contradiction looks like:\nTheorem 1.3.7 Statement $S$ is true.\nProof. Assume that statement $S$ is false. Then, derive a contradiction (such as $1 + 1 = 3$).\nIn other words, show that the statement \"$\\neg S \\Rightarrow \\text{false}$\" is true. This is sufficient, because the contrapositive of the statement \"$\\neg S \\Rightarrow \\text{false}$\" is the statement \"$\\text{true} \\Rightarrow S$\". The latter logical formula is equivalent to $S$, and that is what we wanted to show.\nBelow, we give two examples of proofs by contradiction.\nTheorem 1.3.8 Let $n$ be a positive integer. If $n^2$ is even, then $n$ is even.\nProof. We will prove the theorem by contradiction. So we assume that $n^2$ is even, but $n$ is odd. Since $n$ is odd, we know from Theorem 1.3.1 that $n^2$ is odd. This is a contradiction, because we assumed that $n^2$ is even.\nTheorem 1.3.9 $\\sqrt{2}$ is irrational, i.e., $\\sqrt{2}$ cannot be written as a fraction of two integers $m$ and $n$.\nProof. We will prove the theorem by contradiction. So we assume that $\\sqrt{2}$ is rational. Then $\\sqrt{2}$ can be written as a fraction of two integers, $\\sqrt{2} = m/n$, where $m \\geq 1$ and $n \\geq 1$. We may assume that $m$ and $n$ do not share any common factors, i.e., the greatest common divisor of $m$ and $n$ is equal to one; if this is not the case, then we can get rid of the common factors. By squaring $\\sqrt{2} = m/n$, we get $2n^2 = m^2$. This implies that $m^2$ is even. Then, by Theorem 1.3.8, $m$ is even, which means that we can write $m$ as $m = 2k$, for some positive integer $k$. It follows that $2n^2 = m^2 = 4k^2$, which implies that $n^2 = 2k^2$. Hence, $n^2$ is even. Again by Theorem 1.3.8, it follows that $n$ is even.\nWe have shown that $m$ and $n$ are both even. But we know that $m$ and $n$ are not both even, because they do not share any common factors. Hence, we have a contradiction. Our assumption that $\\sqrt{2}$ is rational is wrong. Thus, we can conclude that $\\sqrt{2}$ is irrational.\nThere is a nice discussion of this proof in the book My Brain is Open: The Mathematical Journeys of Paul Erdős by B. Schechter."
}
]
},
{
"section": "Page 20",
"content": [
{
"type": "text",
"text": "12 Chapter 1. Introduction\n1.3.5 The pigeon hole principle\nThis is a simple principle with surprising consequences.\nPigeon Hole Principle: Ifn+ 1 or more objects are placed into n\nboxes, then there is at least one box containing two or more objects.\nIn other words, if AandBare two sets such that jAj>jBj, then\nthere is no one-to-one function from AtoB.\nTheorem 1.3.10 Letnbe a positive integer. Every sequence of n2+ 1dis-\ntinct real numbers contains a subsequence of length n+ 1 that is either in-\ncreasing or decreasing.\nProof. For example consider the sequence (20 ;10;9;7;11;2;21;1;20;31) of\n10 = 32+ 1 numbers. This sequence contains an increasing subsequence of\nlength 4 = 3 + 1, namely (10 ;11;21;31).\nThe proof of this theorem is by contradiction, and uses the pigeon hole\nprinciple.\nLet (a1;a2;:::;an2+1) be an arbitrary sequence of n2+ 1 distinct real\nnumbers. For each iwith 1\u0014i\u0014n2+ 1, let incidenote the length of\nthe longest increasing subsequence that starts at ai, and let decidenote the\nlength of the longest decreasing subsequence that starts at ai.\nUsing this notation, the claim in the theorem can be formulated as follows:\nThere is an index isuch that inci\u0015n+ 1 or deci\u0015n+ 1.\nWe will prove the claim by contradiction. So we assume that inci\u0014n\nand deci\u0014nfor alliwith 1\u0014i\u0014n2+ 1.\nConsider the set\nB=f(b;c) : 1\u0014b\u0014n;1\u0014c\u0014ng;\nand think of the elements of Bas being boxes. For each iwith 1\u0014i\u0014n2+1,\nthe pair ( inci;deci) is an element of B. So we have n2+1 elements ( inci;deci),\nwhich are placed in the n2boxes ofB. By the pigeon hole principle, there\nmust be a box that contains two (or more) elements. In other words, there\nexist two integers iandjsuch thati<j and\n(inci;deci) = ( incj;decj):\nRecall that the elements in the sequence are distinct. Hence, ai6=aj. We\nconsider two cases."
}
]
},
{
"section": "Page 21",
"content": [
{
"type": "text",
"text": "1.3. Proof techniques 13\nFirst assume that ai< aj. Then the length of the longest increasing\nsubsequence starting at aimust be at least 1+ incj, because we can append ai\nto the longest increasing subsequence starting at aj. Therefore, inci6=incj,\nwhich is a contradiction.\nThe second case is when ai>aj. Then the length of the longest decreasing\nsubsequence starting at aimust be at least 1+ decj, because we can append ai\nto the longest decreasing subsequence starting at aj. Therefore, deci6=decj,\nwhich is again a contradiction.\n1.3.6 Proofs by induction\nThis is a very powerful and important technique for proving theorems.\nFor each positive integer n, letP(n) be a mathematical statement that\ndepends on n. Assume we wish to prove that P(n) is true for all positive\nintegersn. A proof by induction of such a statement is carried out as follows:\nBasis: Prove that P(1) is true.\nInduction step: Prove that for all n\u00151, the following holds: If P(n) is\ntrue, thenP(n+ 1) is also true.\nIn the induction step, we choose an arbitrary integer n\u00151 and assume\nthatP(n) is true; this is called the induction hypothesis . Then we prove that\nP(n+ 1) is also true.\nTheorem 1.3.11 For all positive integers n, we have\n1 + 2 + 3 + :::+n=n(n+ 1)\n2:\nProof. We start with the basis of the induction. If n= 1, then the left-hand\nside is equal to 1, and so is the right-hand side. So the theorem is true for\nn= 1.\nFor the induction step, let n\u00151 and assume that the theorem is true for\nn, i.e., assume that\n1 + 2 + 3 + :::+n=n(n+ 1)\n2:"
}
]
},
{
"section": "Page 22",
"content": [
{
"type": "text",
"text": "14 Chapter 1. Introduction\nWe have to prove that the theorem is true for n+ 1, i.e., we have to prove\nthat\n1 + 2 + 3 + :::+ (n+ 1) =(n+ 1)(n+ 2)\n2:\nHere is the proof:\n1 + 2 + 3 + :::+ (n+ 1) = 1 + 2 + 3 + :::+n|{z}\n=n(n+1)\n2+(n+ 1)\n=n(n+ 1)\n2+ (n+ 1)\n=(n+ 1)(n+ 2)\n2:\nBy the way, here is an alternative proof of the theorem above: Let S=\n1 + 2 + 3 + :::+n. Then,\nS = 1 + 2 + 3 + : : : + ( n\u00002) + ( n\u00001) + n\nS = n + ( n\u00001) + ( n\u00002) + : : : + 3 + 2 + 1\n2S = ( n+ 1) + ( n+ 1) + ( n+ 1) + : : : + ( n+ 1) + ( n+ 1) + ( n+ 1)\nSince there are nterms on the right-hand side, we have 2 S=n(n+ 1). This\nimplies that S=n(n+ 1)=2.\nTheorem 1.3.12 For every positive integer n,a\u0000bis a factor of an\u0000bn.\nProof. A direct proof can be given by providing a factorization of an\u0000bn:\nan\u0000bn= (a\u0000b)(an\u00001+an\u00002b+an\u00003b2+:::+abn\u00002+bn\u00001):\nWe now prove the theorem by induction. For the basis, let n= 1. The claim\nin the theorem is \\ a\u0000bis a factor of a\u0000b\", which is obviously true.\nLetn\u00151 and assume that a\u0000bis a factor of an\u0000bn. We have to prove\nthata\u0000bis a factor of an+1\u0000bn+1. We have\nan+1\u0000bn+1=an+1\u0000anb+anb\u0000bn+1=an(a\u0000b) + (an\u0000bn)b:\nThe \frst term on the right-hand side is divisible by a\u0000b. By the induction\nhypothesis, the second term on the right-hand side is divisible by a\u0000bas\nwell. Therefore, the entire right-hand side is divisible by a\u0000b. Since the\nright-hand side is equal to an+1\u0000bn+1, it follows that a\u0000bis a factor of\nan+1\u0000bn+1.\nWe now give an alternative proof of Theorem 1.3.3:"
}
]
},
{
"section": "Page 23",
"content": [
{
"type": "text",
"text": "1.3. Proof techniques 15\nTheorem 1.3.13 LetG= (V;E)be a graph with medges. Then the sum\nof the degrees of all vertices is equal to twice the number of edges, i.e.,\nX\nv2Vdeg(v) = 2m:\nProof. The proof is by induction on the number mof edges. For the basis of\nthe induction, assume that m= 0. Then the graph Gdoes not contain any\nedges and, therefore,P\nv2Vdeg(v) = 0. Thus, the theorem is true if m= 0.\nLetm\u00150 and assume that the theorem is true for every graph with m\nedges. LetGbe an arbitrary graph with m+ 1 edges. We have to prove thatP\nv2Vdeg(v) = 2(m+ 1).\nLetfa;bgbe an arbitrary edge in G, and letG0be the graph obtained\nfromGby removing the edge fa;bg. SinceG0hasmedges, we know from\nthe induction hypothesis that the sum of the degrees of all vertices in G0is\nequal to 2m. Using this, we obtain\nX\nv2Gdeg(v) =X\nv2G0deg(v) + 2 = 2m+ 2 = 2(m+ 1):\n1.3.7 More examples of proofs\nRecall Theorem 1.3.5, which states that for every even integern\u00154, there\nexists a 3-regular graph with nvertices. The following theorem explains why\nwe stated this theorem for even values of n.\nTheorem 1.3.14 Letn\u00155be an odd integer. There is no 3-regular graph\nwithnvertices.\nProof. The proof is by contradiction. So we assume that there exists a\ngraphG= (V;E) withnvertices that is 3-regular. Let mbe the number of\nedges inG. Since deg(v) = 3 for every vertex, we have\nX\nv2Vdeg(v) = 3n:\nOn the other hand, by Theorem 1.3.3, we have\nX\nv2Vdeg(v) = 2m:"
}
]
},
{
"section": "Page 24",
"content": [
{
"type": "text",
"text": "16 Chapter 1. Introduction\nIt follows that 3 n= 2m, which can be rewritten as m= 3n=2. Sincemis an\ninteger, and since gcd(2;3) = 1,n=2 must be an integer. Hence, nis even,\nwhich is a contradiction.\nLetKnbe the complete graph onnvertices. This graph has a vertex set\nof sizen, and every pair of distinct vertices is joined by an edge.\nIfG= (V;E) is a graph with nvertices, then the complement GofGis\nthe graph with vertex set Vthat consists of those edges of Knthat are not\npresent inG.\nTheorem 1.3.15 Letn\u00152and letGbe a graph on nvertices. Then Gis\nconnected or Gis connected.\nProof. We prove the theorem by induction on the number nof vertices. For\nthe basis, assume that n= 2. There are two possibilities for the graph G:\n1.Gcontains one edge. In this case, Gis connected.\n2.Gdoes not contain an edge. In this case, the complement Gcontains\none edge and, therefore, Gis connected.\nSo forn= 2, the theorem is true.\nLetn\u00152 and assume that the theorem is true for every graph with n\nvertices. Let Gbe graph with n+ 1 vertices. We have to prove that Gis\nconnected or Gis connected. We consider three cases.\nCase 1: There is a vertex vwhose degree in Gis equal ton.\nSinceGhasn+1 vertices, vis connected by an edge to every other vertex\nofG. Therefore, Gis connected.\nCase 2: There is a vertex vwhose degree in Gis equal to 0.\nIn this case, the degree of vin the graph Gis equal ton. SinceGhasn+1\nvertices,vis connected by an edge to every other vertex of G. Therefore, G\nis connected.\nCase 3: For every vertex v, the degree of vinGis inf1;2;:::;n\u00001g.\nLetvbe an arbitrary vertex of G. LetG0be the graph obtained by\ndeleting from Gthe vertexv, together with all edges that are incident on v.\nSinceG0hasnvertices, we know from the induction hypothesis that G0is\nconnected or G0is connected."
}
]
},
{
"section": "Page 25",
"content": [
{
"type": "text",
"text": "1.3. Proof techniques 17\nLet us \frst assume that G0is connected. Then the graph Gis connected\nas well, because there is at least one edge in Gbetweenvand some vertex\nofG0.\nIfG0is not connected, then G0must be connected. Since we are in Case 3,\nwe know that the degree of vinGis in the setf1;2;:::;n\u00001g. It follows\nthat the degree of vin the graph Gis in this set as well. Hence, there is at\nleast one edge in Gbetweenvand some vertex in G0. This implies that Gis\nconnected.\nThe previous theorem can be rephrased as follows:\nTheorem 1.3.16 Letn\u00152and consider the complete graph Knonnver-\ntices. Color each edge of this graph as either red or blue. Let Rbe the graph\nconsisting of all the red edges, and let Bbe the graph consisting of all the\nblue edges. Then Ris connected or Bis connected.\nA graph is said to be planar , if it can be drawn (a better term is \\embed-\nded\") in the plane in such a way that no two edges intersect, except possibly\nat their endpoints. An embedding of a planar graph consists of vertices,\nedges, and faces. In the example below, there are 11 vertices, 18 edges, and\n9 faces (including the unbounded face).\nThe following theorem is known as Euler's theorem for planar graphs .\nApparently, this theorem was discovered by Euler around 1750. Legendre\ngave the \frst proof in 1794, see\nhttp://www.ics.uci.edu/~eppstein/junkyard/euler/\nTheorem 1.3.17 (Euler) Consider an embedding of a planar graph G. Let\nv,e, andfbe the number of vertices, edges, and faces (including the single"
}
]
},
{
"section": "Page 26",
"content": [
{
"type": "text",
"text": "18 Chapter 1. Introduction\nunbounded face) of this embedding, respectively. Moreover, let cbe the number\nof connected components of G. Then\nv\u0000e+f=c+ 1:\nProof. The proof is by induction on the number of edges of G. To be more\nprecise, we start with a graph having no edges, and prove that the theorem\nholds for this case. Then, we add the edges one by one, and show that the\nrelationv\u0000e+f=c+ 1 is maintained.\nSo we \frst assume that Ghas no edges, i.e., e= 0. Then the embedding\nconsists of a collection of vpoints. In this case, we have f= 1 andc=v.\nHence, the relation v\u0000e+f=c+ 1 holds.\nLete > 0 and assume that Euler's formula holds for a subgraph of G\nhavinge\u00001 edges. Letfu;vgbe an edge of Gthat is not in the subgraph,\nand add this edge to the subgraph. There are two cases depending on whether\nthis new edge joins two connected components or joins two vertices in the\nsame connected component.\nCase 1: The new edgefu;vgjoins two connected components.\nIn this case, the number of vertices and the number of faces do not change,\nthe number of connected components goes down by 1, and the number of\nedges increases by 1. It follows that the relation in the theorem is still valid.\nCase 2: The new edgefu;vgjoins two vertices in the same connected com-\nponent.\nIn this case, the number of vertices and the number of connected com-\nponents do not change, the number of edges increases by 1, and the number\nof faces increases by 1 (because the new edge splits one face into two faces).\nTherefore, the relation in the theorem is still valid.\nEuler's theorem is usually stated as follows:\nTheorem 1.3.18 (Euler) Consider an embedding of a connected planar\ngraphG. Letv,e, andfbe the number of vertices, edges, and faces (in-\ncluding the single unbounded face) of this embedding, respectively. 
Then\nv\u0000e+f= 2:\nIf you like surprising proofs of various mathematical results, you should\nread the book Proofs from THE BOOK by Aigner and Ziegler."
}
]
},
{
"section": "Page 27",
"content": [
{
"type": "text",
"text": "Exercises 19\nExercises\n1.1Use induction to prove that every integer n\u00152 can be written as a\nproduct of prime numbers.\n1.2For every prime number p, prove thatppis irrational.\n1.3Letnbe a positive integer that is not a perfect square. Prove thatpn\nis irrational.\n1.4Prove by induction that n4\u00004n2is divisible by 3, for all integers n\u00151.\n1.5Prove thatnX\ni=11\ni2<2\u00001=n;\nfor every integer n\u00152.\n1.6Prove that 9 divides n3+ (n+ 1)3+ (n+ 2)3, for every integer n\u00150.\n1.7Prove that in any set of n+ 1 numbers from f1;2;:::; 2ng, there are\nalways two numbers that are consecutive.\n1.8Prove that in any set of n+ 1 numbers from f1;2;:::; 2ng, there are\nalways two numbers such that one divides the other."
}
]
},
{
"section": "Page 29",
"content": [
{
"type": "text",
"text": "Chapter 2\nFinite Automata and Regular\nLanguages\nIn this chapter, we introduce and analyze the class of languages that are\nknown as regular languages . Informally, these languages can be \\processed\"\nby computers having a very small amount of memory.\n2.1 An example: Controling a toll gate\nBefore we give a formal de\fnition of a \fnite automaton, we consider an\nexample in which such an automaton shows up in a natural way. We consider\nthe problem of designing a \\computer\" that controls a toll gate .\nWhen a car arrives at the toll gate, the gate is closed. The gate opens as\nsoon as the driver has payed 25 cents. We assume that we have only three\ncoin denominations: 5, 10, and 25 cents. We also assume that no excess\nchange is returned.\nAfter having arrived at the toll gate, the driver inserts a sequence of coins\ninto the machine. At any moment, the machine has to decide whether or not\nto open the gate, i.e., whether or not the driver has paid 25 cents (or more).\nIn order to decide this, the machine is in one of the following six states , at\nany moment during the process:\n\u000fThe machine is in state q0, if it has not collected any money yet.\n\u000fThe machine is in state q1, if it has collected exactly 5 cents.\n\u000fThe machine is in state q2, if it has collected exactly 10 cents."
}
]
},
{
"section": "Page 30",
"content": [
{
"type": "text",
"text": "22 Chapter 2. Finite Automata and Regular Languages\n\u000fThe machine is in state q3, if it has collected exactly 15 cents.\n\u000fThe machine is in state q4, if it has collected exactly 20 cents.\n\u000fThe machine is in state q5, if it has collected 25 cents or more.\nInitially (when a car arrives at the toll gate), the machine is in state q0.\nAssume, for example, that the driver presents the sequence (10,5,5,10) of\ncoins.\n\u000fAfter receiving the \frst 10 cents coin, the machine switches from state\nq0to stateq2.\n\u000fAfter receiving the \frst 5 cents coin, the machine switches from state\nq2to stateq3.\n\u000fAfter receiving the second 5 cents coin, the machine switches from state\nq3to stateq4.\n\u000fAfter receiving the second 10 cents coin, the machine switches from\nstateq4to stateq5. At this moment, the gate opens. (Remember that\nno change is given.)\nThe \fgure below represents the behavior of the machine for all possible\nsequences of coins. State q5is represented by two circles, because it is a\nspecial state: As soon as the machine reaches this state, the gate opens.\nq0 q1 q2 q3 q4 q55 5 5 510 10 10\n25\n252510,255,10,25\n5,10\n25start\nObserve that the machine (or computer) only has to remember which\nstate it is in at any given time. Thus, it needs only a very small amount\nof memory: It has to be able to distinguish between any one of six possible\ncases and, therefore, it only needs a memory of dlog 6e= 3 bits."
}
]
},
{
"section": "Page 31",
"content": [
{
"type": "text",
"text": "2.2. Deterministic \fnite automata 23\n2.2 Deterministic \fnite automata\nLet us look at another example. Consider the following state diagram :\nq1 q2 q30\n0\n11\n0,1\nWe say that q1is the start state and q2is an accept state. Consider the\ninput string 1101. This string is processed in the following way:\n\u000fInitially, the machine is in the start state q1.\n\u000fAfter having read the \frst 1, the machine switches from state q1to\nstateq2.\n\u000fAfter having read the second 1, the machine switches from state q2to\nstateq2. (So actually, it does not switch.)\n\u000fAfter having read the \frst 0, the machine switches from state q2to\nstateq3.\n\u000fAfter having read the third 1, the machine switches from state q3to\nstateq2.\nAfter the entire string 1101 has been processed, the machine is in state q2,\nwhich is an accept state. We say that the string 1101 is accepted by the\nmachine.\nConsider now the input string 0101010. After having read this string\nfrom left to right (starting in the start state q1), the machine is in state q3.\nSinceq3is not an accept state, we say that the machine rejects the string\n0101010.\nWe hope you are able to see that this machine accepts every binary string\nthat ends with a 1. In fact, the machine accepts more strings:\n\u000fEvery binary string having the property that there are an even number\nof 0s following the rightmost 1, is accepted by this machine."
}
]
},
{
"section": "Page 32",
"content": [
{
"type": "text",
"text": "24 Chapter 2. Finite Automata and Regular Languages\n\u000fEvery other binary string is rejected by the machine. Observe that each\nsuch string is either empty, consists of 0s only, or has an odd number\nof 0s following the rightmost 1.\nWe now come to the formal de\fnition of a \fnite automaton:\nDe\fnition 2.2.1 A\fnite automaton is a 5-tuple M= (Q;\u0006;\u000e;q;F ), where\n1.Qis a \fnite set, whose elements are called states ,\n2. \u0006 is a \fnite set, called the alphabet ; the elements of \u0006 are called symbols ,\n3.\u000e:Q\u0002\u0006!Qis a function, called the transition function ,\n4.qis an element of Q; it is called the start state ,\n5.Fis a subset of Q; the elements of Fare called accept states .\nYou can think of the transition function \u000eas being the \\program\" of the\n\fnite automaton M= (Q;\u0006;\u000e;q;F ). This function tells us what Mcan do\nin \\one step\":\n\u000fLetrbe a state of Qand letabe a symbol of the alphabet \u0006. If\nthe \fnite automaton Mis in staterand reads the symbol a, then it\nswitches from state rto state\u000e(r;a). (In fact, \u000e(r;a) may be equal to\nr.)\nThe \\computer\" that we designed in the toll gate example in Section 2.1\nis a \fnite automaton. For this example, we have Q=fq0;q1;q2;q3;q4;q5g,\n\u0006 =f5;10;25g, the start state is q0,F=fq5g, and\u000eis given by the following\ntable:\n5 10 25\nq0q1q2q5\nq1q2q3q5\nq2q3q4q5\nq3q4q5q5\nq4q5q5q5\nq5q5q5q5\nThe example given in the beginning of this section is also a \fnite automa-\nton. For this example, we have Q=fq1;q2;q3g, \u0006 =f0;1g, the start state\nisq1,F=fq2g, and\u000eis given by the following table:"
}
]
},
{
"section": "Page 33",
"content": [
{
"type": "text",
"text": "2.2. Deterministic \fnite automata 25\n0 1\nq1q1q2\nq2q3q2\nq3q2q2\nLet us denote this \fnite automaton by M. The language of M, denoted\nbyL(M), is the set of all binary strings that are accepted by M. As we have\nseen before, we have\nL(M) =fw:wcontains at least one 1 and ends with an even number of 0s g:\nWe now give a formal de\fnition of the language of a \fnite automaton:\nDe\fnition 2.2.2 LetM= (Q;\u0006;\u000e;q;F ) be a \fnite automaton and let w=\nw1w2:::wnbe a string over \u0006. De\fne the sequence r0;r1;:::;rnof states, in\nthe following way:\n\u000fr0=q,\n\u000fri+1=\u000e(ri;wi+1), fori= 0;1;:::;n\u00001.\n1. Ifrn2F, then we say that Macceptsw.\n2. Ifrn62F, then we say that Mrejectsw.\nIn this de\fnition, wmay be the empty string , which we denote by \u000f, and\nwhose length is zero; thus in the de\fnition above, n= 0. In this case, the\nsequencer0;r1;:::;rnof states has length one; it consists of just the state\nr0=q. The empty string is accepted by Mif and only if the start state q\nbelongs toF.\nDe\fnition 2.2.3 LetM= (Q;\u0006;\u000e;q;F ) be a \fnite automaton. The lan-\nguageL(M)accepted byMis de\fned to be the set of all strings that are\naccepted by M:\nL(M) =fw:wis a string over \u0006 and Macceptswg:\nDe\fnition 2.2.4 A language Ais called regular , if there exists a \fnite au-\ntomatonMsuch thatA=L(M)."
}
]
},
{
"section": "Page 34",
"content": [
{
"type": "text",
"text": "26 Chapter 2. Finite Automata and Regular Languages\nWe \fnish this section by presenting an equivalent way of de\fning the\nlanguage accepted by a \fnite automaton. Let M= (Q;\u0006;\u000e;q;F ) be a \fnite\nautomaton. The transition function \u000e:Q\u0002\u0006!Qtells us that, when M\nis in stater2Qand reads symbol a2\u0006, it switches from state rto state\n\u000e(r;a). Let \u0006\u0003denote the set of all strings over the alphabet \u0006. (\u0006\u0003includes\nthe empty string \u000f.) We extend the function \u000eto a function\n\u000e:Q\u0002\u0006\u0003!Q;\nthat is de\fned as follows. For any state r2Qand for any string wover the\nalphabet \u0006,\n\u000e(r;w) =\u001ar ifw=\u000f,\n\u000e(\u000e(r;v);a) ifw=va, wherevis a string and a2\u0006.\nWhat is the meaning of this function \u000e? Letrbe a state of Qand letwbe\na string over the alphabet \u0006. Then\n\u000f\u000e(r;w) is the state that Mreaches, when it starts in state r, reads the\nstringwfrom left to right, and uses \u000eto switch from state to state.\nUsing this notation, we have\nL(M) =fw:wis a string over \u0006 and \u000e(q;w)2Fg:\n2.2.1 A \frst example of a \fnite automaton\nLet\nA=fw:wis a binary string containing an odd number of 1s g:\nWe claim that this language Ais regular. In order to prove this, we have to\nconstruct a \fnite automaton Msuch thatA=L(M).\nHow to construct M? Here is a \frst idea: The \fnite automaton reads the\ninput string wfrom left to right and keeps track of the number of 1s it has\nseen. After having read the entire string w, it checks whether this number\nis odd (in which case wis accepted) or even (in which case wis rejected).\nUsing this approach, the \fnite automaton needs a state for every integer\ni\u00150, indicating that the number of 1s read so far is equal to i. Hence,\nto design a \fnite automaton that follows this approach, we need an in\fnite"
}
]
},
{
"section": "Page 35",
"content": [
{
"type": "text",
"text": "2.2. Deterministic \fnite automata 27\nnumber of states. But, the de\fnition of \fnite automaton requires the number\nof states to be \fnite .\nA better, and correct approach, is to keep track of whether the number\nof 1s read so far is even or odd. This leads to the following \fnite automaton:\n\u000fThe set of states is Q=fqe;qog. If the \fnite automaton is in state qe,\nthen it has read an even number of 1s; if it is in state qo, then it has\nread an odd number of 1s.\n\u000fThe alphabet is \u0006 = f0;1g.\n\u000fThe start state is qe, because at the start, the number of 1s read by the\nautomaton is equal to 0, and 0 is even.\n\u000fThe setFof accept states is F=fqog.\n\u000fThe transition function \u000eis given by the following table:\n0 1\nqeqeqo\nqoqoqe\nThis \fnite automaton M= (Q;\u0006;\u000e;qe;F) can also be described by its state\ndiagram , which is given in the \fgure below. The arrow that comes \\out of\nthe blue\" and enters the state qe, indicates that qeis the start state. The\nstate depicted with double circles indicates the accept state.\nqe qo0\n01\n1\nWe have constructed a \fnite automaton Mthat accepts the language A.\nTherefore,Ais a regular language."
}
]
},
{
"section": "Page 36",
"content": [
{
"type": "text",
"text": "28 Chapter 2. Finite Automata and Regular Languages\n2.2.2 A second example of a \fnite automaton\nDe\fne the language Aas\nA=fw:wis a binary string containing 101 as a substring g:\nAgain, we claim that Ais a regular language. In other words, we claim that\nthere exists a \fnite automaton Mthat accepts A, i.e.,A=L(M).\nThe \fnite automaton Mwill do the following, when reading an input\nstring from left to right:\n\u000fIt skips over all 0s, and stays in the start state.\n\u000fAt the \frst 1, it switches to the state \\maybe the next two symbols are\n01\".\n{If the next symbol is 1, then it stays in the state \\maybe the next\ntwo symbols are 01\".\n{On the other hand, if the next symbol is 0, then it switches to the\nstate \\maybe the next symbol is 1\".\n\u0003If the next symbol is indeed 1, then it switches to the accept\nstate (but keeps on reading until the end of the string).\n\u0003On the other hand, if the next symbol is 0, then it switches\nto the start state, and skips 0s until it reads 1 again.\nBy de\fning the following four states, this process will become clear:\n\u000fq1:Mis in this state if the last symbol read was 1, but the substring\n101 has not been read.\n\u000fq10:Mis in this state if the last two symbols read were 10, but the\nsubstring 101 has not been read.\n\u000fq101:Mis in this state if the substring 101 has been read in the input\nstring.\n\u000fq: In all other cases, Mis in this state.\nHere is the formal description of the \fnite automaton that accepts the\nlanguageA:\n\u000fQ=fq;q1;q10;q101g,"
}
]
},
{
"section": "Page 37",
"content": [
{
"type": "text",
"text": "2.2. Deterministic \fnite automata 29\n\u000f\u0006 =f0;1g,\n\u000fthe start state is q,\n\u000fthe setFof accept states is equal to F=fq101g, and\n\u000fthe transition function \u000eis given by the following table:\n0 1\nqq q 1\nq1q10q1\nq10q q 101\nq101q101q101\nThe \fgure below gives the state diagram of the \fnite automaton M=\n(Q;\u0006;\u000e;q;F ).\nq q 1\nq10 q1010\n11\n00\n10,1\nThis \fnite automaton accepts the language Aconsisting of all binary\nstrings that contain the substring 101. As an exercise, how would you obtain\na \fnite automaton that accepts the complement of A, i.e., the language\nconsisting of all binary strings that do not contain the substring 101?\n2.2.3 A third example of a \fnite automaton\nThe \fnite automata we have seen so far have exactly one accept state. In\nthis section, we will see an example of a \fnite automaton having more accept\nstates."
}
]
},
{
"section": "Page 38",
"content": [
{
"type": "text",
"text": "30 Chapter 2. Finite Automata and Regular Languages\nLetAbe the language\nA=fw2f0;1g\u0003:whas a 1 in the third position from the right g;\nwheref0;1g\u0003is the set of all binary strings, including the empty string \u000f. We\nclaim thatAis a regular language. To prove this, we have to construct a \fnite\nautomaton Msuch thatA=L(M). At \frst sight, it seems di\u000ecult (or even\nimpossible?) to construct such a \fnite automaton: How does the automaton\n\\know\" that it has reached the third symbol from the right? It is, however,\npossible to construct such an automaton. The main idea is to remember the\nlast three symbols that have been read. Thus, the \fnite automaton has eight\nstatesqijk, wherei,j, andkrange over the two elements of f0;1g. If the\nautomaton is in state qijk, then the following hold:\n\u000fIfMhas read at least three symbols, then the three most recently read\nsymbols are ijk.\n\u000fIfMhas read only two symbols, then these two symbols are jk; more-\nover,i= 0.\n\u000fIfMhas read only one symbol, then this symbol is k; moreover, i=\nj= 0.\n\u000fIfMhas not read any symbol, then i=j=k= 0.\nThe start state is q000and the set of accept states is fq100;q110;q101;q111g.\nThe transition function of Mis given by the following state diagram.\nq000 q100 q010 q110\nq001 q101 q011 q1110\n10\n10\n10\n1\n0\n110\n10\n10"
}
]
},
{
"section": "Page 39",
"content": [
{
"type": "text",
"text": "2.3. Regular operations 31\n2.3 Regular operations\nIn this section, we de\fne three operations on languages. Later, we will answer\nthe question whether the set of all regular languages is closed under these\noperations. Let AandBbe two languages over the same alphabet.\n1. The union ofAandBis de\fned as\nA[B=fw:w2Aorw2Bg:\n2. The concatenation ofAandBis de\fned as\nAB=fww0:w2Aandw02Bg:\nIn words,ABis the set of all strings obtained by taking an arbitrary\nstringwinAand an arbitrary string w0inB, and gluing them together\n(such thatwis to the left of w0).\n3. The star ofAis de\fned as\nA\u0003=fu1u2:::uk:k\u00150 andui2Afor alli= 1;2;:::;kg:\nIn words,A\u0003is obtained by taking any \fnite number of strings in A, and\ngluing them together. Observe that k= 0 is allowed; this corresponds\nto the empty string \u000f. Thus,\u000f2A\u0003.\nTo give an example, let A=f0;01gandB=f1;10g. Then\nA[B=f0;01;1;10g;\nAB=f01;010;011;0110g;\nand\nA\u0003=f\u000f;0;01;00;001;010;0101;000;0001;00101;:::g:\nAs another example, if \u0006 = f0;1g, then \u0006\u0003is the set of all binary strings\n(including the empty string). Observe that a string always has a \fnite length.\nBefore we proceed, we give an alternative (and equivalent) de\fnition of\nthe star of the language A: De\fne\nA0=f\u000fg"
}
]
},
{
"section": "Page 40",
"content": [
{
"type": "text",
"text": "32 Chapter 2. Finite Automata and Regular Languages\nand, fork\u00151,\nAk=AAk\u00001;\ni.e.,Akis the concatenation of the two languages AandAk\u00001. Then we have\nA\u0003=1[\nk=0Ak:\nTheorem 2.3.1 The set of regular languages is closed under the union op-\neration, i.e., if AandBare regular languages over the same alphabet \u0006, then\nA[Bis also a regular language.\nProof. SinceAandBare regular languages, there are \fnite automata\nM1= (Q1;\u0006;\u000e1;q1;F1) andM2= (Q2;\u0006;\u000e2;q2;F2) that accept AandB,\nrespectively. In order to prove that A[Bis regular, we have to construct a\n\fnite automaton Mthat accepts A[B. In other words, Mmust have the\nproperty that for every string w2\u0006\u0003,\nMacceptsw,M1acceptsworM2acceptsw.\nAs a \frst idea, we may think that Mcould do the following:\n\u000fStarting in the start state q1ofM1,M\\runs\"M1onw.\n\u000fIf, after having read w,M1is in a state of F1, thenw2A, thus\nw2A[Band, therefore, Macceptsw.\n\u000fOn the other hand, if, after having read w,M1is in a state that is not\ninF1, thenw62AandM\\runs\"M2onw, starting in the start state\nq2ofM2. If, after having read w,M2is in a state of F2, then we know\nthatw2B, thusw2A[Band, therefore, Macceptsw. Otherwise,\nwe know that w62A[B, andMrejectsw.\nThis idea does not work, because the \fnite automaton Mcan read the input\nstringwonly once . The correct approach is to runM1andM2simulta-\nneously . We de\fne the set Qof states of Mto be the Cartesian product\nQ1\u0002Q2. IfMis in state ( r1;r2), this means that\n\u000fifM1would have read the input string up to this point, then it would\nbe in state r1, and"
}
]
},
{
"section": "Page 41",
"content": [
{
"type": "text",
"text": "2.3. Regular operations 33\n\u000fifM2would have read the input string up to this point, then it would\nbe in state r2.\nThis leads to the \fnite automaton M= (Q;\u0006;\u000e;q;F ), where\n\u000fQ=Q1\u0002Q2=f(r1;r2) :r12Q1andr22Q2g. Observe that\njQj=jQ1j\u0002jQ2j, which is \fnite.\n\u000f\u0006 is the alphabet of AandB(recall that we assume that AandBare\nlanguages over the same alphabet).\n\u000fThe start state qofMis equal toq= (q1;q2).\n\u000fThe setFof accept states of Mis given by\nF=f(r1;r2) :r12F1orr22F2g= (F1\u0002Q2)[(Q1\u0002F2):\n\u000fThe transition function \u000e:Q\u0002\u0006!Qis given by\n\u000e((r1;r2);a) = (\u000e1(r1;a);\u000e2(r2;a));\nfor allr12Q1,r22Q2, anda2\u0006.\nTo \fnish the proof, we have to show that this \fnite automaton Mindeed\naccepts the language A[B. Intuitively, this should be clear from the discus-\nsion above. The easiest way to give a formal proof is by using the extended\ntransition functions \u000e1and\u000e2. (The extended transition function has been\nde\fned after De\fnition 2.2.4.) Here we go: Recall that we have to prove that\nMacceptsw,M1acceptsworM2acceptsw,\ni.e,\nMacceptsw,\u000e1(q1;w)2F1or\u000e2(q2;w)2F2.\nIn terms of the extended transition function \u000eof the transition function \u000eof\nM, this becomes\n\u000e((q1;q2);w)2F,\u000e1(q1;w)2F1or\u000e2(q2;w)2F2. (2.1)\nBy applying the de\fnition of the extended transition function, as given after\nDe\fnition 2.2.4, to \u000e, it can be seen that\n\u000e((q1;q2);w) = (\u000e1(q1;w);\u000e2(q2;w)):"
}
]
},
{
"section": "Page 42",
"content": [
{
"type": "text",
"text": "The latter equality implies that (2.1) is true and, therefore, M indeed accepts the language A ∪ B.\nWhat about the closure of the regular languages under the concatenation and star operations? It turns out that the regular languages are closed under these operations. But how do we prove this?\nLet A and B be two regular languages, and let M_1 and M_2 be finite automata that accept A and B, respectively. How do we construct a finite automaton M that accepts the concatenation AB? Given an input string u, M has to decide whether or not u can be broken into two strings w and w′ (i.e., write u as u = ww′), such that w ∈ A and w′ ∈ B. In words, M has to decide whether or not u can be broken into two substrings, such that the first substring is accepted by M_1 and the second substring is accepted by M_2. The difficulty is caused by the fact that M has to make this decision by scanning the string u only once. If u ∈ AB, then M has to decide, during this single scan, where to break u into two substrings. Similarly, if u ∉ AB, then M has to decide, during this single scan, that u cannot be broken into two substrings such that the first substring is in A and the second substring is in B.\nIt seems to be even more difficult to prove that A* is a regular language, if A itself is regular. In order to prove this, we need a finite automaton that, when given an arbitrary input string u, decides whether or not u can be broken into substrings such that each substring is in A. The problem is that, if u ∈ A*, the finite automaton has to determine into how many substrings, and where, the string u has to be broken; it has to do this during one single scan of the string u.\nAs we mentioned already, if A and B are regular languages, then both AB and A* are also regular. In order to prove these claims, we will introduce a more general type of finite automaton.\nThe finite automata that we have seen so far are deterministic. This means the following:\n• If the finite automaton M is in state r and if it reads the symbol a, then M switches from state r to the uniquely defined state δ(r, a).\nFrom now on, we will call such a finite automaton a deterministic finite automaton (DFA). In the next section, we will define the notion of a nondeterministic finite automaton (NFA). For such an automaton, there are zero or more possible states to switch to. At first sight, nondeterministic finite"
}
]
},
{
"section": "Page 43",
"content": [
{
"type": "text",
"text": "automata seem to be more powerful than their deterministic counterparts. We will prove, however, that DFAs have the same power as NFAs. As we will see, using this fact, it will be easy to prove that the class of regular languages is closed under the concatenation and star operations.\n2.4 Nondeterministic finite automata\nWe start by giving three examples of nondeterministic finite automata. These examples will show the difference between this type of automata and the deterministic versions that we have considered in the previous sections. After these examples, we will give a formal definition of a nondeterministic finite automaton.\n2.4.1 A first example\nConsider the following state diagram:\n[Figure: state diagram of an automaton with states q_1, q_2, q_3, q_4; edge labels: 0,1 (loop at q_1), 1 (q_1 to q_2), 0,ε (q_2 to q_3), 1 (q_3 to q_4), 0,1 (loop at q_4)]\nYou will notice three differences with the finite automata that we have seen until now. First, if the automaton is in state q_1 and reads the symbol 1, then it has two options: Either it stays in state q_1, or it switches to state q_2. Second, if the automaton is in state q_2, then it can switch to state q_3 without reading a symbol; this is indicated by the edge having the empty string ε as label. Third, if the automaton is in state q_3 and reads the symbol 0, then it cannot continue.\nLet us see what this automaton can do when it gets the string 010110 as input. Initially, the automaton is in the start state q_1.\n• Since the first symbol in the input string is 0, the automaton stays in state q_1 after having read this symbol.\n• The second symbol is 1, and the automaton can either stay in state q_1 or switch to state q_2."
}
]
},
{
"section": "Page 44",
"content": [
{
"type": "text",
"text": "- If the automaton stays in state q_1, then it is still in this state after having read the third symbol.\n- If the automaton switches to state q_2, then it again has two options:\n* Either read the third symbol in the input string, which is 0, and switch to state q_3,\n* or switch to state q_3, without reading the third symbol.\nIf we continue in this way, then we see that, for the input string 010110, there are seven possible computations. All these computations are given in the figure below.\n[Figure: tree of the seven possible computations of the automaton on the input string 010110]\nConsider the lowest path in the figure above:\n• When reading the first symbol, the automaton stays in state q_1.\n• When reading the second symbol, the automaton switches to state q_2.\n• The automaton does not read the third symbol (equivalently, it \"reads\" the empty string ε), and switches to state q_3. At this moment, the"
}
]
},
{
"section": "Page 45",
"content": [
{
"type": "text",
"text": "automaton cannot continue: The third symbol is 0, but there is no edge leaving q_3 that is labeled 0, and there is no edge leaving q_3 that is labeled ε. Therefore, the computation hangs at this point.\nFrom the figure, you can see that, out of the seven possible computations, exactly two end in the accept state q_4 (after the entire input string 010110 has been read). We say that the automaton accepts the string 010110, because there is at least one computation that ends in the accept state.\nNow consider the input string 010. In this case, there are three possible computations:\n1. q_1 -0-> q_1 -1-> q_1 -0-> q_1\n2. q_1 -0-> q_1 -1-> q_2 -0-> q_3\n3. q_1 -0-> q_1 -1-> q_2 -ε-> q_3 -> hang\nNone of these computations ends in the accept state (after the entire input string 010 has been read). Therefore, we say that the automaton rejects the input string 010.\nThe state diagram given above is an example of a nondeterministic finite automaton (NFA). Informally, an NFA accepts a string, if there exists at least one path in the state diagram that (i) starts in the start state, (ii) does not hang before the entire string has been read, and (iii) ends in an accept state. A string for which no path satisfies (i), (ii), and (iii) is rejected by the NFA.\nThe NFA given above accepts all binary strings that contain 101 or 11 as a substring. All other binary strings are rejected.\n2.4.2 A second example\nLet A be the language\nA = {w ∈ {0,1}* : w has a 1 in the third position from the right}.\nThe following state diagram defines an NFA that accepts all strings that are in A, and rejects all strings that are not in A.\n[Figure: state diagram of an NFA with states q_1, q_2, q_3, q_4; edge labels: 0,1 (loop at q_1), 1 (q_1 to q_2), 0,1 (q_2 to q_3), 0,1 (q_3 to q_4)]"
}
]
},
{
"section": "Page 46",
"content": [
{
"type": "text",
"text": "This NFA does the following. If it is in the start state q_1 and reads the symbol 1, then it either stays in state q_1 or it \"guesses\" that this symbol is the third symbol from the right in the input string. In the latter case, the NFA switches to state q_2, and then it \"verifies\" that there are indeed exactly two remaining symbols in the input string. If there are more than two remaining symbols, then the NFA hangs (in state q_4) after having read the next two symbols.\nObserve how this guessing mechanism is used: The automaton can only read the input string once, from left to right. Hence, it does not know when it reaches the third symbol from the right. When the NFA reads a 1, it can guess that this is the third symbol from the right; after having made this guess, it verifies whether or not the guess was correct.\nIn Section 2.2.3, we have seen a DFA for the same language A. Observe that the NFA has a much simpler structure than the DFA.\n2.4.3 A third example\nConsider the following state diagram, which defines an NFA whose alphabet is {0}.\n[Figure: state diagram of an NFA over the alphabet {0}, with two edges labeled ε and five edges labeled 0]\nThis NFA accepts the language\nA = {0^k : k ≡ 0 mod 2 or k ≡ 0 mod 3},\nwhere 0^k is the string consisting of k many 0s. (If k = 0, then 0^k = ε.)\nObserve that A is the union of the two languages\nA_1 = {0^k : k ≡ 0 mod 2}"
}
]
},
{
"section": "Page 47",
"content": [
{
"type": "text",
"text": "and\nA_2 = {0^k : k ≡ 0 mod 3}.\nThe NFA basically consists of two DFAs: one of these accepts A_1, whereas the other accepts A_2. Given an input string w, the NFA has to decide whether or not w ∈ A, which is equivalent to deciding whether or not w ∈ A_1 or w ∈ A_2. The NFA makes this decision in the following way: At the start, it \"guesses\" whether (i) it is going to check whether or not w ∈ A_1 (i.e., the length of w is even), or (ii) it is going to check whether or not w ∈ A_2 (i.e., the length of w is a multiple of 3). After having made the guess, it verifies whether or not the guess was correct. If w ∈ A, then there exists a way of making the correct guess and verifying that w is indeed an element of A (by ending in an accept state). If w ∉ A, then no matter which guess is made, the NFA will never end in an accept state.\n2.4.4 Definition of nondeterministic finite automaton\nThe previous examples give you an idea what nondeterministic finite automata are and how they work. In this section, we give a formal definition of these automata.\nFor any alphabet Σ, we define Σ_ε to be the set\nΣ_ε = Σ ∪ {ε}.\nRecall the notion of a power set: For any set Q, the power set of Q, denoted by P(Q), is the set of all subsets of Q, i.e.,\nP(Q) = {R : R ⊆ Q}.\nDefinition 2.4.1 A nondeterministic finite automaton (NFA) is a 5-tuple M = (Q, Σ, δ, q, F), where\n1. Q is a finite set, whose elements are called states,\n2. Σ is a finite set, called the alphabet; the elements of Σ are called symbols,\n3. δ : Q × Σ_ε → P(Q) is a function, called the transition function,\n4. q is an element of Q; it is called the start state,\n5. F is a subset of Q; the elements of F are called accept states."
}
]
},
{
"section": "Page 48",
"content": [
{
"type": "text",
"text": "As for DFAs, the transition function δ can be thought of as the \"program\" of the finite automaton M = (Q, Σ, δ, q, F):\n• Let r ∈ Q, and let a ∈ Σ_ε. Then δ(r, a) is a (possibly empty) subset of Q. If the NFA M is in state r, and if it reads a (where a may be the empty string ε), then M can switch from state r to any state in δ(r, a). If δ(r, a) = ∅, then M cannot continue and the computation hangs.\nThe example given in Section 2.4.1 is an NFA, where Q = {q_1, q_2, q_3, q_4}, Σ = {0, 1}, the start state is q_1, the set of accept states is F = {q_4}, and the transition function δ is given by the following table:\n    | 0     | 1          | ε\nq_1 | {q_1} | {q_1, q_2} | ∅\nq_2 | {q_3} | ∅          | {q_3}\nq_3 | ∅     | {q_4}      | ∅\nq_4 | {q_4} | {q_4}      | ∅\nDefinition 2.4.2 Let M = (Q, Σ, δ, q, F) be an NFA, and let w ∈ Σ*. We say that M accepts w, if¹\n• w = ε and the start state q is an accept state, or\n• there exists an integer m ≥ 1, such that w can be written as w = y_1 y_2 ... y_m, where y_i ∈ Σ_ε for all i with 1 ≤ i ≤ m, and there exists a sequence r_0, r_1, ..., r_m of states in Q, such that\n- r_0 = q,\n- r_{i+1} ∈ δ(r_i, y_{i+1}), for i = 0, 1, ..., m−1, and\n- r_m ∈ F.\nOtherwise, we say that M rejects the string w.\nThe NFA in the example in Section 2.4.1 accepts the string 01100. This can be seen by taking\n• m = 6,\n¹ Thanks to Antoine Vigneron for pointing out an error in a previous version of this definition."
}
]
},
{
"section": "Page 49",
"content": [
{
"type": "text",
"text": "• w = 01ε100 = y_1 y_2 y_3 y_4 y_5 y_6, and\n• r_0 = q_1, r_1 = q_1, r_2 = q_2, r_3 = q_3, r_4 = q_4, r_5 = q_4, and r_6 = q_4.\nDefinition 2.4.3 Let M = (Q, Σ, δ, q, F) be an NFA. The language L(M) accepted by M is defined as\nL(M) = {w ∈ Σ* : M accepts w}.\n2.5 Equivalence of DFAs and NFAs\nYou may have the impression that nondeterministic finite automata are more powerful than deterministic finite automata. In this section, we will show that this is not the case. That is, we will prove that a language can be accepted by a DFA if and only if it can be accepted by an NFA. In order to prove this, we will show how to convert an arbitrary NFA to a DFA that accepts the same language.\nWhat about converting a DFA to an NFA? Well, there is (almost) nothing to do, because a DFA is also an NFA. This is not quite true, because\n• the transition function of a DFA maps a state and a symbol to a state, whereas\n• the transition function of an NFA maps a state and a symbol to a set of zero or more states.\nThe formal conversion of a DFA to an NFA is done as follows: Let M = (Q, Σ, δ, q, F) be a DFA. Recall that δ is a function δ : Q × Σ → Q. We define the function δ′ : Q × Σ_ε → P(Q) as follows. For any r ∈ Q and for any a ∈ Σ_ε,\nδ′(r, a) = {δ(r, a)} if a ≠ ε,\nδ′(r, a) = ∅ if a = ε.\nThen N = (Q, Σ, δ′, q, F) is an NFA, whose behavior is exactly the same as that of the DFA M; the easiest way to see this is by observing that the state diagrams of M and N are equal. Therefore, we have L(M) = L(N).\nIn the rest of this section, we will show how to convert an NFA to a DFA:\nTheorem 2.5.1 Let N = (Q, Σ, δ, q, F) be a nondeterministic finite automaton. There exists a deterministic finite automaton M, such that L(M) = L(N)."
}
]
},
{
"section": "Page 50",
"content": [
{
"type": "text",
"text": "Proof. Recall that the NFA N can (in general) perform more than one computation on a given input string. The idea of the proof is to construct a DFA M that runs all these different computations simultaneously. (We have seen this idea already in the proof of Theorem 2.3.1.) To be more precise, the DFA M will have the following property:\n• the state that M is in after having read an initial part of the input string corresponds exactly to the set of all states that N can reach after having read the same part of the input string.\nWe start by presenting the conversion for the case when N does not contain ε-transitions. In other words, the state diagram of N does not contain any edge that has ε as a label. (Later, we will extend the conversion to the general case.) Let the DFA M be defined as M = (Q′, Σ, δ′, q′, F′), where\n• the set Q′ of states is equal to Q′ = P(Q); observe that |Q′| = 2^|Q|,\n• the start state q′ is equal to q′ = {q}; so M has the \"same\" start state as N,\n• the set F′ of accept states is equal to the set of all elements R of Q′ having the property that R contains at least one accept state of N, i.e.,\nF′ = {R ∈ Q′ : R ∩ F ≠ ∅},\n• the transition function δ′ : Q′ × Σ → Q′ is defined as follows: For each R ∈ Q′ and for each a ∈ Σ,\nδ′(R, a) = ⋃_{r ∈ R} δ(r, a).\nLet us see what the transition function δ′ of M does. First observe that, since N is an NFA, δ(r, a) is a subset of Q. This implies that δ′(R, a) is the union of subsets of Q and, therefore, also a subset of Q. Hence, δ′(R, a) is an element of Q′.\nThe set δ(r, a) is equal to the set of all states of the NFA N that can be reached from state r by reading the symbol a. We take the union of these sets δ(r, a), where r ranges over all elements of R, to obtain the new set δ′(R, a). This new set is the state that the DFA M reaches from state R, by reading the symbol a."
}
]
},
{
"section": "Page 51",
"content": [
{
"type": "text",
"text": "In this way, we obtain the correspondence that was given in the beginning of this proof.\nAfter this warm-up, we can consider the general case. In other words, from now on, we allow ε-transitions in the NFA N. The DFA M is defined as above, except that the start state q′ and the transition function δ′ have to be modified. Recall that a computation of the NFA N consists of the following:\n1. Start in the start state q and make zero or more ε-transitions.\n2. Read one \"real\" symbol of Σ and move to a new state (or stay in the current state).\n3. Make zero or more ε-transitions.\n4. Read one \"real\" symbol of Σ and move to a new state (or stay in the current state).\n5. Make zero or more ε-transitions.\n6. Etc.\nThe DFA M will simulate this computation in the following way:\n• Simulate 1. in one single step. As we will see below, this simulation is implicitly encoded in the definition of the start state q′ of M.\n• Simulate 2. and 3. in one single step.\n• Simulate 4. and 5. in one single step.\n• Etc.\nThus, in one step, the DFA M simulates the reading of one \"real\" symbol of Σ, followed by making zero or more ε-transitions.\nTo formalize this, we need the notion of ε-closure. For any state r of the NFA N, the ε-closure of r, denoted by C_ε(r), is defined to be the set of all states of N that can be reached from r, by making zero or more ε-transitions. For any state R of the DFA M (hence, R ⊆ Q), we define\nC_ε(R) = ⋃_{r ∈ R} C_ε(r)."
}
]
},
{
"section": "Page 52",
"content": [
{
"type": "text",
"text": "How do we define the start state q′ of the DFA M? Before the NFA N reads its first \"real\" symbol of Σ, it makes zero or more ε-transitions. In other words, at the moment when N reads the first symbol of Σ, it can be in any state of C_ε(q). Therefore, we define q′ to be\nq′ = C_ε(q) = C_ε({q}).\nHow do we define the transition function δ′ of the DFA M? Assume that M is in state R, and reads the symbol a. At this moment, the NFA N could be in any state r of R. By reading the symbol a, N can switch to any state in δ(r, a), and then make zero or more ε-transitions. Hence, the NFA can switch to any state in the set C_ε(δ(r, a)). Based on this, we define δ′(R, a) to be\nδ′(R, a) = ⋃_{r ∈ R} C_ε(δ(r, a)).\nTo summarize, the NFA N = (Q, Σ, δ, q, F) is converted to the DFA M = (Q′, Σ, δ′, q′, F′), where\n• Q′ = P(Q),\n• q′ = C_ε({q}),\n• F′ = {R ∈ Q′ : R ∩ F ≠ ∅},\n• δ′ : Q′ × Σ → Q′ is defined as follows: For each R ∈ Q′ and for each a ∈ Σ,\nδ′(R, a) = ⋃_{r ∈ R} C_ε(δ(r, a)).\nThe results proved until now can be summarized in the following theorem.\nTheorem 2.5.2 Let A be a language. Then A is regular if and only if there exists a nondeterministic finite automaton that accepts A.\n2.5.1 An example\nConsider the NFA N = (Q, Σ, δ, q, F), where Q = {1, 2, 3}, Σ = {a, b}, q = 1, F = {2}, and δ is given by the following table:"
}
]
},
{
"section": "Page 53",
"content": [
{
"type": "text",
"text": "  | a   | b      | ε\n1 | {3} | ∅      | {2}\n2 | {1} | ∅      | ∅\n3 | {2} | {2, 3} | ∅\nThe state diagram of N is as follows:\n[Figure: state diagram of N, with states 1, 2, 3 and edges labeled a, b, and ε as given in the table above]\nWe will show how to convert this NFA N to a DFA M that accepts the same language. Following the proof of Theorem 2.5.1, the DFA M is specified by M = (Q′, Σ, δ′, q′, F′), where each of the components is defined below.\n• Q′ = P(Q). Hence,\nQ′ = {∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}}.\n• q′ = C_ε({q}). Hence, the start state q′ of M is the set of all states of N that can be reached from N's start state q = 1, by making zero or more ε-transitions. We obtain\nq′ = C_ε({q}) = C_ε({1}) = {1, 2}.\n• F′ = {R ∈ Q′ : R ∩ F ≠ ∅}. Hence, the accept states of M are those states that contain the accept state 2 of N. We obtain\nF′ = {{2}, {1,2}, {2,3}, {1,2,3}}."
}
]
},
{
"section": "Page 54",
"content": [
{
"type": "text",
"text": "• δ′ : Q′ × Σ → Q′ is defined as follows: For each R ∈ Q′ and for each a ∈ Σ,\nδ′(R, a) = ⋃_{r ∈ R} C_ε(δ(r, a)).\nIn this example δ′ is given by\nδ′(∅, a) = ∅ | δ′(∅, b) = ∅\nδ′({1}, a) = {3} | δ′({1}, b) = ∅\nδ′({2}, a) = {1,2} | δ′({2}, b) = ∅\nδ′({3}, a) = {2} | δ′({3}, b) = {2,3}\nδ′({1,2}, a) = {1,2,3} | δ′({1,2}, b) = ∅\nδ′({1,3}, a) = {2,3} | δ′({1,3}, b) = {2,3}\nδ′({2,3}, a) = {1,2} | δ′({2,3}, b) = {2,3}\nδ′({1,2,3}, a) = {1,2,3} | δ′({1,2,3}, b) = {2,3}\nThe state diagram of the DFA M is as follows:"
}
]
},
{
"section": "Page 55",
"content": [
{
"type": "text",
"text": "[Figure: state diagram of the DFA M on the eight states ∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}, with edges labeled a and b]\nWe make the following observations:\n• The states {1} and {1,3} do not have incoming edges. Therefore, these two states cannot be reached from the start state {1,2}.\n• The state {3} has only one incoming edge; it comes from the state {1}. Since {1} cannot be reached from the start state, {3} cannot be reached from the start state.\n• The state {2} has only one incoming edge; it comes from the state {3}. Since {3} cannot be reached from the start state, {2} cannot be reached from the start state.\nHence, we can remove the four states {1}, {2}, {3}, and {1,3}. The resulting DFA accepts the same language as the DFA above. This leads to the following state diagram, which depicts a DFA that accepts the same language as the NFA N:"
}
]
},
{
"section": "Page 56",
"content": [
{
"type": "text",
"text": "[Figure: state diagram of the reduced DFA on the four states ∅, {1,2}, {2,3}, {1,2,3}, with edges labeled a and b]\n2.6 Closure under the regular operations\nIn Section 2.3, we have defined the regular operations union, concatenation, and star. We proved in Theorem 2.3.1 that the union of two regular languages is a regular language. We also explained why it is not clear that the concatenation of two regular languages is regular, and that the star of a regular language is regular. In this section, we will see that the concept of NFA, together with Theorem 2.5.2, can be used to give a simple proof of the fact that the regular languages are indeed closed under the regular operations.\nWe start by giving an alternative proof of Theorem 2.3.1:\nTheorem 2.6.1 The set of regular languages is closed under the union operation, i.e., if A_1 and A_2 are regular languages over the same alphabet Σ, then A_1 ∪ A_2 is also a regular language."
}
]
},
{
"section": "Page 57",
"content": [
{
"type": "text",
"text": "[Figure 2.1: The NFA M accepts L(M_1) ∪ L(M_2).]\nProof. Since A_1 is regular, there is, by Theorem 2.5.2, an NFA M_1 = (Q_1, Σ, δ_1, q_1, F_1), such that A_1 = L(M_1). Similarly, there is an NFA M_2 = (Q_2, Σ, δ_2, q_2, F_2), such that A_2 = L(M_2). We may assume that Q_1 ∩ Q_2 = ∅, because otherwise, we can give new \"names\" to the states of Q_1 and Q_2. From these two NFAs, we will construct an NFA M = (Q, Σ, δ, q_0, F), such that L(M) = A_1 ∪ A_2. The construction is illustrated in Figure 2.1. The NFA M is defined as follows:\n1. Q = {q_0} ∪ Q_1 ∪ Q_2, where q_0 is a new state.\n2. q_0 is the start state of M.\n3. F = F_1 ∪ F_2.\n4. δ : Q × Σ_ε → P(Q) is defined as follows: For any r ∈ Q and for any"
}
]
},
{
"section": "Page 58",
"content": [
{
"type": "text",
"text": "a ∈ Σ_ε,\nδ(r, a) =\n  δ_1(r, a) if r ∈ Q_1,\n  δ_2(r, a) if r ∈ Q_2,\n  {q_1, q_2} if r = q_0 and a = ε,\n  ∅ if r = q_0 and a ≠ ε.\nTheorem 2.6.2 The set of regular languages is closed under the concatenation operation, i.e., if A_1 and A_2 are regular languages over the same alphabet Σ, then A_1A_2 is also a regular language.\nProof. Let M_1 = (Q_1, Σ, δ_1, q_1, F_1) be an NFA, such that A_1 = L(M_1). Similarly, let M_2 = (Q_2, Σ, δ_2, q_2, F_2) be an NFA, such that A_2 = L(M_2). As in the proof of Theorem 2.6.1, we may assume that Q_1 ∩ Q_2 = ∅. We will construct an NFA M = (Q, Σ, δ, q_0, F), such that L(M) = A_1A_2. The construction is illustrated in Figure 2.2. The NFA M is defined as follows:\n1. Q = Q_1 ∪ Q_2.\n2. q_0 = q_1.\n3. F = F_2.\n4. δ : Q × Σ_ε → P(Q) is defined as follows: For any r ∈ Q and for any a ∈ Σ_ε,\nδ(r, a) =\n  δ_1(r, a) if r ∈ Q_1 and r ∉ F_1,\n  δ_1(r, a) if r ∈ F_1 and a ≠ ε,\n  δ_1(r, a) ∪ {q_2} if r ∈ F_1 and a = ε,\n  δ_2(r, a) if r ∈ Q_2.\nTheorem 2.6.3 The set of regular languages is closed under the star operation, i.e., if A is a regular language, then A* is also a regular language."
}
]
},
{
"section": "Page 59",
"content": [
{
"type": "text",
"text": "[Figure 2.2: The NFA M accepts L(M_1)L(M_2).]\n[Figure 2.3: The NFA M accepts (L(N))*.]\nProof. Let Σ be the alphabet of A and let N = (Q_1, Σ, δ_1, q_1, F_1) be an NFA, such that A = L(N). We will construct an NFA M = (Q, Σ, δ, q_0, F), such that L(M) = A*. The construction is illustrated in Figure 2.3. The NFA M is defined as follows:"
}
]
},
{
"section": "Page 60",
"content": [
{
"type": "text",
"text": "1. Q = {q_0} ∪ Q_1, where q_0 is a new state.\n2. q_0 is the start state of M.\n3. F = {q_0} ∪ F_1. (Since ε ∈ A*, q_0 has to be an accept state.)\n4. δ : Q × Σ_ε → P(Q) is defined as follows: For any r ∈ Q and for any a ∈ Σ_ε,\nδ(r, a) =\n  δ_1(r, a) if r ∈ Q_1 and r ∉ F_1,\n  δ_1(r, a) if r ∈ F_1 and a ≠ ε,\n  δ_1(r, a) ∪ {q_1} if r ∈ F_1 and a = ε,\n  {q_1} if r = q_0 and a = ε,\n  ∅ if r = q_0 and a ≠ ε.\nIn the final theorem of this section, we mention (without proof) two more closure properties of the regular languages:\nTheorem 2.6.4 The set of regular languages is closed under the complement and intersection operations:\n1. If A is a regular language over the alphabet Σ, then the complement\nA\u0304 = {w ∈ Σ* : w ∉ A}\nis also a regular language.\n2. If A_1 and A_2 are regular languages over the same alphabet Σ, then the intersection\nA_1 ∩ A_2 = {w ∈ Σ* : w ∈ A_1 and w ∈ A_2}\nis also a regular language.\n2.7 Regular expressions\nIn this section, we present regular expressions, which are a means to describe languages. As we will see, the class of languages that can be described by regular expressions coincides with the class of regular languages."
}
]
},
{
"section": "Page 61",
"content": [
{
"type": "text",
"text": "Before formally defining the notion of a regular expression, we give some examples. Consider the expression\n(0 ∪ 1)01*.\nThe language described by this expression is the set of all binary strings\n1. that start with either 0 or 1 (this is indicated by (0 ∪ 1)),\n2. for which the second symbol is 0 (this is indicated by 0), and\n3. that end with zero or more 1s (this is indicated by 1*).\nThat is, the language described by this expression is\n{00, 001, 0011, 00111, ..., 10, 101, 1011, 10111, ...}.\nHere are some more examples (in all cases, the alphabet is {0, 1}):\n• The language {w : w contains exactly two 0s} is described by the expression\n1*01*01*.\n• The language {w : w contains at least two 0s} is described by the expression\n(0 ∪ 1)*0(0 ∪ 1)*0(0 ∪ 1)*.\n• The language {w : 1011 is a substring of w} is described by the expression\n(0 ∪ 1)*1011(0 ∪ 1)*.\n• The language {w : the length of w is even} is described by the expression\n((0 ∪ 1)(0 ∪ 1))*.\n• The language {w : the length of w is odd} is described by the expression\n(0 ∪ 1)((0 ∪ 1)(0 ∪ 1))*.\n• The language {1011, 0} is described by the expression\n1011 ∪ 0."
}
]
},
{
"section": "Page 62",
"content": [
{
"type": "text",
"text": "• The language {w : the first and last symbols of w are equal} is described by the expression\n0(0 ∪ 1)*0 ∪ 1(0 ∪ 1)*1 ∪ 0 ∪ 1.\nAfter these examples, we give a formal (and inductive) definition of regular expressions:\nDefinition 2.7.1 Let Σ be a non-empty alphabet.\n1. ε is a regular expression.\n2. ∅ is a regular expression.\n3. For each a ∈ Σ, a is a regular expression.\n4. If R_1 and R_2 are regular expressions, then R_1 ∪ R_2 is a regular expression.\n5. If R_1 and R_2 are regular expressions, then R_1R_2 is a regular expression.\n6. If R is a regular expression, then R* is a regular expression.\nYou can regard 1., 2., and 3. as being the \"building blocks\" of regular expressions. Items 4., 5., and 6. give rules that can be used to combine regular expressions into new (and \"larger\") regular expressions. To give an example, we claim that\n(0 ∪ 1)*101(0 ∪ 1)*\nis a regular expression (where the alphabet Σ is equal to {0, 1}). In order to prove this, we have to show that this expression can be \"built\" using the \"rules\" given in Definition 2.7.1. Here we go:\n• By 3., 0 is a regular expression.\n• By 3., 1 is a regular expression.\n• Since 0 and 1 are regular expressions, by 4., 0 ∪ 1 is a regular expression.\n• Since 0 ∪ 1 is a regular expression, by 6., (0 ∪ 1)* is a regular expression.\n• Since 1 and 0 are regular expressions, by 5., 10 is a regular expression."
}
]
},
{
"section": "Page 63",
"content": [
{
"type": "text",
"text": "• Since 10 and 1 are regular expressions, by 5., 101 is a regular expression.\n• Since (0 ∪ 1)* and 101 are regular expressions, by 5., (0 ∪ 1)*101 is a regular expression.\n• Since (0 ∪ 1)*101 and (0 ∪ 1)* are regular expressions, by 5., (0 ∪ 1)*101(0 ∪ 1)* is a regular expression.\nNext we define the language that is described by a regular expression:\nDefinition 2.7.2 Let Σ be a non-empty alphabet.\n1. The regular expression ε describes the language {ε}.\n2. The regular expression ∅ describes the language ∅.\n3. For each a ∈ Σ, the regular expression a describes the language {a}.\n4. Let R_1 and R_2 be regular expressions and let L_1 and L_2 be the languages described by them, respectively. The regular expression R_1 ∪ R_2 describes the language L_1 ∪ L_2.\n5. Let R_1 and R_2 be regular expressions and let L_1 and L_2 be the languages described by them, respectively. The regular expression R_1R_2 describes the language L_1L_2.\n6. Let R be a regular expression and let L be the language described by it. The regular expression R* describes the language L*.\nWe consider some examples:\n• The regular expression (0 ∪ ε)(1 ∪ ε) describes the language {01, 0, 1, ε}.\n• The regular expression 0 ∪ ε describes the language {0, ε}, whereas the regular expression 1* describes the language {ε, 1, 11, 111, ...}. Therefore, the regular expression (0 ∪ ε)1* describes the language\n{0, 01, 011, 0111, ..., ε, 1, 11, 111, ...}.\nObserve that this language is also described by the regular expression 01* ∪ 1*."
}
]
},
{
"section": "Page 64",
"content": [
{
"type": "text",
"text": "• The regular expression 1*∅ describes the empty language, i.e., the language ∅. (You should convince yourself that this is correct.)\n• The regular expression ∅* describes the language {ε}.\nDefinition 2.7.3 Let R_1 and R_2 be regular expressions and let L_1 and L_2 be the languages described by them, respectively. If L_1 = L_2 (i.e., R_1 and R_2 describe the same language), then we will write R_1 = R_2.\nHence, even though (0 ∪ ε)1* and 01* ∪ 1* are different regular expressions, we write\n(0 ∪ ε)1* = 01* ∪ 1*,\nbecause they describe the same language.\nIn Section 2.8.2, we will show that every regular language can be described by a regular expression. The proof of this fact is purely algebraic and uses the following algebraic identities involving regular expressions.\nTheorem 2.7.4 Let R_1, R_2, and R_3 be regular expressions. The following identities hold:\n1. R_1∅ = ∅R_1 = ∅.\n2. R_1ε = εR_1 = R_1.\n3. R_1 ∪ ∅ = ∅ ∪ R_1 = R_1.\n4. R_1 ∪ R_1 = R_1.\n5. R_1 ∪ R_2 = R_2 ∪ R_1.\n6. R_1(R_2 ∪ R_3) = R_1R_2 ∪ R_1R_3.\n7. (R_1 ∪ R_2)R_3 = R_1R_3 ∪ R_2R_3.\n8. R_1(R_2R_3) = (R_1R_2)R_3.\n9. ∅* = ε.\n10. ε* = ε.\n11. (ε ∪ R_1)* = R_1*."
}
]
},
{
"section": "Page 65",
"content": [
{
"type": "text",
"text": "2.8. Equivalence of regular expressions and regular languages 57\n12.(\u000f[R1)(\u000f[R1)\u0003=R\u0003\n1.\n13.R\u0003\n1(\u000f[R1) = (\u000f[R1)R\u0003\n1=R\u0003\n1.\n14.R\u0003\n1R2[R2=R\u0003\n1R2.\n15.R1(R2R1)\u0003= (R1R2)\u0003R1.\n16.(R1[R2)\u0003= (R\u0003\n1R2)\u0003R\u0003\n1= (R\u0003\n2R1)\u0003R\u0003\n2.\nWe will not present the (boring) proofs of these identities, but urge you\nto convince yourself informally that they make perfect sense. To give an\nexample, we mentioned above that\n(0[\u000f)1\u0003= 01\u0003[1\u0003:\nWe can verify this identity in the following way:\n(0[\u000f)1\u0003= 01\u0003[\u000f1\u0003(by identity 7)\n= 01\u0003[1\u0003(by identity 2)\n2.8 Equivalence of regular expressions and reg-\nular languages\nIn the beginning of Section 2.7, we mentioned the following result:\nTheorem 2.8.1 LetLbe a language. Then Lis regular if and only if there\nexists a regular expression that describes L.\nThe proof of this theorem consists of two parts:\n\u000fIn Section 2.8.1, we will prove that every regular expression describes\na regular language.\n\u000fIn Section 2.8.2, we will prove that every DFA Mcan be converted to\na regular expression that describes the language L(M).\nThese two results will prove Theorem 2.8.1."
}
]
},
{
"section": "Page 66",
"content": [
{
"type": "text",
"text": "58 Chapter 2. Finite Automata and Regular Languages\n2.8.1 Every regular expression describes a regular lan-\nguage\nLetRbe an arbitrary regular expression over the alphabet \u0006. We will prove\nthat the language described by Ris a regular language. The proof is by\ninduction on the structure of R(i.e., by induction on the way Ris \\built\"\nusing the \\rules\" given in De\fnition 2.7.1).\nThe \frst base case: Assume that R=\u000f. ThenRdescribes the lan-\nguagef\u000fg. In order to prove that this language is regular, it su\u000eces, by\nTheorem 2.5.2, to construct an NFA M= (Q;\u0006;\u000e;q;F ) that accepts this\nlanguage. This NFA is obtained by de\fning Q=fqg,qis the start state,\nF=fqg, and\u000e(q;a) =;for alla2\u0006\u000f. The \fgure below gives the state\ndiagram of M:\nq\nThe second base case: Assume that R=;. ThenRdescribes the language\n;. In order to prove that this language is regular, it su\u000eces, by Theorem 2.5.2,\nto construct an NFA M= (Q;\u0006;\u000e;q;F ) that accepts this language. This\nNFA is obtained by de\fning Q=fqg,qis the start state, F=;, and\n\u000e(q;a) =;for alla2\u0006\u000f. The \fgure below gives the state diagram of M:\nq\nThe third base case: Leta2\u0006 and assume that R=a. ThenRdescribes\nthe languagefag. In order to prove that this language is regular, it su\u000eces,\nby Theorem 2.5.2, to construct an NFA M= (Q;\u0006;\u000e;q 1;F) that accepts\nthis language. This NFA is obtained by de\fning Q=fq1;q2g,q1is the start\nstate,F=fq2g, and\n\u000e(q1;a) =fq2g;\n\u000e(q1;b) =;for allb2\u0006\u000fnfag,\n\u000e(q2;b) =;for allb2\u0006\u000f.\nThe \fgure below gives the state diagram of M:"
}
]
},
{
"section": "Page 67",
"content": [
{
"type": "text",
"text": "2.8. Equivalence of regular expressions and regular languages 59\nq1 q2a\nThe \frst case of the induction step: Assume that R=R1[R2, where\nR1andR2are regular expressions. Let L1andL2be the languages described\nbyR1andR2, respectively, and assume that L1andL2are regular. Then R\ndescribes the language L1[L2, which, by Theorem 2.6.1, is regular.\nThe second case of the induction step: Assume that R=R1R2, where\nR1andR2are regular expressions. Let L1andL2be the languages described\nbyR1andR2, respectively, and assume that L1andL2are regular. Then R\ndescribes the language L1L2, which, by Theorem 2.6.2, is regular.\nThe third case of the induction step: Assume that R= (R1)\u0003, where\nR1is a regular expression. Let L1be the language described by R1and\nassume that L1is regular. Then Rdescribes the language ( L1)\u0003, which, by\nTheorem 2.6.3, is regular.\nThis concludes the proof of the claim that every regular expression de-\nscribes a regular language.\nTo give an example, consider the regular expression\n(ab[a)\u0003;\nwhere the alphabet is fa;bg. We will prove that this regular expression de-\nscribes a regular language, by constructing an NFA that accepts the language\ndescribed by this regular expression. Observe how the regular expression is\n\\built\":\n\u000fTake the regular expressions aandb, and combine them into the regular\nexpressionab.\n\u000fTake the regular expressions abanda, and combine them into the\nregular expression ab[a.\n\u000fTake the regular expression ab[a, and transform it into the regular\nexpression ( ab[a)\u0003.\nFirst, we construct an NFA M1that accepts the language described by\nthe regular expression a:"
}
]
},
{
"section": "Page 68",
"content": [
{
"type": "text",
"text": "60 Chapter 2. Finite Automata and Regular Languages\naM1\nNext, we construct an NFA M2that accepts the language described by\nthe regular expression b:\nM2b\nNext, we apply the construction given in the proof of Theorem 2.6.2 to\nM1andM2. This gives an NFA M3that accepts the language described by\nthe regular expression ab:\nM3a ε b\nNext, we apply the construction given in the proof of Theorem 2.6.1 to\nM3andM1. This gives an NFA M4that accepts the language described by\nthe regular expression ab[a:\na ε b\naε\nεM4\nFinally, we apply the construction given in the proof of Theorem 2.6.3\ntoM4. This gives an NFA M5that accepts the language described by the\nregular expression ( ab[a)\u0003:"
}
]
},
{
"section": "Page 69",
"content": [
{
"type": "text",
"text": "2.8. Equivalence of regular expressions and regular languages 61\na ε b\naε\nεεε\nεM5\n2.8.2 Converting a DFA to a regular expression\nIn this section, we will prove that every DFA Mcan be converted to a regular\nexpression that describes the language L(M). In order to prove this result,\nwe need to solve recurrence relations involving languages.\nSolving recurrence relations\nLet \u0006 be an alphabet, let BandCbe \\known\" languages in \u0006\u0003such that\n\u000f62B, and letLbe an \\unknown\" language such that\nL=BL[C:\nCan we \\solve\" this equation for L? That is, can we express Lin terms of\nBandC?\nConsider an arbitrary string uinL. We are going to determine how u\nlooks like. Since u2LandL=BL[C, we know that uis a string in\nBL[C. Hence, there are two possibilities for u.\n1.uis an element of C.\n2.uis an element of BL. In this case, there are strings b2Bandv2L\nsuch thatu=bv. Since\u000f62B, we haveb6=\u000fand, therefore,jvj<juj.\n(Recall thatjvjdenotes the length, i.e., the number of symbols, of the\nstringv.) Sincevis a string in L, which is equal to BL[C,vis a\nstring inBL[C. Hence, there are two possibilities for v."
}
]
},
{
"section": "Page 70",
"content": [
{
"type": "text",
"text": "62 Chapter 2. Finite Automata and Regular Languages\n(a)vis an element of C. In this case,\nu=bv;whereb2Bandv2C; thus,u2BC.\n(b)vis an element of BL. In this case, there are strings b02Band\nw2Lsuch thatv=b0w. Since\u000f62B, we haveb06=\u000fand,\ntherefore,jwj<jvj. Sincewis a string in L, which is equal to\nBL[C,wis a string in BL[C. Hence, there are two possibilities\nforw.\ni.wis an element of C. In this case,\nu=bb0w;whereb;b02Bandw2C; thus,u2BBC .\nii.wis an element of BL. In this case, there are strings b002B\nandx2Lsuch thatw=b00x. Since\u000f62B, we haveb006=\u000f\nand, therefore,jxj<jwj. Sincexis a string in L, which is\nequal toBL[C,xis a string in BL[C. Hence, there are\ntwo possibilities for x.\nA.xis an element of C. In this case,\nu=bb0b00x;whereb;b0;b002Bandx2C; thus,u2BBBC .\nB.xis an element of BL. Etc., etc.\nThis process hopefully convinces you that any string uinLcan be written\nas the concatenation of zero or more strings in B, followed by one string in\nC. In fact,Lconsists of exactly those strings having this property:\nLemma 2.8.2 Let\u0006be an alphabet, and let B,C, andLbe languages in\n\u0006\u0003such that\u000f62Band\nL=BL[C:\nThen\nL=B\u0003C:\nProof. First, we show that B\u0003C\u0012L. Letube an arbitrary string in B\u0003C.\nThenuis the concatenation of kstrings ofB, for somek\u00150, followed by\none string of C. We proceed by induction on k.\nThe base case is when k= 0. In this case, uis a string in C. Hence,uis\na string inBL[C. SinceBL[C=L, it follows that uis a string in L."
}
]
},
{
"section": "Page 71",
"content": [
{
"type": "text",
"text": "2.8. Equivalence of regular expressions and regular languages 63\nNow letk\u00151. Then we can write u=vwc, wherevis a string in B,\nwis the concatenation of k\u00001 strings of B, andcis a string of C. De\fne\ny=wc. Observe that yis the concatenation of k\u00001 strings of Bfollowed\nby one string of C. Therefore, by induction, the string yis an element of L.\nHence,u=vy, wherevis a string in Bandyis a string in L. This shows\nthatuis a string in BL. Hence,uis a string in BL[C. SinceBL[C=L,\nit follows that uis a string in L. This completes the proof that B\u0003C\u0012L.\nIt remains to show that L\u0012B\u0003C. Letube an arbitrary string in L,\nand let`be its length (i.e., `is the number of symbols in u). We prove by\ninduction on `thatuis a string in B\u0003C.\nThe base case is when `= 0. Then u=\u000f. Sinceu2LandL=BL[C,\nuis a string in BL[C. Since\u000f62B,ucannot be a string in BL. Hence,u\nmust be a string in C. SinceC\u0012B\u0003C, it follows that uis a string in B\u0003C.\nLet`\u00151. Ifuis a string in C, thenuis a string in B\u0003Cand we are done.\nSo assume that uis not a string in C. Sinceu2LandL=BL[C,uis a\nstring inBL. Hence, there are strings b2Bandv2Lsuch thatu=bv.\nSince\u000f62B, the length of bis at least one; hence, the length of vis less than\nthe length of u. By induction, vis a string in B\u0003C. Hence,u=bv, where\nb2Bandv2B\u0003C. This shows that u2B(B\u0003C). SinceB(B\u0003C)\u0012B\u0003C,\nit follows that u2B\u0003C.\nNote that Lemma 2.8.2 holds for anylanguageBthat does not contain\nthe empty string \u000f. As an example, assume that B=;. 
Then the language\nLsatis\fes the equation\nL=BL[C=;L[C:\nUsing Theorem 2.7.4, this equation becomes\nL=;[C=C:\nWe now show that Lemma 2.8.2 also implies that L=C: Since\u000f62B,\nLemma 2.8.2 implies that L=B\u0003C, which, using Theorem 2.7.4, becomes\nL=B\u0003C=;\u0003C=\u000fC=C:\nThe conversion\nWe will now use Lemma 2.8.2 to prove that every DFA can be converted to\na regular expression."
}
]
},
{
"section": "Page 72",
"content": [
{
"type": "text",
"text": "64 Chapter 2. Finite Automata and Regular Languages\nLetM= (Q;\u0006;\u000e;q;F ) be an arbitrary deterministic \fnite automaton.\nWe will show that there exists a regular expression that describes the lan-\nguageL(M).\nFor each state r2Q, we de\fne\nLr=fw2\u0006\u0003: the path in the state diagram of Mthat starts\nin staterand that corresponds to wends in a\nstate ofFg.\nIn words,Lris the language accepted by M,ifrwere the start state .\nWe will show that each such language Lrcan be described by a regular\nexpression. Since L(M) =Lq, this will prove that L(M) can be described by\na regular expression.\nThe basic idea is to set up equations for the languages Lr, which we then\nsolve using Lemma 2.8.2. We claim that\nLr=[\na2\u0006a\u0001L\u000e(r;a) ifr62F: (2.2)\nWhy is this true? Let wbe a string in Lr. Then the path Pin the state\ndiagram of Mthat starts in state rand that corresponds to wends in a\nstate ofF. Sincer62F, this path contains at least one edge. Let r0be the\nstate that follows the \frst state (i.e., r) ofP. Thenr0=\u000e(r;b) for some\nsymbolb2\u0006. Hence,bis the \frst symbol of w. Writew=bv, wherevis\nthe remaining part of w. Then the path P0=Pnfrgin the state diagram\nofMthat starts in state r0and that corresponds to vends in a state of F.\nTherefore,v2Lr0=L\u000e(r;b). Hence,\nw2b\u0001L\u000e(r;b)\u0012[\na2\u0006a\u0001L\u000e(r;a):\nConversely, let wbe a string inS\na2\u0006a\u0001L\u000e(r;a). Then there is a symbol b2\u0006\nand a string v2L\u000e(r;b)such thatw=bv. LetP0be the path in the state\ndiagram of Mthat starts in state \u000e(r;b) and that corresponds to v. Since\nv2L\u000e(r;b), this path ends in a state of F. LetPbe the path in the state\ndiagram of Mthat starts in r, follows the edge to \u000e(r;b), and then follows P0.\nThis pathPcorresponds to wand ends in a state of F. Therefore, w2Lr.\nThis proves the correctness of (2.2)."
}
]
},
{
"section": "Page 73",
"content": [
{
"type": "text",
"text": "2.8. Equivalence of regular expressions and regular languages 65\nSimilarly, we can prove that\nLr=\u000f[ [\na2\u0006a\u0001L\u000e(r;a)!\nifr2F: (2.3)\nSo we now have a set of equations in the \\unknowns\" Lr, forr2Q. The\nnumber of equations is equal to the size of Q. In other words, the number\nof equations is equal to the number of unknowns. The regular expression for\nL(M) =Lqis obtained by solving these equations using Lemma 2.8.2.\nOf course, we have to convince ourselves that these equations have a so-\nlution for any given DFA. Before we deal with this issue, we give an example.\nAn example\nConsider the deterministic \fnite automaton M= (Q;\u0006;\u000e;q 0;F), whereQ=\nfq0;q1;q2g, \u0006 =fa;bg,q0is the start state, F=fq2g, and\u000eis given in the\nstate diagram below. We show how to obtain the regular expression that\ndescribes the language accepted by M.\nq0\nq1q2a\na a\nbb\nb\nFor this case, (2.2) and (2.3) give the following equations:\n8\n<\n:Lq0=a\u0001Lq0[b\u0001Lq2\nLq1=a\u0001Lq0[b\u0001Lq1\nLq2=\u000f[a\u0001Lq1[b\u0001Lq0"
}
]
},
{
"section": "Page 74",
"content": [
{
"type": "text",
"text": "66 Chapter 2. Finite Automata and Regular Languages\nIn the third equation, Lq2is expressed in terms of Lq0andLq1. Hence, if we\nsubstitute the third equation into the \frst one, and use Theorem 2.7.4, then\nwe get\nLq0=a\u0001Lq0[b\u0001(\u000f[a\u0001Lq1[b\u0001Lq0)\n= (a[bb)\u0001Lq0[ba\u0001Lq1[b:\nWe obtain the following set of equations.\n\u001aLq0= (a[bb)\u0001Lq0[ba\u0001Lq1[b\nLq1=b\u0001Lq1[a\u0001Lq0\nLetL=Lq1,B=b, andC=a\u0001Lq0. Then\u000f62Band the second equation\nreadsL=BL[C. Hence, by Lemma 2.8.2,\nLq1=L=B\u0003C=b\u0003a\u0001Lq0:\nIf we substitute Lq1into the \frst equation, then we get (again using Theo-\nrem 2.7.4)\nLq0= (a[bb)\u0001Lq0[ba\u0001b\u0003a\u0001Lq0[b\n= (a[bb[bab\u0003a)Lq0[b:\nAgain applying Lemma 2.8.2, this time with L=Lq0,B=a[bb[bab\u0003aand\nC=b, gives\nLq0= (a[bb[bab\u0003a)\u0003b:\nThus, the regular expression that describes the language accepted by Mis\n(a[bb[bab\u0003a)\u0003b:\nCompleting the correctness of the conversion\nIt remains to prove that, for any DFA, the system of equations (2.2) and (2.3)\ncan be solved. This will follow from the following (more general) lemma.\n(You should verify that the equations (2.2) and (2.3) are in the form as\nspeci\fed in this lemma.)"
}
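,
{
"type": "text",
"text": "The result of the example can be checked mechanically. The following is a minimal Python sketch, under the assumption that the transition function is the one read off from the three equations for L_q0, L_q1 and L_q2 (the state diagram itself is not reproduced here); the names delta, accepts and regex are illustrative only. It compares the DFA with the derived regular expression on every string over {a, b} of length at most 8.\n\nimport re\nfrom itertools import product\n\n# Transition table inferred from the equations (an assumption).\ndelta = {('q0', 'a'): 'q0', ('q0', 'b'): 'q2',\n         ('q1', 'a'): 'q0', ('q1', 'b'): 'q1',\n         ('q2', 'a'): 'q1', ('q2', 'b'): 'q0'}\n\ndef accepts(w):\n    # Run the DFA from the start state q0; q2 is the only accept state.\n    state = 'q0'\n    for ch in w:\n        state = delta[(state, ch)]\n    return state == 'q2'\n\n# (a ∪ bb ∪ bab*a)*b in Python syntax, where | plays the role of ∪.\nregex = re.compile('(a|bb|bab*a)*b')\n\nfor n in range(9):\n    for letters in product('ab', repeat=n):\n        w = ''.join(letters)\n        assert accepts(w) == (regex.fullmatch(w) is not None), w\n\nA finite check of this kind is only evidence, not a proof; the proof is the algebraic derivation via Lemma 2.8.2."
}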
]
},
{
"section": "Page 75",
"content": [
{
"type": "text",
"text": "2.8. Equivalence of regular expressions and regular languages 67\nLemma 2.8.3 Letn\u00151be an integer and, for 1\u0014i\u0014nand1\u0014j\u0014n,\nletBijandCibe regular expressions such that \u000f62Bij. LetL1;L2;:::;Lnbe\nlanguages that satisfy\nLi= n[\nj=1BijLj!\n[Cifor1\u0014i\u0014n.\nThenL1can be expressed as a regular expression only involving the regular\nexpressions BijandCi.\nProof. The proof is by induction on n. The base case is when n= 1. In\nthis case, we have\nL1=B11L1[C1:\nSince\u000f62B11, it follows from Lemma 2.8.2 that L1=B\u0003\n11C1. This proves\nthe base case.\nLetn\u00152 and assume the lemma is true for n\u00001. We have\nLn= n[\nj=1BnjLj!\n[Cn\n=BnnLn[ n\u00001[\nj=1BnjLj!\n[Cn:\nSince\u000f62Bnn, it follows from Lemma 2.8.2 that\nLn=B\u0003\nnn n\u00001[\nj=1BnjLj!\n[Cn!\n=B\u0003\nnn n\u00001[\nj=1BnjLj!\n[B\u0003\nnnCn\n= n\u00001[\nj=1B\u0003\nnnBnjLj!\n[B\u0003\nnnCn\nBy substituting this equation for Lninto the equations for Li, 1\u0014i\u0014n\u00001,"
}
]
},
{
"section": "Page 76",
"content": [
{
"type": "text",
"text": "68 Chapter 2. Finite Automata and Regular Languages\nwe obtain\nLi= n[\nj=1BijLj!\n[Ci\n=BinLn[ n\u00001[\nj=1BijLj!\n[Ci\n= n\u00001[\nj=1(BinB\u0003\nnnBnj[Bij)Lj!\n[BinB\u0003\nnnCn[Ci:\nThus, we have obtained n\u00001 equations in L1;L2;:::;Ln\u00001. Since\u000f62\nBinB\u0003\nnnBnj[Bij, it follows from the induction hypothesis that L1can be\nexpressed as a regular expression only involving the regular expressions Bij\nandCi.\n2.9 The pumping lemma and nonregular lan-\nguages\nIn the previous sections, we have seen that the class of regular languages is\nclosed under various operations, and that these languages can be described by\n(deterministic or nondeterministic) \fnite automata and regular expressions.\nThese properties helped in developing techniques for showing that a language\nis regular. In this section, we will present a tool that can be used to prove\nthat certain languages are notregular. Observe that for a regular language,\n1. the amount of memory that is needed to determine whether or not a\ngiven string is in the language is \fnite and independent of the length\nof the string, and\n2. if the language consists of an in\fnite number of strings, then this lan-\nguage should contain in\fnite subsets having a fairly repetitive struc-\nture.\nIntuitively, languages that do not follow 1. or 2. should be nonregular. For\nexample, consider the language\nf0n1n:n\u00150g:"
}
]
},
{
"section": "Page 77",
"content": [
{
"type": "text",
"text": "2.9. The pumping lemma and nonregular languages 69\nThis language should be nonregular, because it seems unlikely that a DFA can\nremember how many 0s it has seen when it has reached the border between\nthe 0s and the 1s. Similarly the language\nf0n:nis a prime number g\nshould be nonregular, because the prime numbers do not seem to have any\nrepetitive structure that can be used by a DFA. To be more rigorous about\nthis, we will establish a property that all regular languages must possess.\nThis property is called the pumping lemma . If a language does not have this\nproperty, then it must be nonregular.\nThe pumping lemma states that any su\u000eciently long string in a regular\nlanguage can be pumped , i.e., there is a section in that string that can be\nrepeated any number of times, so that the resulting strings are all in the\nlanguage.\nTheorem 2.9.1 (Pumping Lemma for Regular Languages) LetAbe\na regular language. Then there exists an integer p\u00151, called the pumping\nlength , such that the following holds: Every string sinA, withjsj\u0015p, can\nbe written as s=xyz, such that\n1.y6=\u000f(i.e.,jyj\u00151),\n2.jxyj\u0014p, and\n3. for alli\u00150,xyiz2A.\nIn words, the pumping lemma states that by replacing the portion yins\nby zero or more copies of it, the resulting string is still in the language A.\nProof. Let \u0006 be the alphabet of A. SinceAis a regular language, there\nexists a DFA M= (Q;\u0006;\u000e;q;F ) that accepts A. We de\fne pto be the\nnumber of states in Q.\nLets=s1s2:::snbe an arbitrary string in Asuch thatn\u0015p. De\fne\nr1=q,r2=\u000e(r1;s1),r3=\u000e(r2;s2),:::,rn+1=\u000e(rn;sn). Thus, when the\nDFAMreads the string sfrom left to right, it visits the states r1;r2;:::;rn+1.\nSincesis a string in A, we know that rn+1belongs toF.\nConsider the \frst p+ 1 statesr1;r2;:::;rp+1in this sequence. 
Since the\nnumber of states of Mis equal to p, the pigeonhole principle implies that\nthere must be a state that occurs twice in this sequence. That is, there are\nindicesjand`such that 1\u0014j <`\u0014p+ 1 andrj=r`."
}
]
},
{
"section": "Page 78",
"content": [
{
"type": "text",
"text": "70 Chapter 2. Finite Automata and Regular Languages\nq=r1\nrn+1rj=rℓread x\nread y\nread z\nWe de\fnex=s1s2:::sj\u00001,y=sj:::s`\u00001, andz=s`:::sn. Sincej <` ,\nwe havey6=\u000f, proving the \frst claim in the theorem. Since `\u0014p+ 1, we\nhavejxyj=`\u00001\u0014p, proving the second claim in the theorem. To see that\nthe third claim also holds, recall that the string s=xyzis accepted by M.\nWhile reading x,Mmoves from the start state qto staterj. While reading\ny, it moves from state rjto stater`=rj, i.e., after having read y,Mis again\nin staterj. While reading z,Mmoves from state rjto the accept state rn+1.\nTherefore, the substring ycan be repeated any number i\u00150 of times, and\nthe corresponding string xyizwill still be accepted by M. It follows that\nxyiz2Afor alli\u00150.\n2.9.1 Applications of the pumping lemma\nFirst example\nConsider the language\nA=f0n1n:n\u00150g:\nWe will prove by contradiction that Ais not a regular language.\nAssume that Ais a regular language. Let p\u00151 be the pumping length,\nas given by the pumping lemma. Consider the string s= 0p1p. It is clear\nthats2Aandjsj= 2p\u0015p. Hence, by the pumping lemma, scan be\nwritten ass=xyz, wherey6=\u000f,jxyj\u0014p, andxyiz2Afor alli\u00150.\nObserve that, since jxyj\u0014p, the string ycontains only 0s. Moreover,\nsincey6=\u000f,ycontains at least one 0. But now we are in trouble: None of\nthe strings xy0z=xz,xy2z=xyyz ,xy3z=xyyyz , . . . , is contained in A.\nHowever, by the pumping lemma, all these strings must be in A. Hence, we\nhave a contradiction and we conclude that Ais not a regular language."
}
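,
{
"type": "text",
"text": "The decomposition s = xyz used in the proof of the pumping lemma can be computed by running the DFA and stopping at the first repeated state. The following is a minimal Python sketch of that idea; the function name pump_split and the three-state DFA over {a, b} are hypothetical and serve only as an illustration. Positions are counted from 0, whereas the proof numbers the states r_1, r_2, ... from 1.\n\ndef pump_split(delta, start, s):\n    # Record the position at which each state is first visited and\n    # stop as soon as a state repeats (the pigeonhole step of the proof).\n    first_seen = {start: 0}\n    state = start\n    for i, ch in enumerate(s):\n        state = delta[(state, ch)]\n        if state in first_seen:\n            j = first_seen[state]\n            return s[:j], s[j:i + 1], s[i + 1:]\n        first_seen[state] = i + 1\n    return None  # no state repeats: s is shorter than the number of states\n\n# Hypothetical DFA with start state q0 and accept state q2.\ndelta = {('q0', 'a'): 'q0', ('q0', 'b'): 'q2',\n         ('q1', 'a'): 'q0', ('q1', 'b'): 'q1',\n         ('q2', 'a'): 'q1', ('q2', 'b'): 'q0'}\n\ndef accepts(w):\n    state = 'q0'\n    for ch in w:\n        state = delta[(state, ch)]\n    return state == 'q2'\n\nx, y, z = pump_split(delta, 'q0', 'baab')\nassert y != '' and len(x + y) <= 3          # claims 1 and 2, with p = 3 states\nassert all(accepts(x + y * i + z) for i in range(6))  # claim 3 for small i\n\nThe last assertion tests only finitely many values of i; that the claim holds for all i ≥ 0 is what the proof establishes."
}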
]
},
{
"section": "Page 79",
"content": [
{
"type": "text",
"text": "2.9. The pumping lemma and nonregular languages 71\nSecond example\nConsider the language\nA=fw2f0;1g\u0003: the number of 0s in wequals the number of 1s in wg:\nAgain, we prove by contradiction that Ais not a regular language.\nAssume that Ais a regular language. Let p\u00151 be the pumping length,\nas given by the pumping lemma. Consider the string s= 0p1p. Thens2A\nandjsj= 2p\u0015p. By the pumping lemma, scan be written as s=xyz,\nwherey6=\u000f,jxyj\u0014p, andxyiz2Afor alli\u00150.\nSincejxyj\u0014p, the string ycontains only 0s. Since y6=\u000f,ycontains at\nleast one 0. Therefore, the string xy2z=xyyz contains more 0s than 1s,\nwhich implies that this string is not contained in A. But, by the pumping\nlemma, this string is contained in A. This is a contradiction and, therefore,\nAis not a regular language.\nThird example\nConsider the language\nA=fww:w2f0;1g\u0003g:\nWe prove by contradiction that Ais not a regular language.\nAssume that Ais a regular language. Let p\u00151 be the pumping length,\nas given by the pumping lemma. Consider the string s= 0p10p1. Thens2A\nandjsj= 2p+ 2\u0015p. By the pumping lemma, scan be written as s=xyz,\nwherey6=\u000f,jxyj\u0014p, andxyiz2Afor alli\u00150.\nSincejxyj\u0014p, the string ycontains only 0s. Since y6=\u000f,ycontains at\nleast one 0. Therefore, the string xy2z=xyyz is not contained in A. But,\nby the pumping lemma, this string is contained in A. This is a contradiction\nand, therefore, Ais not a regular language.\nYou should convince yourself that by choosing s= 02p(which is a string\ninAwhose length is at least p), we do not obtain a contradiction. The reason\nis that the string ymay have an even length. Thus, 02pis the \\wrong\" string\nfor showing that Ais not regular. By choosing s= 0p10p1, we do obtain\na contradiction; thus, this is the \\correct\" string for showing that Ais not\nregular."
}
]
},
{
"section": "Page 80",
"content": [
{
"type": "text",
"text": "72 Chapter 2. Finite Automata and Regular Languages\nFourth example\nConsider the language\nA=f0m1n:m>n\u00150g:\nWe prove by contradiction that Ais not a regular language.\nAssume that Ais a regular language. Let p\u00151 be the pumping length,\nas given by the pumping lemma. Consider the string s= 0p+11p. Thens2A\nandjsj= 2p+ 1\u0015p. By the pumping lemma, scan be written as s=xyz,\nwherey6=\u000f,jxyj\u0014p, andxyiz2Afor alli\u00150.\nSincejxyj\u0014p, the string ycontains only 0s. Since y6=\u000f,ycontains at\nleast one 0. Consider the string xy0z=xz. The number of 1s in this string\nis equal top, whereas the number of 0s is at most equal to p. Therefore, the\nstringxy0zis not contained in A. But, by the pumping lemma, this string\nis contained in A. This is a contradiction and, therefore, Ais not a regular\nlanguage.\nFifth example\nConsider the language\nA=f1n2:n\u00150g:\nWe prove by contradiction that Ais not a regular language.\nAssume that Ais a regular language. Let p\u00151 be the pumping length,\nas given by the pumping lemma. Consider the string s= 1p2. Thens2A\nandjsj=p2\u0015p. By the pumping lemma, scan be written as s=xyz,\nwherey6=\u000f,jxyj\u0014p, andxyiz2Afor alli\u00150.\nObserve that\njsj=jxyzj=p2\nand\njxy2zj=jxyyzj=jxyzj+jyj=p2+jyj:\nSincejxyj\u0014p, we havejyj\u0014p. Sincey6=\u000f, we havejyj\u00151. It follows that\np2<jxy2zj\u0014p2+p<(p+ 1)2:\nHence, the length of the string xy2zis strictly between two consecutive\nsquares. It follows that this length is not a square and, therefore, xy2z\nis not contained in A. But, by the pumping lemma, this string is contained\ninA. This is a contradiction and, therefore, Ais not a regular language."
}
]
},
{
"section": "Page 81",
"content": [
{
"type": "text",
"text": "2.9. The pumping lemma and nonregular languages 73\nSixth example\nConsider the language\nA=f1n:nis a prime number g:\nWe prove by contradiction that Ais not a regular language.\nAssume that Ais a regular language. Let p\u00151 be the pumping length,\nas given by the pumping lemma. Let n\u0015pbe a prime number, and consider\nthe strings= 1n. Thens2Aandjsj=n\u0015p. By the pumping lemma, s\ncan be written as s=xyz, wherey6=\u000f,jxyj\u0014p, andxyiz2Afor alli\u00150.\nLetkbe the integer such that y= 1k. Sincey6=\u000f, we havek\u00151. For\neachi\u00150,n+ (i\u00001)kis a prime number, because xyiz= 1n+(i\u00001)k2A.\nFori=n+ 1, however, we have\nn+ (i\u00001)k=n+nk=n(1 +k);\nwhich is not a prime number, because n\u00152 and 1 + k\u00152. This is a\ncontradiction and, therefore, Ais not a regular language.\nSeventh example\nConsider the language\nA=fw2f0;1g\u0003: the number of occurrences of 01 in wis equal to\nthe number of occurrences of 10 in wg.\nSince this language has the same \ravor as the one in the second example,\nwe may suspect that Ais not a regular language. This is, however, not true:\nAs we will show, Ais a regular language.\nThe key property is the following one: Let wbe an arbitrary string in\nf0;1g\u0003. Then\nthe absolute value of the number of occurrences of 01 in wminus\nthe number of occurrences of 10 in wis at most one.\nThis property holds, because between any two consecutive occurrences of\n01, there must be exactly one occurrence of 10. Similarly, between any two\nconsecutive occurrences of 10, there must be exactly one occurrence of 01.\nWe will construct a DFA that accepts A. This DFA uses the following\n\fve states:"
}
]
},
{
"section": "Page 82",
"content": [
{
"type": "text",
"text": "74 Chapter 2. Finite Automata and Regular Languages\n\u000fq: start state; no symbol has been read.\n\u000fq01: the last symbol read was 1; in the part of the string read so far, the\nnumber of occurrences of 01 is one more than the number of occurrences\nof 10.\n\u000fq10: the last symbol read was 0; in the part of the string read so far, the\nnumber of occurrences of 10 is one more than the number of occurrences\nof 01.\n\u000fq0\nequal: the last symbol read was 0; in the part of the string read so far,\nthe number of occurrences of 01 is equal to the number of occurrences\nof 10.\n\u000fq1\nequal: the last symbol read was 1; in the part of the string read so far,\nthe number of occurrences of 01 is equal to the number of occurrences\nof 10.\nThe set of accept states is equal to fq;q0\nequal;q1\nequalg. The state diagram of\nthe DFA is given below.\nq0\nequal\nq1\nequalq01\nq10q00\n1\n1\n0\n1\n0\n0\n1\n1\nIn fact, the key property mentioned above implies that the language A\nconsists of the empty string \u000fand all non-empty binary strings that start"
}
]
},
{
"section": "Page 83",
"content": [
{
"type": "text",
"text": "2.9. The pumping lemma and nonregular languages 75\nand end with the same symbol. As a result, Ais the language described by\nthe regular expression\n\u000f[0[1[0(0[1)\u00030[1(0[1)\u00031:\nThis gives an alternative proof for the fact that Ais a regular language.\nEighth example\nConsider the language\nL=fw2f0;1g\u0003:wis the binary representation of a prime number g:\nWe assume that for any positive integer, the leftmost bit in its binary repre-\nsentation is 1. In other words, we assume that there are no 0's added to the\nleft of such a binary representation. Thus,\nL=f10;11;101;111;1011;1101;10001;:::g:\nWe will prove that Lis not a regular language.\nAssume that Lis a regular language. Let p\u00151 be the pumping length.\nLetN > 2pbe a prime number and let s2f0;1g\u0003be the binary representa-\ntion ofN. Observe thatjsj\u0015p+ 1. Also, the leftmost and rightmost bits of\nsare 1.\nSinces2Landjsj\u0015p+ 1\u0015p, the Pumping Lemma implies that we\ncan writes=xyz, such that\n1.jyj\u00151,\n2.jxyj\u0014p(and, thus,jzj\u00151), and\n3. for alli\u00150,xyiz2L, i.e.,xyizis the binary representation of a prime\nnumber.\nDe\fneA,B, andCto be the integers whose binary representations are\nx,y, andz, respectively. Note that both yandzmay have leading 0's. In\nfact,ymay be a string consisting of 0's only, in which case B= 0. However,\nsince the rightmost bit of zis 1, we have C\u00151. Observe that\nN=C+B\u00012jzj+A\u00012jzj+jyj: (2.4)"
}
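,
{
"type": "text",
"text": "The characterization of the seventh example (equal numbers of occurrences of 01 and 10 exactly when the string is empty or starts and ends with the same symbol) can be tested exhaustively on short strings. The following is a minimal Python sketch, with the regular expression transcribed into Python syntax as an assumption of this illustration; an empty alternative stands for ε and | for ∪.\n\nimport re\nfrom itertools import product\n\n# ε ∪ 0 ∪ 1 ∪ 0(0 ∪ 1)*0 ∪ 1(0 ∪ 1)*1 in Python syntax.\nregex = re.compile('|0|1|0[01]*0|1[01]*1')\n\nfor n in range(11):\n    for bits in product('01', repeat=n):\n        w = ''.join(bits)\n        # Occurrences of 01 (or of 10) cannot overlap each other,\n        # so str.count gives the exact number of occurrences.\n        same = w.count('01') == w.count('10')\n        assert same == (regex.fullmatch(w) is not None), w\n\nThe loop covers all binary strings of length at most 10; it illustrates the claim but does not replace the argument based on the key property."
}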
]
},
{
"section": "Page 84",
"content": [
{
"type": "text",
"text": "76 Chapter 2. Finite Automata and Regular Languages\nLeti=N, consider the bitstring xyiz=xyNz, and letMbe the prime\nnumber whose binary representation is given by this bitstring. Then,\nM=C+N\u00001X\nk=0B\u00012jzj+kjyj+A\u00012jzj+Njyj\n=C+B\u00012jzjN\u00001X\nk=02kjyj+A\u00012jzj+Njyj:\nLet\nT=N\u00001X\nk=02kjyj:\nThen\u0000\n2jyj\u00001\u0001\nT= 2Njyj\u00001: (2.5)\nBy Fermat's Little Theorem, we have\n2N\u00112 (modN);\nimplying that\n2Njyj\u00001 =\u0000\n2N\u0001jyj\u00001\u00112jyj\u00001 (modN):\nThus, (2.5) implies that\n\u0000\n2jyj\u00001\u0001\nT\u00112jyj\u00001 (modN): (2.6)\nObserve that 2jyj\u00142p<N, becausejyj\u0014jxyj\u0014p. Also, 2jyj\u00152, because\ny6=\u000f. It follows that\n1\u00142jyj\u00001<N;\nimplying that\n2jyj\u000016\u00110 (modN):\nThis, together with (2.6), implies that\nT\u00111 (modN):\nSince\nM=C+B\u00012jzj\u0001T+A\u00012jzj+Njyj;"
}
]
},
{
"section": "Page 85",
"content": [
{
"type": "text",
"text": "2.10. Higman's Theorem 77\nit follows that\nM\u0011C+B\u00012jzj+A\u00012jzj+jyj(modN):\nThis, together with (2.4), implies that\nM\u00110 (modN);\ni.e.,NdividesM. SinceM >N , we conclude that Mis not a prime number,\nwhich is a contradiction. Thus, the language Lis not regular.\n2.10 Higman's Theorem\nLet \u0006 be a \fnite alphabet. For any two strings xandyin \u0006\u0003, we say that x\nis asubsequence ofy, ifxcan be obtained by deleting zero or more symbols\nfromy. For example, 10110 is a subsequence of 0010010101010001. For any\nlanguageL\u0012\u0006\u0003, we de\fne\nSUBSEQ (L) :=fx: there exists a y2Lsuch thatxis a subsequence of yg:\nThat is, SUBSEQ (L) is the language consisting of the subsequences of all\nstrings inL. In 1952, Higman proved the following result:\nTheorem 2.10.1 (Higman) For any \fnite alphabet \u0006and for any lan-\nguageL\u0012\u0006\u0003, the language SUBSEQ (L)is regular.\n2.10.1 Dickson's Theorem\nOur proof of Higman's Theorem will use a theorem that was proved in 1913\nby Dickson.\nRecall that Ndenotes the set of positive integers. Let n2N. For any\ntwo pointsp= (p1;p2;:::;pn) andq= (q1;q2;:::;qn) inNn, we say that pis\ndominated byq, ifpi\u0014qifor alliwith 1\u0014i\u0014n.\nTheorem 2.10.2 (Dickson) LetS\u0012Nn, and letMbe the set consisting of\nall elements of Sthat are minimal in the relation \\is dominated by\". Thus,\nM=fq2S:there is no pinSnfqgsuch thatpis dominated by qg:\nThen, the set Mis \fnite."
}
]
},
{
"section": "Page 86",
"content": [
{
"type": "text",
"text": "78 Chapter 2. Finite Automata and Regular Languages\nWe will prove this theorem by induction on the dimension n. Ifn= 1,\nthen either M=;(ifS=;) orMconsists of exactly one element (if S6=;).\nTherefore, the theorem holds if n= 1. Letn\u00152 and assume the theorem\nholds for all subsets of Nn\u00001. LetSbe a subset of Nnand consider the set\nMof minimal elements in S. IfS=;, thenM=;and, thus,Mis \fnite.\nAssume that S6=;. We \fx an arbitrary element qinM. Ifp2Mnfqg,\nthenqis not dominated by p. Therefore, there exists an index isuch that\npi\u0014qi\u00001. It follows that\nMnfqg\u0012n[\ni=1\u0000\nNi\u00001\u0002[1;qi\u00001]\u0002Nn\u0000i\u0001\n:\nFor alliandkwith 1\u0014i\u0014nand 1\u0014k\u0014qi\u00001, we de\fne\nSik=fp2S:pi=kg\nand\nMik=fp2M:pi=kg:\nThen,\nMnfqg=n[\ni=1qi\u00001[\nk=1Mik: (2.7)\nLemma 2.10.3 Mikis a subset of the set of all elements of Sikthat are\nminimal in the relation \\is dominated by\".\nProof. Letpbe an element of Mik, and assume that pis not minimal in\nSik. Then there is an element rinSik, such that r6=pandris dominated\nbyp. Sincepandrare both elements of S, it follows that p62M. This is a\ncontradiction.\nSince the set Sikis basically a subset of Nn\u00001, it follows from the induction\nhypothesis that Sikcontains \fnitely many minimal elements. This, combined\nwith Lemma 2.10.3, implies that Mikis a \fnite set. Thus, by (2.7), Mnfqg\nis the union of \fnitely many \fnite sets. Therefore, the set Mis \fnite.\n2.10.2 Proof of Higman's Theorem\nWe give the proof of Theorem 2.10.1 for the case when \u0006 = f0;1g. IfL=;\norSUBSEQ (L) =f0;1g\u0003, then SUBSEQ (L) is obviously a regular language."
}
]
},
{
"section": "Page 87",
"content": [
{
"type": "text",
"text": "2.10. Higman's Theorem 79\nHence, we may assume that Lis non-empty and SUBSEQ (L) is a proper\nsubset off0;1g\u0003.\nWe \fx a string zof length at least two in the complement SUBSEQ (L) of\nthe language SUBSEQ (L). Observe that this is possible, because SUBSEQ (L)\nis an in\fnite language. We insert 0s and 1s into z, such that, in the result-\ning stringz0, 0s and 1s alternate. For example, if z= 0011101011, then\nz0= 01010101010101. Let n=jz0j\u00001, wherejz0jdenotes the length of z0.\nThen,n\u0015jzj\u00001\u00151.\nA (0;1)-alternation in a binary string xis any occurrence of 01 or 10 in x.\nFor example, the string 1101001 contains four (0 ;1)-alternations. We de\fne\nA=fx2f0;1g\u0003:xhas at most nmany (0;1)-alternationsg:\nLemma 2.10.4 SUBSEQ (L)\u0012A.\nProof. Letx2SUBSEQ (L) and assume that x62A. Then,xhas at least\nn+ 1 =jz0jmany (0;1)-alternations and, therefore, z0is a subsequence of x.\nIn particular, zis a subsequence of x. Sincex2SUBSEQ (L), it follows that\nz2SUBSEQ (L), which is a contradiction.\nLemma 2.10.5 SUBSEQ (L) =\u0010\nA\\SUBSEQ (L)\u0011\n[A.\nProof. Follows from Lemma 2.10.4.\nLemma 2.10.6 The language Ais regular.\nProof. The complement AofAis the language consisting of all binary\nstrings with at least n+ 1 many (0 ;1)-alternations. If, for example, n= 3,\nthenAis described by the regular expression\n(00\u000311\u000300\u000311\u00030(0[1)\u0003)[(11\u000300\u000311\u000300\u00031(0[1)\u0003):\nThis should convince you that the claim is true for any value of n.\nFor anyb2f0;1gand for any k\u00150, we de\fne Abkto be the language\nconsisting of all binary strings that start with a band have exactly kmany\n(0;1)-alternations. Then, we have\nA=f\u000fg[ 1[\nb=0n[\nk=0Abk!\n:"
}
]
},
{
"section": "Page 88",
"content": [
{
"type": "text",
"text": "80 Chapter 2. Finite Automata and Regular Languages\nThus, if we de\fne\nFbk=Abk\\SUBSEQ (L);\nand use the fact that \u000f2SUBSEQ (L) (which is true because L6=;), then\nA\\SUBSEQ (L) =1[\nb=0n[\nk=0Fbk: (2.8)\nFor anyb2f0;1gand for any k\u00150, consider the relation \\is a subse-\nquence of\" on the language Fbk. We de\fne Mbkto be the language consisting\nof all strings in Fbkthat are minimal in this relation. Thus,\nMbk=fx2Fbk: there is no x0inFbknfxgsuch thatx0is a subsequence of xg:\nIt is clear that\nFbk=[\nx2Mbkfy2Fbk:xis a subsequence of yg:\nIfx2Mbk,y2Abk, andxis a subsequence of y, thenymust be in\nSUBSEQ (L) and, therefore, ymust be in Fbk. To prove this, assume that\ny2SUBSEQ (L). Then,x2SUBSEQ (L), contradicting the fact that\nx2Mbk\u0012Fbk\u0012SUBSEQ (L). It follows that\nFbk=[\nx2Mbkfy2Abk:xis a subsequence of yg: (2.9)\nLemma 2.10.7 Letb2f0;1gand0\u0014k\u0014n, and letxbe an element of\nMbk. Then, the language\nfy2Abk:xis a subsequence of yg\nis regular.\nProof. We will prove the claim by means of an example. Assume that b= 1,\nk= 3, andx= 11110001000. Then, the language\nfy2Abk:xis a subsequence of yg\nis described by the regular expression\n11111\u00030000\u000311\u00030000\u0003:\nThis should convince you that the claim is true in general."
}
]
},
{
"section": "Page 89",
"content": [
{
"type": "text",
"text": "Exercises 81\nLemma 2.10.8 For eachb2f0;1gand each 0\u0014k\u0014n, the setMbkis\n\fnite.\nProof. Again, we will prove the claim by means of an example. Assume\nthatb= 1 andk= 3. Any string in Fbkcan be written as 1a0b1c0d, for some\nintegersa;b;c;d\u00151. Consider the function ':Fbk!N4that is de\fned by\n'(1a0b1c0d) = (a;b;c;d ). Then,'is an injective function, and the following\nis true, for any two strings xandx0inFbk:\nxis a subsequence of x0if and only if '(x) is dominated by '(x0).\nIt follows that the elements of Mbkare in one-to-one correspondence with\nthose elements of '(Fbk) that are minimal in the relation \\is dominated by\".\nThe lemma thus follows from Dickson's Theorem.\nNow we can complete the proof of Higman's Theorem:\n\u000fIt follows from (2.9) and Lemmas 2.10.7 and 2.10.8, that Fbkis the\nunion of \fnitely many regular languages. Therefore, by Theorem 2.3.1,\nFbkis a regular language.\n\u000fIt follows from (2.8) that A\\SUBSEQ (L) is the union of \fnitely many\nregular languages. Therefore, again by Theorem 2.3.1, A\\SUBSEQ (L)\nis a regular language.\n\u000fSinceA\\SUBSEQ (L) is regular and, by Lemma 2.10.6, Ais regular,\nit follows from Lemma 2.10.5 that SUBSEQ (L) is the union of two reg-\nular languages. Therefore, by Theorem 2.3.1, SUBSEQ (L) is a regular\nlanguage.\n\u000fSince SUBSEQ (L) is regular, it follows from Theorem 2.6.4 that the\nlanguage SUBSEQ (L) is regular as well.\nExercises\n2.1For each of the following languages, construct a DFA that accepts the\nlanguage. In all cases, the alphabet is f0;1g.\n1.fw: the length of wis divisible by three g"
}
]
},
{
"section": "Page 90",
"content": [
{
"type": "text",
"text": "82 Chapter 2. Finite Automata and Regular Languages\n2.fw: 110 is not a substring of wg\n3.fw:wcontains at least \fve 1s g\n4.fw:wcontains the substring 1011 g\n5.fw:wcontains at least two 1s and at most two 0s g\n6.fw:wcontains an odd number of 1s or exactly two 0s g\n7.fw:wbegins with 1 and ends with 0 g\n8.fw: every odd position in wis 1g\n9.fw:whas length at least 3 and its third symbol is 0 g\n10.f\u000f;0g\n2.2For each of the following languages, construct an NFA, with the speci\fed\nnumber of states, that accepts the language. In all cases, the alphabet is\nf0;1g.\n1. The languagefw:wends with 10gwith three states.\n2. The languagefw:wcontains the substring 1011 gwith \fve states.\n3. The languagefw:wcontains an odd number of 1s or exactly two 0s g\nwith six states.\n2.3For each of the following languages, construct an NFA that accepts the\nlanguage. In all cases, the alphabet is f0;1g.\n1.fw:wcontains the substring 11001 g\n2.fw:whas length at least 2 and does not end with 10 g\n3.fw:wbegins with 1 or ends with 0 g\n2.4Convert the following NFA to an equivalent DFA."
}
]
},
{
"section": "Page 91",
"content": [
{
"type": "text",
"text": "Exercises 83\n1 2a\nba, b\n2.5Convert the following NFA to an equivalent DFA.\n1\n3 2a\na\nb\naε,b\n2.6Convert the following NFA to an equivalent DFA.\n0 1 2 3a, ǫ b a\nǫb\n2.7In the proof of Theorem 2.6.3, we introduced a new start state q0, which\nis also an accept state. Explain why the following is not a valid proof of\nTheorem 2.6.3:\nLetN= (Q1;\u0006;\u000e1;q1;F1) be an NFA, such that A=L(N). De\fne the\nNFAM= (Q1;\u0006;\u000e;q 1;F), where"
}
]
},
{
"section": "Page 92",
"content": [
{
"type": "text",
"text": "84 Chapter 2. Finite Automata and Regular Languages\n1.F=fq1g[F1.\n2.\u000e:Q1\u0002\u0006\u000f!P (Q1) is de\fned as follows: For any r2Q1and for any\na2\u0006\u000f,\n\u000e(r;a) =8\n<\n:\u000e1(r;a) if r2Q1andr62F1,\n\u000e1(r;a) if r2F1anda6=\u000f,\n\u000e1(r;a)[fq1gifr2F1anda=\u000f.\nThenL(M) =A\u0003.\n2.8Prove Theorem 2.6.4.\n2.9LetAbe a language over the alphabet \u0006 = f0;1gand letAbe the\ncomplement ofA. Thus,Ais the language consisting of all binary strings\nthat are not in A.\nAssume that Ais a regular language. Let M= (Q;\u0006;\u000e;q;F ) be a non-\ndeterministic \fnite automaton (NFA) that accepts A.\nConsider the NFA N= (Q;\u0006;\u000e;q;F), whereF=QnFis the complement\nofF. Thus,Nis obtained from Mby turning all accept states into nonaccept\nstates, and turning all nonaccept states into accept states.\n1. Is it true that the language accepted by Nis equal toA?\n2. Assume now that Mis a deterministic \fnite automaton (DFA) that\nacceptsA. De\fneNas above; thus, turn all accept states into nonac-\ncept states, and turn all nonaccept states into accept states. Is it true\nthat the language accepted by Nis equal toA?\n2.10 Recall the alternative de\fnition for the star of a language Athat we\ngave just before Theorem 2.3.1.\nIn Theorems 2.3.1 and 2.6.2, we have shown that the class of regular\nlanguages is closed under the union and concatenation operations. Since\nA\u0003=S1\nk=0Ak, why doesn't this imply that the class of regular languages is\nclosed under the star operation?\n2.11 LetAandBbe two regular languages over the same alphabet \u0006. Prove\nthat the di\u000berence of AandB, i.e., the language\nAnB=fw:w2Aandw62Bg\nis a regular language."
}
]
},
{
"section": "Page 93",
"content": [
{
"type": "text",
"text": "Exercises 85\n2.12 For each of the following regular expressions, give two strings that are\nmembers and two strings that are not members of the language described by\nthe expression. The alphabet is \u0006 = fa;bg.\n1.a(ba)\u0003b.\n2. (a[b)\u0003a(a[b)\u0003b(a[b)\u0003a(a[b)\u0003.\n3. (a[ba[bb)(a[b)\u0003.\n2.13 Give regular expressions describing the following languages. In all\ncases, the alphabet is f0;1g.\n1.fw:wcontains at least three 1s g.\n2.fw:wcontains at least two 1s and at most one 0 g,\n3.fw:wcontains an even number of 0s and exactly two 1s g.\n4.fw:wcontains exactly two 0s and at least two 1s g.\n5.fw:wcontains an even number of 0s and each 0 is followed by at least one 1 g.\n6.fw: every odd position in wis 1g.\n2.14 Convert each of the following regular expressions to an NFA.\n1. (0[1)\u0003000(0[1)\u0003\n2. (((10)\u0003(00))[10)\u0003\n3. ((0[1)(11)\u0003[0)\u0003\n2.15 Convert the following DFA to a regular expression."
}
]
},
{
"section": "Page 94",
"content": [
{
"type": "text",
"text": "86 Chapter 2. Finite Automata and Regular Languages\n1 2\n3a\nab\nbab\n2.16 Convert the following DFA to a regular expression.\n1 2\n3a, ba\nab\nb\n2.17 Convert the following DFA to a regular expression.\na, b\n2.18 1. LetAbe a non-empty regular language. Prove that there exists\nan NFA that accepts Aand that has exactly one accept state."
}
]
},
{
"section": "Page 95",
"content": [
{
"type": "text",
"text": "Exercises 87\n2. For any string w=w1w2:::wn, we denote by wRthe string obtained\nby readingwbackwards, i.e., wR=wnwn\u00001:::w 2w1. For any language\nA, we de\fne ARto be the language obtained by reading all strings in\nAbackwards, i.e.,\nAR=fwR:w2Ag:\nLetAbe a non-empty regular language. Prove that the language AR\nis also regular.\n2.19 Ifn\u00151 is an integer and w=a1a2:::anis a string, then for any i\nwith 0\u0014i<n , the string a1a2:::aiis called a proper pre\fx ofw. (Ifi= 0,\nthena1a2:::ai=\u000f.)\nFor any language L, we de\fne MIN (L) to be the language\nMIN (L) =fw2L: no proper pre\fx of wbelongs toLg:\nProve the following claim: If Lis a regular language, then MIN (L) is regular\nas well.\n2.20 Use the pumping lemma to prove that the following languages are not\nregular.\n1.fanbmcn+m:n\u00150;m\u00150g.\n2.fanbnc2n:n\u00150g.\n3.fanbman:n\u00150;m\u00150g.\n4.fa2n:n\u00150g. (Remark: a2nis the string consisting of 2nmanya's.)\n5.fanbmck:n\u00150;m\u00150;k\u00150;n2+m2=k2g.\n6.fuvu:u2fa;bg\u0003;u6=\u000f;v2fa;bg\u0003g.\n2.21 Prove that the language\nfambn:m\u00150;n\u00150;m6=ng\nis not regular. (Using the pumping lemma for this one is a bit tricky. You\ncan avoid using the pumping lemma by combining results about the closure\nunder regular operations.)"
}
]
},
{
"section": "Page 96",
"content": [
{
"type": "text",
"text": "88 Chapter 2. Finite Automata and Regular Languages\n2.22 1. Give an example of a regular language Aand a non-regular lan-\nguageBfor whichA\u0012B.\n2. Give an example of a non-regular language Aand a regular language\nBfor whichA\u0012B.\n2.23 LetAbe a language consisting of \fnitely many strings.\n1. Prove that Ais a regular language.\n2. Letnbe the maximum length of any string in A. Prove that every\ndeterministic \fnite automaton (DFA) that accepts Ahas at least n+ 1\nstates. ( Hint: How is the pumping length chosen in the proof of the\npumping lemma?)\n2.24 LetLbe a regular language, let Mbe a DFA whose language is equal\ntoL, and letpbe the number of states of M. Prove that L6=;if and only\nifLcontains a string of length less than p.\n2.25 LetLbe a regular language, let Mbe a DFA whose language is equal\ntoL, and letpbe the number of states of M. Prove that Lis an in\fnite\nlanguage if and only if Lcontains a string wwithp\u0014jwj\u00142p\u00001.\n2.26 Let \u0006 be a non-empty alphabet, and let Lbe a language over \u0006, i.e.,\nL\u0012\u0006\u0003. We de\fne a binary relation RLon \u0006\u0003\u0002\u0006\u0003, in the following way:\nFor any two strings uandu0in \u0006\u0003,\nuRLu0if and only if (8v2\u0006\u0003:uv2L,u0v2L):\nProve that RLis an equivalence relation.\n2.27 Let \u0006 =f0;1g, let\nL=fw2\u0006\u0003:jwjis oddg;\nand consider the relation RLde\fned in Exercise 2.26.\n1. Prove that for any two strings uandu0in \u0006\u0003,\nuRLu0,juj\u0000ju0jis even."
}
]
},
{
"section": "Page 97",
"content": [
{
"type": "text",
"text": "Exercises 89\n2. Determine all equivalence classes of the relation RL.\n2.28 Let \u0006 be a non-empty alphabet, and let Lbe a language over \u0006, i.e.,\nL\u0012\u0006\u0003. Recall the equivalence relation RLthat was de\fned in Exercise 2.26.\n1. Assume that Lis a regular language, and let M= (Q;\u0006;\u000e;q 0;F) be\na DFA that accepts L. Letuandu0be strings in \u0006\u0003. Letqbe the\nstate reached, when following the path in the state diagram of M, that\nstarts inq0and that is obtained by reading the string u. Similarly, let\nq0be the state reached, when following the path in the state diagram\nofM, that starts in q0and that is obtained by reading the string u0.\nProve the following: If q=q0, thenuRLu0.\n2. Prove the following claim: If Lis a regular language, then the equiva-\nlence relation RLhas a \fnite number of equivalence classes.\n2.29 LetLbe the language de\fned by\nL=fuuR:u2f0;1g\u0003g:\nIn words, a string is in Lif and only if its length is even, and the second half\nis the reverse of the \frst half. Consider the equivalence relation RLthat was\nde\fned in Exercise 2.26.\n1. Letmandnbe two distinct positive integers and consider the two\nstringsu= 0m1 andu0= 0n1. Prove that:(uRLu0).\n2. Prove that Lis not a regular language, without using the pumping\nlemma.\n3. Use the pumping lemma to prove that Lis not a regular language.\n2.30 In this exercise, we will show that the converse of the pumping lemma\ndoes, in general, not hold. Consider the language\nA=fambncn:m\u00151;n\u00150g[fbnck:n\u00150;k\u00150g:\n1. Show that Asatis\fes the conclusion of the pumping lemma for p= 1.\nThus, show that every string sinAwhose length is at least pcan be\nwritten ass=xyz, such that y6=\u000f,jxyj\u0014p, andxyiz2Afor all\ni\u00150."
}
]
},
{
"section": "Page 98",
"content": [
{
"type": "text",
"text": "90 Chapter 2. Finite Automata and Regular Languages\n2. Consider the equivalence relation RAthat was de\fned in Exercise 2.26.\nLetnandn0be two distinct non-negative integers and consider the two\nstringsu=abnandu0=abn0. Prove that:(uRAu0).\n3. Prove that Ais not a regular language."
}
]
},
{
"section": "Page 99",
"content": [
{
"type": "text",
"text": "Chapter 3\nContext-Free Languages\nIn this chapter, we introduce the class of context-free languages. As we\nwill see, this class contains all regular languages, as well as some nonregular\nlanguages such as f0n1n:n\u00150g.\nThe class of context-free languages consists of languages that have some\nsort of recursive structure. We will see two equivalent methods to obtain this\nclass. We start with context-free grammars, which are used for de\fning the\nsyntax of programming languages and their compilation. Then we introduce\nthe notion of (nondeterministic) pushdown automata, and show that these\nautomata have the same power as context-free grammars.\n3.1 Context-free grammars\nWe start with an example. Consider the following \fve (substitution) rules:\nS!AB\nA!a\nA!aA\nB!b\nB!bB\nHere,S,A, andBarevariables ,Sis the start variable , andaandbare\nterminals . We use these rules to derive strings consisting of terminals (i.e.,\nelements offa;bg\u0003), in the following manner:\n1. Initialize the current string to be the string consisting of the start\nvariableS."
}
]
},
{
"section": "Page 100",
"content": [
{
"type": "text",
"text": "92 Chapter 3. Context-Free Languages\n2. Take any variable in the current string and take any rule that has this\nvariable on the left-hand side. Then, in the current string, replace this\nvariable by the right-hand side of the rule.\n3. Repeat 2. until the current string only contains terminals.\nFor example, the string aaaabb can be derived in the following way:\nS)AB\n)aAB\n)aAbB\n)aaAbB\n)aaaAbB\n)aaaabB\n)aaaabb\nThis derivation can also be represented using a parse tree , as in the \fgure\nbelow:\nS\nA\nA\nA\nAa\na\naab\nbB\nB\nThe \fve rules in this example constitute a context-free grammar. The\nlanguage of this grammar is the set of all strings that"
}
]
},
{
"section": "Page 101",
"content": [
{
"type": "text",
"text": "3.1. Context-free grammars 93\n\u000fcan be derived from the start variable and\n\u000fonly contain terminals.\nFor this example, the language is\nfambn:m\u00151;n\u00151g;\nbecause every string of the form ambn, for somem\u00151 andn\u00151, can be\nderived from the start variable, whereas no other string over the alphabet\nfa;bgcan be derived from the start variable.\nDe\fnition 3.1.1 A context-free grammar is a 4-tuple G= (V;\u0006;R;S ),\nwhere\n1.Vis a \fnite set, whose elements are called variables ,\n2. \u0006 is a \fnite set, whose elements are called terminals ,\n3.V\\\u0006 =;,\n4.Sis an element of V; it is called the start variable ,\n5.Ris a \fnite set, whose elements are called rules. Each rule has the\nformA!w, whereA2Vandw2(V[\u0006)\u0003.\nIn our example, we have V=fS;A;Bg, \u0006 =fa;bg, and\nR=fS!AB;A!a;A!aA;B!b;B!bBg:\nDe\fnition 3.1.2 LetG= (V;\u0006;R;S ) be a context-free grammar. Let Abe\nan element in Vand letu,v, andwbe strings in ( V[\u0006)\u0003such thatA!w\nis a rule in R. We say that the string uwv can be derived in one step from\nthe stringuAv, and write this as\nuAv)uwv:\nIn other words, by applying the rule A!wto the string uAv, we obtain\nthe stringuwv. In our example, we see that aaAbb)aaaAbb .\nDe\fnition 3.1.3 LetG= (V;\u0006;R;S ) be a context-free grammar. Let u\nandvbe strings in ( V[\u0006)\u0003. We say that vcan be derived from u, and write\nthis asu\u0003)v, if one of the following two conditions holds:"
}
]
},
{
"section": "Page 102",
"content": [
{
"type": "text",
"text": "94 Chapter 3. Context-Free Languages\n1.u=vor\n2. there exist an integer k\u00152 and a sequence u1;u2;:::;ukof strings in\n(V[\u0006)\u0003, such that\n(a)u=u1,\n(b)v=uk, and\n(c)u1)u2):::)uk.\nIn other words, by starting with the string uand applying rules zero or\nmore times, we obtain the string v. In our example, we see that aaAbB\u0003)\naaaabbbB .\nDe\fnition 3.1.4 LetG= (V;\u0006;R;S ) be a context-free grammar. The\nlanguage ofGis de\fned to be the set of all strings in \u0006\u0003that can be derived\nfrom the start variable S:\nL(G) =fw2\u0006\u0003:S\u0003)wg:\nDe\fnition 3.1.5 A language Lis called context-free , if there exists a context-\nfree grammar Gsuch thatL(G) =L.\n3.2 Examples of context-free grammars\n3.2.1 Properly nested parentheses\nConsider the context-free grammar G= (V;\u0006;R;S ), whereV=fSg, \u0006 =\nfa;bg, and\nR=fS!\u000f;S!aSb;S!SSg:\nWe write the three rules in Ras\nS!\u000fjaSbjSS;\nwhere you can think of \\ j\" as being a short-hand for \\or\"."
}
]
},
{
"section": "Page 103",
"content": [
{
"type": "text",
"text": "3.2. Examples of context-free grammars 95\nBy applying the rules in R, starting with the start variable S, we obtain,\nfor example,\nS)SS\n)aSbS\n)aSbSS\n)aSSbSS\n)aaSbSbSS\n)aabSbSS\n)aabbSS\n)aabbaSbS\n)aabbabS\n)aabbabaSb\n)aabbabab\nWhat is the language L(G) of this context-free grammar G? If we think\nofaas being a left-parenthesis \\(\", and of bas being a right-parenthesis \\)\",\nthenL(G) is the language consisting of all strings of properly nested paren-\ntheses. Here is the explanation: Any string of properly nested parentheses is\neither\n\u000fempty (which we derive from Sby the rule S!\u000f),\n\u000fconsists of a left-parenthesis, followed by an arbitrary string of properly\nnested parentheses, followed by a right-parenthesis (these are derived\nfromSby \frst applying the rule S!aSb), or\n\u000fconsists of an arbitrary string of properly nested parentheses, followed\nby an arbitrary string of properly nested parentheses (these are derived\nfromSby \frst applying the rule S!SS).\n3.2.2 A context-free grammar for a nonregular lan-\nguage\nConsider the language L1=f0n1n:n\u00150g. We have seen in Section 2.9.1\nthatL1is not a regular language. We claim that L1is a context-free language."
}
]
},
{
"section": "Page 104",
"content": [
{
"type": "text",
"text": "96 Chapter 3. Context-Free Languages\nIn order to prove this claim, we have to construct a context-free grammar\nG1such thatL(G1) =L1.\nObserve that any string in L1is either\n\u000fempty or\n\u000fconsists of a 0, followed by an arbitrary string in L1, followed by a 1.\nThis leads to the context-free grammar G1= (V1;\u0006;R1;S1), whereV1=\nfS1g, \u0006 =f0;1g, andR1consists of the rules\nS1!\u000fj0S11:\nHence,R1=fS1!\u000f;S1!0S11g.\nTo derive the string 0n1nfrom the start variable S1, we do the following:\n\u000fStarting with S1, apply the rule S1!0S11 exactlyntimes. This gives\nthe string 0nS11n.\n\u000fApply the rule S1!\u000f. This gives the string 0n1n.\nIt is not di\u000ecult to see that these are the only strings that can be derived\nfrom the start variable S1. Thus,L(G1) =L1.\nIn a symmetric way, we see that the context-free grammar G2= (V2;\u0006;R2;S2),\nwhereV2=fS2g, \u0006 =f0;1g, andR2consists of the rules\nS2!\u000fj1S20;\nhas the property that L(G2) =L2, whereL2=f1n0n:n\u00150g. Thus,L2is\na context-free language.\nDe\fneL=L1[L2, i.e.,\nL=f0n1n:n\u00150g[f 1n0n:n\u00150g:\nThe context-free grammar G= (V;\u0006;R;S ), whereV=fS;S 1;S2g, \u0006 =\nf0;1g, andRconsists of the rules\nS!S1jS2\nS1!\u000fj0S11\nS2!\u000fj1S20;\nhas the property that L(G) =L. Hence,Lis a context-free language."
}
]
},
{
"section": "Page 105",
"content": [
{
"type": "text",
"text": "3.2. Examples of context-free grammars 97\n3.2.3 A context-free grammar for the complement of\na nonregular language\nLetLbe the (nonregular) language L=f0n1n:n\u00150g. We want to prove\nthat the complement LofLis a context-free language. Hence, we want to\nconstruct a context-free grammar Gwhose language is equal to L. Observe\nthat a binary string wis inLif and only if\n1.w= 0m1n, for some integers mandnwith 0\u0014m<n , or\n2.w= 0m1n, for some integers mandnwith 0\u0014n<m , or\n3.wcontains 10 as a substring.\nThus, we can write Las the union of the languages of all strings of type 1.,\ntype 2., and type 3.\nAny string of type 1. is either\n\u000fthe string 1,\n\u000fconsists of a string of type 1., followed by one 1, or\n\u000fconsists of one 0, followed by an arbitrary string of type 1., followed by\none 1.\nThus, using the rules\nS1!1jS11j0S11;\nwe can derive, from S1, all strings of type 1.\nSimilarly, using the rules\nS2!0j0S2j0S21;\nwe can derive, from S2, all strings of type 2.\nAny string of type 3.\n\u000fconsists of an arbitrary binary string, followed by the string 10, followed\nby an arbitrary binary string.\nUsing the rules\nX!\u000fj0Xj1X;"
}
]
},
{
"section": "Page 106",
"content": [
{
"type": "text",
"text": "98 Chapter 3. Context-Free Languages\nwe can derive, from X, all binary strings. Thus, by combining these with\nthe rule\nS3!X10X;\nwe can derive, from S3, all strings of type 3.\nWe arrive at the context-free grammar G= (V;\u0006;R;S ), whereV=\nfS;S 1;S2;S3;Xg, \u0006 =f0;1g, andRconsists of the rules\nS!S1jS2jS3\nS1!1jS11j0S11\nS2!0j0S2j0S21\nS3!X10X\nX!\u000fj0Xj1X\nTo summarize, we have\nS1\u0003)0m1n;for all integers mandnwith 0\u0014m<n ,\nS2\u0003)0m1n;for all integers mandnwith 0\u0014n<m ,\nX\u0003)u;for each string uinf0;1g\u0003,\nand\nS3\u0003)w;for every binary string wthat contains 10 as a substring.\nFrom these observations, it follows that that L(G) =L.\n3.2.4 A context-free grammar that veri\fes addition\nConsider the language\nL=fanbmcn+m:n\u00150;m\u00150g:\nUsing the pumping lemma for regular languages (Theorem 2.9.1), it can\nbe shown that Lis not a regular language. We will construct a context-\nfree grammar Gwhose language is equal to L, thereby proving that Lis a\ncontext-free language.\nFirst observe that \u000f2L. Therefore, we will take S!\u000fto be one of the\nrules in the grammar.\nLet us see how we can derive all strings in Lfrom the start variable S:"
}
]
},
{
"section": "Page 107",
"content": [
{
"type": "text",
"text": "3.3. Regular languages are context-free 99\n1. Every time we add an a, we also add a c. In this way, we obtain all\nstrings of the form ancn, wheren\u00150.\n2. Given a string of the form ancn, we start adding bs. Every time we add\nab, we also add a c. Observe that every bhas to be added between\ntheas and thecs. Therefore, we use a variable Bas a \\pointer\" to\nthe position in the current string where a bcan be added: Instead of\nderivingancnfromS, we derive the string anBcn. Then, from B, we\nderive all strings of the form bmcm, wherem\u00150.\nWe obtain the context-free grammar G= (V;\u0006;R;S ), whereV=fS;A;Bg,\n\u0006 =fa;b;cg, andRconsists of the rules\nS!\u000fjA\nA!\u000fjaAcjB\nB!\u000fjbBc\nThe facts that\n\u000fA\u0003)anBcn, for everyn\u00150,\n\u000fB\u0003)bmcm, for everym\u00150,\nimply that the following strings can be derived from the start variable S:\n\u000fS\u0003)anBcn\u0003)anbmcmcn=anbmcn+m, for alln\u00150 andm\u00150.\nIn fact, no other strings in fa;b;cg\u0003can be derived from S. Therefore, we\nhaveL(G) =L. Since\nS)A)B)\u000f;\nwe can simplify this grammar G, by eliminating the rules S!\u000fandA!\u000f.\nThis gives the context-free grammar G0= (V;\u0006;R0;S), whereV=fS;A;Bg,\n\u0006 =fa;b;cg, andR0consists of the rules\nS!A\nA!aAcjB\nB!\u000fjbBc\nFinally, observe that we do not need S; instead, we can use Aas start\nvariable. This gives our \fnal context-free grammar G00= (V;\u0006;R00;A), where\nV=fA;Bg, \u0006 =fa;b;cg, andR00consists of the rules\nA!aAcjB\nB!\u000fjbBc"
}
]
},
{
"section": "Page 108",
"content": [
{
"type": "text",
"text": "100 Chapter 3. Context-Free Languages\n3.3 Regular languages are context-free\nWe mentioned already that the class of context-free languages includes the\nclass of regular languages. In this section, we will prove this claim.\nTheorem 3.3.1 Let\u0006be an alphabet and let L\u0012\u0006\u0003be a regular language.\nThenLis a context-free language.\nProof. SinceLis a regular language, there exists a deterministic \fnite\nautomaton M= (Q;\u0006;\u000e;q;F ) that accepts L.\nTo prove that Lis context-free, we have to de\fne a context-free grammar\nG= (V;\u0006;R;S ), such that L=L(M) =L(G). Thus,Gmust have the\nfollowing property: For every string w2\u0006\u0003,\nw2L(M) if and only if w2L(G),\nwhich can be reformulated as\nMacceptswif and only if S\u0003)w.\nWe will de\fne the context-free grammar Gin such a way that the following\ncorrespondence holds for any string w=w1w2:::wn:\n\u000fAssume that Mis in state Ajust after it has read the substring\nw1w2:::wi.\n\u000fThen in the context-free grammar G, we haveS\u0003)w1w2:::wiA.\nIn the next step, Mreads the symbol wi+1and switches from state Ato,\nsay, stateB; thus,\u000e(A;wi+1) =B. In order to guarantee that the above\ncorrespondence still holds, we have to add the rule A!wi+1BtoG.\nConsider the moment when Mhas read the entire string w. LetAbe the\nstateMis in at that moment. By the above correspondence, we have\nS\u0003)w1w2:::wnA=wA:\nRecall that Gmust have the property that\nMacceptswif and only if S\u0003)w,\nwhich is equivalent to\nA2Fif and only if S\u0003)w."
}
]
},
{
"section": "Page 109",
"content": [
{
"type": "text",
"text": "3.3. Regular languages are context-free 101\nWe guarantee this property by adding to Gthe ruleA!\u000ffor every accept\nstateAofM.\nWe are now ready to give the formal de\fnition of the context-free gram-\nmarG= (V;\u0006;R;S ):\n\u000fV=Q, i.e., the variables of Gare the states of M.\n\u000fS=q, i.e., the start variable of Gis the start state of M.\n\u000fRconsists of the rules\nA!aB; whereA2Q,a2\u0006,B2Q, and\u000e(A;a) =B;\nand\nA!\u000f;whereA2F.\nIn words,\n\u000fevery transition \u000e(A;a) =BofM(i.e., when Mis in the state Aand\nreads the symbol a, it switches to the state B) corresponds to a rule\nA!aBin the grammar G,\n\u000fevery accept state AofMcorresponds to a rule A!\u000fin the grammar\nG.\nWe claim that L(G) =L. In order to prove this, we have to show that\nL(G)\u0012LandL\u0012L(G).\nWe prove that L\u0012L(G). Letw=w1w2:::wnbe an arbitrary string\ninL. When the \fnite automaton Mreads the string w, it visits the states\nr0;r1;:::;rn, where\n\u000fr0=q, and\n\u000fri+1=\u000e(ri;wi+1) fori= 0;1;:::;n\u00001.\nSincew2L=L(M), we know that rn2F.\nIt follows from the way we de\fned the grammar Gthat\n\u000ffor eachi= 0;1;:::;n\u00001,ri!wi+1ri+1is a rule in R, and\n\u000frn!\u000fis a rule in R."
}
]
},
{
"section": "Page 110",
"content": [
{
"type": "text",
"text": "102 Chapter 3. Context-Free Languages\nTherefore, we have\nS=q=r0)w1r1)w1w2r2):::)w1w2:::wnrn)w1w2:::wn=w:\nThis proves that w2L(G).\nThe proof of the claim that L(G)\u0012Lis left as an exercise.\nIn Sections 2.9.1 and 3.2.2, we have seen that the language f0n1n:n\u0015\n0gis not regular, but context-free. Therefore, the class of all context-free\nlanguages properly contains the class of regular languages.\n3.3.1 An example\nLetLbe the language de\fned as\nL=fw2f0;1g\u0003: 101 is a substring of wg:\nIn Section 2.2.2, we have seen that Lis a regular language. In that section,\nwe constructed the following deterministic \fnite automaton Mthat accepts\nL(we have renamed the states):\n0\n11\n00\n10,1S A\nB C\nWe apply the construction given in the proof of Theorem 3.3.1 to convert\nMto a context-free grammar Gwhose language is equal to L. According\nto this construction, we have G= (V;\u0006;R;S ), whereV=fS;A;B;Cg,\n\u0006 =f0;1g, the start variable Sis the start state of M, andRconsists of the\nrules\nS!0Sj1A\nA!0Bj1A\nB!0Sj1C\nC!0Cj1Cj\u000f"
}
]
},
{
"section": "Page 111",
"content": [
{
"type": "text",
"text": "3.4. Chomsky normal form 103\nConsider the string 010011011, which is an element of L. When the \fnite\nautomaton Mreads this string, it visits the states\nS;S;A;B;S;A;A;B;C;C:\nIn the grammar G, this corresponds to the derivation\nS)0S\n)01A\n)010B\n)0100S\n)01001A\n)010011A\n)0100110B\n)01001101C\n)010011011C\n)010011011:\nHence,\nS\u0003)010011011;\nimplying that the string 010011011 is in the language L(G) of the context-free\ngrammarG.\nThe string 10011 is not in the language L. When the \fnite automaton\nMreads this string, it visits the states\nS;A;B;S;A;A;\ni.e., after the string has been read, Mis in the non-accept state A. In the\ngrammarG, reading the string 10011 corresponds to the derivation\nS)1A\n)10B\n)100S\n)1001A\n)10011A:\nSinceAis not an accept state in M, the grammar Gdoes not contain the\nruleA!\u000f. This implies that the string 10011 cannot be derived from the\nstart variable S. Thus, 10011 is not in the language L(G) ofG."
}
]
},