43
43
44
44
% include polycode.fmt
45
45
% include forall.fmt
46
+ % include spacing.fmt
46
47
47
48
% format `comp_` = "\cdot "
48
49
% subst space = "\codeskip "
@@ -170,10 +171,17 @@ This can be interpretted by defining a function such as |size|, that finds the s
170
171
|size| interprets the deep embedding, by folding over the datatype.
171
172
See the appendix for how to add an interpretation with a shallow embedding.
172
173
174
+ > type Size = Int
175
+ > size :: Parser2 a -> Size
176
+ > size Empty2 = 1
177
+ > size (Pure2 _) = 1
178
+ > size (Satisfy2 _) = 1
179
+ > size (Try2 px) = 1 + size px
180
+ > size (Ap2 pf px) = 1 + size pf + size px
181
+ > size (Or2 px py) = 1 + size px + size py
173
182
174
183
\section {Folds }
175
184
176
- It is possible to capture the shape of an abstract datatype through the |Functor | type class.
177
185
It is possible to capture the shape of an abstract datatype as a |Functor |.
178
186
The use of a |Functor | allows for the specification of where a datatype recurses.
179
187
There is however one problem, a functor expresing the parser language is required to be typed.
@@ -204,7 +212,7 @@ The shape of |Parser2|, can be seen in |ParserF| where the |k a| marks the recur
204
212
% format Parser4
205
213
206
214
|Fix | is used to get the fixed point of the functor.
207
- It contains the structure needed to make the datatype recursive.
215
+ It contains the structure needed to allow the datatype to recursive.
208
216
|Parser4 | is the fixed point of |ParserF |.
209
217
210
218
> type Parser4 a = Fix ParserF a
@@ -239,18 +247,18 @@ Due to parsers being a typed language, a constant functor is required to preserv
239
247
\subsection {Multiple Interpretations }
240
248
241
249
In DSLs it is common to want to evaluate multiple interpretations.
242
- For example, a parser may also want to know the maximum characters it will read (maximum munch).
250
+ For example, a parser may also want to know the maximum number of characters it will read (maximum munch).
243
251
In a deep embedding this is simple, a second algebra can be defined.
244
252
245
253
% format maxMunch4
246
254
247
255
> type MM = Int
248
256
>
249
257
> mmAlg :: ParserAlg (C MM ) i
250
- > mmAlg ( PureF _) = C 0
251
- > mmAlg EmptyF = C 0
252
- > mmAlg (SatisfyF c) = C 1
253
- > mmAlg (TryF (C px)) = C px
258
+ > mmAlg EmptyF = C 0
259
+ > mmAlg ( PureF _) = C 0
260
+ > mmAlg (SatisfyF c) = C 1
261
+ > mmAlg (TryF (C px)) = C px
254
262
> mmAlg (ApF (C pf) (C px)) = C $ pf + px
255
263
> mmAlg (OrF (C px) (C py)) = C $ max px py
256
264
>
@@ -275,8 +283,8 @@ If many semantics are required this can become cumbersome to define.
275
283
> maxMunch5 = snd
276
284
>
277
285
> smmAlg :: ParserAlg (C (Size , MM )) a
278
- > smmAlg (PureF _) = C (1 , 0 )
279
286
> smmAlg EmptyF = C (1 , 0 )
287
+ > smmAlg (PureF _) = C (1 , 0 )
280
288
> smmAlg (SatisfyF c) = C (1 , 1 )
281
289
> smmAlg (TryF (C (s, mm))) = C (s + 1 , mm)
282
290
> smmAlg (ApF (C (s, mm)) (C (s', mm')))
@@ -298,10 +306,9 @@ Although this is an algebra, you are able to learn the shallow embedding from th
298
306
299
307
In a more complex parser combinator library that perform optimisations on a deep embedding,
300
308
it could also be possible that there is a primary fold that depends on other secondary folds on parts of the AST.
301
- Folds such as this are named mutumorphisms ~\cite {Fokkinga1989TuplingAM },
309
+ Folds such as this are named zygomorphisms ~\cite {Fokkinga1989TuplingAM } - a special case of a mutomorphism -
302
310
they can be implemented by tupling the functions in the fold.
303
- \citet {parsley } makes use of a zygomorphism -
304
- a special case where the dependency is only one-way - to perform consumption analysis.
311
+ \citet {parsley } makes use of a zygomorphism to perform consumption analysis.
305
312
306
313
307
314
\subsection {Context-sensitive Interpretations }
@@ -319,8 +326,8 @@ Using the semantics from~\citet{wuYoda}, an implementation can be given for a si
319
326
320
327
321
328
> yAlg :: ParserAlg Y i
329
+ > yAlg EmptyF = Y $ const []
322
330
> yAlg (PureF x) = Y $ \ ts -> [(x, ts)]
323
- > yAlg EmptyF = Y $ const []
324
331
> yAlg (SatisfyF c) = Y $ \ case
325
332
> [] -> []
326
333
> (t: ts') -> [(t, ts') | c t]
@@ -359,12 +366,12 @@ This algebra describes how the shallow embedding should fold the structure.
359
366
> newtype Parser7 i = P7
360
367
> { unP7 :: forall a . ( forall j . ParserF a j -> a j ) -> a i }
361
368
>
362
- > pure7 :: a -> Parser7 a
363
- > pure7 x = P7 (\ h -> h (PureF x))
364
- >
365
369
> empty7 :: Parser7 a
366
370
> empty7 = P7 (\ h -> h EmptyF )
367
371
>
372
+ > pure7 :: a -> Parser7 a
373
+ > pure7 x = P7 (\ h -> h (PureF x))
374
+ >
368
375
> satisfy7 :: (Char -> Bool ) -> Parser7 Char
369
376
> satisfy7 c = P7 (\ h -> h (SatisfyF c))
370
377
>
@@ -389,8 +396,8 @@ Simillarly it is possible to convert a deep embedding into a parameterised shall
389
396
> shallow = cata shallowAlg
390
397
>
391
398
> shallowAlg :: ParserAlg Parser7 i
392
- > shallowAlg (PureF x) = pure7 x
393
399
> shallowAlg EmptyF = empty7
400
+ > shallowAlg (PureF x) = pure7 x
394
401
> shallowAlg (SatisfyF c) = satisfy7 c
395
402
> shallowAlg (TryF px) = try7 px
396
403
> shallowAlg (ApF pf px) = ap7 pf px
@@ -417,7 +424,7 @@ This is possible in Haskell through the use of a type class.
417
424
418
425
> class Parser8 parser where
419
426
> empty8 :: parser a
420
- > pure8 :: a -> parser a
427
+ > pure8 :: a -> parser a
421
428
> satisfy8 :: (Char -> Bool ) -> parser Char
422
429
> try8 :: parser a -> parser a
423
430
> ap8 :: parser (a -> b ) -> parser a -> parser b
@@ -450,9 +457,6 @@ To be able to reuse the previously defined algebras, a different type class can
450
457
451
458
\subsection {Modular Interpretations }
452
459
453
- There may be times when adding extra combinators would be convenient, for example adding a 'many' operator that allows for
454
- A modular technique to assembling DSLs would aid this process.
455
-
456
460
% format Empty10
457
461
% format Pure10
458
462
% format Satisfy10
@@ -464,6 +468,22 @@ A modular technique to assembling DSLs would aid this process.
464
468
% format aorb10
465
469
% format :+: = ":\!\!+\!\!:"
466
470
471
+ There may be times when adding extra combinators would be convenient, for example adding a 'string' operator.
472
+ A modular technique to assembling DSLs would aid this process.
473
+ This approach is described in Data types à la carte~\cite {datatypesalacarte }.
474
+ An |:+: | operator can be defined to specify a choice between constructors.
475
+
476
+ > data (f :+: g ) (k :: * -> * ) (a :: * ) where
477
+ > L :: f k a -> (f :+: g ) k a
478
+ > R :: g k a -> (f :+: g ) k a
479
+ > infixr :+:
480
+ >
481
+ > instance (IFunctor f , IFunctor g )
482
+ > => IFunctor (f :+: g ) where
483
+ > imap f (L x) = L (imap f x)
484
+ > imap f (R y) = R (imap f y)
485
+
486
+ Now the datatypes can be defined for each constructor that is required. The |IFunctor | instances for each constructor can be found in the appendix.
467
487
468
488
> data Empty10 (k :: * -> * ) (a :: * ) where
469
489
> Empty10 :: Empty10 k a
@@ -483,29 +503,20 @@ A modular technique to assembling DSLs would aid this process.
483
503
> data Or10 (k :: * -> * ) (a :: * ) where
484
504
> Or10 :: k a -> k a -> Or10 k a
485
505
486
-
487
- > data (f :+: g ) (k :: * -> * ) (a :: * ) where
488
- > L :: f k a -> (f :+: g ) k a
489
- > R :: g k a -> (f :+: g ) k a
490
- > infixr :+:
491
- >
492
- > instance (IFunctor f , IFunctor g )
493
- > => IFunctor (f :+: g ) where
494
- > imap f (L x) = L (imap f x)
495
- > imap f (R y) = R (imap f y)
506
+ The datatypes are now summed together to form a single |ParserF10 | type.
496
507
497
508
> type ParserF10 = Empty10 :+: Pure10 :+: Satisfy10
498
509
> :+: Try10 :+: Ap10 :+: Or10
499
510
>
500
511
> type Parser10 = Fix ParserF10
501
512
513
+ There is however, one problem with this approach: there is now a mess of |L | and |R |'s. This makes this approach inconvenient to use.
502
514
503
515
> aorb10 :: Parser10 Char
504
516
> aorb10 = In (R (R (R (R (R (Or10
505
517
> (In (R (R (L (Satisfy10 (== ' a' ))))))
506
518
> (In (R (R (L (Satisfy10 (== ' b' ))))))))))))
507
519
508
-
509
520
% format empty10
510
521
% format pure10
511
522
% format satisfy10
@@ -515,28 +526,37 @@ A modular technique to assembling DSLs would aid this process.
515
526
% format aorb10'
516
527
% format :<: = ":\prec:"
517
528
529
+ Data types à la carte~\cite {datatypesalacarte }, however, describes a technique that allows for the injection of these |L |'s and |R |'s.
530
+ The notion of subtypes between functors, can be specified using the |:<: | operator.
531
+
518
532
519
533
> class (IFunctor f , IFunctor g ) => f :<: g where
520
534
> inj :: f k a -> g k a
521
535
>
522
536
> instance IFunctor f => f :<: f where
523
537
> inj = id
524
538
>
525
- > instance {-# OVERLAPPING #-} (IFunctor f , IFunctor g ) => f :<: (f :+: g ) where
526
- > inj = L
539
+ > instance {-# OVERLAPPING #-}
540
+ > (IFunctor f , IFunctor g )
541
+ > => f :<: (f :+: g ) where
542
+ > inj = L
527
543
>
528
- > instance (IFunctor f , IFunctor g , IFunctor h , f :<: g ) => f :<: (h :+: g ) where
544
+ > instance (IFunctor f , IFunctor g ,
545
+ > IFunctor h , f :<: g )
546
+ > => f :<: (h :+: g ) where
529
547
> inj = R . inj
530
548
531
- Smart constructors:
549
+
550
+ Smart constructors are defined that allow for the |L |'s and |R |'s to be injected.
532
551
533
552
> empty10 :: (Empty10 :<: f ) => Fix f a
534
553
> empty10 = In (inj Empty10 )
535
554
>
536
555
> pure10 :: (Pure10 :<: f ) => a -> Fix f a
537
556
> pure10 x = In (inj (Pure10 x))
538
557
>
539
- > satisfy10 :: (Satisfy10 :<: f ) => (Char -> Bool ) -> Fix f Char
558
+ > satisfy10 :: (Satisfy10 :<: f ) => (Char -> Bool )
559
+ > -> Fix f Char
540
560
> satisfy10 c = In (inj (Satisfy10 c))
541
561
>
542
562
> try10 :: (Try10 :<: f ) => Fix f a -> Fix f a
@@ -550,37 +570,64 @@ Smart constructors:
550
570
551
571
552
572
573
+ Now the smart constructors can be used to form an expression |aorb10'|.
574
+ The type contraints on this expression allow for |f| to be flexible, so long as |Or10 | and |Satisfy10 | are subtypes of the functor |f|.
575
+
553
576
> aorb10' :: (Or10 :<: f , Satisfy10 :<: f ) => Fix f Char
554
577
> aorb10' = satisfy10 (== ' a' ) `or10` satisfy10 (== ' b' )
555
578
556
579
557
580
% format sizeAlg10
558
581
% format size10
559
582
583
+ To be able to give an interpretation an algebra is still required.
584
+ Simillarly to the constructors the algebra needs to be modularized.
585
+ A type class can be defined that provides the algebra to fold each constructor.
586
+
587
+
560
588
561
589
> class IFunctor f => SizeAlg f where
562
590
> sizeAlg10 :: f Size8 i -> Size8 i
563
591
>
564
- > instance (SizeAlg f , SizeAlg g ) => SizeAlg (f :+: g ) where
592
+ > instance (SizeAlg f , SizeAlg g )
593
+ > => SizeAlg (f :+: g ) where
565
594
> sizeAlg10 (L x) = sizeAlg10 x
566
595
> sizeAlg10 (R y) = sizeAlg10 y
567
- >
596
+
597
+ One benefit to this approach is that if an interpretation is only needed for parsers that use |or10| and |satisfy10|,
598
+ then only those instances need to be defined.
599
+ Take calculating the size of the parser |aorb10'|, only the two instances need to be defined to do so.
600
+
601
+
568
602
> instance SizeAlg Or10 where
569
603
> sizeAlg10 (Or10 px py) = px + py + 1
570
604
>
571
605
> instance SizeAlg Satisfy10 where
572
606
> sizeAlg10 (Satisfy10 _) = 1
573
607
574
-
575
608
> size10 :: SizeAlg f => Fix f a -> Size8 a
576
609
> size10 = cata sizeAlg10
577
610
>
578
611
> eval :: Size
579
- > eval = coerce (size10 (aorb10' :: (Fix (Or10 :+: Satisfy10 )) Char ))
612
+ > eval = coerce (size10 (aorb10' :: (
613
+ > Fix (Or10 :+: Satisfy10 )) Char ))
580
614
615
+ % format Circuit11
616
+ % format WidthAlg11
617
+ % format Fan11
618
+ % format Stretch11
581
619
620
+ The type of |aorb10'| is required to be specified.
621
+ This is so that the compiler knows the top level functor being used and the constructors included in it.
622
+ There could possibly an error in the paper here as it uses |Circuit11 |.
623
+ This requires that the |WidthAlg11 | instances be defined for all constructors in |Circuit11 |, however, only |Fan11 | and |Stretch11 | have been.
624
+ To rectify this type error |stretchfan| should be given the type |stretchfan :: Fix (Fan11 :+: Stretch11 )|.
582
625
583
626
627
+ \section {Conclusion }
628
+ This overview has walked through the techniques described in the paper and applied them to a typed Domain Specific Langugae.
629
+ It aims to highlight the correspondence of shallow embeddings to algebras of deep embeddings.
630
+
584
631
585
632
\bibliography {biblo}
586
633
@@ -594,14 +641,6 @@ Smart constructors:
594
641
595
642
% endif
596
643
597
- > type Size = Int
598
- > size :: Parser2 a -> Size
599
- > size Empty2 = 1
600
- > size (Pure2 _) = 1
601
- > size (Satisfy2 _) = 1
602
- > size (Try2 px) = 1 + size px
603
- > size (Ap2 pf px) = 1 + size pf + size px
604
- > size (Or2 px py) = 1 + size px + size py
605
644
606
645
% format Parser3
607
646
% format pure3
0 commit comments