Skip to content

Commit 461217d

Browse files
committed
Stuff
1 parent 28b2673 commit 461217d

File tree

4 files changed

+283
-171
lines changed

4 files changed

+283
-171
lines changed

biblo.bib

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,18 @@ @inproceedings{Fokkinga1989TuplingAM
44
year = 1989
55
}
66

7+
@article{datatypesalacarte,
8+
title = {Data types à la carte},
9+
volume = 18,
10+
DOI = {10.1017/S0956796808006758},
11+
number = 4,
12+
journal = {Journal of Functional Programming},
13+
publisher = {Cambridge University Press},
14+
author = {Swierstra, Wouter},
15+
year = 2008,
16+
pages = {423–436}
17+
}
18+
719
@inproceedings{embedding,
820
title = "Folding Domain-Specific Languages: Deep and Shallow
921
Embeddings",

main.lhs

Lines changed: 87 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,7 @@
4343

4444
%include polycode.fmt
4545
%include forall.fmt
46+
%include spacing.fmt
4647

4748
%format `comp_` = "\cdot "
4849
%subst space = "\codeskip "
@@ -170,10 +171,17 @@ This can be interpretted by defining a function such as |size|, that finds the s
170171
|size| interprets the deep embedding, by folding over the datatype.
171172
See the appendix for how to add an interpretation with a shallow embedding.
172173

174+
> type Size = Int
175+
> size :: Parser2 a -> Size
176+
> size Empty2 = 1
177+
> size (Pure2 _) = 1
178+
> size (Satisfy2 _) = 1
179+
> size (Try2 px) = 1 + size px
180+
> size (Ap2 pf px) = 1 + size pf + size px
181+
> size (Or2 px py) = 1 + size px + size py
173182

174183
\section{Folds}
175184

176-
It is possible to capture the shape of an abstract datatype through the |Functor| type class.
177185
It is possible to capture the shape of an abstract datatype as a |Functor|.
178186
The use of a |Functor| allows for the specification of where a datatype recurses.
179187
There is however one problem, a functor expresing the parser language is required to be typed.
@@ -204,7 +212,7 @@ The shape of |Parser2|, can be seen in |ParserF| where the |k a| marks the recur
204212
%format Parser4
205213

206214
|Fix| is used to get the fixed point of the functor.
207-
It contains the structure needed to make the datatype recursive.
215+
It contains the structure needed to allow the datatype to recursive.
208216
|Parser4| is the fixed point of |ParserF|.
209217

210218
> type Parser4 a = Fix ParserF a
@@ -239,18 +247,18 @@ Due to parsers being a typed language, a constant functor is required to preserv
239247
\subsection{Multiple Interpretations}
240248

241249
In DSLs it is common to want to evaluate multiple interpretations.
242-
For example, a parser may also want to know the maximum characters it will read (maximum munch).
250+
For example, a parser may also want to know the maximum number of characters it will read (maximum munch).
243251
In a deep embedding this is simple, a second algebra can be defined.
244252

245253
%format maxMunch4
246254

247255
> type MM = Int
248256
>
249257
> mmAlg :: ParserAlg (C MM) i
250-
> mmAlg (PureF _) = C 0
251-
> mmAlg EmptyF = C 0
252-
> mmAlg (SatisfyF c) = C 1
253-
> mmAlg (TryF (C px)) = C px
258+
> mmAlg EmptyF = C 0
259+
> mmAlg (PureF _) = C 0
260+
> mmAlg (SatisfyF c) = C 1
261+
> mmAlg (TryF (C px)) = C px
254262
> mmAlg (ApF (C pf) (C px)) = C $ pf + px
255263
> mmAlg (OrF (C px) (C py)) = C $ max px py
256264
>
@@ -275,8 +283,8 @@ If many semantics are required this can become cumbersome to define.
275283
> maxMunch5 = snd
276284
>
277285
> smmAlg :: ParserAlg (C (Size, MM)) a
278-
> smmAlg (PureF _) = C (1, 0)
279286
> smmAlg EmptyF = C (1, 0)
287+
> smmAlg (PureF _) = C (1, 0)
280288
> smmAlg (SatisfyF c) = C (1, 1)
281289
> smmAlg (TryF (C (s, mm))) = C (s + 1, mm)
282290
> smmAlg (ApF (C (s, mm)) (C (s', mm')))
@@ -298,10 +306,9 @@ Although this is an algebra, you are able to learn the shallow embedding from th
298306

299307
In a more complex parser combinator library that perform optimisations on a deep embedding,
300308
it could also be possible that there is a primary fold that depends on other secondary folds on parts of the AST.
301-
Folds such as this are named mutumorphisms~\cite{Fokkinga1989TuplingAM},
309+
Folds such as this are named zygomorphisms~\cite{Fokkinga1989TuplingAM} - a special case of a mutomorphism -
302310
they can be implemented by tupling the functions in the fold.
303-
\citet{parsley} makes use of a zygomorphism -
304-
a special case where the dependency is only one-way - to perform consumption analysis.
311+
\citet{parsley} makes use of a zygomorphism to perform consumption analysis.
305312

306313

307314
\subsection{Context-sensitive Interpretations}
@@ -319,8 +326,8 @@ Using the semantics from~\citet{wuYoda}, an implementation can be given for a si
319326

320327

321328
> yAlg :: ParserAlg Y i
329+
> yAlg EmptyF = Y $ const []
322330
> yAlg (PureF x) = Y $ \ts -> [(x, ts)]
323-
> yAlg EmptyF = Y $ const []
324331
> yAlg (SatisfyF c) = Y $ \ case
325332
> [] -> []
326333
> (t:ts') -> [(t, ts') | c t]
@@ -359,12 +366,12 @@ This algebra describes how the shallow embedding should fold the structure.
359366
> newtype Parser7 i = P7
360367
> {unP7 :: forall a . ( forall j . ParserF a j -> a j) -> a i}
361368
>
362-
> pure7 :: a -> Parser7 a
363-
> pure7 x = P7 (\h -> h (PureF x))
364-
>
365369
> empty7 :: Parser7 a
366370
> empty7 = P7 (\h -> h EmptyF)
367371
>
372+
> pure7 :: a -> Parser7 a
373+
> pure7 x = P7 (\h -> h (PureF x))
374+
>
368375
> satisfy7 :: (Char -> Bool) -> Parser7 Char
369376
> satisfy7 c = P7 (\h -> h (SatisfyF c))
370377
>
@@ -389,8 +396,8 @@ Simillarly it is possible to convert a deep embedding into a parameterised shall
389396
> shallow = cata shallowAlg
390397
>
391398
> shallowAlg :: ParserAlg Parser7 i
392-
> shallowAlg (PureF x) = pure7 x
393399
> shallowAlg EmptyF = empty7
400+
> shallowAlg (PureF x) = pure7 x
394401
> shallowAlg (SatisfyF c) = satisfy7 c
395402
> shallowAlg (TryF px) = try7 px
396403
> shallowAlg (ApF pf px) = ap7 pf px
@@ -417,7 +424,7 @@ This is possible in Haskell through the use of a type class.
417424

418425
> class Parser8 parser where
419426
> empty8 :: parser a
420-
> pure8 :: a -> parser a
427+
> pure8 :: a -> parser a
421428
> satisfy8 :: (Char -> Bool) -> parser Char
422429
> try8 :: parser a -> parser a
423430
> ap8 :: parser (a -> b) -> parser a -> parser b
@@ -450,9 +457,6 @@ To be able to reuse the previously defined algebras, a different type class can
450457

451458
\subsection{Modular Interpretations}
452459

453-
There may be times when adding extra combinators would be convenient, for example adding a 'many' operator that allows for
454-
A modular technique to assembling DSLs would aid this process.
455-
456460
%format Empty10
457461
%format Pure10
458462
%format Satisfy10
@@ -464,6 +468,22 @@ A modular technique to assembling DSLs would aid this process.
464468
%format aorb10
465469
%format :+: = ":\!\!+\!\!:"
466470

471+
There may be times when adding extra combinators would be convenient, for example adding a 'string' operator.
472+
A modular technique to assembling DSLs would aid this process.
473+
This approach is described in Data types à la carte~\cite{datatypesalacarte}.
474+
An |:+:| operator can be defined to specify a choice between constructors.
475+
476+
> data (f :+: g) (k :: * -> *) (a :: *) where
477+
> L :: f k a -> (f :+: g) k a
478+
> R :: g k a -> (f :+: g) k a
479+
> infixr :+:
480+
>
481+
> instance (IFunctor f, IFunctor g)
482+
> => IFunctor (f :+: g) where
483+
> imap f (L x) = L (imap f x)
484+
> imap f (R y) = R (imap f y)
485+
486+
Now the datatypes can be defined for each constructor that is required. The |IFunctor| instances for each constructor can be found in the appendix.
467487

468488
> data Empty10 (k :: * -> *) (a :: *) where
469489
> Empty10 :: Empty10 k a
@@ -483,29 +503,20 @@ A modular technique to assembling DSLs would aid this process.
483503
> data Or10 (k :: * -> *) (a :: *) where
484504
> Or10 :: k a -> k a -> Or10 k a
485505

486-
487-
> data (f :+: g) (k :: * -> *) (a :: *) where
488-
> L :: f k a -> (f :+: g) k a
489-
> R :: g k a -> (f :+: g) k a
490-
> infixr :+:
491-
>
492-
> instance (IFunctor f, IFunctor g)
493-
> => IFunctor (f :+: g) where
494-
> imap f (L x) = L (imap f x)
495-
> imap f (R y) = R (imap f y)
506+
The datatypes are now summed together to form a single |ParserF10| type.
496507

497508
> type ParserF10 = Empty10 :+: Pure10 :+: Satisfy10
498509
> :+: Try10 :+: Ap10 :+: Or10
499510
>
500511
> type Parser10 = Fix ParserF10
501512

513+
There is however, one problem with this approach: there is now a mess of |L| and |R|'s. This makes this approach inconvenient to use.
502514

503515
> aorb10 :: Parser10 Char
504516
> aorb10 = In (R (R (R (R (R (Or10
505517
> (In (R (R (L (Satisfy10 (=='a'))))))
506518
> (In (R (R (L (Satisfy10 (=='b'))))))))))))
507519

508-
509520
%format empty10
510521
%format pure10
511522
%format satisfy10
@@ -515,28 +526,37 @@ A modular technique to assembling DSLs would aid this process.
515526
%format aorb10'
516527
%format :<: = ":\prec:"
517528

529+
Data types à la carte~\cite{datatypesalacarte}, however, describes a technique that allows for the injection of these |L|'s and |R|'s.
530+
The notion of subtypes between functors, can be specified using the |:<:| operator.
531+
518532

519533
> class (IFunctor f, IFunctor g) => f :<: g where
520534
> inj :: f k a -> g k a
521535
>
522536
> instance IFunctor f => f :<: f where
523537
> inj = id
524538
>
525-
> instance {-# OVERLAPPING #-} (IFunctor f, IFunctor g) => f :<: (f :+: g) where
526-
> inj = L
539+
> instance {-# OVERLAPPING #-}
540+
> (IFunctor f, IFunctor g)
541+
> => f :<: (f :+: g) where
542+
> inj = L
527543
>
528-
> instance (IFunctor f, IFunctor g, IFunctor h, f :<: g) => f :<: (h :+: g) where
544+
> instance (IFunctor f, IFunctor g,
545+
> IFunctor h, f :<: g)
546+
> => f :<: (h :+: g) where
529547
> inj = R . inj
530548

531-
Smart constructors:
549+
550+
Smart constructors are defined that allow for the |L|'s and |R|'s to be injected.
532551

533552
> empty10 :: (Empty10 :<: f) => Fix f a
534553
> empty10 = In (inj Empty10)
535554
>
536555
> pure10 :: (Pure10 :<: f) => a -> Fix f a
537556
> pure10 x = In (inj (Pure10 x))
538557
>
539-
> satisfy10 :: (Satisfy10 :<: f) => (Char -> Bool) -> Fix f Char
558+
> satisfy10 :: (Satisfy10 :<: f) => (Char -> Bool)
559+
> -> Fix f Char
540560
> satisfy10 c = In (inj (Satisfy10 c))
541561
>
542562
> try10 :: (Try10 :<: f) => Fix f a -> Fix f a
@@ -550,37 +570,64 @@ Smart constructors:
550570

551571

552572

573+
Now the smart constructors can be used to form an expression |aorb10'|.
574+
The type contraints on this expression allow for |f| to be flexible, so long as |Or10| and |Satisfy10| are subtypes of the functor |f|.
575+
553576
> aorb10' :: (Or10 :<: f, Satisfy10 :<: f) => Fix f Char
554577
> aorb10' = satisfy10 (== 'a') `or10` satisfy10 (== 'b')
555578

556579

557580
%format sizeAlg10
558581
%format size10
559582

583+
To be able to give an interpretation an algebra is still required.
584+
Simillarly to the constructors the algebra needs to be modularized.
585+
A type class can be defined that provides the algebra to fold each constructor.
586+
587+
560588

561589
> class IFunctor f => SizeAlg f where
562590
> sizeAlg10 :: f Size8 i -> Size8 i
563591
>
564-
> instance (SizeAlg f, SizeAlg g) => SizeAlg (f :+: g) where
592+
> instance (SizeAlg f, SizeAlg g)
593+
> => SizeAlg (f :+: g) where
565594
> sizeAlg10 (L x) = sizeAlg10 x
566595
> sizeAlg10 (R y) = sizeAlg10 y
567-
>
596+
597+
One benefit to this approach is that if an interpretation is only needed for parsers that use |or10| and |satisfy10|,
598+
then only those instances need to be defined.
599+
Take calculating the size of the parser |aorb10'|, only the two instances need to be defined to do so.
600+
601+
568602
> instance SizeAlg Or10 where
569603
> sizeAlg10 (Or10 px py) = px + py + 1
570604
>
571605
> instance SizeAlg Satisfy10 where
572606
> sizeAlg10 (Satisfy10 _) = 1
573607

574-
575608
> size10 :: SizeAlg f => Fix f a -> Size8 a
576609
> size10 = cata sizeAlg10
577610
>
578611
> eval :: Size
579-
> eval = coerce (size10 (aorb10' :: (Fix (Or10 :+: Satisfy10)) Char))
612+
> eval = coerce (size10 (aorb10' :: (
613+
> Fix (Or10 :+: Satisfy10)) Char))
580614

615+
%format Circuit11
616+
%format WidthAlg11
617+
%format Fan11
618+
%format Stretch11
581619

620+
The type of |aorb10'| is required to be specified.
621+
This is so that the compiler knows the top level functor being used and the constructors included in it.
622+
There could possibly an error in the paper here as it uses |Circuit11|.
623+
This requires that the |WidthAlg11| instances be defined for all constructors in |Circuit11|, however, only |Fan11| and |Stretch11| have been.
624+
To rectify this type error |stretchfan| should be given the type |stretchfan :: Fix (Fan11 :+: Stretch11)|.
582625

583626

627+
\section{Conclusion}
628+
This overview has walked through the techniques described in the paper and applied them to a typed Domain Specific Langugae.
629+
It aims to highlight the correspondence of shallow embeddings to algebras of deep embeddings.
630+
584631

585632
\bibliography{biblo}
586633

@@ -594,14 +641,6 @@ Smart constructors:
594641

595642
%endif
596643

597-
> type Size = Int
598-
> size :: Parser2 a -> Size
599-
> size Empty2 = 1
600-
> size (Pure2 _) = 1
601-
> size (Satisfy2 _) = 1
602-
> size (Try2 px) = 1 + size px
603-
> size (Ap2 pf px) = 1 + size pf + size px
604-
> size (Or2 px py) = 1 + size px + size py
605644

606645
%format Parser3
607646
%format pure3

main.pdf

-365 KB
Binary file not shown.

0 commit comments

Comments
 (0)