<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<!-- BEGIN Info -->
<meta
name="description"
content="Fjord - Fjord is an open-source framework that allows end users to receive Kafka streaming data in real-time."
/>
<meta name="title" property="og:title" content="Fjord" />
<meta property="og:type" content="Website" />
<meta name="image" property="og:image" content="assets/thumb.png" />
<meta
name="description"
property="og:description"
content="Fjord - Fjord is an open-source framework that allows end users to receive Kafka streaming data in real-time."
/>
<meta name="author" content="Fjord" />
<!-- END Info -->
<!-- BEGIN favicon -->
<link
rel="apple-touch-icon"
sizes="180x180"
href="assets/favicon/apple-touch-icon.png"
/>
<link
rel="mask-icon"
href="assets/favicon/safari-pinned-tab.svg"
color="#5bbad5"
/>
<link rel="shortcut icon" href="assets/favicon/favicon.ico" />
<meta name="msapplication-TileColor" content="#ffffff" />
<meta
name="msapplication-config"
content="images/favicon/browserconfig.xml"
/>
<meta name="theme-color" content="#ffffff" />
<!-- END favicon -->
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Fjord</title>
<link
rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css"
/>
<link
rel="stylesheet"
href="https://unpkg.com/@tailwindcss/[email protected]/dist/typography.min.css"
/>
<link rel="stylesheet" href="stylesheets/reset.css" />
<link rel="stylesheet" href="stylesheets/style.css" />
<link rel="stylesheet" href="stylesheets/responsive.css" />
</head>
<body>
<header class="mobile-menu-closed">
<div id="header">
<a href="./index.html">
<img src="assets/logo/logo-name.svg" />
</a>
<nav>
<a href="#start-here" class="selected">Overview</a>
<a href="#case-study">Case Study</a>
<a href="#presentation">Presentation</a>
<a href="#our-team">Our Team</a>
<a
href="https://github.com/fjord-framework"
target="_blank"
class="icon"
><i class="fab fa-github"></i
></a>
</nav>
<div id="menu">
<button type="button">
<svg
id="mobile-open"
xmlns="http://www.w3.org/2000/svg"
fill="none"
viewBox="0 0 24 24"
stroke="currentColor"
aria-hidden="true"
>
<path
stroke-linecap="round"
stroke-linejoin="round"
stroke-width="2"
d="M4 6h16M4 12h16M4 18h16"
/>
</svg>
<svg
id="mobile-close"
xmlns="http://www.w3.org/2000/svg"
fill="none"
viewBox="0 0 24 24"
stroke="currentColor"
aria-hidden="true"
>
<path
stroke-linecap="round"
stroke-linejoin="round"
stroke-width="2"
d="M6 18L18 6M6 6l12 12"
/>
</svg>
</button>
</div>
</div>
<div id="header-buffer"></div>
<div id="mobile-menu">
<a href="#start-here" class="selected">Overview</a>
<a href="#case-study">Case Study</a>
<a href="#presentation">Presentation</a>
<a href="#our-team">Our Team</a>
<a href="https://github.com/fjord-framework" target="_blank"
><i class="fab fa-github"></i></a>
</div>
</header>
<div id="start-here" class="main-section">
<div class="h-full">
<div class="static-logo-color"></div>
<div class="">
<img
class="fjord sm-screen"
src="assets/logo/fjord-vertical.png"
/>
<img class="fjord lg-screen" src="assets/logo/fjord-name.png" />
<p class="light-text">
An open-source framework that<br />provides a
<span class="text-pink">real-time API Proxy</span> for<br />
<span class="text-teal">Kafka</span><br />
</p>
</div>
</div>
<div class="h-full">
<div class="bg-dark-blue static-logo-blue">
<h2>Real-time Streaming from Kafka</h2>
</div>
<div class="bg-dark-blue">
<h2 class="sm-header">Real-time Streaming from Kafka</h2>
<p>Fjord exposes real-time API endpoints to allow internet-facing clients to stream from Kafka.</p>
<video autoplay loop muted playsinline>
<source src="./assets/media/mp4/API-Proxy-dark.mp4" type="video/mp4" />
Your browser does not support the HTML5 Video element.
</video>
</div>
</div>
<div class="h-full">
<div class="bg-light-blue static-logo-light-blue">
<h2>Easy to Deploy</h2>
</div>
<div class="bg-light-blue">
<h2 class="sm-header">Easy to Deploy</h2>
<p>
Use Fjord's CLI to deploy all the necessary infrastructure
to Amazon Web Services (AWS).
</p>
<img src="assets/media/gifs/fjordCLI.gif" class="cli"/>
</div>
</div>
<div class="h-full">
<div class="bg-dark-blue static-logo-blue">
<h2>Scalable Infrastructure</h2>
</div>
<div class="bg-dark-blue">
<h2 class="sm-header">Scalable Infrastructure</h2>
<p>
Fjord's infrastructure automatically scales up and down based on demand.
</p>
<img class="lazy" data-src="./assets/media/gifs/full_flow.gif"/>
</div>
</div>
</div>
<aside id="toc">
<!-- Case Study <br /><br /> -->
<ul>
<!-- Section 1 -->
<li data-section="section-1" class="selected">
<a href="#section-1">
<div>
<div class="bullet"><div></div></div>
<p>1. What is Fjord?</p>
</div>
</a>
</li>
<!-- Section 2 -->
<li data-section="section-2">
<a href="#section-2">
<div>
<div class="bullet"><div></div></div>
<p>2. Use Case</p>
</div>
</a>
</li>
<li data-section="section-2" class="subitem">
<a href="#section-2-1">
<div>
<div class="bullet"><div></div></div>
<p>Who Might Use Fjord?</p>
</div>
</a>
</li>
<li data-section="section-2" class="subitem">
<a href="#section-2-2">
<div>
<div class="bullet"><div></div></div>
<p>SuperEats Example</p>
</div>
</a>
</li>
<!-- Section 3 -->
<li data-section="section-3">
<a href="#section-3">
<div>
<div class="bullet"><div></div></div>
<p>3. What is Real-time?</p>
</div>
</a>
</li>
<li data-section="section-3" class="subitem">
<a href="#section-3-1">
<div>
<div class="bullet"><div></div></div>
<p>Defining Real-time</p>
</div>
</a>
</li>
<li data-section="section-3" class="subitem">
<a href="#section-3-2">
<div>
<div class="bullet"><div></div></div>
<p>Real-time Techniques</p>
</div>
</a>
</li>
<li data-section="section-3" class="subitem">
<a href="#section-3-3">
<div>
<div class="bullet"><div></div></div>
<p>Choosing a Real-time API</p>
</div>
</a>
</li>
<!-- Section 4 -->
<li data-section="section-4">
<a href="#section-4">
<div>
<div class="bullet"><div></div></div>
<p>4. What is Kafka?</p>
</div>
</a>
</li>
<li data-section="section-4" class="subitem">
<a href="#section-4-1">
<div>
<div class="bullet"><div></div></div>
<p>The evolution that led to Kafka</p>
</div>
</a>
</li>
<li data-section="section-4" class="subitem">
<a href="#section-4-2">
<div>
<div class="bullet"><div></div></div>
<p>The rise of Apache Kafka</p>
</div>
</a>
</li>
<li data-section="section-4" class="subitem">
<a href="#section-4-3">
<div>
<div class="bullet"><div></div></div>
<p>Kafka, Real-time, and SuperEats</p>
</div>
</a>
</li>
<!-- Section 5 -->
<li data-section="section-5">
<a href="#section-5">
<div>
<div class="bullet"><div></div></div>
<p>5. Why an API Proxy for Kafka?</p>
</div>
</a>
</li>
<li data-section="section-5" class="subitem">
<a href="#section-5-1">
<div>
<div class="bullet"><div></div></div>
<p>Protocol Interoperability Issue</p>
</div>
</a>
</li>
<li data-section="section-5" class="subitem">
<a href="#section-5-2">
<div>
<div class="bullet"><div></div></div>
<p>Using an API Proxy as Middleware</p>
</div>
</a>
</li>
<li data-section="section-5" class="subitem">
<a href="#section-5-3">
<div>
<div class="bullet"><div></div></div>
<p>Additional Benefits of an API Proxy</p>
</div>
</a>
</li>
<!-- Section 6 -->
<li data-section="section-6">
<a href="#section-6">
<div>
<div class="bullet"><div></div></div>
<p>6. Existing Solutions</p>
</div>
</a>
</li>
<li data-section="section-6" class="subitem">
<a href="#section-6-1">
<div>
<div class="bullet"><div></div></div>
<p>Paid Solutions</p>
</div>
</a>
</li>
<li data-section="section-6" class="subitem">
<a href="#section-6-2">
<div>
<div class="bullet"><div></div></div>
<p>DIY Approach</p>
</div>
</a>
</li>
<li data-section="section-6" class="subitem">
<a href="#section-6-3">
<div>
<div class="bullet"><div></div></div>
<p>Fjord: the only open source, full service solution</p>
</div>
</a>
</li>
<!-- Section 7 -->
<li data-section="section-7">
<a href="#section-7">
<div>
<div class="bullet"><div></div></div>
<p>7. Building Fjord</p>
</div>
</a>
</li>
<li data-section="section-7" class="subitem">
<a href="#section-7-1">
<div>
<div class="bullet"><div></div></div>
<p>Overview of Triangular Pattern</p>
</div>
</a>
</li>
<li data-section="section-7" class="subitem">
<a href="#section-7-2">
<div>
<div class="bullet"><div></div></div>
<p>Design Goals</p>
</div>
</a>
</li>
<li data-section="section-7" class="subitem">
<a href="#section-7-3">
<div>
<div class="bullet"><div></div></div>
<p>The Evolution of Fjord</p>
</div>
</a>
</li>
<li data-section="section-7" class="subitem">
<a href="#section-7-4">
<div>
<div class="bullet"><div></div></div>
<p>Fjord's Architecture</p>
</div>
</a>
</li>
<!-- Section 8 -->
<li data-section="section-8">
<a href="#section-8">
<div>
<div class="bullet"><div></div></div>
<p>8. Deploying &amp; Using Fjord</p>
</div>
</a>
</li>
<li data-section="section-8" class="subitem">
<a href="#section-8-1">
<div>
<div class="bullet"><div></div></div>
<p>The Fjord CLI</p>
</div>
</a>
</li>
<li data-section="section-8" class="subitem">
<a href="#section-8-2">
<div>
<div class="bullet"><div></div></div>
<p>Customizing your infrastructure components</p>
</div>
</a>
</li>
<li data-section="section-8" class="subitem">
<a href="#section-8-3">
<div>
<div class="bullet"><div></div></div>
<p>Deploying Fjord to AWS</p>
</div>
</a>
</li>
<li data-section="section-8" class="subitem">
<a href="#section-8-4">
<div>
<div class="bullet"><div></div></div>
<p>Client-side Integration</p>
</div>
</a>
</li>
<!-- Section 9 -->
<li data-section="section-9">
<a href="#section-9">
<div>
<div class="bullet"><div></div></div>
<p>9. Technical Challenges</p>
</div>
</a>
</li>
<li data-section="section-9" class="subitem">
<a href="#section-9-1">
<div>
<div class="bullet"><div></div></div>
<p>Auto-scaling and Load Balancing</p>
</div>
</a>
</li>
<li data-section="section-9" class="subitem">
<a href="#section-9-2">
<div>
<div class="bullet"><div></div></div>
<p>CORS Issue &amp; Heartbeat</p>
</div>
</a>
</li>
<li data-section="section-9" class="subitem">
<a href="#section-9-3">
<div>
<div class="bullet"><div></div></div>
<p>Load Testing Fjord</p>
</div>
</a>
</li>
<!-- Section 10 -->
<li data-section="section-10">
<a href="#section-10">
<div>
<div class="bullet"><div></div></div>
<p>10. Future Work</p>
</div>
</a>
</li>
<li data-section="section-10" class="subitem">
<a href="#section-10-1">
<div>
<div class="bullet"><div></div></div>
<p>More Advanced Security Features</p>
</div>
</a>
</li>
<li data-section="section-10" class="subitem">
<a href="#section-10-2">
<div>
<div class="bullet"><div></div></div>
<p>Bidirectional Communication</p>
</div>
</a>
</li>
</ul>
</aside>
<div id="case-study" class="main-section">
<div id="case-study-content">
<div class="prose">
<h1>Case Study</h1>
<!-- Section 1 -->
<h2 id="section-1">1. What is Fjord?</h2>
<p>
Fjord is an open-source framework that enables client-side streaming from Kafka in real-time.
</p>
<figure>
<img src="assets/media/images/vennDiagram.png" class="case-study-image" />
</figure>
<p>
Through Fjord’s CLI, developers can easily and quickly deploy dozens of components on Amazon Web Services (AWS) to offload the streaming responsibilities to Fjord’s scalable real-time API proxy infrastructure.
</p>
<video autoplay loop muted playsinline>
<source src="./assets/media/mp4/API-Proxy-white.mp4" type="video/mp4" />
Your browser does not support the HTML5 Video element.
</video>
<p>
This enables any number of authorized end users, whether on a browser or a mobile app, to receive data streams from any number of Kafka topics across any number of Kafka clusters.
</p>
<p>
In this case study, we outline the key challenges we faced as we worked with real-time APIs, event-driven architectures, and Kafka as an event streaming platform.
</p>
<!-- Section 2 -->
<h2 id="section-2" class="h2">2. Use Case &amp; SuperEats Example</h2>
<h3 id="section-2-1">2.1 Who Might Use Fjord?</h3>
<p>
Any company or organization that uses Kafka may need to expose one or more Kafka topics to a client interface, whether because this is a central or tangential aspect of their business, or even just as an additional layer of human observability into a particular data stream.
</p>
<p>
Building, hosting, and managing the infrastructure necessary to handle client-side
streaming from Kafka can be challenging and time-consuming. It is the kind of undifferentiated heavy lifting that is best outsourced to a third party.
</p>
<p>We see our target user as an organization that:</p>
<ol>
<li>Is <strong>large enough</strong> to already use Kafka for some aspect of its infrastructure, and</li>
<li>Wants to <strong>expose some stream of data</strong> (e.g. from a Kafka topic) in real-time to its employees, customers, suppliers, or any other type of stakeholder, including the public at large, or any combination of these.</li>
</ol>
<p>
Of course, this organization is cost-conscious and wants the most efficient and easy way to deploy the infrastructure needed to solve this problem, while still managing, owning, and closely handling all the data themselves.
</p>
<h3 id="section-2-2">2.2 SuperEats Example</h3>
<p>
Let’s imagine that you run a company called SuperEats. Your company provides a platform that lets customers order a meal from any registered restaurant, and lets independent contractors pick up the order from that restaurant and deliver it to the customer.
</p>
<figure>
<img src="assets/media/images/superEatsLogo.png" class="case-study-image" />
</figure>
<p>
SuperEats works with thousands of restaurants, from well-known multinational chains like Chipotle to small hole-in-the-wall shops like Cousin Vinny’s Hot Dogs. Customers place an order with their favorite restaurant, and that order then flows into SuperEats’ Kafka cluster.
</p>
<figure>
<img src="assets/media/gifs/supereats.gif" class="case-study-image" />
</figure>
<p>
You want your platform not only to send orders out to drivers, but also to give customers real-time updates at every step of the way. For example, customers should know when their meal has been made, when the driver has picked it up, when the driver is about five minutes away from their home, and finally when their meal is at their doorstep.
</p>
<p>
To accomplish this, SuperEats needs a real-time infrastructure that is secure, easy to deploy and use, and takes advantage of their existing infrastructure.
</p>
<figure>
<img src="assets/media/gifs/supereats_responds.gif" class="case-study-image" />
</figure>
<!-- Section 3 -->
<h2 id="section-3" class="h2">3. What is Real-time?</h2>
<h3 id="section-3-1">3.1 Defining Real-time</h3>
<p>
Ably, a leader in the real-time space, defines real-time thus:
</p>
<figure>
<blockquote>
<p>Real-time is the ability to react to anything that occurs as it occurs, before it loses its importance.</p>
</blockquote>
</figure>
<p>
This definition illustrates the main idea here: within the context of web applications, the goal of real-time streaming is to allow the recipient to get the information they need within sufficient time to adequately respond to it.
</p>
<figure>
<img src="assets/media/gifs/Events.gif" class="case-study-image" />
</figure>
<p>
Restaurants expect real-time updates on orders as soon as customers place them. Customers expect updates on their order status, and drivers are constantly on the lookout for nearby delivery requests.
If any of these groups can’t get updates in real-time, their user experience is degraded and the updates are no longer meaningful.
</p>
<h3 id="section-3-2">3.2 Real-time Techniques</h3>
<p>
There are different techniques developers could use to deliver real-time updates over the web.
</p>
<h4>3.2.1 Long polling</h4>
<p>
With long polling, the client sends an initial HTTP request to the server, and the server holds that request open until there’s a new piece of data to send back. Once the server gets a new update, it sends a response, which completes the request. The client must then send a <strong>new</strong> request to receive the next update from the server.
</p>
<figure>
<img src="assets/media/gifs/polling.gif" class="case-study-image" />
</figure>
<p>
This works well when new messages are produced infrequently. For example, the current outdoor temperature usually doesn't change significantly every quarter hour, or even every hour.
</p>
<p>
There are downsides to using long polling. Having the client send a new request and the server send a new response for every single new message not only adds more latency to the process, but also puts more work on both the client and the server.
</p>
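<p>
The request-per-update cycle described above can be sketched in a few lines of Python. This is a minimal in-process simulation, not Fjord code: the function names <span class="code-snippet">handle_long_poll</span> and <span class="code-snippet">poll_until</span>, the status codes, and the timeout value are assumptions chosen for illustration, and a queue stands in for the network.
</p>

```python
import queue


def handle_long_poll(updates, timeout_s):
    """Hypothetical server handler: hold the request open until an update
    arrives or the timeout expires, then complete the response."""
    try:
        return {"status": 200, "body": updates.get(timeout=timeout_s)}
    except queue.Empty:
        # Nothing new within the timeout; the client is expected to ask again.
        return {"status": 204, "body": None}


def poll_until(updates, wanted, timeout_s=0.5):
    """Hypothetical client loop: every update costs one full request.
    Note: this loops until `wanted` updates have arrived."""
    received, requests_sent = [], 0
    while len(received) != wanted:
        requests_sent += 1
        response = handle_long_poll(updates, timeout_s)
        if response["status"] == 200:
            received.append(response["body"])
    return received, requests_sent


if __name__ == "__main__":
    pending = queue.Queue()
    for i in range(3):
        pending.put(f"update {i}")
    print(poll_until(pending, wanted=3))
```

<p>
Even in this toy version, the number of requests grows at least one-for-one with the number of updates, which is the overhead the paragraph above describes.
</p>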
<h4>3.2.2 Server-Sent Events (SSE)</h4>
<p>
SSE (also known as EventSource) is a web API that enables a client to receive a continuous stream of data from a server.
</p>
<p>
The client sends an HTTP request through the EventSource Web API, which lets the server know this request is for a stream of data. The server then returns a never-ending HTTP response whose headers (notably <span class="code-snippet">Content-Type: text/event-stream</span>) indicate that the connection will stay open until explicitly closed. From then on, the server sends each new piece of data as it arrives, all in <strong>the same HTTP response</strong>.
</p>
<figure>
<img src="assets/media/gifs/sse.gif" class="case-study-image" />
</figure>
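<p>
To make the single long-lived response more concrete, here is a minimal Python sketch of what a server writes into it. The wire format (<span class="code-snippet">id:</span>, <span class="code-snippet">event:</span>, and <span class="code-snippet">data:</span> lines, ended by a blank line) comes from the SSE specification. The helper names <span class="code-snippet">format_sse</span> and <span class="code-snippet">stream</span>, and the choice of headers beyond <span class="code-snippet">Content-Type</span>, are assumptions for illustration and not necessarily what Fjord's server does.
</p>

```python
import json

# Content-Type is required for SSE; the other two headers are commonly
# added but are an assumption here, not something the source specifies.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
}


def format_sse(data, event=None, event_id=None):
    """Serialize one message in the SSE wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload becomes one "data:" line per payload line.
    for line in str(data).split("\n"):
        lines.append(f"data: {line}")
    # A blank line marks the end of the message.
    return "\n".join(lines) + "\n\n"


def stream(records):
    """Yield every record as a chunk of the same, never-ending response body."""
    for position, record in enumerate(records):
        yield format_sse(json.dumps(record), event_id=position)


if __name__ == "__main__":
    for chunk in stream([{"status": "picked_up"}, {"status": "delivered"}]):
        print(chunk, end="")
```

<p>
Setting an <span class="code-snippet">id</span> on each message matters for reconnection: when an EventSource client reconnects, it reports the last id it saw in a <span class="code-snippet">Last-Event-ID</span> request header, so a server can choose to resume from that point.
</p>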
<p>
Just like long polling, SSE is <strong>unidirectional</strong>. The data is only streaming from the server to the client, and not vice versa.
</p>
<p>
However, the SSE approach is much more efficient than long polling, as there’s no longer any need to open and close a new HTTP request and response for every message. All data is transmitted through one never-ending HTTP response.
</p>
<p>
SSE is therefore ideal for situations where the client is not regularly sending information to the server, but is instead receiving a constant stream from the server.
</p>
<h4>3.2.3 WebSockets</h4>
<p>
WebSockets is a protocol that allows a client and a server to repeatedly exchange data through a single TCP connection. This <strong>bidirectional</strong> protocol means that both the client and the server can send data to each other as long as the WebSocket connection remains open.
</p>
<figure>
<img src="assets/media/gifs/ws.gif" class="case-study-image" />
</figure>
<p>
The client first sends a normal HTTP request, but this request contains headers that ask for the connection to be upgraded to a WebSocket connection. The server then sends a response that opens the bidirectional communication line over the WebSockets protocol.
</p>
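<p>
The upgrade handshake described above can be illustrated with a short Python sketch of the server's side. The GUID constant and the way the <span class="code-snippet">Sec-WebSocket-Accept</span> value is derived come from RFC 6455. The function names are hypothetical, and a real server would also validate the request headers before answering.
</p>

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing the accept value.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key):
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key:
    SHA-1 of key + GUID, then base64."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(sec_websocket_key):
    """Build the 101 response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    # Sample key taken from RFC 6455.
    print(upgrade_response("dGhlIHNhbXBsZSBub25jZQ=="))
```

<p>
After this response, the connection no longer carries HTTP messages, which is why intermediaries that only understand HTTP may need extra configuration to pass WebSocket traffic.
</p>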
<p>
WebSockets are great in situations where both the server and the client need to frequently send data to each other, such as in online gaming and chat room applications. WebSockets are also seen as the de facto technology for real-time communication, which means that there’s a large community available for support, and many open source libraries available.
</p>
<h3 id="section-3-3">3.3 Choosing a Real-time API</h3>
<figure>
<img src="assets/media/gifs/triangular_pattern.gif" class="case-study-image" />
</figure>
<h4>3.3.1 Unidirectional vs. Bidirectional</h4>
<p>
SSE’s unidirectional limitation was not an impediment in our use case. Our goal was always to stream content from Kafka to end users, and not vice versa. For a company like SuperEats, having all customers and drivers be able to send a stream of data back to the company’s Kafka cluster may actually open up a new set of challenges that would require additional security and maintenance. That was an additional challenge not warranted by our use case.
</p>
<p>
It’s true that we could have still used WebSockets and just chosen to not implement any client-side push back to Kafka. But that would defeat the main purpose of using WebSockets in the first place.
</p>
<p>
There are also two useful SSE features absent with WebSockets that tipped the scale in favor of using SSE for Fjord: auto-reconnect, and native infrastructure compatibility.
</p>
<h4>3.3.2 Auto-reconnect</h4>
<p>
By default, SSE’s EventSource Web API automatically tries to reconnect the client to the server every time the client gets disconnected. Because our use case involved streaming to drivers on mobile devices that may need to switch cell towers when driving across town, this seemed like a useful feature.
</p>
<figure>
<img src="assets/media/gifs/sse_autoreconnect.gif" class="case-study-image" />
</figure>
<p>
The auto-reconnect feature also proved useful for auto-scaling. As we’ll explore later, clients connect to the URL of a load balancer rather than to an individual server. Whenever a server crashes or is shut down because of low activity, every client that was receiving data from it is automatically reconnected, through the load balancer, to another running server.
There was no need for us to configure any additional logic to handle that.
</p>
<p>
The auto-reconnect feature worked in perfect tandem with the load balancer to persist client-server connections.
</p>
<h4>3.3.3 Native Infrastructure Compatibility</h4>
<p>
Lastly, the main advantage of SSE is that it works over standard HTTP rather than a more specialized protocol like WebSockets. This means that SSE works right out of the box with common infrastructure components such as load balancers and proxies.
</p>
<p>
Configuring your infrastructure to work with WebSockets is of course possible, but it would require additional libraries and more time spent on configuration.
</p>
<p>
For all these reasons, we used SSE to stream records from our server to clients.
</p>
<figure>
<img src="assets/media/images/WS vs SSE reworked.svg" class="case-study-image" />
</figure>
<p>
Next, let’s understand why we decided to specifically build an API proxy for Kafka.
</p>
<!-- Section 4 -->
<h2 id="section-4" class="h2">4. What is Kafka?</h2>
<p>
In this section, we will explore Kafka's role as an event streaming platform, why it often serves as the backbone of an organization’s infrastructure, and finally why it's a great conveyor of real-time data.
</p>
<h3 id="section-4-1">4.1 The evolution that led to Kafka</h3>
<h4>4.1.1 EDA as Messaging Pattern for Microservices</h4>
<p>
An event-driven architecture (EDA) offers a paradigm that decouples the production and consumption of messages (or events) in order to facilitate inter-microservices communication.
</p>
<figure>
<img src="assets/media/gifs/producer_consumer.gif" class="case-study-image" />
</figure>
<p>
By adding a broker in between the microservices that generate events (“producers”) and the microservices that receive events (“consumers”), an EDA allows microservices to communicate with each other without even being aware of each other’s existence.
</p>
<p>
The three main architectural pieces of an EDA are therefore:
</p>
<ol>
<li><strong>Producers</strong> that generate and send events to a broker,
</li>
<li>The routing of events through a <strong>Broker</strong> that acts as a middleware, and
</li>
<li><strong>Consumers</strong> that have access to any data they need from the broker.
</li>
</ol>
<p>
At the core of EDAs are <strong>events</strong>. An event is any significant occurrence or change in state for a distributed system. An event contains both a payload describing the systemic change or action that occurred and a timestamp of when it occurred.
</p>
<figure>
<img src="assets/media/images/json.png" class="case-study-image" />
</figure>
<p>
Producers create events, send them to the broker, and then move on with their own business logic, completely unaware of what happens to each event afterward.
</p>
<figure>
<img src="assets/media/images/edaServices.png" class="case-study-image" />
</figure>
<p>
All consumers that are interested in this particular event can then read it from the broker. For example, the inventory, billing, and delivery services could all react to the same order event. Events are immutable (i.e. they cannot be edited), but they may expire or be deleted.
</p>
<h4>4.1.2 Limitations of Traditional EDAs</h4>
<p>
Traditional EDAs simplify communication between microservices and are typically based on a message queue model. However, they also present a new set of challenges.
</p>
<p>
In the traditional EDA model, the broker wears many hats. It has to:
</p>
<ol>
<li>Push events to each appropriate consumer interested in that event,
</li>
<li>Keep track of which event each consumer last consumed (the “offset” or index), for every consumer, and
</li>
<li>Delete each event as soon as it is read by the appropriate consumer.
</li>
</ol>
<p>
The broker’s workload therefore grows in proportion to the number of consumers and events it must service. More events and more consumers mean more work for the broker and more time needed to process the events.
</p>
<h4>4.1.3 Event Streaming</h4>
<p>
To handle an extremely high flow of events, event streaming platforms were born. They are still considered a subset of EDAs because they have the same three components of producers, consumers, and a middleware broker. However, they are designed to handle a higher velocity of events than traditional EDAs.
</p>
<figure>
<img src="assets/media/images/eda vs event streaming.png" class="case-study-image" />
</figure>
<p>
The main paradigm shift and key differentiating factor behind event streaming platforms is that the broker actually does <em>less</em> work than it does in the traditional EDA model.
</p>
<p>
Instead of the broker pushing events to all consumers, each consumer is responsible for pulling each record from the broker. The broker similarly does not have to worry about keeping track of the offset of each consumer, since each consumer tracks its own offset.
</p>
<p>
Finally, the broker does not have to delete events. While traditional EDA technologies use a <strong>queue-based</strong> structure, where events are deleted after they’re consumed, event streaming platforms use a <strong>log-based</strong> structure to durably store events.
</p>
<figure>
<img src="assets/media/gifs/kafka.gif" class="case-study-image" />
</figure>
<p>
With event streaming, newly added consumers can not only pick up newly received events, but they can also stream records from the very beginning of the log. This is, of course, only possible because events are not deleted after they are read.
</p>
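<p>
The log-based model can be sketched in Python as follows. This is a simplified, single-process illustration under our own assumptions (the class names <span class="code-snippet">EventLog</span> and <span class="code-snippet">Consumer</span> are hypothetical), not a description of Kafka's internals: the broker only appends and serves records, while each consumer remembers its own offset.
</p>

```python
class EventLog:
    """Append-only log: the broker stores records and never tracks readers."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records starting at offset; nothing is deleted."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer pulls from the log and keeps its own offset."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # the consumer, not the broker, advances the offset
        return batch


if __name__ == "__main__":
    log = EventLog()
    for status in ["ordered", "prepared", "picked_up"]:
        log.append(status)
    early = Consumer(log)
    print(early.poll())   # reads everything written so far
    log.append("delivered")
    late = Consumer(log)  # a consumer added later still starts at offset 0
    print(late.poll())
    print(early.poll())   # the first consumer only sees the new record
```

<p>
Because reading never removes anything, adding the second consumer costs the log no extra bookkeeping, which is the property the paragraph above relies on.
</p>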
<p>
For an event streaming platform to scale, it matters where the bulk of the integration logic lives. With traditional EDA technologies, the broker holds most of it. Consumers just receive whatever is sent to them by the broker.
This is the <em>“Smart Broker, Dumb Consumer”</em> approach.
</p>
<p>
In contrast, event streaming platforms opt for the <em>“Dumb Broker, Smart Consumer”</em> approach, placing more integration logic on each consumer instead. This enables a high volume and velocity of events, because the amount of work the broker needs to do depends far less on the number of events and consumers.
</p>
<p>
In other words, increasing either the pace of events that are entering the system, or the number of consumers reading from the
broker, has a much less noticeable impact on the additional work the broker needs to do.
</p>
<p>
This finally leads us to our next point, the gold standard in event-streaming platforms: Apache Kafka.
</p>
<h3 id="section-4-2">4.2 The rise of Apache Kafka</h3>
<p>
If data is the lifeblood of an organization, then Apache Kafka is like the organization’s circulatory system. Kafka offers a powerful, scalable, efficient, and redundant infrastructure that allows your distributed services to communicate with each other in real-time.
</p>
<figure>
<img src="assets/media/images/kafka-logo.png" class="case-study-image" />
</figure>
<p>
The technology was created at LinkedIn out of a need to track vast numbers of site events like page views and user actions, as well as to aggregate large quantities of logs from disparate sources within its distributed architecture. It became an open-source project of the Apache Software Foundation in 2011.
</p>
<p>
Kafka was designed to manage machine-to-machine communication. Kafka’s custom <strong>binary protocol</strong> over TCP is built to take advantage of advanced TCP features (e.g., the ability to multiplex requests and the ability to simultaneously poll many connections).
</p>
<p>
Kafka is optimized to handle extremely high throughput of messages. Back in 2019, LinkedIn was already processing <strong>7 trillion</strong> messages per day on their Kafka clusters. This is probably much higher today.
</p>
<p>
Today, over <strong>80% of all Fortune 100</strong> companies use Kafka.
</p>
<figure>
<img src="assets/media/images/brandsUsingKafka.png" class="case-study-image" />
</figure>
<h3 id="section-4-3">4.3 Kafka, Real-time, and SuperEats</h3>
<p>
A company like SuperEats can use Kafka to set up an infrastructure that can scale as the business grows, without worrying about performance issues down the line.
</p>
<figure>
<img src="assets/media/gifs/supereats.gif" class="case-study-image" />
</figure>
<p>
Kafka’s high throughput capacity makes it an ideal candidate to serve real-time events to end users. Kafka is also able to segregate streams of data into any number of topics, which themselves are further divided into multiple partitions. Each partition can have replicas on different brokers, which ensures redundancy in case a broker fails.
</p>
<p>
Kafka provides yet another useful component not found in all event streaming platforms: a <strong>key</strong>. Like typical EDA events, Kafka records have both a payload and a timestamp. However, the payload of a Kafka record is itself composed of two components: a key and a value. The record <strong>value</strong> holds the actual business data. The record <strong>key</strong> is a useful tool not only for further segregating data, but also for ensuring that related records are delivered in order.
</p>
<p>
SuperEats needs a way to segregate data streams by activity type (e.g., incoming orders vs. driver GPS positions), by restaurant, and by customer or driver. For example, SuperEats could have one topic for <span class="code-snippet">order</span> information and another for <span class="code-snippet">driver GPS</span> information.
</p>
<p>
Within the <span class="code-snippet">order</span> topic, the <strong>key</strong> of all records should probably be some type of concatenation of three components: the restaurant’s, the customer’s, and the order’s unique identifiers. Because Kafka, by default, writes all records with the same key to the same partition, this would ensure that all the information related to order 2235407 for Jane4022354 from restaurant 72544986 would be stored on the same partition and read in order.
</p>
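<p>
As a rough illustration of why a shared key keeps an order’s records together, the sketch below builds a composite key and maps it to a partition. The key format, the partition count, and the helper names are assumptions made for this example. Kafka’s default partitioner hashes the key with murmur2; CRC32 is used here only as a readily available stand-in, since the point is that the mapping is deterministic.
</p>

```python
import zlib

NUM_PARTITIONS = 6  # assumed partition count for the order topic


def make_order_key(restaurant_id, customer_id, order_id):
    """Concatenate the three identifiers into one record key (assumed format)."""
    return f"{restaurant_id}:{customer_id}:{order_id}"


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    # What matters is that the same key always yields the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


key = make_order_key("72544986", "Jane4022354", "2235407")
updates = ["placed", "accepted", "picked up", "delivered"]

# Every status update for this order carries the same key,
# so every update is assigned to the same partition.
partitions = {partition_for(key) for _ in updates}
print(key, "->", partitions)
```

<p>
Since ordering is only guaranteed within a partition, routing every update for an order through one key is what lets a consumer read those updates in the order they were written.
</p>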
<p>
The <span class="code-snippet">order</span> topic’s record <strong>value</strong> would itself contain information regarding the status of the order, the contents of the order, the restaurant's address, the customer's address, etc.
</p>
<p>
Within the <span class="code-snippet">driver GPS</span> topic, the <strong>key</strong> would probably be the driver’s unique identifier. The <strong>value</strong> would contain the actual GPS data.
</p>
<p>
In the next section, we’ll dive deeper into why you need an API proxy if you want to allow clients to stream from a Kafka cluster.
</p>
<!-- Section 5 -->
<h2 id="section-5" class="h2">5. Why Do You Need an API Proxy for Kafka?</h2>
<h3 id="section-5-1">5.1 Protocol Interoperability Issue</h3>
<p>
Kafka is a robust event streaming platform, well suited to handling high volumes of events flowing between a relatively manageable number of producers and consumers. It uses a custom binary protocol designed to facilitate machine-to-machine communication.
</p>
<figure>
<img src="assets/media/gifs/transport_protocol.gif" class="case-study-image" />
</figure>
<p>
However, internet-facing end users connect from mobile phones, laptops, tablets, and desktops over HTTP/S.
As we saw earlier, our choice here is to leverage SSE, which lets a client receive a continuous stream of data without any additional effort from the end user. There is a <strong>clear mismatch</strong> between Kafka’s protocol and the protocol end users rely on to consume streams of data.
</p>
<h3 id="section-5-2">5.2 Using an API Proxy as Middleware</h3>
<p>
An API proxy can be designed to tackle the challenges of moving Kafka data online for public consumption. An API proxy is generally a server that sits between your web application and a backend service. Developers can build web applications using the proxy’s API endpoints without knowing anything about the backend.
</p>
<figure>
<img src="assets/media/gifs/api proxy.gif" class="case-study-image" />
</figure>
<p>
By positioning an API proxy between Kafka and the web, the API proxy can pull data from Kafka over Kafka’s binary protocol, and can push that data to connected devices in real time via SSE, which is delivered over HTTP.
</p>
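<p>
The push side of that translation can be sketched as follows. This is a simplified illustration rather than Fjord’s actual code: the function names and the sample records are hypothetical, and the Kafka consumer is replaced by a plain list. It shows how a pulled record could be serialized into the <span class="code-snippet">text/event-stream</span> format that SSE clients expect, where each event is a set of <span class="code-snippet">field: value</span> lines terminated by a blank line.
</p>

```python
import json


def to_sse_frame(topic, offset, value):
    """Serialize one record as a server-sent event (text/event-stream) frame."""
    payload = json.dumps(value)
    # json.dumps emits no raw newlines by default, so one data: line is enough.
    return f"id: {offset}\nevent: {topic}\ndata: {payload}\n\n"


def stream(records):
    """Generator that a web framework could hand to a streaming HTTP response."""
    for topic, offset, value in records:
        yield to_sse_frame(topic, offset, value)


# Stand-in for records pulled from Kafka: (topic, offset, value) tuples.
pulled = [
    ("order", 0, {"order_id": "2235407", "status": "accepted"}),
    ("order", 1, {"order_id": "2235407", "status": "picked up"}),
]

for frame in stream(pulled):
    print(frame, end="")
```

<p>
In a browser, an <span class="code-snippet">EventSource</span> would receive these frames as events named <span class="code-snippet">order</span>, and the <span class="code-snippet">id</span> field lets a reconnecting client report the last event it saw.
</p>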
<h3 id="section-5-3">5.3 Additional Benefits of an API Proxy</h3>
<p>
An API Proxy not only makes client-side streaming from Kafka possible, but it also provides some additional benefits.
</p>
<div>
<ul class="chart">
<li>
<img src="assets/media/images/security_layer.png" class="icon" />
Letting end users connect directly to Kafka would present some significant security risks. Using a proxy provides an additional layer of security in between end users and your Kafka cluster.
</li>
<li>
<img src="assets/media/images/external_fanout.png" class="icon" />
An API proxy allows you to fan out Kafka records to thousands of end users over HTTP, significantly reducing the number of direct connections to your Kafka cluster. Using a proxy also allows you to dynamically scale the servers up and down to respond to external user traffic.
</li>
<li>
<img src="assets/media/images/customizable_api.png" class="icon" />
Having an additional layer between Kafka and end users enables more customization, so you can group various Kafka topics from different Kafka clusters into any number of API endpoints with any custom names you want.
</li>
<li>
<img src="assets/media/images/paas.png" class="icon" />
This additional messaging layer allows you to offload the resource-intensive task of real-time streaming to an external infrastructure, so you can focus on value creation activities that are central to your business.
</li>
</ul>
</div>
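<p>
The fan-out and custom endpoint benefits listed above can be sketched together in a few lines. This is an assumption-laden illustration, not Fjord’s implementation: the <span class="code-snippet">FanOut</span> class, the endpoint names, and the mapping are all hypothetical, and in-memory queues stand in for open SSE connections.
</p>

```python
from collections import defaultdict
from queue import Queue

# Hypothetical mapping from public API endpoint names to Kafka topics.
# Topic names follow the text; real Kafka topic names cannot contain spaces.
ENDPOINT_TOPICS = {
    "orders": ["order"],
    "tracking": ["order", "driver GPS"],
}


class FanOut:
    """Copies each upstream record to every client subscribed to an endpoint."""

    def __init__(self, endpoint_topics):
        self.endpoint_topics = endpoint_topics
        self.clients = defaultdict(list)  # endpoint name -> client queues

    def subscribe(self, endpoint):
        if endpoint not in self.endpoint_topics:
            raise KeyError(f"unknown endpoint: {endpoint}")
        q = Queue()
        self.clients[endpoint].append(q)
        return q

    def publish(self, topic, record):
        """Called once per record pulled from Kafka; returns deliveries made."""
        delivered = 0
        for endpoint, topics in self.endpoint_topics.items():
            if topic in topics:
                for q in self.clients[endpoint]:
                    q.put(record)
                    delivered += 1
        return delivered


proxy = FanOut(ENDPOINT_TOPICS)
customers = [proxy.subscribe("tracking") for _ in range(1000)]
restaurant = proxy.subscribe("orders")

# One record read from Kafka reaches every interested client.
print(proxy.publish("driver GPS", {"lat": 45.5, "lon": -73.6}))  # 1000
print(proxy.publish("order", {"status": "accepted"}))            # 1001
```

<p>
In this sketch the Kafka cluster would see a single consumer (the proxy) regardless of how many end users are subscribed, which is the connection-reducing effect described above.
</p>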
<p>
What are some existing solutions for this API proxy? Let’s look at that next.
</p>
<!-- Section 6 -->
<h2 id="section-6" class="h2">6. Existing Solutions</h2>
<h3 id="section-6-1">6.1 Paid Solutions</h3>
<p>
If SuperEats wanted to use an existing, paid solution, what options would there be?
</p>
<figure>
<img src="assets/media/images/existingSolutions.png" class="case-study-image" />
</figure>
<p>
Ably, PubNub and MigratoryData offer specialized services for Kafka. All of these companies offer feature-rich, highly scalable solutions. The main downside of these services is that you pay a high price for the convenience and ease of use, and you’re locked into their ecosystems.
</p>
<h3 id="section-6-2">6.2 DIY Approach</h3>
<p>
If you wish to go the Do-It-Yourself (DIY) route, you will find many open-source platforms and services to help you accomplish this goal.
</p>
<figure>
<img src="assets/media/images/DIY.png" class="case-study-image" />
</figure>
<p>The downside to this approach is the time, energy and expertise it requires to connect everything together. Installing and connecting all the different components is a challenge in itself, not to mention deploying and maintaining this infrastructure.
</p>
<p>
If you want to add scalability into the mix, you have to understand how these components handle load and how to effectively tune them to meet your use case. This is not always a realistic approach for small to medium-sized companies.
</p>
<h3 id="section-6-3">6.3 Fjord: the only open source, full-service solution</h3>
<p>Fjord positions itself in between the paid services and the DIY approach.
</p>
<figure>
<img src="assets/media/images/solutions_chart.png" class="case-study-image" />
</figure>
<p>
Fjord is a real-time API Proxy for Kafka.
</p>
<ul>
<li>
It is open-source, scalable, and simple to deploy to AWS.
</li>
<li>
Fjord lets you have full and exclusive ownership of the data coming in and out of your Kafka clusters.
</li>
<li>
We offer a minimal feature set and a simple developer experience so that you can easily add any additional features you want.
</li>
</ul>
<p>As far as we know, Fjord is the only open-source platform that incorporates all the infrastructure pieces you need to deploy an API proxy for Kafka right out of the box.
</p>
<!-- Section 7 -->
<h2 id="section-7" class="h2">7. Building Fjord</h2>
<h3 id="section-7-1">7.1 Overview of Triangular Pattern</h3>
<p>Let’s first look at a high-level view of how an organization like SuperEats would integrate Fjord into their infrastructure.</p>
<figure>
<img src="assets/media/gifs/triangular_pattern.gif" class="case-study-image" />
</figure>
<p>
First, SuperEats drivers, customers, and any restaurants working with SuperEats would all connect to the SuperEats web servers on a mobile app or a browser via HTTP/S (e.g., to SuperEats.com).
</p>
<figure>
<img src="assets/media/gifs/end_users_superEats.gif" class="case-study-image" />
</figure>
<p>
These servers would deliver, via either a mobile app or a browser, a page that initiates an SSE connection with (i.e. receives push updates from) the Fjord cluster deployed on SuperEats’ AWS account.
</p>
<figure>
<img src="assets/media/gifs/supereats_responds_sse.gif" class="case-study-image" />
</figure>
<p>
Fjord, in turn, would pull records from SuperEats’ Kafka cluster, and deliver them to connected clients who are interested in particular API topics.
</p>
<figure>
<img src="assets/media/gifs/pull_and_deliver.gif" class="case-study-image" />
</figure>
<p>
It’s important to note that SuperEats drivers and customers are not aware that they are also receiving a data stream from Fjord. From their perspective, all they see is that they’re connected to the SuperEats domain or app.
</p>
<p>
This allows SuperEats to deliver customized content that has the look and feel of their own website, without having to deal with the additional streaming load on their servers.
</p>
<p>This structure is commonly referred to as the triangular pattern.</p>
<h3 id="section-7-2">7.2 Design Goals</h3>
<p>We designed Fjord with four main goals in mind:</p>
<ul>
<li><strong>API Proxy</strong>: open Kafka topics to client-side streaming,</li>
<li><strong>Security</strong>: enable Fjord business users to restrict access to their Kafka stream through security parameters,</li>
<li><strong>Scalability</strong>: create a scalable platform-as-a-service (PaaS) infrastructure, and</li>
<li><strong>Ease of deployment</strong>: make deploying Fjord super simple.</li>
</ul>
<p>
In the next section, we’ll walk through the evolution of how we built Fjord to achieve these design goals, and address some of the key decisions we made along the way.
</p>
<h3 id="section-7-3">7.3 The Evolution of Fjord</h3>
<p>