docs/core-concepts/architecture-and-concepts.md
@@ -3,4 +3,42 @@ sidebar_label: Architecture and Concepts
3
3
sidebar_position: 3
4
4
---
5
5
6
+
6
7
# Architecture and Concepts
8
+
9
+
## Architecture
10
+
11
+
Fluid is built in a Kubernetes-native fashion. It sits between the underlying cloud native storage systems and the data-intensive applications that run on top of them. The architecture of Fluid in Kubernetes is as follows:
12
+
13
+

14
+
Specifically, Fluid is logically split into a data plane and a control plane.
15
+
16
+
- The control plane is composed of the **Dataset/Runtime Controller** and the **Application Manager**:
17
+
  - **Dataset/Runtime Controller**: It manages datasets and automates data operations on them, such as data loading, data migration, and data processing.
18
+
  - **Application Manager**: It is responsible for scheduling workload pods according to cache location and for managing their life cycles.
19
+
- The data plane is composed of the **Runtime Plugin** and the **Data Access Plugin**:
20
+
  - **Runtime Plugin**: A highly extensible plugin that builds on Fluid's common framework to turn various data cache engines into self-managing, self-scaling, self-healing, and observable cache services inside Kubernetes.
21
+
  - **Data Access Plugin**: It manages different kinds of storage clients, all run as containers, in a uniform manner. It supports both CSI Plugin mode and sidecar mode for running FUSE containers.
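
The idea of scheduling by cache location can be illustrated with a toy example. The sketch below is not Fluid's scheduler: the function `pick_node`, the node names, and the per-node cache sizes are all hypothetical, and a real scheduler weighs many more factors. It only shows the basic preference for the node that already holds the most cached data.

```python
def pick_node(nodes, cached_bytes):
    """Return the candidate node that holds the most cached data.

    Hypothetical helper for illustration only. Nodes missing from
    `cached_bytes` count as holding no cache; on a tie the first
    candidate in `nodes` wins.
    """
    return max(nodes, key=lambda node: cached_bytes.get(node, 0))


# Hypothetical cluster state: bytes of the dataset cached on each node.
nodes = ["node-a", "node-b", "node-c"]
cached = {"node-a": 0, "node-b": 512, "node-c": 128}

print(pick_node(nodes, cached))
```

Placing the pod next to its cached data avoids a remote read, which is the motivation for making the scheduler cache-aware.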
22
+
23
+
The following diagram shows the different components.
24
+
25
+

26
+
27
+
## Key Concepts
28
+
29
+
To achieve its goals, Fluid introduces the following core concepts.
30
+
31
+
**Dataset**: A Dataset is a set of logically related data that can be used by computing engines, such as Spark for big data analytics and TensorFlow for AI applications.
32
+
* Datasets are defined in the same way as native Kubernetes API objects, using CRDs
33
+
* Users describe the data’s source, type, access mode and cache location
34
+
* Users can use observability data to make scaling decisions for the distributed cache
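
As a rough illustration of how a user might describe a Dataset, the sketch below builds a Dataset-style manifest as a plain Python dictionary. The API group `data.fluid.io/v1alpha1` and the field names `mounts`, `mountPoint`, and `accessModes` are assumptions here and should be checked against the Fluid CRD reference. The helper `make_dataset`, the dataset name, and the bucket URL are hypothetical.

```python
import json


def make_dataset(name, mount_point, access_mode="ReadOnlyMany"):
    """Build a Dataset-style manifest as a dict (illustrative sketch only)."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",  # assumed API group/version
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            # Assumed field names: where the data lives and how it is accessed.
            "mounts": [{"name": name, "mountPoint": mount_point}],
            "accessModes": [access_mode],
        },
    }


# Hypothetical dataset backed by an object storage bucket.
dataset = make_dataset("demo", "s3://example-bucket/training-data")
print(json.dumps(dataset, indent=2))
```

Because the manifest has the same shape as any other Kubernetes object, it could be serialized to YAML and applied with the usual tooling.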
35
+
36
+
**Runtime**: The Runtime enforces dataset isolation and sharing, provides version management, and enables data acceleration. It does so by defining a set of interfaces that handle Datasets throughout their lifecycle, so that management and acceleration functionality can be implemented behind these interfaces. Fluid has two kinds of Runtime: CacheRuntime and ThinRuntime.
37
+
* CacheRuntime, which implements distributed caching solutions including Alluxio, JuiceFS, Vineyard, and others.
38
+
* ThinRuntime, which provides a unified access interface to systems like CubeFS, GlusterFS, NFS, and others.
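
The interface-based design described above can be sketched as follows. This is a minimal illustration, not Fluid's actual interface: the class names `Runtime`, `CacheRuntimeSketch`, and `ThinRuntimeSketch`, the methods `setup` and `teardown`, and the helper `bind` are all hypothetical. The point is only that the caller depends on the lifecycle interface rather than on a specific engine.

```python
from abc import ABC, abstractmethod


class Runtime(ABC):
    """Hypothetical lifecycle interface; not Fluid's real interface."""

    @abstractmethod
    def setup(self, dataset: str) -> str:
        """Prepare the runtime for the given dataset."""

    @abstractmethod
    def teardown(self, dataset: str) -> str:
        """Release whatever setup() created for the dataset."""


class CacheRuntimeSketch(Runtime):
    """Stands in for a runtime backed by a distributed cache engine."""

    def __init__(self, engine: str):
        self.engine = engine

    def setup(self, dataset: str) -> str:
        return f"start {self.engine} cache workers for {dataset}"

    def teardown(self, dataset: str) -> str:
        return f"stop {self.engine} cache workers for {dataset}"


class ThinRuntimeSketch(Runtime):
    """Stands in for a runtime that only exposes a storage client."""

    def __init__(self, backend: str):
        self.backend = backend

    def setup(self, dataset: str) -> str:
        return f"mount {self.backend} client for {dataset}"

    def teardown(self, dataset: str) -> str:
        return f"unmount {self.backend} client for {dataset}"


def bind(runtime: Runtime, dataset: str) -> str:
    # The caller sees only the interface, never the concrete engine.
    return runtime.setup(dataset)


print(bind(CacheRuntimeSketch("Alluxio"), "demo"))
print(bind(ThinRuntimeSketch("NFS"), "demo"))
```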
39
+
40
+
**Data Operations**: Unlike the traditional PVC-based storage abstraction, Fluid takes an application-oriented perspective and abstracts the “process of manipulating data on Kubernetes”. It introduces the concept of an elastic Dataset and implements it as a first-class citizen in Kubernetes to enable Dataset CRUD operations, permission control, and data access acceleration. Besides basic operations such as creation, Fluid also provides a set of operations on a defined Dataset that let users manipulate the data flow.
41
+
* Data Load prefetches data from the dataset source into the cache system.
42
+
* Data Migration syncs data between external storage and the dataset.
43
+
* Data Process can be used to transform data, split it, or apply dimensionality reduction to it.
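
To make the idea of a data operation concrete, the sketch below builds a Data Load request as a Kubernetes-style manifest in a plain Python dictionary. The kind `DataLoad`, the API group `data.fluid.io/v1alpha1`, and the `dataset` and `target` fields are assumptions to verify against the Fluid API reference. The helper `make_dataload` and all names and paths are hypothetical.

```python
import json


def make_dataload(name, dataset_name, namespace="default", path="/"):
    """Build a DataLoad-style manifest as a dict (illustrative sketch only)."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",  # assumed API group/version
        "kind": "DataLoad",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumed fields: which Dataset to load and which path to prefetch.
            "dataset": {"name": dataset_name, "namespace": namespace},
            "target": [{"path": path}],
        },
    }


# Hypothetical request: prefetch one directory of the "demo" dataset.
dataload = make_dataload("demo-load", "demo", path="/train")
print(json.dumps(dataload, indent=2))
```

Expressing the operation as its own object, separate from the Dataset, means it can be created, observed, and deleted like any other Kubernetes resource.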