Commit 471d712

Merge pull request #24 from RongGu/br-pr6
update architecture-and-concepts doc page
2 parents a97caf8 + 60e06e6 commit 471d712

File tree

4 files changed

+38
-0
lines changed

docs/core-concepts/architecture-and-concepts.md

+38
@@ -3,4 +3,42 @@ sidebar_label: Architecture and Concepts
sidebar_position: 3
---

# Architecture and Concepts

## Architecture

Fluid is built in a Kubernetes-native fashion. It sits between the underlying cloud-native storage systems and the data-intensive applications that run on top of them. The architecture of Fluid in Kubernetes is as follows:

![](../static/img/docs/architecture.png)

Specifically, Fluid is logically split into a data plane and a control plane.

- The control plane is composed of the **Dataset/Runtime Controller** and the **Application Manager**:
  - **Dataset/Runtime Controller**: It manages datasets and automates data operations on them, such as data loading, data migration, and data processing.
  - **Application Manager**: It schedules workload pods according to cache location and manages their life cycles.
- The data plane is composed of the **Runtime Plugin** and the **Data Access Plugin**:
  - **Runtime Plugin**: A highly extensible plugin that uses Fluid's common framework to turn various data cache engines into self-managing, self-scaling, self-healing, and observable cache services inside Kubernetes.
  - **Data Access Plugin**: It manages different kinds of storage clients, running in containers, in a uniform manner. It supports running FUSE containers through either the CSI Plugin or sidecar mode.
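
To make the Application Manager's cache-aware scheduling concrete, here is a minimal sketch in Python. It is a toy model, not Fluid's scheduler: the `Node` type, the `rank_nodes` function, and the node and dataset names are all hypothetical. It only illustrates the idea of preferring nodes that already hold a dataset's cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical view of a cluster node (not a Fluid or Kubernetes type)."""
    name: str
    cached_datasets: frozenset  # names of datasets with cache on this node


def rank_nodes(nodes, dataset):
    """Order candidate nodes so that nodes caching `dataset` come first.

    Ties are broken by node name so the result is deterministic.
    """
    return sorted(nodes, key=lambda n: (dataset not in n.cached_datasets, n.name))


nodes = [
    Node("node-a", frozenset()),
    Node("node-b", frozenset({"imagenet"})),
    Node("node-c", frozenset({"imagenet", "logs"})),
]

ranked = rank_nodes(nodes, "imagenet")
print([n.name for n in ranked])  # nodes holding the cache are listed first
```

A real scheduler would weigh cache locality against other constraints such as free resources; this sketch ranks on locality alone.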

The following diagram shows the different components.

![](../static/img/docs/componnents.png)

## Key Concepts

To achieve its goals, Fluid provides the following core concepts.

**Dataset**: A Dataset is a set of logically related data that can be used by computing engines, such as Spark for big data analytics and TensorFlow for AI applications.
* Defined in the same style as native Kubernetes API definitions, including CRDs
* Users describe the data’s source, type, access mode and cache location
* Users can use observability data to make scaling decisions for the distributed cache
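
As an illustration of how a user might describe a Dataset, the sketch below builds a manifest as a plain Python dictionary. The API group/version `data.fluid.io/v1alpha1` and the `mounts`, `mountPoint`, and `accessModes` field names are assumptions based on Fluid's publicly available samples, so check them against the CRD installed in your cluster. The dataset name and the source URL are placeholders.

```python
import json

# Assumption: group/version and field names follow Fluid's public samples.
# The name "demo" and the example.com URL are placeholders.
dataset = {
    "apiVersion": "data.fluid.io/v1alpha1",
    "kind": "Dataset",
    "metadata": {"name": "demo"},
    "spec": {
        # Where the data comes from (the data source).
        "mounts": [
            {"mountPoint": "https://example.com/data/", "name": "demo"}
        ],
        # How applications may access the data.
        "accessModes": ["ReadOnlyMany"],
    },
}

# JSON is accepted wherever a Kubernetes manifest is expected.
print(json.dumps(dataset, indent=2))
```

The printed JSON could then be applied with any Kubernetes client, in the same way as a YAML manifest.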

**Runtime**: The Runtime enforces dataset isolation and sharing, provides version management, and enables data acceleration. It does so by defining a set of interfaces that handle Datasets throughout their lifecycle; management and acceleration functionality is implemented behind these interfaces. Fluid has two kinds of Runtime: CacheRuntime and ThinRuntime.
* CacheRuntime implements distributed caching solutions, including Alluxio, JuiceFS, Vineyard and others.
* ThinRuntime provides a unified access interface to systems such as CubeFS, GlusterFS, NFS and others.
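
The idea of putting different engines behind one set of lifecycle interfaces can be sketched as follows. This is a hypothetical Python model, not Fluid's actual interface: the `Runtime` base class, the `setup` and `teardown` methods, and the returned strings are invented for illustration.

```python
from abc import ABC, abstractmethod


class Runtime(ABC):
    """Hypothetical lifecycle interface for a dataset (not Fluid's real API)."""

    @abstractmethod
    def setup(self, dataset: str) -> str:
        """Prepare access to the dataset."""

    @abstractmethod
    def teardown(self, dataset: str) -> str:
        """Release whatever setup() created."""


class CacheRuntime(Runtime):
    """Models a runtime backed by a distributed cache engine."""

    def __init__(self, engine: str):
        self.engine = engine

    def setup(self, dataset: str) -> str:
        return f"start {self.engine} cache for {dataset}"

    def teardown(self, dataset: str) -> str:
        return f"stop {self.engine} cache for {dataset}"


class ThinRuntime(Runtime):
    """Models a runtime that only exposes a storage system, with no cache."""

    def __init__(self, backend: str):
        self.backend = backend

    def setup(self, dataset: str) -> str:
        return f"mount {self.backend} for {dataset}"

    def teardown(self, dataset: str) -> str:
        return f"unmount {self.backend} for {dataset}"


# Callers depend only on the Runtime interface, not on the engine behind it.
for runtime in (CacheRuntime("Alluxio"), ThinRuntime("NFS")):
    print(runtime.setup("demo"))
```

Because callers see only `Runtime`, a new engine can be added by writing one more subclass.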

**Data Operations**: Unlike the traditional PVC-based storage abstraction, Fluid takes an application-oriented perspective and abstracts the “process of manipulating data on Kubernetes”. It introduces the concept of an elastic Dataset and implements it as a first-class citizen in Kubernetes to enable Dataset CRUD operations, permission control, and data access acceleration. Besides basic operations such as creation, Fluid provides a set of operations on a defined Dataset that let users manipulate the data flow.
* Data Load prefetches data from the dataset source into the cache system.
* Data Migration syncs data between external storage and the dataset.
* Data Process can be used to transform, split, or apply dimensionality reduction to data.
* Distributed cache scaling scales the cache up and down.
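
A data operation is itself declared as a Kubernetes object that points at a Dataset. The sketch below builds a Data Load request as a Python dictionary. The `DataLoad` kind and the `spec.dataset` reference are assumptions based on Fluid's public samples, and the helper `data_load_manifest` and all names passed to it are hypothetical.

```python
import json


def data_load_manifest(name, dataset, namespace="default"):
    """Build a manifest asking Fluid to prefetch `dataset` into its cache.

    Assumption: kind and field names follow Fluid's public samples.
    """
    return {
        "apiVersion": "data.fluid.io/v1alpha1",
        "kind": "DataLoad",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The Dataset whose data should be loaded into the cache.
            "dataset": {"name": dataset, "namespace": namespace},
        },
    }


manifest = data_load_manifest("demo-load", "demo")
print(json.dumps(manifest, indent=2))
```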

static/architecture.png

-781 KB
Binary file not shown.

static/img/docs/architecture.png

768 KB

static/img/docs/componnents.png

555 KB
