The cluster-concepts document (#436)

CRZbulabula · web-flow · commit 5555346af4e8 · 2024-11-26T13:12:28.000+08:00
* draft of cluster-concepts

* version control

* rollback data type doc

* Alter sidebar
diff --git a/src/.vuepress/sidebar/V1.3.0-2/en.ts b/src/.vuepress/sidebar/V1.3.0-2/en.ts
@@ -51,6 +51,7 @@ export const enSidebar = {
         { text: 'Data Model', link: 'Data-Model-and-Terminology' },
         { text: 'Data Type', link: 'Data-Type' },
         { text: 'Encoding and Compression', link: 'Encoding-and-Compression' },
+        { text: 'Cluster-related Concepts', link: 'Cluster-Concept' },
         { text: 'Data Partitioning & Load Balancing', link: 'Cluster-data-partitioning' },
       ],
     },
diff --git a/src/.vuepress/sidebar/V1.3.0-2/zh.ts b/src/.vuepress/sidebar/V1.3.0-2/zh.ts
@@ -51,6 +51,7 @@ export const zhSidebar = {
         { text: '数据模型', link: 'Data-Model-and-Terminology' },
         { text: '数据类型', link: 'Data-Type' },
         { text: '编码和压缩', link: 'Encoding-and-Compression' },
+        { text: '集群相关概念', link: 'Cluster-Concept' },
         { text: '数据分区与负载均衡', link: 'Cluster-data-partitioning' },
       ],
     },
diff --git a/src/.vuepress/sidebar/V1.3.x/en.ts b/src/.vuepress/sidebar/V1.3.x/en.ts
@@ -39,6 +39,7 @@ export const enSidebar = {
       prefix: 'Preparatory-knowledge/',
       children: [
         { text: 'Data Type', link: 'Data-Type' },
+        { text: 'Cluster-related Concepts', link: 'Cluster-Concept' },
       ],
     },
     {
diff --git a/src/.vuepress/sidebar/V1.3.x/zh.ts b/src/.vuepress/sidebar/V1.3.x/zh.ts
@@ -39,6 +39,7 @@ export const zhSidebar = {
       prefix: 'Preparatory-knowledge/',
       children: [
         { text: '数据类型', link: 'Data-Type' },
+        { text: '集群相关概念', link: 'Cluster-Concept' },
       ],
     },
     {
diff --git a/src/.vuepress/sidebar_timecho/V1.3.0-2/en.ts b/src/.vuepress/sidebar_timecho/V1.3.0-2/en.ts
@@ -51,6 +51,7 @@ export const enSidebar = {
         { text: 'Data Model', link: 'Data-Model-and-Terminology' },
         { text: 'Data Type', link: 'Data-Type' },
         { text: 'Encoding and Compression', link: 'Encoding-and-Compression' },
+        { text: 'Cluster-related Concepts', link: 'Cluster-Concept' },
         { text: 'Data Partitioning & Load Balancing', link: 'Cluster-data-partitioning' },
       ],
     },
diff --git a/src/.vuepress/sidebar_timecho/V1.3.0-2/zh.ts b/src/.vuepress/sidebar_timecho/V1.3.0-2/zh.ts
@@ -51,6 +51,7 @@ export const zhSidebar = {
         { text: '数据模型', link: 'Data-Model-and-Terminology' },
         { text: '数据类型', link: 'Data-Type' },
         { text: '编码和压缩', link: 'Encoding-and-Compression' },
+        { text: '集群相关概念', link: 'Cluster-Concept' },
         { text: '数据分区与负载均衡', link: 'Cluster-data-partitioning' },
       ],
     },
diff --git a/src/.vuepress/sidebar_timecho/V1.3.x/en.ts b/src/.vuepress/sidebar_timecho/V1.3.x/en.ts
@@ -39,6 +39,7 @@ export const enSidebar = {
       prefix: 'Preparatory-knowledge/',
       children: [
         { text: 'Data Type', link: 'Data-Type' },
+        { text: 'Cluster-related Concepts', link: 'Cluster-Concept' },
       ],
     },
     {
diff --git a/src/.vuepress/sidebar_timecho/V1.3.x/zh.ts b/src/.vuepress/sidebar_timecho/V1.3.x/zh.ts
@@ -39,6 +39,7 @@ export const zhSidebar = {
       prefix: 'Preparatory-knowledge/',
       children: [
         { text: '数据类型', link: 'Data-Type' },
+        { text: '集群相关概念', link: 'Cluster-Concept' },
       ],
     },
     {
diff --git a/src/UserGuide/Master/Tree/Preparatory-knowledge/Cluster-Concept.md b/src/UserGuide/Master/Tree/Preparatory-knowledge/Cluster-Concept.md
@@ -0,0 +1,59 @@
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+    
+        http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+-->
+
+# Cluster-related Concepts
+The figure below illustrates a typical IoTDB 3C3D1A cluster deployment, consisting of 3 ConfigNodes, 3 DataNodes, and 1 AINode:  
+<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://alioss.timecho.com/docs/img/Common-Concepts_02.png">
+
+This deployment involves several key concepts that users commonly encounter when working with IoTDB clusters, including:  
+- **Nodes** (ConfigNode, DataNode, AINode);  
+- **Slots** (SchemaSlot, DataSlot);  
+- **Regions** (SchemaRegion, DataRegion);  
+- **Replica Groups**.
+
+The following sections introduce each of these concepts in detail.
+
+## Nodes
+
+An IoTDB cluster consists of three types of nodes (processes): **ConfigNode** (the main node), **DataNode**, and **AINode**, as detailed below:
+- **ConfigNode:** ConfigNodes store the cluster configuration, database metadata, and the routing information for time series schema and data. They also monitor the cluster nodes and perform load balancing. All ConfigNodes are full backups of one another, as shown by ConfigNode-1, ConfigNode-2, and ConfigNode-3 in the figure. ConfigNodes do not handle client read or write requests directly; instead, they guide the distribution of time series schema and data across the cluster using a series of [load balancing algorithms](https://iotdb.apache.org/UserGuide/latest/Technical-Insider/Cluster-data-partitioning.html).
+- **DataNode:** DataNodes are responsible for reading and writing time series schema and data. Each DataNode can accept and serve client read and write requests, as illustrated by DataNode-1, DataNode-2, and DataNode-3 in the figure above. When a DataNode receives a client request and has the relevant routing information cached locally, it processes the request directly or forwards it. Otherwise, it queries the ConfigNode for the routing information and caches it to speed up subsequent requests.
+- **AINode:** AINodes interact with ConfigNodes and DataNodes to extend IoTDB with intelligent analysis of time series data. They support registering pre-trained machine learning models from external sources and running time series analysis tasks on specified data with simple SQL statements, integrating model creation, management, and inference into the database engine. Currently, built-in algorithms or self-developed models are provided for common time series analysis scenarios, such as forecasting and anomaly detection.
+
+## Slots
+
+IoTDB divides time series schema and data into smaller, more manageable units called **slots**. Slots are logical entities; in an IoTDB cluster, **SchemaSlots** and **DataSlots** are defined as follows:
+- **SchemaSlot:** A SchemaSlot represents a subset of the time series schema. The total number of SchemaSlots is fixed, with a default of 1000. IoTDB uses a hashing algorithm to distribute all devices evenly across these SchemaSlots.
+- **DataSlot:** A DataSlot represents a subset of the time series data. Within each SchemaSlot, the data of the corresponding devices is further divided into DataSlots by a fixed time interval, which defaults to 7 days.
+
+## Regions
+
+In IoTDB, time series schema and data are replicated across DataNodes to ensure high availability of the cluster. However, replicating at the slot level would increase management complexity and reduce write throughput. To address this, IoTDB introduces the concept of a **Region**, which groups SchemaSlots and DataSlots into **SchemaRegions** and **DataRegions**, respectively. Replication is then performed at the Region level. SchemaRegions and DataRegions are defined as follows:
+- **SchemaRegion**: A SchemaRegion is the basic unit for storing and replicating time series schema. All SchemaSlots in a database are evenly distributed across the database's SchemaRegions. SchemaRegions with the same RegionID are replicas of each other. For example, in the figure above, SchemaRegion-1 has three replicas located on DataNode-1, DataNode-2, and DataNode-3.  
+- **DataRegion**: A DataRegion is the basic unit for storing and replicating time series data. All DataSlots in a database are evenly distributed across the database's DataRegions. DataRegions with the same RegionID are replicas of each other. For example, in the figure above, DataRegion-2 has two replicas located on DataNode-1 and DataNode-2.  
+
+## Replica Groups
+Region replicas are critical to the fault tolerance of the cluster. The replicas of each Region form a **replica group**, in which each replica acts as either the **leader** or a **follower**; together, they provide read and write services. The recommended replication factors for different deployment architectures are as follows:
+
+| Category     | Parameter       | Single-node Recommended Configuration | Distributed Recommended Configuration |
+|:------------:|:-----------------------:|:------------------------------------:|:-------------------------------------:|
+| Schema     | `schema_replication_factor` | 1                                    | 3                                     |
+| Data         | `data_replication_factor`   | 1                                    | 2                                     |
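The Slots section in the file above describes two mappings: devices are hashed into a fixed number of SchemaSlots (1000 by default), and each device's data is further cut into DataSlots by a fixed time interval (7 days by default). The TypeScript sketch below illustrates both mappings under stated assumptions. The BKDR-style string hash, the epoch-aligned millisecond intervals, the helper names `bkdrHash`, `schemaSlotOf`, `timeSlotOf`, and `dataSlotOf`, and the device path are all hypothetical, illustrative choices, not IoTDB's actual implementation.

```typescript
// Hypothetical sketch of how a device and a timestamp map to slots.
// Assumptions (not IoTDB's actual implementation):
//   - a BKDR-style string hash stands in for IoTDB's partitioning hash;
//   - DataSlots are fixed-width time intervals aligned to the Unix epoch (ms).

const SCHEMA_SLOT_NUM = 1000; // default number of SchemaSlots (from the doc)
const DATA_SLOT_INTERVAL_MS = 7 * 24 * 60 * 60 * 1000; // default 7-day interval

// BKDR-style hash: h = h * 131 + charCode, kept in the unsigned 32-bit range.
function bkdrHash(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(h, 131) + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// SchemaSlot that owns a device: hash the device path, then reduce modulo the slot count.
function schemaSlotOf(devicePath: string, slotNum: number = SCHEMA_SLOT_NUM): number {
  return bkdrHash(devicePath) % slotNum;
}

// Time partition (DataSlot index within a SchemaSlot) that contains a timestamp.
function timeSlotOf(timestampMs: number, intervalMs: number = DATA_SLOT_INTERVAL_MS): number {
  return Math.floor(timestampMs / intervalMs);
}

// A data point belongs to the DataSlot identified by (SchemaSlot, time partition).
function dataSlotOf(devicePath: string, timestampMs: number): [number, number] {
  return [schemaSlotOf(devicePath), timeSlotOf(timestampMs)];
}

const device = "root.ln.wf01.wt01"; // hypothetical device path
const t = Date.UTC(2024, 10, 25);   // 2024-11-25T00:00:00Z
console.log(dataSlotOf(device, t));
console.log(dataSlotOf(device, t + 60 * 60 * 1000));        // one hour later: same DataSlot
console.log(dataSlotOf(device, t + DATA_SLOT_INTERVAL_MS)); // 7 days later: next time partition
```

In this sketch the slot computation needs no shared state, so whichever node receives a request can compute it; the routing information mentioned in the DataNode description is what maps a slot to the Region that stores it.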
diff --git a/src/UserGuide/V1.3.0-2/Preparatory-knowledge/Cluster-Concept.md b/src/UserGuide/V1.3.0-2/Preparatory-knowledge/Cluster-Concept.md
@@ -0,0 +1,59 @@
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+    
+        http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+-->
+
+# Cluster-related Concepts
+The figure below illustrates a typical IoTDB 3C3D1A cluster deployment, consisting of 3 ConfigNodes, 3 DataNodes, and 1 AINode:  
+<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://alioss.timecho.com/docs/img/Common-Concepts_02.png">
+
+This deployment involves several key concepts that users commonly encounter when working with IoTDB clusters, including:  
+- **Nodes** (ConfigNode, DataNode, AINode);  
+- **Slots** (SchemaSlot, DataSlot);  
+- **Regions** (SchemaRegion, DataRegion);  
+- **Replica Groups**.
+
+The following sections introduce each of these concepts in detail.
+
+## Nodes
+
+An IoTDB cluster consists of three types of nodes (processes): **ConfigNode** (the main node), **DataNode**, and **AINode**, as detailed below:
+- **ConfigNode:** ConfigNodes store the cluster configuration, database metadata, and the routing information for time series schema and data. They also monitor the cluster nodes and perform load balancing. All ConfigNodes are full backups of one another, as shown by ConfigNode-1, ConfigNode-2, and ConfigNode-3 in the figure. ConfigNodes do not handle client read or write requests directly; instead, they guide the distribution of time series schema and data across the cluster using a series of [load balancing algorithms](https://iotdb.apache.org/UserGuide/latest/Technical-Insider/Cluster-data-partitioning.html).
+- **DataNode:** DataNodes are responsible for reading and writing time series schema and data. Each DataNode can accept and serve client read and write requests, as illustrated by DataNode-1, DataNode-2, and DataNode-3 in the figure above. When a DataNode receives a client request and has the relevant routing information cached locally, it processes the request directly or forwards it. Otherwise, it queries the ConfigNode for the routing information and caches it to speed up subsequent requests.
+- **AINode:** AINodes interact with ConfigNodes and DataNodes to extend IoTDB with intelligent analysis of time series data. They support registering pre-trained machine learning models from external sources and running time series analysis tasks on specified data with simple SQL statements, integrating model creation, management, and inference into the database engine. Currently, built-in algorithms or self-developed models are provided for common time series analysis scenarios, such as forecasting and anomaly detection.
+
+## Slots
+
+IoTDB divides time series schema and data into smaller, more manageable units called **slots**. Slots are logical entities; in an IoTDB cluster, **SchemaSlots** and **DataSlots** are defined as follows:
+- **SchemaSlot:** A SchemaSlot represents a subset of the time series schema. The total number of SchemaSlots is fixed, with a default of 1000. IoTDB uses a hashing algorithm to distribute all devices evenly across these SchemaSlots.
+- **DataSlot:** A DataSlot represents a subset of the time series data. Within each SchemaSlot, the data of the corresponding devices is further divided into DataSlots by a fixed time interval, which defaults to 7 days.
+
+## Regions
+
+In IoTDB, time series schema and data are replicated across DataNodes to ensure high availability of the cluster. However, replicating at the slot level would increase management complexity and reduce write throughput. To address this, IoTDB introduces the concept of a **Region**, which groups SchemaSlots and DataSlots into **SchemaRegions** and **DataRegions**, respectively. Replication is then performed at the Region level. SchemaRegions and DataRegions are defined as follows:
+- **SchemaRegion**: A SchemaRegion is the basic unit for storing and replicating time series schema. All SchemaSlots in a database are evenly distributed across the database's SchemaRegions. SchemaRegions with the same RegionID are replicas of each other. For example, in the figure above, SchemaRegion-1 has three replicas located on DataNode-1, DataNode-2, and DataNode-3.  
+- **DataRegion**: A DataRegion is the basic unit for storing and replicating time series data. All DataSlots in a database are evenly distributed across the database's DataRegions. DataRegions with the same RegionID are replicas of each other. For example, in the figure above, DataRegion-2 has two replicas located on DataNode-1 and DataNode-2.  
+
+## Replica Groups
+Region replicas are critical to the fault tolerance of the cluster. The replicas of each Region form a **replica group**, in which each replica acts as either the **leader** or a **follower**; together, they provide read and write services. The recommended replication factors for different deployment architectures are as follows:
+
+| Category     | Parameter       | Single-node Recommended Configuration | Distributed Recommended Configuration |
+|:------------:|:-----------------------:|:------------------------------------:|:-------------------------------------:|
+| Schema     | `schema_replication_factor` | 1                                    | 3                                     |
+| Data         | `data_replication_factor`   | 1                                    | 2                                     |
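The sections on Regions and Replica Groups in the file above describe slots being grouped into Regions and each Region being replicated as a leader/follower replica group. The TypeScript sketch below models that structure with the distributed replication factors from the table (3 for schema, 2 for data) on the three DataNodes of the 3C3D1A example. The round-robin slot assignment, the consecutive-DataNode replica placement, treating the first replica as the leader, and the names `assignSlotsToRegions`, `placeReplicas`, and `ReplicaGroup` are hypothetical simplifications; they are not IoTDB's load balancing algorithms.

```typescript
// Hypothetical sketch: grouping slots into Regions and placing Region replicas.
// Assumptions (illustrative only, not IoTDB's load balancing algorithms):
//   - slots are assigned to Regions round-robin, which spreads them evenly;
//   - a Region's replicas go to consecutive DataNodes (wrapping around);
//   - the first replica of each group is treated as the leader.

interface ReplicaGroup {
  regionId: number;
  leader: string;      // DataNode hosting the leader replica
  followers: string[]; // DataNodes hosting follower replicas
}

// Map each slot to a Region so that every Region gets roughly the same number of slots.
function assignSlotsToRegions(slotNum: number, regionNum: number): number[][] {
  const regions: number[][] = Array.from({ length: regionNum }, (): number[] => []);
  for (let slot = 0; slot < slotNum; slot++) {
    regions[slot % regionNum].push(slot);
  }
  return regions;
}

// Place `replicationFactor` replicas of each Region on distinct DataNodes.
function placeReplicas(regionNum: number, dataNodes: string[], replicationFactor: number): ReplicaGroup[] {
  if (replicationFactor > dataNodes.length) {
    throw new Error("replication factor cannot exceed the number of DataNodes");
  }
  const groups: ReplicaGroup[] = [];
  for (let r = 0; r < regionNum; r++) {
    const hosts: string[] = [];
    for (let k = 0; k < replicationFactor; k++) {
      hosts.push(dataNodes[(r + k) % dataNodes.length]);
    }
    groups.push({ regionId: r + 1, leader: hosts[0], followers: hosts.slice(1) });
  }
  return groups;
}

const dataNodes = ["DataNode-1", "DataNode-2", "DataNode-3"];
const schemaGroups = placeReplicas(1, dataNodes, 3); // schema_replication_factor = 3
const dataGroups = placeReplicas(3, dataNodes, 2);   // data_replication_factor = 2

// 1000 SchemaSlots over 3 Regions: region sizes 334, 333, 333.
console.log(assignSlotsToRegions(1000, 3).map((slots) => slots.length));
console.log(schemaGroups);
console.log(dataGroups);
```

Because each replica group spans distinct DataNodes, losing one DataNode still leaves at least one replica of every Region in this sketch; how IoTDB actually elects leaders and balances replicas is covered by the linked load balancing page.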
diff --git a/src/UserGuide/latest/Preparatory-knowledge/Cluster-Concept.md b/src/UserGuide/latest/Preparatory-knowledge/Cluster-Concept.md
diff --git a/src/zh/UserGuide/Master/Tree/Preparatory-knowledge/Cluster-Concept.md b/src/zh/UserGuide/Master/Tree/Preparatory-knowledge/Cluster-Concept.md
diff --git a/src/zh/UserGuide/V1.3.0-2/Preparatory-knowledge/Cluster-Concept.md b/src/zh/UserGuide/V1.3.0-2/Preparatory-knowledge/Cluster-Concept.md
diff --git a/src/zh/UserGuide/latest/Preparatory-knowledge/Cluster-Concept.md b/src/zh/UserGuide/latest/Preparatory-knowledge/Cluster-Concept.md