README.md (+41 -261)
@@ -8,23 +8,12 @@
8
8
<p align="center">Fastest open-source tool for replicating Databases to Apache Iceberg or Data Lakehouse. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Starting with MongoDB. Visit <a href="https://olake.io/" target="_blank">olake.io/docs</a> for the full documentation, and benchmarks</p>
💡 **Note:** In the configurations below, replace `olake_folder_path` with the path of the newly created folder.
46
-
47
-
</div>
48
-
2. Inside this folder, create two files:
49
-
- config.json: This file contains your connection details. You can find examples and instructions [here](https://github.com/datazip-inc/olake/tree/master/drivers/mongodb#config-file).
50
-
- writer.json: This file specifies where to save your data (local machine or S3).
51
-
52
-
#### Example Structure of `writer.json` :
53
-
Example (For Local):
54
-
```json
55
-
{
56
-
"type": "PARQUET",
57
-
"writer": {
58
-
"normalization": false, // to enable/disable level one flattening
59
-
"local_path": "/mnt/config/{olake_reader}" // replace olake_reader with desired folder name
60
-
}
61
-
}
62
-
```
63
-
Example (For S3):
64
-
```json
65
-
{
66
-
"type": "PARQUET",
67
-
"writer": {
68
-
"normalization": false, // to enable/disable level one flattening
69
-
"s3_bucket": "olake",
70
-
"s3_region": "",
71
-
"s3_access_key": "",
72
-
"s3_secret_key": "",
73
-
"s3_path": ""
74
-
}
75
-
}
76
-
```
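The two `writer.json` shapes above can also be generated rather than typed by hand. The following is a minimal sketch, assuming only the keys shown in the examples; the function names (`local_writer_config`, `s3_writer_config`, `write_writer_config`) are hypothetical helpers and not part of OLake. Strict JSON has no comment syntax, so generating the file also keeps the `//` annotations out of it.

```python
import json
from pathlib import Path


def local_writer_config(folder_name: str, normalization: bool = False) -> dict:
    """Build the local-filesystem variant of writer.json (keys taken from the example above)."""
    return {
        "type": "PARQUET",
        "writer": {
            "normalization": normalization,  # enable/disable level one flattening
            "local_path": f"/mnt/config/{folder_name}",
        },
    }


def s3_writer_config(bucket: str, region: str, access_key: str,
                     secret_key: str, path: str,
                     normalization: bool = False) -> dict:
    """Build the S3 variant of writer.json (keys taken from the example above)."""
    return {
        "type": "PARQUET",
        "writer": {
            "normalization": normalization,
            "s3_bucket": bucket,
            "s3_region": region,
            "s3_access_key": access_key,
            "s3_secret_key": secret_key,
            "s3_path": path,
        },
    }


def write_writer_config(config: dict, olake_folder: Path) -> Path:
    """Write the config as writer.json inside the given folder and return its path."""
    olake_folder.mkdir(parents=True, exist_ok=True)
    target = olake_folder / "writer.json"
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # "olake_folder_path" is the placeholder used in this README; substitute your own folder.
    written = write_writer_config(local_writer_config("olake_reader"), Path("olake_folder_path"))
    print(written)
```

Keeping credentials as function arguments, rather than literals in the file, makes it easier to read them from the environment when the S3 variant is used.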
77
-
2. ### Generate a Catalog File
78
-
79
-
Run the discovery process to identify your MongoDB data:
80
-
```bash
81
-
docker run -v olake_folder_path:/mnt/config olakego/source-mongodb:latest discover --config /mnt/config/config.json
82
-
```
83
-
This will create a catalog.json file in your folder. The file lists the data streams from your MongoDB
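The discover command above can be assembled programmatically, which makes the `olake_folder_path` substitution explicit. This is a minimal sketch: the image name, subcommand, and flag are copied from the command above, while `discover_command` is a hypothetical helper and not part of OLake. It builds the argument list and does not run Docker itself.

```python
import shlex
from pathlib import Path
from typing import List


def discover_command(olake_folder: Path,
                     image: str = "olakego/source-mongodb:latest") -> List[str]:
    """Return the docker argv for the discover step shown above."""
    # Use an absolute host path: Docker treats a bare name in -v as a named volume.
    host_path = olake_folder.absolute()
    return [
        "docker", "run",
        "-v", f"{host_path}:/mnt/config",
        image,
        "discover",
        "--config", "/mnt/config/config.json",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line; pass the list to subprocess.run to execute it.
    print(shlex.join(discover_command(Path("olake_folder_path"))))
```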
For more details, refer to the [documentation](https://olake.io/docs).
171
-
172
-
173
-
174
-
## Benchmark Results: Refer to this doc for complete information
175
-
176
-
### Speed Comparison: Full Load Performance
177
-
178
-
For a collection of 230 million rows (664.81GB) from [Twitter data](https://archive.org/details/archiveteam-twitter-stream-2017-11), here's how Olake compares to other tools:
Cost Comparison (assuming a first full load of 230 million rows and 50 million incremental rows per month), as of 30th September 2025: find more [here](https://olake.io/docs/connectors/mongodb/benchmarks).
198
-
199
-
200
-
201
-
### Testing Infrastructure
202
-
203
-
Virtual Machine: `Standard_D64as_v5`
204
-
205
-
- CPU: `64` vCPUs
206
-
- Memory: `256` GiB RAM
207
-
- Storage: `250` GB of shared storage
208
-
209
-
### MongoDB Setup:
210
-
211
-
- 3 Nodes running in a replica set configuration:
212
-
- 1 Primary Node (Master) that handles all write operations.
213
-
- 2 Secondary Nodes (Replicas) that replicate data from the primary node.
214
-
215
-
Find more [here](https://olake.io/docs/connectors/mongodb/benchmarks).
216
-
25
+
## Getting Started with OLake
217
26
27
+
### Source / Connectors
28
+
1. [Getting started Postgres -> Writers](https://github.com/datazip-inc/olake/tree/master/drivers/postgres) | [Postgres Docs](https://olake.io/docs/category/postgres)
29
+
2. [Getting started MongoDB -> Writers](https://github.com/datazip-inc/olake/tree/master/drivers/mongodb) | [MongoDB Docs](https://olake.io/docs/category/mongodb)
30
+
3. [Getting started MySQL -> Writers](https://github.com/datazip-inc/olake/tree/master/drivers/mysql) | [MySQL Docs](https://olake.io/docs/category/mysql)
218
31
219
-
Detailed roadmap can be found on [GitHub OLake Roadmap 2024-25](https://github.com/orgs/datazip-inc/projects/5)
We have additionally planned the following sources - [AWS S3](https://github.com/datazip-inc/olake/issues/86) | [Kafka](https://github.com/datazip-inc/olake/issues/87)
@@ -257,13 +70,7 @@ We have additionally planned the following sources - [AWS S3](https://github.co
257
70
| Azure Purview | Not Planned, [submit a request](https://github.com/datazip-inc/olake/issues/new?template=new-feature.md)|
258
71
| BigLake Metastore | Not Planned, [submit a request](https://github.com/datazip-inc/olake/issues/new?template=new-feature.md)|
259
72
260
-
261
-
262
-
See [Roadmap](https://github.com/orgs/datazip-inc/projects/5) for more details.
263
-
264
-
265
-
### Core
266
-
73
+
## Core
267
74
Core, or the framework, is the component/logic that has been abstracted out from Connectors to follow DRY. This includes base CLI commands, State logic, Validation logic, Type detection for unstructured data, handling of the Config, State, Catalog, and Writer config files, logging, etc.
268
75
269
76
Core includes an HTTP server that directly exposes live stats about a running sync, such as:
@@ -279,42 +86,15 @@ Core handles the commands to interact with a driver via these:
279
86
280
87
Find more about how OLake works [here.](https://olake.io/docs/category/understanding-olake)
281
88
282
-
### SDKs
283
-
284
-
SDKs are libraries/packages that can orchestrate the connector in two environments i.e. Docker and Kubernetes. These SDKs can be directly consumed by users similar to PyAirbyte, DLT-hub.
285
-
286
-
(Unconfirmed) SDKs can interact with Connectors via potential GRPC server to override certain default behavior of the system by adding custom functions to enable features like Transformation, Custom Table Name via writer, or adding hooks.
287
-
288
-
### Olake
289
-
290
-
Olake will be built on top of SDK providing persistent storage and a user interface that enables orchestration directly from your machine with default writer mode as `S3 Iceberg Parquet`
291
-
292
-
89
+
## Roadmap
90
+
Check out the [GitHub Project Roadmap](https://github.com/orgs/datazip-inc/projects/5) and the [Upcoming OLake Roadmap](https://olake.io/docs/roadmap) to track and influence the way we build it.
91
+
If you have any ideas, questions, or any feedback, please share on our [Github Discussions](https://github.com/datazip-inc/olake/discussions) or raise an issue.
293
92
294
93
## Contributing
94
+
We ❤️ contributions, big or small. Check out our [Bounty Program](https://olake.io/docs/community/issues-and-prs#goodies). As always, thanks to our amazing [contributors](https://github.com/datazip-inc/olake/graphs/contributors)!
95
+
- To contribute to OLake, check [CONTRIBUTING.md](CONTRIBUTING.md).
96
+
- To contribute to the UI, visit the [OLake UI Repository](https://github.com/datazip-inc/olake-frontend/).
97
+
- To contribute to the OLake website and documentation (olake.io), visit the [OLake Docs Repository][GITHUB_DOCS].
295
98
296
-
We ❤️ contributions big or small. Please read [CONTRIBUTING.md](CONTRIBUTING.md) to get started with making contributions to OLake.
297
-
298
-
- To contribute to Frontend, go to [OLake Frontend GitHub repo](https://github.com/datazip-inc/olake-frontend/).
299
-
300
-
- To contribute to OLake website and documentation (olake.io), go to [OLake Frontend GitHub repo](https://github.com/datazip-inc/olake-docs).
301
-
302
-
Not sure how to get started? Just ping us on `#contributing-to-olake` in our [slack community](https://olake.io/slack)
303
-
304
-
## [Documentation](olake.io/docs)
305
-
306
-
307
-
If you need any clarification or find something missing, feel free to raise a GitHub issue with the label `documentation` at [olake-docs](https://github.com/datazip-inc/olake-docs/) repo or reach out to us at the community slack channel.
308
-
309
-
310
-
311
-
312
-
## Community
313
-
314
-
Join the [slack community](https://olake.io/slack) to know more about OLake, future roadmaps and community meetups, about Data Lakes and Lakehouses, the Data Engineering Ecosystem and to connect with other users and contributors.
315
-
316
-
Checkout [OLake Roadmap](https://olake.io/docs/roadmap) to track and influence the way we build it, your expert opinion is always welcomed for us to build a best class open source offering in Data space.
317
-
318
-
If you have any ideas, questions, or any feedback, please share on our [Github Discussions](https://github.com/datazip-inc/olake/discussions) or raise an issue.
319
-
320
-
As always, thanks to our amazing [contributors!](https://github.com/datazip-inc/olake/graphs/contributors)