Commit 6f19f98

Update README.md to add whitespace for consistency (#105)
Signed-off-by: Priyansh Khodiyar <[email protected]> Co-authored-by: Ankit Sharma <[email protected]>
1 parent ed705d2 commit 6f19f98

2 files changed

+76
-43
lines changed

README.md

+72-41
Original file line numberDiff line numberDiff line change
@@ -16,10 +16,13 @@
1616
</a>
1717
</p>
1818

19-
19+
2020
<h3 align="center">
2121
<a href="https://olake.io/docs"><b>Documentation</b></a> &bull;
22-
<a href="https://twitter.com/_olake"><b>Twitter</b></a>
22+
<a href="https://twitter.com/_olake"><b>Twitter</b></a> &bull;
23+
<a href="https://www.youtube.com/@olakeio"><b>YouTube</b></a> &bull;
24+
<a href="https://meetwaves.com/library/olake"><b>Slack Knowledgebase</b></a> &bull;
25+
<a href="https://olake.io/blog"><b>Blogs</b></a>
2326
</h3>
2427

2528

@@ -174,24 +177,24 @@ For more details, refer to the [documentation](https://olake.io/docs).
174177

175178
For a collection of 230 million rows (664.81GB) from [Twitter data](https://archive.org/details/archiveteam-twitter-stream-2017-11), here's how Olake compares to other tools:
176179

177-
| Tool | Full Load Time | Performance |
178-
|-------------------|-------------------|----------------------|
179-
| **Olake** | 46 mins | X times faster |
180-
| **Fivetran** | 4 hours 39 mins (279 mins) | 6x slower |
181-
| **Airbyte** | 16 hours (960 mins) | 20x slower |
182-
| **Debezium (Embedded)** | 11.65 hours (699 mins) | 15x slower |
180+
| Tool | Full Load Time | Performance |
181+
| ----------------------- | -------------------------- | -------------- |
182+
| **Olake** | 46 mins | X times faster |
183+
| **Fivetran** | 4 hours 39 mins (279 mins) | 6x slower |
184+
| **Airbyte** | 16 hours (960 mins) | 20x slower |
185+
| **Debezium (Embedded)** | 11.65 hours (699 mins) | 15x slower |
183186

184187

185188
### Incremental Sync Performance
186189

187-
| Tool | Incremental Sync Time | Records per Second (r/s) | Performance |
188-
|----------------------|------------------------|---------------------------|------------------|
189-
| **Olake** | 28.3 sec | 35,694 r/s | X times faster |
190-
| **Fivetran** | 3 min 10 sec | 5,260 r/s | 6.7x slower |
191-
| **Airbyte** | 12 min 44 sec | 1,308 r/s | 27.3x slower |
192-
| **Debezium (Embedded)** | 12 min 44 sec | 1,308 r/s | 27.3x slower |
190+
| Tool | Incremental Sync Time | Records per Second (r/s) | Performance |
191+
| ----------------------- | --------------------- | ------------------------ | -------------- |
192+
| **Olake** | 28.3 sec | 35,694 r/s | X times faster |
193+
| **Fivetran** | 3 min 10 sec | 5,260 r/s | 6.7x slower |
194+
| **Airbyte** | 12 min 44 sec | 1,308 r/s | 27.3x slower |
195+
| **Debezium (Embedded)** | 12 min 44 sec | 1,308 r/s | 27.3x slower |
193196

194-
Cost Comparison: (Considering 230 million first full load & 50 million rows incremental rows per month) as dated 30th September: Find more [here](https://olake.io/docs/connectors/mongodb/benchmarks).
197+
Cost Comparison: (Considering 230 million first full load & 50 million rows incremental rows per month) as dated 30th September 2025: Find more [here](https://olake.io/docs/connectors/mongodb/benchmarks).
195198

196199

197200

@@ -212,44 +215,69 @@ Virtual Machine: `Standard_D64as_v5`
212215
Find more [here](https://olake.io/docs/connectors/mongodb/benchmarks).
213216

214217

215-
## Components
216-
### Drivers
217218

218-
Drivers aka Connectors/Source that includes the logic for interacting with database. Upcoming drivers being planned are
219-
- [x] MongoDB ([Documentation](https://github.com/datazip-inc/olake/tree/master/drivers/mongodb))
220-
- [ ] MySQL (Coming Soon!)
221-
- [ ] Postgres (Coming Soon!)
222-
- [ ] DynamoDB
223-
- [ ] Kafka
219+
Detailed roadmap can be found on [GitHub OLake Roadmap 2024-25](https://github.com/orgs/datazip-inc/projects/5)
220+
221+
## Source Connector Level Functionalities Supported
222+
223+
| Connector Functionalities | MongoDB [(docs)](https://olake.io/docs/connectors/mongodb/overview) | Postgres [(docs)](https://olake.io/docs/connectors/postgres/overview) | MySQL [(docs)](https://olake.io/docs/connectors/mysql/overview) |
224+
| ------------------------- | ------- | -------- | ------------------------------------------------------------ |
225+
| Full Refresh Sync Mode | ✅ | ✅ | ✅ |
226+
| Incremental Sync Mode | ❌ | ❌ | ❌ |
227+
| CDC Sync Mode | ✅ | ✅ | ✅ |
228+
| Full Parallel Processing | ✅ | ✅ | ✅ |
229+
| CDC Parallel Processing | ✅ | ❌ | ❌ |
230+
| Resumable Full Load | ✅ | ✅ | ✅ |
231+
| CDC Heart Beat | ❌ | ❌ | ❌ |
232+
233+
We have additionally planned the following sources - [AWS S3](https://github.com/datazip-inc/olake/issues/86) | [Kafka](https://github.com/datazip-inc/olake/issues/87)
234+
235+
236+
## Writer Level Functionalities Supported
224237

238+
| Features/Functionality | Local Filesystem [(docs)](https://olake.io/docs/writers/local) | AWS S3 [(docs)](https://olake.io/docs/writers/s3/overview) | Iceberg (WIP) |
239+
| ------------------------------- | ---------------------- | --- | ------------- |
240+
| Flattening & Normalization (L1) | ✅ | ✅ | |
241+
| Partitioning | ✅ | ✅ | |
242+
| Schema Changes | ✅ | ✅ | |
243+
| Schema Evolution | ✅ | ✅ | |
225244

226245

227-
### Writers
246+
## Catalogue Support
228247

229-
Writers are directly integrated into drivers to avoid blockage of writing/reading into/from os.StdOut or any other type of I/O. This enables direct insertion of records from each individual fired query to the destination.
248+
| Catalogues | Support |
249+
| -------------------------- | -------------------------------------------------------------------------------------------------------- |
250+
| Glue Catalog | [WIP](https://github.com/datazip-inc/olake/pull/113) |
251+
| Hive Meta Store | Upcoming |
252+
| JDBC Catalogue | Upcoming |
253+
| REST Catalogue - Nessie | Upcoming |
254+
| REST Catalogue - Polaris | Upcoming |
255+
| REST Catalogue - Unity | Upcoming |
256+
| REST Catalogue - Gravitino | Upcoming |
257+
| Azure Purview | Not Planned, [submit a request](https://github.com/datazip-inc/olake/issues/new?template=new-feature.md) |
258+
| BigLake Metastore | Not Planned, [submit a request](https://github.com/datazip-inc/olake/issues/new?template=new-feature.md) |
259+
260+
261+
262+
See [Roadmap](https://github.com/orgs/datazip-inc/projects/5) for more details.
230263

231-
Writers are being planned in this order
232-
- [x] Parquet Writer (Writes Parquet files on Local/S3)
233-
- [ ] S3 Iceberg Parquet (Coming Soon!)
234-
- [ ] Snowflake
235-
- [ ] BigQuery
236-
- [ ] RedShift
237264

238265
### Core
239266

240267
Core or framework is the component/logic that has been abstracted out from Connectors to follow DRY. This includes base CLI commands, State logic, Validation logic, Type detection for unstructured data, handling Config, State, Catalog, and Writer config file, logging etc.
241268

242-
Core includes http server that directly exposes live stats about running sync such as
269+
Core includes http server that directly exposes live stats about running sync such as:
243270
- Possible finish time
244271
- Concurrently running processes
245272
- Live record count
246273

247-
Core handles the commands to interact with a driver via these
248-
- spec command: Returns render-able JSON Schema that can be consumed by rjsf libraries in frontend
249-
- check command: performs all necessary checks on the Config, Catalog, State and Writer config
250-
- discover command: Returns all streams and their schema
251-
- sync command: Extracts data out of Source and writes into destinations
274+
Core handles the commands to interact with a driver via these:
275+
- `spec` command: Returns render-able JSON Schema that can be consumed by rjsf libraries in frontend
276+
- `check` command: performs all necessary checks on the Config, Catalog, State and Writer config
277+
- `discover` command: Returns all streams and their schema
278+
- `sync` command: Extracts data out of Source and writes into destinations
252279

280+
Find more about how OLake works [here.](https://olake.io/docs/category/understanding-olake)
253281

254282
### SDKs
255283

@@ -267,15 +295,18 @@ Olake will be built on top of SDK providing persistent storage and a user interf
267295

268296
We ❤️ contributions big or small. Please read [CONTRIBUTING.md](CONTRIBUTING.md) to get started with making contributions to OLake.
269297

298+
- To contribute to Frontend, go to [OLake Frontend GitHub repo](https://github.com/datazip-inc/olake-frontend/).
299+
300+
- To contribute to OLake website and documentation (olake.io), go to [OLake Frontend GitHub repo](https://github.com/datazip-inc/olake-docs).
301+
270302
Not sure how to get started? Just ping us on `#contributing-to-olake` in our [slack community](https://olake.io/slack)
271303

272-
<br /><br />
304+
## [Documentation](olake.io/docs)
305+
273306

274-
## Documentation
307+
If you need any clarification or find something missing, feel free to raise a GitHub issue with the label `documentation` at [olake-docs](https://github.com/datazip-inc/olake-docs/) repo or reach out to us at the community slack channel.
275308

276-
You can find docs at https://olake.io/docs. If you need any clarification or find something missing, feel free to raise a GitHub issue with the label `documentation` at [olake-docs](https://github.com/datazip-inc/olake-docs/) repo or reach out to us at the community slack channel.
277309

278-
<br /><br />
279310

280311

281312
## Community
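
As a rough sanity check on the benchmark tables in the `README.md` diff above, the slowdown factors can be recomputed from the figures the tables themselves report. The sketch below is illustrative only: the dictionary and function names are assumptions introduced here, not part of OLake, and the numbers are copied from the tables as shown.

```python
# Figures copied from the benchmark tables in the README.md diff.
# All names below are illustrative and not part of OLake.
FULL_LOAD_MINUTES = {
    "Olake": 46,
    "Fivetran": 279,
    "Airbyte": 960,
    "Debezium (Embedded)": 699,
}

INCREMENTAL_RECORDS_PER_SEC = {
    "Olake": 35_694,
    "Fivetran": 5_260,
    "Airbyte": 1_308,
    "Debezium (Embedded)": 1_308,
}


def full_load_slowdown(tool, baseline="Olake"):
    """How many times longer `tool` took than `baseline` for the full load."""
    return FULL_LOAD_MINUTES[tool] / FULL_LOAD_MINUTES[baseline]


def throughput_slowdown(tool, baseline="Olake"):
    """How many times lower the incremental throughput of `tool` is."""
    return INCREMENTAL_RECORDS_PER_SEC[baseline] / INCREMENTAL_RECORDS_PER_SEC[tool]


if __name__ == "__main__":
    for tool in FULL_LOAD_MINUTES:
        if tool == "Olake":
            continue
        print(
            f"{tool}: full load {full_load_slowdown(tool):.1f}x, "
            f"incremental {throughput_slowdown(tool):.1f}x"
        )
```

The computed ratios land close to the "Nx slower" values in the tables; small differences can come from rounding and from whether a ratio is derived from elapsed time or from records per second.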

drivers/mongodb/README.md

+4-2
Original file line numberDiff line numberDiff line change
@@ -42,7 +42,7 @@ Add MongoDB credentials in following format in config.json file
4242
"server-ram": 16,
4343
"database": "database",
4444
"max_threads": 50,
45-
"default_mode" :"cdc",
45+
"default_mode" : "cdc",
4646
"backoff_retry_count": 2,
4747
"partition_strategy":""
4848
}
@@ -198,4 +198,6 @@ You can save the state in a `state.json` file using the following format:
198198
}
199199
]
200200
}
201-
```
201+
```
202+
203+
For more information, refer to [MongoDB Connector Docs](https://olake.io/docs/connectors/mongodb/overview)
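
The `config.json` hunk in the `drivers/mongodb/README.md` diff above shows only part of the file, so the following is a minimal sketch rather than a real validator. It checks just the keys visible in that hunk, and the expected types are inferred from the example values (an assumption, not something the connector documents here). The function name `check_visible_fields` is hypothetical.

```python
import json

# Keys are the ones visible in the config.json hunk; the expected types are
# inferred from the example values there and are an assumption.
EXPECTED_TYPES = {
    "server-ram": int,
    "database": str,
    "max_threads": int,
    "default_mode": str,
    "backoff_retry_count": int,
    "partition_strategy": str,
}


def check_visible_fields(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


# Only the fields shown in the hunk; a real config.json has more keys.
SAMPLE = json.loads(
    """
    {
      "server-ram": 16,
      "database": "database",
      "max_threads": 50,
      "default_mode": "cdc",
      "backoff_retry_count": 2,
      "partition_strategy": ""
    }
    """
)

if __name__ == "__main__":
    print(check_visible_fields(SAMPLE))
```

A check like this only catches missing keys and obvious type slips before a sync is started; it does not know which `default_mode` values the connector accepts.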
