5 changes: 5 additions & 0 deletions README.md
@@ -48,6 +48,7 @@ Open Source Handbook is a resource for people of **all skill and experience leve
- [Audio visualization](https://github.com/willianjusten/awesome-audio-visualization)
- [Big data](categories/big-data.md)
- [Datasets](https://github.com/awesomedata/awesome-public-datasets/blob/master/README.rst)
- [Cloud Native & DevOps](categories/cloud-native.md)
- [Frameworks](https://github.com/topics/framework)
- [Gaming](https://gist.github.com/roachhd/d579b58148d7e36a6b72)
- [iOS development](https://github.com/dkhamsing/open-source-ios-apps/blob/master/APPSTORE.md#apple-watch)
@@ -60,6 +61,10 @@ Open Source Handbook is a resource for people of **all skill and experience leve
- Web development
- [Front-end tools and resources](https://github.com/MilanAryal/web-development-resources)
- [GitHub Pages](categories/github-pages.md)
- [Node.js](https://github.com/nodejs/node) - JavaScript runtime built on Chrome's V8 engine
- [React](https://github.com/facebook/react) - JavaScript library for building user interfaces
- [Vue.js](https://github.com/vuejs/vue) - Progressive JavaScript framework
- [Angular](https://github.com/angular/angular) - Platform for building mobile and desktop web applications

[return to top](README.md)

79 changes: 68 additions & 11 deletions categories/big-data.md
@@ -1,14 +1,71 @@
<h1 align="center"><a href="../README.md">Open Source Handbook</a></h1>
<h3 align="center">Big Data Projects</h3>

- [Apache Crunch](https://github.com/apache/crunch)
- [Apache Hadoop](https://github.com/apache/hadoop)
- [Apache Kafka](https://github.com/apache/kafka)
- [Apache Samoa](https://github.com/apache/incubator-samoa)
- [Apache Storm](https://github.com/apache/storm)
- [Elasticsearch](https://github.com/elastic/elasticsearch)
- [HPCC Systems](https://github.com/hpcc-systems/HPCC-Platform)
- [Lumify](https://github.com/lumifyio/lumify)
- [MongoDB](https://github.com/mongodb)
- [RapidMiner](https://github.com/rapidminer)
- [Talend Open Studio for Big Data](https://github.com/Talend)
## Data Processing & Analytics

### Stream Processing
- [Apache Kafka](https://github.com/apache/kafka) - Distributed streaming platform for building real-time data pipelines
- [Apache Storm](https://github.com/apache/storm) - Real-time computation system for processing streams of data
- [Apache Flink](https://github.com/apache/flink) - Stream processing framework for distributed, high-performing data streaming applications
- [Apache Pulsar](https://github.com/apache/pulsar) - Cloud-native, distributed messaging and streaming platform

### Batch Processing
- [Apache Hadoop](https://github.com/apache/hadoop) - Framework for distributed storage and processing of large datasets
- [Apache Spark](https://github.com/apache/spark) - Unified analytics engine for large-scale data processing
- [Apache Crunch](https://github.com/apache/crunch) - Java library for writing MapReduce pipelines
- [Apache Samoa](https://github.com/apache/incubator-samoa) - Distributed streaming machine learning framework

## Data Storage & Databases

### NoSQL Databases
- [MongoDB](https://github.com/mongodb/mongo) - Document-oriented NoSQL database
- [Apache Cassandra](https://github.com/apache/cassandra) - Highly scalable distributed NoSQL database
- [Redis](https://github.com/redis/redis) - In-memory data structure store
- [ClickHouse](https://github.com/ClickHouse/ClickHouse) - Column-oriented database for analytics

### Search & Analytics
- [Elasticsearch](https://github.com/elastic/elasticsearch) - Distributed search and analytics engine
- [Apache Solr](https://github.com/apache/solr) - Enterprise search platform
- [OpenSearch](https://github.com/opensearch-project/OpenSearch) - Community-driven search and analytics suite

## Data Tools & Platforms

### Workflow Management
- [Apache Airflow](https://github.com/apache/airflow) - Platform for developing, scheduling, and monitoring workflows
- [Prefect](https://github.com/PrefectHQ/prefect) - Modern workflow orchestration framework
- [Dagster](https://github.com/dagster-io/dagster) - Data orchestrator for machine learning, analytics, and ETL

### Data Integration & ETL
- [Talend Open Studio for Big Data](https://github.com/Talend/tdi-studio-se) - Open source data integration platform
- [Apache NiFi](https://github.com/apache/nifi) - System for processing and distributing data
- [Singer](https://github.com/singer-io) - Open source standard for writing scripts that move data

### Analytics & Visualization
- [Apache Superset](https://github.com/apache/superset) - Modern data exploration and visualization platform
- [Metabase](https://github.com/metabase/metabase) - Business intelligence tool for everyone in your company
- [Grafana](https://github.com/grafana/grafana) - Observability and data visualization platform

## Specialized Platforms

### Machine Learning & AI
- [MLflow](https://github.com/mlflow/mlflow) - Machine learning lifecycle management
- [Kubeflow](https://github.com/kubeflow/kubeflow) - Machine learning toolkit for Kubernetes
- [Apache Mahout](https://github.com/apache/mahout) - Distributed linear algebra framework

### Data Lakes & Warehouses
- [Apache Iceberg](https://github.com/apache/iceberg) - High-performance format for huge analytic tables
- [Delta Lake](https://github.com/delta-io/delta) - Storage framework that brings ACID transactions to Apache Spark
- [Apache Hudi](https://github.com/apache/hudi) - Transactional data lake platform

### Legacy & Specialized
- [HPCC Systems](https://github.com/hpcc-systems/HPCC-Platform) - Massively parallel processing computing platform
- [RapidMiner](https://github.com/rapidminer/rapidminer-studio) - Data science platform for teams

## Getting Started Tips

- **For Beginners**: Start with Apache Spark or Elasticsearch - they have great documentation and active communities
- **For Data Engineers**: Check out Apache Airflow for workflow management or Apache Kafka for streaming
- **For Analysts**: Try Apache Superset or Metabase for visualization projects
- **Good First Issues**: Look for issues labeled "good first issue" or "beginner-friendly" in each repository's issue tracker

[return to top](../README.md)
81 changes: 81 additions & 0 deletions categories/cloud-native.md
@@ -0,0 +1,81 @@
<h1 align="center"><a href="../README.md">Open Source Handbook</a></h1>
<h3 align="center">Cloud Native & DevOps Projects</h3>

## Container Orchestration

### Kubernetes Ecosystem
- [Kubernetes](https://github.com/kubernetes/kubernetes) - Container orchestration platform
- [Helm](https://github.com/helm/helm) - Package manager for Kubernetes
- [Istio](https://github.com/istio/istio) - Service mesh for microservices
- [Linkerd](https://github.com/linkerd/linkerd2) - Ultralight service mesh for Kubernetes

### Container Runtimes
- [Docker](https://github.com/moby/moby) - Container platform (Moby project)
- [Podman](https://github.com/containers/podman) - Daemonless container engine
- [containerd](https://github.com/containerd/containerd) - Industry-standard container runtime

## CI/CD & Automation

### Continuous Integration
- [Jenkins](https://github.com/jenkinsci/jenkins) - Automation server for CI/CD
- [GitLab CI](https://github.com/gitlabhq/gitlabhq) - Complete DevOps platform
- [Tekton](https://github.com/tektoncd/pipeline) - Cloud-native CI/CD building blocks
- [Drone](https://github.com/harness/drone) - Container-native CI/CD platform

### Infrastructure as Code
- [Terraform](https://github.com/hashicorp/terraform) - Infrastructure provisioning tool
- [Pulumi](https://github.com/pulumi/pulumi) - Modern infrastructure as code
- [Ansible](https://github.com/ansible/ansible) - IT automation platform
- [Chef](https://github.com/chef/chef) - Configuration management tool

## Monitoring & Observability

### Metrics & Monitoring
- [Prometheus](https://github.com/prometheus/prometheus) - Monitoring system and time series database
- [Grafana](https://github.com/grafana/grafana) - Observability and data visualization platform
- [Jaeger](https://github.com/jaegertracing/jaeger) - Distributed tracing platform
- [OpenTelemetry](https://github.com/open-telemetry) - Observability framework

### Logging
- [Fluentd](https://github.com/fluent/fluentd) - Data collector for unified logging layer
- [Logstash](https://github.com/elastic/logstash) - Server-side data processing pipeline
- [Vector](https://github.com/vectordotdev/vector) - High-performance observability data pipeline

## Service Mesh & Networking

- [Envoy](https://github.com/envoyproxy/envoy) - Cloud-native high-performance edge/middle/service proxy
- [Consul](https://github.com/hashicorp/consul) - Service networking solution
- [Traefik](https://github.com/traefik/traefik) - Modern HTTP reverse proxy and load balancer
- [NGINX](https://github.com/nginx/nginx) - HTTP and reverse proxy server

## Security & Policy

- [Open Policy Agent (OPA)](https://github.com/open-policy-agent/opa) - Policy engine for cloud native environments
- [Falco](https://github.com/falcosecurity/falco) - Runtime security monitoring
- [Trivy](https://github.com/aquasecurity/trivy) - Vulnerability scanner for containers
- [Cert-Manager](https://github.com/cert-manager/cert-manager) - X.509 certificate management for Kubernetes

## Storage & Databases

### Cloud-Native Storage
- [Rook](https://github.com/rook/rook) - Storage orchestrator for Kubernetes
- [Longhorn](https://github.com/longhorn/longhorn) - Distributed block storage system for Kubernetes
- [OpenEBS](https://github.com/openebs/openebs) - Container-attached storage

### Cloud-Native Databases
- [CockroachDB](https://github.com/cockroachdb/cockroach) - Distributed SQL database
- [TiDB](https://github.com/pingcap/tidb) - Distributed HTAP database
- [Vitess](https://github.com/vitessio/vitess) - Database clustering system for horizontal scaling of MySQL

## Getting Started Tips

- **For Beginners**: Start with Docker and Kubernetes basics, then explore Helm for package management
- **For DevOps Engineers**: Check out Prometheus + Grafana for monitoring or Terraform for infrastructure
- **For Security**: Try Trivy for vulnerability scanning or Falco for runtime security
- **Good First Issues**: Many CNCF projects tag starter tasks with a "good first issue" label and run mentorship programs

## CNCF Landscape

Many of these projects are hosted by the Cloud Native Computing Foundation (CNCF), which provides resources for contributors and maintains the [CNCF Landscape](https://landscape.cncf.io/), a comprehensive map of cloud-native technologies.

[return to top](../README.md)