34 changes: 33 additions & 1 deletion README.md
@@ -8,7 +8,7 @@

## Who this is for

Open Source Handbook is a resource for people of **all skill and experience levels** who want to learn about open source and save time finding projects!
Open Source Handbook is a comprehensive resource for people of **all skill and experience levels** who want to learn about open source, contribute to meaningful projects, and save time finding the right repositories to get started with!

## Intro

@@ -35,6 +35,9 @@ Open Source Handbook is a resource for people of **all skill and experience leve
- [Beginner projects](https://github.com/showcases/great-for-new-contributors)
- [Beginner projects (more!)](https://github.com/MunGell/awesome-for-beginners)
- [Searching GitHub](https://help.github.com/articles/finding-open-source-projects-on-github/)
- [First Timers Only](https://www.firsttimersonly.com/) - Friendly open source projects for new contributors
- [Good First Issues](https://goodfirstissues.com/) - Find projects with good first issues
- [CodeTriage](https://www.codetriage.com/) - Help your favorite open source projects

## Collections

@@ -48,6 +51,8 @@ Open Source Handbook is a resource for people of **all skill and experience leve
- [Audio visualization](https://github.com/willianjusten/awesome-audio-visualization)
- [Big data](categories/big-data.md)
- [Datasets](https://github.com/awesomedata/awesome-public-datasets/blob/master/README.rst)
- [Cloud Native & DevOps](categories/cloud-native.md)
- [Data Science & Machine Learning](categories/data-science.md)
- [Frameworks](https://github.com/topics/framework)
- [Gaming](https://gist.github.com/roachhd/d579b58148d7e36a6b72)
- [iOS development](https://github.com/dkhamsing/open-source-ios-apps/blob/master/APPSTORE.md#apple-watch)
@@ -60,6 +65,10 @@ Open Source Handbook is a resource for people of **all skill and experience leve
- Web development
- [Front-end tools and resources](https://github.com/MilanAryal/web-development-resources)
- [GitHub Pages](categories/github-pages.md)
- [Node.js](https://github.com/nodejs/node) - JavaScript runtime built on Chrome's V8 engine
- [React](https://github.com/facebook/react) - JavaScript library for building user interfaces
- [Vue.js](https://github.com/vuejs/vue) - Progressive JavaScript framework
- [Angular](https://github.com/angular/angular) - Platform for building mobile and desktop web applications

[return to top](README.md)

@@ -100,16 +109,38 @@ Open Source Handbook is a resource for people of **all skill and experience leve
- [All Languages](https://github.com/trending)
- [C++](https://github.com/trending/c++)
- [CSS](https://github.com/trending/css)
- [Go](https://github.com/trending/go)
- [HTML](https://github.com/trending/html)
- [Java](https://github.com/trending/java)
- [JavaScript](https://github.com/trending/javascript)
- [Python](https://github.com/trending/python)
- [Ruby](https://github.com/trending/ruby)
- [Rust](https://github.com/trending/rust)
- [Swift 📱](https://github.com/trending/swift)
- [TypeScript](https://github.com/trending/typescript)
- [Unknown languages](https://github.com/trending/unknown)

## Quick Start Guide for New Contributors

### 🚀 Your First Contribution in 5 Steps
1. **Find a project** - Use [Good First Issues](https://goodfirstissues.com/) or browse our categories
2. **Read the docs** - Check README.md and CONTRIBUTING.md files
3. **Set up locally** - Fork, clone, and follow setup instructions
4. **Make your change** - Start small with documentation or bug fixes
5. **Submit a PR** - Follow the project's contribution guidelines
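
The fork-and-clone flow in steps 3 to 5 can be sketched as a short Python helper that only builds the usual git commands, without running them. This is a minimal sketch under stated assumptions: the function name `contribution_commands`, the example username, and the branch name are hypothetical, and the fork is assumed to already exist on GitHub. Individual projects may ask for a different workflow in their CONTRIBUTING.md.

```python
def contribution_commands(upstream_url, username, branch):
    """Return typical git commands for a fork-based contribution.

    Assumptions (illustrative only): the upstream URL looks like
    https://github.com/<owner>/<repo>, and `username` has already
    forked the repository on GitHub.
    """
    base = upstream_url.rstrip("/")
    # The last two path segments are the owner and the repository name.
    owner, repo = base.split("/")[-2:]
    fork_url = f"https://github.com/{username}/{repo}.git"
    return [
        f"git clone {fork_url}",                   # clone your fork
        f"cd {repo}",
        f"git remote add upstream {base}.git",     # track the original project
        f"git checkout -b {branch}",               # work on a topic branch
        "git add -A",                              # stage your changes
        'git commit -m "Describe your change"',
        f"git push -u origin {branch}",            # then open a PR on GitHub
    ]


if __name__ == "__main__":
    # Hypothetical username and branch name, shown only for illustration.
    for cmd in contribution_commands(
        "https://github.com/apache/spark", "your-username", "fix-docs-typo"
    ):
        print(cmd)
```

Keeping the `upstream` remote alongside `origin` makes it easier to pull in the project's latest changes before you submit the pull request.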

### 💡 Pro Tips for Success
- **Start small** - Documentation improvements and bug fixes are great first contributions
- **Be patient** - Maintainers are volunteers; reviews may take time
- **Ask questions** - Use issues or discussions to clarify requirements
- **Follow conventions** - Match the project's coding style and commit message format

## Open Source Internships, Competitions, and Careers

- [Internships and Competitions](https://github.com/tapaswenipathak/Open-Source-Programs)
- [Google Summer of Code](https://summerofcode.withgoogle.com/) - Annual program for students and other newcomers to open source
- [Outreachy](https://www.outreachy.org/) - Internships for underrepresented groups
- [MLH Fellowship](https://fellowship.mlh.io/) - Remote internship program
- [Careers](https://github.com/t9tio/open-source-jobs)

[return to top](README.md)
@@ -125,5 +156,6 @@ We would love for you to contribute! Please [fork and make a pull request](https
## Maintainers

- [Shaina Krumme](https://github.com/shainakrumme)
- [Siva Murthy](https://github.com/sivamurthy30)

[return to top](README.md)
79 changes: 68 additions & 11 deletions categories/big-data.md
@@ -1,14 +1,71 @@
<h1 align="center"><a href="../README.md">Open Source Handbook</a></h1>
<h3 align="center">Big Data Projects</h3>

- [Apache Crunch](https://github.com/apache/crunch)
- [Apache Hadoop](https://github.com/apache/hadoop)
- [Apache Kafka](https://github.com/apache/kafka)
- [Apache Samoa](https://github.com/apache/incubator-samoa)
- [Apache Storm](https://github.com/apache/storm)
- [Elasticsearch](https://github.com/elastic/elasticsearch)
- [HPCC Systems](https://github.com/hpcc-systems/HPCC-Platform)
- [Lumify](https://github.com/lumifyio/lumify)
- [MongoDB](https://github.com/mongodb)
- [RapidMiner](https://github.com/rapidminer)
- [Talend Open Studio for Big Data](https://github.com/Talend)
## Data Processing & Analytics

### Stream Processing
- [Apache Kafka](https://github.com/apache/kafka) - Distributed streaming platform for building real-time data pipelines
- [Apache Storm](https://github.com/apache/storm) - Real-time computation system for processing streams of data
- [Apache Flink](https://github.com/apache/flink) - Stream processing framework for distributed, high-performing data streaming applications
- [Apache Pulsar](https://github.com/apache/pulsar) - Cloud-native, distributed messaging and streaming platform

### Batch Processing
- [Apache Hadoop](https://github.com/apache/hadoop) - Framework for distributed storage and processing of large datasets
- [Apache Spark](https://github.com/apache/spark) - Unified analytics engine for large-scale data processing
- [Apache Crunch](https://github.com/apache/crunch) - Java library for writing MapReduce pipelines
- [Apache Samoa](https://github.com/apache/incubator-samoa) - Distributed streaming machine learning framework

## Data Storage & Databases

### NoSQL Databases
- [MongoDB](https://github.com/mongodb/mongo) - Document-oriented NoSQL database
- [Apache Cassandra](https://github.com/apache/cassandra) - Highly scalable distributed NoSQL database
- [Redis](https://github.com/redis/redis) - In-memory data structure store
- [ClickHouse](https://github.com/ClickHouse/ClickHouse) - Column-oriented database for analytics

### Search & Analytics
- [Elasticsearch](https://github.com/elastic/elasticsearch) - Distributed search and analytics engine
- [Apache Solr](https://github.com/apache/solr) - Enterprise search platform
- [OpenSearch](https://github.com/opensearch-project/OpenSearch) - Community-driven search and analytics suite

## Data Tools & Platforms

### Workflow Management
- [Apache Airflow](https://github.com/apache/airflow) - Platform for developing, scheduling, and monitoring workflows
- [Prefect](https://github.com/PrefectHQ/prefect) - Modern workflow orchestration framework
- [Dagster](https://github.com/dagster-io/dagster) - Data orchestrator for machine learning, analytics, and ETL

### Data Integration & ETL
- [Talend Open Studio for Big Data](https://github.com/Talend/tdi-studio-se) - Open source data integration platform
- [Apache NiFi](https://github.com/apache/nifi) - System for processing and distributing data
- [Singer](https://github.com/singer-io) - Open source standard for writing scripts that move data

### Analytics & Visualization
- [Apache Superset](https://github.com/apache/superset) - Modern data exploration and visualization platform
- [Metabase](https://github.com/metabase/metabase) - Business intelligence tool for everyone in your company
- [Grafana](https://github.com/grafana/grafana) - Observability and data visualization platform

## Specialized Platforms

### Machine Learning & AI
- [MLflow](https://github.com/mlflow/mlflow) - Machine learning lifecycle management
- [Kubeflow](https://github.com/kubeflow/kubeflow) - Machine learning toolkit for Kubernetes
- [Apache Mahout](https://github.com/apache/mahout) - Distributed linear algebra framework

### Data Lakes & Warehouses
- [Apache Iceberg](https://github.com/apache/iceberg) - High-performance format for huge analytic tables
- [Delta Lake](https://github.com/delta-io/delta) - Storage framework that brings ACID transactions to Apache Spark
- [Apache Hudi](https://github.com/apache/hudi) - Transactional data lake platform

### Legacy & Specialized
- [HPCC Systems](https://github.com/hpcc-systems/HPCC-Platform) - Massively parallel processing platform
- [RapidMiner](https://github.com/rapidminer/rapidminer-studio) - Data science platform for teams

## Getting Started Tips

- **For Beginners**: Start with Apache Spark or Elasticsearch - they have great documentation and active communities
- **For Data Engineers**: Check out Apache Airflow for workflow management or Apache Kafka for streaming
- **For Analysts**: Try Apache Superset or Metabase for visualization projects
- **Good First Issues**: Look for repositories with "good first issue" or "beginner-friendly" labels

[return to top](../README.md)
81 changes: 81 additions & 0 deletions categories/cloud-native.md
@@ -0,0 +1,81 @@
<h1 align="center"><a href="../README.md">Open Source Handbook</a></h1>
<h3 align="center">Cloud Native & DevOps Projects</h3>

## Container Orchestration

### Kubernetes Ecosystem
- [Kubernetes](https://github.com/kubernetes/kubernetes) - Container orchestration platform
- [Helm](https://github.com/helm/helm) - Package manager for Kubernetes
- [Istio](https://github.com/istio/istio) - Service mesh for microservices
- [Linkerd](https://github.com/linkerd/linkerd2) - Ultralight service mesh for Kubernetes

### Container Runtimes
- [Docker](https://github.com/moby/moby) - Container platform (Moby project)
- [Podman](https://github.com/containers/podman) - Daemonless container engine
- [containerd](https://github.com/containerd/containerd) - Industry-standard container runtime

## CI/CD & Automation

### Continuous Integration
- [Jenkins](https://github.com/jenkinsci/jenkins) - Automation server for CI/CD
- [GitLab CI](https://github.com/gitlabhq/gitlabhq) - Complete DevOps platform
- [Tekton](https://github.com/tektoncd/pipeline) - Cloud-native CI/CD building blocks
- [Drone](https://github.com/harness/drone) - Container-native CI/CD platform

### Infrastructure as Code
- [Terraform](https://github.com/hashicorp/terraform) - Infrastructure provisioning tool
- [Pulumi](https://github.com/pulumi/pulumi) - Modern infrastructure as code
- [Ansible](https://github.com/ansible/ansible) - IT automation platform
- [Chef](https://github.com/chef/chef) - Configuration management tool

## Monitoring & Observability

### Metrics & Monitoring
- [Prometheus](https://github.com/prometheus/prometheus) - Monitoring system and time series database
- [Grafana](https://github.com/grafana/grafana) - Observability and data visualization platform
- [Jaeger](https://github.com/jaegertracing/jaeger) - Distributed tracing platform
- [OpenTelemetry](https://github.com/open-telemetry) - Observability framework

### Logging
- [Fluentd](https://github.com/fluent/fluentd) - Data collector for unified logging layer
- [Logstash](https://github.com/elastic/logstash) - Server-side data processing pipeline
- [Vector](https://github.com/vectordotdev/vector) - High-performance observability data pipeline

## Service Mesh & Networking

- [Envoy](https://github.com/envoyproxy/envoy) - Cloud-native high-performance edge/middle/service proxy
- [Consul](https://github.com/hashicorp/consul) - Service networking solution
- [Traefik](https://github.com/traefik/traefik) - Modern HTTP reverse proxy and load balancer
- [NGINX](https://github.com/nginx/nginx) - HTTP and reverse proxy server

## Security & Policy

- [Open Policy Agent (OPA)](https://github.com/open-policy-agent/opa) - Policy engine for cloud native environments
- [Falco](https://github.com/falcosecurity/falco) - Runtime security monitoring
- [Trivy](https://github.com/aquasecurity/trivy) - Vulnerability scanner for containers
- [Cert-Manager](https://github.com/cert-manager/cert-manager) - X.509 certificate management for Kubernetes

## Storage & Databases

### Cloud-Native Storage
- [Rook](https://github.com/rook/rook) - Storage orchestrator for Kubernetes
- [Longhorn](https://github.com/longhorn/longhorn) - Distributed block storage system for Kubernetes
- [OpenEBS](https://github.com/openebs/openebs) - Container-attached storage

### Cloud-Native Databases
- [CockroachDB](https://github.com/cockroachdb/cockroach) - Distributed SQL database
- [TiDB](https://github.com/pingcap/tidb) - Distributed HTAP database
- [Vitess](https://github.com/vitessio/vitess) - Database clustering system for horizontal scaling of MySQL

## Getting Started Tips

- **For Beginners**: Start with Docker and Kubernetes basics, then explore Helm for package management
- **For DevOps Engineers**: Check out Prometheus + Grafana for monitoring or Terraform for infrastructure
- **For Security**: Try Trivy for vulnerability scanning or Falco for runtime security
- **Good First Issues**: Many CNCF projects have excellent "good first issue" labels and mentorship programs

## CNCF Landscape

Many of these projects are hosted by the [Cloud Native Computing Foundation (CNCF)](https://landscape.cncf.io/), which maintains a landscape of cloud-native technologies and provides resources for contributors.

[return to top](../README.md)
72 changes: 72 additions & 0 deletions categories/data-science.md
@@ -0,0 +1,72 @@
<h1 align="center"><a href="../README.md">Open Source Handbook</a></h1>
<h3 align="center">Data Science & Machine Learning Projects</h3>

## Machine Learning Frameworks

### Python-Based
- [scikit-learn](https://github.com/scikit-learn/scikit-learn) - Machine learning library for Python
- [TensorFlow](https://github.com/tensorflow/tensorflow) - End-to-end open source platform for machine learning
- [PyTorch](https://github.com/pytorch/pytorch) - Tensors and dynamic neural networks in Python
- [Keras](https://github.com/keras-team/keras) - Deep learning API written in Python
- [XGBoost](https://github.com/dmlc/xgboost) - Optimized distributed gradient boosting library

### Multi-Language
- [Apache Spark MLlib](https://github.com/apache/spark) - Scalable machine learning library
- [H2O](https://github.com/h2oai/h2o-3) - Fast scalable machine learning platform
- [Weka](https://github.com/Waikato/weka-3.8) - Collection of machine learning algorithms

## Data Analysis & Visualization

### Python Libraries
- [Pandas](https://github.com/pandas-dev/pandas) - Powerful data structures for data analysis
- [NumPy](https://github.com/numpy/numpy) - Fundamental package for scientific computing
- [Matplotlib](https://github.com/matplotlib/matplotlib) - Comprehensive library for creating static, animated, and interactive visualizations
- [Seaborn](https://github.com/mwaskom/seaborn) - Statistical data visualization library
- [Plotly](https://github.com/plotly/plotly.py) - Interactive graphing library

### R Ecosystem
- [R](https://github.com/wch/r-source) - Language and environment for statistical computing
- [Shiny](https://github.com/rstudio/shiny) - Web application framework for R
- [ggplot2](https://github.com/tidyverse/ggplot2) - Grammar of graphics for R

## Jupyter & Notebooks

- [Jupyter](https://github.com/jupyter/jupyter) - Interactive computing across dozens of programming languages
- [JupyterLab](https://github.com/jupyterlab/jupyterlab) - Next-generation web-based user interface for Project Jupyter
- [Voilà](https://github.com/voila-dashboards/voila) - Rendering of live Jupyter notebooks with interactive widgets

## MLOps & Model Management

- [MLflow](https://github.com/mlflow/mlflow) - Machine learning lifecycle management
- [DVC](https://github.com/iterative/dvc) - Data version control system for machine learning projects
- [Weights & Biases](https://github.com/wandb/wandb) - Developer tools for machine learning
- [Kubeflow](https://github.com/kubeflow/kubeflow) - Machine learning toolkit for Kubernetes

## Natural Language Processing

- [spaCy](https://github.com/explosion/spaCy) - Industrial-strength Natural Language Processing
- [NLTK](https://github.com/nltk/nltk) - Natural Language Toolkit
- [Transformers](https://github.com/huggingface/transformers) - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
- [Gensim](https://github.com/RaRe-Technologies/gensim) - Topic modeling and document similarity analysis

## Computer Vision

- [OpenCV](https://github.com/opencv/opencv) - Open source computer vision and machine learning software library
- [Pillow](https://github.com/python-pillow/Pillow) - Python Imaging Library
- [ImageIO](https://github.com/imageio/imageio) - Python library for reading and writing image data

## Getting Started Tips

- **For Beginners**: Start with scikit-learn and Pandas - both have excellent documentation and active communities
- **For Deep Learning**: Try PyTorch or TensorFlow tutorials
- **For Visualization**: Begin with Matplotlib or Plotly
- **For NLP**: spaCy has great beginner-friendly examples
- **Good First Issues**: Look for "good first issue" labels in scikit-learn, Pandas, or Matplotlib
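
To make the "good first issue" tip concrete, the sketch below builds issue-search URLs for the three repositories named above. It assumes each project uses a label spelled exactly "good first issue", which may not hold everywhere, so check each repository's label list. The helper name `good_first_issue_url` is hypothetical.

```python
from urllib.parse import urlencode


def good_first_issue_url(owner, repo, label="good first issue"):
    """Build a GitHub issue-search URL for open issues with a given label.

    Assumption: the project uses the given label name; label spelling
    varies between repositories.
    """
    query = f'is:issue is:open label:"{label}"'
    return f"https://github.com/{owner}/{repo}/issues?" + urlencode({"q": query})


if __name__ == "__main__":
    # Owner/repo pairs taken from the links listed in this document.
    for owner, repo in [
        ("scikit-learn", "scikit-learn"),
        ("pandas-dev", "pandas"),
        ("matplotlib", "matplotlib"),
    ]:
        print(good_first_issue_url(owner, repo))
```

Opening one of the printed URLs in a browser shows the filtered issue list, which is usually a quicker starting point than browsing all open issues.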

## Learning Resources

- [Kaggle Learn](https://www.kaggle.com/learn) - Free micro-courses in data science
- [Papers With Code](https://github.com/paperswithcode/paperswithcode) - Machine learning papers with code implementations
- [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning) - Curated list of ML frameworks and libraries

[return to top](../README.md)