From 798efa39da8bd9591df974e518710ec6c4aade15 Mon Sep 17 00:00:00 2001
From: sivamurthy30
Date: Thu, 9 Oct 2025 01:53:04 +0530
Subject: [PATCH 1/3] Enhance Open Source Handbook with comprehensive updates

- Expanded big-data.md with modern tools, better organization, and descriptions
- Added new cloud-native.md category covering DevOps, Kubernetes, CI/CD, and monitoring tools
- Updated main README.md to include cloud-native category and popular web frameworks
- Improved structure with beginner-friendly tips and getting started guidance
---
 README.md                  |  5 +++
 categories/big-data.md     | 79 +++++++++++++++++++++++++++++++------
 categories/cloud-native.md | 81 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 154 insertions(+), 11 deletions(-)
 create mode 100644 categories/cloud-native.md

diff --git a/README.md b/README.md
index 98ca5b4..acd627e 100644
--- a/README.md
+++ b/README.md
@@ -48,6 +48,7 @@ Open Source Handbook is a resource for people of **all skill and experience leve
  - [Audio visualization](https://github.com/willianjusten/awesome-audio-visualization)
  - [Big data](categories/big-data.md)
  - [Datasets](https://github.com/awesomedata/awesome-public-datasets/blob/master/README.rst)
+ - [Cloud Native & DevOps](categories/cloud-native.md)
  - [Frameworks](https://github.com/topics/framework)
  - [Gaming](https://gist.github.com/roachhd/d579b58148d7e36a6b72)
  - [iOS development](https://github.com/dkhamsing/open-source-ios-apps/blob/master/APPSTORE.md#apple-watch)
@@ -60,6 +61,10 @@ Open Source Handbook is a resource for people of **all skill and experience leve
  - Web development
  - [Front-end tools and resources](https://github.com/MilanAryal/web-development-resources)
  - [GitHub Pages](categories/github-pages.md)
+ - [Node.js](https://github.com/nodejs/node) - JavaScript runtime built on Chrome's V8 engine
+ - [React](https://github.com/facebook/react) - JavaScript library for building user interfaces
+ - [Vue.js](https://github.com/vuejs/vue) - Progressive JavaScript framework
+ - [Angular](https://github.com/angular/angular) - Platform for building mobile and desktop web applications
 
 [return to top](README.md)
 
diff --git a/categories/big-data.md b/categories/big-data.md
index c198e1b..9f19c41 100644
--- a/categories/big-data.md
+++ b/categories/big-data.md
@@ -1,14 +1,71 @@

Open Source Handbook

Big Data Projects

- - [Apache Crunch](https://github.com/apache/crunch)
- - [Apache Hadoop](https://github.com/apache/hadoop)
- - [Apache Kafka](https://github.com/apache/kafka)
- - [Apache Samoa](https://github.com/apache/incubator-samoa)
- - [Apache Storm](https://github.com/apache/storm)
- - [Elasticsearch](https://github.com/elastic/elasticsearch)
- - [HPCC Systems](https://github.com/hpcc-systems/HPCC-Platform)
- - [Lumify](https://github.com/lumifyio/lumify)
- - [MongoDB](https://github.com/mongodb)
- - [RapidMiner](https://github.com/rapidminer)
- - [Talend Open Studio for Big Data](https://github.com/Talend)
+## Data Processing & Analytics
+
+### Stream Processing
+- [Apache Kafka](https://github.com/apache/kafka) - Distributed streaming platform for building real-time data pipelines
+- [Apache Storm](https://github.com/apache/storm) - Real-time computation system for processing streams of data
+- [Apache Flink](https://github.com/apache/flink) - Stream processing framework for distributed, high-performing data streaming applications
+- [Apache Pulsar](https://github.com/apache/pulsar) - Cloud-native, distributed messaging and streaming platform
+
+### Batch Processing
+- [Apache Hadoop](https://github.com/apache/hadoop) - Framework for distributed storage and processing of large datasets
+- [Apache Spark](https://github.com/apache/spark) - Unified analytics engine for large-scale data processing
+- [Apache Crunch](https://github.com/apache/crunch) - Java library for writing MapReduce pipelines
+- [Apache Samoa](https://github.com/apache/incubator-samoa) - Distributed streaming machine learning framework
+
+## Data Storage & Databases
+
+### NoSQL Databases
+- [MongoDB](https://github.com/mongodb/mongo) - Document-oriented NoSQL database
+- [Apache Cassandra](https://github.com/apache/cassandra) - Highly scalable distributed NoSQL database
+- [Redis](https://github.com/redis/redis) - In-memory data structure store
+- [ClickHouse](https://github.com/ClickHouse/ClickHouse) - Column-oriented database for analytics
+
+### Search & Analytics
+- [Elasticsearch](https://github.com/elastic/elasticsearch) - Distributed search and analytics engine
+- [Apache Solr](https://github.com/apache/solr) - Enterprise search platform
+- [OpenSearch](https://github.com/opensearch-project/OpenSearch) - Community-driven search and analytics suite
+
+## Data Tools & Platforms
+
+### Workflow Management
+- [Apache Airflow](https://github.com/apache/airflow) - Platform for developing, scheduling, and monitoring workflows
+- [Prefect](https://github.com/PrefectHQ/prefect) - Modern workflow orchestration framework
+- [Dagster](https://github.com/dagster-io/dagster) - Data orchestrator for machine learning, analytics, and ETL
+
+### Data Integration & ETL
+- [Talend Open Studio for Big Data](https://github.com/Talend/tdi-studio-se) - Open source data integration platform
+- [Apache NiFi](https://github.com/apache/nifi) - System for processing and distributing data
+- [Singer](https://github.com/singer-io) - Open source standard for writing scripts that move data
+
+### Analytics & Visualization
+- [Apache Superset](https://github.com/apache/superset) - Modern data exploration and visualization platform
+- [Metabase](https://github.com/metabase/metabase) - Business intelligence tool for everyone in your company
+- [Grafana](https://github.com/grafana/grafana) - Observability and data visualization platform
+
+## Specialized Platforms
+
+### Machine Learning & AI
+- [MLflow](https://github.com/mlflow/mlflow) - Machine learning lifecycle management
+- [Kubeflow](https://github.com/kubeflow/kubeflow) - Machine learning toolkit for Kubernetes
+- [Apache Mahout](https://github.com/apache/mahout) - Distributed linear algebra framework
+
+### Data Lakes & Warehouses
+- [Apache Iceberg](https://github.com/apache/iceberg) - High-performance format for huge analytic tables
+- [Delta Lake](https://github.com/delta-io/delta) - Storage framework that brings ACID transactions to Apache Spark
+- [Apache Hudi](https://github.com/apache/hudi) - Transactional data lake platform
+
+### Legacy & Specialized
+- [HPCC Systems](https://github.com/hpcc-systems/HPCC-Platform) - Massive parallel-processing computing platform
+- [RapidMiner](https://github.com/rapidminer/rapidminer-studio) - Data science platform for teams
+
+## Getting Started Tips
+
+- **For Beginners**: Start with Apache Spark or Elasticsearch - they have great documentation and active communities
+- **For Data Engineers**: Check out Apache Airflow for workflow management or Apache Kafka for streaming
+- **For Analysts**: Try Apache Superset or Metabase for visualization projects
+- **Good First Issues**: Look for repositories with "good first issue" or "beginner-friendly" labels
+
+[return to top](../README.md)
diff --git a/categories/cloud-native.md b/categories/cloud-native.md
new file mode 100644
index 0000000..43d3861
--- /dev/null
+++ b/categories/cloud-native.md
@@ -0,0 +1,81 @@
+

Open Source Handbook

+

Cloud Native & DevOps Projects

+
+## Container Orchestration
+
+### Kubernetes Ecosystem
+- [Kubernetes](https://github.com/kubernetes/kubernetes) - Container orchestration platform
+- [Helm](https://github.com/helm/helm) - Package manager for Kubernetes
+- [Istio](https://github.com/istio/istio) - Service mesh for microservices
+- [Linkerd](https://github.com/linkerd/linkerd2) - Ultralight service mesh for Kubernetes
+
+### Container Runtimes
+- [Docker](https://github.com/moby/moby) - Container platform (Moby project)
+- [Podman](https://github.com/containers/podman) - Daemonless container engine
+- [containerd](https://github.com/containerd/containerd) - Industry-standard container runtime
+
+## CI/CD & Automation
+
+### Continuous Integration
+- [Jenkins](https://github.com/jenkinsci/jenkins) - Automation server for CI/CD
+- [GitLab CI](https://github.com/gitlabhq/gitlabhq) - Complete DevOps platform
+- [Tekton](https://github.com/tektoncd/pipeline) - Cloud-native CI/CD building blocks
+- [Drone](https://github.com/harness/drone) - Container-native CI/CD platform
+
+### Infrastructure as Code
+- [Terraform](https://github.com/hashicorp/terraform) - Infrastructure provisioning tool
+- [Pulumi](https://github.com/pulumi/pulumi) - Modern infrastructure as code
+- [Ansible](https://github.com/ansible/ansible) - IT automation platform
+- [Chef](https://github.com/chef/chef) - Configuration management tool
+
+## Monitoring & Observability
+
+### Metrics & Monitoring
+- [Prometheus](https://github.com/prometheus/prometheus) - Monitoring system and time series database
+- [Grafana](https://github.com/grafana/grafana) - Observability and data visualization platform
+- [Jaeger](https://github.com/jaegertracing/jaeger) - Distributed tracing platform
+- [OpenTelemetry](https://github.com/open-telemetry) - Observability framework
+
+### Logging
+- [Fluentd](https://github.com/fluent/fluentd) - Data collector for unified logging layer
+- [Logstash](https://github.com/elastic/logstash) - Server-side data processing pipeline
+- [Vector](https://github.com/vectordotdev/vector) - High-performance observability data pipeline
+
+## Service Mesh & Networking
+
+- [Envoy](https://github.com/envoyproxy/envoy) - Cloud-native high-performance edge/middle/service proxy
+- [Consul](https://github.com/hashicorp/consul) - Service networking solution
+- [Traefik](https://github.com/traefik/traefik) - Modern HTTP reverse proxy and load balancer
+- [NGINX](https://github.com/nginx/nginx) - HTTP and reverse proxy server
+
+## Security & Policy
+
+- [Open Policy Agent (OPA)](https://github.com/open-policy-agent/opa) - Policy engine for cloud native environments
+- [Falco](https://github.com/falcosecurity/falco) - Runtime security monitoring
+- [Trivy](https://github.com/aquasecurity/trivy) - Vulnerability scanner for containers
+- [Cert-Manager](https://github.com/cert-manager/cert-manager) - X.509 certificate management for Kubernetes
+
+## Storage & Databases
+
+### Cloud-Native Storage
+- [Rook](https://github.com/rook/rook) - Storage orchestrator for Kubernetes
+- [Longhorn](https://github.com/longhorn/longhorn) - Distributed block storage system for Kubernetes
+- [OpenEBS](https://github.com/openebs/openebs) - Container-attached storage
+
+### Cloud-Native Databases
+- [CockroachDB](https://github.com/cockroachdb/cockroach) - Distributed SQL database
+- [TiDB](https://github.com/pingcap/tidb) - Distributed HTAP database
+- [Vitess](https://github.com/vitessio/vitess) - Database clustering system for horizontal scaling of MySQL
+
+## Getting Started Tips
+
+- **For Beginners**: Start with Docker and Kubernetes basics, then explore Helm for package management
+- **For DevOps Engineers**: Check out Prometheus + Grafana for monitoring or Terraform for infrastructure
+- **For Security**: Try Trivy for vulnerability scanning or Falco for runtime security
+- **Good First Issues**: Many CNCF projects have excellent "good first issue" labels and mentorship programs
+
+## CNCF Landscape
+
+Many of these projects are hosted by the [Cloud Native Computing Foundation (CNCF)](https://landscape.cncf.io/), which provides resources for contributors and maintains a comprehensive landscape of cloud-native technologies.
+
+[return to top](../README.md)
\ No newline at end of file
From e7cde31db914a8a6ee3e8b02c7937d016805a910 Mon Sep 17 00:00:00 2001
From: sivamurthy30
Date: Thu, 9 Oct 2025 01:56:01 +0530
Subject: [PATCH 2/3] Add comprehensive Data Science & Machine Learning category

- Created new data-science.md with ML frameworks, data analysis tools, and visualization libraries
- Organized by categories: ML frameworks, data analysis, Jupyter, MLOps, NLP, and computer vision
- Added beginner-friendly tips and learning resources
- Updated main README to include the new category
---
 README.md                  |  1 +
 categories/data-science.md | 72 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)
 create mode 100644 categories/data-science.md

diff --git a/README.md b/README.md
index acd627e..c395a11 100644
--- a/README.md
+++ b/README.md
@@ -49,6 +49,7 @@ Open Source Handbook is a resource for people of **all skill and experience leve
  - [Big data](categories/big-data.md)
  - [Datasets](https://github.com/awesomedata/awesome-public-datasets/blob/master/README.rst)
  - [Cloud Native & DevOps](categories/cloud-native.md)
+ - [Data Science & Machine Learning](categories/data-science.md)
  - [Frameworks](https://github.com/topics/framework)
  - [Gaming](https://gist.github.com/roachhd/d579b58148d7e36a6b72)
  - [iOS development](https://github.com/dkhamsing/open-source-ios-apps/blob/master/APPSTORE.md#apple-watch)
diff --git a/categories/data-science.md b/categories/data-science.md
new file mode 100644
index 0000000..bb20aeb
--- /dev/null
+++ b/categories/data-science.md
@@ -0,0 +1,72 @@
+

Open Source Handbook

+

Data Science & Machine Learning Projects

+
+## Machine Learning Frameworks
+
+### Python-Based
+- [scikit-learn](https://github.com/scikit-learn/scikit-learn) - Machine learning library for Python
+- [TensorFlow](https://github.com/tensorflow/tensorflow) - End-to-end open source platform for machine learning
+- [PyTorch](https://github.com/pytorch/pytorch) - Tensors and Dynamic neural networks in Python
+- [Keras](https://github.com/keras-team/keras) - Deep learning API written in Python
+- [XGBoost](https://github.com/dmlc/xgboost) - Optimized distributed gradient boosting library
+
+### Multi-Language
+- [Apache Spark MLlib](https://github.com/apache/spark) - Scalable machine learning library
+- [H2O](https://github.com/h2oai/h2o-3) - Fast scalable machine learning platform
+- [Weka](https://github.com/Waikato/weka-3.8) - Collection of machine learning algorithms
+
+## Data Analysis & Visualization
+
+### Python Libraries
+- [Pandas](https://github.com/pandas-dev/pandas) - Powerful data structures for data analysis
+- [NumPy](https://github.com/numpy/numpy) - Fundamental package for scientific computing
+- [Matplotlib](https://github.com/matplotlib/matplotlib) - Comprehensive library for creating static, animated, and interactive visualizations
+- [Seaborn](https://github.com/mwaskom/seaborn) - Statistical data visualization library
+- [Plotly](https://github.com/plotly/plotly.py) - Interactive graphing library
+
+### R Ecosystem
+- [R](https://github.com/wch/r-source) - Language and environment for statistical computing
+- [Shiny](https://github.com/rstudio/shiny) - Web application framework for R
+- [ggplot2](https://github.com/tidyverse/ggplot2) - Grammar of graphics for R
+
+## Jupyter & Notebooks
+
+- [Jupyter](https://github.com/jupyter/jupyter) - Interactive computing across dozens of programming languages
+- [JupyterLab](https://github.com/jupyterlab/jupyterlab) - Next-generation web-based user interface for Project Jupyter
+- [Voilà](https://github.com/voila-dashboards/voila) - Rendering of live Jupyter notebooks with interactive widgets
+
+## MLOps & Model Management
+
+- [MLflow](https://github.com/mlflow/mlflow) - Machine learning lifecycle management
+- [DVC](https://github.com/iterative/dvc) - Data version control system for machine learning projects
+- [Weights & Biases](https://github.com/wandb/wandb) - Developer tools for machine learning
+- [Kubeflow](https://github.com/kubeflow/kubeflow) - Machine learning toolkit for Kubernetes
+
+## Natural Language Processing
+
+- [spaCy](https://github.com/explosion/spaCy) - Industrial-strength Natural Language Processing
+- [NLTK](https://github.com/nltk/nltk) - Natural Language Toolkit
+- [Transformers](https://github.com/huggingface/transformers) - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
+- [Gensim](https://github.com/RaRe-Technologies/gensim) - Topic modeling and document similarity analysis
+
+## Computer Vision
+
+- [OpenCV](https://github.com/opencv/opencv) - Open source computer vision and machine learning software library
+- [Pillow](https://github.com/python-pillow/Pillow) - Python Imaging Library
+- [ImageIO](https://github.com/imageio/imageio) - Python library for reading and writing image data
+
+## Getting Started Tips
+
+- **For Beginners**: Start with scikit-learn and Pandas - excellent documentation and community
+- **For Deep Learning**: Try PyTorch or TensorFlow tutorials
+- **For Visualization**: Begin with Matplotlib or Plotly
+- **For NLP**: spaCy has great beginner-friendly examples
+- **Good First Issues**: Look for "good first issue" labels in scikit-learn, Pandas, or Matplotlib
+
+## Learning Resources
+
+- [Kaggle Learn](https://www.kaggle.com/learn) - Free micro-courses in data science
+- [Papers With Code](https://github.com/paperswithcode/paperswithcode) - Machine learning papers with code implementations
+- [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning) - Curated list of ML frameworks and libraries
+
+[return to top](../README.md)
\ No newline at end of file
From 22bcf676ff3fc777a586cbae96c3f0e6bea02022 Mon Sep 17 00:00:00 2001
From: sivamurthy30
Date: Thu, 9 Oct 2025 02:02:48 +0530
Subject: [PATCH 3/3] Enhance README with improved guidance and resources

- Added comprehensive Quick Start Guide for new contributors with 5-step process
- Included pro tips for successful contributions
- Enhanced Finding Projects section with First Timers Only, Good First Issues, and CodeTriage
- Expanded trending languages to include Go, Java, Rust, and TypeScript
- Added major internship programs (GSoC, Outreachy, MLH Fellowship)
- Updated maintainers section to include current contributor
- Improved introduction text for better clarity
---
 README.md | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c395a11..aea0719 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@
 
 ## Who this is for
 
-Open Source Handbook is a resource for people of **all skill and experience levels** who want to learn about open source and save time finding projects!
+Open Source Handbook is a comprehensive resource for people of **all skill and experience levels** who want to learn about open source, contribute to meaningful projects, and save time finding the right repositories to get started with!
 
 ## Intro
 
@@ -35,6 +35,9 @@ Open Source Handbook is a resource for people of **all skill and experience leve
  - [Beginner projects](https://github.com/showcases/great-for-new-contributors)
  - [Beginner projects (more!)](https://github.com/MunGell/awesome-for-beginners)
  - [Searching GitHub](https://help.github.com/articles/finding-open-source-projects-on-github/)
+ - [First Timers Only](https://www.firsttimersonly.com/) - Friendly open source projects for new contributors
+ - [Good First Issues](https://goodfirstissues.com/) - Find projects with good first issues
+ - [CodeTriage](https://www.codetriage.com/) - Help your favorite open source projects
 
 ## Collections
 
@@ -106,16 +109,38 @@ Open Source Handbook is a resource for people of **all skill and experience leve
  - [All Languages](https://github.com/trending)
  - [C++](https://github.com/trending/c++)
  - [CSS](https://github.com/trending/css)
+ - [Go](https://github.com/trending/go)
  - [HTML](https://github.com/trending/html)
+ - [Java](https://github.com/trending/java)
  - [JavaScript](https://github.com/trending/javascript)
  - [Python](https://github.com/trending/python)
  - [Ruby](https://github.com/trending/ruby)
+ - [Rust](https://github.com/trending/rust)
  - [Swift 📱](https://github.com/trending/swift)
+ - [TypeScript](https://github.com/trending/typescript)
  - [Unknown languages](https://github.com/trending/unknown)
 
+## Quick Start Guide for New Contributors
+
+### 🚀 Your First Contribution in 5 Steps
+1. **Find a project** - Use [Good First Issues](https://goodfirstissues.com/) or browse our categories
+2. **Read the docs** - Check README.md and CONTRIBUTING.md files
+3. **Set up locally** - Fork, clone, and follow setup instructions
+4. **Make your change** - Start small with documentation or bug fixes
+5. **Submit a PR** - Follow the project's contribution guidelines
+
+### 💡 Pro Tips for Success
+- **Start small** - Documentation improvements and bug fixes are great first contributions
+- **Be patient** - Maintainers are volunteers; reviews may take time
+- **Ask questions** - Use issues or discussions to clarify requirements
+- **Follow conventions** - Match the project's coding style and commit message format
+
 ## Open Source Internships, Competitions, and Careers
 
  - [Internships and Competitions](https://github.com/tapaswenipathak/Open-Source-Programs)
+ - [Google Summer of Code](https://summerofcode.withgoogle.com/) - Annual mentoring program for new open source contributors
+ - [Outreachy](https://www.outreachy.org/) - Internships for underrepresented groups
+ - [MLH Fellowship](https://fellowship.mlh.io/) - Remote internship program
  - [Careers](https://github.com/t9tio/open-source-jobs)
 
 [return to top](README.md)
@@ -131,5 +156,6 @@ We would love for you to contribute! Please [fork and make a pull request](https
 ## Maintainers
 
 - [Shaina Krumme](https://github.com/shainakrumme)
+- [Siva Murthy](https://github.com/sivamurthy30)
 
 [return to top](README.md)