diff --git a/README.md b/README.md index 4079f81..3ebe48e 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,7 @@ # CloudDojo CLI ⚡ > **Learn DevOps troubleshooting through gamified, story-driven challenges** -> *“The world isn’t perfect. But it’s there for us, doing the best it can…”* — **Madoka Kaname** - +> _“The world isn’t perfect. But it’s there for us, doing the best it can…”_ — **Madoka Kaname** [![PyPI version](https://badge.fury.io/py/clouddojo.svg)](https://badge.fury.io/py/clouddojo) [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/) @@ -23,8 +22,10 @@ clouddojo setup clouddojo dojo ``` + ## 🌟 What is CloudDojo? -CloudDojo is your gamified dojo where you learn DevOps troubleshooting through stories, sweat, and a dash of *anime spirit*. It’s like leveling up in your favorite RPG — but instead of dragons, you fight container demons. + +CloudDojo is your gamified dojo where you learn DevOps troubleshooting through stories, sweat, and a dash of _anime spirit_. It’s like leveling up in your favorite RPG — but instead of dragons, you fight container demons. 
- **🎮 Story-driven scenarios** — Real company drama, but without the annoying meetings - **🛤️ Structured learning paths** — Beginner → Intermediate → Pro-level boss fights @@ -34,13 +35,11 @@ CloudDojo is your gamified dojo where you learn DevOps troubleshooting through s ## 🎓 Learning Paths -| Path | Level | Focus | Scenarios | -|------|-------|-------|-----------| -| 🐋 **Container Fundamentals** | Beginner | Docker, nginx, configs | 3 | -| ⚓ **Kubernetes Warrior** | Intermediate | Pods, services, networking | 3 | -| 🔥 **Production SRE** | Advanced | System admin, automation | 2 | - - +| Path | Level | Focus | Scenarios | +| ----------------------------- | ------------ | -------------------------- | --------- | +| 🐋 **Container Fundamentals** | Beginner | Docker, nginx, configs | 3 | +| ⚓ **Kubernetes Warrior** | Intermediate | Pods, services, networking | 3 | +| 🔥 **Production SRE** | Advanced | System admin, automation | 2 | ## 🎯 What You'll Learn @@ -48,11 +47,12 @@ CloudDojo is your gamified dojo where you learn DevOps troubleshooting through s **Kubernetes debugging** — Because clusters love throwing tantrums -**Production scenarios** — Real-world chaos without the screaming bosses (hopefully) +**Production scenarios** — Real-world chaos without the screaming bosses (hopefully) **Best practices** — The ancient scrolls of DevOps wisdom, passed down so you don’t mess up ## 🤝 Contributing (Join the guild) + Think you can add cool new scenarios or fix stuff? Sweet! Here’s your questline: **Start here**: CONTRIBUTING.md @@ -63,10 +63,18 @@ Think you can add cool new scenarios or fix stuff? Sweet! Here’s your questlin **Submit your PR**: Await our wise elders’ approval +### 🐞 Release & Branching Policy Update + +All future **code changes, bug fixes, or feature updates** must be committed to the **latest release branch** (e.g., `release_v_0.x.x`). 
Once tested and verified, these changes will be **merged into `master`**, which always represents the **latest stable version**. After successful testing, a **new release version** will be published on **PyPI**. + +> ⚠️ Do not push directly to `master`. All merges must go through the release branch. + ## 🤖 AI-Assisted Contributions Welcome! + AI sidekicks like ChatGPT or Copilot are totally legit — just use our AI PR Template so we know you didn’t summon a demon instead. ## 📚 Documentation Scrolls + **CONTRIBUTING.md** — How to add epic scenarios **ARCHITECTURE.md** — The bones of CloudDojo @@ -76,6 +84,7 @@ AI sidekicks like ChatGPT or Copilot are totally legit — just use our AI PR Te **SECURITY.md** — Because safety first, ninja style ## 📋 Prerequisites (Your gear before the battle) + **Python 3.8+** **Docker** (auto-installed by setup, if you don’t want to wait forever) @@ -85,8 +94,7 @@ AI sidekicks like ChatGPT or Copilot are totally legit — just use our AI PR Te **NOTE:** Auto-install might take a while (because patience is a virtue nobody asked for). We highly recommend installing Docker, Kubernetes, and Minikube yourself first — then run the auto-setup to verify everything didn’t self-destruct. -We’re *“working hard”* on the setup script… but if you want to help, contributions are welcome. No pressure. Seriously. - +We’re _“working hard”_ on the setup script… but if you want to help, contributions are welcome. No pressure. Seriously. ## 🆘 Need Help? (Because even heroes ask for directions) @@ -106,6 +114,8 @@ We’re *“working hard”* on the setup script… but if you want to help, con If CloudDojo helped you level up your DevOps skills, please ⭐ **star the repository**! -> *“You should enjoy the little detours. To the fullest. Because that's where you'll find the things more important than what you want.” — Gintama* +> _“You should enjoy the little detours. To the fullest. 
Because that's where you'll find the things more important than what you want.” — Gintama_ + --- + **Ready to become a DevOps master?** `pip install clouddojo` 🚀 diff --git a/clouddojo/cli.py b/clouddojo/cli.py index d595e04..cc580b4 100644 --- a/clouddojo/cli.py +++ b/clouddojo/cli.py @@ -34,6 +34,11 @@ import subprocess import platform +# --- Runtime OS Check --- +# --- Block Windows (except WSL) --- +if sys.platform.startswith("win") and "microsoft" not in platform.release().lower(): + sys.exit("\nERROR: CloudDojo CLI does not support Windows. Please use macOS, Linux, or WSL.\n") + from clouddojo.base_scenario import BaseScenario from clouddojo.metadata_registry import registry from clouddojo.progress import ProgressTracker diff --git a/clouddojo/scenarios/no-space-left/Dockerfile b/clouddojo/scenarios/no-space-left/Dockerfile new file mode 100644 index 0000000..28d3b35 --- /dev/null +++ b/clouddojo/scenarios/no-space-left/Dockerfile @@ -0,0 +1,25 @@ +# scenarios/no-space-left/Dockerfile +FROM alpine:3.18 + +# Install basic tools including e2fsprogs for mkfs.ext4 +RUN apk add --no-cache \ + bash \ + coreutils \ + findutils \ + procps \ + e2fsprogs + +# Create a small filesystem with limited inodes (1000 inodes total) +RUN dd if=/dev/zero of=/tmp/small_fs.img bs=1M count=10 && \ + mkfs.ext4 -N 1000 /tmp/small_fs.img && \ + mkdir -p /mnt/small_fs + +# Create the inode-filling script +COPY fill_inodes.sh /usr/local/bin/ +RUN chmod +x /usr/local/bin/fill_inodes.sh + +WORKDIR /app + +# Expose nothing - this is a system troubleshooting scenario +# Start with bash and keep container running +CMD ["/bin/bash", "-c", "tail -f /dev/null"] \ No newline at end of file diff --git a/clouddojo/scenarios/no-space-left/__init__.py b/clouddojo/scenarios/no-space-left/__init__.py new file mode 100644 index 0000000..47e3ae9 --- /dev/null +++ b/clouddojo/scenarios/no-space-left/__init__.py @@ -0,0 +1,232 @@ +#!/usr/bin/env python3 +""" +No Space Left Scenario - Inode Exhaustion 
Troubleshooting
+"""
+
+import docker
+from pathlib import Path
+from typing import Dict, Any, List, Optional
+from clouddojo.base_scenario import BaseScenario
+from clouddojo.scenario_metadata import ScenarioMetadata, StoryContext, Hint, CompanyType
+
+class NoSpaceLeftMetadata(ScenarioMetadata):
+    """Metadata for no-space-left scenario"""
+
+    def get_story_context(self) -> StoryContext:
+        return StoryContext(
+            company_name="DataKai INC",
+            company_type=CompanyType.STARTUP,
+            your_role="Site Reliability Engineer",
+            situation="Black Friday sale is ongoing. Suddenly you see no logs on Grafana dashboard. The monitoring service can't create new log files in /mnt/small_fs/ despite having disk space available. The error says 'No space left on device' but df -h shows space available.",
+            urgency="very-critical",
+            stakeholders=["Engineering Team", "SRE Team", "Business Operations"],
+            business_impact="No monitoring data during peak sales period. Revenue at risk.",
+            success_criteria="Monitoring service can create files in /mnt/small_fs/"
+        )
+
+    def get_hints(self) -> List[Hint]:
+        return [
+            Hint(1, "Check All Filesystems",
+                 "Check disk space on all mounted filesystems.",
+                 "df -h"),
+
+            Hint(2, "Focus on the Problem Filesystem",
+                 "Look at /mnt/small_fs - notice the high inode usage. A full filesystem does not always show up in df -h.",
+                 "df -i /mnt/small_fs"),
+
+            Hint(3, "Find the Culprit Directory",
+                 "Look for directories with many small files in the problematic filesystem.",
+                 "ls -la /mnt/small_fs/ && find /mnt/small_fs/logs -type f | wc -l"),
+
+            Hint(4, "Clean Up Files",
+                 "Remove the excessive log files to free up inodes.",
+                 "rm -rf /mnt/small_fs/logs/*")
+        ]
+
+    def get_learning_path(self) -> str:
+        # This scenario runs in Docker but belongs to the production-sre path. It could be implemented without
+        # Docker, but that would mean modifying the host machine, so Docker was chosen.
+        return "production-sre"
+
+    def get_completion_story(self, time_taken: int) -> str:
+        time_str = f"{time_taken // 60}m {time_taken % 60}s" if time_taken > 0 else "record time"
+        return f"Excellent work! You've successfully resolved the inode exhaustion issue on /mnt/small_fs/. The monitoring service can now create log files again and Black Friday metrics are flowing. You learned that 'No space left on device' can mean inode exhaustion, not just disk space! Resolution time: {time_str}"
+
+class NoSpaceLeft(BaseScenario):
+    """No space left on device - inode exhaustion scenario"""
+
+    def __init__(self, name: str):
+        super().__init__(name)
+        self._metadata = NoSpaceLeftMetadata()
+        self.docker_client = docker.from_env()
+        self.container_name = f"clouddojo-{name}"
+
+    def get_metadata(self) -> Optional[ScenarioMetadata]:
+        return self._metadata
+
+    @property
+    def description(self) -> str:
+        return "Troubleshoot inode exhaustion causing 'No space left on device' errors despite available disk space"
+
+    @property
+    def difficulty(self) -> str:
+        return "intermediate"
+
+    @property
+    def technologies(self) -> list:
+        return ["docker", "linux"]
+
+    def start(self) -> Dict[str, Any]:
+        """Start the scenario - set up the broken environment"""
+        try:
+            self.stop()
+
+            # Build image from scenario directory
+            image_name = "clouddojo-no-space-left"
+            if not self._image_exists(image_name):
+                scenario_dir = Path(__file__).parent
+                self.docker_client.images.build(
+                    path=str(scenario_dir),
+                    tag=image_name,
+                    rm=True,
+                    forcerm=True
+                )
+
+            # Run container with privileged mode for mounting
+            container = self.docker_client.containers.run(
+                image_name,
+                name=self.container_name,
+                detach=True,
+                tty=True,
+                privileged=True
+            )
+
+            # Execute the inode filling script with privileged mode for mounting
+            container.exec_run("/usr/local/bin/fill_inodes.sh", privileged=True)
+
+            connection_info = f"""Container: {self.container_name}
+Access: docker exec -it {self.container_name} bash"""
+
+            instructions = """🔧 TROUBLESHOOTING SCENARIO: No Space Left on Device
+
+📋 SITUATION:
+The monitoring service can't create new log files in /mnt/small_fs/. You're getting "No space left on device" errors,
+but when you check disk space with 'df -h', there seems to be plenty available. This is confusing!
+
+🎯 YOUR MISSION:
+1. Check disk space usage on ALL filesystems
+2. Try creating a file: touch /mnt/small_fs/test.txt
+3. Investigate why files can't be created
+4. Find and clean up the problematic files
+5. Verify you can create files again
+
+🔍 TEST THE PROBLEM: touch /mnt/small_fs/newfile.txt"""
+
+            return {
+                "success": True,
+                "connection_info": connection_info,
+                "instructions": instructions
+            }
+
+        except Exception as e:
+            return {"success": False, "error": f"Failed to start: {str(e)}"}
+
+    def stop(self) -> bool:
+        """Stop and cleanup the scenario"""
+        try:
+            # Stop and remove container
+            try:
+                container = self.docker_client.containers.get(self.container_name)
+                container.stop()
+                container.remove()
+            except docker.errors.NotFound:
+                pass
+
+            # No temporary files to clean up
+
+            return True
+        except Exception:
+            return False
+
+    def status(self) -> Dict[str, Any]:
+        """Get current scenario status"""
+        try:
+            container = self.docker_client.containers.get(self.container_name)
+            if container.status == "running":
+                # Check inode usage
+                result = container.exec_run("df -i")
+                details = f"Container Status: Running\nInode Status:\n{result.output.decode()}"
+                return {"running": True, "details": details}
+            else:
+                return {"running": False, "details": "Container not running"}
+        except docker.errors.NotFound:
+            return {"running": False, "details": "Container not found"}
+        except Exception as e:
+            return {"running": False, "details": f"Error: {str(e)}"}
+
+    def check(self) -> Dict[str, Any]:
+        """Check if the scenario has been solved"""
+        try:
+            container = self.docker_client.containers.get(self.container_name)
+
+            # Check if container is running
+            container_running = container.status == "running"
+
+            # Check if we can create files (inodes available)
+            can_create_files = False
+            inode_usage_ok = False
+
+            if container_running:
+                # Try to create a test file on the small filesystem
+                result = container.exec_run("touch /mnt/small_fs/test_file.txt")
+                can_create_files = result.exit_code == 0
+
+
+            checks = [
+                ("Container is running", container_running),
+                ("Can create new files", can_create_files),
+            ]
+
+            all_passed = all(passed for _, passed in checks)
+
+            feedback_lines = []
+            for check_name, passed in checks:
+                status = " PASS" if passed else " FAIL"
+                feedback_lines.append(f"{status} {check_name}")
+
+            if all_passed:
+                return {
+                    "passed": True,
+                    "feedback": "\n".join(feedback_lines) + "\n\n🎉 Scenario completed successfully! You've resolved the inode exhaustion issue!"
+                }
+            else:
+                return {
+                    "passed": False,
+                    "feedback": "\n".join(feedback_lines),
+                    "hints": "Check inode usage with 'df -i /mnt/small_fs' and look in /mnt/small_fs/logs/ for many small files"
+                }
+
+        except docker.errors.NotFound:
+            return {"passed": False, "feedback": " Container not found"}
+        except Exception as e:
+            return {"passed": False, "feedback": f" Error: {str(e)}"}
+
+    def reset(self) -> bool:
+        """Reset scenario to broken state"""
+        try:
+            # TODO: Reset to broken state
+            # Usually: stop() then start()
+            return self.stop() and self.start().get("success", False)
+        except Exception:
+            return False
+
+    def _image_exists(self, image_name: str) -> bool:
+        """Check if Docker image exists"""
+        try:
+            self.docker_client.images.get(image_name)
+            return True
+        except docker.errors.ImageNotFound:
+            return False
+
+# REQUIRED: Export the scenario class
+scenario_class = NoSpaceLeft
\ No newline at end of file
diff --git a/clouddojo/scenarios/no-space-left/fill_inodes.sh b/clouddojo/scenarios/no-space-left/fill_inodes.sh
new file mode 100644
index 0000000..ccad896
--- /dev/null
+++ b/clouddojo/scenarios/no-space-left/fill_inodes.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+# Fill up inodes using a small filesystem
+# This simulates a runaway logging process
+
+echo "=== Inode Exhaustion Simulation ==="
+echo "Setting up small filesystem with limited inodes..."
+
+# Check if filesystem is already mounted, if not mount it
+if ! mountpoint -q /mnt/small_fs; then
+    mount -o loop /tmp/small_fs.img /mnt/small_fs
+fi
+
+echo "Initial inode status:"
+df -i /mnt/small_fs
+
+echo ""
+echo "Simulating runaway log file creation..."
+mkdir -p /mnt/small_fs/logs
+cd /mnt/small_fs/logs
+
+# Create 1000 files to exhaust inodes
+echo "Creating 1000 log files to nearly exhaust inodes..."
+for i in {1..1000}; do
+    touch "app_${i}.log"
+done
diff --git a/tests/test_cli_os_check.py b/tests/test_cli_os_check.py
new file mode 100644
index 0000000..45f5755
--- /dev/null
+++ b/tests/test_cli_os_check.py
@@ -0,0 +1,29 @@
+import sys
+import importlib
+import pytest
+from unittest.mock import patch
+
+@pytest.mark.parametrize(
+    "platform_str, release_str, should_exit",
+    [
+        ("linux", "5.15.0-73-generic", False),
+        ("darwin", "22.6.0", False),
+        # native Windows -> should exit
+        ("win32", "11", True),
+        # WSL -> should pass
+        ("win32", "5.15.90.1-microsoft-standard-WSL2", False),
+    ]
+)
+def test_cli_os_check(platform_str, release_str, should_exit):
+    """Test the top-level OS check in clouddojo/cli.py"""
+
+    with patch("sys.platform", platform_str), patch("platform.release", return_value=release_str):
+        if "clouddojo.cli" in sys.modules:
+            del sys.modules["clouddojo.cli"]
+
+        if should_exit:
+            with pytest.raises(SystemExit) as excinfo:
+                importlib.import_module("clouddojo.cli")
+            assert "CloudDojo CLI does not support Windows. Please use macOS, Linux, or WSL." in str(excinfo.value)
+        else:
+            importlib.import_module("clouddojo.cli")
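
Note on the OS check: the test above has to delete `clouddojo.cli` from `sys.modules` and re-import it because the check runs at import time. One possible alternative, sketched here under assumptions, is to move the condition into a pure function so it can be tested without reloading anything. The names `is_unsupported_windows` and `ensure_supported_os` are hypothetical and not part of this diff; the condition and the error message are copied from the `cli.py` hunk.

```python
import platform
import sys


def is_unsupported_windows(platform_str: str, release_str: str) -> bool:
    """Return True for native Windows, False for Linux, macOS, and WSL.

    Hypothetical helper: it mirrors the condition used by the import-time
    check in clouddojo/cli.py, but takes its inputs as arguments.
    """
    return platform_str.startswith("win") and "microsoft" not in release_str.lower()


def ensure_supported_os() -> None:
    """Exit with the same message as the import-time check (hypothetical wrapper)."""
    if is_unsupported_windows(sys.platform, platform.release()):
        sys.exit("\nERROR: CloudDojo CLI does not support Windows. Please use macOS, Linux, or WSL.\n")


# The same four cases as the parametrized test, checked without patching or re-importing.
CASES = [
    ("linux", "5.15.0-73-generic", False),
    ("darwin", "22.6.0", False),
    ("win32", "11", True),
    ("win32", "5.15.90.1-microsoft-standard-WSL2", False),
]

for platform_str, release_str, expected in CASES:
    assert is_unsupported_windows(platform_str, release_str) is expected
```

With a helper like this, `cli.py` would only need to call `ensure_supported_os()` once at startup, and the existing import-based test could stay as an end-to-end check.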