waggle-sensor · saumya-pailwan · Aug 21, 2025 · Sep 11, 2025
diff --git a/Dockerfile b/Dockerfile
@@ -4,6 +4,8 @@ FROM nvcr.io/nvidia/pytorch:24.06-py3
 WORKDIR /app
 COPY . ./
 
+# PLANTNET_API_KEY is a secret: supply it at run time (docker run -e PLANTNET_API_KEY=...) rather than baking it into the image
+
 # 2. Install dependencies
 RUN pip install --upgrade pip
 RUN pip install -r requirements.txt
@@ -18,18 +20,8 @@ RUN python -c "import cv2; print('OpenCV imported successfully')"
 
 # 3. Create a local cache directory for the model
 RUN mkdir -p /hf_cache/microsoft/Florence-2-base
-# RUN mkdir -p /hf_cache/microsoft/Florence-2-large
-
-# 4. Download the model files info /hf_cache/microsoft/Florence-2-large
-RUN huggingface-cli download \
-	microsoft/Florence-2-large \
-	--repo-type model \
-	--cache-dir /hf_cache \
-	--local-dir /hf_cache/microsoft/Florence-2-large \
-	--resume \
-	--force  # <-- ensures we overwrite any existing files or resume a download
 
-# 5. Download the model files info /hf_cache/microsoft/Florence-2-base
+# 4. Download the model files into /hf_cache/microsoft/Florence-2-base
 RUN huggingface-cli download \
 	microsoft/Florence-2-base \
 	--repo-type model \
@@ -38,7 +30,7 @@ RUN huggingface-cli download \
 	--resume \
 	--force  # <-- ensures we overwrite any existing files or resume a download
 
-# 6. Set the environment variables for offline mode
+# 5. Set the environment variables for offline mode
 ENV HF_HOME=/hf_cache
 ENV TRANSFORMERS_OFFLINE=1
 ENV HF_DATASETS_OFFLINE=1

diff --git a/README.md b/README.md
@@ -1,26 +1,58 @@
 # PTZ APP
 
-This is an application for sending images of specific objects autonomously using PTZ cameras.
+This is an autonomous PTZ camera application that uses an object detection model (YOLO) or a vision-language model (Florence-2) to detect, frame, and analyze objects of interest in real time. It is designed for deployment on edge computing nodes within the Sage project.
+
+---
+
+## What’s New
+
+- **Florence-2 Enhancements**
+  - Automatic scene context generation at the start of scans.
+  - Support for manual prompt injection with `--prompt_prefix` to guide detections.
+
+- **Enhanced Data Logging**
+  - Publishes scene captions (Florence-2) and raw detection data (label, confidence, position) with each scan.
+  - Improved debug-level logging.
+
+- **Pipeline Updates**
+  - Added Scene Analysis step (Florence-2 only).
+  - Data Publishing step now explicitly includes metadata publishing.
+
+---
 
 ## How It Works
 
 The algorithm performs the following steps:
 
-1. **Initialization**: Sets up object detection model (YOLO or Florence) based on user parameters.
+1. **Initialization**  
+   Sets up the object detection model (`YOLO` or `Florence-2`) based on user parameters.
 
-2. **Area Scanning**: Systematically scans the environment by rotating the PTZ camera in pan steps (default: 15 degrees) through a full 360° rotation at the specified tilt and zoom level.
+2. **Scene Analysis (Florence-2 Only)**  
+   At the start of a scan, Florence-2 can automatically generate a detailed text caption of the current scene to use as a dynamic, contextual prompt.  
 
-3. **Object Detection**: At each camera position, captures an image and runs object detection to identify specified objects (e.g., person, car, dog).
+3. **Contextual Area Scanning**  
+   Systematically scans the environment by rotating the PTZ camera in pan steps (default: 15°) through a full 360° rotation at the specified tilt and zoom level.  
+   When using Florence-2, the generated (or user-specified) context is incorporated to improve detection relevance.
 
-4. **Filtering**: Filters detections based on confidence threshold (default: 0.1).
+4. **Object Detection**  
+   At each camera position, captures an image and runs object detection to identify specified objects (e.g., person, car, dog).
 
-5. **Object Tracking**: When an object of interest is detected with sufficient confidence, the algorithm:
+5. **Filtering**  
+   Filters detections based on a confidence threshold (default: 0.1).
+
+6. **Object Tracking**  
+   When an object of interest is detected with sufficient confidence, the algorithm:
    - Centers the camera on the detected object
    - Adjusts zoom to maximize the object in the frame
 
-6. **Image Publishing**: Saves and publishes the optimized images of detected objects.
+7. **Data Publishing**  
+   Saves and publishes the optimized images of detected objects.  
+   Publishes rich metadata—including scene captions (Florence-2), raw detection data (labels, confidence, position), and logging outputs—to the Sage data portal.
+
+8. **Iteration**  
+   Repeats the process for the specified number of iterations, with a configurable delay between scans.
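The scan and filter steps above can be sketched in Python. This is an illustration under assumptions, not the app's code: `Detection`, `pan_positions`, and `filter_detections` are hypothetical names; the 15° step and 0.1 threshold mirror the documented defaults; and matching against the requested `--objects` labels (with `*` as a wildcard) is an assumed reading of how the object list is applied.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Detection:
    # Hypothetical detection record: label, confidence in [0, 1], pixel bbox (x1, y1, x2, y2).
    label: str
    confidence: float
    bbox: Tuple[float, float, float, float]


def pan_positions(step_deg: float = 15.0) -> List[float]:
    """Pan angles for one full 360° sweep (0, 15, ..., 345 with the default step)."""
    if step_deg <= 0:
        raise ValueError("step_deg must be positive")
    count = math.ceil(360.0 / step_deg)
    # Multiply instead of accumulating, so float error does not build up across steps.
    return [i * step_deg for i in range(count)]


def filter_detections(
    detections: Iterable[Detection], objects: Iterable[str], threshold: float = 0.1
) -> List[Detection]:
    """Keep detections of a requested label whose confidence is >= threshold.

    A "*" entry in objects accepts every label (assumed wildcard behaviour).
    """
    wanted = {o.strip().lower() for o in objects}
    return [
        d
        for d in detections
        if d.confidence >= threshold and ("*" in wanted or d.label.lower() in wanted)
    ]
```

For example, `filter_detections(dets, "person,car".split(","))` keeps only person and car detections at or above 0.1.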
 
-7. **Iteration**: Repeats the process for the specified number of iterations with configurable delay between scans.
+---
 
 ## Build the container
 
@@ -51,6 +83,24 @@ sudo docker run -it --rm your_docker_hub_user_name/ptzapp:latest -ki -it 5 -un c
 ```bash
 sudo docker run --gpus all -it --rm your_docker_hub_user_name/ptzapp:latest --model Florence-base --iterations 5 --username username --password 'password' --cameraip 130.202.23.92 --objects 'person,car'
 ```
+## Advanced Usage with Florence-2
+
+### Fully Autonomous Mode (Automatic Context)
+
+When using Florence-2 without a manual prompt, the application will automatically analyze the scene to generate its own context before searching for objects:
+
+```bash
+sudo docker run --gpus all -it --rm your_docker_hub_user_name/ptzapp:latest --model Florence-base --objects 'animal,bird,deer' --username <user> --password '<pass>' --cameraip <ip>
+```
+
+### Manual Context Prompt
+
+You can provide your own context to the model using the `--prompt_prefix` argument to guide detections:
+
+```bash
+sudo docker run --gpus all -it --rm your_docker_hub_user_name/ptzapp:latest --model Florence-base --objects 'animal,bird' --prompt_prefix 'A photo from a trail camera in a wilderness environment' --username <user> --password '<pass>' --cameraip <ip>
+```
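The prompt rule described in this section (a non-empty `--prompt_prefix` is used as the context; an empty one triggers an automatic scene caption) reduces to a small decision. In this sketch `choose_context` and `caption_scene` are hypothetical names; `caption_scene` stands in for the Florence-2 captioning call, and how the app then combines the context with the object list is not shown in this PR.

```python
from typing import Callable


def choose_context(prompt_prefix: str, caption_scene: Callable[[], str]) -> str:
    """Return the Florence-2 context for this scan.

    A non-empty --prompt_prefix wins; otherwise fall back to an automatic scene caption.
    caption_scene (hypothetical) is only called when it is actually needed.
    """
    prefix = prompt_prefix.strip()
    if prefix:
        return prefix
    return caption_scene().strip()
```

Passing the captioner as a callable keeps the (comparatively slow) captioning pass from running when the user already supplied a context.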
+
 
 ## Using Different Object Detection Models
 
@@ -94,4 +144,91 @@ sudo docker run --gpus all -it --rm your_docker_hub_user_name/ptzapp:latest --mo
 | `--zoom` | `-zm` | Zoom value | 1 |
 | `--confidence` | `-conf` | Confidence threshold (0-1) | 0.1 |
 | `--iterdelay` | `-id` | Minimum delay in seconds between iterations | 60.0 |
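+| `--prompt_prefix` |  | Manual text prompt for Florence-2 context (empty = automatic scene caption) | "" |
+| `--species_zoom` |  | Extra relative zoom step used for species detail | 10 |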
 | `--debug` | | Enable debug level logging | False |
+
+## Environment Variables
+1. `PLANTNET_API_KEY` — for PlantNet API calls
+2. `BLUR_MIN` — Laplacian variance threshold to trigger a focus retry (default ~120)
+3. `SPECIES_MIN_SCORE` — minimum PlantNet confidence to treat as “confident” (e.g., 0.25)
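A minimal sketch of reading the variables above in Python. `RuntimeConfig` and `load_config` are hypothetical names. The `BLUR_MIN` fallback of 120 mirrors the "default ~120" note, while using 0.25 as the `SPECIES_MIN_SCORE` fallback is an assumption (the README only gives it as an example). Treating a missing key as "skip species identification" rather than a hard error is also an assumption, so that non-plant runs still work.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeConfig:
    plantnet_api_key: Optional[str]  # None: species identification should be skipped
    blur_min: float
    species_min_score: float


def load_config(env: Optional[Mapping[str, str]] = None) -> RuntimeConfig:
    """Read the PlantNet-related environment variables, falling back to assumed defaults.

    120 mirrors the README's "default ~120" for BLUR_MIN; 0.25 is the README's example
    value for SPECIES_MIN_SCORE and is used here as a fallback by assumption.
    """
    env = os.environ if env is None else env
    key = env.get("PLANTNET_API_KEY", "").strip() or None
    return RuntimeConfig(
        plantnet_api_key=key,
        blur_min=float(env.get("BLUR_MIN", "120")),
        species_min_score=float(env.get("SPECIES_MIN_SCORE", "0.25")),
    )
```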
+
+## Results & Observations
+
+When you launch the app (either with `python main.py …` or via the Docker command shown above), you’ll see three kinds of outputs:
+
+1. Console logs
+2. Saved images under `/imgs` (mount this to a host folder to persist them)
+3. Published messages (via the Waggle plugin) containing detections and species results
+
+### 1) Console Logs
+You should see a sequence like:
+- Scene caption (Florence only, if enabled)
+  ```text
+  Generating dynamic context caption for the scene...
+  Scene Context: "The image shows a red fence with multiple rows of small holes..."
+  ```
+- PTZ sweep & detection
+  ```text
+  Trying PTZ: 0 0 1
+  Published detection: ptz.detection.p0t0z1
+  Plant detected (trees). Starting species identification workflow...
+  ```
+- Centering & zoom math (in degrees) and a best-of-N capture with blur score
+  ```text
+  CAMERA MOVEMNET
+  zoom_level: 
+  current_h_fov: 
+  current_v_fov:
+  Move the camera to center the object
+  Pan:
+  Tilt:
+
+  Taking final snapshot(s) for PlantNet... (Example output)
+  [PLANTNET] using image -> /imgs/50.19,-13.11,11.38_plantnet_try_2025-09-10_23:19:17.327619.jpg (blur=9242.4)
+  ```
+- PlantNet result (success)
+  ```text
+  Species: Quercus garryana
+  Common Names: ['Garry oak', 'Oregon oak', 'Oregon white oak']
+  Score: 0.3217
+  ```
+- PlantNet error example (no match / 404); no species result is published
+  ```text
+  PlantNet identification failed: PlantNet API request failed with status 404: {"statusCode":404,"error":"Not Found","message":"Species not found"}
+  ```
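The centering log above reports the current horizontal and vertical FOV before the pan/tilt move. A minimal sketch of FOV-based centering, assuming a linear pixel-to-angle mapping (a fair approximation near the image centre), positive pan to the right and positive tilt upward, and a hypothetical 80% fill target for the zoom step. `centering_offsets` and `zoom_ratio_to_fill` are hypothetical names; the app's exact formula and its zoom-to-FOV mapping are not part of this PR.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def centering_offsets(
    bbox: BBox, image_w: int, image_h: int, h_fov_deg: float, v_fov_deg: float
) -> Tuple[float, float]:
    """Relative (pan, tilt) in degrees that would bring the bbox centre to the image centre.

    Assumes a linear pixel-to-angle mapping, positive pan = right, positive tilt = up.
    Image y grows downward, hence the flipped sign for tilt.
    """
    x1, y1, x2, y2 = bbox
    cx = (x1 + x2) / 2.0
    cy = (y1 + y2) / 2.0
    pan = (cx / image_w - 0.5) * h_fov_deg
    tilt = (0.5 - cy / image_h) * v_fov_deg
    return pan, tilt


def zoom_ratio_to_fill(bbox: BBox, image_w: int, image_h: int, fill: float = 0.8) -> float:
    """Relative magnification so the bbox's larger side spans `fill` of the frame.

    fill=0.8 is a hypothetical target; converting the ratio to camera zoom units
    is camera-specific and not shown.
    """
    x1, y1, x2, y2 = bbox
    frac = max((x2 - x1) / image_w, (y2 - y1) / image_h)
    if frac <= 0:
        raise ValueError("bounding box has zero size")
    return fill / frac
```

Near the edges of a wide FOV the linear mapping under-corrects slightly; an `atan`-based mapping is more exact but needs the lens model.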
+
+### 2) Saved Images
+All captured frames are written to `/imgs` inside the container; mount it to a host folder to persist them.
+Filename format:
+```text
+<pan>,<tilt>,<zoom>_<action>_<YYYY-MM-DD_HH:MM:SS.ffffff>.jpg
+```
+Without `--keepimages`, interim candidates may be cleaned up; the final selection is kept when `/imgs` is mounted.
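When `/imgs` is mounted, the pose and capture time can be recovered from the documented filename format. A sketch, assuming names follow that format exactly (`parse_image_name` is a hypothetical helper). Because the action can itself contain underscores (as in `plantnet_try` in the log example above), the timestamp is matched by its fixed shape at the end instead of splitting on `_`.

```python
import re
from datetime import datetime

# "<pan>,<tilt>,<zoom>_<action>_<YYYY-MM-DD_HH:MM:SS.ffffff>.jpg"
# The action may contain underscores, so the timestamp is anchored by its fixed shape.
_NAME_RE = re.compile(
    r"^(?P<pan>-?\d+(?:\.\d+)?),(?P<tilt>-?\d+(?:\.\d+)?),(?P<zoom>-?\d+(?:\.\d+)?)"
    r"_(?P<action>.+)_(?P<ts>\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}\.\d{6})\.jpg$"
)


def parse_image_name(name: str) -> dict:
    """Split a saved image name into pan, tilt, zoom, action, and capture timestamp."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised image name: {name!r}")
    return {
        "pan": float(m["pan"]),
        "tilt": float(m["tilt"]),
        "zoom": float(m["zoom"]),
        "action": m["action"],
        "timestamp": datetime.strptime(m["ts"], "%Y-%m-%d_%H:%M:%S.%f"),
    }
```

Sorting `pathlib.Path("imgs").glob("*.jpg")` with `key=lambda p: parse_image_name(p.name)["timestamp"]` would then order mounted images by capture time.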
+
+### 3) Published Messages (Waggle)
+- Scene caption (Florence): `ptz.scene.caption`, a free-text description
+- Per-position detection: `ptz.detection.p{pan}t{tilt}z{zoom}`
+- Blur/sharpness telemetry: `ptz.image.blur`
+- PlantNet species (if any): `ptz.plantnet.species`
+- PlantNet score: `ptz.plantnet.score`
+- Alerts (optional, via `alert_system.py`): `ptz.alert.<ALERT_TYPE>` with the species JSON
+
+### What a “Good” Run Looks Like
+
+- Multiple `Trying PTZ: …` lines per iteration
+- At least one `ptz.detection.p...` with confidence ≥ your `--confidence`
+- For plant labels: centering/zoom logs, blur telemetry, PlantNet success block or a clear error
+- Images written to `/imgs` (mount it to the host or use `--keepimages`)
+
+### Troubleshooting
+- No species shown: PlantNet may return 404 (no match). Ensure `PLANTNET_API_KEY` is set, improve the view (more leaves or flowers, less backlight), increase `--species_zoom`, or adjust framing.
+- Detections but no centering/zoom: Detection didn’t pass `--confidence`. Lower it slightly or ensure your `--objects` include plant terms (plant,tree,flower,bush,wildflower…).
+- No images on host: Mount `/imgs` (`-v "$(pwd)/imgs":/imgs`) or use `--keepimages`.
+- Soft images (low blur score, i.e. low Laplacian variance): Increase settle delays, try a larger `--species_zoom`, or lower `BLUR_MIN` to reduce retries.
+
+### Pipeline Summary
+- Detect objects with YOLO or Florence-2.
+- Route plants via a keyword map (tree, bush, flower, plant, …).
+- Center & maximize the bbox using FOV-based pan/tilt and relative zoom.
+- Best-of-N capture with Laplacian variance; pick the sharpest (optionally focus-jiggle retry).
+- Identify the species with PlantNet, then publish the results, blur telemetry, and optional alerts.
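The best-of-N step ranks frames by the variance of the Laplacian. A dependency-free sketch of that metric and of the `BLUR_MIN` refocus check follows; in the app itself the usual OpenCV idiom would be `cv2.Laplacian(gray, cv2.CV_64F).var()`. This version skips border pixels, so its scores differ slightly from OpenCV's, and `laplacian_variance` and `pick_sharpest` are hypothetical names.

```python
from typing import Sequence, Tuple

Gray = Sequence[Sequence[float]]  # grayscale image as rows of pixel intensities


def laplacian_variance(img: Gray) -> float:
    """Population variance of the 4-neighbour Laplacian over interior pixels.

    Higher means sharper. Border pixels are skipped, unlike OpenCV's border handling.
    """
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    vals = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def pick_sharpest(frames: Sequence[Gray], blur_min: float = 120.0) -> Tuple[int, float, bool]:
    """Return (index, score, needs_refocus) for the sharpest frame of a burst.

    needs_refocus is True when even the best frame scores below blur_min (BLUR_MIN),
    the point at which a focus retry would be attempted.
    """
    if not frames:
        raise ValueError("no frames to rank")
    scores = [laplacian_variance(f) for f in frames]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best], scores[best] < blur_min
```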
diff --git a/ecr-meta/ecr-icon.jpg b/ecr-meta/ecr-icon.jpg
diff --git a/ecr-meta/ecr-science-description.md b/ecr-meta/ecr-science-description.md
@@ -7,7 +7,10 @@ The application can deploy either **YOLO** (yolov8-yolo11n) or **Florence v2** m
 The workflow is:
 1. The camera rotates (pan/tilt) and zooms in pre-determined or incremental steps to scan the environment.  
 2. Live frames are captured and processed by the selected AI model (YOLO or Florence v2).  
-3. If an object of interest is detected with sufficient confidence, the system automatically adjusts the PTZ camera to center and maximize the object in the frame.  
+3. If an object of interest is detected with sufficient confidence, the system automatically adjusts the PTZ camera to center and maximize the object in the frame.
+   - After centering/zooming on plants, the app can optionally perform species identification using the PlantNet API.
+   - It captures several bracketed snapshots at different zoom levels, ranks them by sharpness (variance of the Laplacian), and submits the sharpest image to PlantNet.
+   - Results (top candidate and optional top-K) are published as Waggle telemetry, with an optional alert if the species matches a monitored list (e.g., invasive or rare species).
 4. A picture is taken and sent to the cloud infrastructure for further processing, archiving, or real-time alerts.
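The PlantNet branch in step 3 ends with a score gate and an optional alert. A minimal sketch of that decision, assuming the JSON shape of PlantNet's v2 identify response (`results[].score`, `results[].species.scientificNameWithoutAuthor`, `results[].species.commonNames`). `select_species`, the top-K default of 3, and the `monitored` argument are hypothetical, and the HTTP request itself is omitted.

```python
from typing import Any, Dict, Iterable


def select_species(
    response: Dict[str, Any],
    min_score: float = 0.25,
    top_k: int = 3,
    monitored: Iterable[str] = (),
) -> Dict[str, Any]:
    """Rank PlantNet candidates, gate on min_score, and flag monitored species.

    Assumes a PlantNet-style body: {"results": [{"score": float, "species":
    {"scientificNameWithoutAuthor": str, "commonNames": [str, ...]}}, ...]}.
    """
    results = sorted(response.get("results", []), key=lambda r: r["score"], reverse=True)
    candidates = [
        (r["species"]["scientificNameWithoutAuthor"], r["score"]) for r in results[:top_k]
    ]
    if not results or results[0]["score"] < min_score:
        # Below the gate (or no match): keep candidates for debugging, publish nothing.
        return {"publish": False, "candidates": candidates}
    top = results[0]
    name = top["species"]["scientificNameWithoutAuthor"]
    return {
        "publish": True,
        "species": name,
        "common_names": top["species"].get("commonNames", []),
        "score": top["score"],
        "candidates": candidates,
        "alert": name in set(monitored),
    }
```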
 
 By pushing this AI capability to the edge, the system operates continuously with minimal latency and reduced bandwidth usage—uploading only relevant snapshots rather than a constant video feed.
@@ -24,6 +27,23 @@ By pushing this AI capability to the edge, the system operates continuously with
 - Can detect virtually any object when used with the wildcard (`*`) parameter
 - Operates in `<OD>` task mode for general object detection
 - More resource-intensive but provides greater detection flexibility
+- Can optionally caption the scene to build context for detection
+- Branches to PlantNet species identification when a detected label resembles a plant
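The plant branch relies on a keyword match against detected labels. A minimal sketch using only the keywords this PR names (plant, tree, flower, bush, wildflower); the app's map includes further terms that the README elides. `is_plant_label` is hypothetical, and prefix matching (so that "trees" or "bushes" still match) is an assumed heuristic that can also admit false positives such as "planter".

```python
from typing import Iterable

# Only the keywords named in this PR; the app's own map is longer.
PLANT_KEYWORDS = ("plant", "tree", "flower", "bush", "wildflower")


def is_plant_label(label: str, keywords: Iterable[str] = PLANT_KEYWORDS) -> bool:
    """Heuristic: does any word in the detected label start with a plant keyword?"""
    words = label.lower().replace(",", " ").split()
    kws = tuple(keywords)
    return any(w.startswith(k) for w in words for k in kws)
```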
+
+### Image Quality & Focus
+- Sharpness metric: variance of Laplacian (higher = sharper).
+- Telemetry: publishes the sharpness score as `ptz.image.blur` (`blur_var_laplacian`).
+- Refocus gate: if the sharpness score is below `BLUR_MIN` (environment variable), a short focus pulse is attempted before retrying.
+- Settle delays: short sleeps after pan/tilt/zoom to allow AF/exposure to stabilize.
+
+### Telemetry Topics (Waggle)
+- `ptz.detection.p{pan}t{tilt}z{zoom}`: label, confidence, bbox, PTZ pose, timestamp
+- `ptz.scene.caption`: optional Florence scene caption used as context
+- `ptz.image.blur`: blur metric for the chosen PlantNet frame
+- `ptz.plantnet.candidates`: (debug) top-K candidates with scores
+- `ptz.plantnet.species`: final published species (gated by `SPECIES_MIN_SCORE`)
+- `ptz.plantnet.score`: convenience score metric
+- `ptz.alert.{type}`: alert on invasive/rare species (if configured)
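The per-position and alert topic names above can be built with small helpers before being passed as the name to a publish call (for example pywaggle's `Plugin.publish`). `detection_topic` and `alert_topic` are hypothetical, and rendering poses with the `:g` format is an assumption, since the PR only shows the integer example `ptz.detection.p0t0z1`.

```python
def detection_topic(pan: float, tilt: float, zoom: float) -> str:
    """Per-position detection topic, e.g. ptz.detection.p0t0z1.

    The ":g" format (which drops a trailing ".0") is an assumption; this PR does not
    document how non-integer poses are rendered.
    """
    return f"ptz.detection.p{pan:g}t{tilt:g}z{zoom:g}"


def alert_topic(alert_type: str) -> str:
    """Alert topic in the ptz.alert.{type} form listed above."""
    return f"ptz.alert.{alert_type}"
```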
 
 # Arguments
 The application supports the following command-line arguments:
@@ -53,6 +73,18 @@ The application supports the following command-line arguments:
   Keep collected images in persistent folder for later use (Default: False)
 - **`--debug`**  
   Enable debug level logging (Default: False)
+- **`--prompt_prefix`**  
+  Optional text prefix to add context for Florence prompts (empty = auto caption).
+- **`--species_zoom`**  
+  Extra relative zoom step used for species detail (Default: 10).
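The two new flags could be wired into an `argparse` parser as below. This is a sketch, not the app's parser: `add_species_args` is hypothetical, the defaults follow the documentation above, and `type=float` for `--species_zoom` is an assumption (only the default of 10 is documented).

```python
import argparse


def add_species_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    # Hypothetical wiring; flag names and defaults follow this PR's documentation.
    parser.add_argument(
        "--prompt_prefix",
        default="",
        help="Optional context prefix for Florence prompts (empty = auto caption)",
    )
    parser.add_argument(
        "--species_zoom",
        type=float,
        default=10.0,
        help="Extra relative zoom step used for species detail",
    )
    return parser
```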
+
+### Environment Variables
+- **`PLANTNET_API_KEY`**  
+  Required for PlantNet API calls.
+- **`BLUR_MIN`**  
+  Laplacian variance threshold to trigger a focus retry (default ~120).
+- **`SPECIES_MIN_SCORE`**  
+  Minimum PlantNet confidence to treat as “confident” (e.g. 0.25).
 
 ## Example Usage
 
@@ -71,6 +103,14 @@ python main.py -it 5 -obj "person,car,dog" -un admin -pw secret -ip 192.168.1.10
 python main.py -it 5 -obj "*" -un username -pw 'password' -ip 130.202.23.92 -m Florence-base -conf 0.15
 ```
 
+### Using Florence for Plant species detection
+```bash
+PLANTNET_API_KEY=... BLUR_MIN=70 SPECIES_MIN_SCORE=0.25 \
+python main.py \
+  -it 3 -obj "plant,tree" -un camera -pw 'secret' -ip 192.168.1.100 \
+  -m Florence-base --species_zoom 10 --iterdelay 0 --debug
+```
+
 # Ontology
 The interesting images collected by the system are tagged with metadata for easy retrieval and analysis. This includes:
 
@@ -80,4 +120,12 @@ The interesting images collected by the system are tagged with metadata for easy
 - Camera position (pan, tilt, zoom)
 - Location data (if available)
 
+In addition to existing fields, images and messages may include:
+- Species (scientific)
+- Common names
+- PlantNet score
+- Blur sharpness (`blur_var_laplacian`)
+- Candidates (top-K species and scores; debug only)
+
+These fields enable downstream filtering by species, quality scoring, and confidence-based triage.
+
 This metadata enables systematic analysis of object presence, movement patterns, and temporal dynamics in the monitored environment.
diff --git a/ecr-meta/ecr-science-image.jpg b/ecr-meta/ecr-science-image.jpg