16 changes: 4 additions & 12 deletions Dockerfile
@@ -4,6 +4,8 @@ FROM nvcr.io/nvidia/pytorch:24.06-py3
WORKDIR /app
COPY . ./

ENV PLANTNET_API_KEY=2b10tSubhbpUaT0XF3sNpl0hYe

# 2. Install dependencies
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
@@ -18,18 +20,8 @@ RUN python -c "import cv2; print('OpenCV imported successfully')"

# 3. Create a local cache directory for the model
RUN mkdir -p /hf_cache/microsoft/Florence-2-base
# RUN mkdir -p /hf_cache/microsoft/Florence-2-large

# 4. Download the model files into /hf_cache/microsoft/Florence-2-large
RUN huggingface-cli download \
microsoft/Florence-2-large \
--repo-type model \
--cache-dir /hf_cache \
--local-dir /hf_cache/microsoft/Florence-2-large \
--resume \
--force # <-- ensures we overwrite any existing files or resume a download

# 5. Download the model files into /hf_cache/microsoft/Florence-2-base
# 4. Download the model files into /hf_cache/microsoft/Florence-2-base
RUN huggingface-cli download \
microsoft/Florence-2-base \
--repo-type model \
@@ -38,7 +30,7 @@ RUN huggingface-cli download \
--resume \
--force # <-- ensures we overwrite any existing files or resume a download

# 6. Set the environment variables for offline mode
# 5. Set the environment variables for offline mode
ENV HF_HOME=/hf_cache
ENV TRANSFORMERS_OFFLINE=1
ENV HF_DATASETS_OFFLINE=1
153 changes: 145 additions & 8 deletions README.md
@@ -1,26 +1,58 @@
# PTZ APP

This is an application for sending images of specific objects autonomously using PTZ cameras.
This is an intelligent, autonomous PTZ camera application that uses advanced vision-language models (YOLO or Florence-2) to detect, frame, and analyze objects of interest in real-time. It is designed for deployment on edge computing nodes within the Sage project.

---

## What’s New

- **Florence-2 Enhancements**
- Automatic scene context generation at the start of scans.
- Support for manual prompt injection with `--prompt_prefix` to guide detections.

- **Enhanced Data Logging**
- Publishes scene captions (Florence-2) and raw detection data (label, confidence, position) with each scan.
- Improved debug-level logging.

- **Pipeline Updates**
- Added Scene Analysis step (Florence-2 only).
- Data Publishing step now explicitly includes metadata publishing.

---

## How It Works

The algorithm performs the following steps:

1. **Initialization**: Sets up object detection model (YOLO or Florence) based on user parameters.
1. **Initialization**
Sets up the object detection model (`YOLO` or `Florence-2`) based on user parameters.

2. **Area Scanning**: Systematically scans the environment by rotating the PTZ camera in pan steps (default: 15 degrees) through a full 360° rotation at the specified tilt and zoom level.
2. **Scene Analysis (Florence-2 Only)**
At the start of a scan, Florence-2 can automatically generate a detailed text caption of the current scene to use as a dynamic, contextual prompt.

3. **Object Detection**: At each camera position, captures an image and runs object detection to identify specified objects (e.g., person, car, dog).
3. **Contextual Area Scanning**
Systematically scans the environment by rotating the PTZ camera in pan steps (default: 15°) through a full 360° rotation at the specified tilt and zoom level.
When using Florence-2, the generated (or user-specified) context is incorporated to improve detection relevance.

4. **Filtering**: Filters detections based on confidence threshold (default: 0.1).
4. **Object Detection**
At each camera position, captures an image and runs object detection to identify specified objects (e.g., person, car, dog).

5. **Object Tracking**: When an object of interest is detected with sufficient confidence, the algorithm:
5. **Filtering**
Filters detections based on confidence threshold (default: 0.1).

6. **Object Tracking**
When an object of interest is detected with sufficient confidence, the algorithm:
- Centers the camera on the detected object
- Adjusts zoom to maximize the object in the frame

6. **Image Publishing**: Saves and publishes the optimized images of detected objects.
7. **Data Publishing**
Saves and publishes the optimized images of detected objects.
Publishes rich metadata—including scene captions (Florence-2), raw detection data (labels, confidence, position), and logging outputs—to the Sage data portal.

8. **Iteration**
Repeats the process for the specified number of iterations with configurable delay between scans.

7. **Iteration**: Repeats the process for the specified number of iterations with configurable delay between scans.
---
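The scanning and filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: `pan_positions`, `filter_detections`, and the detection dictionary shape are hypothetical names chosen for the example, while the 15° step, the 0.1 threshold, and the `*` wildcard come from the documentation.

```python
def pan_positions(step_deg=15.0, sweep_deg=360.0):
    """Return the pan angles visited in one sweep, starting at 0 degrees."""
    if step_deg <= 0:
        raise ValueError("step_deg must be positive")
    count = int(sweep_deg // step_deg)
    return [i * step_deg for i in range(count)]


def filter_detections(detections, wanted_labels, min_confidence=0.1):
    """Keep detections whose label is wanted and whose confidence passes.

    Each detection is assumed to be a dict with "label" and "confidence" keys.
    A "*" entry in wanted_labels accepts every label.
    """
    wanted = {label.strip().lower() for label in wanted_labels}
    return [
        d for d in detections
        if d["confidence"] >= min_confidence
        and ("*" in wanted or d["label"].lower() in wanted)
    ]


positions = pan_positions()
sample = [
    {"label": "person", "confidence": 0.82},
    {"label": "car", "confidence": 0.05},
    {"label": "bench", "confidence": 0.60},
]
kept = filter_detections(sample, ["person", "car"])
print(len(positions), positions[:3], [d["label"] for d in kept])
```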

## Build the container

@@ -51,6 +83,24 @@ sudo docker run -it --rm your_docker_hub_user_name/ptzapp:latest -ki -it 5 -un c
```bash
sudo docker run --gpus all -it --rm your_docker_hub_user_name/ptzapp:latest --model Florence-base --iterations 5 --username username --password 'password' --cameraip 130.202.23.92 --objects 'person,car'
```
## Advanced Usage with Florence-2

### Fully Autonomous Mode (Automatic Context)

When using Florence-2 without a manual prompt, the application will automatically analyze the scene to generate its own context before searching for objects:

```bash
sudo docker run --gpus all -it --rm your_docker_hub_user_name/ptzapp:latest --model Florence-base --objects 'animal,bird,deer' --username <user> --password '<pass>' --cameraip <ip>
```

### Manual Context Prompt

You can provide your own context to the model using the `--prompt_prefix` argument to guide detections:

```bash
sudo docker run --gpus all -it --rm your_docker_hub_user_name/ptzapp:latest --model Florence-base --objects 'animal,bird' --prompt_prefix 'A photo from a trail camera in a wilderness environment' --username <user> --password '<pass>' --cameraip <ip>
```


## Using Different Object Detection Models

@@ -94,4 +144,91 @@ sudo docker run --gpus all -it --rm your_docker_hub_user_name/ptzapp:latest --mo
| `--zoom` | `-zm` | Zoom value | 1 |
| `--confidence` | `-conf` | Confidence threshold (0-1) | 0.1 |
| `--iterdelay` | `-id` | Minimum delay in seconds between iterations | 60.0 |
| `--prompt_prefix` | | Manual text prompt for Florence-2 context | "" |
| `--debug` | | Enable debug level logging | False |

## Environment Variables
1. `PLANTNET_API_KEY` — for PlantNet API calls
2. `BLUR_MIN` — Laplacian variance threshold to trigger a focus retry (default ~120)
3. `SPECIES_MIN_SCORE` — minimum PlantNet confidence to treat as “confident” (e.g., 0.25)

## Results & Observations

When you launch the app (either with `python main.py …` or via the Docker command shown above), you’ll see three kinds of output:

1. Console logs
2. Saved images under /imgs (mount this to a host folder to persist)
3. Published messages (via Waggle plugin) containing detections & species results

### 1. Console Logs
You should see a sequence like:
- Scene caption (Florence only, if enabled)
```bash
Generating dynamic context caption for the scene...
Scene Context: "The image shows a red fence with multiple rows of small holes..."
```
- PTZ sweep & detection
```bash
Trying PTZ: 0 0 1
Published detection: ptz.detection.p0t0z1
Plant detected (trees). Starting species identification workflow...
```
- Centering & zoom math (in degrees) and a best-of-N capture with blur score
```bash
CAMERA MOVEMNET
zoom_level:
current_h_fov:
current_v_fov:
Move the camera to center the object
Pan:
Tilt:

Taking final snapshot(s) for PlantNet... (Example output)
[PLANTNET] using image -> /imgs/50.19,-13.11,11.38_plantnet_try_2025-09-10_23:19:17.327619.jpg (blur=9242.4)
```
- PlantNet result (success)
```bash
Species: Quercus garryana
Common Names: ['Garry oak', 'Oregon oak', 'Oregon white oak']
Score: 0.3217
```
- PlantNet error example (no match / 404); no species result is published in this case, which avoids reporting a misclassification
```bash
PlantNet identification failed: PlantNet API request failed with status 404: {"statusCode":404,"error":"Not Found","message":"Species not found"}
```

### 2. Saved Images
All captured frames land in `/imgs` inside the container. Mount it to a host folder to persist them.
Filename format:
```bash
<pan>,<tilt>,<zoom>_<action>_<YYYY-MM-DD_HH:MM:SS.ffffff>.jpg
```
Without `--keepimages`, interim candidates may be cleaned up; the final selection is kept when you mount `/imgs`.
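
For downstream scripts, the filename format above can be built and parsed with a small helper. The sketch below is an assumption about how one might handle these names, not code from the application; `build_image_name` and `parse_image_name` are hypothetical. It relies on the action possibly containing underscores, as in the `plantnet_try` example from the console logs.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d_%H:%M:%S.%f"


def build_image_name(pan, tilt, zoom, action, when):
    """Format a name as <pan>,<tilt>,<zoom>_<action>_<timestamp>.jpg."""
    return f"{pan},{tilt},{zoom}_{action}_{when.strftime(TIMESTAMP_FORMAT)}.jpg"


def parse_image_name(name):
    """Split a saved image name back into its PTZ pose, action, and time."""
    stem = name.rsplit("/", 1)[-1]
    if stem.endswith(".jpg"):
        stem = stem[: -len(".jpg")]
    pose, rest = stem.split("_", 1)
    # The action may itself contain underscores (e.g. "plantnet_try"),
    # so peel the date and time off the right-hand side.
    action, date_part, time_part = rest.rsplit("_", 2)
    pan, tilt, zoom = (float(v) for v in pose.split(","))
    when = datetime.strptime(f"{date_part}_{time_part}", TIMESTAMP_FORMAT)
    return {"pan": pan, "tilt": tilt, "zoom": zoom, "action": action, "time": when}


info = parse_image_name(
    "/imgs/50.19,-13.11,11.38_plantnet_try_2025-09-10_23:19:17.327619.jpg"
)
print(info)
```

Note that the colons in the timestamp are not valid in filenames on every host filesystem, which may matter when `/imgs` is mounted to a non-Linux host.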

### 3. Published Messages (Waggle): Sample Output
- Scene caption (Florence): ptz.scene.caption — free-text description
- Per-position detection: ptz.detection.p{pan}t{tilt}z{zoom}
- Blur/sharpness telemetry: ptz.image.blur
- PlantNet species (if any): ptz.plantnet.species
- Plain score: ptz.plantnet.score
- Alerts (optional, via alert_system.py): ptz.alert.<ALERT_TYPE> with the species JSON

### What a “Good” Run Looks Like

- Multiple `Trying PTZ: …` lines per iteration
- At least one `ptz.detection.p...` with confidence ≥ your `--confidence`
- For plant labels: centering/zoom logs, blur telemetry, PlantNet success block or a clear error
- Images written to `/imgs` (mount it or pass `--keepimages`)

### Troubleshooting
- No species shown: PlantNet may return 404/no match. Ensure `PLANTNET_API_KEY` is set; improve view (more leaves/flowers, less backlight), increase `--species_zoom`, or adjust framing.
- Detections but no centering/zoom: Detection didn’t pass `--confidence`. Lower it slightly or ensure your `--objects` include plant terms (plant,tree,flower,bush,wildflower…).
- No images on host: Mount `/imgs` (`-v "$(pwd)/imgs":/imgs`) or use `--keepimages`.
- Soft images (low sharpness score): Increase settle delays, try a larger `--species_zoom`, or lower `BLUR_MIN` to reduce retries.

### How It Works
- Detect objects with YOLO or Florence-2.
- Route plants via a keyword map (tree, bush, flower, plant, …).
- Center & maximize the bbox using FOV-based pan/tilt and relative zoom.
- Best-of-N capture with Laplacian variance; pick the sharpest (optionally focus-jiggle retry).
- PlantNet identify and publish results + blur telemetry + optional alerts.
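
The FOV-based centering step can be illustrated with a short sketch. It assumes a linear mapping from pixel offset to angle (a small-angle approximation), that positive pan is to the right and positive tilt is up, and a target fill fraction of 0.8; none of these are confirmed by the source, and the function names are hypothetical.

```python
def centering_offsets(bbox, image_size, h_fov_deg, v_fov_deg):
    """Return (pan, tilt) in degrees that move the bbox centre to the image centre.

    bbox is (x_min, y_min, x_max, y_max) in pixels; image_size is (width, height).
    Positive pan is to the right and positive tilt is up (assumed convention).
    """
    x_min, y_min, x_max, y_max = bbox
    width, height = image_size
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Offset from the image centre as a fraction of the frame, in [-0.5, 0.5].
    dx = cx / width - 0.5
    dy = cy / height - 0.5
    pan = dx * h_fov_deg
    tilt = -dy * v_fov_deg  # image y grows downward
    return pan, tilt


def zoom_factor_to_fill(bbox, image_size, fill=0.8):
    """Return the relative zoom that makes the bbox span `fill` of the frame."""
    x_min, y_min, x_max, y_max = bbox
    width, height = image_size
    frac_w = (x_max - x_min) / width
    frac_h = (y_max - y_min) / height
    largest = max(frac_w, frac_h)
    if largest <= 0:
        raise ValueError("bbox has no area")
    return fill / largest


pan, tilt = centering_offsets((960, 270, 1280, 540), (1280, 720), 60.0, 34.0)
zoom = zoom_factor_to_fill((960, 270, 1280, 540), (1280, 720))
print(round(pan, 2), round(tilt, 2), round(zoom, 2))
```

Because the field of view narrows as the camera zooms in, the current horizontal and vertical FOV (as printed in the console logs) would need to be passed in rather than a fixed wide-angle value.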
Binary file modified ecr-meta/ecr-icon.jpg
50 changes: 49 additions & 1 deletion ecr-meta/ecr-science-description.md
@@ -7,7 +7,10 @@ The application can deploy either **YOLO** (yolov8-yolo11n) or **Florence v2** m
The workflow is:
1. The camera rotates (pan/tilt) and zooms in pre-determined or incremental steps to scan the environment.
2. Live frames are captured and processed by the selected AI model (YOLO or Florence v2).
3. If an object of interest is detected with sufficient confidence, the system automatically adjusts the PTZ camera to center and maximize the object in the frame.
3. If an object of interest is detected with sufficient confidence, the system automatically adjusts the PTZ camera to center and maximize the object in the frame.
i. After centering/zooming on plants, the app can optionally perform species identification using the PlantNet API.
ii. It captures several bracketed snapshots at different zooms, ranks them by sharpness (variance of Laplacian), and submits the sharpest image to PlantNet.
iii. Results (top candidate and optional top-K) are published as Waggle telemetry, with an optional alert if the species matches a monitored list (e.g., invasive/rare).
4. A picture is taken and sent to the cloud infrastructure for further processing, archiving, or real-time alerts.

By pushing this AI capability to the edge, the system operates continuously with minimal latency and reduced bandwidth usage—uploading only relevant snapshots rather than a constant video feed.
@@ -24,6 +27,23 @@ By pushing this AI capability to the edge, the system operates continuously with
- Can detect virtually any object when used with the wildcard (`*`) parameter
- Operates in `<OD>` task mode for general object detection
- More resource-intensive but provides greater detection flexibility
- Can optionally caption the scene to build context for detection
- Used in `<OD>` (object detection) mode for general objects, then branches to PlantNet when labels resemble plants
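
The branch to PlantNet for plant-like labels could look something like the sketch below. The keyword set is only the partial list given in the README (plant, tree, flower, bush, wildflower), and the plural handling and the function name `is_plant_label` are assumptions made for illustration.

```python
# Partial keyword list taken from the README; the real map may be longer.
PLANT_KEYWORDS = {"plant", "tree", "bush", "flower", "wildflower"}


def _candidate_forms(word):
    """Return the word plus naive singular forms ("trees" -> "tree")."""
    forms = {word}
    if word.endswith("es"):
        forms.add(word[:-2])
    if word.endswith("s"):
        forms.add(word[:-1])
    return forms


def is_plant_label(label):
    """True when any word in the detection label matches a plant keyword."""
    words = label.lower().replace("-", " ").split()
    return any(_candidate_forms(word) & PLANT_KEYWORDS for word in words)


for label in ["trees", "Oak tree", "person", "bushes"]:
    print(label, is_plant_label(label))
```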

### Image Quality & Focus
- Sharpness metric: variance of Laplacian (higher = sharper).
- Telemetry: publishes blur as ptz.image.blur with blur_var_laplacian.
- Refocus gate: if the sharpness score falls below `BLUR_MIN` (environment variable), a short focus pulse is attempted before retrying.
- Settle delays: short sleeps after pan/tilt/zoom to allow AF/exposure to stabilize.
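
As a rough illustration of the sharpness metric and the refocus gate, the sketch below computes the variance of a 4-neighbour Laplacian in pure Python and picks the sharpest of several frames. It is not the application's implementation: with OpenCV the same metric is commonly computed as `cv2.Laplacian(gray, cv2.CV_64F).var()`, the helper names are hypothetical, and the fallback of 120 for `BLUR_MIN` mirrors the approximate default quoted in the docs.

```python
import os


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior of a 2-D image.

    gray is a list of equal-length rows of numeric pixel intensities.
    """
    rows, cols = len(gray), len(gray[0])
    if rows < 3 or cols < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def pick_sharpest(frames):
    """Return (index, score) of the frame with the highest Laplacian variance."""
    scores = [laplacian_variance(f) for f in frames]
    best = max(range(len(frames)), key=scores.__getitem__)
    return best, scores[best]


def needs_refocus(score, default_min=120.0):
    """True when the score falls below BLUR_MIN (fallback value is an assumption)."""
    return score < float(os.environ.get("BLUR_MIN", default_min))


flat = [[128] * 6 for _ in range(6)]
checker = [[255 if (x + y) % 2 else 0 for x in range(6)] for y in range(6)]
index, score = pick_sharpest([flat, checker])
print(index, score, needs_refocus(laplacian_variance(flat)))
```

A uniform frame has no edges and scores zero, while a high-contrast pattern scores very high, which is why a higher value is read as sharper.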

### Telemetry Topics (Waggle)
- ptz.detection.p{pan}t{tilt}z{zoom} — label, confidence, bbox, PTZ pose, timestamp
- ptz.scene.caption — optional Florence scene caption used as context
- ptz.image.blur — blur metric for the chosen PlantNet frame
- ptz.plantnet.candidates — (debug) top-K candidates with scores
- ptz.plantnet.species — final published species (gated by SPECIES_MIN_SCORE)
- ptz.plantnet.score — convenience score metric
- ptz.alert.{type} — alert on invasive/rare species (if configured)
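
The topic naming and the species gate can be sketched as below. The topic pattern matches the `ptz.detection.p0t0z1` line shown in the console logs for pan 0, tilt 0, zoom 1; the function names and the 0.25 fallback for `SPECIES_MIN_SCORE` are assumptions, and the actual publishing call is left out.

```python
import os


def detection_topic(pan, tilt, zoom):
    """Build the per-position detection topic from the PTZ pose."""
    return f"ptz.detection.p{pan}t{tilt}z{zoom}"


def species_is_confident(score, default_min=0.25):
    """True when a PlantNet score passes SPECIES_MIN_SCORE.

    The fallback of 0.25 is only the example value from the docs.
    """
    min_score = float(os.environ.get("SPECIES_MIN_SCORE", default_min))
    return score >= min_score


print(detection_topic(0, 0, 1))
print(species_is_confident(0.3217))
```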

# Arguments
The application supports the following command-line arguments:
@@ -53,6 +73,18 @@ The application supports the following command-line arguments:
Keep collected images in persistent folder for later use (Default: False)
- **`--debug`**
Enable debug level logging (Default: False)
- **`--prompt_prefix`**
Optional text prefix to add context for Florence prompts (empty = auto caption).
- **`--species_zoom`**
Extra relative zoom step used for species detail (Default: 10).

### Environment Variables
- **`PLANTNET_API_KEY`**
Required for PlantNet API calls.
- **`BLUR_MIN`**
Laplacian variance threshold to trigger a focus retry (default ~120).
- **`SPECIES_MIN_SCORE`**
Minimum PlantNet confidence to treat as “confident” (e.g., 0.25).

## Example Usage

@@ -71,6 +103,14 @@ python main.py -it 5 -obj "person,car,dog" -un admin -pw secret -ip 192.168.1.10
python main.py -it 5 -obj "*" -un username -pw 'password' -ip 130.202.23.92 -m Florence-base -conf 0.15
```

### Using Florence for Plant species detection
```bash
PLANTNET_API_KEY=... BLUR_MIN=70 SPECIES_MIN_SCORE=0.25 \
python main.py \
-it 3 -obj "plant,tree" -un camera -pw 'secret' -ip 192.168.1.100 \
-m Florence-base --species_zoom 10 --iterdelay 0 --debug
```

# Ontology
The interesting images collected by the system are tagged with metadata for easy retrieval and analysis. This includes:

@@ -80,4 +120,12 @@ The interesting images collected by the system are tagged with metadata for easy
- Camera position (pan, tilt, zoom)
- Location data (if available)

In addition to existing fields, images and messages may include:
- Species (scientific)
- Common names
- PlantNet score
- Blur sharpness (blur_var_laplacian)
- Candidates (top-K species + scores, debug)

These fields enable downstream filtering by species, quality scoring, and confidence-based triage.

This metadata enables systematic analysis of object presence, movement patterns, and temporal dynamics in the monitored environment.
Binary file modified ecr-meta/ecr-science-image.jpg