[Bug]: find_text returns false on low-contrast UI text that read_text_in_region transcribes cleanly — confidence threshold vs source profile #404

<div align="center">

[![Severity](https://img.shields.io/badge/Severity-Architectural-d73a49?style=for-the-badge)](.)
[![Surface](https://img.shields.io/badge/Surface-OCR_API-blue?style=for-the-badge)](.)
[![Reproducible](https://img.shields.io/badge/Reproducible-100%25-success?style=for-the-badge)](.)
[![Calls_for](https://img.shields.io/badge/Calls_for-Discussion-orange?style=for-the-badge)](.)

</div>

### Summary

`Region.findText()` returns `null` on low-contrast UI text (e.g. grey-on-white inactive tab labels) that `Region.text()` transcribes correctly. Both paths share the same `TextRecognizer.optimize()` pre-processing, but they diverge at the Tess4J call: `getWords()` filters by confidence to avoid false positives, so faint words with legitimately low confidence are rejected, while `doOCR()` keeps everything in the transcript. Strictly speaking, this is not a bug. It is a precision-vs-recall tradeoff that calls for an architectural refinement, not a patch.

### Steps to reproduce

```markdown
1. Open https://coinmarketcap.com/ (or any modern UI with grey-on-white inactive tabs)
2. From a Jython script, run:
   region = Region(960, 0, 960, 600)
   m  = region.findText("Trending")    # → null
   tx = region.text()                  # → contains "Trending" in the transcript
3. Observe: the tab is clearly visible to a human and to read_text_in_region's transcription, but find_text refuses to localise it.
```

### Expected behavior

`Region.findText("Trending")` should return a `Match` with bounding box coordinates for any text a human can read in the region — including low-contrast glyphs that `Region.text()` already transcribes correctly.

### Actual behavior

`findText` returns `null`. Both paths run downstream of the identical `optimize()` and split at the Tess4J entry point; the page segmentation mode (PSM) differs between them as well, as the last row shows:

| | `Region.text()` | `Region.findText()` |
|---|---|---|
| Inner path | `OCR.readText()` | `Finder.doFindText()` → `OCR.readLines()` |
| Tess4J call | `doOCR()` — full transcript | `getWords()` — bbox + per-word confidence |
| Confidence filter | none, keeps everything | rejects below internal threshold |
| PSM | `SINGLE_BLOCK` (6) forced | `AUTO` (3) default |

The grey-on-white "Trending" label clears Tesseract LSTM's character recognition (it appears in the transcript), but its computed per-word confidence is in the 40–50 % range. `getWords()` discards it as a precaution against returning bounding boxes you would not want to click on. That is the right default for visual automation, where a false positive at the wrong coordinates costs more than a false negative.
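
To make the split concrete, here is a minimal pure-Python sketch of the two behaviours. It does not call Tess4J. The `(text, confidence)` pairs, the helper names `transcript` and `find_word`, and the 60 % threshold are all illustrative assumptions; the actual internal threshold is not stated in this issue.

```python
# Illustrative only: hypothetical per-word OCR results as (text, confidence %) pairs.
# 45.0 for "Trending" mirrors the 40-50 % range reported above; the rest is made up.
WORDS = [("Top", 91.0), ("Trending", 45.0), ("Watchlist", 88.0)]

ASSUMED_THRESHOLD = 60.0  # assumption: the real internal threshold is not known here


def transcript(words):
    """Transcript-style path: keep every recognised word, ignore confidence."""
    return " ".join(text for text, _conf in words)


def find_word(words, target, threshold=ASSUMED_THRESHOLD):
    """Localisation-style path: only words at or above the threshold can match."""
    for text, conf in words:
        if text == target and conf >= threshold:
            return (text, conf)
    return None
```

With these made-up numbers, `transcript(WORDS)` contains "Trending" while `find_word(WORDS, "Trending")` returns `None`, which is the same shape as the mismatch reported here.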

### Minimal reproducer (script)

```python
from sikuli import *

# Open any web page with low-contrast tab labels (CoinMarketCap, Yahoo Finance, …).
# Pick a region containing such a tab. "Trending" on coinmarketcap.com is a perfect specimen.
region = Region(960, 0, 960, 600)

# Path 1 — findText (Finder.doFindText → OCR.readLines → Tess4J getWords)
match = region.findText("Trending")
print "findText:", match  # → null

# Path 2 — text (OCR.readText → Tess4J doOCR)
full = region.text()
print "text():", full     # → "Top  Trending  Watchlist  Stocks  Most Visited  ..."

# The word IS read by Tesseract. The word is NOT returned by findText.
# Both paths run through the same TextRecognizer.optimize() preprocessing.
# The split is at the Tess4J entry: getWords (confidence-filtered) vs doOCR (transcript).
```

### Operating system

Windows 10 (reproduced on my side — but the bug lives in `Finder.doFindText()` / `getWords()` confidence filter, which is platform-independent Java + Tesseract code, so reproduction on SUSE, Ubuntu, macOS is expected; field confirmation welcome)

### Java version

openjdk 25.0.2 2026-04-15

### OculiX version / artifact

oculixide-3.0.4-complete-win.jar (built from `claude/i18n-phase3`)

### Where does the bug happen?

API (Screen / Region / Pattern / Match), PaddleOCR / Tesseract

### Logs / console output

```shell
# MCP find_text on the same region
> oculix_find_text("Trending", region={x:960,y:0,width:960,height:600})
{"found":false,"engine":"sikulix-region"}

# MCP read_text_in_region on a tighter scope
> oculix_read_text_in_region(region={x:960,y:240,width:960,height:60})
{"engine":"sikulix-region","text":"Top  Trending Watchlist Stocks Most Visited New  Gainers @ Rehypo >","lines":["Top  Trending Watchlist Stocks Most Visited New  Gainers @ Rehypo >"]}

# Cross-checked with Opus 4.8 on the same screen, same region, same workspace:
# identical "found:false" from find_text, identical successful transcript from read_text_in_region.
# Reproducibility is independent of the model wrapping the calls.
```

### Additional context

#### The steppe analogy first, the code after

Asking `find_text` to localise "Trending" rendered in light grey on a white background is like asking the shepherd of the **OculiX Pastoral Computing Suite™** to send his dog Aikash after **Yak #42** in the steppe fog, three days of horseback from the WiFi router. The **YAK MOOD PREDICTOR** is 47 % confident *"something bovine is grazing over there"*. The **CHAMOIS DETECTION DRONE** returns *"biological presence detected, ungulate profile likely"*. But the shepherd refuses to release Aikash until **SUSPICIOUS YAK ACTIVITY** confirms the identity above 95 %. And he is right: sending Aikash to bite a parasite yak — possibly Yak #41 from neighbour Erlan, or worse, a dairy cow that wandered into the pasture by mistake — is a **casus belli on the steppe**, not a Ctrl+Z away. `read_text_in_region`, in this analogy, is only the observation drone: it produces the raw transcript *"three ruminants down there, one likely yak, maybe two"*. Priceless for writing the daily report. Unusable for engaging.

#### Why this is not strictly a bug — and why an architectural refinement is the right answer anyway

The current behaviour is the correct default for the original SikuliX design assumption: a single OCR engine serving the visual-automation use case where the cost of a false positive (clicking at the wrong coordinates of a phantom word) is higher than the cost of a false negative (returning null and letting the caller retry or fall back). `getWords()`'s confidence filter is the implementation of that principle.

The opportunity surfaces when you observe that **OculiX already knows the source of every image at the entry point** — `ScreenCapture`, `ADBScreen`, `VNCScreen` are first-class types — and that the right preprocessing chain depends entirely on the source profile, not on a single one-size-fits-all `optimize()`.

#### A draft chain dispatched by source profile

```
ScreenCapture                  (lossless, uniform illumination by construction)
  → Core.normalize(NORM_MINMAX)       (global level stretch → "Trending" grey 150 becomes a dark enough grey)
  → light unsharp
  → Tesseract
  // No CLAHE. No denoise. CLAHE corrects spatial illumination variation,
  // which a rendered UI does not have — it would solve a problem that
  // doesn't exist and pay in tile artefacts.

ADBScreen / VNCScreen          (lossy compression by construction)
  → Imgproc.medianBlur(3)               (kills JPEG ringing/blocking cheaply, edge-preserving)
  → CLAHE(clipLimit, tilesX/tilesY derived from region.h and expectedTextHeight)
  → unsharp
  → Tesseract
  // CLAHE earns its keep here: real local variance from compression artefacts.
  // The denoise comes FIRST, before any amplification step — you never
  // amplify then clean.

Future camera / mobile capture (real spatial illumination variation)
  → bilateralFilter (or fastNlMeansDenoising)
  → CLAHE
  → unsharp
  → Tesseract

Raw Mat with no provenance     (marginal case: Image.load(file))
  → blockiness metric on 8-pixel DCT grid  (cheaper than a full FFT,
                                             targeted at JPEG signature)
  → fallback on the ADB chain or the Screen chain depending on signature
```
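
For the no-provenance case, one simple way to look for a JPEG signature is a spatial-domain blockiness ratio on the 8-pixel grid. The sketch below is only one candidate metric, not something OculiX ships: the function name `blockiness_ratio` is hypothetical, it looks at horizontal steps only, and any decision threshold would still have to be calibrated on real captures.

```python
def blockiness_ratio(gray, block=8):
    """Mean |horizontal step| across block boundaries divided by the mean inside blocks.

    `gray` is a list of rows of 0-255 ints. Values well above 1.0 hint at a
    JPEG-like grid; values near 1.0 hint at none.
    """
    boundary_sum = boundary_n = inner_sum = inner_n = 0
    for row in gray:
        for x in range(len(row) - 1):
            step = abs(row[x + 1] - row[x])
            if (x + 1) % block == 0:  # this step crosses a block boundary
                boundary_sum += step
                boundary_n += 1
            else:
                inner_sum += step
                inner_n += 1
    if boundary_n == 0 or inner_n == 0:
        return 1.0  # image too narrow to say anything
    inner_mean = inner_sum / float(inner_n)
    boundary_mean = boundary_sum / float(boundary_n)
    if inner_mean == 0:
        return float("inf") if boundary_mean > 0 else 1.0
    return boundary_mean / inner_mean
```

A smooth ramp gives a ratio of 1.0, while an image made of flat 8-pixel blocks with jumps only at the block edges gives an unbounded ratio. Real captures will sit somewhere in between.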

The point that goes beyond a per-filter improvement: **source-profile dispatch picks the entire chain, not a filter within a single chain**. On the most common case — `ScreenCapture` of a rendered UI — CLAHE drops out of the path entirely, all its tuning evaporates with it, and the "Trending" issue is resolved by a one-line `Core.normalize()` with no tile artefacts and no parameters.
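
As a worked illustration of the level stretch, here is a pure-Python stand-in for a min-max normalization to the 0–255 range. It is a sketch of the arithmetic only, not the OpenCV call itself; the helper name `normalize_minmax`, the sample pixel values, and the flat-input handling are assumptions made for the example.

```python
def normalize_minmax(pixels, lo=0, hi=255):
    """Linearly map the darkest input value to `lo` and the brightest to `hi`."""
    p_min, p_max = min(pixels), max(pixels)
    if p_max == p_min:
        return [lo for _ in pixels]  # flat input: nothing to stretch (assumed handling)
    scale = (hi - lo) / float(p_max - p_min)
    return [int(round(lo + (p - p_min) * scale)) for p in pixels]


# White background (255), grey label (150), one anti-aliased edge pixel (200).
faint_only = normalize_minmax([255, 150, 255, 200])

# Same grey label, but the normalized area also contains a black pixel (0).
with_black = normalize_minmax([0, 150, 255])
```

In the first case grey 150 is mapped to 0, i.e. full black. In the second case the input already spans 0–255, so the stretch leaves 150 where it is. The stretch is therefore only as strong as the darkest and brightest pixels of the area being normalized, which is worth keeping in mind when choosing that area.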

#### Public API: zero change

`Region.findText()`, `Region.text()`, `OCR.readText()`, `OCR.readLines()` keep their signatures. The dispatch happens in the `TextRecognizer` constructor, deciding the chain based on the concrete type of the source passed in. No migration for existing user scripts.
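
A rough Python sketch of what type-based dispatch could look like. The three classes are empty stand-ins for the source types named in this issue, the step names are placeholder strings rather than real filter calls, and `pick_chain` with its optional `override` argument is a hypothetical helper, not existing OculiX API.

```python
# Hypothetical stand-ins for the source types named in this issue.
class ScreenCapture(object):
    pass


class ADBScreen(object):
    pass


class VNCScreen(object):
    pass


# Placeholder step names taken from the draft chains; not real filter calls.
SCREEN_CHAIN = ["normalize_minmax", "light_unsharp"]
LOSSY_CHAIN = ["median_blur_3", "clahe", "unsharp"]

CHAIN_BY_SOURCE = {
    ScreenCapture: SCREEN_CHAIN,
    ADBScreen: LOSSY_CHAIN,
    VNCScreen: LOSSY_CHAIN,
}


def pick_chain(source, override=None):
    """Pick a preprocessing chain from the source's concrete type.

    An explicit `override` wins over the type-based default. Returns None for
    unknown provenance, where the caller would fall back to signature detection.
    """
    if override is not None:
        return override
    for source_type, chain in CHAIN_BY_SOURCE.items():
        if isinstance(source, source_type):
            return chain
    return None
```

Callers never see the chain: they keep calling the same public methods, and only the recognizer's setup consults something like `pick_chain`.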

#### Three points open to discussion before I start

1. **Where does `SourceProfile` live?** An internal enum dispatched implicitly from the source's concrete type is the cleanest. But it makes it impossible for a user to override the chain on a stubborn case (someone who loads an Android screenshot as a generic `Image` and would benefit from the ADB chain). An optional explicit `OCR.Options.sourceProfile(SourceProfile)` override preserves that ability without poisoning the default. Is it worth the extra surface?

2. **Does `ADBScreen` really need its own chain?** Argument for: the ADB pipeline is lossy by default (JPEG mode for performance). Argument against: a user can configure ADB in lossless PNG mode, in which case the ADB chain over-processes. Per-frame detection is expensive (see point 5 of the perf discussion below). Should this be an explicit flag on the `ADBScreen` constructor, or autodetection from the first frame?

3. **Is `medianBlur(3)` always the right pre-CLAHE step on a lossy source?** It excels on JPEG ringing/blocking. But certain VNC encodings (Tight, ZRLE, Hextile, RRE) produce different artefacts. I'd rather default to a fast median than a costly bilateral filter, but a cross-encoder mini-bench feels warranted before locking the chain.

#### A few opinions I've already cycled through (and discarded), for context

- **A global confidence threshold knob exposed on `OCR.Options`** — solves the symptom by lowering the bar globally, lets false positives back in. Patches the threshold, not the cause.
- **Auto-retry with stepped confidence (70 → 50 → 30)** — three Tesseract calls instead of one, masks the cause in latency. Wrong layer for the fix.
- **Injecting the searched word into Tesseract's `user_words` dictionary** — elegant in theory ("boost the confidence on the word I know I'm looking for"), but rebuilds the Tesseract config per call and is incompatible with `findAllText` use cases where the caller doesn't have a specific target.
- **A CLAHE pre-processing added unconditionally in `optimize()`** — my first proposal. Solves a problem that doesn't exist on the most frequent case, pays in tile artefacts, over-processes screenshots. Wrong tool by default.

The source-profile dispatched chain is the one shape that survives every counter-argument I've thrown at it.

#### Pinging the people whose hand on this matters most

@RaiMan — this is the kind of refinement that lives in direct lineage of the `optimize()` you originally designed. The "single engine, one preprocessing path" was the right call for the design pressures of the time. The question of whether to dispatch by source profile touches on your original architectural call about OCR being source-agnostic. I will not push anything without your read on it.

@adriancostin6 — if you want to benchmark in real conditions, I can rough out the four chains in parallel inside a measurement harness (Trending on web, ADB device capture, VNC remote desktop) and you supply the numbers. Cross-OS measurement with confidence intervals on Trending recall + clickable-coordinate accuracy is exactly your terrain, and would settle the open questions above more reliably than any opinion I have.

The gecko likes it when a bug, on careful look, turns out to be a refactor opportunity — which is rare, and which is what makes the craft interesting.

🦎

