[BUG]: Spending limit detection is too loose

### Describe the bug

The inclusion of the word "reset" in the spending limit check can easily trigger on keywords included in normal LLM pentest responses (e.g. "password reset"). This should be removed.

The other checks could also conceivably trigger, so there should be a flag to disable the pattern-matching spending guard when the user expects the test might trigger it incorrectly

### Steps to reproduce

Run the test against something with a reset functionality. I'm genuinely surprised this didn't trigger against fruit shop since it has a password reset issue.

### Expected behaviour

General words often seen in pentests, such as "reset" are not included in the billing pattern matching. Furthermore, users have some way to disabling the blind pattern matching.

### Actual behaviour

See expected behavior (sorry this is a very small issue)

### Pre-submission checklist (required)

- [x] I have searched the existing open issues and confirmed this bug has not already been reported.
- [x] I am running the latest released version of `shannon`.

### If applicable

- [ ] I have included relevant error messages, stack traces, or failure details.
- [ ] I have checked the audit logs and pasted the relevant errors.
- [ ] I have inspected the failed Temporal workflow run and included the failure reason.
- [ ] I have included clear steps to reproduce the issue.
- [ ] I have redacted any sensitive information (tokens, URLs, repo names).

### Debugging details

_No response_

### Screenshots

_No response_

### Authentication method used

CLAUDE_CODE_OAUTH_TOKEN

### Full ./shannon command with all flags used (with redactions)

./shannon start -u http://host.docker.internal:8000 -r ../juice-shop

### Are you using any experimental models or providers other than default Anthropic models?

No

### If Yes, which one (model/provider)?

_No response_

### OS (with version)

macOS 26.3.1

### Docker version ('docker -v')

Docker version 29.2.1, build a5c7197

### Additional context

Because the Claude SDK does not provide support for custom Bedrock providers (only AWS) and my use case requires a custom (likely proxied) AWS provider URL, I had to create and adapter to adapt the Bedrock API to the Claude API format and use the custom Claude API provider instead. This is almost certainly not causing the issue since it's a simple issue with pattern matching on response text, but I figured it's worth mentioning.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG]: Spending limit detection is too loose #263

Describe the bug

Steps to reproduce

Expected behaviour

Actual behaviour

Pre-submission checklist (required)

If applicable

Debugging details

Screenshots

Authentication method used

Full ./shannon command with all flags used (with redactions)

Are you using any experimental models or providers other than default Anthropic models?

If Yes, which one (model/provider)?

OS (with version)

Docker version ('docker -v')

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[BUG]: Spending limit detection is too loose #263

Description

Describe the bug

Steps to reproduce

Expected behaviour

Actual behaviour

Pre-submission checklist (required)

If applicable

Debugging details

Screenshots

Authentication method used

Full ./shannon command with all flags used (with redactions)

Are you using any experimental models or providers other than default Anthropic models?

If Yes, which one (model/provider)?

OS (with version)

Docker version ('docker -v')

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions