feat: Add domain filtering to web search tools #1873
Description
Adds comprehensive domain filtering capability for all web search tools in smolagents, allowing users to control which websites can appear in search results through blocklists and allowlists.
Problem Solved
Agents using web search tools could inadvertently access or return results from undesirable websites.
Solution
Features
Blocklist - Exclude specific domains (spam, malicious, tracking)
Allowlist - Restrict to trusted sources only (.edu, .gov)
Wildcard patterns - *.edu, *.ads.*, tracker.*
Automatic subdomain handling - Blocking example.com also blocks subdomains
Combined filtering - Use allowlist with blocklist refinements
Case-insensitive - Domain matching is case-insensitive
Backward compatible - Optional parameters, no breaking changes
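The matching rules listed above can be sketched with the Python standard library. This is only an illustration of the described behavior, not the PR's DomainFilter class: the function names `matches` and `is_allowed` are hypothetical, and the choice that a blocklist hit takes precedence over the allowlist is an assumption.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def matches(host: str, pattern: str) -> bool:
    """Hypothetical helper: does host match a domain or wildcard pattern?"""
    host, pattern = host.lower(), pattern.lower()  # case-insensitive matching
    if "*" in pattern:
        # Shell-style wildcard, e.g. *.edu or tracker.* ("*" also spans dots).
        return fnmatchcase(host, pattern)
    # A plain domain covers itself and all of its subdomains.
    return host == pattern or host.endswith("." + pattern)


def is_allowed(url: str, blocked=(), allowed=()) -> bool:
    """Assumed semantics: blocklist wins; a non-empty allowlist must match."""
    # hostname is None for URLs without a scheme; treat that as an empty host.
    host = urlparse(url).hostname or ""
    if any(matches(host, p) for p in blocked):
        return False
    if allowed:
        return any(matches(host, p) for p in allowed)
    return True
```

Under these assumptions, `is_allowed("https://ads.mit.edu", allowed=["*.edu"], blocked=["ads.*"])` returns `False`, because the blocklist refines the allowlist, while `notexample.com` is not affected by blocking `example.com`.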
Implementation
Core Components:
DomainFilter utility class (src/smolagents/domain_filter.py)
DuckDuckGoSearchTool
WebSearchTool
ApiWebSearchTool (Brave Search)
GoogleSearchTool
API Changes:
Added blocked_domains parameter to all search tools
Added allowed_domains parameter to all search tools
Usage Examples
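A minimal, self-contained sketch of how the two optional parameters might be used. `FilteredSearch`, `fake_search`, and the `"link"` result key are hypothetical stand-ins chosen for illustration; they are not the smolagents tools or their result format, and wildcard patterns are left out to keep the example short.

```python
from urllib.parse import urlparse


def _host_in(host, domains):
    # Exact match or subdomain match; wildcard patterns are omitted for brevity.
    return any(host == d or host.endswith("." + d) for d in domains)


class FilteredSearch:
    """Hypothetical wrapper that filters the results of any search callable."""

    def __init__(self, search_fn, blocked_domains=None, allowed_domains=None):
        self.search_fn = search_fn
        # Both parameters are optional, so existing callers are unaffected.
        self.blocked = [d.lower() for d in blocked_domains or []]
        self.allowed = [d.lower() for d in allowed_domains or []]

    def __call__(self, query):
        kept = []
        for result in self.search_fn(query):
            host = urlparse(result["link"]).hostname or ""
            if _host_in(host, self.blocked):
                continue  # drop blocked domains and their subdomains
            if self.allowed and not _host_in(host, self.allowed):
                continue  # with an allowlist, drop everything not on it
            kept.append(result)
        return kept


def fake_search(query):
    # Stand-in for a real search backend, returning canned results.
    return [
        {"title": "Docs", "link": "https://docs.python.org/3/"},
        {"title": "Spam", "link": "https://ads.spam-site.com/offer"},
        {"title": "Course", "link": "https://cs.stanford.edu/intro"},
    ]


search = FilteredSearch(fake_search, blocked_domains=["spam-site.com"])
print([r["title"] for r in search("python")])  # ['Docs', 'Course']
```

Constructing the wrapper with no domain arguments returns all results unchanged, which mirrors the backward-compatibility goal stated in the feature list.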
Testing
37 tests passing (100%)
DomainFilter class
Code quality checks passing
make quality
make style
Use Cases
Files Changed
domain_filter.py (NEW) - Core filtering utility
__init__.py - Export DomainFilter
default_tools.py - Integrated filtering into search tools
test_domain_filter.py (NEW) - Unit tests
test_search_tools_domain_filtering.py (NEW) - Integration tests
domain_filtering.py (NEW) - Example script with 6 use cases
Checklist
Related Issue
Addresses the problem of agents accessing undesirable websites during web searches, improving security, quality, and compliance.
closes #1857