feat: add HTML to markdown conversion for http_request tool #63

mkmeral · 2025-05-30T22:23:02Z

Description

This PR enhances the http_request tool with HTML to markdown conversion capabilities, making web content more readable and suitable for AI processing.

Key Features:

- New Parameter: `convert_to_markdown` boolean parameter to enable conversion
- Smart Detection: Automatically detects HTML content by checking Content-Type headers and document structure
- Clean Conversion: Uses readabilipy to extract the main content and markdownify to convert it to clean markdown
- Graceful Fallback: Returns the original content if conversion fails
- User Feedback: Shows a success notification when conversion occurs
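
The detection, conversion, and fallback behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `looks_like_html`, `html_to_markdown`, and `maybe_convert` are hypothetical, and the exact detection markers are assumptions. Only the `readabilipy` and `markdownify` calls refer to the real libraries.

```python
from typing import Callable, Mapping


def looks_like_html(headers: Mapping[str, str], body: str) -> bool:
    """Heuristic check (assumed, not the PR's exact logic) for an HTML response."""
    # Header names are case-insensitive, so normalize before the lookup.
    content_type = next(
        (value.lower() for name, value in headers.items() if name.lower() == "content-type"),
        "",
    )
    if "text/html" in content_type or "application/xhtml+xml" in content_type:
        return True
    # Otherwise sniff the start of the document for HTML markers.
    head = body.lstrip()[:1024].lower()
    return head.startswith("<!doctype html") or "<html" in head


def html_to_markdown(html: str) -> str:
    """Extract the main content with readabilipy, then convert it with markdownify."""
    # Imported lazily so the other helpers work without the optional dependencies.
    import markdownify
    from readabilipy import simple_json_from_html_string

    article = simple_json_from_html_string(html)
    content = article.get("content")
    if not content:
        raise ValueError("no main content found")
    return markdownify.markdownify(content, heading_style=markdownify.ATX)


def maybe_convert(
    headers: Mapping[str, str],
    body: str,
    convert_to_markdown: bool,
    converter: Callable[[str], str] = html_to_markdown,
) -> str:
    """Return markdown when requested and possible, otherwise the original body."""
    if not convert_to_markdown or not looks_like_html(headers, body):
        return body
    try:
        return converter(body)
    except Exception:
        # Graceful fallback: any conversion failure yields the original content.
        return body
```

Passing the converter as a parameter keeps the fallback path easy to exercise in unit tests without fetching or parsing real pages.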

Use Cases:

- Scraping articles and blog posts for better readability
- Converting HTML documentation to markdown format
- Processing web content for AI analysis
- Creating clean text versions of web pages

Example Usage:

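from strands import Agent
from strands_tools import http_request

# Register the http_request tool with an agent
agent = Agent(tools=[http_request])

# Fetch an HTML page and convert the response to markdown
response = agent.tool.http_request(
    method="GET",
    url="https://example.com/article",
    convert_to_markdown=True,
)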

Related Issues

N/A

Documentation PR

N/A - Documentation updated in this PR

Type of Change

Testing

Automated Testing:

- `hatch fmt --linter` ✅
- `hatch fmt --formatter` ✅
- `hatch test --all` ✅ (540 passed, 5 skipped)

Test Coverage:

Added unit tests for HTML conversion functionality
Manually tested with two requests to the same blog post; Claude's summary of the raw HTML response versus the converted markdown response:

## Results Summary:

**First request (without markdown conversion):**
- Retrieved the raw HTML content of the blog post
- Shows the complete HTML structure with all tags, CSS, JavaScript, and metadata
- Content is in its original HTML format with full page structure

**Second request (with markdown conversion):**
- Retrieved the same content but converted to clean, readable markdown format
- Stripped out all the HTML boilerplate, navigation, headers, footers, and styling
- Focused only on the main article content in an easy-to-read markdown format

Checklist

I have read the CONTRIBUTING document
I have added tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature
My changes generate no new warnings
Any dependent changes have been merged and published
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


mkmeral · 2025-05-31T09:33:53Z

See example trace https://cloud.langfuse.com/project/cmb3mu0el0017ad075dgtxwip/traces/9aa133fb1499319013ee4b9043d27932?timestamp=2025-05-30T22%3A17%3A23.772Z&display=details&observation=c08cd5ef10e23b1a

awsarron

Awesome PR, thank you @mkmeral!

One small comment on the test coverage and then I think this is good to merge.

tests/test_http_request.py


feat: add HTML to markdown conversion for http_request tool

0c640f9

- Add markdownify and readabilipy dependencies
- Add convert_to_markdown parameter to http_request tool
- Automatically detect and convert HTML responses to markdown
- Add tests and documentation with usage examples

mkmeral requested a review from a team as a code owner May 30, 2025 22:23

awsarron requested changes Jun 10, 2025


tests/test_http_request.py (outdated, resolved)

Murat Kaan Meral added 2 commits June 10, 2025 14:20

Merge branch 'main' into feature/html-to-markdown-conversion

fedc5c6

fix: extend test coverage for markdownify in http request tool

095b39c

awsarron approved these changes Jul 14, 2025


awsarron enabled auto-merge (squash) July 14, 2025 22:08

Merge branch 'main' into feature/html-to-markdown-conversion

18adaf2

awsarron merged commit 3390e56 into strands-agents:main Jul 14, 2025
14 checks passed
