feat: add HTML to markdown conversion for http_request tool #63

Merged

Contributor

@mkmeral mkmeral commented May 30, 2025

Description

This PR enhances the http_request tool with HTML to markdown conversion capabilities, making web content more readable and suitable for AI processing.

Key Features:

  • New Parameter: convert_to_markdown boolean parameter to enable conversion
  • Smart Detection: Automatically detects HTML content by checking Content-Type headers and document structure
  • Clean Conversion: Uses readabilipy to extract main content and markdownify to convert to clean markdown
  • Graceful Fallback: Returns original content if conversion fails
  • User Feedback: Shows success notification when conversion occurs
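
The detection and fallback behavior listed above can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: the names `looks_like_html` and `maybe_convert` are hypothetical, the sniffing heuristic is an assumption, and the converter is passed in as a plain callable so the sketch does not depend on readabilipy or markdownify.

```python
from typing import Callable


def looks_like_html(content_type: str, body: str) -> bool:
    """Heuristic check: trust the Content-Type header first, then sniff the body.

    Hypothetical helper for illustration; the real tool may use different rules.
    """
    lowered = content_type.lower()
    if "text/html" in lowered or "application/xhtml+xml" in lowered:
        return True
    # Fall back to document structure: inspect only the start of the body.
    head = body.lstrip()[:512].lower()
    return head.startswith("<!doctype html") or "<html" in head


def maybe_convert(content_type: str, body: str, converter: Callable[[str], str]) -> str:
    """Convert HTML via `converter`, returning the original body on any failure."""
    if not looks_like_html(content_type, body):
        return body
    try:
        return converter(body)
    except Exception:
        # Graceful fallback: a failed conversion must not lose the response.
        return body
```

In this sketch the `converter` argument stands in for the readabilipy and markdownify pipeline described above; non-HTML responses (for example JSON) pass through unchanged.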

Use Cases:

  • Scraping articles and blog posts for better readability
  • Converting HTML documentation to markdown format
  • Processing web content for AI analysis
  • Creating clean text versions of web pages

Example Usage:

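# Convert an HTML webpage to markdown
from strands import Agent
from strands_tools import http_request

# Register the tool so it is available as agent.tool.http_request
agent = Agent(tools=[http_request])

response = agent.tool.http_request(
    method="GET",
    url="https://example.com/article",
    convert_to_markdown=True,
)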

Related Issues

N/A

Documentation PR

N/A - Documentation updated in this PR

Type of Change

  • Bug fix
  • New Tool
  • Enhancement to existing tool
  • Breaking change
  • Other (please describe):

Testing

Automated Testing:

  • hatch fmt --linter
  • hatch fmt --formatter
  • hatch test --all ✅ (540 passed, 5 skipped)

Test Coverage:

  • Added unit tests for HTML conversion functionality
  • Manually tested; Claude's summary comparing the raw HTML and markdown responses:
## Results Summary:

**First request (without markdown conversion):**
- Retrieved the raw HTML content of the blog post
- Shows the complete HTML structure with all tags, CSS, JavaScript, and metadata
- Content is in its original HTML format with full page structure

**Second request (with markdown conversion):**
- Retrieved the same content but converted to clean, readable markdown format
- Stripped out all the HTML boilerplate, navigation, headers, footers, and styling
- Focused only on the main article content in an easy-to-read markdown format

Checklist

  • I have read the CONTRIBUTING document

  • I have added tests that prove my fix is effective or my feature works

  • I have updated the documentation accordingly

  • I have added an appropriate example to the documentation to outline the feature

  • My changes generate no new warnings

  • Any dependent changes have been merged and published

  • By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

- Add markdownify and readabilipy dependencies
- Add convert_to_markdown parameter to http_request tool
- Automatically detect and convert HTML responses to markdown
- Add tests and documentation with usage examples
@mkmeral mkmeral requested a review from a team as a code owner May 30, 2025 22:23
Member

@awsarron awsarron left a comment

Awesome PR, thank you @mkmeral!

One small comment on the test coverage and then I think this is good to merge.

@awsarron awsarron enabled auto-merge (squash) July 14, 2025 22:08
@awsarron awsarron merged commit 3390e56 into strands-agents:main Jul 14, 2025
14 checks passed
cagataycali added a commit that referenced this pull request Jul 21, 2025
* test(load_tool): add e2e test for load tool

* test(load_tool): update providing tool path

* Update calculator tests to not use .invoke as 0.3.0 removed it (#132)

Co-authored-by: Mackenzie Zastrow <[email protected]>

* fix(a2a): prefix tools and updated default timeout (#134)

Co-authored-by: jer <[email protected]>

* feat: add HTML to markdown conversion for http_request tool (#63)

* feat: add HTML to markdown conversion for http_request tool

- Add markdownify and readabilipy dependencies
- Add convert_to_markdown parameter to http_request tool
- Automatically detect and convert HTML responses to markdown
- Add tests and documentation with usage examples

---------

Co-authored-by: Murat Kaan Meral <[email protected]>

* feat: Add handoff to user tool (#142)

* remove references to max_parallel_tools (#144)

* Chore: Remove Preview (#141)

Chore: update contributions doc to reflect community driven status

* Chore: bump to require v1.0.0 (#145)

* feat: migrate multi-agent tools to sdk native patterns and model support (#143)

* feat: Major refactor - Migrate to Strands SDK native patterns & add comprehensive model support

🚀 Overview:
- Migrate custom implementations to native Strands SDK patterns
- Add comprehensive model provider support (9+ providers)
- Enhance reliability, performance, and user experience
- Maintain backward compatibility

🔄 Key Changes:
- agent_graph.py → graph.py (SDK GraphBuilder integration)
- use_llm.py → use_agent.py (enhanced nested agent creation)
- Complete swarm tool architecture overhaul
- Enhanced think tool with model switching
- Production-ready workflow orchestration
- New model utilities system in src/strands_tools/models/

📊 Impact:
- Files Changed: 25+ files
- Lines Added: ~2,000 lines
- Lines Removed: ~1,100 lines
- New model providers: Bedrock, Anthropic, LiteLLM, LlamaAPI, Ollama, OpenAI, Writer, Cohere, GitHub
- 700+ test cases covering new functionality

✅ Backward compatibility maintained with gradual migration path

* fix(deps): convert SSH git URLs to HTTPS for GitHub runner compatibility

- Changed git+ssh://git@github.com URLs to git+https://github.com URLs
- Fixes git clone failures in CI/CD environments that lack SSH key access
- Updated both main dependencies and hatch-static-analysis dependencies

* feat: restore deprecated tools for backward compatibility and update deps

- Add use_llm and agent_graph tools back with deprecation warnings
- These tools will be removed in next major release
- Move Stability AI documentation to docs/stability_ai_tool.md
- Update pyproject.toml dependencies to use proper version ranges
- Add comprehensive test suites for both deprecated tools
- Maintain backward compatibility while guiding users to new APIs

* build: replace git dependency with version constraint for strands-agents

* refactor(models): move models to utils/models directory

- Move all model files from strands_tools/models/ to strands_tools/utils/models/
- Update import statements in graph.py, think.py, use_agent.py, and workflow.py
- Reorganize codebase structure for better organization

* fix(deps): remove duplicate strands-agents dependency

- Removed duplicate strands-agents[a2a] dependency from pyproject.toml
- Keeps only the base strands-agents dependency

---------

Co-authored-by: Arron <[email protected]>

* build(pyproject): update development status classifier (#146)

* Add AgentCoreMemoryToolProvider for AgentCore Memory Service Integration (#147)

* feat: add Bedrock AgentCore Code Interpreter tool (#148)

* feat: add Bedrock AgentCore Browser Tool (#149)

* Add community-driven call out to the top of the file (#151)

We do the same further down, but you have to jump past the list of tools + environment variables to see it, so making it more prominent towards the top of the readme.

Co-authored-by: Mackenzie Zastrow <[email protected]>

* test(load_tool): add e2e test for load tool

* test(load_tool): update providing tool path

* test(load_tool): fix naming consistency

---------

Co-authored-by: Jack Yuan <[email protected]>
Co-authored-by: Mackenzie Zastrow <[email protected]>
Co-authored-by: Mackenzie Zastrow <[email protected]>
Co-authored-by: Jeremiah <[email protected]>
Co-authored-by: jer <[email protected]>
Co-authored-by: Murat Kaan Meral <[email protected]>
Co-authored-by: Murat Kaan Meral <[email protected]>
Co-authored-by: Patrick Gray <[email protected]>
Co-authored-by: Jonathan Segev <[email protected]>
Co-authored-by: ./c² <[email protected]>
Co-authored-by: Arron <[email protected]>
Co-authored-by: mehtarac <[email protected]>
Co-authored-by: Dean Schmigelski <[email protected]>