diff --git a/docs.json b/docs.json
index 5f7330395..f8f8ad081 100644
--- a/docs.json
+++ b/docs.json
@@ -1957,6 +1957,7 @@
  "tools/toolkits/web-scrape/jina-reader",
  "tools/toolkits/web-scrape/newspaper",
  "tools/toolkits/web-scrape/newspaper4k",
+ "tools/toolkits/web-scrape/olostep",
  "tools/toolkits/web-scrape/spider",
  "tools/toolkits/web-scrape/trafilatura",
  "tools/toolkits/web-scrape/website",
diff --git a/tools/toolkits/web-scrape/olostep.mdx b/tools/toolkits/web-scrape/olostep.mdx
new file mode 100644
index 000000000..bf7c10a08
--- /dev/null
+++ b/tools/toolkits/web-scrape/olostep.mdx
@@ -0,0 +1,63 @@
+---
+title: Olostep
+description: Use Olostep with Agno to scrape, crawl, search and get AI-powered answers from the web.
+---
+
+## Overview
+
+OlostepTools enable an Agent to scrape websites, crawl entire sites, discover URLs, run web searches, and get AI-powered answers grounded in live data using the [Olostep](https://www.olostep.com) web data API.
+
+## Prerequisites
+
+The following example requires the `olostep` library and an API key which can be obtained from [Olostep](https://www.olostep.com/dashboard/api-keys).
+```shell
+pip install -U olostep
+```
+
+## Example
+
+The following agent will scrape the content from `https://docs.olostep.com/get-started/welcome` and return a summary:
+```python
+from agno.agent import Agent
+from agno.models.openai import OpenAIChat
+from agno.tools.olostep import OlostepTools
+
+agent = Agent(
+    model=OpenAIChat(id="gpt-4o-mini"),
+    tools=[OlostepTools(scrape_url=True)],
+    markdown=True,
+)
+agent.print_response(
+    "Summarize the key features at https://docs.olostep.com/get-started/welcome"
+)
+```
+
+## Toolkit Params
+
+| Parameter | Type | Default | Description |
+| ---------------- | ---- | ------- | ------------------------------------------------------------------ |
+| api_key | str | None | Olostep API key. Falls back to OLOSTEP_API_KEY env var. |
+| scrape_url | bool | True | Enable single URL scraping. |
+| crawl_website | bool | False | Enable website crawling. |
+| map_website | bool | False | Enable URL discovery / site mapping. |
+| search_web | bool | False | Enable web search returning structured links. |
+| answer_question | bool | False | Enable AI-powered answers grounded in live web data. |
+| batch_scrape | bool | False | Enable concurrent batch scraping of multiple URLs. |
+| all_tools | bool | False | Enable all tools at once. |
+
+## Toolkit Functions
+
+| Function | Description |
+| ---------------- | ---------------------------------------------------------------------------------------------------------------------- |
+| scrape_url | Scrape a single URL and return its content as markdown, html, text, or structured JSON. Supports parsers and LLM extraction. |
+| crawl_website | Recursively crawl a website starting from a URL. Supports URL glob filtering, depth limits, and relevance search. |
+| map_website | Discover all URLs on a website from sitemaps and discovered links. |
+| search_web | Search the web with a natural language query and return ranked links with titles and descriptions. |
+| answer_question | Search the web and return an AI-synthesized answer grounded in live data with source citations. Supports structured JSON output. |
+| batch_scrape | Scrape multiple URLs concurrently in a single batch job. Up to 10,000 URLs, completes in ~5–8 minutes. |
+
+## Developer Resources
+
+- [Olostep Website](https://www.olostep.com)
+- [Olostep Documentation](https://docs.olostep.com)
+- [Python SDK](https://docs.olostep.com/sdks/python)