Commit 6dbe4e1

docs: rewrite README for SEO + AI use cases
Reframe the README around the primary audience — devs building LLM agents, RAG pipelines, and grounding flows — and the search queries they actually type. Adds runnable snippets for OpenAI-style tool calls, RAG ingestion, and prompt-time grounding; groups the 40+ APIs into a discoverable catalog table.
1 parent c9b7e00 commit 6dbe4e1

1 file changed

Lines changed: 155 additions & 48 deletions

File tree

README.md

Original file line number | Diff line number | Diff line change
@@ -1,93 +1,200 @@
11
# hasdata-cli
22

3-
Command-line interface for [hasdata.com](https://hasdata.com).
3+
**The official command-line interface for [HasData](https://hasdata.com) — web scraping, SERP, and real-estate/e-commerce data APIs, wired for shell scripts, LLM agents, and RAG pipelines.**
44

5-
## Install
5+
[![CI](https://github.com/hasdata-com/hasdata-cli/actions/workflows/ci.yml/badge.svg)](https://github.com/hasdata-com/hasdata-cli/actions/workflows/ci.yml)
6+
[![Release](https://img.shields.io/github/v/release/hasdata-com/hasdata-cli?sort=semver)](https://github.com/hasdata-com/hasdata-cli/releases)
7+
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
8+
9+
One static binary. Every API at [`hasdata.com`](https://hasdata.com/apis) exposed as a subcommand. No SDK install, no dependencies, no glue code — `curl | sh`, export a key, pipe JSON to `jq` or straight into your LLM prompt.
610

7-
**macOS / Linux** (curl | sh, verifies checksum):
811
```sh
912
curl -sSL https://raw.githubusercontent.com/hasdata-com/hasdata-cli/main/install.sh | sh
13+
export HASDATA_API_KEY=hd_xxx
14+
hasdata google-serp --q "best espresso machine 2026"
1015
```
1116

12-
**Homebrew** (macOS / Linux):
13-
```sh
14-
brew install hasdata-com/tap/hasdata
15-
```
17+
---
1618

17-
**Scoop** (Windows):
18-
```powershell
19-
scoop bucket add hasdata https://github.com/hasdata-com/scoop-bucket
20-
scoop install hasdata
21-
```
19+
## Why a CLI?
2220

23-
**winget** (Windows):
24-
```powershell
25-
winget install hasdata-com.hasdata
26-
```
21+
- **Agents & tool use** — drop `hasdata <api>` into LangChain, LlamaIndex, CrewAI, or your own agent loop as a shell tool. Stable JSON in, stable JSON out.
22+
- **RAG ingestion** — stream fresh Google, Amazon, Zillow, and arbitrary web data into your vector store from a cron job or a `Makefile`, no backend required.
23+
- **Prompt-time grounding**: pipe `hasdata google-serp ... | jq .organic_results` into a system prompt to cut hallucinations on current events, product pricing, real-estate comps, and reviews.
24+
- **Dataset building** — parallel GNU-`xargs` invocations produce JSONL for LLM fine-tuning or evals.
25+
- **Humans too** — one-off lookups from your terminal, full `--help` for every flag, tab-completion for every enum.
26+
27+
## Install
28+
29+
| Platform | Command |
30+
|---|---|
31+
| macOS / Linux | `curl -sSL https://raw.githubusercontent.com/hasdata-com/hasdata-cli/main/install.sh \| sh` |
32+
| Windows manual | download the `.zip` from [Releases](https://github.com/hasdata-com/hasdata-cli/releases), extract, put `hasdata.exe` on `%PATH%` |
33+
| From source | `go install github.com/hasdata-com/hasdata-cli@latest` |
2734

28-
**Manual**: download the archive matching your OS/arch from the [Releases](https://github.com/hasdata-com/hasdata-cli/releases) page, extract, place `hasdata` in your `PATH`.
35+
The `install.sh` script detects your OS/arch, downloads the matching asset, and verifies its SHA-256 against the published `checksums.txt` before installing.
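
The checksum step can be sketched in Python for readers who want to verify a manually downloaded asset. This is a minimal sketch, not the logic of `install.sh` itself: it assumes `checksums.txt` uses the common `sha256sum` layout (`<hex digest>  <file name>` per line), and the helper names `expected_digest` and `verify_asset` are hypothetical.

```python
import hashlib
from typing import Optional


def expected_digest(checksums_text: str, asset_name: str) -> Optional[str]:
    """Find the digest listed for asset_name (assumed sha256sum-style lines)."""
    for line in checksums_text.splitlines():
        parts = line.split()
        # sha256sum marks binary-mode entries with a leading '*' on the name.
        if len(parts) == 2 and parts[1].lstrip("*") == asset_name:
            return parts[0].lower()
    return None


def verify_asset(asset_bytes: bytes, checksums_text: str, asset_name: str) -> bool:
    """Return True only if the asset is listed and its SHA-256 matches."""
    expected = expected_digest(checksums_text, asset_name)
    if expected is None:
        return False
    return hashlib.sha256(asset_bytes).hexdigest() == expected
```

An asset that is missing from the list is treated as a failure rather than skipped, so a renamed or unexpected file never passes silently.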
2936

3037
## Configure
3138

3239
```sh
33-
export HASDATA_API_KEY=your_key_here
40+
export HASDATA_API_KEY=your_key # preferred for CI / containers / agents
3441
# or
35-
hasdata configure
42+
hasdata configure # writes ~/.hasdata/config.yaml (0600)
3643
```
3744

38-
Precedence: `--api-key` flag > `HASDATA_API_KEY` env var > `~/.hasdata/config.yaml`.
45+
Precedence: `--api-key` flag > `HASDATA_API_KEY` env > `~/.hasdata/config.yaml`. Get a key from the [HasData dashboard](https://hasdata.com).
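
The precedence rule reads as "first non-empty source wins". A minimal Python sketch of that order, useful when a wrapper script needs to predict which key the CLI will pick up. The function `resolve_api_key` is hypothetical, and treating an empty string as "not set" is an assumption, not documented CLI behavior.

```python
from typing import Mapping, Optional


def resolve_api_key(
    flag_value: Optional[str],
    env: Mapping[str, str],
    config_value: Optional[str],
) -> Optional[str]:
    """Pick the key in the documented order: --api-key > HASDATA_API_KEY > config file."""
    candidates = (flag_value, env.get("HASDATA_API_KEY"), config_value)
    for candidate in candidates:
        if candidate:  # assumption: empty strings count as unset
            return candidate
    return None
```

Pass `os.environ` as `env` in real use; taking a mapping as a parameter keeps the function easy to test.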
3946

40-
## Use
47+
## First calls
4148

4249
```sh
43-
hasdata --help # all APIs grouped by category
44-
hasdata google-serp --help # flags for a specific API
45-
hasdata google-serp --q "coffee" --gl us --num 20
46-
hasdata web-scraping --url https://example.com --no-block-ads --extract-rules-json @rules.json
50+
# Google SERP — structured organic / ads / knowledge graph / PAA
51+
hasdata google-serp --q "langchain vs llamaindex" --gl us --pretty
52+
53+
# Render + scrape any URL (JS, proxies, markdown output, AI extraction)
54+
hasdata web-scraping \
55+
--url "https://news.ycombinator.com" \
56+
--output-format markdown \
57+
--ai-extract-rules-json '{"top_story":{"type":"string","description":"headline of the top story"}}' \
58+
--pretty
59+
60+
# Amazon product lookup for price monitoring / comparison
4761
hasdata amazon-product --asin B08N5WRWNW --pretty
62+
63+
# Zillow listings with complex filters
64+
hasdata zillow-listing \
65+
--keyword "Austin, TX" --type forSale \
66+
--price-min 400000 --price-max 900000 \
67+
--beds-min 3 --home-types house --home-types townhome \
68+
--sort priceLowToHigh --pretty
69+
```
70+
71+
Every command supports `--help`, `--pretty`, `--raw`, `--output file`, `--verbose`, `--timeout`, `--retries`, and shell completion.
72+
73+
## Using it with LLMs
74+
75+
### Agent tool-call (Python + OpenAI-style tools)
76+
77+
```python
78+
import subprocess, json
79+
80+
def hasdata(cmd: list[str]) -> dict:
81+
"""Shell-tool wrapper around the hasdata CLI. Usable as an LLM tool."""
82+
out = subprocess.check_output(["hasdata", *cmd, "--raw"], text=True)
83+
return json.loads(out)
84+
85+
tool_spec = {
86+
"name": "web_search",
87+
"description": "Run a Google SERP query and return structured results.",
88+
"parameters": {
89+
"type": "object",
90+
"properties": {
91+
"query": {"type": "string"},
92+
"country": {"type": "string", "default": "us"},
93+
"n": {"type": "integer", "default": 10},
94+
},
95+
"required": ["query"],
96+
},
97+
}
98+
99+
def web_search(query: str, country: str = "us", n: int = 10) -> dict:
100+
return hasdata(["google-serp", "--q", query, "--gl", country, "--num", str(n)])
48101
```
49102

50-
Output goes to stdout; use `--output file.json` to write to a file, `--pretty` to indent JSON, `--raw` to skip formatting entirely. `--verbose` prints the request URL and rate-limit headers to stderr.
103+
Feed `tool_spec` to Claude / GPT / Gemini tool calling — zero Python dependencies on the HasData side.
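
When the model decides to call the tool, its arguments still have to be turned into CLI flags. A minimal sketch of that step, assuming the arguments arrive as a JSON string (as in OpenAI-style function calling); `build_argv` is a hypothetical helper, and it uses only the `--q`, `--gl`, and `--num` flags shown above.

```python
import json


def build_argv(arguments_json: str) -> list[str]:
    """Translate model-supplied tool arguments into google-serp CLI arguments."""
    args = json.loads(arguments_json)
    query = args["query"]              # required by the tool schema
    country = args.get("country", "us")
    n = int(args.get("n", 10))         # coerce in case the model sends "10"
    return ["google-serp", "--q", query, "--gl", country, "--num", str(n)]
```

Because the result is an argument list rather than a shell string, the query text is never interpreted by a shell, which matters when the text comes from a model.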
104+
105+
### RAG ingestion (bash loop)
106+
107+
```sh
108+
for q in "$@"; do
109+
hasdata google-serp --q "$q" --num 50 --raw \
110+
| jq -c '.organic_results[] | {url:.link, title, snippet}' \
111+
>> serp-corpus.jsonl
112+
done
113+
```
114+
115+
Point your embedder at `serp-corpus.jsonl`.
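
Before embedding, it usually helps to drop duplicate URLs, since repeated queries append overlapping results to the same file. A minimal sketch that reads the JSONL produced by the loop above; the `url`, `title`, and `snippet` fields come from the `jq` expression, while `iter_documents` and the `id`/`text` output shape are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_documents(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one {"id", "text"} record per unique URL in a JSONL stream."""
    seen: set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        url = record.get("url")
        if not url or url in seen:
            continue  # skip rows without a URL and repeats of a known URL
        seen.add(url)
        title = record.get("title") or ""      # jq emits null for missing fields
        snippet = record.get("snippet") or ""
        yield {"id": url, "text": f"{title}\n{snippet}".strip()}
```

Typical use is `with open("serp-corpus.jsonl") as f: docs = list(iter_documents(f))`, then passing each `text` to the embedder of your choice.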
116+
117+
### Prompt-time grounding (no vector store)
51118

52-
### Flag value shapes
119+
```sh
120+
CONTEXT=$(hasdata google-serp --q "latest gpu benchmarks" --num 5 --raw \
121+
| jq -r '.organic_results[] | "- \(.title): \(.snippet)"')
122+
llm "$(printf 'Answer using this context only:\n%s\n\nQuestion: %s' "$CONTEXT" "what's the fastest consumer GPU right now?")"
123+
```
53124

54-
- **Scalars** (`--q text`, `--num 50`, `--block-ads=false`) — standard.
55-
- **Enum flags** validate against allowed values; shell completion offers the list. See `hasdata <api> --help` for each flag's allowed values.
56-
- **Boolean flags with default `true`** have a paired `--no-<flag>` form (e.g. `--no-block-ads`).
57-
- **List flags** accept repeated values or a comma-separated form: `--lr lang_en --lr lang_fr` or `--lr lang_en,lang_fr`.
58-
- **Any flag ending in `-json`** accepts raw JSON, a `@file` path, or `-` for stdin — e.g. `--ai-extract-rules-json @rules.json`, `--js-scenario-json '[{"wait":2000}]'`, `echo '{...}' | ... --extract-rules-json -`.
59-
- **`additionalProperties` objects** (e.g. `headers`, `extractRules`) are exposed as **two flags**: `--headers k=v` (repeatable, splits on the first `=`) and `--headers-json '{...}'` as an escape hatch. When both are given, the JSON is the base and kv items override per key.
125+
## Available APIs
60126

61-
### Exit codes
127+
`hasdata --help` lists all of them with per-call pricing. Grouped overview:
62128

63-
| Code | Meaning |
129+
| Category | Commands |
64130
|---|---|
65-
| 0 | success |
66-
| 1 | user / CLI-input error |
67-
| 2 | network error |
68-
| 3 | API returned 4xx |
69-
| 4 | API returned 5xx |
131+
| **Google SERP** | `google-serp` · `google-serp-light` · `google-ai-mode` · `google-news` · `google-shopping` · `google-immersive-product` · `google-events` · `google-short-videos` |
132+
| **Google Maps** | `google-maps` · `google-maps-place` · `google-maps-reviews` · `google-maps-contributor-reviews` · `google-maps-photos` |
133+
| **Google Other** | `google-images` · `google-trends` · `google-flights` |
134+
| **Search Engines** | `bing-serp` |
135+
| **Web** | `web-scraping` (headless, AI extraction, markdown output, screenshots) |
136+
| **E-commerce** | `amazon-product` · `amazon-search` · `amazon-seller` · `amazon-seller-products` · `shopify-products` · `shopify-collections` |
137+
| **Real Estate** | `zillow-listing` · `zillow-property` · `redfin-listing` · `redfin-property` · `airbnb-listing` · `airbnb-property` |
138+
| **Business / Local** | `yelp-search` · `yelp-place` · `yellowpages-search` · `yellowpages-place` |
139+
| **Jobs** | `indeed-listing` · `indeed-job` · `glassdoor-listing` · `glassdoor-job` |
140+
| **Social** | `instagram-profile` |
141+
142+
## Flag patterns
143+
144+
- **Scalars / enums**: `--q text`, `--num 50`, `--block-ads=false`. Enum flags validate client-side and offer tab-completion.
145+
- **Booleans defaulting to `true`** — paired negated form: `--no-block-ads`, `--no-screenshot`.
146+
- **Lists** — repeat (`--lr lang_en --lr lang_fr`) or comma-join (`--lr lang_en,lang_fr`). Serialized as `key[]=value` for GET endpoints.
147+
- **Anything ending in `-json`** — accepts raw JSON, `@path/to/file.json`, or `-` for stdin. Works for `--ai-extract-rules-json`, `--js-scenario-json`, `--extract-rules-json`, `--headers-json`, etc.
148+
- **Key-value objects** — e.g. `--headers User-Agent=foo` (repeatable, splits on first `=`, values with `=` preserved). Combine with `--headers-json` for a JSON base; kv items override per key.
149+
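
The key-value rule (split on the first `=`, JSON as the base, kv items override per key) can be illustrated in a few lines of Python. This is a sketch of the semantics described above, not the CLI's own implementation; `merge_headers` is a hypothetical name, and rejecting items without `=` is an assumption.

```python
import json
from typing import Iterable, Optional


def merge_headers(kv_items: Iterable[str], headers_json: Optional[str] = None) -> dict:
    """Start from the JSON object, then apply each key=value item on top."""
    merged = dict(json.loads(headers_json)) if headers_json else {}
    for item in kv_items:
        # partition splits on the FIRST '=', so "Cookie=a=b" keeps value "a=b".
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        merged[key] = value
    return merged
```

For example, `merge_headers(["User-Agent=foo"], '{"User-Agent": "bar", "Accept": "*/*"}')` keeps `Accept` from the JSON base and takes `User-Agent` from the kv item.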
150+
## Output & scripting
151+
152+
- JSON responses pretty-print when stdout is a TTY; raw when piped (great for `jq`). Force with `--pretty` / `--raw`.
153+
- `--output file` writes raw response bytes (works for `screenshot` / image endpoints too).
154+
- `--verbose` prints the outgoing URL and `X-RateLimit-*` headers on stderr.
155+
- Exit codes: `0` success · `1` user error · `2` network · `3` API 4xx · `4` API 5xx. Script-safe.
156+
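
The exit codes make it straightforward for a calling script to decide what to do after a failure. A minimal Python sketch: the code-to-meaning table is taken from the list above, while the choice to treat only network errors and 5xx responses as retryable is an assumption, and the helper names are hypothetical. The CLI's own `--retries` flag may already cover many cases.

```python
import subprocess

# Exit codes as listed in this README.
EXIT_MEANINGS = {0: "success", 1: "user error", 2: "network", 3: "API 4xx", 4: "API 5xx"}

# Assumption: a bad request (1, 3) will fail again, so only 2 and 4 are retried.
RETRYABLE = frozenset({2, 4})


def describe_exit(returncode: int) -> str:
    """Human-readable meaning of a hasdata exit code."""
    return EXIT_MEANINGS.get(returncode, f"unknown exit code {returncode}")


def should_retry(returncode: int) -> bool:
    """True when a later attempt has a chance of succeeding."""
    return returncode in RETRYABLE


def run_hasdata(args: list[str]) -> subprocess.CompletedProcess:
    """Run the CLI without raising, so the caller can inspect returncode."""
    return subprocess.run(["hasdata", *args], capture_output=True, text=True)
```

A caller would check `result.returncode` from `run_hasdata(...)`, log `describe_exit(...)`, and loop only while `should_retry(...)` is true.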
157+
## Shell completion
158+
159+
```sh
160+
# zsh
161+
hasdata completion zsh > "${fpath[1]}/_hasdata"
162+
# bash
163+
hasdata completion bash > /usr/local/etc/bash_completion.d/hasdata
164+
# fish
165+
hasdata completion fish > ~/.config/fish/completions/hasdata.fish
166+
```
167+
168+
Enum values auto-complete (`hasdata google-serp --gl <TAB>` → `us`, `gb`, `ca`, …).
70169

71170
## Update
72171

73172
```sh
74-
hasdata update # upgrade to latest release
75-
hasdata update --check # report whether an update is available
173+
hasdata update # upgrade to latest release
174+
hasdata update --check # report available version without installing
76175
```
77176

78-
A once-per-24h background check will print a one-line notice to stderr when a newer version is available. Disable it by writing `check_updates: false` into `~/.hasdata/config.yaml`.
177+
A once-per-24h check prints a one-line notice to stderr when a newer version is out. Disable with `check_updates: false` in `~/.hasdata/config.yaml`.
178+
179+
## How the CLI stays current
79180

80-
## How it stays in sync
181+
Every command here is generated from the live schema at `https://api.hasdata.com/apis`. A scheduled GitHub Action re-runs the generator, and a hash of the normalized spec short-circuits diffs when nothing changed. When HasData ships a new API, a PR lands here within 24 hours, then a release goes out — and `hasdata update` brings it to your machine.
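
The "hash of the normalized spec" idea can be sketched as follows. This is an illustration under stated assumptions, not the generator's actual code: it assumes the spec is JSON, that normalization means sorted keys with compact separators, and that the digest is SHA-256. `spec_hash` is a hypothetical name.

```python
import hashlib
import json
from typing import Any


def spec_hash(spec: Any) -> str:
    """Hash a JSON-compatible spec so that key order and whitespace do not matter."""
    normalized = json.dumps(spec, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def spec_changed(previous_hash: str, spec: Any) -> bool:
    """True when the freshly fetched spec differs from the last recorded hash."""
    return spec_hash(spec) != previous_hash
```

Comparing hashes instead of raw responses means a reordered but otherwise identical spec does not trigger a regeneration PR.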
81182

82-
The CLI is regenerated from `/apis` and `/apis/<slug>` at build time (`internal/gen/main.go`). A scheduled GitHub Action (and a `repository_dispatch` trigger from the API side) re-runs the generator daily and opens a PR when the spec hash changes. New APIs reach users through a new release + `hasdata update`.
183+
Contributing locally:
83184

84-
Contributors — if you're changing the CLI manually:
85185
```sh
86-
go generate ./... # regenerate from live API specs
186+
go generate ./... # regenerate cmd/gen_*.go from api.hasdata.com
87187
go build ./...
88188
go test ./...
89189
```
90190

191+
## Resources
192+
193+
- **HasData docs**: <https://docs.hasdata.com>
194+
- **API catalog**: <https://hasdata.com/apis>
195+
- **Releases**: <https://github.com/hasdata-com/hasdata-cli/releases>
196+
- **Issues & feature requests**: <https://github.com/hasdata-com/hasdata-cli/issues>
197+
91198
## License
92199

93-
MIT — see `LICENSE`.
200+
[MIT](LICENSE): use it commercially, embed it in your agent, ship it inside a container. Just don't hold us liable.
