|
1 | 1 | # hasdata-cli |
2 | 2 |
|
3 | | -Command-line interface for [hasdata.com](https://hasdata.com). |
| 3 | +**The official command-line interface for [HasData](https://hasdata.com) — web scraping, SERP, and real-estate/e-commerce data APIs, wired for shell scripts, LLM agents, and RAG pipelines.** |
4 | 4 |
|
5 | | -## Install |
 | 5 | +[CI](https://github.com/hasdata-com/hasdata-cli/actions/workflows/ci.yml) |
 | 6 | +[Releases](https://github.com/hasdata-com/hasdata-cli/releases) |
 | 7 | +[License](LICENSE) |
| 8 | + |
| 9 | +One static binary. Every API at [`hasdata.com`](https://hasdata.com/apis) exposed as a subcommand. No SDK install, no dependencies, no glue code — `curl | sh`, export a key, pipe JSON to `jq` or straight into your LLM prompt. |
6 | 10 |
|
7 | | -**macOS / Linux** (curl | sh, verifies checksum): |
8 | 11 | ```sh |
9 | 12 | curl -sSL https://raw.githubusercontent.com/hasdata-com/hasdata-cli/main/install.sh | sh |
| 13 | +export HASDATA_API_KEY=hd_xxx |
| 14 | +hasdata google-serp --q "best espresso machine 2026" |
10 | 15 | ``` |
11 | 16 |
|
12 | | -**Homebrew** (macOS / Linux): |
13 | | -```sh |
14 | | -brew install hasdata-com/tap/hasdata |
15 | | -``` |
| 17 | +--- |
16 | 18 |
|
17 | | -**Scoop** (Windows): |
18 | | -```powershell |
19 | | -scoop bucket add hasdata https://github.com/hasdata-com/scoop-bucket |
20 | | -scoop install hasdata |
21 | | -``` |
| 19 | +## Why a CLI? |
22 | 20 |
|
23 | | -**winget** (Windows): |
24 | | -```powershell |
25 | | -winget install hasdata-com.hasdata |
26 | | -``` |
| 21 | +- **Agents & tool use** — drop `hasdata <api>` into LangChain, LlamaIndex, CrewAI, or your own agent loop as a shell tool. Stable JSON in, stable JSON out. |
| 22 | +- **RAG ingestion** — stream fresh Google, Amazon, Zillow, and arbitrary web data into your vector store from a cron job or a `Makefile`, no backend required. |
| 23 | +- **Prompt-time grounding** — `hasdata google-serp ... | jq .organic_results` ➜ into a system prompt to cut hallucinations on current events, product pricing, real-estate comps, reviews. |
| 24 | +- **Dataset building** — parallel GNU-`xargs` invocations produce JSONL for LLM fine-tuning or evals. |
| 25 | +- **Humans too** — one-off lookups from your terminal, full `--help` for every flag, tab-completion for every enum. |
| 26 | + |
| 27 | +## Install |
| 28 | + |
| 29 | +| Platform | Command | |
| 30 | +|---|---| |
| 31 | +| macOS / Linux | `curl -sSL https://raw.githubusercontent.com/hasdata-com/hasdata-cli/main/install.sh \| sh` | |
| 32 | +| Windows manual | download the `.zip` from [Releases](https://github.com/hasdata-com/hasdata-cli/releases), extract, put `hasdata.exe` on `%PATH%` | |
| 33 | +| From source | `go install github.com/hasdata-com/hasdata-cli@latest` | |
27 | 34 |
|
28 | | -**Manual**: download the archive matching your OS/arch from the [Releases](https://github.com/hasdata-com/hasdata-cli/releases) page, extract, place `hasdata` in your `PATH`. |
| 35 | +The `install.sh` script detects your OS/arch, downloads the matching asset, and verifies its SHA-256 against the published `checksums.txt` before installing. |
29 | 36 |
|
30 | 37 | ## Configure |
31 | 38 |
|
32 | 39 | ```sh |
33 | | -export HASDATA_API_KEY=your_key_here |
| 40 | +export HASDATA_API_KEY=your_key # preferred for CI / containers / agents |
34 | 41 | # or |
35 | | -hasdata configure |
| 42 | +hasdata configure # writes ~/.hasdata/config.yaml (0600) |
36 | 43 | ``` |
37 | 44 |
|
38 | | -Precedence: `--api-key` flag > `HASDATA_API_KEY` env var > `~/.hasdata/config.yaml`. |
| 45 | +Precedence: `--api-key` flag > `HASDATA_API_KEY` env > `~/.hasdata/config.yaml`. Get a key from the [HasData dashboard](https://hasdata.com). |
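
The precedence rule can be sketched in a few lines of Python. This is an illustration of the documented lookup order only, not the CLI's actual code; the function name `resolve_api_key` and the `api_key` config field are assumptions made for the example.

```python
from typing import Mapping, Optional


def resolve_api_key(
    flag_value: Optional[str],
    env: Mapping[str, str],
    config: Mapping[str, str],
) -> Optional[str]:
    """Return the first key found: --api-key flag, then env var, then config file."""
    if flag_value:
        return flag_value
    env_value = env.get("HASDATA_API_KEY")
    if env_value:
        return env_value
    # Assumption: the parsed ~/.hasdata/config.yaml stores the key under `api_key`.
    return config.get("api_key") or None
```

Calling `resolve_api_key(None, os.environ, {})` in a CI job would therefore pick up the exported variable, while an explicit flag value always wins.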
39 | 46 |
|
40 | | -## Use |
| 47 | +## First calls |
41 | 48 |
|
42 | 49 | ```sh |
43 | | -hasdata --help # all APIs grouped by category |
44 | | -hasdata google-serp --help # flags for a specific API |
45 | | -hasdata google-serp --q "coffee" --gl us --num 20 |
46 | | -hasdata web-scraping --url https://example.com --no-block-ads --extract-rules-json @rules.json |
| 50 | +# Google SERP — structured organic / ads / knowledge graph / PAA |
| 51 | +hasdata google-serp --q "langchain vs llamaindex" --gl us --pretty |
| 52 | + |
| 53 | +# Render + scrape any URL (JS, proxies, markdown output, AI extraction) |
| 54 | +hasdata web-scraping \ |
| 55 | + --url "https://news.ycombinator.com" \ |
| 56 | + --output-format markdown \ |
| 57 | + --ai-extract-rules-json '{"top_story":{"type":"string","description":"headline of the top story"}}' \ |
| 58 | + --pretty |
| 59 | + |
| 60 | +# Amazon product lookup for price monitoring / comparison |
47 | 61 | hasdata amazon-product --asin B08N5WRWNW --pretty |
| 62 | + |
| 63 | +# Zillow listings with complex filters |
| 64 | +hasdata zillow-listing \ |
| 65 | + --keyword "Austin, TX" --type forSale \ |
| 66 | + --price-min 400000 --price-max 900000 \ |
| 67 | + --beds-min 3 --home-types house --home-types townhome \ |
| 68 | + --sort priceLowToHigh --pretty |
| 69 | +``` |
| 70 | + |
 | 71 | +Every command supports `--help`, `--pretty`, `--raw`, `--output file`, `--verbose`, `--timeout`, and `--retries`, plus shell completion. |
| 72 | + |
| 73 | +## Using it with LLMs |
| 74 | + |
| 75 | +### Agent tool-call (Python + OpenAI-style tools) |
| 76 | + |
| 77 | +```python |
| 78 | +import subprocess, json |
| 79 | + |
| 80 | +def hasdata(cmd: list[str]) -> dict: |
| 81 | + """Shell-tool wrapper around the hasdata CLI. Usable as an LLM tool.""" |
| 82 | + out = subprocess.check_output(["hasdata", *cmd, "--raw"], text=True) |
| 83 | + return json.loads(out) |
| 84 | + |
| 85 | +tool_spec = { |
| 86 | + "name": "web_search", |
| 87 | + "description": "Run a Google SERP query and return structured results.", |
| 88 | + "parameters": { |
| 89 | + "type": "object", |
| 90 | + "properties": { |
| 91 | + "query": {"type": "string"}, |
| 92 | + "country": {"type": "string", "default": "us"}, |
| 93 | + "n": {"type": "integer", "default": 10}, |
| 94 | + }, |
| 95 | + "required": ["query"], |
| 96 | + }, |
| 97 | +} |
| 98 | + |
| 99 | +def web_search(query: str, country: str = "us", n: int = 10) -> dict: |
| 100 | + return hasdata(["google-serp", "--q", query, "--gl", country, "--num", str(n)]) |
48 | 101 | ``` |
49 | 102 |
|
50 | | -Output goes to stdout; use `--output file.json` to write to a file, `--pretty` to indent JSON, `--raw` to skip formatting entirely. `--verbose` prints the request URL and rate-limit headers to stderr. |
| 103 | +Feed `tool_spec` to Claude / GPT / Gemini tool calling — zero Python dependencies on the HasData side. |
| 104 | + |
| 105 | +### RAG ingestion (bash loop) |
| 106 | + |
| 107 | +```sh |
| 108 | +for q in "$@"; do |
| 109 | + hasdata google-serp --q "$q" --num 50 --raw \ |
| 110 | + | jq -c '.organic_results[] | {url:.link, title, snippet}' \ |
| 111 | + >> serp-corpus.jsonl |
| 112 | +done |
| 113 | +``` |
| 114 | + |
| 115 | +Point your embedder at `serp-corpus.jsonl`. |
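
As a rough sketch of the step after the loop, the JSONL records can be turned into text documents before embedding. The field names `url`, `title`, and `snippet` come from the `jq` filter above; the function name and the output shape (`id` / `text`) are assumptions for illustration, and the embedding call itself is left to whatever library you use.

```python
import json
from typing import Dict, Iterable, Iterator


def records_to_documents(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Turn JSONL lines of {url, title, snippet} into {id, text} documents."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between appended runs
        record = json.loads(line)
        # Missing or null fields are treated as empty strings.
        title = record.get("title") or ""
        snippet = record.get("snippet") or ""
        text = f"{title}\n{snippet}".strip()
        if text:
            yield {"id": record.get("url"), "text": text}
```

With the file on disk, `list(records_to_documents(open("serp-corpus.jsonl")))` yields one document per search result.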
| 116 | + |
| 117 | +### Prompt-time grounding (no vector store) |
51 | 118 |
|
52 | | -### Flag value shapes |
| 119 | +```sh |
| 120 | +CONTEXT=$(hasdata google-serp --q "latest gpu benchmarks" --num 5 --raw \ |
| 121 | + | jq -r '.organic_results[] | "- \(.title): \(.snippet)"') |
 | 122 | +llm "$(printf 'Answer using this context only:\n%s\n\nQuestion: %s\n' "$CONTEXT" "what's the fastest consumer GPU right now?")" |
| 123 | +``` |
53 | 124 |
|
54 | | -- **Scalars** (`--q text`, `--num 50`, `--block-ads=false`) — standard. |
55 | | -- **Enum flags** validate against allowed values; shell completion offers the list. See `hasdata <api> --help` for each flag's allowed values. |
56 | | -- **Boolean flags with default `true`** have a paired `--no-<flag>` form (e.g. `--no-block-ads`). |
57 | | -- **List flags** accept repeated values or a comma-separated form: `--lr lang_en --lr lang_fr` or `--lr lang_en,lang_fr`. |
58 | | -- **Any flag ending in `-json`** accepts raw JSON, a `@file` path, or `-` for stdin — e.g. `--ai-extract-rules-json @rules.json`, `--js-scenario-json '[{"wait":2000}]'`, `echo '{...}' | ... --extract-rules-json -`. |
59 | | -- **`additionalProperties` objects** (e.g. `headers`, `extractRules`) are exposed as **two flags**: `--headers k=v` (repeatable, splits on the first `=`) and `--headers-json '{...}'` as an escape hatch. When both are given, the JSON is the base and kv items override per key. |
| 125 | +## Available APIs |
60 | 126 |
|
61 | | -### Exit codes |
| 127 | +`hasdata --help` lists all of them with per-call pricing. Grouped overview: |
62 | 128 |
|
63 | | -| Code | Meaning | |
| 129 | +| Category | Commands | |
64 | 130 | |---|---| |
65 | | -| 0 | success | |
66 | | -| 1 | user / CLI-input error | |
67 | | -| 2 | network error | |
68 | | -| 3 | API returned 4xx | |
69 | | -| 4 | API returned 5xx | |
| 131 | +| **Google SERP** | `google-serp` · `google-serp-light` · `google-ai-mode` · `google-news` · `google-shopping` · `google-immersive-product` · `google-events` · `google-short-videos` | |
| 132 | +| **Google Maps** | `google-maps` · `google-maps-place` · `google-maps-reviews` · `google-maps-contributor-reviews` · `google-maps-photos` | |
| 133 | +| **Google Other** | `google-images` · `google-trends` · `google-flights` | |
| 134 | +| **Search Engines** | `bing-serp` | |
| 135 | +| **Web** | `web-scraping` (headless, AI extraction, markdown output, screenshots) | |
| 136 | +| **E-commerce** | `amazon-product` · `amazon-search` · `amazon-seller` · `amazon-seller-products` · `shopify-products` · `shopify-collections` | |
| 137 | +| **Real Estate** | `zillow-listing` · `zillow-property` · `redfin-listing` · `redfin-property` · `airbnb-listing` · `airbnb-property` | |
| 138 | +| **Business / Local** | `yelp-search` · `yelp-place` · `yellowpages-search` · `yellowpages-place` | |
| 139 | +| **Jobs** | `indeed-listing` · `indeed-job` · `glassdoor-listing` · `glassdoor-job` | |
| 140 | +| **Social** | `instagram-profile` | |
| 141 | + |
| 142 | +## Flag patterns |
| 143 | + |
| 144 | +- **Scalars / enums** — `--q text`, `--num 50`, `--block-ads=false`. Enum flags validate client-side and offer tab-completion. |
| 145 | +- **Booleans defaulting to `true`** — paired negated form: `--no-block-ads`, `--no-screenshot`. |
| 146 | +- **Lists** — repeat (`--lr lang_en --lr lang_fr`) or comma-join (`--lr lang_en,lang_fr`). Serialized as `key[]=value` for GET endpoints. |
| 147 | +- **Anything ending in `-json`** — accepts raw JSON, `@path/to/file.json`, or `-` for stdin. Works for `--ai-extract-rules-json`, `--js-scenario-json`, `--extract-rules-json`, `--headers-json`, etc. |
| 148 | +- **Key-value objects** — e.g. `--headers User-Agent=foo` (repeatable, splits on first `=`, values with `=` preserved). Combine with `--headers-json` for a JSON base; kv items override per key. |
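
The key-value merge rule described in the last bullet can be illustrated with a short Python sketch. It mirrors the documented behavior (JSON as the base, `k=v` items override per key, split on the first `=`) but is not the CLI's implementation; `merge_headers` is a hypothetical name.

```python
import json
from typing import Dict, List, Optional


def merge_headers(headers_json: Optional[str], kv_items: List[str]) -> Dict[str, str]:
    """Combine a --headers-json base with repeated --headers k=v overrides."""
    merged: Dict[str, str] = json.loads(headers_json) if headers_json else {}
    for item in kv_items:
        # partition splits on the first "=" only, so values may contain "=".
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        merged[key] = value
    return merged
```

For example, a base of `{"User-Agent": "base"}` combined with `User-Agent=foo` and `X-Token=a=b` yields `User-Agent: foo` and `X-Token: a=b`.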
| 149 | + |
| 150 | +## Output & scripting |
| 151 | + |
| 152 | +- JSON responses pretty-print when stdout is a TTY; raw when piped (great for `jq`). Force with `--pretty` / `--raw`. |
| 153 | +- `--output file` writes raw response bytes (works for `screenshot` / image endpoints too). |
| 154 | +- `--verbose` prints the outgoing URL and `X-RateLimit-*` headers on stderr. |
| 155 | +- Exit codes: `0` success · `1` user error · `2` network · `3` API 4xx · `4` API 5xx. Script-safe. |
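
A caller that wants its own retry policy on top of the exit codes might look like the sketch below. The code-to-meaning table is taken from the list above; treating only codes `2` and `4` as retryable, the backoff schedule, and the names `run_hasdata` and `should_retry` are assumptions for illustration. The CLI's own `--retries` flag may already cover simple cases.

```python
import subprocess
import time

# Exit codes as documented above.
EXIT_MEANINGS = {0: "success", 1: "user error", 2: "network", 3: "API 4xx", 4: "API 5xx"}

# Assumption: only network failures and 5xx responses are worth retrying.
RETRYABLE = {2, 4}


def should_retry(code: int) -> bool:
    return code in RETRYABLE


def run_hasdata(args, attempts=3, delay=1.0, runner=subprocess.run) -> str:
    """Run `hasdata <args>` and return stdout, retrying on retryable exit codes."""
    for attempt in range(1, attempts + 1):
        result = runner(["hasdata", *args], capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        if not should_retry(result.returncode) or attempt == attempts:
            meaning = EXIT_MEANINGS.get(result.returncode, "unknown")
            raise RuntimeError(f"hasdata exited with {result.returncode} ({meaning})")
        time.sleep(delay * attempt)  # linear backoff between attempts
    raise ValueError("attempts must be at least 1")
```

The `runner` parameter exists only so the function can be exercised without the binary installed; in normal use the default `subprocess.run` is what executes the CLI.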
| 156 | + |
| 157 | +## Shell completion |
| 158 | + |
| 159 | +```sh |
| 160 | +# zsh |
| 161 | +hasdata completion zsh > "${fpath[1]}/_hasdata" |
| 162 | +# bash |
| 163 | +hasdata completion bash > /usr/local/etc/bash_completion.d/hasdata |
| 164 | +# fish |
| 165 | +hasdata completion fish > ~/.config/fish/completions/hasdata.fish |
| 166 | +``` |
| 167 | + |
| 168 | +Enum values auto-complete (`hasdata google-serp --gl <TAB>` → `us`, `gb`, `ca`, …). |
70 | 169 |
|
71 | 170 | ## Update |
72 | 171 |
|
73 | 172 | ```sh |
74 | | -hasdata update # upgrade to latest release |
75 | | -hasdata update --check # report whether an update is available |
| 173 | +hasdata update # upgrade to latest release |
| 174 | +hasdata update --check # report available version without installing |
76 | 175 | ``` |
77 | 176 |
|
78 | | -A once-per-24h background check will print a one-line notice to stderr when a newer version is available. Disable it by writing `check_updates: false` into `~/.hasdata/config.yaml`. |
| 177 | +A once-per-24h check prints a one-line notice to stderr when a newer version is out. Disable with `check_updates: false` in `~/.hasdata/config.yaml`. |
| 178 | + |
| 179 | +## How the CLI stays current |
79 | 180 |
|
80 | | -## How it stays in sync |
| 181 | +Every command here is generated from the live schema at `https://api.hasdata.com/apis`. A scheduled GitHub Action re-runs the generator, and a hash of the normalized spec short-circuits diffs when nothing changed. When HasData ships a new API, a PR lands here within 24 hours, then a release goes out — and `hasdata update` brings it to your machine. |
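
The "hash of the normalized spec" idea can be sketched as follows. The real generator lives in the Go code of this repository, so this Python version is only an illustration; the normalization (sorted keys, compact separators) and the choice of SHA-256 are assumptions, as are the function names.

```python
import hashlib
import json


def spec_hash(spec: dict) -> str:
    """Hash a spec so that key order and whitespace do not affect the result."""
    normalized = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_regeneration(spec: dict, previous_hash: str) -> bool:
    """Return True only when the normalized spec differs from the last run."""
    return spec_hash(spec) != previous_hash
```

Because the hash is computed over a normalized form, a spec that is re-served with reordered keys produces the same digest and no PR is opened.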
81 | 182 |
|
82 | | -The CLI is regenerated from `/apis` and `/apis/<slug>` at build time (`internal/gen/main.go`). A scheduled GitHub Action (and a `repository_dispatch` trigger from the API side) re-runs the generator daily and opens a PR when the spec hash changes. New APIs reach users through a new release + `hasdata update`. |
| 183 | +Contributing locally: |
83 | 184 |
|
84 | | -Contributors — if you're changing the CLI manually: |
85 | 185 | ```sh |
86 | | -go generate ./... # regenerate from live API specs |
| 186 | +go generate ./... # regenerate cmd/gen_*.go from api.hasdata.com |
87 | 187 | go build ./... |
88 | 188 | go test ./... |
89 | 189 | ``` |
90 | 190 |
|
| 191 | +## Resources |
| 192 | + |
| 193 | +- **HasData docs** — <https://docs.hasdata.com> |
| 194 | +- **API catalog** — <https://hasdata.com/apis> |
| 195 | +- **Releases** — <https://github.com/hasdata-com/hasdata-cli/releases> |
| 196 | +- **Issues & feature requests** — <https://github.com/hasdata-com/hasdata-cli/issues> |
| 197 | + |
91 | 198 | ## License |
92 | 199 |
|
93 | | -MIT — see `LICENSE`. |
| 200 | +[MIT](LICENSE) — use it commercially, embed it in your agent, ship it inside a container. Just don't hold us liable. |