A Telegram bot that finds service providers on the web and calls them for you, using the ElevenLabs Agents Platform and Firecrawl web search.
Tell it what you need in any language. It searches the web, extracts contact details, and dispatches an AI voice agent to make the phone call. You get a transcript and summary when it's done.
You: Find me a dentist in San Francisco
Bot: Searching...
Bot: Found 3 results:
1. Bay Area Dental - (415) 555-0123
[Call Bay Area Dental]
You: *taps button*
Bot: Calling Bay Area Dental...
Bot: Call completed!
Appointment scheduled for March 25 at 2:00 PM.
Address: 123 Market St, SF.
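The call button in the exchange above is the kind of thing Telegram exposes as an inline keyboard. The following is a minimal sketch of building one from search results. The `SearchResult` shape, the `buildCallKeyboard` helper, and the `call:<id>` callback format are assumptions for illustration, not taken from the repository; only the `inline_keyboard`, `text`, and `callback_data` field names come from the Telegram Bot API.

```typescript
// Subset of the Telegram Bot API InlineKeyboardMarkup object.
interface InlineKeyboardButton {
  text: string;
  callback_data: string;
}
interface InlineKeyboardMarkup {
  inline_keyboard: InlineKeyboardButton[][];
}

// Hypothetical shape of a stored search result.
interface SearchResult {
  id: number;
  name: string;
  phone?: string;
}

// Build one button row per result that has a phone number.
// The "call:<id>" callback_data format is an assumption for illustration.
function buildCallKeyboard(results: SearchResult[]): InlineKeyboardMarkup {
  const rows = results
    .filter((r) => r.phone !== undefined)
    .map((r) => [{ text: `Call ${r.name}`, callback_data: `call:${r.id}` }]);
  return { inline_keyboard: rows };
}
```

Telegram limits `callback_data` to 64 bytes, which is one reason to send a short result ID rather than the phone number and provider name.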
Telegram → Cloudflare Worker → Durable Object (per user)
                                     │
                 ┌───────────────────┼───────────────────┐
                 │                   │                   │
             Firecrawl         Kimi K2.5 via        ElevenLabs
           (web search)       AI SDK (routing)       (voice agent)
                                                         │
                                                      Twilio
                                                   (phone line)
- User sends a message. Worker routes it to the user's Durable Object.
- The AI model (Kimi K2.5) receives the message with conversation history (last 15 messages) and decides what to do via tool calls.
- If the AI calls `searchWeb`, Firecrawl searches the web and returns results with phone numbers extracted.
- If the AI calls `makeCall`, ElevenLabs voice agent calls via Twilio and handles the conversation.
- The AI can chain tools in a single turn, e.g., search and then immediately call the best match.
- Bot polls for call completion, fetches the transcript, and sends a summary.
| Component | Technology | Purpose |
|---|---|---|
| Voice AI | ElevenLabs Conversational AI | AI voice agent, makes phone calls |
| Web Search | Firecrawl | Web search with full page content as markdown |
| Hosting | Cloudflare Workers | Serverless edge compute |
| State | Durable Objects (via Agents SDK) | Per-user persistent state with embedded SQLite |
| LLM | Kimi K2.5 via Vercel AI SDK on Workers AI | Conversational routing with tool calls (searchWeb, makeCall) |
| Phone | Twilio | Phone number (managed by ElevenLabs) |
| Bot | Telegram Bot API | User interface |
- Node.js 18+
- Cloudflare account (free tier works)
- Telegram bot token (free)
- Firecrawl API key (500 free credits to start)
- ElevenLabs API key (Conversational AI starts at $0.10/min)
- Twilio account with a voice-capable phone number
git clone https://github.com/romanbarbo/hey-agent.git
cd hey-agent
npm install

- Open Telegram, search for @BotFather
- Send `/newbot`, choose a name and username
- Copy the bot token
- Sign up at twilio.com
- Go to Phone Numbers > Buy a number (pick one with Voice capability)
- Note your Account SID and Auth Token from the dashboard
- Go to ElevenLabs > Conversational AI > Create Agent
- Set the system prompt (see Agent Prompt below)
- Set First message to: `Hello, I'm calling to {{task_description}}. Could you help me with that?`
- Add dynamic variables: `task_description`, `provider_name`, `user_language`
- Choose a voice (e.g., "George" or "Sarah") and TTS model (`eleven_flash_v2_5` for lowest latency)
- Enable the `end_call` built-in tool
- Copy the Agent ID from the URL
ElevenLabs manages Twilio natively, so no webhook configuration is needed:
- Go to ElevenLabs > Conversational AI > Phone Numbers
- Click Import phone number > From Twilio
- Enter your Twilio phone number, Account SID, and Auth Token
- Assign your agent to the number
- Copy the Phone Number ID
cp .dev.vars.example .dev.vars

Edit .dev.vars with your actual values:
BOT_TOKEN=123456:ABC-DEF... # From BotFather
BOT_SECRET=a-long-random-string # Generate with: openssl rand -hex 32
FIRECRAWL_API_KEY=fc-... # From firecrawl.dev
ELEVENLABS_API_KEY=sk_... # From elevenlabs.io
ELEVENLABS_AGENT_ID=... # From agent URL
ELEVENLABS_PHONE_NUMBER_ID=... # From phone number settings
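A missing variable tends to surface only when the first call or search fails, so a startup check can help. This is a minimal sketch; the `missingVars` helper is hypothetical and not part of the repository, and it only assumes the six variable names listed above.

```typescript
// The six variables listed in .dev.vars above.
const REQUIRED_VARS = [
  "BOT_TOKEN",
  "BOT_SECRET",
  "FIRECRAWL_API_KEY",
  "ELEVENLABS_API_KEY",
  "ELEVENLABS_AGENT_ID",
  "ELEVENLABS_PHONE_NUMBER_ID",
] as const;

// Hypothetical helper: return the names of variables that are unset or blank.
function missingVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

A Worker could call this with its `env` bindings at the top of the request handler and return an error response listing the missing names.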
# Set production secrets
npx wrangler secret put BOT_TOKEN
npx wrangler secret put BOT_SECRET
npx wrangler secret put FIRECRAWL_API_KEY
npx wrangler secret put ELEVENLABS_API_KEY
npx wrangler secret put ELEVENLABS_AGENT_ID
npx wrangler secret put ELEVENLABS_PHONE_NUMBER_ID
# Deploy
npm run deploy

curl -H "Authorization: Bearer YOUR_BOT_SECRET" \
  https://hey-agent.YOUR-SUBDOMAIN.workers.dev/register-webhook

npm run dev # Start local dev server
npm run test # Run tests
npm run cf-typegen # Regenerate types after changing wrangler.jsonc

Local development uses .dev.vars for environment variables.
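The `/register-webhook` request shown earlier sends `BOT_SECRET` as a bearer token. The sketch below shows one way a Worker might check such a header; the `extractBearer`, `safeEqual`, and `isAuthorized` helpers are hypothetical and the repository may do this differently.

```typescript
// Pull the token out of an "Authorization: Bearer <token>" header value.
function extractBearer(header: string | null): string | null {
  if (header === null) return null;
  const match = /^Bearer\s+(.+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

// Compare every character so the loop does not exit early on the first mismatch.
function safeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Hypothetical guard for the register-webhook route.
function isAuthorized(header: string | null, secret: string): boolean {
  const token = extractBearer(header);
  return token !== null && safeEqual(token, secret);
}
```

In a Worker the header value would come from `request.headers.get("Authorization")`, which returns `null` when the header is absent.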
Use this as the system prompt for your ElevenLabs Conversational AI agent:
You are a polite and professional phone assistant making a call on behalf of a user.
Your task:
{{task_description}}
You are calling: {{provider_name}}
Guidelines:
- Be concise and natural, like a real person making a phone call
- State the purpose of your call clearly at the beginning
- Collect key information: available dates/times, pricing, address, any requirements
- If asked who you are, say you are an assistant calling on behalf of a client
- Confirm all important details before ending the call (date, time, address, cost)
- If the line is busy, voicemail, or no one answers, end the call politely
- If they ask for a callback number, say your client will call back and thank them
- Speak in the language matching the provider's language, but default to {{user_language}} if unsure
- Keep the conversation focused and don't ramble
- Thank them before hanging up
- If you successfully make an appointment or get the needed information, summarize what was agreed upon before ending
Do NOT:
- Provide personal medical, legal, or financial information about the user
- Agree to costs or commitments above what was requested
- Argue or be confrontational if they can't accommodate the request
The dynamic variables ({{task_description}}, {{provider_name}}, {{user_language}}) are filled by the bot at call time based on the user's search and selected provider.
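ElevenLabs performs this substitution itself at call time, so the bot only supplies the values. Purely to illustrate what the substitution does, here is a small sketch with a hypothetical `fillTemplate` helper that is not part of the repository.

```typescript
// Replace {{name}} placeholders with values; unknown placeholders are left as-is.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (placeholder: string, name: string) =>
      Object.prototype.hasOwnProperty.call(vars, name)
        ? vars[name] ?? placeholder
        : placeholder,
  );
}

// Example values the bot might send for one call (illustrative only).
const firstMessage = fillTemplate(
  "Hello, I'm calling to {{task_description}}. Could you help me with that?",
  { task_description: "book a dental cleaning", provider_name: "Bay Area Dental" },
);
```

Here `firstMessage` becomes "Hello, I'm calling to book a dental cleaning. Could you help me with that?".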
src/
├── index.ts       # Worker entry point: Telegram webhook, routing
├── agent.ts       # TaskAgent Durable Object: message history, tool results, call polling
├── telegram.ts    # Telegram Bot API client
├── firecrawl.ts   # Firecrawl search/scrape client
├── elevenlabs.ts  # ElevenLabs Conversational AI client (outbound calls, transcripts)
└── ai.ts          # Vercel AI SDK: generateText with searchWeb/makeCall tools
Each Telegram user gets their own Durable Object instance identified by chat_id:
- Isolated state per user
- SQLite storage for messages, search results, and calls
- Scheduled alarms for call status polling
- Horizontal scaling by design
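As a rough sketch of the per-user routing described above, the Worker needs a chat ID from each incoming Telegram update before it can pick a Durable Object. The `chatIdFromUpdate` and `objectNameFor` helpers and the `chat:<id>` naming scheme are assumptions for illustration; only the `message.chat.id` and `callback_query.message.chat.id` field paths come from the Telegram Bot API.

```typescript
// Subset of the Telegram Update object: text messages and button taps.
interface TelegramUpdate {
  message?: { chat: { id: number } };
  callback_query?: { message?: { chat: { id: number } } };
}

// Find the chat ID whether the update is a message or an inline button tap.
function chatIdFromUpdate(update: TelegramUpdate): number | null {
  return (
    update.message?.chat.id ??
    update.callback_query?.message?.chat.id ??
    null
  );
}

// Hypothetical naming scheme: one Durable Object name per chat.
// In a Worker this name could be passed to the namespace's idFromName().
function objectNameFor(update: TelegramUpdate): string | null {
  const chatId = chatIdFromUpdate(update);
  return chatId === null ? null : `chat:${chatId}`;
}
```

Because the same chat ID always maps to the same name, every message and button tap from one user reaches the same instance and its SQLite state.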
There is no manual intent parser. Every user message is sent to Kimi K2.5 via the Vercel AI SDK's generateText with two tools available: searchWeb and makeCall. The model decides which tools to call (if any) based on the conversation. It can chain multiple tools in a single turn (up to 5 steps).
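The AI SDK runs the tool loop internally, so the bot does not need code like the following. It is a simplified, synchronous stand-in meant only to show what "chain multiple tools, up to 5 steps" means. All types and the `runTurn` function are hypothetical, and the model is a plain function rather than a real LLM call.

```typescript
type ToolName = "searchWeb" | "makeCall";

// A model step either produces final text or asks for a tool to be run.
type ModelReply =
  | { kind: "text"; text: string }
  | { kind: "toolCall"; tool: ToolName; args: Record<string, string> };

type Model = (transcript: string[]) => ModelReply;
type Tools = Record<ToolName, (args: Record<string, string>) => string>;

// Run one user turn: keep feeding tool results back until the model
// answers in text or the step limit is reached.
function runTurn(
  model: Model,
  tools: Tools,
  userMessage: string,
  maxSteps = 5,
): string {
  const transcript = [`user: ${userMessage}`];
  for (let step = 0; step < maxSteps; step++) {
    const reply = model(transcript);
    if (reply.kind === "text") return reply.text;
    const result = tools[reply.tool](reply.args);
    transcript.push(`tool ${reply.tool}: ${result}`);
  }
  return "Step limit reached before the model produced a final answer.";
}
```

The step cap matters because each tool result goes back to the model, which could otherwise keep requesting tools indefinitely.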
The AI calls the searchWeb tool. Firecrawl searches the web and returns full page content as markdown. Phone numbers are extracted via regex. Results are stored in SQLite and presented to the user with optional inline call buttons.
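The regular expression the bot actually uses is not shown here, so the following is only a sketch of the idea. `PHONE_PATTERN` and `extractPhones` are hypothetical, and the pattern is tuned for North American style numbers with an optional country code.

```typescript
// Optional country code, then 3-3-4 digits with optional separators or parentheses.
const PHONE_PATTERN =
  /(?:\+?\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g;

// Return de-duplicated numbers in the order they appear,
// reduced to digits with a leading "+" kept when present.
function extractPhones(markdown: string): string[] {
  const matches = markdown.match(PHONE_PATTERN) ?? [];
  const normalized = matches.map((raw) => {
    const digits = raw.replace(/\D/g, "");
    return raw.trim().startsWith("+") ? `+${digits}` : digits;
  });
  return Array.from(new Set(normalized));
}
```

A pattern this loose will also match other long digit runs, such as order numbers, and it misses formats with different digit groupings. A production version would likely need per-country rules or a dedicated phone number library.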
The AI calls the makeCall tool, either directly from a user request or after a search. The ElevenLabs voice agent places the call via Twilio and handles the conversation. The bot polls for completion every 15 seconds and sends a transcript summary when done.
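The polling step can be pictured as a small decision made each time the alarm fires. This sketch is not the repository's code. The status values are an assumption about what the ElevenLabs conversation endpoint reports, and `MAX_POLLS` is a hypothetical safety cap; only the 15-second interval comes from the description above.

```typescript
// Assumed status values; check the ElevenLabs API reference before relying on them.
type CallStatus = "initiated" | "in-progress" | "processing" | "done" | "failed";

type PollDecision =
  | { action: "reschedule"; delayMs: number }
  | { action: "sendSummary" }
  | { action: "reportFailure"; reason: string };

const POLL_INTERVAL_MS = 15000; // poll every 15 seconds, as described above
const MAX_POLLS = 40; // hypothetical cap, roughly 10 minutes of polling

// Decide what to do after one status check.
function decideNextPoll(status: CallStatus, pollsSoFar: number): PollDecision {
  if (status === "done") return { action: "sendSummary" };
  if (status === "failed") return { action: "reportFailure", reason: "call failed" };
  if (pollsSoFar >= MAX_POLLS) return { action: "reportFailure", reason: "timed out" };
  return { action: "reschedule", delayMs: POLL_INTERVAL_MS };
}
```

In a Durable Object, a `reschedule` decision would translate into setting the next alarm `delayMs` from now. A cap of some kind prevents a stuck call from being polled forever.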
| Storage | What | Why |
|---|---|---|
| `messages` table | Conversation history (role + content) | Last 15 messages sent as context to the AI |
| `search_results` table | Search results with extracted contacts | Quick lookup when user taps a call button |
| `calls` table | Active/completed calls with conversation IDs | Status polling, transcript storage |
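The "last 15 messages" rule amounts to a sliding window over the stored history. The bot presumably does this with a SQL query against the `messages` table; the in-memory sketch below only illustrates the windowing, and `StoredMessage` and `recentContext` are hypothetical names.

```typescript
// Mirrors the role + content columns described in the table above.
interface StoredMessage {
  role: "user" | "assistant";
  content: string;
}

const CONTEXT_WINDOW = 15; // number of recent messages sent to the model

// Return the most recent `limit` messages, oldest first.
function recentContext(
  history: StoredMessage[],
  limit: number = CONTEXT_WINDOW,
): StoredMessage[] {
  // slice(-0) would return the whole array, so guard non-positive limits.
  return limit > 0 ? history.slice(-limit) : [];
}
```

Keeping the window small bounds the prompt size per request, at the cost of the model forgetting anything older than the last 15 messages.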
The bot detects the user's language and adapts the entire pipeline: search queries, call scripts, and voice agent language. ElevenLabs supports 70+ languages.
MIT