Welcome to the Jarvis AI Assistant project! This AI-powered assistant can perform various tasks such as providing weather reports, summarizing news, sending emails, CAG, and more, all through voice commands. Below, you'll find detailed instructions on how to set up, use, and interact with this assistant.
- Voice Activation: Say "Hey Jarvis" to activate listening mode.
- Speech Recognition: Recognizes and processes user commands via speech input.
- AI Responses: Provides responses using AI-generated text-to-speech output.
- Task Execution: Handles multiple tasks, including:
  - Sending emails
  - Summarizing weather reports
  - Reading news headlines
  - Image generation
  - Database functions
  - Phone call automation using ADB
  - AI-based task execution
  - Automating websites & applications
  - Retrieval-Augmented Generation (RAG) for knowledge-based interactions
- Timeout Handling: Automatically deactivates listening mode after 5 minutes of inactivity.
- Automatic Input Processing: If no "stop" command is detected within 60 seconds, input is finalized and sent to the AI model for processing (see the sketch after this list).
- Multiple Function Calls: Calls multiple functions simultaneously, even if their inputs and outputs are unrelated.
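The timeout and auto-finalization behaviour above can be pictured with a minimal sketch (the helper names `listen_once` and `process_with_ai`, and the exact control flow, are assumptions for illustration; the project's main.py may differ):

```python
import time

INACTIVITY_LIMIT = 5 * 60   # deactivate listening after 5 minutes of inactivity
FINALIZE_AFTER = 60         # finalize buffered input if no "stop" within 60 seconds

def listening_loop(listen_once, process_with_ai):
    """listen_once() returns recognized text or None; process_with_ai(text) handles it."""
    buffer, buffer_started = [], None
    last_heard = time.time()
    while True:
        text = listen_once()
        now = time.time()
        if text:
            last_heard = now
            if text.strip().lower() == "stop":
                break                              # explicit deactivation
            buffer.append(text)
            buffer_started = buffer_started or now
        if buffer and now - buffer_started >= FINALIZE_AFTER:
            process_with_ai(" ".join(buffer))      # auto-finalize and send to the model
            buffer, buffer_started = [], None
        if now - last_heard >= INACTIVITY_LIMIT:
            break                                  # deactivate after prolonged silence
```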
Before running the project, ensure you have the following installed:
- Python 3.9 or later
- Required libraries (listed in `requirements.txt`)
- Create a `.env` file in the root directory of the project.
- Add your API keys and other configuration variables to the `.env` file (a loading sketch follows this list):

  ```
  Weather_api=your_weather_api_key
  News_api=your_news_api_key
  Sender_email=your_email
  Receiver_email=subject_email
  Password_email=email_password
  ```

- Setup API Keys & Passwords:
  - WEATHER API - Get weather data.
  - NEWS API - Fetch latest news headlines.
  - GMAIL PASSWORD - Generate an app password for sending emails.
  - OLLAMA - Download the Granite3.1-Dense:2b/8b models from Ollama. Install the models with:

    ```bash
    ollama run gemma3:4b
    ollama run granite3.1-dense:2b
    ollama pull nomic-embed-text
    ```

  - GEMINI AI - API access for function execution.
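As referenced above, a minimal sketch of reading these variables at runtime with python-dotenv (the helper name `load_keys` is illustrative; the repository's get_env.py may do this differently):

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

def load_keys() -> dict:
    """Load API keys and mail credentials from the project's .env file."""
    load_dotenv()  # reads .env from the current working directory
    return {
        "weather_api": os.getenv("Weather_api"),
        "news_api": os.getenv("News_api"),
        "sender_email": os.getenv("Sender_email"),
        "receiver_email": os.getenv("Receiver_email"),
        "password_email": os.getenv("Password_email"),
    }
```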
**gemma3:4b**
- Architecture: gemma3
- Parameters: 4.3B
- Context length: 8192
- Embedding length: 2560
- Quantization: Q4_K_M
- Stop token: `<end_of_turn>`
- Temperature: 0.1
- License: Gemma Terms of Use (last modified February 21, 2024)

**granite3.1-dense:2b**
- Architecture: granite
- Parameters: 2.5B
- Context length: 131072
- Embedding length: 2048
- Quantization: Q4_K_M
- System prompt: "Knowledge Cutoff Date: April 2024. You are Granite, developed by IBM."
- License: Apache License, Version 2.0 (January 2004)
| Model | Inputs | Outputs | Description |
|-------|--------|---------|-------------|
| gemini-2.0-flash | Audio, images, videos, and text | Text, images (experimental), and audio (coming soon) | Next-generation features, speed, thinking, realtime streaming, and multimodal generation |
| gemini-2.0-flash-lite | Audio, images, videos, and text | Text | A Gemini 2.0 Flash model optimized for cost efficiency and low latency |
| gemini-2.0-pro-exp-02-05 | Audio, images, videos, and text | Text | Our most powerful Gemini 2.0 model |
| gemini-1.5-flash | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks |
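A minimal sketch of calling one of these models, assuming the `google-generativeai` Python SDK and a `GEMINI_API_KEY` environment variable (the project's gemini_llm.py may wrap this differently):

```python
import os
import google.generativeai as genai  # pip install google-generativeai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash")

# Plain text generation; the function-calling flow described below builds on calls like this.
response = model.generate_content("Summarize today's weather in one sentence.")
print(response.text)
```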
```
├── DATA
│   ├── KNOWLEDGEBASE
│   │   └── disaster_data_converted.md
│   ├── RAWKNOWLEDGEBASE
│   │   └── disaster_data.pdf
│   ├── email_schema.py
│   ├── msg.py
│   ├── phone_details.py
│   └── tools.py
├── device_ips.txt
├── main.py
├── readme.md
├── requirements.txt
└── src
    ├── BRAIN
    │   ├── RAG.py
    │   ├── func_call.py
    │   ├── gemini_llm.py
    │   ├── lm_ai.py
    │   └── text_to_info.py
    ├── CONVERSATION
    │   ├── speech_to_text.py
    │   ├── t_s.py
    │   ├── test_speech.py
    │   └── text_to_speech.py
    ├── FUNCTION
    │   ├── Email_send.py
    │   ├── adb_connect.bat
    │   ├── adb_connect.sh
    │   ├── app_op.py
    │   ├── get_env.py
    │   ├── greet_time.py
    │   ├── incog.py
    │   ├── internet_search.py
    │   ├── link_op.py
    │   ├── news.py
    │   ├── phone_call.py
    │   ├── random_respon.py
    │   ├── run_function.py
    │   ├── weather.py
    │   └── youtube_downloader.py
    ├── KEYBOARD
    │   ├── key_lst.py
    │   └── key_prs_lst.py
    └── VISION
        └── eye.py

11 directories, 40 files
```
```bash
git clone https://github.com/ganeshnikhil/J.A.R.V.I.S.2.0.git
cd J.A.R.V.I.S.2.0
pip install -r requirements.txt
python main.py
```
Initial Interaction:

```
[= =] Say 'hey jarvis' to activate, and 'stop' to deactivate. Say 'exit' to quit.
```
Transitioned to Gemini AI-powered function calling, allowing multiple function calls to be made simultaneously for better efficiency. If Gemini AI fails to generate function calls, the system automatically falls back to an Ollama-based model for reliable execution.
AI Model Used: Gemini AI
- Higher accuracy
- Structured data processing
- Reliable AI-driven interactions
Command Parsing

```python
# Ask Gemini for a structured tool/function call for the spoken command,
# then parse it into a dictionary used to dispatch the call.
response = gemini_generate_function_call(command)
response_dic = parse_tool_call(response)
```
Dynamic Function Execution

```python
# Dispatch the parsed call to the matching local function.
if response_dic:
    func_name = response_dic["name"]
    response = execute_function_call(response_dic)
```
Error Handling & Fallback to Ollama

```python
# If Gemini-driven execution fails, fall back to the Ollama-based model.
try:
    response = execute_function_call(response_dic)
except Exception as e:
    print(f"Error in Gemini AI function execution: {e}")
    print("Falling back to Ollama-based function execution...")
    response = ollama_generate_function_call(command)
```
Retry Mechanism

```python
import time

def send_to_ai_with_retry(prompt, retries=3, delay=2):
    # Try Gemini a few times, pausing between attempts...
    for _ in range(retries):
        try:
            return send_to_gemini(prompt)
        except Exception:
            time.sleep(delay)
    # ...and only switch to Ollama once all retries are exhausted.
    print("Gemini AI is not responding. Switching to Ollama...")
    return send_to_ollama(prompt)
```
Retrieval-Augmented Generation (RAG) dynamically loads relevant markdown-based knowledge files based on the queried topic, reducing hallucinations and improving response accuracy.
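A minimal sketch of this idea, selecting knowledge files by topic and ranking their chunks with `nomic-embed-text` embeddings via the `ollama` Python package (the directory layout follows the tree above; the helper names are illustrative and the project's RAG.py may differ):

```python
from pathlib import Path
import ollama  # pip install ollama

KNOWLEDGE_DIR = Path("DATA/KNOWLEDGEBASE")

def load_topic_chunks(topic: str, chunk_size: int = 800) -> list[str]:
    """Pick markdown files whose name mentions the topic and split them into chunks."""
    chunks = []
    for md_file in KNOWLEDGE_DIR.glob("*.md"):
        if topic.lower() in md_file.stem.lower():
            text = md_file.read_text(encoding="utf-8")
            chunks += [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks

def top_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by dot-product similarity of their nomic-embed-text embeddings."""
    def embed(text: str) -> list[float]:
        return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]
    q = embed(query)
    return sorted(chunks,
                  key=lambda c: sum(a * b for a, b in zip(q, embed(c))),
                  reverse=True)[:k]
```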
Integrated Android Debug Bridge (ADB) to enable voice-controlled phone automation!
- Make phone calls
- Open apps & toggle settings
- Access phone data & remote operations
**Windows**

```bash
winget install --id=Google.AndroidSDKPlatformTools -e
```

**Linux**

```bash
sudo apt install adb
```

**Mac**

```bash
brew install android-platform-tools
```
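With ADB installed and a device connected (the repo's adb_connect.sh / adb_connect.bat scripts help with this), placing a call amounts to shelling out to adb; a minimal sketch (the phone number and use of `subprocess` are illustrative, and the project's phone_call.py may differ):

```python
import subprocess

def place_call(number: str) -> None:
    """Ask the connected Android device to dial a number via an ACTION_CALL intent."""
    subprocess.run(
        ["adb", "shell", "am", "start",
         "-a", "android.intent.action.CALL",
         "-d", f"tel:{number}"],
        check=True,
    )

place_call("5551234567")
```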
- Deeper mobile integration
- Advanced AI-driven automation
- Improved NLP-based command execution
- Multi-modal interactions (text + voice + image)

Stay tuned for future updates!
## Gemini Model Comparison
The following table provides a comparison of various Gemini models with respect to their rate limits:
| Model | RPM | TPM | RPD |
|------------------------------------- |-----:|----------:| -----:|
| **Gemini 2.0 Flash** | 15 | 1,000,000 | 1,500 |
| **Gemini 2.0 Flash-Lite Preview** | 30 | 1,000,000 | 1,500 |
| **Gemini 2.0 Pro Experimental 02-05** | 2 | 1,000,000 | 50 |
| **Gemini 2.0 Flash Thinking Experimental** | 10 | 4,000,000 | 1,500 |
| **Gemini 1.5 Flash** | 15 | 1,000,000 | 1,500 |
| **Gemini 1.5 Flash-8B** | 15 | 1,000,000 | 1,500 |
| **Gemini 1.5 Pro** | 2 | 32,000 | 50 |
| **Imagen 3** | -- | -- | -- |
- RPM: Requests per minute
- TPM: Tokens per minute
- RPD: Requests per day
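The free-tier limits above are easy to hit in an agentic loop, so requests may need throttling; a minimal sketch of a per-minute limiter keyed to the RPM column (the limiter class and where it is called are assumptions for illustration, not part of the project):

```python
import time
from collections import deque

class RpmLimiter:
    """Block until a request can be sent without exceeding `rpm` requests per minute."""
    def __init__(self, rpm: int):
        self.rpm = rpm
        self.sent = deque()  # timestamps of recent requests

    def wait(self) -> None:
        now = time.monotonic()
        while self.sent and now - self.sent[0] > 60:
            self.sent.popleft()                    # forget requests older than a minute
        if len(self.sent) >= self.rpm:
            time.sleep(60 - (now - self.sent[0]))  # wait for the oldest one to age out
        self.sent.append(time.monotonic())

limiter = RpmLimiter(rpm=15)   # e.g. Gemini 2.0 Flash free tier
# call limiter.wait() before each Gemini request
```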
The focus of this project is on using small, free (API) models to achieve accurate agentic behaviour, so that it can run on low-spec systems too.