
Open-source assistant using small models (2B-5B), with agentic and tool-calling capabilities, RAG integration with efficient memory, and Android support using ADB.


ganeshnikhil/J.A.R.V.I.S.2.0


🚀 JARVIS 2.0


J.A.R.V.I.S. 2.0 – Judgment Augmented Reasoning for Virtual Intelligent Systems

🤖 Jarvis AI Assistant

Welcome to the Jarvis AI Assistant project! 🎙️ This AI-powered assistant can perform various tasks such as providing weather reports 🌦️, summarizing news 📰, sending emails 📧, CAG, and more, all through voice commands. Below, you'll find detailed instructions on how to set up, use, and interact with this assistant. 🎧


🌟 Features

✅ Voice Activation: Say "Hey Jarvis" to activate listening mode. 🎤
✅ Speech Recognition: Recognizes and processes user commands via speech input. 🗣️
✅ AI Responses: Provides responses using AI-generated text-to-speech output. 🎶
✅ Task Execution: Handles multiple tasks, including:

  • 📧 Sending emails
  • 🌦️ Summarizing weather reports
  • 📰 Reading news headlines
  • 🖼️ Image generation
  • 🏦 Database functions
  • 📱 Phone call automation using ADB
  • 🤖 AI-based task execution
  • 📡 Automate websites & applications
  • 🧠 Retrieval-Augmented Generation (RAG) for knowledge-based interactions

✅ Timeout Handling: Automatically deactivates listening mode after 5 minutes of inactivity. ⏳
✅ Automatic Input Processing: If no "stop" command is detected within 60 seconds, input is finalized and sent to the AI model for processing. ⚙️
✅ Multiple Function Calls: Call multiple functions simultaneously, even if their inputs and outputs are unrelated. 🔄
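The activation and timeout rules above can be sketched as a small state machine. This is only an illustration, not the project's code: the class `ListeningSession`, its method names, and the choice that "stop" both finalizes the input and deactivates listening are assumptions. The clock is injectable so the timing logic can be exercised without waiting.

```python
import time
from typing import Callable, List, Optional

INACTIVITY_TIMEOUT = 5 * 60  # seconds without speech before listening mode switches off
AUTO_SUBMIT_AFTER = 60       # seconds to wait for "stop" before sending the input anyway


class ListeningSession:
    """Hypothetical tracker for wake-word activation and the two timeouts."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.active = False
        self._last_activity = 0.0
        self._input_started: Optional[float] = None
        self._buffer: List[str] = []

    def hear(self, phrase: str) -> Optional[str]:
        """Feed one recognized phrase; return a finished command or None."""
        now = self._clock()
        text = phrase.strip().lower()
        if not self.active:
            # Everything except the wake phrase is ignored while inactive.
            if text == "hey jarvis":
                self.active = True
                self._last_activity = now
            return None
        self._last_activity = now
        if text == "stop":
            # Assumption: "stop" sends what was said so far and deactivates.
            self.active = False
            return self._finalize()
        if self._input_started is None:
            self._input_started = now
        self._buffer.append(phrase.strip())
        return None

    def tick(self) -> Optional[str]:
        """Call periodically; applies auto-submit and the inactivity timeout."""
        now = self._clock()
        if not self.active:
            return None
        if self._input_started is not None and now - self._input_started >= AUTO_SUBMIT_AFTER:
            return self._finalize()
        if now - self._last_activity >= INACTIVITY_TIMEOUT:
            self.active = False
            self._buffer.clear()
            self._input_started = None
        return None

    def _finalize(self) -> Optional[str]:
        """Join the buffered phrases into one command and reset the buffer."""
        command = " ".join(self._buffer) or None
        self._buffer.clear()
        self._input_started = None
        return command
```

In a real loop, `hear` would be called with each phrase from the speech recognizer and `tick` once per iteration; any non-`None` return value is the command to send to the model.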

📌 Prerequisites

Before running the project, ensure you have the following installed:

✅ Python 3.9 or later 🐍
✅ Required libraries (listed in requirements.txt) 📜

๐Ÿ› ๏ธ Configuration

  1. Create a .env file in the root directory of the project.

  2. Add your API keys and other configuration variables to the .env file:

    Weather_api=your_weather_api_key
    News_api=your_news_api_key
    Sender_email=your_email
    Receiver_email=subject_email
    Password_email=email_password
  3. Setup API Keys & Passwords:
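As a rough illustration of how the .env values above could be read at startup, here is a standard-library-only sketch. It is not the project's `src/FUNCTION/get_env.py` (whose contents are not shown here); the function names `parse_env` and `load_config` are hypothetical, and only the five key names come from the example above.

```python
import os
from pathlib import Path
from typing import Dict

# Key names taken from the .env example above.
REQUIRED_KEYS = ("Weather_api", "News_api", "Sender_email", "Receiver_email", "Password_email")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Surrounding quotes are optional in .env files, so strip them.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(path: str = ".env") -> Dict[str, str]:
    """Hypothetical loader: real environment variables override the file."""
    file = Path(path)
    values = parse_env(file.read_text(encoding="utf-8")) if file.exists() else {}
    config = {key: os.environ.get(key, values.get(key, "")) for key in REQUIRED_KEYS}
    missing = [key for key, value in config.items() if not value]
    if missing:
        raise KeyError(f"Missing configuration values: {', '.join(missing)}")
    return config
```

Failing early with the list of missing keys tends to be easier to debug than a weather or email call failing later with an empty credential.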

Model Details

Gemma handles intelligent routing, image inputs, and simple question answering.

  Model
    architecture        gemma3    
    parameters          4.3B      
    context length      8192      
    embedding length    2560      
    quantization        Q4_K_M    

  Parameters
    stop           "<end_of_turn>"    
    temperature    0.1                

  License
    Gemma Terms of Use                  
    Last modified: February 21, 2024

Granite Dense has a large context window and is used for RAG and chat.

  Model
    architecture        granite    
    parameters          2.5B       
    context length      131072     
    embedding length    2048       
    quantization        Q4_K_M     

  System
    Knowledge Cutoff Date: April 2024.    
    You are Granite, developed by IBM.    

  License
    Apache License               
    Version 2.0, January 2004

The Gemini free tier serves as a fallback mechanism (for tool calling only).

| Model | Inputs | Outputs | Optimized for |
|-------|--------|---------|---------------|
| gemini-2.0-flash | Audio, images, videos, and text | Text, images (experimental), and audio (coming soon) | Next generation features, speed, thinking, realtime streaming, and multimodal generation |
| gemini-2.0-flash-lite | Audio, images, videos, and text | Text | A Gemini 2.0 Flash model optimized for cost efficiency and low latency |
| gemini-2.0-pro-exp-02-05 | Audio, images, videos, and text | Text | Our most powerful Gemini 2.0 model |
| gemini-1.5-flash | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks |

Directory structure

├── DATA
│   ├── KNOWLEDGEBASE
│   │   └── disaster_data_converted.md
│   ├── RAWKNOWLEDGEBASE
│   │   └── disaster_data.pdf
│   ├── email_schema.py
│   ├── msg.py
│   ├── phone_details.py
│   └── tools.py
├── device_ips.txt
├── main.py
├── readme.md
├── requirements.txt
└── src
    ├── BRAIN
    │   ├── RAG.py
    │   ├── func_call.py
    │   ├── gemini_llm.py
    │   ├── lm_ai.py
    │   └── text_to_info.py
    ├── CONVERSATION
    │   ├── speech_to_text.py
    │   ├── t_s.py
    │   ├── test_speech.py
    │   └── text_to_speech.py
    ├── FUNCTION
    │   ├── Email_send.py
    │   ├── adb_connect.bat
    │   ├── adb_connect.sh
    │   ├── app_op.py
    │   ├── get_env.py
    │   ├── greet_time.py
    │   ├── incog.py
    │   ├── internet_search.py
    │   ├── link_op.py
    │   ├── news.py
    │   ├── phone_call.py
    │   ├── random_respon.py
    │   ├── run_function.py
    │   ├── weather.py
    │   └── youtube_downloader.py
    ├── KEYBOARD
    │   ├── key_lst.py
    │   └── key_prs_lst.py
    └── VISION
        └── eye.py

11 directories, 40 files

💻 Installation

1️⃣ Clone the Repository

 git clone https://github.com/ganeshnikhil/J.A.R.V.I.S.2.0.git
 cd J.A.R.V.I.S.2.0

2๏ธโƒฃ Install Dependencies

 pip install -r requirements.txt

🚀 Running the Application

Start the Program

 python main.py

📢 Initial Interaction:

[= =] Say 'hey jarvis' to activate, and 'stop' to deactivate. Say 'exit' to quit.

🔄 Function Calling Methods

🔹 Primary: Gemini AI-Based Function Execution

🚀 Function calling now runs through Gemini AI, which can return multiple function calls at once for better efficiency. ⚙️ If Gemini AI fails to generate function calls, the system automatically falls back to an Ollama-based model for reliable execution.

🔹 AI Model Used: Gemini AI 🧠
✅ Higher accuracy ✅ Structured data processing ✅ Reliable AI-driven interactions

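📌 Command Parsing 📜

# Ask Gemini which function the spoken command maps to, then parse its reply.
response = gemini_generate_function_call(command)
response_dic = parse_tool_call(response)

📌 Dynamic Function Execution 🔄

# Run the function only if a tool call was successfully parsed.
if response_dic:
    func_name = response_dic["name"]
    response = execute_function_call(response_dic)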
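To illustrate how several unrelated function calls could be dispatched at once, here is a minimal sketch using a name-to-function registry and a thread pool. It does not reproduce the project's `parse_tool_call` or `execute_function_call`; the JSON shape (`name` plus `arguments`), the `TOOLS` registry, and the two stub tools are assumptions made for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical tools standing in for the project's real functions.
def get_weather(city: str) -> str:
    return f"Weather report for {city}"


def get_news(topic: str) -> str:
    return f"Headlines about {topic}"


TOOLS = {"get_weather": get_weather, "get_news": get_news}


def run_tool_call(call: dict) -> str:
    """Look up one tool call by name and run it with its arguments."""
    name = call.get("name")
    func = TOOLS.get(name)
    if func is None:
        return f"Unknown function: {name}"
    try:
        return func(**call.get("arguments", {}))
    except Exception as exc:  # one failing tool should not break the others
        return f"{name} failed: {exc}"


def run_tool_calls(raw_response: str) -> List[str]:
    """Parse a JSON list of tool calls and run them concurrently, keeping order."""
    calls = json.loads(raw_response)
    if isinstance(calls, dict):  # a single call is also accepted
        calls = [calls]
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        return list(pool.map(run_tool_call, calls))
```

Threads suit this case because tools such as weather or news lookups mostly wait on network I/O; `pool.map` returns results in the same order as the calls, so each result can be matched back to its request.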

📌 Error Handling & Fallback to Ollama 🛑

try:
    response = execute_function_call(response_dic)
except Exception as e:
    # Any failure on the Gemini path switches to the local Ollama model.
    print(f"Error in Gemini AI function execution: {e}")
    print("Falling back to Ollama-based function execution...")
    response = ollama_generate_function_call(command)
📌 Retry Mechanism 🔄

import time

def send_to_ai_with_retry(prompt, retries=3, delay=2):
    """Try Gemini up to `retries` times, then fall back to Ollama."""
    for _ in range(retries):
        try:
            return send_to_gemini(prompt)
        except Exception:
            # Pause before the next attempt.
            time.sleep(delay)
    print("Gemini AI is not responding. Switching to Ollama...")
    return send_to_ollama(prompt)

📖 RAG-Based Knowledge System

💡 Retrieval-Augmented Generation (RAG) dynamically loads relevant markdown-based knowledge files based on the queried topic, reducing hallucinations and improving response accuracy.
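The idea of picking markdown files by topic can be sketched with a simple word-overlap score. This is a deliberately naive stand-in, not the retrieval method in `src/BRAIN/RAG.py` (which is not shown here and may well use embeddings); all function names below are hypothetical, and only the `DATA/KNOWLEDGEBASE` folder name comes from the directory structure.

```python
import re
from pathlib import Path
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase words of four or more characters (a crude stop-word filter)."""
    return set(re.findall(r"[a-z0-9]{4,}", text.lower()))


def score(query: str, document: str) -> int:
    """Number of distinct query words that also occur in the document."""
    return len(tokenize(query) & tokenize(document))


def select_knowledge(query: str, documents: Dict[str, str], top_k: int = 1) -> List[str]:
    """Return the names of the top_k documents with a non-zero overlap score."""
    scored = [(score(query, name + " " + body), name) for name, body in documents.items()]
    ranked = sorted((item for item in scored if item[0] > 0), key=lambda item: item[0], reverse=True)
    return [name for _, name in ranked[:top_k]]


def load_markdown(folder: str = "DATA/KNOWLEDGEBASE") -> Dict[str, str]:
    """Read every markdown file in the knowledge folder into memory."""
    return {p.name: p.read_text(encoding="utf-8") for p in Path(folder).glob("*.md")}


def build_prompt(query: str, documents: Dict[str, str], top_k: int = 1) -> str:
    """Prepend the selected knowledge to the question; fall back to the bare query."""
    context = "\n\n".join(documents[name] for name in select_knowledge(query, documents, top_k))
    if not context:
        return query
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Returning the bare query when nothing matches keeps unrelated questions from being padded with irrelevant context, which is one of the ways retrieval can reduce rather than add hallucinations.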


📱 ADB Integration for Phone Automation

🔹 Integrated Android Debug Bridge (ADB) to enable voice-controlled phone automation! 🎙️

✅ Make phone calls ☎️
✅ Open apps & toggle settings 📲
✅ Access phone data & remote operations 🛠️

Setting Up ADB

📌 Windows

winget install --id=Google.AndroidSDKPlatformTools -e

📌 Linux

sudo apt install adb

📌 Mac

brew install android-platform-tools
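Once adb is installed and a device is connected with USB or wireless debugging enabled, a phone call can be started by sending the standard Android `CALL` intent through `adb shell am start`. The sketch below shows one way to wrap that in Python; it is not the project's `src/FUNCTION/phone_call.py`, and the function names and the phone-number validation rule are assumptions.

```python
import re
import shutil
import subprocess
from typing import List


def build_call_command(number: str) -> List[str]:
    """Build the adb command that starts a call to `number`."""
    # Drop spaces, hyphens, and parentheses, then allow only digits with an optional "+".
    cleaned = re.sub(r"[\s\-()]", "", number)
    if not re.fullmatch(r"\+?\d{3,15}", cleaned):
        raise ValueError(f"Not a valid phone number: {number!r}")
    return ["adb", "shell", "am", "start",
            "-a", "android.intent.action.CALL",
            "-d", f"tel:{cleaned}"]


def make_call(number: str) -> bool:
    """Run the command; return True if adb reported success."""
    if shutil.which("adb") is None:
        print("adb was not found on PATH; install the platform tools first.")
        return False
    result = subprocess.run(build_call_command(number), capture_output=True, text=True)
    return result.returncode == 0
```

Passing the command as a list (rather than a shell string) and validating the number first keeps a misrecognized voice command from injecting arbitrary text into the adb invocation.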

🔮 Future Enhancements

✨ Deeper mobile integration 📱
✨ Advanced AI-driven automation 🤖
✨ Improved NLP-based command execution 🧠
✨ Multi-modal interactions (text + voice + image) 🖼️

🚀 Stay tuned for future updates! 🔥

## Gemini Model Comparison

The following table provides a comparison of various Gemini models with respect to their rate limits:

| Model                                      | RPM  |    TPM    |  RPD  |
|-------------------------------------       |-----:|----------:| -----:|
| **Gemini 2.0 Flash**                       |  15  | 1,000,000 | 1,500 |
| **Gemini 2.0 Flash-Lite Preview**          |  30  | 1,000,000 | 1,500 |
| **Gemini 2.0 Pro Experimental 02-05**      |   2  | 1,000,000 |   50  |
| **Gemini 2.0 Flash Thinking Experimental** |  10  | 4,000,000 | 1,500 |
| **Gemini 1.5 Flash**                       |  15  | 1,000,000 | 1,500 |
| **Gemini 1.5 Flash-8B**                    |  15  | 1,000,000 | 1,500 |
| **Gemini 1.5 Pro**                         |   2  |   32,000  |   50  |
| **Imagen 3**                               |  --  |    --     |  --   |

Explanation:

  • RPM: Requests per minute
  • TPM: Tokens per minute
  • RPD: Requests per day
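
To stay under a requests-per-minute limit such as the 15 RPM listed for Gemini 2.0 Flash, a client can track its own recent requests. The following is a minimal sliding-window sketch; the class `RequestLimiter` is hypothetical, not part of the project or of any Gemini SDK, and it covers RPM only (TPM and RPD would need separate counters).

```python
import time
from collections import deque
from typing import Callable, Deque


class RequestLimiter:
    """Sliding-window limiter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Forget requests that have left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def try_acquire(self) -> bool:
        """Record a request if the limit allows it; otherwise return False."""
        if self.wait_time() > 0:
            return False
        self._sent.append(self._clock())
        return True
```

A caller could create `RequestLimiter(limit=15)` and, when `try_acquire()` returns `False`, either sleep for `wait_time()` seconds or route the request to the local model instead.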

The project focuses on small local models and free API models, aiming for accurate agentic behavior that also runs on low-spec systems.
