
Gemini Integration w/ changes #250

Status: Open. Wants to merge 10 commits into base: main.
Conversation

@SujalThapaK commented Feb 10, 2025

Added Google Gemini integration for Potpie.

  • The integration can be tested by setting the provider to "google".
  • The langchain_google_genai package was added to requirements.
  • Provides a cheap way to run and test Potpie.

(Changes requested in the last PR have been addressed)
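
For anyone who wants to exercise the new dependency and API key outside of Potpie first, here is a minimal standalone sketch. It assumes a valid GOOGLE_API_KEY in the environment; the model name and temperature are illustrative and this is not Potpie's actual provider wiring.

```python
import os

from langchain_google_genai import ChatGoogleGenerativeAI

# Standalone smoke test of the langchain-google-genai dependency and API key.
# Inside Potpie, the integration is used by setting the provider to "google".
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    google_api_key=os.environ["GOOGLE_API_KEY"],
    temperature=0.3,
)
print(llm.invoke("Say hello from Gemini.").content)
```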

Summary by CodeRabbit

  • New Features
     • Integrated support for a Google AI provider, giving users additional model options.
     • Re-enabled the Potpie API endpoint for streamlined service interaction.

  • Chores
     • Updated environment settings to include new variables for Google integration.
     • Updated installation settings to optimize performance and reliability.
     • Refined background processing configurations for smoother task handling.
     • Added entries to .gitignore to exclude specific files from version control.

coderabbitai bot (Contributor) commented Feb 10, 2025

Warning

Rate limit exceeded

@SujalThapaK has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 12 minutes and 35 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 5daf09d and 44b9372.

📒 Files selected for processing (1)
  • app/modules/key_management/secret_manager.py (2 hunks)

Walkthrough

This pull request introduces multiple modifications across the project. Updates include adding new environment variables for Gemini integration in the .env.template and new ignore rules in the .gitignore file. The Celery configuration is revised by removing the queue_prefix parameter, hardcoding the queue name, and enhancing logging in parsing tasks. The Potpie API router is reintroduced to the FastAPI application. Additionally, the AI provider service and key management modules now support a new Google provider, a dependency is added in requirements.txt, and start.sh is updated for dependency installation and variable naming.

Changes

| File(s) | Change Summary |
|---------|----------------|
| .env.template | Added Gemini integration variables (GOOGLE_API_KEY, GOOGLE_IDENTITY_TOOL_KIT_KEY, GOOGLE_APPLICATION_CREDENTIALS) after POTPIE_PLUS_HMAC_KEY. |
| .gitignore | Added entries for potpievenv and service_account.json. |
| app/celery/{celery_app.py, tasks/parsing_tasks.py} | Updated Celery configuration: removed the queue_prefix parameter, hardcoded the queue name ("dev_process_repository"), imported Queue from kombu, added task queues, and enhanced logging (set logger.propagate to True and logged user_id). |
| app/main.py | Reintroduced import and inclusion of potpie_api_router with prefix /api/v2 and tag ["Potpie API"]. |
| app/modules/intelligence/provider/provider_service.py | Integrated the Google AI provider: added ProviderInfo, updated MODEL_CONFIGS, and modified methods to initialize and determine provider names, including setting "gemini-1.5-flash". |
| app/modules/key_management/{secret_manager.py, secrets_schema.py} | Expanded provider support to include google; updated get_secret_id logic and modified schema validation to require Google API keys to start with "AIza". |
| requirements.txt | Added dependency langchain-google-genai==2.0.9. |
| start.sh | Added --use-deprecated=legacy-resolver to the pip install command and updated the queue variable from CELERY_QUEUE_NAME to Celery_QUEUE_NAME. |
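
For reference, a minimal sketch of the kombu-based queue declaration described in the Celery row above. The app name is hypothetical and the real wiring in celery_app.py may differ; the queue name mirrors the hardcoded "dev_process_repository" default mentioned in the summary.

```python
import os

from celery import Celery
from kombu import Queue

# Hypothetical app name for illustration; the real instance lives in celery_app.py.
celery_app = Celery("potpie")

# Explicit queue declaration as described in the walkthrough: fall back to the
# hardcoded "dev_process_repository" queue unless CELERY_QUEUE_NAME overrides it.
queue_name = os.getenv("CELERY_QUEUE_NAME", "dev_process_repository")
celery_app.conf.task_queues = (Queue(queue_name),)
celery_app.conf.task_default_queue = queue_name
```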

Suggested reviewers

  • dhirenmathur

Poem

I'm a bunny hopping through code, light and free,
New variables and queues join in perfect harmony.
Google magic now enriches our provider flow,
Secrets and logging aligned in a steady row.
With ASCII cheers and whiskers aglow, I celebrate each change! 🐰



coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (3)
app/modules/key_management/secret_manager.py (3)

188-189: Update provider literal in delete_secret.

The delete_secret method's provider literal should also include "google" for consistency.

-    provider: Literal["openai", "anthropic", "deepseek", "all"],
+    provider: Literal["openai", "anthropic", "deepseek", "google", "all"],
🧰 Tools
🪛 Ruff (0.8.2)

189-189: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


196-196: Add Google to provider_list.

Update the provider_list to include "google" when deleting all secrets.

-            provider_list = ["openai", "anthropic", "deepseek"]
+            provider_list = ["openai", "anthropic", "deepseek", "google"]

48-60: Verify Gemini integration requirements.

🧩 Analysis chain

Please confirm the following:

  1. Is "google" the correct provider name for Gemini integration? Some implementations use "gemini" or "google-ai" instead.
  2. Are there any specific format requirements for Google API keys that need validation?

Let's check the official documentation:


🌐 Web query:

What is the correct provider name for Google Gemini API integration with LangChain?

💡 Result:

To integrate Google's Gemini API with LangChain, the correct provider name is **`langchain-google-genai`**. This package provides classes like `ChatGoogleGenerativeAI` and `GoogleGenerativeAIEmbeddings` for direct interaction with Gemini models via an API key[1][2][7][8][10].

Key details:
- **Installation**:  
  ```bash
  pip install langchain-google-genai
  ```
- **Usage**:  
  ```python
  from langchain_google_genai import ChatGoogleGenerativeAI
  llm = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.9)
  response = llm.invoke("Your prompt here")
  ```

This differs from Google Cloud Vertex AI integrations (`langchain-google-vertexai`), which require cloud credentials instead of a simple API key[3][10]. For most developers starting with Gemini API, `langchain-google-genai` is recommended[10].

Citations:

- 1: https://www.andela.com/blog-posts/langchain-and-google-gemini-api-for-ai-apps-a-quickstart-guide
- 2: https://python.langchain.com/api_reference/google_genai/
- 3: https://python.langchain.com/docs/integrations/llms/google_vertex_ai_palm/
- 4: https://github.com/shamspias/langchain-gemini-api/blob/main/README.md
- 5: https://www.youtube.com/watch?v=_GEitI9PU28
- 6: https://discuss.ai.google.dev/t/does-google-ai-studio-work-with-langchain/64052
- 7: https://python.langchain.com/docs/integrations/chat/google_generative_ai/
- 8: https://dev.to/timesurgelabs/how-to-use-googles-gemini-pro-with-langchain-1eje
- 9: https://github.com/google-gemini/cookbook/blob/main/examples/langchain/Gemini_LangChain_QA_Chroma_WebLoad.ipynb
- 10: https://python.langchain.com/docs/integrations/providers/google/

Update provider name to match Gemini integration requirements.

  • The official documentation indicates that for Gemini integration with LangChain, the correct provider name is "langchain-google-genai" rather than "google".
  • There are no specific API key format requirements mentioned; the key is expected to be a standard API key string.
🧰 Tools
🪛 Ruff (0.8.2)

50-50: Use capitalized environment variable ISDEVELOPMENTMODE instead of isDevelopmentMode

(SIM112)

🧹 Nitpick comments (5)
app/celery/celery_app.py (1)

40-42: Consider environment-specific queue naming.

While hardcoding the queue name simplifies development in WSL, consider making this more flexible for different environments.

-# Hardcoded queue name to "dev" (Allows for easier development in WSL;) - Sujal
-def configure_celery():
-    queue_name = os.getenv("CELERY_QUEUE_NAME", "dev_process_repository")
+def configure_celery():
+    env = os.getenv("ENV", "development")
+    queue_name = os.getenv("CELERY_QUEUE_NAME", f"{env}_process_repository")
app/celery/tasks/parsing_tasks.py (1)

48-48: Consider structured logging for user ID.

While logging the user ID improves traceability, consider using structured logging for better log parsing.

-        logger.info(f"Given userID is: {user_id}")
+        logger.info("Processing task for user", extra={"user_id": user_id})
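
If the structured field is adopted, the log formatter also needs to surface it. A minimal sketch follows; the formatter string and the defaults value are illustrative, not the project's actual logging configuration.

```python
import logging

# Sketch only: a formatter that prints the structured user_id field. The
# "defaults" argument (Python 3.10+) keeps records without a user_id from
# breaking formatting; the project's real logging setup may differ.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter(
        "%(asctime)s %(levelname)s user_id=%(user_id)s %(message)s",
        defaults={"user_id": "-"},
    )
)

logger = logging.getLogger("parsing_tasks")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("Processing task for user", extra={"user_id": "user_123"})
```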
app/modules/intelligence/provider/provider_service.py (2)

1-1: Remove unused import.

The lru_cache import is not used in the code.

-from functools import lru_cache
🧰 Tools
🪛 Ruff (0.8.2)

1-1: functools.lru_cache imported but unused

Remove unused import: functools.lru_cache

(F401)


238-243: Consider adding max_tokens parameter.

The Google provider initialization might benefit from a max_tokens parameter like other providers.

 if provider == "google":
     common_params.update(
         {
             "google_api_key": api_key,
+            "max_tokens": 8000,
         }
     )
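
Note that in langchain-google-genai the output cap is exposed as max_output_tokens. A hedged sketch of the initialization with a cap; the 8000 value mirrors the reviewer's suggestion and is not a documented model limit, and the key shown is a placeholder.

```python
from langchain_google_genai import ChatGoogleGenerativeAI

# Sketch only: langchain-google-genai names the output cap max_output_tokens.
# 8000 is the reviewer's suggested value, not a documented limit.
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    google_api_key="AIza...",  # placeholder key
    max_output_tokens=8000,
)
```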
start.sh (1)

36-38: Legacy Resolver Flag in pip Install Command
The pip install command now uses the "--use-deprecated=legacy-resolver" flag to improve startup time. This is acceptable as a temporary measure for performance, but consider reassessing its necessity when a newer pip version or a more permanent solution becomes available.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0e1086 and ef51679.

📒 Files selected for processing (10)
  • .env.template (1 hunks)
  • .gitignore (1 hunks)
  • app/celery/celery_app.py (3 hunks)
  • app/celery/tasks/parsing_tasks.py (2 hunks)
  • app/main.py (1 hunks)
  • app/modules/intelligence/provider/provider_service.py (7 hunks)
  • app/modules/key_management/secret_manager.py (2 hunks)
  • app/modules/key_management/secrets_schema.py (2 hunks)
  • requirements.txt (1 hunks)
  • start.sh (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • requirements.txt
🧰 Additional context used
🪛 Shellcheck (0.10.0)
start.sh

[warning] 50-50: Celery_QUEUE_NAME is referenced but not assigned.

(SC2154)

🪛 Ruff (0.8.2)
app/modules/intelligence/provider/provider_service.py

1-1: functools.lru_cache imported but unused

Remove unused import: functools.lru_cache

(F401)

🔇 Additional comments (13)
app/modules/key_management/secrets_schema.py (2)

9-9: LGTM! Provider type updated for Google integration.

The Literal type has been correctly extended to include the "google" provider.


42-44: Verify Google API key format validation.

The validation checks for the "AIza" prefix, which is standard for Google API keys. However, let's verify if there are any additional format requirements.

✅ Verification successful

🌐 Web query:

What is the format of Google Gemini API keys?

💡 Result:

Google Gemini API keys are **alphanumeric strings** generated by Google AI Studio, typically resembling a long sequence of characters like `AIzaSyB...` (truncated for security). While the exact character count and internal structure aren’t publicly documented in the provided sources[1][2][4][8], they follow standard Google Cloud API key conventions[8], which are designed to be cryptographically secure and globally unique.

Key characteristics:  
- **No fixed visual pattern**: Unlike some API keys with segmented formats (e.g., `xxxx-xxxx-xxxx`), Gemini keys appear as continuous strings[2][8].  
- **Case-sensitive**: Likely include both uppercase and lowercase letters[8].  
- **Generated exclusively via Google AI Studio**: No manual formatting is required or possible[1][4][5].  

Example usage in code (Python):  
```python
import google.generativeai as genai
genai.configure(api_key="AIzaSyB...")  # Placeholder for actual key[3]
```

Security considerations:  
- Never expose keys in client-side code or public repositories[2][8].  
- Store them in environment variables or secure secret managers[2][5].  
- Regenerate compromised keys immediately via Google AI Studio[8].  

For verification, you can test your key with a curl command:  
```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=${GEMINI_API_KEY}" ... 
```  
A valid key returns structured JSON responses, while invalid ones trigger authentication errors[2][6].

Citations:

- 1: https://www.youtube.com/watch?v=I0LMGNAfA1k
- 2: https://ai.google.dev/gemini-api/docs/api-key
- 3: https://ai.google.dev/gemini-api/docs
- 4: https://www.merge.dev/blog/gemini-api-key
- 5: https://docs.aicontentlabs.com/articles/google-gemini-api-key/
- 6: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference
- 7: https://developers.google.com/idx/guides/build-gemini-api-app
- 8: https://cloud.google.com/api-keys/docs/overview?hl=en
- 9: https://zapier.com/blog/gemini-api/

Google API key validation is appropriate as implemented.

Based on the web query, Google Gemini API keys do indeed start with "AIza" and are generally alphanumeric without a fixed segmentation pattern. The current logic (checking that the key begins with "AIza") aligns with these characteristics. While there isn’t a publicly disclosed fixed length or additional static format beyond the prefix, this check serves as a valid initial validation for keys.

  • The check for the "AIza" prefix matches the commonly observed format for Gemini API keys.
  • No further formatting rules (e.g., length validation) are publicly documented, so the current validation is sufficient for preventing obvious format errors.
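
A minimal sketch of a prefix check along these lines, written in pydantic v2 style; the model name is hypothetical and the actual validator in secrets_schema.py may be structured differently.

```python
from pydantic import BaseModel, field_validator


class GoogleAPIKeySecret(BaseModel):
    # Hypothetical model name; the real schema lives in secrets_schema.py.
    api_key: str

    @field_validator("api_key")
    @classmethod
    def check_prefix(cls, value: str) -> str:
        # Lightweight sanity check: Google API keys start with "AIza".
        # This does not guarantee the key is valid or active.
        if not value.startswith("AIza"):
            raise ValueError("Google API key must start with 'AIza'")
        return value
```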
app/celery/celery_app.py (1)

54-56: LGTM! Explicit queue configuration added.

The explicit task_queues configuration using kombu.Queue is a good practice for clear queue definition.

app/celery/tasks/parsing_tasks.py (1)

13-14: LGTM! Enhanced logging configuration for async processes.

Setting logger.propagate = True is appropriate for background and async processes, ensuring logs are properly captured in the parent logger hierarchy.

app/main.py (1)

24-24: LGTM! Potpie API router reintroduced with v2 prefix.

The router is correctly configured with appropriate prefix and tags. The v2 prefix suggests this is a new version of the API, which is a good practice for versioning.

Also applies to: 118-119

app/modules/intelligence/provider/provider_service.py (3)

56-60: LGTM! Provider info added correctly.

The Google provider information is properly added with a clear description.


286-288: LGTM! Provider name handling is complete.

The provider name detection is properly implemented for both ChatGoogleGenerativeAI instances and model names containing "gemini".

Also applies to: 295-296
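
A hedged sketch of the kind of detection this comment describes; the real logic in provider_service.py may differ in structure and naming.

```python
from langchain_google_genai import ChatGoogleGenerativeAI


def provider_name_for(llm) -> str:
    # Sketch of the detection described above: match on the class first, then
    # fall back to a "gemini" substring in the model name.
    if isinstance(llm, ChatGoogleGenerativeAI):
        return "google"
    model_name = getattr(llm, "model", "") or ""
    if "gemini" in model_name.lower():
        return "google"
    return "unknown"
```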


144-159: Verify Gemini model names.

The model configurations use "gemini-1.5-flash" and "gemini-1.5-pro". Let's verify these are the correct model names.

✅ Verification successful

🌐 Web query:

What are the available Gemini model names in Google's Generative AI?

💡 Result:

Google's Generative AI offers multiple Gemini models across different versions and optimization levels. Here's a breakdown of available model names as of February 2025:

## Core Model Families
**Gemini 2.x Series**  
- `gemini-2.0-flash` (General Availability)  
- `gemini-2.0-pro` (Experimental)  
- `gemini-2.0-flash-thinking` (Experimental)  
- `gemini-2.0-flash-lite` (Public Preview)  

**Gemini 1.x Series**  
- `gemini-1.5-flash` (Auto-updated version)  
    - Stable: `gemini-1.5-flash-002`[4]  
    - Preview: `gemini-1.5-flash-preview-MMDD`[4]  
    
- `gemini-1.5-pro` (Auto-updated version)  
    - Stable: `gemini-1.5-pro-002`[4][6]  
    - Preview: `gemini-1.5-pro-preview-MMDD`[4]  

## Specialized Variants
**Legacy Models**  
- `gemini-1.0-pro-vision` (Multimodal text+image processing)[4]  
- `gemini-1.0-pro` (Deprecated after Feb 15, 2025)[6]  

**Embeddings & QA**  
- `text-embedding-004` (Text embeddings)[6]  
- `aqa` (Source-grounded answers)[6]  

## Versioning System
Three naming conventions exist:  
| Type | Example | Characteristics |  
|------|---------|------------------|  
| Stable | `gemini-x.x-pro-002` | Fixed capabilities |  
| Auto-updated | `gemini-x.x-pro` | Always latest stable |  
| Preview | `gemini-x.x-pro-preview-0409` | Cutting-edge features |  

For agentic AI workflows requiring multimodal streaming or tool integration, Gemini 2.x models are recommended[2][10]. Developers should use auto-updated versions (`gemini-x.x-pro`) unless requiring specific feature locking via stable versions[4].

Citations:

- 1: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models
- 2: https://www.techtarget.com/whatis/feature/Google-Gemini-20-explained-Everything-you-need-to-know
- 3: https://ai.google/get-started/our-models/
- 4: https://firebase.google.com/docs/vertex-ai/gemini-models
- 5: https://pypi.org/project/google-generativeai/
- 6: https://ai.google.dev/gemini-api/docs/models/gemini
- 7: https://en.wikipedia.org/wiki/Gemini_(language_model)
- 8: https://zapier.com/blog/google-gemini/
- 9: https://www.ibm.com/think/topics/google-gemini
- 10: https://deepmind.google/technologies/gemini/

Verification of Gemini Model Names

Based on the latest information from Google’s Generative AI documentation, the model names "gemini-1.5-flash" and "gemini-1.5-pro" in your configuration represent the auto-updated versions of the Gemini 1.x series. These are valid and match the available auto-updated naming convention as of February 2025.

  • The auto-updated naming (without additional suffixes) is intended to always track the latest stable release.
  • If a specific stable version is required, you might consider using the suffixes (e.g., "gemini-1.5-flash-002" or "gemini-1.5-pro-002").

For the current implementation, the Gemini model names used are correct.
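
For context, a hypothetical shape of the Gemini entries discussed here; the actual MODEL_CONFIGS keys and nesting in provider_service.py are not shown in this review and may differ.

```python
# Hypothetical shape for illustration only; the real MODEL_CONFIGS structure
# in provider_service.py may differ.
MODEL_CONFIGS = {
    "google": {
        "small": "gemini-1.5-flash",  # auto-updated alias; pin gemini-1.5-flash-002 to lock a version
        "large": "gemini-1.5-pro",    # auto-updated alias; pin gemini-1.5-pro-002 to lock a version
    },
}
```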

app/modules/key_management/secret_manager.py (2)

48-49: LGTM! Google provider support added correctly.

The secret manager properly handles Google provider secrets with consistent naming convention.

Also applies to: 58-60


58-59: LGTM! The Google provider implementation follows the established pattern.

The secret ID format for the Google provider maintains consistency with other providers.
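
A hypothetical sketch of the per-provider secret ID pattern the comment refers to; the real format in secret_manager.py is not visible in this review and may differ.

```python
SUPPORTED_PROVIDERS = {"openai", "anthropic", "deepseek", "google"}


def get_secret_id(provider: str, customer_id: str) -> str:
    # Illustrative naming only; secret_manager.py defines the actual convention.
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {provider}")
    return f"{provider}-api-key-{customer_id}"
```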

.gitignore (1)

5-5: LGTM! Appropriate files ignored.

Good addition of:

  • potpievenv: Project-specific virtual environment
  • service_account.json: Google service account credentials (sensitive)

Also applies to: 7-7

.env.template (1)

30-33: LGTM! Google environment variables added with clear documentation.

Good addition of required Google environment variables with helpful comment about service account credentials.

start.sh (1)

50-50: Celery Queue Environment Variable Check
The variable name has been updated from "CELERY_QUEUE_NAME" to "Celery_QUEUE_NAME". Note that shellcheck warns this variable is referenced but not assigned. Ensure that "Celery_QUEUE_NAME" is defined—likely via the sourced .env file—or add an explicit check in the script to prevent runtime errors.
For instance, you could add the following snippet before its usage:

+if [ -z "${Celery_QUEUE_NAME}" ]; then
+  echo "Error: Celery_QUEUE_NAME environment variable not set."
+  exit 1
+fi
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 50-50: Celery_QUEUE_NAME is referenced but not assigned.

(SC2154)

Review thread on app/modules/key_management/secret_manager.py (resolved)
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
app/modules/key_management/secret_manager.py (1)

191-203: Update provider list in delete_secret method.

The provider list for bulk deletion needs to include "google" to maintain consistency with the new provider support.

Apply this diff:

    if provider == "all":
-        provider_list = ["openai", "anthropic", "deepseek"]
+        provider_list = ["openai", "anthropic", "deepseek", "google"]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef51679 and 5daf09d.

📒 Files selected for processing (1)
  • app/modules/key_management/secret_manager.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
app/modules/key_management/secret_manager.py

48-48: SyntaxError: Expected an indented block after function definition

🔇 Additional comments (2)
app/modules/key_management/secret_manager.py (2)

138-140: Update provider Literal type in get_secret method.

The provider Literal type needs to be updated to include "google" for consistency.

Apply this diff:

    @staticmethod
    def get_secret(
-        provider: Literal["openai", "anthropic", "deepseek"], customer_id: str
+        provider: Literal["openai", "anthropic", "deepseek", "google"], customer_id: str
    ):

61-62: LGTM! Google provider support is well implemented.

The implementation follows the established pattern for secret ID generation and maintains consistency with other providers.

Review thread on app/modules/key_management/secret_manager.py (outdated, resolved)
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>