Fix/issue 147 security upload validation #148
kallal79 wants to merge 14 commits into KathiraveluLab:dev from
Conversation
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

/gemini review
@kallal79 there are several merge conflicts. Please attend to them and to the bot comments as appropriate.
Code Review
This pull request introduces several security and robustness improvements, including mandatory production SECRET_KEY checks, request size limits, and enhanced file upload handling with unique naming and extension filtering. It also refactors heavy machine learning dependencies (like torch and transformers) to be optional using try...except blocks to support lightweight environments. Feedback suggests that moving lazy imports from within functions to the module level would be more efficient and recommends using img.load() for more thorough image validation during uploads.
dreamsApp/app/dashboard/main.py
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
Importing heavy libraries like matplotlib, numpy, and pandas inside a route handler is inefficient as these imports are evaluated on every request. While Python caches modules in sys.modules, the overhead of the import statement and local variable assignment still exists. A better approach for optional dependencies is to use the try...except pattern at the module level (as seen in fl_worker.py), which keeps the route handlers clean and performant.
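The module-level optional-import pattern the reviewer describes could be sketched like this (a minimal illustration; the `plot_available` helper is hypothetical, not from the repository):

```python
# Optional heavy dependencies: resolved once at import time, not per request.
try:
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
except ImportError:  # lightweight environment without the plotting stack
    plt = None
    np = None
    pd = None


def plot_available() -> bool:
    """Route handlers can branch on this instead of re-importing."""
    return plt is not None and np is not None and pd is not None
```

The import cost is paid exactly once, and handlers stay free of `import` statements.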
dreamsApp/app/dashboard/main.py
"""Refactor: Helper to generate base64 encoded word cloud image."""
if not keywords:
    return None
from wordcloud import WordCloud
dreamsApp/app/ingestion/routes.py
with Image.open(image_path) as img:
    img.verify()
While img.verify() is a good first step for validating an uploaded image, it only checks the image header and does not decode the actual pixel data. This means it may not detect all forms of file corruption or certain types of malicious payloads embedded in the image data. For more robust validation, consider calling img.load() or performing a simple transformation to ensure the image can be fully decoded.
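The two-step check the reviewer suggests can be sketched as a standalone helper (the function name is illustrative; note that Pillow requires reopening the file after `verify()`):

```python
from io import BytesIO

from PIL import Image, UnidentifiedImageError


def is_valid_image(data: bytes) -> bool:
    """Return True only if the bytes fully decode as an image."""
    try:
        # verify() checks the header/container only...
        with Image.open(BytesIO(data)) as img:
            img.verify()
        # ...so reopen and load() to force full pixel decoding.
        with Image.open(BytesIO(data)) as img:
            img.load()
        return True
    except (UnidentifiedImageError, OSError, ValueError, SyntaxError):
        return False
```

A well-formed PNG passes both steps; arbitrary bytes fail at `open()`, and files with a valid header but corrupt pixel data fail at `load()`.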
/gemini review
Code Review
This pull request introduces robust error handling for optional heavy dependencies (such as PyTorch, Transformers, Spacy, and HDBSCAN) across the application, allowing it to run in lightweight environments. It also adds security configurations for file uploads and secret keys. While the changes significantly improve resilience, I have flagged concerns regarding the eager loading of Spacy and SentenceTransformer models at the module level, which may still cause startup failures, and the use of JSON error responses in browser-facing routes.
dreamsApp/app/ingestion/routes.py
logger = logging.getLogger(__name__)
- model = SentenceTransformer("all-MiniLM-L6-V2")
+ model = SentenceTransformer("all-MiniLM-L6-V2") if SentenceTransformer else None
Instantiating SentenceTransformer at the module level will cause the application to crash during startup if the model cannot be loaded (e.g., due to missing local files or lack of internet access for the initial download). To support the 'lightweight environment' goal mentioned in your comments, this model should be initialized lazily within a helper function or the route itself, similar to the pattern used in dreamsApp/core/embeddings.py.
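The lazy-initialization pattern the reviewer recommends might look roughly like this (a sketch only; the helper name and module layout are assumptions, the model id is the one used in the PR):

```python
import logging
import threading

logger = logging.getLogger(__name__)

try:
    from sentence_transformers import SentenceTransformer
except ImportError:  # lightweight environment
    SentenceTransformer = None

_model = None
_model_lock = threading.Lock()


def get_embedding_model():
    """Load the model on first use so module import never fails at startup."""
    global _model
    if SentenceTransformer is None:
        return None
    with _model_lock:
        if _model is None:
            try:
                _model = SentenceTransformer("all-MiniLM-L6-v2")
            except Exception as e:  # missing files, no network, etc.
                logger.warning("Embedding model unavailable: %s", e)
    return _model
```

Routes call `get_embedding_model()` and handle `None`; the app starts even when the model cannot be loaded.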
dreamsApp/core/extra/keywords.py
except ImportError:  # pragma: no cover - optional in lightweight test envs
    SentenceTransformer = None

nlp = spacy.load("en_core_web_sm") if spacy is not None else None
Loading the Spacy model at the module level will cause an OSError if the en_core_web_sm model data is not present, even if the spacy package is installed. This prevents the application from starting in environments where these heavy assets are intentionally omitted. Consider loading the model lazily inside extract_keywords_and_vectors or wrapping this call in a try-except block to handle missing models gracefully.
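The simpler of the two fixes the reviewer mentions, wrapping the module-level load so a missing model degrades to `None`, could look like this (a sketch; `extract_keywords` is a hypothetical stand-in for the module's real function):

```python
import logging

logger = logging.getLogger(__name__)

try:
    import spacy
except ImportError:  # lightweight environment without spaCy
    spacy = None

nlp = None
if spacy is not None:
    try:
        nlp = spacy.load("en_core_web_sm")
    except OSError as e:  # package installed but model data absent
        logger.warning("en_core_web_sm not available: %s", e)


def extract_keywords(text):
    """Degrade gracefully: no model means no keywords, not a crash."""
    if nlp is None:
        return []
    return [t.lemma_ for t in nlp(text) if t.pos_ in {"NOUN", "PROPN"}]
```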
dreamsApp/app/dashboard/main.py
if plt is None or np is None or pd is None:
    return jsonify({'error': 'Plotting dependencies are not installed'}), 500
The profile route is a page-level route that normally returns an HTML response via render_template. Returning a JSON response here when dependencies are missing is inconsistent with the expected response type for a browser-initiated request. This will cause the user to see raw JSON text instead of a styled error page. Consider rendering a dedicated error template or using Flask's flash message system with a redirect to the main dashboard.
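The flash-and-redirect alternative the reviewer suggests could be sketched like this (route names and the message text are illustrative, not the repository's):

```python
from flask import Flask, flash, redirect, url_for

# Assumed minimal app; the real routes live in dreamsApp/app/dashboard/main.py.
app = Flask(__name__)
app.secret_key = "dev-only-key"  # flash() requires a session secret

try:
    import matplotlib.pyplot as plt  # optional plotting dependency
except ImportError:
    plt = None


@app.route("/")
def dashboard():
    return "dashboard"


@app.route("/profile")
def profile():
    if plt is None:
        # Browser-facing route: flash a styled message and redirect
        # instead of returning raw JSON to the user's browser.
        flash("Charts are unavailable: plotting dependencies are not installed.")
        return redirect(url_for("dashboard"))
    return "profile with charts"
```

The user lands back on the dashboard with a human-readable notice rather than a wall of JSON.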
/gemini review
Code Review
This pull request makes heavy machine learning and data science dependencies optional across the application, enabling it to run in lightweight environments. It also introduces several security improvements, such as enforcing a secure SECRET_KEY in production, limiting request body sizes via MAX_CONTENT_LENGTH, and implementing image validation for uploads. A security concern was identified in the image ingestion route, where failing to catch RuntimeError (specifically DecompressionBombError) could leave the server vulnerable to Denial of Service attacks.
dreamsApp/app/ingestion/routes.py
    img.verify()
with Image.open(image_path) as img:
    img.load()
except (UnidentifiedImageError, OSError, ValueError):
The current exception handling for image validation does not catch DecompressionBombError, which is a subclass of RuntimeError in modern Pillow versions. If a malicious user uploads an image designed to expand into a massive amount of memory (a "decompression bomb"), it could lead to a Denial of Service (DoS) by exhausting server memory. Additionally, since the os.remove call is inside the except block, the malicious file would remain on disk if this exception is raised.
Suggested change:
- except (UnidentifiedImageError, OSError, ValueError):
+ except (UnidentifiedImageError, OSError, ValueError, RuntimeError):
/gemini review
Code Review
This pull request introduces several security and robustness improvements, primarily focusing on making heavy machine learning and data science dependencies optional. This allows the application to run in lightweight environments by wrapping imports for libraries like torch, transformers, spacy, and matplotlib in try-except blocks and adding runtime checks. Additionally, the PR implements request body size limits, enforces secure secret keys in production, and adds thorough image validation during ingestion. Feedback is provided regarding the new CHIME analysis function to ensure it robustly handles varying output formats from HuggingFace pipelines.
dreamsApp/core/sentiment.py
results = classifier(text)
return max(results[0], key=lambda x: x["score"])
The output format of HuggingFace pipelines can vary between [dict] and [[dict]] depending on the library version and task configuration. Additionally, calling max() on an empty list or a dictionary (if results[0] is a dict) will lead to a crash or incorrect results. It's safer to check the structure and use .get() for the score key to ensure robustness.
Suggested change:
- results = classifier(text)
- return max(results[0], key=lambda x: x["score"])
+ results = classifier(text)
+ # Handle both [dict] and [[dict]] return formats from the pipeline
+ if results and isinstance(results[0], list):
+     return max(results[0], key=lambda x: x.get("score", 0.0))
+ return max(results, key=lambda x: x.get("score", 0.0))
/gemini review
Code Review
This pull request enhances the application's security and flexibility by implementing request size limits, enforcing secure production configurations, and making heavy machine learning dependencies optional for lightweight environments. Key improvements include image validation to prevent decompression bombs and the introduction of lazy loading for several models. Review feedback focuses on adopting timezone-aware UTC timestamps, strengthening image security checks, and addressing architectural redundancy in the CHIME classification logic to ensure compatibility with federated learning updates.
dreamsApp/app/ingestion/routes.py
import uuid
import threading
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
dreamsApp/app/ingestion/routes.py
@@ -94,32 +114,48 @@ def upload_post():
timestamp = request.form.get('timestamp', datetime.now().isoformat())
Using naive datetime.now() creates a timestamp based on the server's local time, which can lead to inconsistencies. It is recommended to use timezone-aware UTC timestamps.
Suggested change:
- timestamp = request.form.get('timestamp', datetime.now().isoformat())
+ timestamp = request.form.get('timestamp', datetime.now(timezone.utc).isoformat())
try:
    # Verify container/header first, then fully decode pixel data.
    with Image.open(image_path) as img:
        img.verify()
    with Image.open(image_path) as img:
        img.load()
except (UnidentifiedImageError, OSError, ValueError, RuntimeError):
    if os.path.exists(image_path):
        os.remove(image_path)
    return jsonify({'error': 'Uploaded file is not a valid image'}), 400
To protect against decompression bomb attacks (where a small, highly compressed image expands to consume massive amounts of memory), it is recommended to check the image dimensions (img.width and img.height) against a safety threshold before calling img.load(). While MAX_CONTENT_LENGTH limits the file size, it does not limit the uncompressed pixel data size.
Suggested change:
- try:
-     # Verify container/header first, then fully decode pixel data.
-     with Image.open(image_path) as img:
-         img.verify()
-     with Image.open(image_path) as img:
-         img.load()
- except (UnidentifiedImageError, OSError, ValueError, RuntimeError):
-     if os.path.exists(image_path):
-         os.remove(image_path)
-     return jsonify({'error': 'Uploaded file is not a valid image'}), 400
+ try:
+     # Verify container/header first, then fully decode pixel data.
+     with Image.open(image_path) as img:
+         img.verify()
+     with Image.open(image_path) as img:
+         # Security: Prevent decompression bombs by checking dimensions.
+         if img.width * img.height > 100_000_000:
+             raise ValueError("Image dimensions exceed safety limits")
+         img.load()
+ except (UnidentifiedImageError, OSError, ValueError, RuntimeError):
+     if os.path.exists(image_path):
+         os.remove(image_path)
+     return jsonify({'error': 'Uploaded file is not a valid image'}), 400
def get_chime_classifier(self):
    if self._chime_classifier is None:
        try:
            if pipeline is None:
                raise RuntimeError("transformers is required for CHIME inference")
            self._chime_classifier = pipeline(
                "text-classification",
                model=HF_MODEL_ID,
                tokenizer=HF_MODEL_ID,
                return_all_scores=True,
            )
        except Exception as e:
            logger.error(f"Error loading CHIME model: {e}")
            return None
    return self._chime_classifier
This implementation of get_chime_classifier is largely redundant with the one in dreamsApp/core/extra/advanced_sentiment.py. Crucially, it lacks the logic to detect and load locally updated models from the models/production_chime_model directory. This means improvements learned via federated learning will not be utilized by this analyzer. Consider centralizing the CHIME classification logic.
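Centralizing could mean a single shared initializer that both analyzers call. A sketch, under assumptions: the name `init_chime_classifier` mirrors the helper that appears in a later revision of this PR, but the signature and local-model lookup shown here are illustrative:

```python
import logging
import os


def init_chime_classifier(existing, pipeline_fn, hf_model_id, logger):
    """Shared lazy initializer: reuse a cached classifier, prefer a
    locally fine-tuned FL model, fall back to the hub model id."""
    if existing is not None:
        return existing
    if pipeline_fn is None:  # transformers not installed
        logger.warning("transformers unavailable; CHIME inference disabled")
        return None
    model_path = hf_model_id
    local = os.path.join("models", "production_chime_model")
    if os.path.isdir(local):  # federated-learning output, when present
        model_path = local
    try:
        return pipeline_fn(
            "text-classification",
            model=model_path,
            tokenizer=model_path,
            top_k=None,
        )
    except Exception as e:
        logger.error("Error loading CHIME model: %s", e)
        return None
```

Both `SentimentAnalyzer` and `AdvancedSentimentAnalyzer` would then delegate to this one function, so the FL-model lookup cannot drift between the two.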
…, FL CHIME model path
/gemini review
Code Review
This pull request refactors the application to make heavy machine learning dependencies optional, allowing for execution in lightweight environments. It also introduces security enhancements such as production-only secret key enforcement, request size limits, and image validation to prevent decompression bombs. Feedback provides a more robust fallback implementation for the softmax function to support multi-dimensional arrays.
dreamsApp/core/sentiment.py
def softmax(x):
    x = np.asarray(x)
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
The fallback softmax implementation is correct for 1D arrays, which is how it is currently used. However, for broader compatibility with scipy.special.softmax, you might want to ensure it handles multi-dimensional inputs correctly if its usage expands.
Suggested change:
- def softmax(x):
-     x = np.asarray(x)
-     e_x = np.exp(x - np.max(x))
-     return e_x / e_x.sum(axis=0)
+ def softmax(x):
+     x = np.asarray(x)
+     e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
+     return e_x / e_x.sum(axis=-1, keepdims=True)
/gemini review
Code Review
This pull request enhances the application's security and modularity by implementing request size limits, production secret key enforcement, and file extension validation. It refactors the codebase to handle heavy machine learning dependencies optionally, allowing for execution in lightweight environments. Additionally, image ingestion is hardened with integrity checks and protection against decompression bombs. A review comment suggests optimizing the image validation logic by removing redundant method calls.
dreamsApp/app/ingestion/routes.py
# Verify container/header first, then fully decode pixel data.
with Image.open(image_path) as img:
    img.verify()
with Image.open(image_path) as img:
    # Security: Prevent decompression bombs by checking dimensions.
    if img.width * img.height > 100_000_000:
        raise ValueError("Image dimensions exceed safety limits")
    img.load()
The image validation logic can be simplified by combining the two Image.open calls. img.verify() is redundant when img.load() is called immediately after, as load() will also fail if the image is corrupt or invalid. Additionally, checking dimensions before load() is a good practice to prevent decompression bombs.
# Security: Prevent decompression bombs and verify image integrity.
with Image.open(image_path) as img:
    if img.width * img.height > 100_000_000:
        raise ValueError("Image dimensions exceed safety limits")
    # Fully decode pixel data to ensure the image is valid and not corrupt.
    img.load()
/gemini review
Code Review
This pull request enhances the application's robustness and security by implementing request size limits, production environment checks for secret keys, and comprehensive image validation to prevent decompression bombs. A significant architectural change makes heavy machine learning dependencies optional through lazy loading and conditional checks, allowing the core application to function in restricted environments. Feedback was provided regarding the use of a deprecated parameter in the transformers library.
dreamsApp/core/sentiment.py
"text-classification",
model=model_path,
tokenizer=model_path,
return_all_scores=True,
/gemini review
Code Review
This pull request introduces several security and robustness improvements, including request body size limits, production secret key validation, and enhanced image upload verification to prevent decompression bombs. A significant portion of the changes focuses on making heavy machine learning and data processing dependencies (like torch, transformers, and matplotlib) optional, allowing the application to run in lightweight environments through lazy loading and conditional checks. Feedback was provided regarding the efficiency of the federated learning worker, specifically suggesting that the application instance be reused to avoid redundant and memory-intensive model loading.
dreamsApp/app/fl_worker.py
def run_federated_round():
    if torch is None or AutoModelForSequenceClassification is None or AutoTokenizer is None or AutoConfig is None:
        logger.warning("Required ML dependencies are not installed; skipping federated round.")
        return

    app = create_app()
    with app.app_context():
Calling create_app() inside run_federated_round is inefficient when this function is triggered from an existing Flask application (e.g., via the dashboard). create_app() re-initializes the entire application, including loading heavy AI models in DreamsPipeline. This can lead to excessive memory usage as models are loaded multiple times in the background thread. Refactoring the function to accept an optional app instance allows reusing the existing application context and its resources.
Suggested change:
- def run_federated_round():
-     if torch is None or AutoModelForSequenceClassification is None or AutoTokenizer is None or AutoConfig is None:
-         logger.warning("Required ML dependencies are not installed; skipping federated round.")
-         return
-     app = create_app()
-     with app.app_context():
+ def run_federated_round(app=None):
+     if torch is None or AutoModelForSequenceClassification is None or AutoTokenizer is None or AutoConfig is None:
+         logger.warning("Required ML dependencies are not installed; skipping federated round.")
+         return
+     if app is None:
+         app = create_app()
+     with app.app_context():
/gemini review
Code Review
This pull request enhances the application's security and flexibility by introducing configuration for content length limits, production secret key enforcement, and image upload validation to prevent decompression bombs. A major architectural change involves wrapping heavy machine learning dependencies—such as torch, transformers, and spacy—in try-except blocks and implementing lazy loading to ensure the core application remains functional in environments where these libraries are absent. Additionally, the PR includes a compatibility shim for location extraction and improves test resilience by handling missing MongoDB instances. I have no feedback to provide.
/gemini review
Code Review
This pull request improves application robustness and security by adding configuration for content limits and secure keys, validating uploaded images against decompression bombs, and making heavy ML dependencies optional through try-except blocks and runtime checks. It also implements lazy loading for the core pipeline and models. Feedback suggests centralizing duplicated CHIME classification logic and replacing hardcoded magic numbers with named constants.
def get_chime_classifier(self):
    if self._chime_classifier is None:
        try:
            if pipeline is None:
                raise RuntimeError("transformers is required for CHIME inference")
            model_path = HF_MODEL_ID

            # Prefer locally learned FL model when available.
            try:
                from flask import has_app_context, current_app
                if has_app_context():
                    local_model_path = os.path.join(
                        current_app.root_path,
                        "models",
                        "production_chime_model",
                    )
                    if os.path.exists(local_model_path):
                        logger.info(
                            ">>> SELF-CORRECTION: Learned model found at %s. Loading...",
                            local_model_path,
                        )
                        model_path = local_model_path
            except RuntimeError:
                # No active Flask app context (e.g., tests/CLI)
                pass

            self._chime_classifier = pipeline(
                "text-classification",
                model=model_path,
                tokenizer=model_path,
                top_k=None,
            )
        except Exception as e:
            logger.error(f"Error loading CHIME model: {e}")
            return None
    return self._chime_classifier
The CHIME classification logic, including model path resolution and pipeline initialization, is duplicated between SentimentAnalyzer in dreamsApp/core/sentiment.py and AdvancedSentimentAnalyzer in dreamsApp/core/extra/advanced_sentiment.py. This redundancy increases the maintenance burden and can lead to inconsistent behavior if one implementation is updated but the other is not. Consider centralizing this logic into a single shared component or utility module.
dreamsApp/app/ingestion/routes.py
try:
    # Security: Prevent decompression bombs and verify image integrity.
    with Image.open(image_path) as img:
        if img.width * img.height > 100_000_000:
/gemini review
Code Review
This pull request introduces several security and architectural improvements, primarily focused on making heavy machine learning dependencies optional to support lightweight environments. Key changes include the enforcement of MAX_CONTENT_LENGTH, production-only SECRET_KEY validation, and enhanced image upload security through decompression bomb protection and extension whitelisting. Additionally, the PR refactors CHIME classification logic and implements lazy loading for various models. Feedback identifies a critical thread-safety issue in the lazy initialization of NLP models within keywords.py and points out code duplication between the core and extra sentiment analysis modules that should be consolidated.
_nlp = None
_keyword_model = None

model = SentenceTransformer("all-MiniLM-L6-v2")

def _get_nlp():
    global _nlp
    if spacy is None:
        return None
    if _nlp is None:
        try:
            _nlp = spacy.load("en_core_web_sm")
        except OSError as e:
            logger.warning("spaCy model en_core_web_sm is unavailable: %s", e)
            _nlp = None
        except Exception as e:
            logger.warning("Failed to initialize spaCy: %s", e)
            _nlp = None
    return _nlp

def _get_keyword_model():
    global _keyword_model
    if SentenceTransformer is None:
        return None
    if _keyword_model is None:
        try:
            _keyword_model = SentenceTransformer("all-MiniLM-L6-v2")
        except Exception as e:
            logger.warning("Failed to initialize keyword embedding model: %s", e)
            _keyword_model = None
    return _keyword_model
The lazy initialization of the spacy model in _get_nlp() and the SentenceTransformer model in _get_keyword_model() is not thread-safe. In a multi-threaded environment (like a production Flask server), multiple threads could enter the if _nlp is None: or if _keyword_model is None: block simultaneously, leading to a race condition where the model might be loaded multiple times. This can cause unnecessary memory usage and potential errors.
To fix this, you should use a threading.Lock to protect the initialization block for each model, similar to the pattern used in dreamsApp/app/ingestion/routes.py for _get_location_model().
You will also need to add import threading at the top of the file.
_nlp = None
_nlp_lock = threading.Lock()
_keyword_model = None
_keyword_model_lock = threading.Lock()

def _get_nlp():
    global _nlp
    if spacy is None:
        return None
    with _nlp_lock:
        if _nlp is None:
            try:
                _nlp = spacy.load("en_core_web_sm")
            except OSError as e:
                logger.warning("spaCy model en_core_web_sm is unavailable: %s", e)
                _nlp = None
            except Exception as e:
                logger.warning("Failed to initialize spaCy: %s", e)
                _nlp = None
    return _nlp

def _get_keyword_model():
    global _keyword_model
    if SentenceTransformer is None:
        return None
    with _keyword_model_lock:
        if _keyword_model is None:
            try:
                _keyword_model = SentenceTransformer("all-MiniLM-L6-v2")
            except Exception as e:
                logger.warning("Failed to initialize keyword embedding model: %s", e)
                _keyword_model = None
    return _keyword_model

def get_chime_classifier(self):
    self._chime_classifier = init_chime_classifier(
        self._chime_classifier,
        pipeline,
        HF_MODEL_ID,
        logger,
    )
    return self._chime_classifier

def analyze_chime(self, text: str):
    if text is None or not text.strip():
        return {"label": "Uncategorized", "score": 0.0}

    classifier = self.get_chime_classifier()
    if classifier is None:
        return {"label": "Uncategorized", "score": 0.0}

    try:
        results = classifier(text)
        return pick_top_chime_result(results)
    except Exception as e:
        logger.error(f"CHIME inference error: {e}")
        return {"label": "Uncategorized", "score": 0.0}
This PR introduces CHIME analysis logic (get_chime_classifier and analyze_chime) into SentimentAnalyzer. However, identical logic already exists in dreamsApp/core/extra/advanced_sentiment.py. This code duplication makes the codebase harder to maintain, as changes would need to be applied in two places.
It seems the intent is to promote CHIME analysis to a core feature. To resolve this, please remove the CHIME-related methods from AdvancedSentimentAnalyzer in dreamsApp/core/extra/advanced_sentiment.py and update any code that uses it (e.g., dreamsApp/app/ingestion/routes.py) to import from dreamsApp.core.sentiment instead.
/gemini review
Code Review
This pull request enhances security by enforcing secure configuration in production and implementing image validation to prevent decompression bombs. It also refactors the application to make heavy machine learning dependencies optional, improving portability. Review feedback identifies a potential TOCTOU vulnerability in the image upload process and suggests refining exception handling for optional imports to prevent masking underlying issues.
filename = secure_filename(image.filename)
allowed = current_app.config.get('ALLOWED_EXTENSIONS', {'png', 'jpg', 'jpeg', 'gif'})
if '.' not in filename or filename.rsplit('.', 1)[1].lower() not in allowed:
    return jsonify({'error': 'Unsupported file extension'}), 400

unique_filename = f"{uuid.uuid4().hex}_(unknown)"
upload_path = current_app.config['UPLOAD_FOLDER']
image_path = os.path.join(upload_path, unique_filename)
image.save(image_path)

# Delegate the heavy AI extraction sequence to the global pipeline
try:
    # Security: Prevent decompression bombs and verify image integrity.
    with Image.open(image_path) as img:
        if img.width * img.height > MAX_IMAGE_PIXELS:
            raise ValueError("Image dimensions exceed safety limits")
        # Fully decode pixel data to ensure image is valid and not corrupt.
        img.load()
except (UnidentifiedImageError, OSError, ValueError, RuntimeError):
    if os.path.exists(image_path):
        os.remove(image_path)
    return jsonify({'error': 'Uploaded file is not a valid image'}), 400
For enhanced security, it's better to validate the uploaded image before saving it to the disk. The current implementation saves the file first, then validates it, which creates a small window where a potentially malicious file exists on the server (a Time-of-check to time-of-use, or TOCTOU, vulnerability). You can validate the image directly from the in-memory stream provided by Flask's request.files. If the image is valid, you can then save it. This avoids writing unvalidated user content to the filesystem.
filename = secure_filename(image.filename)
allowed = current_app.config.get('ALLOWED_EXTENSIONS', {'png', 'jpg', 'jpeg', 'gif'})
if '.' not in filename or filename.rsplit('.', 1)[1].lower() not in allowed:
    return jsonify({'error': 'Unsupported file extension'}), 400
try:
    # Security: Prevent decompression bombs and verify image integrity from stream.
    with Image.open(image) as img:
        if img.width * img.height > MAX_IMAGE_PIXELS:
            raise ValueError("Image dimensions exceed safety limits")
        # Fully decode pixel data to ensure image is valid and not corrupt.
        img.load()
    # Image is valid, now save it.
    image.seek(0)  # Rewind stream after reading
    unique_filename = f"{uuid.uuid4().hex}_(unknown)"
    upload_path = current_app.config['UPLOAD_FOLDER']
    image_path = os.path.join(upload_path, unique_filename)
    image.save(image_path)
except (UnidentifiedImageError, OSError, ValueError, RuntimeError):
    return jsonify({'error': 'Uploaded file is not a valid image'}), 400
dreamsApp/app/dashboard/main.py
try:
    from dreamsApp.core.extra.llms import generate
except Exception:  # pragma: no cover - optional in lightweight environments
Catching a generic Exception is too broad and can mask other underlying issues during the import process. It's better to be more specific. For consistency with other optional imports in this file and across the project, you should catch ImportError. This ensures you're only handling cases where the module or its dependencies are missing, without accidentally suppressing other unexpected errors.
Suggested change:
- except Exception:  # pragma: no cover - optional in lightweight environments
+ except ImportError:  # pragma: no cover - optional in lightweight environments
/gemini review
Code Review
This pull request introduces significant improvements to the application's robustness by making heavy AI and ML dependencies optional. It implements lazy loading and graceful error handling for dependencies like PyTorch, Transformers, spaCy, and HDBSCAN, allowing the application to run in lightweight environments. Additionally, it adds security enhancements, including request body size limits, file extension validation, and image integrity checks during uploads. The CHIME sentiment analysis logic has been refactored into a dedicated module to improve maintainability, and test suites have been updated to handle missing database connections gracefully. I have no feedback to provide as the changes are well-structured and improve system reliability.
Hi Sir @pradeeban, please review PR #148.
ayusrjn left a comment
Thank you, @kallal79, for your contribution.
@pradeeban, I've tested the changes, and everything looks good to merge.
No description provided.