feat: Enhanced adaptive speaker scraper with merged strategy #1016
          
Closed

LYordanovClearware wants to merge 13 commits into ScrapeGraphAI:main from LYordanovClearware:feature/enhanced-adaptive-scraper
- Fixed model ID bug (strip openai/ prefix)
- Made max_tokens configurable for image extraction
- Enhanced screenshot scrolling to capture full pages
- Merged SmartScraperGraph + ScreenshotScraperGraph results
- Added hallucination filter for fake speakers
- Improved prompt to work with OpenAI content policies
- Added lazy-load scrolling support (timeout-based)
- Created FastAPI backend with web UI
- Added Excel export with metadata

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
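The model-ID fix and the merge-and-filter step in the commit above can be illustrated with a minimal Python sketch. It assumes each graph returns a list of speaker dicts with at least a `name` key. `strip_provider_prefix`, `merge_speakers`, `filter_hallucinated` and the placeholder-name heuristic are hypothetical illustrations, not the code in this PR.

```python
# Hypothetical sketch: the speaker dict shape, helper names and the
# hallucination heuristic are illustrative assumptions, not this PR's code.
from collections.abc import Iterable

# Assumed placeholder names a model may invent when no real speaker is found.
PLACEHOLDER_NAMES = {"john doe", "jane doe", "speaker name", "unknown"}


def strip_provider_prefix(model_id: str) -> str:
    """Turn 'openai/gpt-4o' into 'gpt-4o'; unprefixed IDs pass through."""
    return model_id.removeprefix("openai/")


def _key(text: str) -> str:
    """Collapse whitespace and lowercase so near-duplicate names compare equal."""
    return " ".join(text.split()).lower()


def merge_speakers(*sources: Iterable[dict]) -> list[dict]:
    """Union speaker lists in order; later sources only fill empty fields."""
    merged: dict[str, dict] = {}
    for source in sources:
        for speaker in source:
            name = (speaker.get("name") or "").strip()
            if not name:
                continue  # nothing to merge on
            entry = merged.setdefault(_key(name), {})
            for field, value in speaker.items():
                if value and not entry.get(field):
                    entry[field] = value
    return list(merged.values())


def filter_hallucinated(speakers: list[dict], page_text: str) -> list[dict]:
    """Drop placeholder names and names that never occur in the page text."""
    haystack = _key(page_text)
    return [
        s for s in speakers
        if _key(s["name"]) not in PLACEHOLDER_NAMES and _key(s["name"]) in haystack
    ]


if __name__ == "__main__":
    text_pass = [{"name": "Ada Lovelace", "title": ""}, {"name": "John Doe"}]
    vision_pass = [{"name": "ada  lovelace", "title": "Keynote"}, {"name": "Grace Hopper"}]
    page = "Speakers: Ada Lovelace (Keynote)"
    print(filter_hallucinated(merge_speakers(text_pass, vision_pass), page))
    # [{'name': 'Ada Lovelace', 'title': 'Keynote'}]
```

The substring check is deliberately crude: a short name can match inside a longer word, and a speaker who appears only in a screenshot (not in the page text) would be dropped, so a real filter would likely need to treat vision-only results differently.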
tiktoken==0.7.0 requires a Rust compiler on Python 3.13 (no prebuilt wheels are available). Using Python 3.11 to ensure smooth deployment on Streamlit Cloud.
Streamlit Cloud may not recognize the Python version set in .streamlit/config.toml. Using a .python-version file as a fallback to force Python 3.11.
Trying runtime.txt as the Streamlit Cloud standard for specifying the Python version.
Changed requires-python from '>=3.10,<4.0' to '>=3.10,<3.13'. This is intended to force Streamlit Cloud to use Python 3.12 or below, which has prebuilt tiktoken wheels (no Rust compiler needed).
Installing the rust-all system package to compile tiktoken on Python 3.13 in case the pyproject.toml constraint doesn't force an earlier Python version.
Fixed an incorrect import in generate_code_node.py that was causing a ModuleNotFoundError: langchain_classic doesn't exist and should be langchain_community.
The code uses langchain_classic, but it wasn't listed in the dependencies. Added langchain-classic>=1.0.0 to pyproject.toml and reverted generate_code_node.py to use langchain_classic (the correct import).
langchain_classic is bundled inside langchain starting from version 1.0.0. Removed the separate langchain-classic dependency and bumped the minimum langchain version.
… issue CodeGeneratorGraph requires langchain_classic, which has packaging issues. Since we don't use CodeGeneratorGraph for speaker scraping, commenting it out is the simplest workaround.
Added langchain-classic as an explicit dependency to fix import errors on Streamlit Cloud deployment.
Added a try/except block to fall back gracefully to langchain.output_parsers if langchain_classic is not available. This ensures compatibility across different deployment environments.
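The fallback described in that commit boils down to trying candidate modules in order. Below is a minimal standard-library sketch; `import_first` is a hypothetical helper name, and the module paths in the usage comment are the two named in the commit message, not checked against the actual file.

```python
import importlib
from types import ModuleType


def import_first(*module_names: str) -> ModuleType:
    """Import and return the first module in module_names that is available.

    ImportError also covers ModuleNotFoundError, so a missing package and a
    package that fails to import both fall through to the next candidate.
    """
    failures = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            failures.append(f"{name}: {exc}")
    raise ImportError("no candidate module could be imported: " + "; ".join(failures))


# Intended use (assumed; mirrors the two paths named in the commit message):
# output_parsers = import_first("langchain_classic.output_parsers",
#                               "langchain.output_parsers")
```

For a single import, a plain `try: from ... import ... except ImportError: from ... import ...` does the same job; the helper only makes the candidate order explicit and reports every failure if none of them work.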
Added a fallback to load OPENAI_API_KEY from Streamlit secrets for hosted deployments. Also added langchain-classic to requirements.txt.
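The lookup order that commit describes (environment variable first, then Streamlit secrets) can be sketched as follows. `resolve_api_key` is a hypothetical name. `st.secrets` is Streamlit's secrets object, but reading it when no secrets file is configured can raise, so the sketch treats any error there as "not found".

```python
import os
from collections.abc import Mapping
from typing import Optional


def _streamlit_secrets() -> Optional[Mapping]:
    """Return st.secrets if Streamlit is installed, otherwise None."""
    try:
        import streamlit as st
    except ImportError:
        return None
    return st.secrets


def resolve_api_key(name: str = "OPENAI_API_KEY",
                    secrets: Optional[Mapping] = None) -> Optional[str]:
    """Look up an API key in the environment, then in a secrets mapping.

    Pass `secrets` explicitly in tests; by default Streamlit's st.secrets is used.
    """
    value = os.environ.get(name)
    if value:
        return value
    if secrets is None:
        secrets = _streamlit_secrets()
    if secrets is None:
        return None
    try:
        return secrets.get(name) or None
    except Exception:
        # Reading st.secrets without a secrets file raises; the exact
        # exception type may differ between Streamlit versions.
        return None
```

Checking the environment first keeps local runs with an exported variable working unchanged; the secrets lookup only comes into play on the hosted deployment.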
              
                    VinciGit00
  
              
              requested changes
              
                  
                    Oct 22, 2025 
                  
              
              
            
            
Could you remove outputs and add it to .gitignore?
  
🤖 Generated with Claude Code