165 changes: 165 additions & 0 deletions Ronak_Sarvaya/README.md
@@ -0,0 +1,165 @@
# Google Search Scraper - Ronak Sarvaya

A robust Selenium-based web scraper for extracting Google search results.

## Features

- ✅ Modern Selenium 4+ implementation
- ✅ Automatic ChromeDriver management (no manual driver download needed)
- ✅ Extracts title, URL, and snippet for each result
- ✅ Saves results to CSV file with timestamp
- ✅ Configurable headless/visible browser mode
- ✅ Proper error handling and explicit waits
- ✅ Anti-detection measures (user-agent, automation flags)

## Installation

### Prerequisites
- Python 3.7 or higher
- Google Chrome browser installed

### Install Required Packages

```bash
pip install selenium webdriver-manager
```

Or install from requirements file:

```bash
pip install -r requirements.txt
```

## Usage

### Basic Usage

Run the scraper with default settings:

```bash
python my_scraper.py
```

### Customize Search Query

Edit the `main()` function in `my_scraper.py`:

```python
# Configuration
SEARCH_QUERY = "Your search query here" # Change this
NUM_RESULTS = 10 # Number of results to scrape
HEADLESS_MODE = True # False to see browser window
```

### Use as a Module

```python
from my_scraper import GoogleScraper

# Create scraper instance
scraper = GoogleScraper(headless=True)

# Scrape Google
results = scraper.scrape(
    query="Python tutorials",
    num_results=15,
    save_csv=True
)

# Access results
for result in results:
    print(f"Title: {result['title']}")
    print(f"URL: {result['url']}")
    print(f"Snippet: {result['snippet']}")
```

## Output

The scraper generates a CSV file with the following columns:
- **rank**: Position in search results (1-based)
- **title**: Page title
- **url**: Page URL
- **snippet**: Description/snippet from search results

Output filename format: `google_results_YYYYMMDD_HHMMSS.csv`
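
The filename pattern and column layout above can be reproduced with the standard library alone. The following is a minimal sketch under stated assumptions, not the code in `my_scraper.py`: the helper names `timestamped_filename`, `write_results`, and `read_results` are hypothetical, and each result is assumed to be a dict with `title`, `url`, and `snippet` keys.

```python
import csv
from datetime import datetime

# Column order documented in the Output section.
FIELDNAMES = ["rank", "title", "url", "snippet"]


def timestamped_filename(now=None):
    """Build a name of the form google_results_YYYYMMDD_HHMMSS.csv."""
    now = now or datetime.now()
    return f"google_results_{now.strftime('%Y%m%d_%H%M%S')}.csv"


def write_results(results, path):
    """Write result dicts to CSV, adding a 1-based rank column."""
    # newline="" lets the csv module control line endings, so fields that
    # contain embedded newlines or quotes are escaped correctly.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for rank, result in enumerate(results, start=1):
            writer.writerow({"rank": rank, **result})


def read_results(path):
    """Read the CSV back as a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Reading with `csv.DictReader` rather than splitting lines by hand matters here because a scraped title can span several lines inside one quoted field.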

## Configuration Options

### GoogleScraper Class

```python
scraper = GoogleScraper(headless=True)
```

- `headless` (bool): Run browser in headless mode (default: True)

### scrape() Method

```python
results = scraper.scrape(query, num_results=10, save_csv=True)
```

- `query` (str): Search query string
- `num_results` (int): Maximum number of results to extract (default: 10)
- `save_csv` (bool): Save results to CSV file (default: True)

## Troubleshooting

### ChromeDriver Issues
The scraper uses `webdriver-manager` to download and manage ChromeDriver automatically. If you encounter driver errors, upgrade the package first:

```bash
pip install --upgrade webdriver-manager
```

### Import Errors
Make sure all dependencies are installed:

```bash
pip install selenium webdriver-manager
```

### No Results Found
- Check your internet connection
- Run in non-headless mode (`headless=False`) to watch what the browser is doing
- Google may be blocking automated requests; try adding delays between searches
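
One way to add delays is to pause for a randomized interval between searches. The sketch below is illustrative only and is not part of `my_scraper.py`: the helper name `polite_sleep` and the 2 to 5 second default range are assumptions, not values taken from the project.

```python
import random
import time


def polite_sleep(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Pause for a random interval and return the chosen delay in seconds.

    The `sleep` parameter is injectable so the function can be exercised
    without actually waiting.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `polite_sleep()` between successive `scraper.scrape(...)` calls spaces the requests out; a randomized interval is used here so the requests do not arrive at a fixed rhythm.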

## Example Output

```
============================================================
Google Search Scraper - Ronak Sarvaya
============================================================
✅ Chrome WebDriver initialized successfully
🔍 Searching Google for: 'Python programming tutorials'
✅ Search completed successfully
📊 Found 10 search results
✓ Extracted result #1: Python Tutorial - W3Schools...
✓ Extracted result #2: Learn Python Programming...
...
✅ Successfully extracted 10 results
✅ Results saved to: Ronak_Sarvaya/google_results_20240115_143022.csv
✅ Browser closed

============================================================
SEARCH RESULTS
============================================================

[1] Python Tutorial - W3Schools
URL: https://www.w3schools.com/python/
Snippet: Well organized and easy to understand Web building tutorials...
------------------------------------------------------------
...
```

## Notes

- Respect Google's Terms of Service
- Use reasonable delays between requests
- Consider using Google's official APIs for production use
- This scraper is for educational purposes

## Author

**Ronak Sarvaya**
GC-Internship Project
10 changes: 10 additions & 0 deletions Ronak_Sarvaya/google_results_20251116_111743.csv
@@ -0,0 +1,10 @@
rank,title,url,snippet
1,Python Tutorial,https://www.w3schools.com/python/,"This Python tutorial covers file handling, database handling, exercises, and examples. It also includes a ""Try it Yourself"" editor."
2,Python Tutorial,https://www.geeksforgeeks.org/python/python-programming-language-tutorial/,"5 Nov 2025 — In this section, we'll cover the basics of Python programming, including installing Python, writing first program, understanding comments and working with ..."
3,The Python Tutorial,https://docs.python.org/3/tutorial/index.html,"This tutorial is for new Python programmers, introduces basic concepts, and helps you write Python modules and programs, but is not comprehensive."
4,"Python Full Course for Beginners [2025]
YouTube · Programming with Mosh
12 Feb 2025",https://www.youtube.com/watch?v=K5KVEU3aaeQ,No description available
5,Python Tutorial,https://www.tutorialspoint.com/python/index.htm,"This Python tutorial is for beginners to learn basic to advanced concepts of Python, a popular, general-purpose, interpreted, object-oriented language."
6,Learn Python - Free Interactive Python Tutorial,https://www.learnpython.org/,"This free tutorial offers interactive coding challenges, videos, and covers topics like variables, lists, loops, functions, and more. It is for everyone."
7,Python for Beginners (Full Course),https://www.youtube.com/playlist?list=PLu0W_9lII9agwh1XjRt242xIpHhPT2llg,Introduction to Programming & Python | Python Tutorial - Day #1 · Some Amazing Python Programs - The Power of Python | Python Tutorial - Day #2 · Modules and Pip ...