Scraper

Scraper is a cute HTML screen-scraping tool.

require 'scraper'
require 'open-uri'

class BlogScraper < Scraper
  element :title
  
  elements 'div.hentry' => :articles do
    element 'h2' => :title
    element 'a/@href' => :url
  end
end

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs
blog = BlogScraper.parse URI.open('http://example.com')

blog.title
#=> "My blog title"

blog.articles.first.title
#=> "First article title"

blog.articles.first.url
#=> "http://example.com/article"

There are sample scripts in the "examples/" directory; run them with:

ruby -rubygems examples/<script>.rb

See the wiki for more on how to use Scraper.

Requirements

None. Well, Nokogiri is a requirement if you pass in HTML content that needs to be parsed, as in the example above. Otherwise you can initialize the scraper with an Hpricot document or any other object that implements the at(selector) and search(selector) methods.
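
The at/search contract described above can be illustrated with a small stand-in object. This is only a sketch: StubDocument and scrapable? are hypothetical names invented for this example and are not part of Scraper, and the example does not show how Scraper itself consumes the document. It only shows what "implements at(selector) and search(selector)" means in practice.

```ruby
# Hypothetical stand-in for a parsed document. The only contract taken from
# the README is that the object responds to at(selector) and search(selector).
class StubDocument
  def initialize(nodes)
    @nodes = nodes # Hash mapping a selector string to an array of values
  end

  # Return every value registered for the selector (empty array if none).
  def search(selector)
    @nodes.fetch(selector, [])
  end

  # Return the first match for the selector, or nil if there is none.
  def at(selector)
    search(selector).first
  end
end

# Hypothetical helper: checks the duck-typed interface before an object is
# handed to a scraper.
def scrapable?(doc)
  doc.respond_to?(:at) && doc.respond_to?(:search)
end

doc = StubDocument.new('h2' => ['First article title', 'Second article title'])

scrapable?(doc)         #=> true
scrapable?('<html>')    #=> false (a plain String has neither method)
doc.at('h2')            #=> "First article title"
doc.search('h2').length #=> 2
```

Nokogiri and Hpricot documents already provide both methods, so a wrapper like this would only matter for some other parser or for test doubles.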
