Hangs after running scrapy crawler woaidu #14

Open
MRLuowen opened this issue Jul 10, 2014 · 5 comments

Comments

@MRLuowen
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:12: ScrapyDeprecationWarning: woaidu_crawler.spiders.woaidu_detail_spider.WoaiduSpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)
class WoaiduSpider(BaseSpider):
/usr/local/lib/python2.7/dist-packages/scrapy/contrib/pipeline/__init__.py:21: ScrapyDeprecationWarning: ITEM_PIPELINES defined as a list or a set is deprecated, switch to a dict
category=ScrapyDeprecationWarning, stacklevel=1)
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:19: ScrapyDeprecationWarning: scrapy.selector.HtmlXPathSelector is deprecated, instantiate scrapy.Selector instead.
response_selector = HtmlXPathSelector(response)
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:20: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
next_link = list_first_item(response_selector.select(u'//div[@class="k2"]/div/a[text()="下一页"]/@href').extract())
/usr/local/lib/python2.7/dist-packages/scrapy/selector/unified.py:106: ScrapyDeprecationWarning: scrapy.selector.HtmlXPathSelector is deprecated, instantiate scrapy.Selector instead.
for x in result]
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:25: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
for detail_link in response_selector.select(u'//div[contains(@class,"sousuolist")]/a/@href').extract():
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:33: ScrapyDeprecationWarning: scrapy.selector.HtmlXPathSelector is deprecated, instantiate scrapy.Selector instead.
response_selector = HtmlXPathSelector(response)
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:34: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
woaidu_item['book_name'] = list_first_item(response_selector.select('//div[@class="zizida"][1]/text()').extract())
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:35: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
woaidu_item['author'] = [list_first_item(response_selector.select('//div[@class="xiaoxiao"][1]/text()').extract())[5:].strip(),]
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:36: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
woaidu_item['book_description'] = list_first_item(response_selector.select('//div[@class="lili"][1]/text()').extract()).strip()
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:37: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
woaidu_item['book_covor_image_url'] = list_first_item(response_selector.select('//div[@class="hong"][1]/img/@src').extract())
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:40: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
for i in response_selector.select('//div[contains(@class,"xiazai_xiao")]')[1:]:
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:46: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
list_first_item(i.select('./div')[0].select('./a/@href').extract()),
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:47: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
list_first_item(i.select('./div')[1].select('./a/@href').extract())
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:52: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
download_item['progress'] = list_first_item(i.select('./div')[2].select('./text()').extract())
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:53: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
download_item['update_time'] = list_first_item(i.select('./div')[3].select('./text()').extract())
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:56: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
list_first_item(i.select('./div')[4].select('./a/text()').extract()),
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:57: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
list_first_item(i.select('./div')[4].select('./a/@href').extract())\
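
Everything in the output above is a deprecation warning rather than an error. One of the warnings asks for ITEM_PIPELINES to be a dict instead of a list; Scrapy runs pipelines from lower to higher order values. The following is only a rough sketch of that conversion: the helper `pipelines_list_to_dict` and the pipeline paths are hypothetical and are not taken from this project's settings.

```python
# Hypothetical helper (not part of Scrapy): convert the deprecated list form of
# ITEM_PIPELINES into the dict form that maps each pipeline path to an order.
def pipelines_list_to_dict(pipelines, start=100, step=100):
    """Assign increasing order values so the original list order is preserved."""
    return {path: start + i * step for i, path in enumerate(pipelines)}


# Placeholder pipeline paths for illustration; use the ones from your settings.py.
OLD_ITEM_PIPELINES = [
    "myproject.pipelines.FirstPipeline",
    "myproject.pipelines.SecondPipeline",
]

# Dict form: lower order values run first.
ITEM_PIPELINES = pipelines_list_to_dict(OLD_ITEM_PIPELINES)
```

In practice the dict would probably be written out by hand in settings.py; the helper only shows how the list order maps to order values.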

@TylerzhangZC
How did you end up solving this? Is there a fix?

@zhuang1992
I have the same problem.

@eyrelzy
eyrelzy commented Mar 18, 2015

iders.woaidu_detail_spider.WoaiduSpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)
class WoaiduSpider(BaseSpider):
It gets stuck here and does not continue. Is there a solution?

@TylerzhangZC
Follow this changelist and sync the code; it should then work normally:
https://github.com/gnemoug/distribute_crawler/pull/5/files

@georgezouq
@TylerzhangZC I switched to branch pr/5 and ran it, but it still fails with this error:

Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 153, in crawl
    d = crawler.crawl(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/twisted/internet/defer.py", line 1274, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/Library/Python/2.7/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 71, in crawl
    self.engine = self._create_engine()
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 83, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/Library/Python/2.7/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.scraper = Scraper(crawler)
  File "/Library/Python/2.7/site-packages/scrapy/core/scraper.py", line 70, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/Library/Python/2.7/site-packages/scrapy/middleware.py", line 56, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Library/Python/2.7/site-packages/scrapy/middleware.py", line 32, in from_settings
    mwcls = load_object(clspath)
  File "/Library/Python/2.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/Users/georgezou/Documents/Coding/github/distribute_crawler/woaidu_crawler/woaidu_crawler/pipelines/cover_image.py", line 7, in <module>
    from scrapy.contrib.pipeline.images import ImagesPipeline
  File "/Library/Python/2.7/site-packages/scrapy/contrib/pipeline/images.py", line 7, in <module>
    from scrapy.pipelines.images import *
  File "/Library/Python/2.7/site-packages/scrapy/pipelines/images.py", line 15, in <module>
    from PIL import Image
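
The traceback is cut off at the `from PIL import Image` line, so the actual exception is not shown. If it turns out to be an ImportError, a quick check like the one below may help confirm whether PIL is importable in the interpreter that runs Scrapy. This is a minimal sketch; `module_available` is a hypothetical helper name, not a Scrapy or PIL function.

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported in the current interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# PIL is provided by the Pillow package; this only reports, it installs nothing.
if not module_available("PIL"):
    print("PIL is not importable; installing Pillow (pip install Pillow) may help")
```

Run it with the same Python that the `scrapy` command uses, since a system Python and a user-installed Python can have different site-packages.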
