How does this crawler work? #6
Comments
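It works by simulating manual clicks. The flow is as follows:
A message queue sits in the middle, so the crawler can run distributed and at high concurrency.
2. The "crawl from Sogou WeChat" process:
1) Get a proxy IP from the proxy pool |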
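For readers wondering what "get a proxy IP from the proxy pool" might look like, here is a minimal sketch. It is not the project's code: the `ProxyPool` class, its method names, and the example addresses are all hypothetical, and a real pool would be shared between workers rather than held in memory.

```python
from __future__ import annotations

import random


class ProxyPool:
    """Tiny in-memory proxy pool (hypothetical, for illustration only)."""

    def __init__(self, proxies: list[str], seed: int | None = None) -> None:
        self._proxies = list(proxies)
        # A seeded RNG makes the selection reproducible in tests.
        self._rng = random.Random(seed)

    def get(self) -> str:
        """Return one proxy address chosen at random."""
        if not self._proxies:
            raise LookupError("proxy pool is empty")
        return self._rng.choice(self._proxies)

    def discard(self, proxy: str) -> None:
        """Drop a proxy that appears to be blocked; unknown proxies are ignored."""
        if proxy in self._proxies:
            self._proxies.remove(proxy)

    def __len__(self) -> int:
        return len(self._proxies)
```

A worker would call `pool.get()` before each request and `pool.discard(proxy)` once that proxy starts getting blocked.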
Oh, so it still crawls from Sogou. |
Awesome! |
Hi: thank you! |
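@ichenfujun I built the proxies on dynamic VPS instances bought on Taobao, but the cost is on the high side and the results are not great. I'd suggest triggering an automatic local redial to change the IP once you get blocked; that feels more reliable. |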
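The "redial after being blocked" suggestion could be sketched roughly as below. This is only an illustration under assumptions: `Blocked`, `fetch_with_redial`, and the `fetch` / `redial` callables are hypothetical names, and how a ban is detected or how the line is actually redialed depends on your setup.

```python
from typing import Callable


class Blocked(Exception):
    """Raised by the (hypothetical) fetch function when the current IP is banned."""


def fetch_with_redial(
    fetch: Callable[[str], str],
    redial: Callable[[], None],
    url: str,
    max_redials: int = 3,
) -> str:
    """Fetch url; on a ban, redial to get a new IP and try again.

    Gives up and re-raises Blocked after max_redials redials.
    """
    for attempt in range(max_redials + 1):
        try:
            return fetch(url)
        except Blocked:
            if attempt == max_redials:
                raise
            # Assumption: redial() blocks until the new connection is up.
            redial()
    raise AssertionError("unreachable")
```

Here `redial` would wrap whatever command reconnects the local line, and `fetch` would raise `Blocked` when the response looks like a ban or captcha page.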
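Hello, a question: has the MongoDB in the architecture diagram been replaced by Redis? And how do Redis and MySQL relate to each other? From what I can see, the official accounts and keywords to crawl are still stored in the local MySQL. |
@liuyang66 All data is now stored in MySQL; MongoDB is not used. Redis is only used as the message queue. |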
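To illustrate what "Redis only as the message queue" can mean in practice, here is a small sketch of a FIFO task queue on a Redis list, using the standard `LPUSH` / `BRPOP` pattern. The key name `crawl:tasks`, the task fields, and the helper names are assumptions made for this example, not the project's actual schema.

```python
import json
from typing import Any, Optional

QUEUE_KEY = "crawl:tasks"  # hypothetical key name


def enqueue(client: Any, task: dict, key: str = QUEUE_KEY) -> None:
    """Producer side: push a JSON-encoded task onto the head of the list."""
    client.lpush(key, json.dumps(task))


def dequeue(client: Any, key: str = QUEUE_KEY, timeout: int = 5) -> Optional[dict]:
    """Consumer side: block up to `timeout` seconds for a task from the tail.

    LPUSH plus BRPOP gives first-in, first-out order. Returns None on timeout.
    """
    item = client.brpop(key, timeout=timeout)
    if item is None:
        return None
    _key, raw = item  # BRPOP returns a (key, value) pair
    return json.loads(raw)
```

With redis-py, `client` would be something like `redis.Redis(host="localhost", port=6379)`. Any number of worker processes on different machines can call `dequeue` against the same key, which is what lets the crawl be spread out, while the crawled data itself goes to MySQL.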
@yijingping Hello, can the database be swapped for Mongo? |
@Chuenfai That is not supported. |
Which interface does the core mechanism actually go through?