Commit 4c1c214

Crawlera → Zyte Smart Proxy Manager (#97)
2 parents 622ab6a + 6bb0bf9

File tree

14 files changed (+389, -335 lines)


.bumpversion.cfg

Lines changed: 1 addition & 1 deletion

@@ -6,4 +6,4 @@ tag_name = v{new_version}
 
 [bumpversion:file:setup.py]
 
-[bumpversion:file:scrapy_crawlera/__init__.py]
+[bumpversion:file:scrapy_zyte_smartproxy/__init__.py]

README.rst

Lines changed: 15 additions & 13 deletions

@@ -1,20 +1,21 @@
-===============
-scrapy-crawlera
-===============
+======================
+scrapy-zyte-smartproxy
+======================
 
-.. image:: https://img.shields.io/pypi/v/scrapy-crawlera.svg
-   :target: https://pypi.python.org/pypi/scrapy-crawlera
+.. image:: https://img.shields.io/pypi/v/scrapy-zyte-smartproxy.svg
+   :target: https://pypi.python.org/pypi/scrapy-zyte-smartproxy
    :alt: PyPI Version
 
-.. image:: https://travis-ci.org/scrapy-plugins/scrapy-crawlera.svg?branch=master
-   :target: http://travis-ci.org/scrapy-plugins/scrapy-crawlera
+.. image:: https://travis-ci.org/scrapy-plugins/scrapy-zyte-smartproxy.svg?branch=master
+   :target: http://travis-ci.org/scrapy-plugins/scrapy-zyte-smartproxy
    :alt: Build Status
 
-.. image:: http://codecov.io/github/scrapy-plugins/scrapy-crawlera/coverage.svg?branch=master
-   :target: http://codecov.io/github/scrapy-plugins/scrapy-crawlera?branch=master
+.. image:: http://codecov.io/github/scrapy-plugins/scrapy-zyte-smartproxy/coverage.svg?branch=master
+   :target: http://codecov.io/github/scrapy-plugins/scrapy-zyte-smartproxy?branch=master
    :alt: Code Coverage
 
-scrapy-crawlera provides easy use of `Crawlera <http://scrapinghub.com/crawlera>`_ with Scrapy.
+scrapy-zyte-smartproxy provides easy use of `Zyte Smart Proxy Manager
+<https://www.zyte.com/smart-proxy-manager/>`_ (formerly Crawlera) with Scrapy.
 
 Requirements
 ============

@@ -25,12 +26,13 @@ Requirements
 Installation
 ============
 
-You can install scrapy-crawlera using pip::
+You can install scrapy-zyte-smartproxy using pip::
 
-    pip install scrapy-crawlera
+    pip install scrapy-zyte-smartproxy
 
 
 Documentation
 =============
 
-Documentation is available online at https://scrapy-crawlera.readthedocs.io/ and in the ``docs`` directory.
+Documentation is available online at
+https://scrapy-zyte-smartproxy.readthedocs.io/ and in the ``docs`` directory.

docs/Makefile

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@
 # You can set these variables from the command line.
 SPHINXOPTS =
 SPHINXBUILD = sphinx-build
-SPHINXPROJ = scrapy-crawlera
+SPHINXPROJ = scrapy-zyte-smartproxy
 SOURCEDIR = .
 BUILDDIR = _build

docs/conf.py

Lines changed: 18 additions & 13 deletions

@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 #
-# scrapy-crawlera documentation build configuration file, created by
+# scrapy-zyte-smartproxy documentation build configuration file, created by
 # sphinx-quickstart on Sat Jan 21 13:17:41 2017.
 #
 # This file is execfile()d with the current directory set to its

@@ -54,19 +54,19 @@
 master_doc = 'index'
 
 # General information about the project.
-project = u'scrapy-crawlera'
-copyright = u'2011-2017, Scrapinghub'
-author = u'Scrapinghub'
+project = u'scrapy-zyte-smartproxy'
+copyright = u'2011-2021, Zyte Group Ltd'
+author = u'Zyte'
 
 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the
 # built documents.
 #
 
 try:
-    import scrapy_crawlera
-    version = '.'.join(scrapy_crawlera.__version__.split('.')[:2])
-    release = scrapy_crawlera.__version__
+    import scrapy_zyte_smartproxy
+    version = '.'.join(scrapy_zyte_smartproxy.__version__.split('.')[:2])
+    release = scrapy_zyte_smartproxy.__version__
 except ImportError:
     version = ''
     release = ''

@@ -111,7 +111,7 @@
 # -- Options for HTMLHelp output ------------------------------------------
 
 # Output file base name for HTML help builder.
-htmlhelp_basename = 'scrapy-crawleradoc'
+htmlhelp_basename = 'scrapy-zyte-smartproxydoc'
 
 
 # -- Options for LaTeX output ---------------------------------------------

@@ -138,8 +138,13 @@
 # (source start file, target name, title,
 #  author, documentclass [howto, manual, or own class]).
 latex_documents = [
-    (master_doc, 'scrapy-crawlera.tex', u'scrapy-crawlera Documentation',
-     u'Scrapinghub', 'manual'),
+    (
+        master_doc,
+        'scrapy-zyte-smartproxy.tex',
+        u'scrapy-zyte-smartproxy Documentation',
+        u'Zyte',
+        'manual',
+    ),
 ]
 

@@ -148,7 +153,7 @@
 # One entry per manual page. List of tuples
 # (source start file, name, description, authors, manual section).
 man_pages = [
-    (master_doc, 'scrapy-crawlera', u'scrapy-crawlera Documentation',
+    (master_doc, 'scrapy-zyte-smartproxy', u'scrapy-zyte-smartproxy Documentation',
     [author], 1)
 ]
 

@@ -159,8 +164,8 @@
 # (source start file, target name, title, author,
 #  dir menu entry, description, category)
 texinfo_documents = [
-    (master_doc, 'scrapy-crawlera', u'scrapy-crawlera Documentation',
-     author, 'scrapy-crawlera', 'One line description of project.',
+    (master_doc, 'scrapy-zyte-smartproxy', u'scrapy-zyte-smartproxy Documentation',
+     author, 'scrapy-zyte-smartproxy', 'One line description of project.',
     'Miscellaneous'),
 ]
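The updated ``docs/conf.py`` derives the short Sphinx ``version`` from the full ``release`` by keeping only the first two dotted components. A minimal standalone sketch of that truncation (without importing the package):

```python
# Mirror of the version logic in the updated docs/conf.py: the short
# "version" keeps only the first two components of the full "release".
def short_version(release):
    return '.'.join(release.split('.')[:2])

print(short_version('1.7.2'))  # 1.7
```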

docs/index.rst

Lines changed: 25 additions & 22 deletions

@@ -1,9 +1,12 @@
-=======================================
-scrapy-crawlera |version| documentation
-=======================================
+==============================================
+scrapy-zyte-smartproxy |version| documentation
+==============================================
 
-scrapy-crawlera is a Scrapy `Downloader Middleware <https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#downloader-middleware>`_
-to interact with `Crawlera <http://scrapinghub.com/crawlera>`_ automatically.
+scrapy-zyte-smartproxy is a `Scrapy downloader middleware`_ to interact with
+`Zyte Smart Proxy Manager`_ (formerly Crawlera) automatically.
+
+.. _Scrapy downloader middleware: https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
+.. _Zyte Smart Proxy Manager: https://www.zyte.com/smart-proxy-manager/
 
 Configuration
 =============

@@ -12,32 +15,32 @@ Configuration
    :caption: Configuration
 
 
-* Add the Crawlera middleware including it into the ``DOWNLOADER_MIDDLEWARES`` in your ``settings.py`` file::
+* Add the Zyte Smart Proxy Manager middleware including it into the ``DOWNLOADER_MIDDLEWARES`` in your ``settings.py`` file::
 
      DOWNLOADER_MIDDLEWARES = {
         ...
-        'scrapy_crawlera.CrawleraMiddleware': 610
+        'scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware': 610
     }
 
 * Then there are two ways to enable it
 
   * Through ``settings.py``::
 
-      CRAWLERA_ENABLED = True
-      CRAWLERA_APIKEY = 'apikey'
+      ZYTE_SMARTPROXY_ENABLED = True
+      ZYTE_SMARTPROXY_APIKEY = 'apikey'
 
   * Through spider attributes::
 
       class MySpider:
-          crawlera_enabled = True
-          crawlera_apikey = 'apikey'
+          zyte_smartproxy_enabled = True
+          zyte_smartproxy_apikey = 'apikey'
 
 
-* (optional) If you are not using the default Crawlera proxy (``http://proxy.crawlera.com:8010``),
+* (optional) If you are not using the default Zyte Smart Proxy Manager proxy (``http://proxy.zyte.com:8011``),
   for example if you have a dedicated or private instance,
-  make sure to also set ``CRAWLERA_URL`` in ``settings.py``, e.g.::
+  make sure to also set ``ZYTE_SMARTPROXY_URL`` in ``settings.py``, e.g.::
 
-      CRAWLERA_URL = 'http://myinstance.crawlera.com:8010'
+      ZYTE_SMARTPROXY_URL = 'http://myinstance.zyte.com:8011'
 
 How to use it
 =============

@@ -52,8 +55,8 @@ How to use it
 All configurable Scrapy Settings added by the Middleware.
 
 
-With the middleware, the usage of crawlera is automatic, every request will go through crawlera without nothing to worry about.
-If you want to *disable* crawlera on a specific Request, you can do so by updating `meta` with `dont_proxy=True`::
+With the middleware, the usage of Zyte Smart Proxy Manager is automatic, every request will go through Zyte Smart Proxy Manager without nothing to worry about.
+If you want to *disable* Zyte Smart Proxy Manager on a specific Request, you can do so by updating `meta` with `dont_proxy=True`::
 
 
     scrapy.Request(

@@ -65,11 +68,11 @@ If you want to *disable* crawlera on a specific Request, you can do so by updati
     )
 
 
-Remember that you are now making requests to Crawlera, and the Crawlera service will be the one actually making the requests to the different sites.
+Remember that you are now making requests to Zyte Smart Proxy Manager, and the Zyte Smart Proxy Manager service will be the one actually making the requests to the different sites.
 
-If you need to specify special `Crawlera Headers <https://doc.scrapinghub.com/crawlera.html#request-headers>`_, just apply them as normal `Scrapy Headers <https://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.headers>`_.
+If you need to specify special `Zyte Smart Proxy Manager headers <https://docs.zyte.com/smart-proxy-manager.html#request-headers>`_, just apply them as normal `Scrapy headers <https://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.headers>`_.
 
-Here we have an example of specifying a Crawlera header into a Scrapy request::
+Here we have an example of specifying a Zyte Smart Proxy Manager header into a Scrapy request::
 
     scrapy.Request(
         'http://example.com',

@@ -82,8 +85,8 @@ Here we have an example of specifying a Crawlera header into a Scrapy request::
 Remember that you could also set which headers to use by default by all
 requests with `DEFAULT_REQUEST_HEADERS <http://doc.scrapy.org/en/1.0/topics/settings.html#default-request-headers>`_
 
-.. note:: Crawlera headers are removed from requests when the middleware is activated but Crawlera
-   is disabled. For example, if you accidentally disable Crawlera via ``crawlera_enabled = False``
+.. note:: Zyte Smart Proxy Manager headers are removed from requests when the middleware is activated but Zyte Smart Proxy Manager
+   is disabled. For example, if you accidentally disable Zyte Smart Proxy Manager via ``zyte_smartproxy_enabled = False``
    but keep sending ``X-Crawlera-*`` headers in your requests, those will be removed from the
    request headers.
 

@@ -99,4 +102,4 @@ All the rest
    news
 
 :doc:`news`
-   See what has changed in recent scrapy-crawlera versions.
+   See what has changed in recent scrapy-zyte-smartproxy versions.
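Putting the renamed pieces from this ``docs/index.rst`` diff together, a minimal ``settings.py`` might look like the sketch below (the API key is a placeholder; ``610`` is the priority the docs recommend):

```python
# settings.py sketch using the new names introduced by this commit.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware': 610,
}

ZYTE_SMARTPROXY_ENABLED = True
ZYTE_SMARTPROXY_APIKEY = 'apikey'  # placeholder, not a real key

# Per-request opt-out, as documented in docs/index.rst:
#   scrapy.Request('http://example.com', meta={'dont_proxy': True})
```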

docs/news.rst

Lines changed: 29 additions & 0 deletions

@@ -3,6 +3,35 @@
 Changes
 =======
 
+v2.0.0 (2021-05-NN)
+-------------------
+
+Following the upstream rebranding of Crawlera as Zyte Smart Proxy Manager,
+``scrapy-crawlera`` has been renamed as ``scrapy-zyte-smartproxy``, with the
+following backward-incompatible changes:
+
+- The repository name and Python Package Index (PyPI) name are now
+  ``scrapy-zyte-smartproxy``.
+
+- Setting prefixes have switched from ``CRAWLERA_`` to ``ZYTE_SMARTPROXY_``.
+
+- Spider attribute prefixes and request meta key prefixes have switched from
+  ``crawlera_`` to ``zyte_smartproxy_``.
+
+- ``scrapy_crawlera`` is now ``scrapy_zyte_smartproxy``.
+
+- ``CrawleraMiddleware`` is now ``ZyteSmartProxyMiddleware``, and its default
+  ``url`` is now ``http://proxy.zyte.com:8011``.
+
+- Stat prefixes have switched from ``crawlera/`` to ``zyte_smartproxy/``.
+
+- The online documentation is moving to
+  https://scrapy-zyte-smartproxy.readthedocs.io/
+
+.. note:: Zyte Smart Proxy Manager headers continue to use the ``X-Crawlera-``
+   prefix.
+
+
 v1.7.2 (2020-12-01)
 -------------------
 - Use request.meta than response.meta in the middleware
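The setting rename described in the v2.0.0 entry is a mechanical prefix swap. A small helper sketching the migration (the helper itself is illustrative, not part of the package):

```python
# Illustrative helper: map an old CRAWLERA_* setting name to its
# ZYTE_SMARTPROXY_* equivalent, per the v2.0.0 changelog entry.
# Names without the old prefix are returned unchanged.
def migrate_setting(name):
    old_prefix = 'CRAWLERA_'
    if name.startswith(old_prefix):
        return 'ZYTE_SMARTPROXY_' + name[len(old_prefix):]
    return name

print(migrate_setting('CRAWLERA_MAXBANS'))  # ZYTE_SMARTPROXY_MAXBANS
```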

docs/settings.rst

Lines changed: 31 additions & 30 deletions

@@ -2,76 +2,77 @@
 Settings
 ========
 
-This Middleware adds some settings to configure how to work with Crawlera.
+This Scrapy downloader middleware adds some settings to configure how to work
+with Zyte Smart Proxy Manager.
 
-CRAWLERA_APIKEY
----------------
+ZYTE_SMARTPROXY_APIKEY
+----------------------
 
 Default: ``None``
 
-Unique Crawlera API Key provided for authentication.
+Unique Zyte Smart Proxy Manager API key provided for authentication.
 
-CRAWLERA_URL
-------------
+ZYTE_SMARTPROXY_URL
+-------------------
 
-Default: ``'http://proxy.crawlera.com:8010'``
+Default: ``'http://proxy.zyte.com:8011'``
 
-Crawlera instance url, it varies depending on adquiring a private or dedicated instance. If Crawlera didn't provide
-you with a private instance url, you don't need to specify it.
+Zyte Smart Proxy Manager instance URL, it varies depending on adquiring a private or dedicated instance. If Zyte Smart Proxy Manager didn't provide
+you with a private instance URL, you don't need to specify it.
 
-CRAWLERA_MAXBANS
-----------------
+ZYTE_SMARTPROXY_MAXBANS
+-----------------------
 
 Default: ``400``
 
-Number of consecutive bans from Crawlera necessary to stop the spider.
+Number of consecutive bans from Zyte Smart Proxy Manager necessary to stop the spider.
 
-CRAWLERA_DOWNLOAD_TIMEOUT
--------------------------
+ZYTE_SMARTPROXY_DOWNLOAD_TIMEOUT
+--------------------------------
 
 Default: ``190``
 
-Timeout for processing Crawlera requests. It overrides Scrapy's ``DOWNLOAD_TIMEOUT``.
+Timeout for processing Zyte Smart Proxy Manager requests. It overrides Scrapy's ``DOWNLOAD_TIMEOUT``.
 
-CRAWLERA_PRESERVE_DELAY
------------------------
+ZYTE_SMARTPROXY_PRESERVE_DELAY
+------------------------------
 
 Default: ``False``
 
 If ``False`` Sets Scrapy's ``DOWNLOAD_DELAY`` to ``0``, making the spider to crawl faster. If set to ``True``, it will
 respect the provided ``DOWNLOAD_DELAY`` from Scrapy.
 
-CRAWLERA_DEFAULT_HEADERS
-------------------------
+ZYTE_SMARTPROXY_DEFAULT_HEADERS
+-------------------------------
 
 Default: ``{}``
 
-Default headers added only to crawlera requests. Headers defined on ``DEFAULT_REQUEST_HEADERS`` will take precedence as long as the ``CrawleraMiddleware`` is placed after the ``DefaultHeadersMiddleware``. Headers set on the requests have precedence over the two settings.
+Default headers added only to Zyte Smart Proxy Manager requests. Headers defined on ``DEFAULT_REQUEST_HEADERS`` will take precedence as long as the ``ZyteSmartProxyMiddleware`` is placed after the ``DefaultHeadersMiddleware``. Headers set on the requests have precedence over the two settings.
 
-* This is the default behavior, ``DefaultHeadersMiddleware`` default priority is ``400`` and we recommend ``CrawleraMiddleware`` priority to be ``610``
+* This is the default behavior, ``DefaultHeadersMiddleware`` default priority is ``400`` and we recommend ``ZyteSmartProxyMiddleware`` priority to be ``610``
 
-CRAWLERA_BACKOFF_STEP
------------------------
+ZYTE_SMARTPROXY_BACKOFF_STEP
+----------------------------
 
 Default: ``15``
 
 Step size used for calculating exponential backoff according to the formula: ``random.uniform(0, min(max, step * 2 ** attempt))``.
 
-CRAWLERA_BACKOFF_MAX
------------------------
+ZYTE_SMARTPROXY_BACKOFF_MAX
+---------------------------
 
 Default: ``180``
 
 Max value for exponential backoff as showed in the formula above.
 
-CRAWLERA_FORCE_ENABLE_ON_HTTP_CODES
-------------------------------------
+ZYTE_SMARTPROXY_FORCE_ENABLE_ON_HTTP_CODES
+------------------------------------------
 
 Default: ``[]``
 
-List of HTTP response status codes that warrant enabling Crawlera for the
+List of HTTP response status codes that warrant enabling Zyte Smart Proxy Manager for the
 corresponding domain.
 
 When a response with one of these HTTP status codes is received after a request
-that did not go through Crawlera, the request is retried with Crawlera, and any
-new request to the same domain is also sent through Crawlera.
+that did not go through Zyte Smart Proxy Manager, the request is retried with Zyte Smart Proxy Manager, and any
+new request to the same domain is also sent through Zyte Smart Proxy Manager.
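The backoff formula documented in the settings above, with the default ``step`` of 15 and ``max`` of 180, can be sketched as:

```python
import random

def backoff_delay(attempt, step=15, max_delay=180):
    # random.uniform(0, min(max, step * 2 ** attempt)) from the docs;
    # "max" is renamed max_delay here only to avoid shadowing the builtin.
    return random.uniform(0, min(max_delay, step * 2 ** attempt))

# Delay ceilings grow 15, 30, 60, 120 and then cap at 180 seconds.
```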

scrapy_crawlera/__init__.py

Lines changed: 0 additions & 4 deletions
This file was deleted.

scrapy_zyte_smartproxy/__init__.py

Lines changed: 4 additions & 0 deletions

@@ -0,0 +1,4 @@
+from .middleware import ZyteSmartProxyMiddleware
+
+
+__version__ = '1.7.2'
