Introduce Meta as a way to pass information inside a PO #23

Merged 10 commits on Mar 21, 2022
2 changes: 2 additions & 0 deletions CHANGELOG.rst
@@ -9,6 +9,8 @@ TBR
* Added support for Python 3.10
* Added support for performing additional requests using
``web_poet.HttpClient``.
* Introduced ``web_poet.Meta`` to pass arbitrary information
inside a Page Object.


0.1.1 (2021-06-02)
2 changes: 2 additions & 0 deletions docs/advanced/additional_requests.rst
@@ -77,6 +77,8 @@ to extract more images in a product page that might not otherwise be possible.
This is because in order to do so, an additional button needs to be clicked
which fetches the complete set of product images via AJAX.

.. _`request-post-example`:

A ``POST`` request with `header` and `body`
-------------------------------------------

154 changes: 154 additions & 0 deletions docs/advanced/meta.rst
@@ -0,0 +1,154 @@
.. _`advanced-meta`:

============================
Passing information via Meta
============================

In some cases, Page Objects might require additional information to be passed to
them. Such information can dictate the behavior of the Page Object or affect its
data entirely depending on the needs of the developer.

As you may recall from the basic tutorials, Page Objects that inherit from
:class:`~.WebPage` or :class:`~.ItemWebPage` require a :class:`~.ResponseData`
instance, which holds the HTTP response information that the Page Object
represents.

To standardize how arbitrary information is passed into Page Objects, declare
:class:`~.Meta` as a dependency, similar to how :class:`~.ResponseData` is
required to instantiate Page Objects:

.. code-block:: python

    import attr
    import web_poet

    @attr.define
    class SomePage(web_poet.ItemWebPage):
        # ResponseData is inherited from ItemWebPage
        meta: web_poet.Meta

    response = web_poet.ResponseData(...)
    # Keyword arguments become the keys of the Meta mapping.
    meta = web_poet.Meta(arbitrary_value=1234, cool=True)

    page = SomePage(response=response, meta=meta)

However, as with :class:`~.ResponseData`, developers using :class:`~.Meta`
shouldn't need to care about how it is passed into Page Objects. That depends
on the framework that uses **web-poet**.

Let's check out some examples of how to use it inside a Page Object.

Controlling item values
-----------------------

.. code-block:: python

    import attr
    import web_poet


    @attr.define
    class ProductPage(web_poet.ItemWebPage):
        meta: web_poet.Meta

        default_tax_rate = 0.10

        def to_item(self):
            item = {
                "url": self.url,
                "name": self.css("#main h3.name ::text").get(),
                "price": self.css("#main .price ::text").get(),
            }
            self.calculate_price_with_tax(item)
            return item

        def calculate_price_with_tax(self, item):
            # An instance method (not a staticmethod) because it reads self.meta.
            tax_rate = self.meta.get("tax_rate") or self.default_tax_rate
            # The selector returns text, so convert it before doing arithmetic
            # (assumes the price text is a plain number such as "9.99").
            item["price_with_tax"] = float(item["price"]) * (1 + tax_rate)


In the example above, we provide optional information about the **tax rate**
of the product. This is useful for supporting the different tax rates of each
state or territory. Since **tax_rate** is optional, note that
``default_tax_rate`` serves as a fallback value in case it's not available.
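
One subtlety of the ``or`` fallback used above is that a falsy value, such as a
tax rate of ``0``, is treated as if it were missing. The sketch below uses a
plain ``dict`` as a stand-in for :class:`~.Meta` (an assumption made so that it
runs without **web-poet** installed), and ``DEFAULT_TAX_RATE`` is a hypothetical
name. If a zero rate needs to be honored, ``dict.get`` with a default is one
option:

```python
DEFAULT_TAX_RATE = 0.10  # hypothetical module-level default

# A plain dict stands in for Meta, which is described as a dict subclass.
meta = {"tax_rate": 0}

# ``or`` falls back whenever the value is falsy, so an explicit 0 is lost.
rate_with_or = meta.get("tax_rate") or DEFAULT_TAX_RATE

# ``get`` with a default falls back only when the key is absent.
rate_with_default = meta.get("tax_rate", DEFAULT_TAX_RATE)

print(rate_with_or)       # 0.1
print(rate_with_default)  # 0
```

Which behavior is preferable depends on whether ``0`` (or ``None``) is a
meaningful value for the parameter in question.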


Controlling Page Object behavior
--------------------------------

Let's look at an example in which :class:`~.Meta` controls how
:ref:`advanced-requests` are used. Specifically, we'll use :class:`~.Meta` to
control the number of pagination requests made.

.. code-block:: python

    from typing import List

    import attr
    import web_poet


    @attr.define
    class ProductPage(web_poet.ItemWebPage):
        http_client: web_poet.HttpClient
        meta: web_poet.Meta

        default_max_pages = 5

        async def to_item(self):
            return {"product_urls": await self.get_product_urls()}

        async def get_product_urls(self) -> List[str]:
            # Simulates scrolling to the bottom of the page to load the next
            # set of items in an "Infinite Scrolling" category list page.
            max_pages = self.meta.get("max_pages") or self.default_max_pages
            requests = [
                self.create_next_page_request(page_num)
                for page_num in range(2, max_pages + 1)
            ]
            responses = await self.http_client.batch_requests(*requests)
            pages = [self] + list(map(web_poet.WebPage, responses))
            return [
                product_url
                for page in pages
                for product_url in self.parse_product_urls(page)
            ]

        @staticmethod
        def create_next_page_request(page_num):
            next_page_url = f"https://example.com/category/products?page={page_num}"
            return web_poet.Request(url=next_page_url)

        @staticmethod
        def parse_product_urls(page):
            return page.css("#main .products a.link ::attr(href)").getall()

In the example above, :class:`~.Meta` limits the pagination behavior through
an optional **max_pages** value. Note that the Page Object also defines
``default_max_pages`` in case the :class:`~.Meta` instance does not provide
it.
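
To make the effect of **max_pages** concrete, here is a small sketch of the
URL planning step on its own, outside of any Page Object. The
``plan_page_urls`` helper and ``DEFAULT_MAX_PAGES`` are hypothetical names that
only mirror the ``range(2, max_pages + 1)`` logic above. Page 1 is skipped
because it is the page that the Page Object already represents:

```python
from typing import List

DEFAULT_MAX_PAGES = 5  # hypothetical default, mirroring default_max_pages


def plan_page_urls(meta: dict, default_max_pages: int = DEFAULT_MAX_PAGES) -> List[str]:
    """Return the URLs of the additional pages to request (pages 2..max_pages)."""
    max_pages = meta.get("max_pages") or default_max_pages
    return [
        f"https://example.com/category/products?page={page_num}"
        for page_num in range(2, max_pages + 1)
    ]


# With max_pages=3, only pages 2 and 3 are requested.
print(plan_page_urls({"max_pages": 3}))

# Without max_pages, the default of 5 yields pages 2 through 5.
print(len(plan_page_urls({})))  # 4
```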

Value Restrictions
------------------

From the examples above, you may notice that we can access :class:`~.Meta`
through a ``dict`` interface, since it's simply a subclass of ``dict``.
However, :class:`~.Meta` possesses some extendable features on top of being a
``dict``.

Specifically, :class:`~.Meta` is able to restrict the values passed to it
based on their type. If a value of any of these types is passed, a
``ValueError`` is raised:

* module
* class
* method or function
* generator
* coroutine or awaitable
* traceback
* frame

This ensures that frameworks using **web-poet** are able to safely use values
passed into :class:`~.Meta`, as they could be passed via CLI, web forms, HTTP
API calls, etc.
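
The restriction idea can be sketched in a self-contained way. ``RestrictedDict``
below is a hypothetical stand-in, not the actual :class:`~.Meta`
implementation, and it checks only a few of the types listed above using
predicates from the standard ``inspect`` module:

```python
import inspect


class RestrictedDict(dict):
    """Hypothetical stand-in for Meta: a dict that rejects some value types."""

    # Maps a predicate to the label used in the error message.
    restrictions = {
        inspect.ismodule: "module",
        inspect.isclass: "class",
        inspect.isfunction: "function",
    }

    def __init__(self, **kwargs):
        for value in kwargs.values():
            self._check(value)
        super().__init__(**kwargs)

    def __setitem__(self, key, value):
        self._check(value)
        super().__setitem__(key, value)

    def _check(self, value):
        for predicate, label in self.restrictions.items():
            if predicate(value):
                raise ValueError(f"{label} is not allowed: {value!r}")


meta = RestrictedDict(tax_rate=0.2, tags=["a", "b"])
meta["max_pages"] = 3  # plain data is accepted

try:
    meta["callback"] = lambda x: x + 1  # functions are rejected
except ValueError as exc:
    rejected = str(exc)
```

Plain data such as numbers, strings, lists, and dicts passes through, while the
rejected value never reaches the mapping.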
15 changes: 13 additions & 2 deletions docs/api_reference.rst
@@ -6,8 +6,19 @@ Page Inputs
===========

.. automodule:: web_poet.page_inputs
:members:
:undoc-members:

.. autoclass:: ResponseData
:show-inheritance:
:members:
:undoc-members:
:inherited-members:
:no-special-members:

.. autoclass:: Meta
:show-inheritance:
:members:
:no-special-members:


Pages
=====
1 change: 1 addition & 0 deletions docs/index.rst
@@ -39,6 +39,7 @@ and the motivation behind ``web-poet``, start with :ref:`from-ground-up`.
:maxdepth: 1

advanced/additional_requests
advanced/meta

.. toctree::
:caption: Reference
21 changes: 20 additions & 1 deletion tests/test_page_inputs.py
@@ -1,4 +1,7 @@
from web_poet.page_inputs import ResponseData
import pytest
import asyncio

from web_poet.page_inputs import ResponseData, Meta


def test_html_response():
@@ -11,3 +14,19 @@ def test_html_response():
    response = ResponseData("url", "content", 200, {"User-Agent": "test agent"})
    assert response.status == 200
    assert response.headers["User-Agent"] == "test agent"


def test_meta_restriction():
    # Any value that conforms with `Meta.restrictions` raises an error
    with pytest.raises(ValueError) as err:
        Meta(func=lambda x: x + 1)

    with pytest.raises(ValueError) as err:
        Meta(class_=ResponseData)

    # These are allowed though
    m = Meta(x="hi", y=2.2, z={"k": "v"})
    m["allowed"] = [1, 2, 3]

    with pytest.raises(ValueError) as err:
        m["not_allowed"] = asyncio.sleep(1)
2 changes: 1 addition & 1 deletion web_poet/__init__.py
@@ -1,3 +1,3 @@
from .pages import WebPage, ItemPage, ItemWebPage, Injectable
from .page_inputs import ResponseData
from .page_inputs import ResponseData, Meta
from .requests import request_backend_var, Request, HttpClient
57 changes: 56 additions & 1 deletion web_poet/page_inputs.py
@@ -1,4 +1,6 @@
from typing import Optional, Dict, Any, ByteString, Union
import inspect
from typing import Optional, Dict, Any, ByteString, Union, Set
from contextlib import suppress

import attr

@@ -24,7 +26,60 @@ class ResponseData:

    ``headers`` should contain the HTTP response headers.
    """

    url: str
    html: str
    status: Optional[int] = None
    headers: Optional[Dict[Union[str, ByteString], Any]] = None


class Meta(dict):
    """Container class that could contain any arbitrary data to be passed into
    a Page Object.

    This is basically a subclass of a ``dict`` that adds the ability to check
    whether any of the assigned values are disallowed. Values of types that
    are difficult to provide or pass via a CLI, such as lambdas, are rejected
    with a ``ValueError``.
    """

    # Any value for which one of these functions returns True is not allowed.
    restrictions: Dict = {
        inspect.ismodule: "module",
        inspect.isclass: "class",
        inspect.ismethod: "method",
        inspect.isfunction: "function",
        inspect.isgenerator: "generator",
        inspect.isgeneratorfunction: "generator",
        inspect.iscoroutine: "coroutine",
        inspect.isawaitable: "awaitable",
        inspect.istraceback: "traceback",
        inspect.isframe: "frame",
    }

    def __init__(self, *args, **kwargs) -> None:
        # Note: only keyword arguments are checked here.
        for val in kwargs.values():
            self.enforce_value_restriction(val)
        super().__init__(*args, **kwargs)

    def __setitem__(self, key: Any, value: Any) -> None:
        self.enforce_value_restriction(value)
        # Store the value as a dict item, not as an instance attribute.
        super().__setitem__(key, value)

    def enforce_value_restriction(self, value: Any) -> None:
        """Raises a ``ValueError`` if a given value isn't allowed inside the meta.

        This method is called during :class:`~.Meta` instantiation and when
        setting new values in an existing instance.

        This behavior can be controlled by tweaking the class variable named
        ``restrictions``.
        """
        violations = []

        for restrictor, err in self.restrictions.items():
            if restrictor(value):
                violations.append(f"{err} is not allowed: {value}")

        if violations:
            raise ValueError(f"Found these issues: {', '.join(violations)}")
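
One caveat worth keeping in mind with a ``dict`` subclass that validates values
in ``__setitem__``: in CPython, the built-in ``dict.update`` does not call an
overridden ``__setitem__``, so values added through it skip the check unless
that method is overridden too. The sketch below uses a hypothetical
``CheckedDict`` stand-in rather than the class above, and checks only for
functions:

```python
import inspect


class CheckedDict(dict):
    """Hypothetical stand-in: validates values only in __setitem__."""

    def __setitem__(self, key, value):
        if inspect.isfunction(value):
            raise ValueError(f"function is not allowed: {value!r}")
        super().__setitem__(key, value)


d = CheckedDict()

# Item assignment goes through __setitem__ and is rejected.
try:
    d["f"] = lambda: None
    blocked = False
except ValueError:
    blocked = True

# update() writes to the underlying dict directly, bypassing __setitem__.
d.update({"g": lambda: None})

print(blocked)   # True
print("g" in d)  # True
```

Whether this matters depends on how the framework populates the mapping. If it
does, overriding ``update`` (and ``setdefault``) to run the same check is one
way to close the gap.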