Task0 #4

ligzer · 2021-12-15T19:51:02Z

No description provided.

flikos · 2021-12-16T11:34:44Z

ProxyHandler.py

+    ]
+
+    def __proxy_request__(self):
+        con = HTTPSConnection('news.ycombinator.com')


Here and below, it might have been worth moving this into the class initializer, or even into an external variable.

Note: this enforcement is applied by the HTTPConnection class. The
HTTPResponse class does not enforce this state machine, which
implies sophisticated clients may accelerate the request/response
pipeline. Caution should be taken, though: accelerating the states
beyond the above pattern may imply knowledge of the server's
connection-close behavior for certain requests. For example, it
is impossible to tell whether the server will close the connection
UNTIL the response headers have been read; this means that further
requests cannot be placed into the pipeline until it is known that
the server will NOT be closing the connection.

To clarify: an HTTPSConnection can be reused (if the TCP connection has been closed it reopens it; if it is still open it sends the request right away). That can improve speed, but it can also cause problems (it depends on the server, which may close the connection unexpectedly). It is also a reason to complicate the code. do_GET and do_POST are quite simple synchronous methods, but they are executed by a ThreadingServer. If HTTPSConnection were made a static (class-level) attribute, which would be more correct than a global object, we would have to deal with concurrent access to it (most likely there would be no problems because of the GIL, but it is still a multithreading concern). And we would most likely lose speed as well, because we could no longer send requests in parallel.
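One possible middle ground between a shared connection and a new connection per request is one connection per thread. The sketch below is only an illustration: `UPSTREAM_HOST`, `get_connection` and the 10-second timeout are hypothetical names and values, not part of the PR. It also only pays off if a handler thread serves more than one request, which depends on how the server is configured.

```python
import threading
from http.client import HTTPSConnection

UPSTREAM_HOST = 'news.ycombinator.com'  # host taken from the diff above

# Each thread sees its own attributes on this object, so connections
# are never shared between threads and need no locking.
_local = threading.local()


def get_connection():
    """Return the calling thread's HTTPSConnection, creating it on first use.

    Creating the object does not open a socket; http.client connects
    lazily when the first request is sent.
    """
    con = getattr(_local, 'con', None)
    if con is None:
        con = HTTPSConnection(UPSTREAM_HOST, timeout=10)  # timeout is an assumption
        _local.con = con
    return con
```

Parallel requests stay parallel because every thread talks over its own socket, at the cost of keeping one open connection per live thread.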

flikos · 2021-12-16T11:36:56Z

utils.py

+__regexp_domain__ = re.compile(r'([\w-]+(?:\.[\w-]+)+)')
+__regexp_dash_underline__ = re.compile(r'(-_+|_+-)')
+__regexp_6letter_word__ = re.compile(r'(?i)(?<![\w-])([a-z][\w-]{4}[a-z0-9])(?![\w-])')
+


The regexp-based solution looks more like magic that is easier to believe in than to verify. )
But that is my own shortcoming: I simply don't know how to use regexps, so perhaps everything here is obvious.
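For readers in the same position, here is a small annotated run of the six-character word pattern from the diff. The pattern is copied verbatim; the names `WORD_RE` and `six_char_words` and the sample sentence are made up for this illustration.

```python
import re

# Pattern copied from the diff above, piece by piece:
#   (?i)               case-insensitive matching
#   (?<![\w-])         not preceded by a word character or a hyphen
#   [a-z]              first character is a letter
#   [\w-]{4}           four word characters or hyphens in the middle
#   [a-z0-9]           last character is a letter or a digit
#   (?![\w-])          not followed by a word character or a hyphen
WORD_RE = re.compile(r'(?i)(?<![\w-])([a-z][\w-]{4}[a-z0-9])(?![\w-])')


def six_char_words(text):
    """Return every standalone six-character word found in text."""
    return WORD_RE.findall(text)


words = six_char_words('Please review github.com today, not tomorrow.')
# words == ['Please', 'review', 'github']
```

Note that `github` is matched even though it is part of `github.com`, because `.` is neither a word character nor a hyphen. That is presumably why the PR applies separate URL and domain patterns before this one.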

flikos

Everything seems OK.
I'm not entirely sure how the "grown-ups" usually do it, but perhaps, in addition to the unit tests, it would have been worth adding an integration test on a simple fragment of a page, to check that several consecutive stages of the program together do not corrupt the final result?

flikos · 2021-12-16T11:39:39Z

utils.py

+    """Split text with rexexp's recursively"""
+
+    # Split by any url-like sequence
+    # Example: anyprotocol://yaras.ru/asddas?asd=1&asd=dsa


There is a comment with an example, but it does not make clear what happens after the code runs.

flikos · 2021-12-16T12:39:38Z

utils.py

+    # Example: anyprotocol://yaras.ru/asddas?asd=1&asd=dsa
+    splited1 = enumerate(__regexp_url__.split(text))
+    for i, t1 in splited1:
+        if i % 2 == 0:


It might have been worth explaining why the even-indexed items are handled this way, at least with a comment such as
# if not splitter
or similar, to make it easier to read.

Yes, I should have explained how re.split works in this context.
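For reference, this is how `re.split` behaves when the pattern contains a capturing group: the matched separators are kept in the result, so even indices hold the text between matches and odd indices hold the matches themselves. `URL_RE` below is a deliberately simplified stand-in for `__regexp_url__`, used only to keep the example short.

```python
import re

# Simplified stand-in for __regexp_url__ (an assumption, not the PR's pattern).
# The capturing group is what makes re.split keep the matched URLs.
URL_RE = re.compile(r'(\w+://\S+)')

text = 'see anyprotocol://yaras.ru/asddas?asd=1&asd=dsa for details'
parts = URL_RE.split(text)
# parts == ['see ', 'anyprotocol://yaras.ru/asddas?asd=1&asd=dsa', ' for details']

plain = []  # text between URLs, safe to process further
urls = []   # matched URLs, to be left untouched
for i, part in enumerate(parts):
    if i % 2 == 0:
        plain.append(part)  # even index: not a separator
    else:
        urls.append(part)   # odd index: the captured separator (a URL)
```

Joining `parts` back together with `''.join(parts)` reproduces the original text, which is what makes this split-and-process approach lossless.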

ligzer · 2021-12-16T13:32:56Z

ProxyHandler.py

+            else:
+                # For others send original answer
+                shutil.copyfileobj(response, self.wfile)
+        con.close()


I completely forgot about exception handling. If the request raises an exception, it may make sense to retry it (for GET; for POST, let the client sort it out) and to send meaningful errors to the client.

I expect there will be problems with link text of the form domain.com/abcdef/index: it will insert tm after abcdef.
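A minimal sketch of that retry policy, assuming the upstream call is wrapped in a zero-argument callable. `request_with_retry`, `send` and the single retry are hypothetical choices for illustration, not code from the PR.

```python
from http.client import HTTPException


def request_with_retry(send, method, retries=1):
    """Call send() and retry on connection errors, but only for GET.

    send    -- zero-argument callable that performs the upstream request
    method  -- HTTP method of the client request ('GET', 'POST', ...)
    retries -- extra attempts allowed for GET (assumed default: 1)
    """
    attempts = 1 + (retries if method == 'GET' else 0)
    for attempt in range(attempts):
        try:
            return send()
        except (HTTPException, OSError):
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller report the error


# Inside a BaseHTTPRequestHandler method the caller could then do, e.g.:
#     try:
#         response = request_with_retry(do_upstream_request, self.command)
#     except (HTTPException, OSError) as exc:
#         self.send_error(502, 'Upstream request failed', str(exc))
#         return
# where do_upstream_request is a hypothetical helper.
```

POST is not retried because repeating it could apply the same action twice on the upstream server.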

ligzer · 2021-12-17T15:40:48Z

tests.py

+
+        elem_b = lxml.html.fromstring('<span class="sitestr">github.com/spencertipping</span>')
+        __add_tm_to_element__(elem_a)
+        self.assertEqual(b'<span class="sitestr">github.com/spencertipping</span>', lxml.html.tostring(elem_b))


Bug. We modify elem_a but check elem_b.

elsiniestra · 2021-12-20T20:57:47Z

utils.py

+__regexp_url__ = re.compile(r'(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.][a-z]{2,4}\/)'
+                            r'(?:[^\s\(\)<>]+|\((?:[^\s\(\)<>]+|(?:\([^\s\(\)<>]+\)))*\))+'
+                            r'(?:\((?:[^\s\(\)<>]+|(?:\([^\s\(\)<>]+\)))*\)|[^\s`!\(\)\[\]{};:\'".,<>?«»“”‘’]))')
+
+__regexp_domain__ = re.compile(r'([\w-]+(?:\.[\w-]+)+)')
+__regexp_dash_underline__ = re.compile(r'(-_+|_+-)')
+__regexp_6letter_word__ = re.compile(r'(?i)(?<![\w-])([a-z][\w-]{4}[a-z0-9])(?![\w-])')


What is the point of naming variables and functions like class dunder methods?
Constants, for example, should be named in upper case, like REGEXP_URL.

elsiniestra

In general, the solution is unusual and quite original, but I think the part with regular expressions (and working with HTML) is a bit overcomplicated; you could do it in a much simpler way. The tests written for the project are a really good thing about it.

P.S. what about requirements.txt and README.md?

ligzer · 2021-12-21T16:46:21Z

P.S. what about requirements.txt and README.md?
They seem to be there already.

Mike Pro added 12 commits December 13, 2021 17:58

Naive solution

6ca1783

release appending 6-words with tm-symbol

93744a3

skip in-html javascript blocks

0786acb

Some headers are proxied too - to use cache

12321a2

advanced regexp for searching words

01d05ed

naive replacing urls

37658b0

hotfix for unicode symbols

5154f52

Add support for POST-requests

19d530e

hotfix in url replacements for https urls

8ee17b7

fix replacing urls

bd3be67

replace doubled code

5e7678d

hotfix for replacing urls in plain-text

0ee3a9f

flikos reviewed Dec 16, 2021

ligzer commented Dec 16, 2021

ligzer commented Dec 17, 2021

elsiniestra reviewed Dec 20, 2021


4 participants
