
Doesn't crawl the webpage #1

@madhuradlakha

Description


Changed line 16 to:

if 'href' in getattr(link, 'attrs', {}):

because the original line raised:

AttributeError: 'Doctype' object has no attribute 'has_attr'
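Some background on that error: when you iterate over a parsed BeautifulSoup document node by node, you get Tag objects and also non-tag nodes such as the Doctype. Those non-tag nodes have no has_attr method. The getattr workaround stops the crash. In BeautifulSoup, a more direct option is soup.find_all('a', href=True), which returns only <a> tags that carry an href.

To illustrate the same idea without third-party dependencies, the sketch below uses Python's built-in html.parser. The names LinkCollector and extract_hrefs are hypothetical and do not come from the project's code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags only.

    The <!DOCTYPE ...> declaration is routed to handle_decl(), and comments
    and text go to their own handlers, so handle_starttag() only ever sees
    real tags: no guard against non-tag nodes is needed here.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, where value is None for bare attributes.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:  # skip missing or empty hrefs
                self.hrefs.append(value)


def extract_hrefs(html: str) -> list:
    """Return the href of every <a> tag in the given HTML, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

For example, extract_hrefs('<!DOCTYPE html><a href="/a">x</a><a>no link</a>') returns ['/a']. The doctype and the <a> without an href are skipped, and nothing raises.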

It also emits the following UserWarning:

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "lxml")

markup_type=markup_type))
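The warning itself is harmless. Its point is that when no parser is named, BeautifulSoup picks the best one installed, so the same script can parse differently on another machine. You can silence it by passing the parser name explicitly as the second argument, for example BeautifulSoup(markup, "html.parser"). "html.parser" is backed by Python's standard library and is always available, while "lxml" requires the lxml package.

The helper below is a hypothetical sketch of one way to make that choice visible in code. The name choose_parser and the prefer_lxml flag are illustrative assumptions, not part of BeautifulSoup.

```python
import importlib.util


def choose_parser(prefer_lxml: bool = False) -> str:
    """Return an explicit parser name to pass as BeautifulSoup's second argument.

    "html.parser" uses Python's built-in parser, so it behaves the same on
    every system. "lxml" is returned only when explicitly requested AND the
    lxml package is importable; note that this option can still vary by machine.
    """
    if prefer_lxml and importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"
```

Usage: soup = BeautifulSoup(html, choose_parser()). Defaulting to "html.parser" trades some speed for identical behavior everywhere, which is what the warning is nudging toward.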

Now it doesn't crawl the page at all; it just prints:

Finished!
0
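The issue doesn't include the crawler's code, so the cause of the 0 can't be confirmed from here. One thing worth checking, purely as an assumption, is how the collected hrefs become URLs to crawl. Relative links such as /about or page.html are not valid request targets until they are resolved against the page URL. A filter that accepts only absolute links on the start domain could therefore discard every link on the page.

The sketch below shows one way to resolve and filter hrefs with urllib.parse. The function name resolve_links and its same_host parameter are hypothetical.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def resolve_links(base_url: str, hrefs: list, same_host: bool = True) -> list:
    """Turn raw href values into absolute, de-duplicated http(s) URLs.

    Relative hrefs are resolved against base_url, #fragments are dropped,
    non-http schemes (mailto:, javascript:, ...) are skipped, and with
    same_host=True only links on base_url's host are kept.
    """
    base_host = urlparse(base_url).netloc
    seen = set()
    result = []
    for href in hrefs:
        # urljoin handles "/a", "b.html", "../c" and leaves absolute URLs alone.
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue
        if same_host and parsed.netloc != base_host:
            continue
        if absolute not in seen:  # keep first occurrence, preserve order
            seen.add(absolute)
            result.append(absolute)
    return result
```

For example, with base URL https://example.com/dir/page.html, the href /a becomes https://example.com/a and b.html becomes https://example.com/dir/b.html, while mailto: links and links to other hosts are dropped. If this returns URLs but the crawler still reports 0, the problem is more likely in the fetch or queueing step than in link extraction.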
