[draft] import_page: support file scheme and use bs4 to workaround missing 'body' element #456

Open: wants to merge 6 commits into base: master


@jirib jirib commented Feb 17, 2025

Support file scheme and use bs4 to workaround missing 'body' element
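For illustration, one way a page importer could accept `file://` URLs alongside `http(s)://` is to let `urllib` dispatch on the scheme. This is only a minimal sketch, not the code from this PR; `fetch_page` and `ALLOWED_SCHEMES` are hypothetical names.

```python
import pathlib
import tempfile
import urllib.request
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https", "file"}  # assumption: schemes the importer accepts


def fetch_page(url: str) -> bytes:
    """Return the raw bytes behind an http(s):// or file:// URL (hypothetical helper)."""
    scheme = urlparse(url).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    # urllib.request.urlopen handles file:// URLs through its built-in FileHandler.
    with urllib.request.urlopen(url) as response:
        return response.read()


# Usage: write a small HTML file and read it back through a file:// URL.
with tempfile.TemporaryDirectory() as tmp:
    page = pathlib.Path(tmp) / "page.html"
    page.write_text("<p>No body element here</p>", encoding="utf-8")
    data = fetch_page(page.as_uri())
```

Checking the scheme explicitly keeps unexpected schemes (for example `ftp`) from being fetched by accident.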

@felixfontein felixfontein changed the title Support file scheme and use bs4 to workaround missing 'body' element import_page: support file scheme and use bs4 to workaround missing 'body' element Feb 17, 2025
document = doc_template.format(
    title=title,
    slug=slug,
    content=node.prettify()
Author

Should we leave the HTML as it is? If the selected element is an 'article', should I extract only its contents? (Most likely the website template already has an 'article' element.)

Author

@jirib jirib Feb 18, 2025

I tried to add functionality for this; see ab15bed.

Member

I think we should remove the wrapper element (e.g. <article>) by default.
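A minimal sketch of what stripping the wrapper could look like with BeautifulSoup, assuming the selected node is a bs4 `Tag`. `Tag.decode_contents()` renders only the children, so the enclosing element is dropped. The helper name `strip_wrapper` and the sample HTML are made up for illustration and are not taken from the PR.

```python
from bs4 import BeautifulSoup


def strip_wrapper(html: str, selector: str = "article") -> str:
    """Return the inner HTML of the first element matching `selector` (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(selector)
    if node is None:
        # Nothing matched (e.g. a fragment without the wrapper): keep the input as-is.
        return html
    # decode_contents() renders the children only, dropping the wrapper tag itself.
    return node.decode_contents()


sample = "<article><h1>Title</h1><p>Text</p></article>"
inner = strip_wrapper(sample)
```

Here `inner` holds the heading and paragraph without the surrounding `<article>` tag, which is the default behaviour suggested above.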

@jirib jirib force-pushed the support_file_scheme branch from 8c15716 to ab15bed Compare February 18, 2025 00:43
@jirib jirib changed the title import_page: support file scheme and use bs4 to workaround missing 'body' element [draft] import_page: support file scheme and use bs4 to workaround missing 'body' element Feb 18, 2025
Comment on lines +70 to +77
while args:
    arg = args.pop(0)
    if arg == "-s" and args:
        selector = args.pop(0)
    elif arg == "-e" and args:
        extractor = args.pop(0)
    else:
        urls.append(arg)  # Assume it's a page URL
Member

You don’t need to parse args yourself; use the built-in option support in doit. See just about any command plugin for an example.
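As a rough sketch of that suggestion: Nikola command plugins declare their options as a list of dicts in `cmd_options` (the format doit uses) and receive the parsed values in `_execute(self, options, args)`. The block below mirrors that shape without importing Nikola so that it runs standalone. The option names, defaults, and help texts are assumptions based on the `-s`/`-e` flags in the diff.

```python
# In a real plugin these would be attributes of a nikola.plugin_categories.Command
# subclass; they are kept at module level here so the sketch runs without Nikola.
cmd_options = [
    {
        "name": "selector",
        "short": "s",
        "long": "selector",
        "type": str,
        "default": "body",  # assumption: taken from the comment in the diff
        "help": "CSS selector for the element to import",
    },
    {
        "name": "extractor",
        "short": "e",
        "long": "extractor",
        "type": str,
        "default": "",
        "help": "Python expression used to extract content from the node",
    },
]


def _execute(options, args):
    """Stand-in for Command._execute: doit passes parsed options and positional args."""
    selector = options["selector"]
    extractor = options["extractor"] or None
    urls = list(args)  # remaining positional arguments are the page URLs
    return selector, extractor, urls


# Simulate what doit would hand over for: nikola import_page -s article URL
defaults = {opt["name"]: opt["default"] for opt in cmd_options}
result = _execute({**defaults, "selector": "article"}, ["https://example.com/post"])
```

With this shape the manual `while args:` loop and the `sys.argv` handling become unnecessary, since doit fills `options` and leaves the URLs in `args`.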

Comment on lines +43 to +46
args = sys.argv[1:]
selector = None # 'body'
extractor = None # 'lambda node: BeautifulSoup(node.decode_contents(), "html.parser").prettify()'
urls = []
Member

Is this used?


doc_template = '''<!--
.. title: {title}
.. slug: {slug}
Member

Consider also adding date (defaulting to now is fine).
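One possible shape for this, as a sketch only: add a `date` field to the template and default it to the current UTC time. The tail of the template (the closing `-->` and the `{content}` placeholder) is not visible in the diff excerpt, so it is assumed here, as are the `render` helper and the exact date format.

```python
import datetime

# Assumption: the template's tail (closing "-->" and "{content}") is not visible in
# the diff excerpt, so this is a guess at its overall shape.
doc_template = '''<!--
.. title: {title}
.. slug: {slug}
.. date: {date}
-->

{content}
'''


def render(title, slug, content, date=None):
    """Fill the template, defaulting the date to the current UTC time."""
    if date is None:
        date = datetime.datetime.now(datetime.timezone.utc)
    return doc_template.format(
        title=title,
        slug=slug,
        date=date.strftime("%Y-%m-%d %H:%M:%S UTC"),
        content=content,
    )


# Usage with a fixed timestamp so the result is reproducible.
fixed = datetime.datetime(2025, 2, 18, 0, 43, tzinfo=datetime.timezone.utc)
document = render("Hello", "hello", "<p>Hi</p>", date=fixed)
```

Passing `date` explicitly keeps the function testable, while callers that omit it get "now" as suggested.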


2 participants