@michaelos443
Owner

Description

This PR adds a new extractor for Wikipedia that allows users to extract and save article content in Markdown format. This is useful for offline reading, research, or creating local archives of Wikipedia articles.

Features

  • Extract text content from Wikipedia articles
  • Save content as Markdown files
  • Preserve headings and paragraph structure
  • Clean up unwanted elements (references, edit links, etc.)
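
As a rough illustration of the features above, the sketch below converts a fragment of article HTML to Markdown while keeping headings and paragraphs and dropping edit links and reference markers. It is not the code from this PR: the PR adds BeautifulSoup4, whereas this sketch uses the standard-library `html.parser` so it runs without extra dependencies. The names `MarkdownSketch` and `html_to_markdown`, and the sets of skipped tags and classes, are assumptions made for the example.

```python
from html.parser import HTMLParser

# Assumed clean-up rules, for illustration only.
SKIP_TAGS = {"sup", "style", "script", "table"}
SKIP_CLASSES = {"mw-editsection", "reference"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
HEADINGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "####"}


class MarkdownSketch(HTMLParser):
    """Collect headings and paragraphs as Markdown blocks (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []       # finished Markdown blocks
        self.buffer = []       # text pieces of the block being collected
        self.prefix = None     # "" for a paragraph, "## " etc. for a heading
        self.skip_tag = None   # tag name of the skipped element we are inside
        self.skip_depth = 0    # nesting depth of that tag name while skipping

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if tag in SKIP_TAGS or classes & SKIP_CLASSES:
            self.skip_tag, self.skip_depth = tag, 1
        elif tag in HEADINGS:
            self.prefix, self.buffer = HEADINGS[tag] + " ", []
        elif tag == "p":
            self.prefix, self.buffer = "", []

    def handle_endtag(self, tag):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_tag = None
            return
        if self.prefix is not None and (tag in HEADINGS or tag == "p"):
            # Collapse runs of whitespace before emitting the block.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        if self.skip_tag is None and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html):
    """Return Markdown text with one blank line between blocks."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks) + "\n"
```

Feeding it a heading that contains an edit link and a paragraph that contains a reference marker should yield only the heading text and the paragraph text. A real extractor would also need to handle lists, links, and tables, which this sketch ignores.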

Changes

  • Added new wikipedia.py extractor module
  • Updated SITES dictionary in common.py
  • Updated extractors/__init__.py to include the new module
  • Added BeautifulSoup4 as a dependency in requirements.txt and setup.py
  • Updated README.md with documentation for the new feature
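
The SITES change can be pictured roughly as below. This is a simplified sketch, not the contents of common.py: it assumes SITES maps a domain label to an extractor module name, and both the `"wikipedia"` entry and the helper `module_name_for` are hypothetical names chosen for the example.

```python
import re

# Hypothetical excerpt; the real dictionary in common.py has many more entries.
SITES = {
    "wikipedia": "wikipedia",  # assumed key and module name for the new extractor
}


def module_name_for(url):
    """Return the extractor module name registered for the URL's site, or None."""
    host = re.match(r"https?://([^/]+)/", url)
    if host is None:
        return None
    # Use the second-level domain label as the lookup key, e.g. "wikipedia".
    labels = host.group(1).split(".")
    key = labels[-2] if len(labels) >= 2 else labels[0]
    return SITES.get(key)
```

Under these assumptions, `module_name_for("https://en.wikipedia.org/wiki/Free_software")` resolves to the `wikipedia` module name regardless of the language subdomain.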

Example Usage

$ you-get https://en.wikipedia.org/wiki/Free_software
Site:       Wikipedia
Title:      Free software
Type:       Markdown
Size:       0.12 MiB (123456 Bytes)

Wikipedia article saved to: Free software.md

Pull Request opened by Augment Code with guidance from the PR author

@michaelos443 michaelos443 self-assigned this Oct 2, 2025
@michaelos443
Owner Author

augment review
