Open
576 commits
6e4362a
Merge pull request #165 from mediawiki-client-tools/dependabot/pip/re…
elsiehupp Aug 1, 2023
65719b7
CI: remove pre-commit and travis
yzqzss Aug 3, 2023
48cd660
dependencies: remove useless dependencies
yzqzss Aug 3, 2023
1804882
deprecate: wikia.py and wikkii-spider.py
yzqzss Aug 3, 2023
c9f7681
remove: wikiteam3.gui
yzqzss Aug 3, 2023
21fc305
feat&refactor: more robust convert URLs to prefix filename
yzqzss Aug 3, 2023
ef39373
# refactor: WIP 20230803
yzqzss Aug 3, 2023
7c0f08e
fix: standardize_url()
yzqzss Aug 4, 2023
6a49a93
WIP: refactor uploader
yzqzss Aug 4, 2023
d0b7db6
do not strip sha1s from XML
yzqzss Aug 4, 2023
074dd42
fix: Accidentally changed session.post to session.get in refactoring :(
yzqzss Aug 4, 2023
b283425
fix: config is not passed to log_error()
yzqzss Aug 4, 2023
94d1326
feat: ensure params["limit"] is int and >= 2
yzqzss Aug 4, 2023
9b1c2a5
fix: `parentid` should be a positive integer
yzqzss Aug 6, 2023
51eab0a
sort config.json
yzqzss Aug 6, 2023
65ce4a5
fix: `--xmlapiexport` infinite loop on page with missing revision
yzqzss Aug 7, 2023
76465d3
fix: truncated API response for "allrevisions" causes infinite loop
yzqzss Aug 9, 2023
57421f3
fix: yield bug in red_titles()
yzqzss Aug 9, 2023
790b9a7
update .gitattributes
yzqzss Aug 9, 2023
7987c33
feat: save timestamp metadata to images.txt
yzqzss Aug 11, 2023
1043567
fix: Unable to traverse `Special:Filelist`
yzqzss Aug 12, 2023
2b567a5
feat: cli: add `--image-timestamp-interval` to download images upload…
yzqzss Aug 12, 2023
75d04ac
drop: launcher.py
yzqzss Aug 12, 2023
62f0c56
deprecate: dumpgenerator script (use wikiteam3dumpgenerator instead)
yzqzss Aug 12, 2023
3b9a582
rename: parseLastPageChunk() -> parse_last_page_chunk()
yzqzss Aug 12, 2023
f581f32
dependencies: re-lock
yzqzss Aug 12, 2023
5646b79
name back to wikiteam3 !
yzqzss Aug 13, 2023
310339e
refactor: Delay
yzqzss Aug 16, 2023
73022e5
Update: README.md
yzqzss Aug 18, 2023
ff3b066
refactor: ...
yzqzss Aug 18, 2023
9b047cd
refactor: uploader
yzqzss Aug 18, 2023
9b410f7
zstd level => 17
yzqzss Aug 18, 2023
06ffdc0
re-lock: --xmlrevisions --curonly
yzqzss Aug 18, 2023
865ff20
bump 4.0.0
yzqzss Aug 18, 2023
241d8f4
Update test-dumogenerator.yml
yzqzss Aug 18, 2023
b36ae98
remove legacy code
yzqzss Aug 18, 2023
38b9059
fix: upload-state
yzqzss Aug 18, 2023
b735332
bump 4.0.1
yzqzss Aug 18, 2023
19f5901
fix: metadata: rights
yzqzss Aug 18, 2023
eb3c9c7
bump 4.0.2
yzqzss Aug 18, 2023
a414c8e
fix: metadata: sitename
yzqzss Aug 19, 2023
3e69702
bump 4.0.3
yzqzss Aug 19, 2023
a90f57d
fix: batch must be positive
yzqzss Aug 19, 2023
6aa3415
bump 4.0.4
yzqzss Aug 19, 2023
a9e5bf9
uploader: --dry-run
yzqzss Aug 19, 2023
4bb5f3d
fix: metadata base_url quoted
yzqzss Aug 19, 2023
95e0ce9
bump 4.0.5
yzqzss Aug 19, 2023
1c51ea1
zstd titles.txt & images.txt
yzqzss Aug 19, 2023
c177971
bump 4.0.6
yzqzss Aug 19, 2023
2142e81
bump 4.0.7
yzqzss Aug 19, 2023
0b638e4
uploader: download logo: retry
yzqzss Aug 19, 2023
2338909
bump 4.0.8
yzqzss Aug 19, 2023
ec8a840
fix: logo upload
yzqzss Aug 19, 2023
7673afc
bump 4.0.9
yzqzss Aug 19, 2023
c2b18bc
fix: Delay.__init__() got an unexpected keyword argument 'session'
yzqzss Aug 19, 2023
ba68b7a
bump 4.0.10
yzqzss Aug 19, 2023
0231c2c
Update: README.md
yzqzss Aug 19, 2023
bb0f0f2
ia_wbm_booster: fallfast on HTTP 429
yzqzss Aug 19, 2023
a484901
uploader: print item url
yzqzss Aug 19, 2023
983cfb2
bump 4.0.12
yzqzss Aug 19, 2023
8c549cf
include errors.log into IA item
yzqzss Aug 20, 2023
f8c51e7
bump 4.0.13
yzqzss Aug 20, 2023
d3f1414
feat: uploader: --parallel
yzqzss Aug 21, 2023
4bda5d9
feat: If there are archives on IA in the last year, exit
yzqzss Aug 21, 2023
ddc73cf
fix: can't find API URL from HTML
yzqzss Aug 21, 2023
8343aa3
feat: uploader: `--zstd-level`
yzqzss Aug 21, 2023
5a9a777
bump 4.0.15
yzqzss Aug 21, 2023
83666a6
fix: --upload only work with --resume
yzqzss Aug 22, 2023
55e79ea
bump 4.0.16
yzqzss Aug 22, 2023
36fc712
session: set Accept header
yzqzss Aug 22, 2023
b434e37
bump 4.0.17
yzqzss Aug 22, 2023
6475ffc
increase the default delay from 0.5 to 1.5
yzqzss Aug 24, 2023
fdc11f6
update User-Agent
yzqzss Aug 24, 2023
d1b5a8c
temporary comment out an unicode check
yzqzss Aug 24, 2023
47e2cbb
feat: a trick to get original file for fandom.com
yzqzss Aug 24, 2023
82ad61e
bump 4.0.18
yzqzss Aug 24, 2023
fced139
more refactoring
yzqzss Sep 12, 2023
1d3c6b3
fix: `getNamespacesAPI()` broken fallback
yzqzss Sep 12, 2023
d4795e2
feat: load config.json every time Delay is called
yzqzss Sep 12, 2023
67f3c7c
refactor: `read_titles()`
yzqzss Sep 12, 2023
1645a7f
feat: utils: intro: `underscore()` and `space()`
yzqzss Sep 12, 2023
d8745bc
fear&refactor: image
yzqzss Sep 12, 2023
3654492
refactor: checking images dir: reduce memory usage
yzqzss Sep 12, 2023
ce17a94
feat: tools: images_size.py
yzqzss Sep 12, 2023
32d6acc
feat: uploader: 20x 7z packaging speed
yzqzss Sep 12, 2023
fcc261e
bump 4.1.0
yzqzss Sep 12, 2023
0039463
doc: add [[Uploader usage]]
yzqzss Sep 12, 2023
fa4aa24
feat: image: set `mtime` for downloaded file
yzqzss Sep 12, 2023
561f7e6
bump 4.1.1
yzqzss Sep 12, 2023
d8633ac
fix: countdown of images decreases by 2 per image
yzqzss Sep 19, 2023
89b59db
(upstream) Fix typos
yzqzss Sep 19, 2023
29e79ae
util: intro: `sha1bytes()`
yzqzss Sep 19, 2023
5e0e12c
feat: images: also verify sha1 while downloading
yzqzss Sep 19, 2023
ba2c426
feat: images: print estimated size of all images
yzqzss Sep 19, 2023
9c2879f
minor refactoring
yzqzss Sep 19, 2023
bae4a78
feat: cli: `--index-check-threshold`
yzqzss Sep 19, 2023
45960e5
feat: uploader: checking IA S3 load average
yzqzss Sep 19, 2023
19f8dd6
bump 4.1.2
yzqzss Sep 19, 2023
f3e4abc
refactor: dump.misc
yzqzss Sep 19, 2023
765464d
chore: sort cli options
yzqzss Sep 23, 2023
f16a07b
feat: `--add-referer-header` to image requests
yzqzss Sep 23, 2023
8ac253f
privatize `--user-agent` option
yzqzss Sep 23, 2023
2129df3
fix: ValueError('apiurl or indexurl must be provided')
yzqzss Sep 25, 2023
6addce7
change: no longer to create -2, -3 folder for user when dump dir exists
yzqzss Sep 25, 2023
cde1e68
improve: index check
yzqzss Sep 25, 2023
1c80603
feat: show progress while using --xmlrevisions
yzqzss Sep 25, 2023
96aa2b9
dependencies: update&re-lock
yzqzss Sep 25, 2023
5dbae3a
bump 4.1.3
yzqzss Sep 25, 2023
7205a17
fix: apiurl.lower(): NoneType' object has no attribute 'lower'
yzqzss Sep 25, 2023
f32ff3c
bump v4.1.4
yzqzss Sep 25, 2023
305378f
fix: checkParameters(): add check: `--xml* require --xml`
yzqzss Sep 27, 2023
417beae
refactor: clean up logic
yzqzss Oct 19, 2023
1538a6e
fix: (fandom image): ValueError if size is `null`
yzqzss Oct 21, 2023
892ea09
change: `file` passed to FileSizeError/FileSha1Error no longer contai…
yzqzss Oct 21, 2023
11417c4
rm ISSUE_TEMPLATE
yzqzss Dec 18, 2023
4546e3a
fix: a line of obsolete code left over from wikiteam(py2) ?
yzqzss Dec 18, 2023
e81c6dc
fix: `rvlimit` become a float or less than 1 under certain circumstances
yzqzss Dec 18, 2023
46f64ce
do not strip sha1s from XML (page_xml_api)
yzqzss Dec 18, 2023
a4a837a
fix: Multiple slashes in api/index URL result in `AssertionError: pre…
yzqzss Dec 18, 2023
45c68cc
update README.md (really)
yzqzss Dec 18, 2023
1880c04
bump 4.1.5
yzqzss Dec 18, 2023
6dd0fc4
UTC timzone for all
yzqzss Dec 18, 2023
b1626f6
fix: handling of 403 Forbidden error in get_WikiEngine() function
yzqzss Jan 21, 2024
3cbe113
feat: better `FileSizeError` error message
yzqzss Jan 21, 2024
4866014
dependencies: update
yzqzss Jan 21, 2024
2920ca9
bump 4.1.6
yzqzss Jan 21, 2024
51048b4
fix&bump 4.1.7
yzqzss Jan 21, 2024
aafb3a8
Update README.md (typo)
yzqzss Jan 27, 2024
10bfe50
add badges
yzqzss Feb 11, 2024
3f609d2
feat: `--insecure` disable TLS verification more deeply.
yzqzss Feb 26, 2024
8d7134d
minor refactor & "fix" tests
yzqzss Feb 26, 2024
f70d510
dependencies: re-lock
yzqzss Feb 26, 2024
c38429c
bump 4.1.8
yzqzss Feb 26, 2024
3ee0c18
feat: uploader: implement `--bin-zstd`
yzqzss Mar 5, 2024
dc37bdc
feat: uploader: implement `--bin-7z`
yzqzss Mar 5, 2024
9375733
remove deprecated `dumpgenerator` script
yzqzss Mar 5, 2024
00db786
faet: uploader: always exit when IA is heavily overloaded
yzqzss Mar 5, 2024
613453f
dependencies: remove `httpx`
yzqzss Mar 5, 2024
4496543
bump 4.1.9
yzqzss Mar 5, 2024
02bb743
cli: remove `--disable-image-verify`
yzqzss Mar 5, 2024
07843c1
feat: save files with incorrect sha1 or size to the "images_mismatch"…
yzqzss Mar 5, 2024
2a4d731
feat: uploader: support "images_mismatch" dir
yzqzss Mar 5, 2024
f40dad4
bump 4.2.0
yzqzss Mar 5, 2024
6f83ef8
feat: assertions to check before actually downloading
yzqzss Mar 5, 2024
d948774
bump 4.2.1
yzqzss Mar 5, 2024
5ec5e19
typo
yzqzss Mar 5, 2024
5654502
CI: add python3.12
yzqzss Mar 5, 2024
e07eee1
workaround for the wrong "Content-Encoding" header (--images)
yzqzss Mar 7, 2024
6af6f6c
bump 4.2.2
yzqzss Mar 7, 2024
845315f
feat: uploader: `--rezstd` zstd server-side recompression
yzqzss Mar 8, 2024
17ca876
Update README.md
yzqzss Mar 8, 2024
b1954be
bump 4.2.3
yzqzss Mar 8, 2024
af1c419
feat: replace error characters with � (U+FFFD) if they are not too many
yzqzss Mar 8, 2024
34a96dd
bump 4.2.4
yzqzss Mar 8, 2024
d1612ba
generate README.md
yzqzss Mar 10, 2024
a1b2beb
images: add conditional printing based on sys.stdout.isatty()
yzqzss Mar 23, 2024
0923364
dependencies: re-lock
yzqzss Mar 23, 2024
14fa468
refactor: convert `other` into dataclass
yzqzss Apr 8, 2024
942477f
retry with different url params when `--bypass-cdn-image-compression`…
yzqzss Apr 8, 2024
ecc1b90
`--image-timestamp-interval` is production ready now
yzqzss Apr 8, 2024
01dd1aa
retry with `Accept-Encoding: identity` header when `ChunkedEncodingEr…
yzqzss Apr 8, 2024
ca667c4
bump 4.2.5
yzqzss Apr 8, 2024
73a709d
DROP `--stdout-log-file` OPTION
yzqzss Apr 16, 2024
11b021a
chore: add config comments
yzqzss Apr 16, 2024
5e655bd
refactor: load config dataclass natively instead of `new_config()`
yzqzss Apr 16, 2024
fd5a02a
handle `MWUnknownContentModelException` when using "--xmlrevisions" o…
yzqzss Apr 16, 2024
c0b10c6
chore: minor edit
yzqzss Apr 20, 2024
a8bba92
bump 4.2.6
yzqzss Apr 20, 2024
946d64c
chore: typo
yzqzss Apr 20, 2024
68393b1
chore: remove legacy uprint
yzqzss Jun 21, 2024
005493b
migrating to pdm
yzqzss Jun 21, 2024
38bb007
datetime.datetime.now(datetime.UTC)
yzqzss Jun 22, 2024
513071b
README: mention wikibot
yzqzss Jun 30, 2024
e73fc6f
chore: format code
yzqzss Jul 9, 2024
36897c6
chore: more type hints
yzqzss Jul 9, 2024
7da59d2
test: refactor `TestRegexsOffline`
yzqzss Jul 9, 2024
2b08c86
refactor: `getXMLPageCore()`
yzqzss Jul 9, 2024
3254eb3
feat: intro `PARAM_XML_LIMIT` env to adjust `&limit=` for `getXMLPage…
yzqzss Jul 9, 2024
ed8ea38
add `dryrun` option to `truncateXMLDump()`
yzqzss Jul 9, 2024
05959bf
chore: typo fix & warning for --bypass-cdn-image-compression
yzqzss Jul 11, 2024
5a91db1
chore: show --force tip if found exists IA item
yzqzss Jul 11, 2024
1cdbd9c
feat: increasemental xmldump (--xmlrevisions) PoC (#24)
yzqzss Jul 23, 2024
079109f
CI: py3.13.0-rc.1
yzqzss Aug 7, 2024
b821fb4
test: fix
yzqzss Aug 7, 2024
5ad4501
bump 4.3.0
yzqzss Aug 7, 2024
3dd7c12
fix: datetime.utc available from python3.11
yzqzss Aug 11, 2024
e385247
bump 4.3.1
yzqzss Aug 11, 2024
9fdae97
fix: uploader: incremental dump mark check
yzqzss Aug 11, 2024
61de739
bump 4.3.2
yzqzss Aug 11, 2024
6d5962b
dependencies: re-lock
yzqzss Sep 10, 2024
f5bff47
feat: update regex to handle new MediaWiki versions
yzqzss Sep 24, 2024
09e15e9
minor regexs refactor
yzqzss Sep 24, 2024
7c03084
bump 4.3.3
yzqzss Sep 24, 2024
94147d6
fix: uploader crash when resuming an upload after all files on disk a…
yzqzss Oct 4, 2024
6954968
bump 4.3.4
yzqzss Oct 4, 2024
4129537
feat: allow uploading into any collection (#29)
DigitalDwagon Oct 5, 2024
678573f
chore: move tests to `/tests`
yzqzss Oct 5, 2024
db2818c
bump 4.3.5
yzqzss Oct 5, 2024
ffed938
add a --hard-retries option (#30)
DigitalDwagon Oct 9, 2024
f0b234d
respect robots.txt
yzqzss Nov 17, 2024
628137f
minor refactor
yzqzss Nov 17, 2024
757d1c7
update wikibot link
yzqzss Nov 17, 2024
5039239
bump 4.3.6
yzqzss Nov 17, 2024
47ea40c
Revert "minor refactor"
yzqzss Nov 21, 2024
8a05488
bump 4.3.7
yzqzss Nov 21, 2024
f2a906c
mark --exnamespaces lack maintenanced
yzqzss Dec 4, 2024
03cd7af
fix: openssl ciphers
yzqzss Dec 4, 2024
2af8803
minor refactor: exnamespaces
yzqzss Dec 4, 2024
3343127
chore: naming "all" as `ALL_NAMESPACE_FLAG` and change internel magic…
yzqzss Dec 4, 2024
4ead08d
fix: title list be truncated if the page name is exactly `--END--`.
yzqzss Dec 4, 2024
f901972
feat: add option `--redirects` to dump page redirects map
yzqzss Dec 4, 2024
d545f80
doc: add DEV.md and update README.md
yzqzss Dec 4, 2024
c0dae4c
debug: print requests url and body
yzqzss Dec 4, 2024
9c425e5
minor refactor
yzqzss Dec 4, 2024
a6d68c8
bump 4.4.0
yzqzss Dec 4, 2024
1c91498
feat: better robots.txt parsing by using urlib
yzqzss Feb 15, 2025
5a3f020
doc: update
yzqzss Feb 15, 2025
b5eedfb
fix: exit on 403 in handle_StatusCode
yzqzss Mar 19, 2025
67c82a2
bump 4.4.1
yzqzss Mar 19, 2025
a906e40
check both HTTP and HTTPS when searching IA for wiki dumps
DigitalDwagon Apr 17, 2025
9b5ea1c
make linter happy
yzqzss Apr 27, 2025
1178d3b
Merge pull request #45 from DigitalDwagon/check-http-https
yzqzss Apr 27, 2025
4f5f126
CI: remove requirements.txt
yzqzss May 6, 2025
b28dadd
bump 4.4.2
yzqzss May 6, 2025
0244338
docs: add information about bot IPs
yzqzss May 7, 2025
b18f277
Fix link to IRC channel on hackint
riotbib May 10, 2025
299b00a
Merge pull request #46 from riotbib/fix-hackint-link
yzqzss May 10, 2025
8a6dd30
Update README.md
yzqzss May 10, 2025
6f5dce6
fix: `ValueError` occurs if `size` is `NULL` and `sha1` doesn't match
yzqzss Jul 30, 2025
d07c14e
pyproject: update `requires-python` to `>=3.9`
yzqzss Jul 30, 2025
695eeaa
bump 4.4.3
yzqzss Jul 30, 2025
7bbbfdc
CI: upgrade min `python-version` from 3.8 to 3.9
yzqzss Jul 30, 2025
c15dd5d
fix: Incomplete redirect dump (#50)
TripleCamera Aug 3, 2025
9846611
bump 4.4.4
yzqzss Aug 3, 2025
40be01b
feat: image: skip Fandom PNG when resuming
yzqzss Aug 23, 2025
5cb8bf0
test: add unit test for Image.get_image_names_API to handle filenames…
yzqzss Aug 23, 2025
f57d212
image: update skip logic to handle PNG, JPG, and JPEG files for Fandom
yzqzss Aug 27, 2025
9e2cc89
bump 4.4.5
yzqzss Aug 27, 2025
2d208e8
test: fix
yzqzss Aug 27, 2025
962423f
fix: `read_titles()` no longer returns `--END--` at last iteration
yzqzss Sep 20, 2025
e1733cb
Add --http-method (#59)
DigitalDwagon Nov 13, 2025
c34d58a
bump 4.4.6
yzqzss Nov 15, 2025
e1470ff
fixed: wrong arguments using in http basic auth for three years (#60)
sizau Nov 19, 2025
7b03b21
bump 4.4.7
yzqzss Dec 11, 2025
0b2135d
drop: `--exnamespaces` cli option
yzqzss Dec 24, 2025
3fda599
dependency: fix lxml build
yzqzss Dec 24, 2025
ea55b49
dependencies: update
yzqzss Dec 24, 2025
94de420
CI: python3.14
yzqzss Dec 24, 2025
06346a4
bump 4.4.8
yzqzss Dec 24, 2025
474efd0
Add Nix flake support for reproducible archiving
Feb 12, 2026
10 changes: 10 additions & 0 deletions .gitattributes
@@ -1,2 +1,12 @@
*.com linguist-vendored
*.org linguist-vendored

*.py text=auto
*.sh text=auto
*.json text=auto
*.txt text=auto
*.md text=auto

*.html linguist-detectable=false

wikiteam3/dumpgenerator/test/data/* linguist-vendored
34 changes: 34 additions & 0 deletions .github/workflows/nix.yml
@@ -0,0 +1,34 @@
name: Nix Build and Check

on:
  push:
    branches: [ main, v4-main ]
  pull_request:
    branches: [ main, v4-main ]

jobs:
  nix-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Nix
        uses: cachix/install-nix-action@v27
        with:
          extra_nix_config: |
            experimental-features = nix-command flakes

      - name: Check flake
        run: nix flake check

      - name: Build package
        run: nix build

      - name: Test wikiteam3dumpgenerator
        run: |
          nix run . -- --help

      - name: Verify reproducibility
        run: |
          nix build --rebuild
          nix path-info --json | jq -r '.[].narHash'
43 changes: 43 additions & 0 deletions .github/workflows/test-dumogenerator.yml
@@ -0,0 +1,43 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: dumogenerator test

on:
  push:
  pull_request:

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.9", "3.12", "3.14"]

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        sudo apt update
        sudo apt install -y libxml2-dev libxslt-dev # for lxml
        python -m pip install --upgrade pip
        python -m pip install flake8 pytest
        pip install .
    - name: Lint with flake8
      run: |
        # exit if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: run dumpgenerator
      run: |
        python -m wikiteam3.dumpgenerator -h
    - name: Test with pytest
      run: |
        pytest
12 changes: 9 additions & 3 deletions .gitignore
@@ -1,9 +1,15 @@
*.pyc
testing/*.pyc
testing/dumpgenerator.py
/.tox
.pytest_cache
keys.txt
batchdownload/keys.txt
batchdownload/dumpgenerator.py
batchdownload/uploader.py
__pycache__
tests/tmp
dist/
.DS_Store
desktop.ini

.venv
.vscode
.idea
1 change: 1 addition & 0 deletions .pdm-python
@@ -0,0 +1 @@
/home/yzqzss/git/wikiteam3/.venv/bin/python
8 changes: 0 additions & 8 deletions .travis.yml

This file was deleted.

89 changes: 89 additions & 0 deletions DEV.md
@@ -0,0 +1,89 @@
# WikiTeam3 internals

## Images.txt structure

```python
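NULL = "null"  # magic None value (see the list below)

# One line of images.txt: six tab-separated fields, newline-terminated.
line = (
    filename + "\t" + url + "\t" + uploader
    + "\t" + (str(size) if size else NULL)
    + "\t" + (str(sha1) if sha1 else NULL)
    + "\t" + (timestamp if timestamp else NULL)
    + "\n"
)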
```

Optional fields (`size`, `sha1`, `timestamp`) hold one of the following when no value is available:
- "null" (magic None value, since wikiteam3 v4.0.0)
- "False" (magic None value, before wikiteam3 v4.0.0)
- not present (ancient wikiteam3 versions)

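The layout above can be read back with a small parser. The following is a minimal sketch, not wikiteam3's own reader: `ImageRecord`, `NONE_MARKERS`, and `parse_images_line()` are hypothetical names introduced here, and it assumes every line carries at least the three mandatory fields.

```python
from typing import List, NamedTuple, Optional

# Placeholder strings that stand for "no value" in the optional fields.
NONE_MARKERS = {"null", "False"}


class ImageRecord(NamedTuple):
    """One parsed line of images.txt (hypothetical container type)."""
    filename: str
    url: str
    uploader: str
    size: Optional[int]
    sha1: Optional[str]
    timestamp: Optional[str]


def _optional(fields: List[str], index: int) -> Optional[str]:
    """Return fields[index], or None if it is absent or a placeholder."""
    if index >= len(fields):
        return None  # field not present (ancient versions)
    value = fields[index]
    return None if value in NONE_MARKERS else value


def parse_images_line(line: str) -> ImageRecord:
    """Split one tab-separated line into an ImageRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least filename, url and uploader")
    size = _optional(fields, 3)
    return ImageRecord(
        filename=fields[0],
        url=fields[1],
        uploader=fields[2],
        size=int(size) if size is not None else None,
        sha1=_optional(fields, 4),
        timestamp=_optional(fields, 5),
    )
```

Treating all three "no value" spellings the same way lets one reader handle dumps from any of the versions listed above.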
# Snippets

## API Output format

https://www.mediawiki.org/wiki/API:Data_formats#Output

> The standard and default output format in MediaWiki is JSON. All other formats are discouraged.
>
> The output format should always be specified using format=yourformat with yourformat being one of the following:
>
> json: JSON format. (recommended)
> php: serialized PHP format. (deprecated)
> xml: XML format. (deprecated)
> txt: PHP print_r() format. (removed in 1.27)
> dbg: PHP var_export() format. (removed in 1.27)
> yaml: YAML format. (removed in 1.27)
> wddx: WDDX format. (removed in 1.26)
> dump: PHP var_dump() format. (removed in 1.26)
> none: Returns a blank response. 1.21+

In practice, the `json` format is not available on some old wikis.

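A small sketch of how a client might always send an explicit `format=` and fall back when `json` is unavailable. The helper names `choose_format()` and `build_params()` are hypothetical, and the fallback order after `json` is an assumption rather than something the quoted documentation prescribes.

```python
from typing import Dict, Iterable

# Formats the quoted documentation still lists as usable, in the order this
# sketch prefers them. The order after "json" is an assumption.
PREFERRED_FORMATS = ("json", "xml", "php")


def choose_format(available: Iterable[str]) -> str:
    """Return the most preferred format that the wiki is known to support."""
    available_set = set(available)
    for fmt in PREFERRED_FORMATS:
        if fmt in available_set:
            return fmt
    raise ValueError("no usable output format among: %s" % sorted(available_set))


def build_params(action: str, fmt: str, **extra: str) -> Dict[str, str]:
    """Build API query parameters with the output format always explicit."""
    params = {"action": action, "format": fmt}
    params.update(extra)
    return params
```

For example, `build_params("query", choose_format(["xml", "php"]), list="allpages")` yields a request that asks for XML explicitly instead of relying on the server default.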
## Allpages

https://www.mediawiki.org/wiki/API:Allpages (>= 1.8)


## Allimages

https://www.mediawiki.org/wiki/API:Allimages (>= 1.13)

## Redirects

https://www.mediawiki.org/wiki/Manual:Redirect_table

## Logs

https://www.mediawiki.org/wiki/Manual:Logging_table

## Continuation

https://www.mediawiki.org/wiki/API:Continue (≥ 1.26)
https://www.mediawiki.org/wiki/API:Raw_query_continue (≥ 1.9)

> From MediaWiki 1.21 to 1.25, it was required to specify continue= (i.e. with an empty string as the value) in the initial request to get continuation data in the format described above. Without doing that, API results would indicate there is additional data by returning a query-continue element, explained in Raw query continue.
> Prior to 1.21, that raw continuation (`query-continue`) was the only option.
>
> If your application needs to use the raw continuation in MediaWiki 1.26 or later, you must specify rawcontinue= to request it.

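The two continuation styles quoted above can be handled by one helper that works on the already-decoded JSON response. This is a minimal sketch under stated assumptions: `next_params()` is a hypothetical name, not a wikiteam3 function, and the raw branch simply merges the values of every module, which is a simplification for queries that combine several modules.

```python
from typing import Any, Dict, Optional


def next_params(params: Dict[str, Any],
                response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the parameters for the next request, or None when done."""
    merged = dict(params)  # leave the caller's dict untouched
    if "continue" in response:
        # New-style continuation: copy every key back into the request.
        merged.update(response["continue"])
        return merged
    if "query-continue" in response:
        # Raw continuation: values are grouped per module name.
        for module_values in response["query-continue"].values():
            merged.update(module_values)
        return merged
    return None  # no continuation data: the result set is complete
```

A caller would loop, sending `params`, decoding the reply, and replacing `params` with the return value until it is `None`.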
# Workarounds

## Truncated API response causes infinite loop

https://github.com/mediawiki-client-tools/mediawiki-dump-generator/issues/166
https://phabricator.wikimedia.org/T86611

wikiteam3 workaround: https://github.com/saveweb/wikiteam3/commit/76465d34898b80e8c0eb6d9652aa8efa403a7ce7

## MWUnknownContentModelException

> "The content model xxxxxx is not registered on this wiki;"

Some extensions use custom content models for their own purposes but do not register a handler for exporting that content.

wikiteam3 workaround: https://github.com/saveweb/wikiteam3/commit/fd5a02a649dcf3bdab7ac1268445b0550130e6ee

## Insecure SSL

https://docs.openssl.org/1.1.1/man1/ciphers/
https://docs.openssl.org/master/man1/openssl-ciphers/

wikiteam3 workaround: https://github.com/saveweb/wikiteam3/blob/8a054882de19c6b69bc03798d3044b7b5c4c3c88/wikiteam3/utils/monkey_patch.py#L63-L84
Binary file added MediaWikiArchive.png