fix(notubiz): missing documents from several municipalities #513

BluntKatana · 2025-02-12T11:48:16Z

Problem

The Notubiz API contains several 'hidden' agenda items and documents that are currently not being scraped. This results in a large difference between the documents actually available on the municipalities' sites and those available on ORI.

I have found two main issues, both related to the properties of agenda items. Currently, an agenda item from a meeting (.agenda_items[]) is parsed only for its .documents[]. However, there are two more properties of interest:

.module_items[]: A module item does not look interesting by itself (see below), but after fetching the item via its .self property we find that a module item can contain several documents.
(see point 6 on municipality website)
(see 7th agenda item: https://api.notubiz.nl/events/meetings/1152031?format=json&version=1.17.0)

.agenda_items[]: An agenda item can itself contain further agenda items, which (again) do not look interesting at first (see below). When fetched directly, however, they can of course contain documents again (and even more agenda items or module items).
(they have a special suffix on the municipality website)
(see 14th agenda item: https://api.notubiz.nl/events/meetings/1161553?format=json&version=1.17.0)
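A minimal sketch of how both properties could be followed recursively. It is not the existing ORI extractor. The function name `collect_documents` and the injected `fetch` callable are hypothetical, and the sketch assumes that a fetched module item or agenda item exposes `documents`, `module_items` and `agenda_items` at its top level and that nested agenda item stubs also carry a `self` URL. The real response shape may differ (for example, the payload may be wrapped in another object).

```python
from typing import Any, Callable, Dict, List, Optional, Set

JsonDict = Dict[str, Any]
Fetch = Callable[[str], JsonDict]  # hypothetical: takes a `self` URL, returns parsed JSON


def collect_documents(
    item: JsonDict, fetch: Fetch, seen: Optional[Set[str]] = None
) -> List[JsonDict]:
    """Collect documents from an agenda item, its module items and nested agenda items."""
    if seen is None:
        seen = set()

    # Documents attached directly to the item (the only part parsed today).
    documents = list(item.get("documents") or [])

    # The entries in these lists are stubs, so each one is fetched in full
    # through its `self` URL and then traversed in the same way.
    for key in ("module_items", "agenda_items"):
        for stub in item.get(key) or []:
            url = stub.get("self")
            if not url or url in seen:
                continue  # skip stubs without a URL and anything already visited
            seen.add(url)
            documents.extend(collect_documents(fetch(url), fetch, seen))

    return documents
```

Passing `fetch` in as an argument keeps the traversal independent of the HTTP client, and the `seen` set guards against fetching the same URL twice if items reference each other.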

Some examples of missing documents

(Note that my simple scraper is also missing some documents at the moment, but it has better coverage of the Notubiz API.)
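The comparison columns in the tables below are consistent with plain set differences: for Breda in 2016, 3740 - 2856 = 1000 - 116 = 884 documents appear in both sources. A rough sketch of that computation follows. The function name `compare_coverage` is hypothetical, and the sketch assumes documents from both sources can be matched on a shared identifier, which this issue does not specify.

```python
from typing import Dict, Set


def compare_coverage(notubiz_ids: Set[str], ori_ids: Set[str]) -> Dict[str, int]:
    """Return the four table columns for one municipality and year."""
    return {
        "scraped_from_notubiz": len(notubiz_ids),
        "scraped_from_ori": len(ori_ids),
        # Documents the Notubiz scraper found that ORI is missing.
        "in_notubiz_not_in_ori": len(notubiz_ids - ori_ids),
        # Documents ORI has that the Notubiz scraper did not find.
        "in_ori_not_in_notubiz": len(ori_ids - notubiz_ids),
    }
```

For example, `compare_coverage({"a", "b", "c"}, {"b", "d"})` reports 3 and 2 scraped documents, with 2 found only in Notubiz and 1 found only in ORI.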

Breda

year	scraped_from_notubiz	scraped_from_ori	in_notubiz_not_in_ori	in_ori_not_in_notubiz
2014	0	126	0	126
2015	0	578	0	578
2016	3740	1000	2856	116
2017	1745	1000	933	188
2018	431	243	198	10
2019	1915	218	1698	1
2020	227	157	72	2
2021	193	184	10	1
2022	220	186	37	3
2023	240	221	22	3
2024	206	155	56	5

Waddinxveen

year	scraped_from_notubiz	scraped_from_ori	in_notubiz_not_in_ori	in_ori_not_in_notubiz
2014	0	964	0	964
2015	0	1000	0	1000
2016	5166	1000	4391	225
2017	2362	0	2362	0
2018	1514	988	601	75
2019	1603	993	647	37
2020	1593	469	1125	1
2021	1544	695	926	77
2022	1636	1000	662	26
2023	1523	1000	554	31
2024	1120	698	459	37

Bunschoten

Enkhuizen

IJsselstein


BluntKatana added the bug label (High priority issue for (blocking) problems) on Feb 12, 2025

BluntKatana changed the title ~~fix: missing documents from several notubiz municipalities~~ fix(notubiz): missing documents from several municipalities Feb 12, 2025
