I've found that the notubiz API has several 'hidden' agenda items and documents which are currently not being scraped, resulting in a large difference between the documents actually available on the municipalities' sites and those on ORI.
I have found two main issues, both related to the properties of agenda items. Currently, only the .documents[] of a meeting's agenda item (.agenda_items[]) are parsed. However, there are two more properties which are interesting:
.module_items[]: A module item in itself does not look interesting (see below). But after fetching this item via its .self property we find that a module item can have several documents attached to it. (see point 6 on municipality website)
(see 7th agenda item: https://api.notubiz.nl/events/meetings/1152031?format=json&version=1.17.0)
.agenda_items[]: An agenda item can itself contain several more agenda items, which (again) do not look interesting at first (see below), but when fetched directly they can of course contain documents again (and even more agenda items or module items). (they have a special suffix on the municipality website)
(see 14th agenda item: https://api.notubiz.nl/events/meetings/1161553?format=json&version=1.17.0)
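The traversal described in the two points above could be sketched roughly as follows. This is only a minimal sketch, not the ORI implementation: `iter_documents` and the injected `fetch` callable are hypothetical names, and it assumes that fetching a `.self` URL returns an object with the same `documents`, `module_items` and `agenda_items` keys at the top level (the real response may be wrapped differently).

```python
from typing import Any, Callable, Dict, Iterator, Optional, Set

Json = Dict[str, Any]
Fetch = Callable[[str], Json]  # hypothetical: takes a URL, returns parsed JSON


def iter_documents(
    item: Json, fetch: Fetch, seen: Optional[Set[str]] = None
) -> Iterator[Json]:
    """Yield the documents on `item` and on everything reachable from it."""
    if seen is None:
        seen = set()

    # Documents attached directly to this item (the only part parsed today).
    yield from item.get("documents", [])

    # Nested module items and agenda items are only stubs in the meeting
    # response, so each one is fetched in full through its `self` URL.
    # ASSUMPTION: the fetched object exposes the same three keys.
    for key in ("module_items", "agenda_items"):
        for stub in item.get(key, []):
            url = stub.get("self")
            if not url or url in seen:
                continue  # nothing to follow, or already visited
            seen.add(url)
            yield from iter_documents(fetch(url), fetch, seen)
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP client, and the `seen` set guards against fetching the same item twice if items reference each other.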
Some examples of missing documents
(note that my simple scraper is also missing some documents atm, but has better coverage for the notubiz api)
Breda
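| year | scraped_from_notubiz | scraped_from_ori | in_notubiz_not_in_ori | in_ori_not_in_notubiz |
| --- | --- | --- | --- | --- |
| 2014 | 0 | 126 | 0 | 126 |
| 2015 | 0 | 578 | 0 | 578 |
| 2016 | 3740 | 1000 | 2856 | 116 |
| 2017 | 1745 | 1000 | 933 | 188 |
| 2018 | 431 | 243 | 198 | 10 |
| 2019 | 1915 | 218 | 1698 | 1 |
| 2020 | 227 | 157 | 72 | 2 |
| 2021 | 193 | 184 | 10 | 1 |
| 2022 | 220 | 186 | 37 | 3 |
| 2023 | 240 | 221 | 22 | 3 |
| 2024 | 206 | 155 | 56 | 5 |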
Waddinxveen
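| year | scraped_from_notubiz | scraped_from_ori | in_notubiz_not_in_ori | in_ori_not_in_notubiz |
| --- | --- | --- | --- | --- |
| 2014 | 0 | 964 | 0 | 964 |
| 2015 | 0 | 1000 | 0 | 1000 |
| 2016 | 5166 | 1000 | 4391 | 225 |
| 2017 | 2362 | 0 | 2362 | 0 |
| 2018 | 1514 | 988 | 601 | 75 |
| 2019 | 1603 | 993 | 647 | 37 |
| 2020 | 1593 | 469 | 1125 | 1 |
| 2021 | 1544 | 695 | 926 | 77 |
| 2022 | 1636 | 1000 | 662 | 26 |
| 2023 | 1523 | 1000 | 554 | 31 |
| 2024 | 1120 | 698 | 459 | 37 |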
Bunschoten
Enkhuizen
IJsselstein
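For reference, the four columns in the per-municipality tables are consistent with per-year set sizes and set differences (e.g. Breda 2016: 3740 - 2856 = 1000 - 116). A minimal sketch of how such a comparison could be computed is below. It is not the scraper actually used for the numbers above: `compare_by_year` is a hypothetical helper, and it assumes both sources can be reduced to `(year, document identifier)` pairs that share the same identifier.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Doc = Tuple[int, str]  # ASSUMPTION: (year, identifier shared by both sources)


def compare_by_year(
    notubiz: Iterable[Doc], ori: Iterable[Doc]
) -> Dict[int, Dict[str, int]]:
    """Count, per year, the documents in each source and in only one of them."""
    n_by_year: Dict[int, Set[str]] = defaultdict(set)
    o_by_year: Dict[int, Set[str]] = defaultdict(set)
    for year, doc_id in notubiz:
        n_by_year[year].add(doc_id)
    for year, doc_id in ori:
        o_by_year[year].add(doc_id)

    rows: Dict[int, Dict[str, int]] = {}
    for year in sorted(set(n_by_year) | set(o_by_year)):
        n, o = n_by_year[year], o_by_year[year]
        rows[year] = {
            "scraped_from_notubiz": len(n),
            "scraped_from_ori": len(o),
            "in_notubiz_not_in_ori": len(n - o),  # set difference
            "in_ori_not_in_notubiz": len(o - n),
        }
    return rows
```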
BluntKatana changed the title from "fix: missing documents from several notubiz municipalities" to "fix(notubiz): missing documents from several municipalities" on Feb 12, 2025