Hi @stephenou,
Thanks again for your ongoing development and support of this project! While #19 and #18 aim to fix this issue, I would like to reiterate why it is so important that every page not on the whitelist is served with a 404 or a noindex header.
Yesterday, I saw that Google had indexed 374 pages on my domain under the label "Indexed, not submitted in sitemap". Here's a screenshot with examples:

While I am not sure about the legal implications, it would certainly be nice to have only my own pages in the index.
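
To make the idea concrete, here is a rough sketch of the decision I have in mind. It is not based on the actual worker code: `SLUG_TO_PAGE`, `decideIndexing`, and the `strict` flag are hypothetical names I made up for illustration, and the page IDs are placeholders. The only real-world piece is the `X-Robots-Tag: noindex` response header, which Google honors for excluding a URL from its index.

```typescript
// Hypothetical whitelist: slug -> page ID. Names and IDs are placeholders,
// not taken from the project's worker script.
const SLUG_TO_PAGE: Record<string, string> = {
  "": "home-page-id",
  "about": "about-page-id",
};

interface IndexingDecision {
  status: number;
  headers: Record<string, string>;
}

// Decide how a request path should be answered.
// strict = true  -> non-whitelisted paths get a 404
// strict = false -> non-whitelisted paths are served, but marked noindex
function decideIndexing(pathname: string, strict: boolean): IndexingDecision {
  // Normalize "/about/" and "/about" to the same slug; "/" becomes "".
  const slug = pathname.replace(/^\/+|\/+$/g, "");

  if (Object.prototype.hasOwnProperty.call(SLUG_TO_PAGE, slug)) {
    // Whitelisted page: serve normally and let it be indexed.
    return { status: 200, headers: {} };
  }
  if (strict) {
    return { status: 404, headers: {} };
  }
  return { status: 200, headers: { "X-Robots-Tag": "noindex" } };
}
```

In the worker, the returned status and headers would be applied to the response before it is sent. The noindex variant keeps existing links to non-whitelisted pages working while still getting them dropped from the index, whereas the 404 variant is stricter.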
I am happy to contribute to the solution. Let me know how I can best help.