Wikipedia Corpus Builder is a toolkit for creating clean corpora (i.e. with most of the content that is usually of little use for NLP and IR tasks removed) from database snapshots of MediaWiki-powered wikis.
It is currently being reworked to make it more usable for the public.
Documentation is emerging at http://moin.delph-in.net/WcbTop .