Some servers force gzip compression on their content, even when the client did not ask for it. HtmlFetcher does not handle this gracefully, because urllib2 does not decompress response bodies and hands back the raw gzipped bytes. The cheapest/easiest solution would be to check the Content-Encoding header on the response and decompress with zlib if it's gzip. A more ambitious/heavier solution would be to move from urllib2 to something like requests, which decompresses gzip responses transparently.
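
A rough sketch of the cheap fix, assuming the fetcher has the raw response body and the value of the Content-Encoding header in hand. The names `decode_body` and `gzip_bytes` are made up for illustration and are not part of HtmlFetcher; `gzip_bytes` exists only to build test input.

```python
import gzip
import io
import zlib


def decode_body(body, content_encoding):
    """Return body decompressed if the Content-Encoding value says gzip.

    Hypothetical helper, not existing HtmlFetcher code.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        # rather than a bare zlib stream.
        return zlib.decompress(body, 16 + zlib.MAX_WBITS)
    # Anything else (missing header, identity, unknown) is passed through.
    return body


def gzip_bytes(data):
    """Demo-only helper: gzip-compress data in memory."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as f:
        f.write(data)
    return buf.getvalue()


if __name__ == "__main__":
    raw = b"<html>hello</html>"
    print(decode_body(gzip_bytes(raw), "gzip") == raw)
    print(decode_body(raw, None) == raw)
```

With urllib2 this would presumably be wired in as something like `decode_body(resp.read(), resp.info().get("Content-Encoding"))`, where `resp` is the object returned by `urllib2.urlopen`. The sketch only covers gzip; deflate and other encodings are passed through untouched.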