It seems that a compromise between quality and speed will have to be reached. The next step may be to combine the TF-IDF features for domain and subdomain. Also check the sparsity of both matrices and whether TruncatedSVD is even suitable for them.
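
A minimal sketch of these next steps, assuming scikit-learn and SciPy are available. The toy `domains` and `subdomains` lists, the character n-gram settings, and the `sparsity` helper are all hypothetical stand-ins, since the real data and vectorizer settings are not shown here. It stacks the two TF-IDF matrices column-wise, reports the fraction of zero entries in each, and runs TruncatedSVD on the combined matrix so the retained variance can be inspected.

```python
from scipy.sparse import hstack
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data; replace with the real domain and subdomain columns.
domains = ["example.com", "example.org", "shop.example.com", "news.site.net"]
subdomains = ["www", "mail", "shop", "news"]


def sparsity(matrix):
    """Fraction of entries not stored as non-zeros in a SciPy sparse matrix."""
    n_rows, n_cols = matrix.shape
    return 1.0 - matrix.nnz / (n_rows * n_cols)


# Character n-grams are an assumption; use whatever settings the existing
# vectorizers already have.
domain_tfidf = TfidfVectorizer(analyzer="char", ngram_range=(2, 3)).fit_transform(domains)
subdomain_tfidf = TfidfVectorizer(analyzer="char", ngram_range=(2, 3)).fit_transform(subdomains)

# Combine the two feature sets side by side (same rows, more columns).
combined = hstack([domain_tfidf, subdomain_tfidf]).tocsr()

print(f"domain sparsity:    {sparsity(domain_tfidf):.3f}")
print(f"subdomain sparsity: {sparsity(subdomain_tfidf):.3f}")
print(f"combined sparsity:  {sparsity(combined):.3f}")

# TruncatedSVD accepts sparse input directly. The number of components
# here is arbitrary and only suits the toy data.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(combined)
explained = svd.explained_variance_ratio_.sum()
print(f"reduced shape: {reduced.shape}, explained variance: {explained:.3f}")
```

If the explained variance stays low even with many components, that would be one sign that TruncatedSVD is not a good fit and the sparse matrix may be better used directly.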