Self-tuning techniques for large scale cluster analysis on textual data collections