1. Suppose that a list contains the values 20 44 48 55 62 66 74 88 93 99 at index positions 0 through 9. Trace the values of the variables….
1. The Gini index criterion is discussed in this chapter (for cluster validity). Show how you can pair this criterion with the k-means algorithm to perform unsupervised feature selection. Which other cluster validity criterion (or criteria) can you use for unsupervised feature selection in this manner?
2. Implement the feature selection criterion for term strength.
3. Suppose your text documents have a representation in which you know only the presence or absence of words for half the lexicon, and the exact frequencies of words for the remaining half. Show how you can combine the Bernoulli and multinomial models to perform text clustering.
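
As a starting point for Exercise 2, the following is a minimal sketch rather than a reference solution. It assumes term strength is estimated as the fraction of ordered pairs of related documents (x, y) in which a term that occurs in x also occurs in y, and that two documents count as "related" when their cosine similarity reaches a threshold. The function names, the whitespace tokenizer, and the default threshold of 0.5 are illustrative assumptions, not taken from the text.

```python
import math
from collections import Counter
from itertools import permutations


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def term_strength(docs, threshold=0.5):
    """Estimate s(t) = P(t in y | t in x) over related ordered pairs (x, y).

    Assumption: documents are split on whitespace, and a pair is "related"
    when its cosine similarity is at least `threshold`.
    """
    vecs = [Counter(d.lower().split()) for d in docs]
    in_first = Counter()  # related pairs whose first document contains t
    in_both = Counter()   # related pairs where both documents contain t
    for i, j in permutations(range(len(vecs)), 2):
        if cosine(vecs[i], vecs[j]) < threshold:
            continue
        for t in vecs[i]:
            in_first[t] += 1
            if t in vecs[j]:
                in_both[t] += 1
    # Terms that never occur in a related pair get no estimate.
    return {t: in_both[t] / in_first[t] for t in in_first}


if __name__ == "__main__":
    corpus = ["apple banana", "apple banana cherry", "dog cat"]
    for term, strength in sorted(term_strength(corpus).items()):
        print(term, round(strength, 3))
```

Features could then be kept by ranking terms on this score and retaining the strongest ones. The pairwise loop is quadratic in the number of documents, so on a larger corpus one would likely sample document pairs instead of enumerating all of them.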