# Daily Archives: January 25, 2021

## Express your representation as a set of words with associated frequencies but no normalization.

1.       Suppose that you want to find the articles that are strongly critical of some issue in Exercise 3. Which problem discussed in this chapter would you use?

2.       Consider a news article that discusses multiple topics. You want to obtain the portions of contiguous text associated with each topic. Which problem discussed in this chapter would you use in order to identify these segments?

3.       Assume that all articles, pronouns, and prepositions are stop words. Perform a sensible stemming and case folding in the example of Exercise 1, and convert to a vector-space representation. Express your representation as a set of words with associated frequencies but no normalization.

## Write a computer program to evaluate the cosine similarity between a pair of vectors.

1.       Compute the cosine similarity between the vector pair (1, 2, 3, 4, 0, 1, 0) and (4, 3, 2, 1, 1, 0, 0). Repeat the same computation with the Jaccard coefficient.

2.       Normalize each of the vectors in Exercise 5 to unit norm. Compute the Euclidean distance between the pair of normalized vectors. What is the relationship between this Euclidean distance and the cosine similarity computed in Exercise 5?

3.       Repeat Exercise 5 with the Boolean representations of the two documents.

4.       Write a computer program to evaluate the cosine similarity between a pair of vectors.
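A minimal sketch of such a program in plain Python, using the vector pair from the exercises above. The generalized real-valued Jaccard coefficient is included as well, under the assumption that x·y / (‖x‖² + ‖y‖² − x·y) is the intended definition:

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def jaccard_coefficient(x, y):
    """Generalized Jaccard coefficient: x.y / (||x||^2 + ||y||^2 - x.y)."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    return dot / denom if denom else 0.0

# The vector pair from the exercise above.
u = [1, 2, 3, 4, 0, 1, 0]
v = [4, 3, 2, 1, 1, 0, 0]
print(cosine_similarity(u, v))    # 20/31 ≈ 0.6452
print(jaccard_coefficient(u, v))  # 20/42 ≈ 0.4762
```

Note that both vectors here happen to have the same squared norm (31), so normalizing them to unit length leaves the cosine value unchanged, which is the point probed by the Euclidean-distance exercise.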

## How would you (most simply) describe this constant in terms of the data matrix D?

Suppose that you are allowed to assume that at least one of the optimal solutions of the objective function in Exercise 3 must have mutually orthogonal columns in each of U and V, and in which each column of V is normalized to unit norm. (a) Use the optimality conditions of Exercise 3(a) to show that U must contain the largest eigenvectors of DD^T in its columns and V must contain the largest eigenvectors of D^T D in its columns. What is the value of the optimal objective function? (b) Show that the (length-normalized) optimal value for V that maximizes ||DV^T||_F^2 also contains the largest eigenvectors of D^T D, as in (a) above. You are allowed to use the same assumption of orthonormal columns in….

## Implement the k-means algorithm for clustering.

1.       Implement the k-means algorithm for clustering.

2.       Suppose that you represent your corpus as a graph in which each document is a node, and the weight of the edge between a pair of nodes is equal to the cosine similarity between them. Interpret the single-linkage clustering algorithm in terms of this similarity graph.

3.       Suppose you were given only the similarity graph of Exercise 5 and not the actual documents. How would you perform k-means clustering with this input?

4.       For the case of hierarchical clustering algorithms, what is the complexity of centroid merging? How would you make it efficient?
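A minimal sketch of Lloyd's k-means in plain Python, assuming dense numeric vectors and random initialization from the data points (any tf-idf document vectors would plug in directly):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on dense numeric vectors (lists of floats)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        assign = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j in range(k):
            members = [p for p, c in zip(points, assign) if c == j]
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)]
                if members else centroids[j]
            )
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return assign, centroids

labels, cents = kmeans([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], k=2)
```

On this toy input the two well-separated pairs always end up in different clusters, regardless of which points seed the centroids.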

## Implement the group-average linkage clustering algorithm.

1.       What is the number of possible clusterings of a data set of n points into k groups? What does this imply about the convergence behavior of algorithms whose objective function is guaranteed not to worsen from one iteration to the next?

2.       Implement the group-average linkage clustering algorithm.

3.       As discussed in the chapter, explicit feature engineering methods can be made faster and more accurate with Nyström sampling. Spectral clustering has also been presented as a special case of kernel methods with explicit feature engineering in this chapter. Discuss the difficulties in using Nyström sampling with spectral clustering. Can you think of any way of providing a reasonable approximation? [The second part of the question is open-ended without a crisp answer.]
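A minimal sketch of agglomerative clustering with group-average linkage, assuming cosine similarity as the base pairwise measure and a stopping criterion of a target cluster count:

```python
def group_average_linkage(points, target_k):
    """Naive agglomerative clustering with group-average linkage.

    Repeatedly merges the pair of clusters with the highest average pairwise
    cosine similarity until target_k clusters remain. The brute-force pair
    scan is cubic; caching linkage values in a heap would speed it up.
    """
    def cosine(p, q):
        dot = sum(a * b for a, b in zip(p, q))
        np_ = sum(a * a for a in p) ** 0.5
        nq = sum(b * b for b in q) ** 0.5
        return dot / (np_ * nq) if np_ and nq else 0.0

    clusters = [[i] for i in range(len(points))]  # clusters hold point indices
    while len(clusters) > target_k:
        best = None  # (average similarity, cluster index a, cluster index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(cosine(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # merge cluster b into cluster a
    return clusters

out = group_average_linkage([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], 2)
```

Averaging over all cross-cluster pairs is what distinguishes this linkage from single linkage (maximum similarity) and complete linkage (minimum similarity).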

## Implement the feature selection criterion for term strength.

1.       The Gini index criterion is discussed in this chapter (for cluster validity). Show how you can pair this criterion with the k-means algorithm to perform unsupervised feature selection. Which other cluster validity criterion (or criteria) can you use for unsupervised feature selection in this manner?

2.       Implement the feature selection criterion for term strength.

3.       Suppose your text documents have a representation in which you only know about the presence or absence of words in half the lexicon and you know the exact frequencies of words in the remaining half. Show how you can combine the Bernoulli and multivariate models to perform text clustering.
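One common formulation of term strength is s(t) = P(t ∈ d₂ | t ∈ d₁), estimated over pairs of related documents. The sketch below assumes that definition, represents documents as sets of terms, and uses Jaccard similarity with a fixed threshold to decide which pairs count as related (both the measure and the threshold are illustrative choices):

```python
from itertools import combinations

def term_strength(docs, sim_threshold=0.3):
    """Term strength s(t) = P(t in d2 | t in d1) over related document pairs.

    docs: list of sets of terms. A pair is 'related' when its Jaccard
    similarity meets sim_threshold; each unordered related pair is counted
    in both orders, since either document can play the role of d1.
    """
    related = []
    for d1, d2 in combinations(docs, 2):
        union = d1 | d2
        jac = len(d1 & d2) / len(union) if union else 0.0
        if jac >= sim_threshold:
            related.append((d1, d2))
            related.append((d2, d1))

    strength = {}
    for t in set().union(*docs):
        first = sum(1 for d1, d2 in related if t in d1)
        both = sum(1 for d1, d2 in related if t in d1 and t in d2)
        if first:  # terms never seen in a related pair get no score
            strength[t] = both / first
    return strength

docs = [{"cat", "sat", "mat"}, {"cat", "sat", "hat"}, {"dog", "run"}]
scores = term_strength(docs)
```

Terms that recur across related documents (here "cat" and "sat") score high and would be retained; terms that appear in only one side of a related pair score zero and are candidates for removal.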
