1. Consider a text corpus with 10^6 documents, a lexicon of size 10^5, and 100 distinct words per document, which is represented as a bag of words with frequencies. (a) What is the amount of space required to store the entire data matrix without any optimization? (b) Suggest a sparse data format to store the matrix and compute the space required.
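A minimal sketch of the arithmetic behind Exercise 1, under two assumptions that the exercise does not fix: every stored number (frequency, word id, row offset) occupies 4 bytes, and the sparse format is compressed sparse row (CSR), which keeps one frequency and one word id per nonzero entry plus one offset per row. The helper names `dense_bytes` and `csr_bytes` are hypothetical.

```python
# Sizes from Exercise 1.
NUM_DOCS = 10**6
LEXICON_SIZE = 10**5
WORDS_PER_DOC = 100

# Assumption: every stored number is a 32-bit (4-byte) integer.
# 10^8 nonzeros and word ids below 10^5 both fit in 32 bits.
BYTES_PER_ENTRY = 4


def dense_bytes(num_docs, lexicon_size, width=BYTES_PER_ENTRY):
    """Bytes for the full num_docs x lexicon_size frequency matrix."""
    return num_docs * lexicon_size * width


def csr_bytes(num_docs, words_per_doc, width=BYTES_PER_ENTRY):
    """Bytes for CSR storage: one frequency and one word id per nonzero,
    plus num_docs + 1 row offsets marking where each document starts."""
    nnz = num_docs * words_per_doc
    return (2 * nnz + num_docs + 1) * width


print(f"dense: {dense_bytes(NUM_DOCS, LEXICON_SIZE):,} bytes")  # dense: 400,000,000,000 bytes
print(f"CSR:   {csr_bytes(NUM_DOCS, WORDS_PER_DOC):,} bytes")   # CSR:   804,000,004 bytes
```

Under these assumptions the dense matrix needs about 400 GB and CSR about 0.8 GB; a per-document list of (word id, frequency) pairs lands in the same order of magnitude, since the cost is dominated by the 10^8 nonzero entries.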
2. In Exercise 1, let us represent the documents in 0-1 format depending on whether or not a word is present in the document. Compute the expected dot product between a pair of documents in each of which 100 words are included completely at random. What is the expected dot product between a pair with 50,000 words each? What does this tell you about the effect of document length on the computation of the dot product?
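A sketch for Exercise 2, assuming (as the exercise states) that each document's words are drawn uniformly at random, without replacement, from the lexicon of size d = 10^5 given in Exercise 1. Each of the k1 words in the first document also appears in the second with probability k2/d, so by linearity of expectation the expected 0-1 dot product is k1·k2/d. The Monte Carlo function checks that formula empirically; the names `expected_dot` and `simulated_dot` are hypothetical.

```python
import random

LEXICON_SIZE = 10**5  # lexicon size from Exercise 1


def expected_dot(k1, k2, d=LEXICON_SIZE):
    """Expected number of shared words between two random word sets of
    sizes k1 and k2: each of the k1 words lies in the second set w.p. k2/d."""
    return k1 * k2 / d


def simulated_dot(k, d=LEXICON_SIZE, trials=200, seed=0):
    """Monte Carlo estimate of the 0-1 dot product for two documents
    that each contain k distinct words chosen uniformly at random."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = set(rng.sample(range(d), k))  # word ids present in document A
        b = rng.sample(range(d), k)       # word ids present in document B
        total += sum(1 for w in b if w in a)  # size of the overlap
    return total / trials


print(expected_dot(100, 100))        # 0.1
print(expected_dot(50_000, 50_000))  # 25000.0
print(simulated_dot(100))            # should be close to 0.1
```

Because the expectation scales with the product of the two document lengths, the raw dot product between long documents is large even when their words are unrelated.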