General tips

Know how to do the things in the recent quizzes and homeworks.

Note that the quizzes are necessarily very short, so the questions are broad (e.g. describe in a few sentences how you would implement something), whereas on the exam you will be asked to actually implement the solution for a set of data.

The exam will overwhelmingly favor recent material (after the first midterm).

Understand what a language model is and how to implement it with a network. I.e. it's just a function like any other we model with a neural network, except its inputs are usually (one or more) words and its output is a softmax that is interpreted as a probability distribution over the next word.
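
For instance, here is a minimal numpy sketch (toy sizes and random weights, not the course's exact architecture) of a language model as an ordinary network whose output is a softmax:

```python
import numpy as np

V, D = 5, 3                        # toy vocabulary size and hidden size
rng = np.random.default_rng(0)
W1 = rng.normal(size=(V, D))       # input-to-hidden weights
W2 = rng.normal(size=(D, V))       # hidden-to-output weights

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

def language_model(x_onehot):
    """Map a one-hot encoded word to a distribution over the next word."""
    h = np.tanh(x_onehot @ W1)     # hidden layer, like any other network
    return softmax(h @ W2)         # softmax output = probability distribution

p = language_model(np.eye(V)[2])   # p has length V and sums to 1
```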

Know how to take a dataset consisting of a single long string of text, and convert it into a machine learning dataset for training a language model.

Know how to make an adjacency matrix from a data matrix.
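
As a hedged sketch (the course's exact rule may differ, e.g. k-nearest neighbors or a Gaussian kernel), one common recipe is to connect data points whose pairwise distance falls below a threshold:

```python
import numpy as np

def adjacency_from_data(X, eps):
    """Adjacency matrix from a data matrix X (one sample per row):
    connect samples whose Euclidean distance is below eps."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    A = (d2 < eps**2).astype(float)                # 1 if within eps, else 0
    np.fill_diagonal(A, 0)                         # no self-loops
    return A
```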

Know the steps for spectral embedding of a graph.
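
A minimal sketch of the usual steps, assuming the unnormalized Laplacian (the course may use a normalized variant): build the adjacency matrix, form the Laplacian, eigendecompose it, and use the eigenvectors with the smallest nonzero eigenvalues as coordinates:

```python
import numpy as np

def spectral_embedding(A, dim=2):
    """Embed the nodes of a graph with adjacency matrix A into `dim` dimensions."""
    D = np.diag(A.sum(axis=1))       # degree matrix
    L = D - A                        # (unnormalized) graph Laplacian
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]        # skip the trivial constant eigenvector
```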

Compare and contrast the embedding matrix, the term-document matrix, and the tf-idf model.
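
One way to ground the comparison is to build the latter two from a toy corpus (a hedged sketch; tf-idf has several variants and this may not be the course's exact formula):

```python
import numpy as np

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
vocab = sorted({w for d in docs for w in d})

# term-document matrix: raw counts, one row per word, one column per document
td = np.array([[d.count(w) for d in docs] for w in vocab])

# tf-idf: counts reweighted by log(n_docs / document frequency), so words
# that appear in every document (like "the") get weight zero
df = (td > 0).sum(axis=1)
tfidf = td * np.log(len(docs) / df)[:, None]

# an embedding matrix, by contrast, is *learned* (one dense row per word)
# rather than computed directly from counts
```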

Practice questions

Q1: In general, how do you turn words into vectors using one-hot encoding?

Q2: How can an embedding matrix be implemented with a neural network?

Q3: How would you convert a single long sequence into a collection of samples for training a language model?

  1. using a feedforward network
  2. using a recurrent network

A1: For a vocabulary of V unique words, make a dictionary where the key is the word and the value is its unique index k in the vocabulary. Then represent each word as a vector of length V which has a one in the kth element and zeros elsewhere.
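
A minimal sketch of A1:

```python
import numpy as np

words = "the cat sat on the mat".split()
vocab = sorted(set(words))                    # V unique words
index = {w: k for k, w in enumerate(vocab)}   # word -> its index k
V = len(vocab)

def one_hot(word):
    x = np.zeros(V)
    x[index[word]] = 1.0     # one in the kth element, zeros elsewhere
    return x
```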

A2: Directly use the embedding matrix as the weight matrix for a dense layer (be able to draw this). I.e. a dense layer computes $\mathbf y = \sigma(\mathbf E^T \mathbf x + \mathbf b)$, and because we set the layer to use no activation and no bias, this reduces to $\mathbf y = \mathbf E^T \mathbf x$.
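
A numpy sketch of A2 with toy sizes: because $\mathbf x$ is one-hot, $\mathbf E^T \mathbf x$ simply selects the row of $\mathbf E$ for that word, which is why frameworks can implement the layer as a table lookup:

```python
import numpy as np

V, d = 5, 3                          # toy vocab size and embedding dimension
E = np.arange(V * d, dtype=float).reshape(V, d)  # stand-in embedding matrix
x = np.eye(V)[2]                     # one-hot encoding of the word with index 2

y = E.T @ x                          # dense layer with weights E, no bias,
assert np.allclose(y, E[2])          # no activation: just looks up row 2 of E
```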

A3.1: For a feedforward network, take each window of N consecutive words (i.e. for sample i, use the ith word through the (i+N-1)th word), then use the last (one-hot encoded) word of the window as the target $\mathbf y^{(i)}$ and the preceding N-1 (one-hot encoded) words as the sample input vector $\mathbf x^{(i)}$. The output layer is a softmax over possible words. There are other ways too.
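
A sketch of A3.1 on a toy text (word-level; the same idea works for characters):

```python
tokens = "the cat sat on the mat".split()
N = 3                                # window length

X_words, y_words = [], []
for i in range(len(tokens) - N + 1):
    window = tokens[i:i + N]         # ith through (i+N-1)th words
    X_words.append(window[:-1])      # first N-1 words -> input  x^(i)
    y_words.append(window[-1])       # last word       -> target y^(i)

# e.g. X_words[0] = ['the', 'cat'] and y_words[0] = 'sat';
# each word would then be one-hot encoded as in A1
```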

A3.2: For a recurrent network you can simply use the ith word as the target $\mathbf y^{(i)}$ and the (i-1)th word as the input $\mathbf x^{(i)}$.
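
And a sketch of A3.2, where the recurrent network's hidden state carries the earlier history:

```python
tokens = "the cat sat on the mat".split()

x_words = tokens[:-1]    # input  x^(i) = (i-1)th word
y_words = tokens[1:]     # target y^(i) = ith word
# pairs: ('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ...
```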

Q: Describe the function of an embedding layer.

A: An embedding layer implements the multiplication of an embedding matrix with a one-hot encoded input vector: the input is typically a one-hot encoded word and the output is the embedded (dense) version of that word.

Q: How does an embedding layer differ from a normal neural network layer?

A: An embedding layer is like a dense layer, but with no activation function or bias for the nodes.

Q: The precision matrix is the inverse of: (a) the covariance matrix (b) the correlation matrix (c) the adjacency matrix

A: (a) the covariance matrix

Q: What is the difference between a neural language model and a N-gram language model?

A: A neural language model relates embeddings of words rather than the words themselves.

Q: What is the difference between a recurrent neural language model and a feedforward neural language model?

A: A feedforward neural language model predicts the next word using a limited number of inputs, similar to an $N$-gram model, while the output of a recurrent neural language model depends on all previous words.

Q: How is the probability distribution implemented in a neural language model?

A: The output layer is a softmax over outputs representing the words in the vocabulary; the softmax values can be interpreted as a probability for each word.

Q: How do you autogenerate text for a particular topic like sports with a recurrent neural language model?

A: Feed a sequence of words from real sports text into the neural language model for some "prefix" length L, then after input L start using the outputs of the language model to choose the inputs. I.e. take the prediction of the most likely next word and use it as the next input word.
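
A hedged sketch of that loop, assuming a hypothetical step function `rnn_step(x, h)` that returns (softmax probabilities, new hidden state); the real model's interface may differ:

```python
import numpy as np

def generate(prefix_indices, rnn_step, h0, n_words, V):
    """Feed a real-text prefix, then feed predictions back in as inputs."""
    h = h0
    for k in prefix_indices:             # 1) condition on the sports prefix
        p, h = rnn_step(np.eye(V)[k], h)
    generated = []
    k = int(np.argmax(p))                # 2) most likely next word
    for _ in range(n_words):             # 3) use outputs to choose inputs
        generated.append(k)
        p, h = rnn_step(np.eye(V)[k], h)
        k = int(np.argmax(p))
    return generated                     # word indices; map back to strings
```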