2025 Quizzes

Make a Python function that prints every 3rd and 5th number from a list.
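A minimal sketch of one possible reading of this question, assuming "every 3rd and 5th number" means the items whose 1-based position in the list is a multiple of 3 or of 5. The function names are illustrative, and the selection is split from the printing only so the result is easy to check.

```python
def every_third_and_fifth(numbers):
    """Return the items at 1-based positions that are multiples of 3 or 5."""
    return [
        value
        for position, value in enumerate(numbers, start=1)
        if position % 3 == 0 or position % 5 == 0
    ]


def print_every_third_and_fifth(numbers):
    """Print each selected item on its own line."""
    for value in every_third_and_fifth(numbers):
        print(value)


# Example input: the numbers 1 to 15, so positions and values coincide.
print_every_third_and_fifth(list(range(1, 16)))
```

If the intended reading is different (for example, values divisible by 3 or 5 rather than positions), only the condition inside the list comprehension needs to change.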

Make a function that takes an N-by-N matrix as input and returns the diagonal.
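One way this could look, assuming the matrix is given as a list of rows (a nested Python list) and that "the diagonal" means the main diagonal. The name `diagonal` is illustrative.

```python
def diagonal(matrix):
    """Return the main diagonal of a square matrix given as a list of rows."""
    n = len(matrix)
    # Reject inputs that are not N-by-N instead of silently truncating.
    if any(len(row) != n for row in matrix):
        raise ValueError("expected an N-by-N matrix")
    return [matrix[i][i] for i in range(n)]


print(diagonal([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```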

Q1: Suppose you have a collection of 2D data points, i.e., vectors representing coordinates

$a = (1,1)$, $b = (2,1)$, $c = (2,2)$

Draw a picture containing the three points and the boundary of the region of all points within distance 1 of $c$, using distances based on:

(a). The Euclidean distance (L2-norm)

(b). The Chebyshev distance (L-infinity-norm)

In other words, if the above values are in units of feet, draw a line around all points within 1 foot of $c$ using the norms listed.
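As a numerical companion to the drawing, here is a small sketch that computes both distances from each point to $c$ and reports whether the point lies within distance 1. The helper names `euclidean` and `chebyshev` are illustrative.

```python
import math


def euclidean(p, q):
    """L2 distance: square root of the sum of squared coordinate differences."""
    return math.dist(p, q)


def chebyshev(p, q):
    """L-infinity distance: the largest absolute coordinate difference."""
    return max(abs(pi - qi) for pi, qi in zip(p, q))


points = {"a": (1, 1), "b": (2, 1), "c": (2, 2)}
center = points["c"]

for name, point in points.items():
    inside_l2 = euclidean(point, center) <= 1
    inside_linf = chebyshev(point, center) <= 1
    print(name, "L2:", inside_l2, "L-infinity:", inside_linf)
```

Point $a$ is at Euclidean distance $\sqrt{2}$ from $c$ but at Chebyshev distance exactly 1, so it falls outside the circle and on the corner of the square.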

Q2: Describe two different ways you could compute a "genetic distance" between two people, given their DNA sequences.

Recall that DNA is a sequence of bases, each one of C, G, A, or T. Your distance can treat the sequences as strings or somehow represent the letters with numbers.
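Two string-based candidates, sketched under the assumption that each sequence is a plain Python string over the letters C, G, A, T. These are only two of many reasonable choices: the Hamming distance counts mismatched positions in equal-length sequences, and the Levenshtein (edit) distance also allows insertions and deletions.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(a != b for a, b in zip(s, t))


def edit_distance(s, t):
    """Levenshtein distance: fewest single-letter insertions, deletions,
    or substitutions needed to turn s into t."""
    # previous[j] holds the distance between the processed prefix of s and t[:j].
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        current = [i]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,             # delete a from s
                current[j - 1] + 1,          # insert b into s
                previous[j - 1] + (a != b),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(hamming_distance("GATTACA", "GACTATA"))
print(edit_distance("GATTACA", "GATACA"))
```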

Suppose we have data for age, weight, height, and blood glucose for 100 patients.

Use linear algebra to make a linear regression model that predicts blood glucose from the other variables.
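One standard way to set this up is ordinary least squares. The sketch below assumes an intercept term is wanted (the question does not say), so the design matrix $X$ gets a leading column of ones, and $\mathbf y$ collects the 100 glucose measurements:

$$X = \begin{pmatrix} 1 & \text{age}_1 & \text{weight}_1 & \text{height}_1 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & \text{age}_{100} & \text{weight}_{100} & \text{height}_{100} \end{pmatrix}, \qquad \mathbf y = \begin{pmatrix} \text{glucose}_1 \\ \vdots \\ \text{glucose}_{100} \end{pmatrix}$$

$$\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta} \lVert X\boldsymbol\beta - \mathbf y \rVert_2^2 = (X^\top X)^{-1} X^\top \mathbf y$$

The closed form holds when $X^\top X$ is invertible, i.e., when the columns of $X$ are linearly independent. The predicted glucose for a new patient with feature row $\mathbf x^\top$ is then $\mathbf x^\top \hat{\boldsymbol\beta}$.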

Comprehension tests:

Suppose we have a Gaussian random variable $x \sim N(\mu = -1, \sigma^2 = 2)$.

Draw a plot of the distribution over $x$. Label the axes.
Give a plot or manual list of several (10) plausible samples drawn from this distribution.
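For the sampling part, a small sketch using only the standard library. The detail worth noticing is that the distribution is specified by its variance $\sigma^2 = 2$, while `random.gauss` takes the standard deviation, so $\sigma = \sqrt{2}$ is passed. The seed value is arbitrary and only makes the run repeatable.

```python
import math
import random

mu = -1.0
variance = 2.0
sigma = math.sqrt(variance)  # random.gauss expects the standard deviation

rng = random.Random(0)  # arbitrary fixed seed, so repeated runs agree
samples = [rng.gauss(mu, sigma) for _ in range(10)]

print([round(s, 2) for s in samples])
```

Plausible samples should mostly fall within about two standard deviations of the mean, roughly between $-3.8$ and $1.8$.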

Now suppose we have a 2D random vector $\mathbf u$ (containing two random variables $x$ and $y$).

Suppose its mean and covariance are $\boldsymbol\mu = \begin{pmatrix}10\\15\end{pmatrix}$, $\boldsymbol\Sigma = \begin{pmatrix}1 & 0\\0 & 5\end{pmatrix}$.

Use calculus to find the location of the maximum of the one-dimensional Normal (Gaussian) distribution
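A sketch of one route, working with the logarithm of the density (the logarithm is increasing, so it has its maximum at the same $x$). The one-dimensional Normal density is

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Setting the first derivative of $\ln p(x)$ to zero gives

$$\frac{d}{dx}\ln p(x) = -\frac{x-\mu}{\sigma^2} = 0 \quad\Longrightarrow\quad x = \mu$$

and the second derivative confirms that this stationary point is a maximum:

$$\frac{d^2}{dx^2}\ln p(x) = -\frac{1}{\sigma^2} < 0$$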

Extract the name, date, and codenumber from the following filename, and put them as separate elements into a dict.

Then put the dict into an array, where the codenumber is used as array index:

'smith_10-01-2024_00125.data'
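A minimal sketch, assuming the filename always has the form `name_date_code.ext` with underscores as separators. The date is kept as a string because the filename alone does not say whether `10-01-2024` is month-day or day-month. The dict keys and variable names are illustrative.

```python
from pathlib import Path

filename = "smith_10-01-2024_00125.data"

# Path.stem drops the final extension, leaving "smith_10-01-2024_00125".
name, date, code = Path(filename).stem.split("_")
record = {"name": name, "date": date, "codenumber": code}

# Use the code number as the array index: "00125" -> 125.
index = int(code)
records = [None] * (index + 1)  # pad with None so that position 125 exists
records[index] = record

print(records[125])
```

Padding a list up to index 125 follows the wording of the question; a dict keyed by the integer code would avoid the empty slots.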

Start from the joint probability density function $P(X_1, X_2, X_3, X_4)$.

Give the chain rule expansion into conditional probabilities.
In Markov models, a state only depends on the previous state. Show how this simplifies $P$.
Sketch a proof of the fact that the probability of a particular N-gram cannot be lower than that of the (N+1)-gram formed by including an additional random variable (e.g., using the same N words plus one additional word).
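A sketch of how the three parts fit together. The chain rule applied to four variables gives

$$P(X_1, X_2, X_3, X_4) = P(X_1)\,P(X_2|X_1)\,P(X_3|X_1, X_2)\,P(X_4|X_1, X_2, X_3)$$

Under the first-order Markov assumption each conditional depends only on the immediately preceding variable, so

$$P(X_1, X_2, X_3, X_4) = P(X_1)\,P(X_2|X_1)\,P(X_3|X_2)\,P(X_4|X_3)$$

For the N-gram bound, marginalize the extra variable out of the (N+1)-gram. Writing $w_1, \dots, w_N$ for the shared words (notation introduced here for illustration) and summing over every possible additional word $w$:

$$P(w_1, \dots, w_N) = \sum_{w} P(w_1, \dots, w_N, w) \ge P(w_1, \dots, w_N, w_{N+1})$$

The inequality holds because every term in the sum is non-negative, so the sum is at least as large as any single term.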

Derive the relationship between a Naive Bayes classification and an N-gram probability estimate of the next word in a sequence.

Explain how to use Naive Bayes to predict the next word in a sequence.

You may find the following useful:

$$P(X,Y) = P(Y|X)P(X)$$

$$P(X_1, X_2, \dots, X_n) = P(X_1)P(X_2|X_1)P(X_3|X_1, X_2)\cdots P(X_n|X_1, \dots, X_{n-1})$$

Give a covariance matrix for the following 2D distributions.
1. a horizontal line
2. a vertical line
3. a diagonal line (make a reasonable guess)
Give examples of four points sampled from distributions with each of the above covariances, i.e., four points from a distribution with the covariance of (1), four points for (2), and four points for (3).
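To check a proposed answer numerically, the sketch below computes the sample covariance (with the $n - 1$ denominator) of four hand-picked points per case. The points are illustrative choices that lie exactly on each line, which is the idealized limit; real samples would scatter slightly around it.

```python
def covariance_matrix(points):
    """Sample covariance (n - 1 denominator) of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


# Hand-picked example points, four per case.
horizontal_line = [(-3, 0), (-1, 0), (1, 0), (3, 0)]
vertical_line = [(0, -3), (0, -1), (0, 1), (0, 3)]
diagonal_line = [(-3, -3), (-1, -1), (1, 1), (3, 3)]

for label, pts in [("horizontal", horizontal_line),
                   ("vertical", vertical_line),
                   ("diagonal", diagonal_line)]:
    print(label, covariance_matrix(pts))
```

The horizontal case has variance only in $x$, the vertical case only in $y$, and the diagonal case has an off-diagonal entry equal to both variances.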

Suppose we have a roughly normally distributed 2D dataset with a covariance that is:

(i) primarily horizontal
(ii) primarily vertical
(iii) primarily diagonal

For each case listed above

(a) give the eigenvectors of the covariance matrix
(b) give the principal component
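A sketch of what the answers can look like under simple assumed forms for the covariance. For the horizontal and vertical cases, assume an axis-aligned covariance with variances $\sigma_x^2$ and $\sigma_y^2$ (symbols introduced here for illustration):

$$\boldsymbol\Sigma = \begin{pmatrix}\sigma_x^2 & 0\\0 & \sigma_y^2\end{pmatrix}, \qquad \mathbf e_1 = \begin{pmatrix}1\\0\end{pmatrix} \text{ with eigenvalue } \sigma_x^2, \qquad \mathbf e_2 = \begin{pmatrix}0\\1\end{pmatrix} \text{ with eigenvalue } \sigma_y^2$$

The principal component is the eigenvector with the largest eigenvalue: $\mathbf e_1$ when $\sigma_x^2 > \sigma_y^2$ (primarily horizontal) and $\mathbf e_2$ when $\sigma_y^2 > \sigma_x^2$ (primarily vertical).

For the diagonal case, assume equal variances $s$ on both axes and a positive covariance $\rho$ with $0 < \rho < s$:

$$\boldsymbol\Sigma = \begin{pmatrix}s & \rho\\\rho & s\end{pmatrix}, \qquad \mathbf e_1 = \frac{1}{\sqrt 2}\begin{pmatrix}1\\1\end{pmatrix} \text{ with eigenvalue } s + \rho, \qquad \mathbf e_2 = \frac{1}{\sqrt 2}\begin{pmatrix}1\\-1\end{pmatrix} \text{ with eigenvalue } s - \rho$$

Here the principal component is $\mathbf e_1$, which points along the diagonal.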