5.2.2 Stats
From xkcd. To distract yourself from interviewing stress, here are more statistics jokes.
- [E] Explain frequentist vs. Bayesian statistics.
- [E] Given the array , find its mean, median, variance, and standard deviation.
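One minimal way to compute these statistics uses Python's standard `statistics` module. The array below is made up for illustration. The sketch shows both conventions for variance: dividing by n (population) and dividing by n - 1 (sample).

```python
import statistics

# Hypothetical example data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # arithmetic mean
median = statistics.median(data)        # average of the two middle values when len is even
pop_var = statistics.pvariance(data)    # population variance: divides by n
sample_var = statistics.variance(data)  # sample variance: divides by n - 1
pop_std = statistics.pstdev(data)       # square root of the population variance
sample_std = statistics.stdev(data)     # square root of the sample variance

print(mean, median, pop_var, pop_std, sample_var, sample_std)
```

In an interview, it is worth saying which convention you are using. The two differ noticeably for small arrays like this one.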
- [M] When should we use median instead of mean? When should we use mean instead of median?
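A quick way to see the trade-off is to add a single extreme value to a small sample and compare how each statistic reacts. The incomes below are hypothetical.

```python
import statistics

# Hypothetical incomes in thousands of dollars; the second list adds one extreme value.
incomes = [40, 45, 50, 55, 60]
incomes_with_outlier = incomes + [1000]

mean_clean, median_clean = statistics.mean(incomes), statistics.median(incomes)
mean_skewed = statistics.mean(incomes_with_outlier)
median_skewed = statistics.median(incomes_with_outlier)

# One outlier drags the mean far away (to about 208), while the median only moves to 52.5.
print(mean_clean, median_clean)  # 50 50
print(mean_skewed, median_skewed)
```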
- [M] What is a moment of a function? Explain the meanings of the zeroth to fourth moments.
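For a finite sample, you can compute the moments of its empirical distribution directly: each point gets weight 1/n. This sketch uses divide-by-n formulas. The third and fourth moments are standardized, so they measure shape independently of location and scale.

```python
import math

def moments(xs):
    """Zeroth to fourth moments of the empirical distribution of xs
    (each point has weight 1/n; population, divide-by-n formulas)."""
    n = len(xs)
    weights = [1 / n] * n
    zeroth = sum(weights)  # total probability mass: 1 for any distribution
    mean = sum(w * x for w, x in zip(weights, xs))  # 1st raw moment: location
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs))  # 2nd central moment: spread
    sd = math.sqrt(var)
    # 3rd standardized moment (skewness): asymmetry around the mean.
    skew = sum(w * ((x - mean) / sd) ** 3 for w, x in zip(weights, xs))
    # 4th standardized moment (kurtosis): tail weight; equals 3 for a normal distribution.
    kurt = sum(w * ((x - mean) / sd) ** 4 for w, x in zip(weights, xs))
    return zeroth, mean, var, skew, kurt

print(moments([1, 2, 3, 4, 5]))
```

Many libraries report "excess kurtosis", which is this value minus 3. Check which convention a tool uses before comparing numbers.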
- [M] Are independence and zero covariance the same? Give a counterexample if not.
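A classic counterexample takes X uniform on {-1, 0, 1} and Y = X². The sketch below checks it with exact fractions. The covariance is zero, yet the joint probability of one event does not factor into the product of the marginals.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X**2 is completely determined by X.
support = [-1, 0, 1]
p = Fraction(1, 3)

e_x = sum(p * x for x in support)
e_y = sum(p * x**2 for x in support)
e_xy = sum(p * x * x**2 for x in support)
cov = e_xy - e_x * e_y  # Cov(X, Y) = E[XY] - E[X]E[Y]

# Independence would require P(X=0, Y=1) == P(X=0) * P(Y=1).
p_x0_y1 = Fraction(0)  # X = 0 forces Y = 0, so this joint event is impossible
p_x0 = p
p_y1 = 2 * p           # Y = 1 when X = -1 or X = 1
print(cov, p_x0_y1, p_x0 * p_y1)  # 0 0 2/9
```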
- [E] Suppose that you take a sample of 100 random newborn puppies and find that their average weight is 1 pound, with a population standard deviation of 0.12 pounds. Assuming the weight of newborn puppies follows a normal distribution, calculate the 95% confidence interval for the average weight of all newborn puppies.
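Because the population standard deviation is given, a z-based interval applies: mean ± z · σ / √n. This sketch computes it with the standard library's `NormalDist`.

```python
import math
from statistics import NormalDist

n = 100
sample_mean = 1.0  # pounds
sigma = 0.12       # population standard deviation in pounds (given, so a z-interval applies)
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value, about 1.96
margin = z * sigma / math.sqrt(n)                  # z times the standard error of the mean
low, high = sample_mean - margin, sample_mean + margin
print(f"{low:.4f} to {high:.4f} pounds")  # 0.9765 to 1.0235 pounds
```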
- [M] Suppose that we examine 100 newborn puppies and the 95% confidence interval for their average weight is [0.9, 1.1] pounds. Which of the following statements is true?
- Given a random newborn puppy, its weight has a 95% chance of being between 0.9 and 1.1 pounds.
- If we examine another 100 newborn puppies, their mean has a 95% chance of being in that interval.
- We're 95% confident that this interval captured the true mean weight.
Hint: This is a subtle point that many people misunderstand. If you struggle with the answer, Khan Academy has a great article on it.
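A small simulation can make the hint concrete. It assumes hypothetical ground truth: weights are normal with a true mean of 1 pound and σ = 0.12. The simulation repeatedly draws 100 puppies, builds a 95% interval each time, and counts how often the interval contains the true mean. The "95%" describes this long-run behavior of the procedure.

```python
import math
import random

def coverage(true_mean=1.0, sigma=0.12, n=100, trials=2000, seed=0):
    """Fraction of simulated 95% z-intervals (known sigma) that contain true_mean."""
    rng = random.Random(seed)
    z = 1.96
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = sum(sample) / n
        if m - margin <= true_mean <= m + margin:
            hits += 1
    return hits / trials

print(coverage())  # close to 0.95
```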
- [H] Suppose we have a random variable X supported on … from which we can draw samples. How can we come up with an unbiased estimate of the median of X?
- [H] Can correlation be greater than 1? Why or why not? How should we interpret a correlation value of 0.3?
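For Pearson correlation, the bound follows from the Cauchy-Schwarz inequality applied to the centered variables. Equality holds exactly when Y is an affine function of X (almost surely), assuming both standard deviations are nonzero.

```latex
\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y},
\qquad
|\operatorname{Cov}(X, Y)|
  = \bigl|\mathbb{E}[(X - \mu_X)(Y - \mu_Y)]\bigr|
  \le \sqrt{\mathbb{E}[(X - \mu_X)^2]\,\mathbb{E}[(Y - \mu_Y)^2]}
  = \sigma_X \sigma_Y
\;\Longrightarrow\; -1 \le \rho_{XY} \le 1.
```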
- The weight of newborn puppies is roughly symmetric, with a mean of 1 pound and a standard deviation of 0.12 pounds. Your favorite newborn puppy weighs 1.1 pounds.
- [E] Calculate your puppy’s z-score (standard score).
- [E] How much does your newborn puppy have to weigh to be in the top 10% in terms of weight?
- [M] Suppose the weight of newborn puppies followed a skewed distribution. Would it still make sense to calculate z-scores?
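The z-score only needs the mean and standard deviation. The top-10% cutoff needs a full distribution. The text says only that weights are "roughly symmetric", so the cutoff in this sketch additionally assumes the weights are approximately normal.

```python
from statistics import NormalDist

mean, sd = 1.0, 0.12  # pounds
weight = 1.1          # your puppy's weight in pounds

z = (weight - mean) / sd  # standard score: how many SDs above the mean
# ASSUMPTION: weights are approximately normal, so the 90th percentile marks the top 10%.
cutoff = NormalDist(mean, sd).inv_cdf(0.90)
print(round(z, 3), round(cutoff, 3))  # 0.833 1.154
```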
- [H] Tossing a coin fifteen times resulted in 10 heads and 5 tails. How would you analyze whether a coin is fair?
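One way to analyze this is an exact binomial test. Ten heads and 5 tails means 15 tosses. Under the null hypothesis of a fair coin, this sketch adds up the probabilities of every outcome that is no more likely than the observed one. This is one common definition of the two-sided p-value.

```python
from math import comb

def binomial_two_sided_p(heads, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than the observed count under Binomial(n, p)."""
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[heads]
    # Small relative tolerance so outcomes tied with the observed one are included.
    return sum(q for q in probs if q <= observed * (1 + 1e-9))

p_value = binomial_two_sided_p(heads=10, n=15)
print(round(p_value, 4))  # 0.3018
```

A p-value of about 0.30 means 10 heads in 15 tosses is quite plausible for a fair coin. This data alone gives little evidence of bias.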
- Statistical significance.
- [E] How do you assess the statistical significance of a pattern, that is, whether it is meaningful or just due to chance?
- [E] What’s the distribution of p-values?
- [H] Recently, a lot of scientists started a war against statistical significance. What do we need to keep in mind when using p-values and statistical significance?
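A permutation test is one general way to check whether a pattern could plausibly arise by chance. In this sketch, the pattern is a difference in means between two groups. The test assumes that, under the null hypothesis, group labels are exchangeable. It shuffles the labels many times and counts how often chance alone produces a difference at least as large as the observed one. The group data in the usage lines is hypothetical.

```python
import random

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (count + 1) / (n_resamples + 1)

p_same = permutation_test(list(range(1, 11)), list(range(1, 11)))  # identical groups
p_diff = permutation_test([1, 2, 3, 4, 5] * 4, [11, 12, 13, 14, 15] * 4)  # well-separated groups
print(p_same, p_diff)
```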
- Variable correlation.
- [M] What happens to a regression model if two of its supposedly independent variables are strongly correlated?
- [M] How do we test for independence between two categorical variables?
- [H] How do we test for independence between two continuous variables?
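For the categorical case, Pearson's chi-square test compares observed counts with the counts expected under independence. This sketch is restricted to a 2x2 table and applies no continuity correction. With 1 degree of freedom, the p-value can be computed from `math.erfc`, so no external library is needed. The counts are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table
    (no continuity correction). Returns (statistic, p_value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows are two user groups, columns are "clicked" / "did not click".
stat, p_value = chi_square_2x2([[30, 10], [20, 40]])
print(stat, p_value)
```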
- [E] A/B testing is a method of comparing two versions of a solution against each other to determine which one performs better. What are some of the pros and cons of A/B testing?
- [M] You want to test which of the two ad placements on your website is better. How many visitors, and/or how many clicks on each ad, do you need to be 95% sure that one placement is better?
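A common starting point is the normal-approximation sample-size formula for a two-proportion z-test. This sketch rests on several assumptions not in the question: baseline click-through rates of 10% and 12%, a two-sided test at α = 0.05, and 80% power (the question fixes only the 95% level).

```python
import math
from statistics import NormalDist

def visitors_per_variant(p1, p2, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH group to detect a difference between
    click-through rates p1 and p2 with a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)

# Hypothetical click-through rates for the two placements.
n_needed = visitors_per_variant(0.10, 0.12)
print(n_needed)
```

The required sample grows with the inverse square of the effect size. Halving the expected difference roughly quadruples the number of visitors you need.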
- [M] Your company runs a social network whose revenue comes from showing ads in the newsfeed. To double revenue, your coworker suggests that you simply double the number of ads shown. Is that a good idea? How do you find out?
- Imagine that you have the prices of 10,000 stocks over the last 24-month period, but only the price at the end of each month, which means you have 24 price points for each stock. After calculating the correlations of all 10,000 × 9,999 / 2 pairs of stocks, you find a pair whose correlation is above 0.8.
- [E] What’s the probability that this happens by chance?
- [M] How to avoid this kind of accidental patterns?
Hint: Check out the curse of big data.
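A rough back-of-the-envelope estimate for the first question rests on two loud assumptions. First, for a single pair of unrelated stocks, the 24 monthly observations behave like independent normal draws. Under that assumption, Fisher's z-transform of the sample correlation is approximately normal with standard deviation 1/√(n - 3). Second, the roughly 50 million pairs are treated as independent trials.

```python
import math

n_stocks = 10_000
n_months = 24  # monthly price points per stock
threshold = 0.8

n_pairs = n_stocks * (n_stocks - 1) // 2  # 49,995,000 distinct pairs

# ASSUMPTION: for two unrelated series of i.i.d. normal observations,
# atanh(r) is approximately normal with mean 0 and sd 1 / sqrt(n - 3).
z = math.atanh(threshold) * math.sqrt(n_months - 3)
p_one_pair = 0.5 * math.erfc(z / math.sqrt(2))  # P(r > threshold) for a single pair

expected_hits = n_pairs * p_one_pair
# ASSUMPTION: pairs treated as independent (they are not, since pairs share stocks),
# so the number of hits is roughly Poisson with this mean.
p_at_least_one = 1 - math.exp(-expected_hits)

print(f"P(r > {threshold}) for one pair: {p_one_pair:.1e}")
print(f"expected pairs above threshold: {expected_hits:.1f}")
print(f"P(at least one such pair): {p_at_least_one:.4f}")
```

These assumptions are generous. Raw price levels are trending, non-stationary series, and correlating them tends to produce high correlations far more often than this estimate suggests. This is one reason correlations are usually computed on returns rather than prices. Either way, the conclusion holds: with tens of millions of comparisons, a correlation above 0.8 is expected somewhere by chance alone.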
- [H] How are sufficient statistics and the Information Bottleneck principle used in machine learning?
This book was created by Chip Huyen with the help of wonderful friends. For feedback, errata, and suggestions, the author can be reached here.