ISYE 6501 - Midterm 2 Questions and Answers Rated A

When might overfitting occur?
Correct Answer: When the number of factors is close to or larger than the number of data points, causing the model to potentially fit too closely to random effects.

Why are simple models better than complex ones?
Correct Answer: Less data is required, there is less chance of including insignificant factors, and the model is easier to interpret.

What is forward selection?
Correct Answer: We start with no factors. At each step we select the best new factor and check whether it is good enough (by R^2, AIC, or p-value); if so, we add it to the model and refit with the current set of factors. At the end, we remove any factors that fail a certain threshold.

What is backward elimination?
Correct Answer: We start with all factors and find the worst one according to a supplied threshold (for example, p = 0.15). If it is worse than the threshold, we remove it and start the process over. We continue until we have the number of factors we want, then remove any factors that fail a second threshold (for example, p = 0.05) and fit the model with the remaining set of factors.

What is stepwise regression?
Correct Answer: It is a combination of forward selection and backward elimination. We can start with either all factors or no factors, and at each step we remove or add a factor. After adding each new factor, and again at the end, we immediately eliminate any factors that no longer appear good enough.

What type of algorithms are stepwise selection methods?
Correct Answer: Greedy algorithms. At each step they take the one thing that looks best.

What is LASSO?
Correct Answer: A variable selection method in which the coefficients are chosen to minimize the squared error, subject to the sum of their absolute values not exceeding a certain threshold t.

How do you choose t in LASSO?
Correct Answer: Run the LASSO approach with different values of t and see which gives the best trade-off.

Why do we have to scale the data for LASSO?
Correct Answer: If we don't, the units in which the data is measured will artificially affect how big the coefficients need to be.

What is elastic net?
Correct Answer: A variable selection method that works by minimizing the squared error while constraining a combination of the absolute values of the coefficients and their squares.

What is a key difference between stepwise regression and LASSO regression?
Correct Answer: LASSO requires the data to be scaled first. If the data is not scaled, the coefficients can have artificially different orders of magnitude, which means they will have unbalanced effects on the LASSO constraint.

Why doesn't ridge regression perform variable selection?
Correct Answer: Its constraint is on the squared coefficient values, which shrinks (regularizes) the coefficients toward zero but does not force them to exactly zero, so no factors are dropped.

What are the pros and cons of greedy algorithms (forward selection, backward elimination, stepwise regression)?
Correct Answer: They are good for initial analysis, but they often don't perform as well on other data because they fit more to random effects than you'd like and so appear to have a better fit than they really do.

What are the pros and cons of LASSO and elastic net?
Correct Answer: They are slower, but they help build models that make better predictions.

Which two methods does elastic net look like it combines, and what are its advantages and downsides?
Correct Answer: Ridge regression and LASSO.
* Advantages: the variable selection of LASSO and the predictive benefits of ridge regression.
* Disadvantages: it arbitrarily rules out some correlated variables, like LASSO (we don't know which one should be left out), and it underestimates the coefficients of very predictive variables, like ridge regression.

What are some downsides of surveys?
Correct Answer: Even if you have what appears to be a representative sample in simple ways, it may not be representative in more complex ways.

If we're testing to see whether red cars sell for higher prices than blue cars, we need to account for the type and age of the cars in our data set.
This is called:
Correct Answer: Controlling

What is a blocking factor?
Correct Answer: A source of variability that is not of primary interest to the experimenter.

What is an example of a blocking factor?
Correct Answer: The type of car (sports car or family car) is a blocking factor that could account for some of the difference between red cars and blue cars. Because sports cars are more likely to be red, accounting for that difference reduces the variability in our estimates.

Under what conditions should you run A/B tests?
Correct Answer: When you can collect data quickly, when the data is representative, and when the amount of data is small compared to the whole population.

Do you have to decide the sample size ahead of time for A/B tests?
Correct Answer: No, and we can run the hypothesis test any time we want.

What is full factorial design?
Correct Answer: You test every combination and then use ANOVA to determine the importance of each factor.

What is fractional factorial design?
Correct Answer: You test a subset of the entire set of combinations.

What is a balanced design?
Correct Answer: You test each choice the same number of times and each pair of choices the same number of times.

When does regression work well to determine important factors?
Correct Answer: When there aren't significant interactions between the factors.

What is exploration?
Correct Answer: Focusing on getting more information; in this case, determining with more certainty which ad is really the best.

What is exploitation?
Correct Answer: Focusing on getting immediate value; in this example, showing the ad that seems to be doing best so far, because it seems most likely to be clicked.

What is the multi-armed bandit approach, and how does it balance exploration and exploitation?
Correct Answer: We start with no information and give each alternative an equal probability of being selected.
After performing some tests, we have more information, so we can update the probability of each alternative being the best and start assigning new tests according to those probabilities. We keep testing multiple alternatives, so we are still exploring, but we make it more likely to pick the best ones, so we are also exploiting.

What are some of the parameters in the multi-armed bandit approach?
Correct Answer: The number of tests between recalculating probabilities, how to update the probabilities, and how to pick an alternative to test based on the probabilities and/or expected values. For updating, we can use Bayesian updates or estimate from the observed distribution.

What are common reasons that data sets are missing values?
Correct Answer:
* A person accidentally types in the wrong value.
* A person did not want to reveal the true value.
* An automated system did not work correctly to record the value.

What are some examples of why there might be bias in missing data?
Correct Answer:
* Income: people with higher incomes are more likely to omit this answer.
* Radar gun: a car that passes the radar gun very slowly might be treated as an anomaly, and its speed might not be recorded in the system.
* Heart transplants: if there is a variable "date of death", it will be missing for patients who are still living, so the missing data will naturally include more successful transplant cases.

What are three ways of dealing with missing data?
Correct Answer: Discard the data, use categorical variables to indicate missing data, or estimate the missing values. The first two do not require imputation; the third is imputation.

What are the pros and cons of throwing away missing data?
Correct Answer:
* Pros: it does not potentially introduce errors, and it is easy to implement.
* Cons: we don't want to lose too many data points, and the missing data may be censored or biased.

What is the categorical variable approach?
Correct Answer: If the data is categorical, we just add another category, "missing".
With quantitative variables, we add a categorical variable that indicates whether the value is missing and include interaction variables between that categorical variable and the other variables.

Why wouldn't you want to fill in missing quantitative variables with 0?
Correct Answer: It can lead to problems if some types of data points are more likely than others to have missing data. The coefficients of the other variables might be pulled in one direction or another to try to account for the missing data.

What are the advantages and disadvantages of imputing missing data with the mean or median (numeric) or mode (categorical)?
Correct Answer:
* Advantages: it hedges against being too wrong, and it is easy to compute.
* Disadvantage: the imputation can be biased. For example, people with high incomes are less likely to answer a survey, so the mean or median will underestimate the missing values.

What are the advantages and disadvantages of using regression for imputation?
Correct Answer:
* Advantages: it reduces or eliminates the problem of bias, and it gives better values for the missing data.
* Disadvantages: we have to build, validate, and test a whole other model just to fill in the missing data, and then do it all over again to get the answer we want. We are also using the same data twice: once for imputation and a second time to fit the model.

How does adding variability to a regression imputation compare to imputing without it?
Correct Answer:
* Without: more accurate on average, but the variability is less accurate.
* With: less accurate on average, but the variability is more accurate.

When should you not use imputation?
Correct Answer: When more than 5% of the data is missing per factor.

What is the binomial distribution?
Correct Answer: The probability of getting x successes out of n independent, identically distributed Bernoulli(p) trials; for example, the count of successful coin flips in n trials.

What happens when n is big for the binomial distribution?
Correct Answer: It converges to the normal distribution.

What is a Bernoulli distribution?
Correct Answer: It is like flipping a coin.
It can be used to model a single event and is most useful when we put many of them together.

What are some examples of a geometric distribution?
Correct Answer: How many interviews until the first job offer; how many hits until a baseball bat breaks.

What is a geometric distribution?
Correct Answer: It answers "how many Bernoulli trials until ...". It is the probability of having x Bernoulli(p) failures until the first success, or x Bernoulli(p) successes until the first failure.

In a geometric distribution, what is the value that is raised to a power?
Correct Answer: The probability of the outcome you are counting, that is, the one that repeats until the stopping event. For x failures before the first success, (1 - p) is raised to the power x.

What assumptions does a geometric distribution make?
Correct Answer: Each Bernoulli trial is independent and identically distributed.

What is the Poisson distribution good at modeling?
Correct Answer: Random arrivals.

What does the Poisson distribution assume?
Correct Answer: Arrivals are independent and identically distributed.

If arrivals are Poisson, what type of distribution is the interarrival time?
Correct Answer: Exponential

If the interarrival time is exponential, what type of distribution are the arrivals?
Correct Answer: Poisson

What is the difference between the Weibull and geometric distributions?
Correct Answer: Weibull models the time between failures; geometric models the number of tries between failures.

What is the Weibull distribution useful for modeling?
Correct Answer: The time it takes something to fail, specifically the time between failures.

What does k < 1 mean in a Weibull distribution?
Correct Answer: It models a failure rate that decreases with time. The worst things fail first (for example, mechanical parts), and the parts that are left are the better ones, which take longer to fail.

What does k > 1 mean in a Weibull distribution?
Correct Answer: It models a failure rate that increases with time. The more worn parts get, the more likely they are to fail soon, so we observe fewer failures at first and more later on.

What do Q-Q plots help visualize?
Correct Answer: Whether two data sets follow the same distribution.

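The effect of the Weibull shape parameter k on the failure rate can be checked numerically. The following is a minimal sketch using the standard Weibull hazard function h(t) = (k / lam) * (t / lam)^(k - 1); the function name `weibull_hazard`, the scale parameter name `lam`, and the sample times are assumptions chosen for illustration and do not come from the course material.

```python
import math


def weibull_hazard(t: float, k: float, lam: float = 1.0) -> float:
    """Instantaneous failure rate of a Weibull(k, lam) distribution at time t > 0."""
    if t <= 0 or k <= 0 or lam <= 0:
        raise ValueError("t, k and lam must all be positive")
    return (k / lam) * (t / lam) ** (k - 1)


# Illustrative time points (hypothetical values).
times = [0.5, 1.0, 2.0, 4.0]

# k < 1: rate falls over time; k = 1: rate is constant; k > 1: rate rises over time.
for k in (0.5, 1.0, 2.0):
    rates = [round(weibull_hazard(t, k), 3) for t in times]
    print(f"k={k}: {rates}")
```

With k = 1 the hazard is constant, which is the exponential case, so the Weibull distribution includes the exponential as a special case.
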
Why are Q-Q plots sometimes better than statistical tests?
Correct Answer: Sometimes a statistical test will lead us in the wrong direction, because most points might match while the matches at the ends are bad.

What is the memoryless property?
Correct Answer: It doesn't matter what has happened in the past; all that matters is where we are now.

If the data fits an exponential distribution, is it memoryless?
Correct Answer: Yes

If data is memoryless, is it exponential?
Correct Answer: Yes, for continuous data; the exponential is the only continuous memoryless distribution.

Which distributions are memoryless?
Correct Answer: The exponential (continuous) and the geometric (discrete). Poisson arrivals are memoryless in the sense that their interarrival times are exponential.

Can a distribution not be memoryless and still be exponential?
Correct Answer: No
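The memoryless property can be verified numerically for the exponential distribution: the chance of waiting t more time units is the same no matter how long we have already waited. This is a minimal sketch; the function names and the rate value 0.25 are hypothetical and chosen only for illustration.

```python
import math


def exp_survival(t: float, rate: float) -> float:
    """P(X > t) for an exponential distribution with the given rate."""
    return math.exp(-rate * t)


def conditional_survival(s: float, t: float, rate: float) -> float:
    """P(X > s + t | X > s): chance of lasting t longer, given s has already passed."""
    return exp_survival(s + t, rate) / exp_survival(s, rate)


rate = 0.25  # hypothetical arrival rate, for illustration only

# For every elapsed time s, the conditional value matches the unconditional P(X > 3).
for s in (0.0, 2.0, 10.0):
    print(s, conditional_survival(s, 3.0, rate), exp_survival(3.0, rate))
```

The two printed probabilities agree (up to floating-point rounding) for each s, which is exactly what "all that matters is where we are now" means.
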

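The multi-armed bandit questions above describe starting with equal probabilities, updating them as results come in, and assigning new tests according to those probabilities. One possible way to do this, not necessarily the one used in the course, is Thompson sampling with Bayesian (Beta) updates, sketched below. The function name `thompson_bandit`, the click rates, and the number of rounds are all hypothetical, and this sketch recalculates after every single test, which is only one choice for the "number of tests between recalculating probabilities" parameter.

```python
import random


def thompson_bandit(true_rates, n_rounds=5000, seed=0):
    """Simulate showing ads with Thompson sampling; return how often each ad was shown."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_rounds):
        # Draw one plausible click rate per ad from its Beta posterior.
        # With no data, every ad has the same Beta(1, 1) prior (equal chances).
        draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        # Show the ad whose draw is highest: likely-best ads win more often
        # (exploitation), but uncertain ads still win sometimes (exploration).
        arm = max(range(len(draws)), key=draws.__getitem__)
        # Simulate the click and update that ad's record (Bayesian update).
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


# Hypothetical true click rates for three ads; unknown to the algorithm.
pulls = thompson_bandit([0.05, 0.10, 0.30])
print(pulls)
```

Over time, most of the tests go to the ad with the highest click rate, while the weaker ads still receive an occasional test.
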
**About the document**

Uploaded On

Sep 07, 2022

Number of pages

17
