Georgia Tech WEEK 2 HOMEWORK – SAMPLE SOLUTIONS

IMPORTANT NOTE

These homework solutions show multiple approaches and some optional extensions for most of the questions in the assignment. You don’t need to submit all this in your assignments; they’re included here just to help you learn more. Remember, the main goal of the homework assignments, and of the entire course, is to help you learn as much as you can, and develop your analytics skills as much as possible!

Question 1

Describe a situation or problem from your job, everyday life, current events, etc., for which a clustering model would be appropriate. List some (up to 5) predictors that you might use.

Here’s one answer. An investor who wants to diversify a portfolio might want to cluster stocks, and then make sure the portfolio does not have too much money invested in any particular cluster. A common way of clustering is to just classify each company by economic sector or size, but there might be deeper similarities that aren’t captured by those factors. So, the investor might create factors related to each stock’s performance (such as percent increase/decrease in price) in each quarter over the past 5 years, or each stock’s performance in certain key days or intervals, etc. Stocks that behaved similarly would be clustered together.

Question 2

The iris data set contains 150 data points, each with four predictor variables and one categorical response. The predictors are the width and length of the sepal and petal of flowers and the response is the type of flower. The data is available from the R library datasets and can be accessed with iris once the library is loaded. It is also available at the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Iris). The response values are only given to see how well a specific method performed and should not be used to build the model. Use the R function kmeans to cluster the points as well as possible.
Report the best combination of predictors, your suggested value of k, and how well your best clustering predicts flower type.

Here’s one possible solution. Please note that a good solution doesn’t have to try all of the possibilities in the code; they’re shown to help you learn, but they’re not necessary.

The R code in file HW2-Q2.R shows clustering solutions for k=2,3,4,5 using all factors, for both unscaled and scaled data.

| k | Cluster | Unscaled: Setosa | Unscaled: Versicolor | Unscaled: Virginica | Scaled: Setosa | Scaled: Versicolor | Scaled: Virginica |
|---|---|---|---|---|---|---|---|
| 2 | 1 | 50 | 3 | 0 | 50 | 0 | 0 |
| 2 | 2 | 0 | 47 | 50 | 0 | 50 | 50 |
| 3 | 1 | 50 | 0 | 0 | 50 | 0 | 0 |
| 3 | 2 | 0 | 48 | 14 | 0 | 47 | 14 |
| 3 | 3 | 0 | 2 | 36 | 0 | 3 | 36 |
| 4 | 1 | 50 | 0 | 0 | 50 | 0 | 0 |
| 4 | 2 | 0 | 27 | 1 | 0 | 27 | 2 |
| 4 | 3 | 0 | 0 | 32 | 0 | 0 | 29 |
| 4 | 4 | 0 | 23 | 17 | 0 | 23 | 19 |
| 5 | 1 | 50 | 0 | 0 | 28 | 0 | 0 |
| 5 | 2 | 0 | 24 | 1 | 22 | 0 | 0 |
| 5 | 3 | 0 | 0 | 24 | 0 | 27 | 2 |
| 5 | 4 | 0 | 0 | 12 | 0 | 0 | 29 |
| 5 | 5 | 0 | 26 | 13 | 0 | 23 | 19 |

Table 1. Results using all factors

For k=2, the setosa species is almost perfectly in one cluster, and the other two species (versicolor and virginica) are in the other cluster. For k=3,4,5, setosa is separated perfectly from the other two species (in the scaled k=5 solution it is split across two pure clusters). When k=4,5 there’s a nice cluster of versicolor, a nice cluster or two of virginica, and a cluster of about 40 points that is mixed between the two. k=3 is a little more ambiguous, so even though there are 3 species, it turns out that k=4,5 work better.

The R code also shows clustering solutions for k=2,3,4,5 using only the Petal Length and Petal Width factors, for both unscaled and scaled data.

| k | Cluster | Unscaled: Setosa | Unscaled: Versicolor | Unscaled: Virginica | Scaled: Setosa | Scaled: Versicolor | Scaled: Virginica |
|---|---|---|---|---|---|---|---|
| 2 | 1 | 50 | 1 | 0 | 50 | 0 | 0 |
| 2 | 2 | 0 | 49 | 50 | 0 | 50 | 50 |
| 3 | 1 | 50 | 0 | 0 | 50 | 0 | 0 |
| 3 | 2 | 0 | 48 | 4 | 0 | 48 | 4 |
| 3 | 3 | 0 | 2 | 46 | 0 | 2 | 46 |
| 4 | 1 | 50 | 0 | 0 | 50 | 0 | 0 |
| 4 | 2 | 0 | 26 | 0 | 0 | 42 | 0 |
| 4 | 3 | 0 | 0 | 35 | 0 | 0 | 27 |
| 4 | 4 | 0 | 24 | 15 | 0 | 8 | 23 |
| 5 | 1 | 50 | 0 | 0 | 50 | 0 | 0 |
| 5 | 2 | 0 | 22 | 0 | 0 | 23 | 0 |
| 5 | 3 | 0 | 0 | 30 | 0 | 25 | 4 |
| 5 | 4 | 0 | 0 | 13 | 0 | 0 | 25 |
| 5 | 5 | 0 | 28 | 7 | 0 | 2 | 21 |

Table 2. Results using only Petal Length and Petal Width factors

Using only the Petal Length and Petal Width factors significantly improves the k=3 solution, and the k=5 solution.
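One way to put a number on how much the k=3 solution improves is to score each table by the fraction of points that land in their cluster’s majority species. The sketch below does that in Python for the scaled k=3 rows of Table 1 and Table 2. The function name `purity` and the choice of this score are assumptions made for illustration; the original solution uses R and does not define a single summary score.

```python
def purity(table):
    """Fraction of points that fall in their cluster's majority species."""
    total = sum(sum(row) for row in table)
    majority = sum(max(row) for row in table)
    return majority / total

# Rows are clusters; columns are (setosa, versicolor, virginica).
# Counts are copied from the scaled k=3 rows of Table 1 and Table 2.
all_factors_k3 = [[50, 0, 0], [0, 47, 14], [0, 3, 36]]
petal_only_k3 = [[50, 0, 0], [0, 48, 4], [0, 2, 46]]

print(round(purity(all_factors_k3), 3))  # 0.887
print(round(purity(petal_only_k3), 3))   # 0.96
```

A score like this never decreases when a cluster is split, so it is only a fair comparison between solutions that use the same k.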
Notice that for k=4 especially, using scaled data is a big improvement over using unscaled data. The R code also introduces the ggplot2 library for plotting, just for your learning pleasure; it’s not required for the assignment.

Of course, we can only create the tables above because we happen to know the correct species for each data point. Normally when we’re doing clustering, we don’t have that information. Instead, we can look at a measure like the total distance between points and their cluster centers in each clustering solution, as shown in the elbow diagram below for scaled data using only the petal factors.

Figure 1. Elbow diagram for scaled data using only petal factors.

Based on this figure, the 3-cluster solution might be the one we would recommend, since k=3 is where the improvements level out.

Question 3

Using crime data from http://www.statsci.org/data/general/uscrime.txt (description at http://www.statsci.org/data/general/uscrime.html), test to see whether there is an outlier in the last column (number of crimes per 100,000 people). Is the lowest-crime city an outlier? Is the highest-crime city an outlier? Use the grubbs.test function in the outliers package in R.

Here’s one possible solution. Please note that a good solution doesn’t have to try all of the possibilities in the code; they’re shown to help you learn, but they’re not necessary.

The file HW2-Q3.R contains R code and some explanation for the following approach. First, because the Grubbs test assumes normality, we start by running a normality test that you’ll probably remember from basic statistics: the Shapiro-Wilk test. The test actually suggests that the data is not normally distributed (p=0.001882). However, looking at the Q-Q plot below, it seems that the reason for the non-normality is the tails, which might imply that the test is affected by potential outliers. The middle of the distribution looks normal, so we’ll go ahead with the Grubbs test.
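The Q-Q comparison described above can be sketched in a few lines of Python: sort the observations and pair each with the quantile a fitted normal distribution would put at that rank. The helper name `qq_points`, the plotting position `(i + 0.5) / n`, and the sample values are all assumptions for illustration; they are not taken from HW2-Q3.R or from the uscrime data.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # (i + 0.5) / n is one common plotting position; other conventions exist.
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

# Hypothetical values, not the uscrime data: a compact bulk plus one large value.
sample = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3, 11.0]
for expected, observed in qq_points(sample):
    print(f"{expected:6.2f}  {observed:6.2f}")
```

Points that track the line observed = expected in the middle but pull away at the ends correspond to the "normal middle, non-normal tails" pattern discussed here; in this made-up sample the largest value sits well above its expected quantile.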
Figure 2. Q-Q plot of the Crime column.

Note here that this is really a judgment call. On the one hand, it could be that the Shapiro-Wilk test is identifying that the tails, especially on the upper end, are really not normally distributed, enough so that the extreme values aren’t really outliers; they’re just part of the distribution. On the other hand, it could be that the distribution really is close enough to normal, and the reason it fails the Shapiro-Wilk test is that there’s outlying data. The Grubbs test’s validity depends on which of these is closer to true.

In this case, let’s go on with the Grubbs test. At worst, it’ll either show that there aren’t outliers, or it’ll identify potential outliers. Then we would (if this were more than a homework assignment) investigate those data points more carefully to see what’s going on, to determine whether they seem like a real part of the distribution or whether they’re real outliers.
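For readers who want to see what the Grubbs test measures, here is a minimal Python sketch of its statistic: the distance of the most extreme value from the sample mean, in units of the sample standard deviation. The function names and the sample values are hypothetical, and this is not the code in HW2-Q3.R, which calls grubbs.test in R. The second helper converts a t critical value (with n - 2 degrees of freedom, looked up at level alpha/n for a one-sided test) into the threshold for G; the t value itself has to come from a table or a statistics library, since it is not computed here.

```python
from math import sqrt
from statistics import mean, stdev

def grubbs_statistic(values, tail="max"):
    """Return G = |extreme - mean| / s for the largest or smallest value."""
    m, s = mean(values), stdev(values)
    extreme = max(values) if tail == "max" else min(values)
    return abs(extreme - m) / s

def grubbs_critical(n, t_crit):
    """Convert a t critical value (n - 2 degrees of freedom) into the G threshold."""
    return (n - 1) / sqrt(n) * sqrt(t_crit**2 / (n - 2 + t_crit**2))

# Hypothetical values, not the uscrime data.
values = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3, 11.0]
print(grubbs_statistic(values, "max"))  # statistic for the highest value
print(grubbs_statistic(values, "min"))  # statistic for the lowest value
```

Running the statistic separately for the maximum and the minimum mirrors the two questions in the assignment (is the highest value an outlier, is the lowest). A value is flagged only if its G exceeds the threshold for the chosen significance level.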


**About the document**

Uploaded on: Sep 03, 2022

Number of pages: 10
