Georgia Institute of Technology ISYE 6501 Week 10 Homework Solutions - Spring 2021

WEEK 10 HOMEWORK - SAMPLE SOLUTIONS

IMPORTANT NOTE

These homework solutions show multiple approaches and some optional extensions for most of the questions in the assignment. You don’t need to ... submit all this in your assignments; they’re included here just to help you learn more, because the main goal of the homework assignments, and of the entire course, is to help you learn as much as you can and develop your analytics skills as much as possible!

Question 14.1

The breast cancer data set breast-cancer-wisconsin.data.txt from http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/ (description at http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29 ) has missing values.

1. Use the mean/mode imputation method to impute values for the missing data.
2. Use regression to impute values for the missing data.
3. Use regression with perturbation to impute values for the missing data.
4. (Optional) Compare the results and quality of classification models (e.g., SVM, KNN) built using (1) the data sets from questions 1, 2, and 3; (2) the data that remains after data points with missing values are removed; and (3) the data set when a binary variable is introduced to indicate missing values.

Here’s one possible solution. Please note that a good solution doesn’t have to try all of the possibilities in the code; they’re shown to help you learn, but they’re not necessary.

The file solution 14.1.R shows one possible solution. In it, missing data is identified (only variable V7 has any, and it is only a small amount). Five different data sets are created to deal with the missing data:

(1) Replacing missing values with the mode. This could have gone either way (mode or mean). The data is categorical, but it takes integer values from 1 to 10, and as we’ll see later the values seem to have some relative meaning, so they’re also somewhat continuous.
(2) Using regression to estimate missing values. Here too, it could have gone either way (see above), but since we didn’t cover multinomial logistic regression in this course, the solutions treat the data as continuous for this part. Once the missing values are estimated, the estimates are rounded (because the original values are all integers) and values larger or smaller than the extremes are shrunk to the extremes.
(3) Using regression plus perturbation.
(4) Removing rows with missing data.
(5) Adding a binary variable to indicate when data is missing, and adding the necessary interaction variables as well.

Once the data sets have been created, we use KNN (for k=1,2,3,4,5) and SVM (C=0.0001,0.001,0.01,0.1,1,10) to create classification models, and measure their quality ...
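To make the first three imputation approaches concrete, here is a minimal sketch in Python using only the standard library. It is not the code from solution 14.1.R, which is written in R and is not reproduced here. The helper names (solve_linear, fit_ols, impute), the toy rows, and the choice to draw the perturbation from a normal distribution whose standard deviation is the regression's residual standard deviation are all assumptions made for illustration. On the real data one would presumably pass only the predictor columns, leaving out the sample ID and the class label.

```python
import random
import statistics


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(X, y):
    """Ordinary least squares via the normal equations.

    Returns (beta, sd): beta[0] is the intercept, sd the residual std. dev.
    """
    Xb = [[1.0] + list(r) for r in X]
    p = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xb, y)) for i in range(p)]
    beta = solve_linear(XtX, Xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(Xb, y)]
    sd = (sum(e * e for e in resid) / max(len(y) - p, 1)) ** 0.5
    return beta, sd


def impute(rows, col, method, lo=1, hi=10, seed=0):
    """Return a copy of rows with None in column `col` filled in.

    method: "mode", "regression", or "perturb" (regression plus noise).
    Regression estimates are rounded and clamped to [lo, hi].
    """
    rng = random.Random(seed)
    known = [r for r in rows if r[col] is not None]

    def others(r):
        return [v for j, v in enumerate(r) if j != col]

    if method == "mode":
        mode = statistics.mode([r[col] for r in known])
    else:
        beta, sd = fit_ols([others(r) for r in known], [r[col] for r in known])

    def predict(r):
        if method == "mode":
            return mode
        est = beta[0] + sum(b * x for b, x in zip(beta[1:], others(r)))
        if method == "perturb":
            est += rng.gauss(0.0, sd)  # assumed noise model, see lead-in
        return min(hi, max(lo, round(est)))

    return [r[:] if r[col] is not None else r[:col] + [predict(r)] + r[col + 1:]
            for r in rows]


# Toy data (hypothetical): the third column equals the sum of the first two,
# and the last row is missing its third value.
rows = [[1, 1, 2], [2, 1, 3], [3, 2, 5], [4, 2, 6], [5, 5, 10], [2, 3, 5],
        [3, 1, None]]

by_mode = impute(rows, 2, "mode")
by_regression = impute(rows, 2, "regression")
by_perturbation = impute(rows, 2, "perturb")
```

Each call returns a new list and leaves rows unchanged, so the three imputed data sets can be compared side by side, or against the data set with the incomplete row dropped.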

Uploaded on: Sep 27, 2021
Number of pages: 7
Seller: Dr Medina Reed
