
University of California, Berkeley DS 100 sp18_hw2_solution.ipynb at master DS-100_sp18 GitHub


Source: sp18/hw2_solution.ipynb at master · DS-100/sp18 · GitHub (4/18/2018)
https://github.com/DS-100/sp18/blob/master/hw/hw2/solution/hw2_solution.ipynb

Homework 2: Food Safety

Course Policies

Here are some important course policies. These are also located at http://www.ds100.org/sp18/.

Collaboration Policy

Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually. If you do discuss the assignments with others, please include their names at the top of your solution.

Due Date

This assignment is due at 11:59pm Tuesday, February 6th. Instructions for submission are on the website.

Homework 2: Food Safety

Cleaning and Exploring Data with Pandas

<img src="scoreCard.jpg" width=400>

In this homework, you will investigate restaurant food safety scores for restaurants in San Francisco. Above is a sample score card for a restaurant. The scores and violation information have been made available by the San Francisco Department of Public Health, and we have made these data available to you via the DS 100 repository. The main goal for this assignment is to understand how restaurants are scored. We will walk through the various steps of exploratory data analysis to do this. To give you a sense of how we think about each discovery we make and what next steps it leads to, we will provide comments and insights along the way.

As we clean and explore these data, you will gain practice with:

- Reading simple csv files
- Working with data at different levels of granularity
- Identifying the type of data collected, missing values, anomalies, etc.
- Exploring characteristics and distributions of individual variables

Question 0

To start the assignment, run the cell below to set up some imports and the automatic tests that we will need for this assignment.

In many of these assignments (and your future adventures as a data scientist) you will use os, zipfile, pandas, numpy, matplotlib.pyplot, and seaborn.

1. Import each of these libraries as their commonly used abbreviations (e.g., pd, np, plt, and sns).
2. Don't forget to use the jupyter notebook "magic" to enable inline matplotlib plots (http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-matplotlib).
3. Add the line sns.set() to make your plots look nicer.

In [1]:

    import os
    import zipfile
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    %matplotlib inline
    sns.set()

In [2]:

    import sys

    assert 'zipfile' in sys.modules
    assert 'pandas' in sys.modules and pd
    assert 'numpy' in sys.modules and np
    assert 'matplotlib' in sys.modules and plt
    assert 'seaborn' in sys.modules and sns

Downloading the data

As you saw in lectures, we can download data from the internet with Python. Using the utils.py file from the lectures (see link: http://www.ds100.org/sp18/assets/lectures/lec05/utils.py), define a helper function fetch_and_cache to download the data with the following arguments:

- data_url: the web address to download
- file: the file in which to save the results
- data_dir: (default="data") the location to save the data
- force: if true the file is always re-downloaded

This function should return a pathlib.Path object representing the file.

In [3]:

    import requests
    from pathlib import Path

    def fetch_and_cache(data_url, file, data_dir="data", force=False):
        """
        Download and cache a url and return the file object.

        data_url: the web address to download
        file: the file in which to save the results.
        data_dir: (default="data") the location to save the data
        force: if true the file is always re-downloaded

        return: The pathlib.Path object representing the file.
        """
        ### BEGIN SOLUTION
        data_dir = Path(data_dir)
        data_dir.mkdir(exist_ok=True)
        file_path = data_dir / Path(file)
        # If the file already exists and we want to force a download then
        # delete the file first so that the creation date is correct.
        if force and file_path.exists():
            file_path.unlink()
        if force or not file_path.exists():
            print('Downloading...', end=' ')
            resp = requests.get(data_url)
            with file_path.open('wb') as f:
                f.write(resp.content)
            print('Done!')
        else:
            import time
            # Format the modification time in UTC so it matches the message below
            # (time.ctime would report local time instead).
            last_modified_time = time.asctime(time.gmtime(file_path.stat().st_mtime))
            print("Using cached version last modified (UTC):", last_modified_time)
        return file_path
        ### END SOLUTION

Now use the previously defined function to download the data from the following URL: http://www.ds100.org/sp18/assets/datasets/hw2-SFBusinesses.zip

In [4]:

    data_url = 'http://www.ds100.org/sp18/assets/datasets/hw2-SFBusinesses.zip'
    file_name = 'data.zip'
    data_dir = '.'

    dest_path = fetch_and_cache(data_url=data_url, data_dir=data_dir, file=file_name)
    print('Saved at {}'.format(dest_path))

    Using cached version last modified (UTC): Wed Feb 7 17:46:26 2018
    Saved at data.zip

Loading Food Safety Data

To begin our investigation, we need to understand the structure of the data. Recall this involves answering questions such as:

- Is the data in a standard format or encoding?
- Is the data organized in records?
- What are the fields in each record?

There are 4 files in the data directory. Let's use Python to understand how this data is laid out.

Use the zipfile library to list all the files stored in the zip archive at dest_path. Creating a ZipFile object might be a good start (the Python docs at https://docs.python.org/3/library/zipfile.html have further details).

In [5]:

    # Fill in the list_names variable with a list of all the names of the files in the zip file
    my_zip = ...
    list_names = ...

    ### BEGIN SOLUTION
    my_zip = zipfile.ZipFile(dest_path, 'r')
    list_names = [f.filename for f in my_zip.filelist]
    print(list_names)
    ### END SOLUTION

    ['violations.csv', 'businesses.csv', 'inspections.csv', 'legend.csv']

In [6]:

    assert isinstance(my_zip, zipfile.ZipFile)
    assert isinstance(list_names, list)
    assert all([isinstance(file, str) for file in list_names])

    ### BEGIN HIDDEN TESTS
    assert set(list_names) == set(['violations.csv', 'businesses.csv', 'inspections.csv', 'legend.csv'])
    ### END HIDDEN TESTS

Now display the files' names and their sizes. You might want to check the attributes of a ZipFile object.

In [7]:

    ### BEGIN SOLUTION
    zf = zipfile.ZipFile(dest_path, 'r')
    for file in zf.filelist:
        print('{}\t{}'.format(file.filename, file.file_size))
    ### END SOLUTION

Question 1a

From the above output we see that one of the files is relatively small. Following the HTML notebook (http://www.ds100.org/sp18/assets/lectures/lec03/03-live-datatables-indexes-pandas.html) from Prof. Perez, display the first 5 lines of this file.

In [8]:

    file_to_open = ...

    ### BEGIN SOLUTION
    file_to_open = 'legend.csv'
    with zf.open(file_to_open) as f:
        for i in range(5):
            print(f.readline().rstrip().decode())
    ### END SOLUTION

In [9]:

    assert isinstance(file_to_open, str)

    ### BEGIN HIDDEN TESTS
    assert file_to_open == 'legend.csv'
    ### END HIDDEN TESTS
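The zip-inspection steps above (list the members, compare their sizes, peek at the smallest file) can be tried without downloading anything. The following is a minimal sketch, not part of the original solution: the helper name summarize_zip, the in-memory archive, and the file names small.csv and large.csv with their rows are all made up for illustration and are not the contents of the homework's data.zip. It uses only the standard library's zipfile and io modules.

```python
import io
import zipfile

# Build a tiny in-memory archive so the sketch runs offline.
# NOTE: these file names and rows are hypothetical, not the homework data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("small.csv", "id,label\n1,a\n2,b\n3,c\n")
    archive.writestr(
        "large.csv",
        "id,value\n" + "".join(f"{i},{i * i}\n" for i in range(1000)),
    )


def summarize_zip(zip_source, n_lines=3):
    """Return (sizes, smallest, preview) for a zip archive.

    sizes:    dict mapping member name to uncompressed size in bytes
    smallest: name of the member with the smallest uncompressed size
    preview:  up to n_lines decoded lines from that smallest member
    """
    with zipfile.ZipFile(zip_source, "r") as zf:
        # infolist() yields one ZipInfo per member; file_size is the
        # uncompressed size, the same attribute used in the cells above.
        sizes = {info.filename: info.file_size for info in zf.infolist()}
        smallest = min(sizes, key=sizes.get)
        preview = []
        with zf.open(smallest) as f:
            for _ in range(n_lines):
                line = f.readline()
                if not line:  # stop early if the file has fewer lines
                    break
                preview.append(line.decode("utf-8").rstrip())
    return sizes, smallest, preview


buffer.seek(0)
sizes, smallest, preview = summarize_zip(buffer)
for name, size in sizes.items():
    print("{}\t{}".format(name, size))
print("Smallest member:", smallest)
print("\n".join(preview))
```

With the real download, passing dest_path in place of buffer should work the same way, since zipfile.ZipFile accepts either a path or a file-like object.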

