CS 234 Winter 2022: Assignment #2

Introduction

In this assignment we will implement deep Q-learning, following DeepMind's papers ([1] and [2]) on learning to play Atari games from raw pixels. The purpose is to demonstrate the effectiveness of deep neural networks, as well as some of the techniques used in practice to stabilize training and achieve better performance. In the process, you'll become familiar with PyTorch. We will train our networks on the Pong-v0 environment from OpenAI gym, but the code can easily be applied to any other environment.

In Pong, one player scores if the ball passes by the other player. An episode is over when one of the players reaches 21 points. Thus, the total return of an episode is between −21 (lost every point) and +21 (won every point). Our agent plays against a decent hard-coded AI player. Average human performance is −3 (reported in [2]). In this assignment, you will train an AI agent with super-human performance, reaching at least +10 (hopefully more!).

0 Distributions induced by a policy (13 pts)

In this problem, we'll work with an infinite-horizon MDP M = ⟨S, A, R, T, γ⟩ and consider stochastic policies of the form π : S → ∆(A), where ∆(A) denotes the set of probability distributions over A. Additionally, we'll assume that M has a single, fixed starting state s0 ∈ S for simplicity.
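To make the setup in Problem 0 concrete, here is a minimal sketch of an MDP with a fixed start state and a stochastic policy π : S → ∆(A). Everything specific in it is an assumption made for illustration and is not taken from the assignment: the two-state, two-action toy MDP, the tables `T`, `R`, and `PI`, the discount `GAMMA = 0.9`, and the helper names. Because a sampled trajectory has to be finite, the infinite horizon is truncated to a caller-chosen length.

```python
import random

# Hypothetical toy MDP (illustration only, not from the assignment).
STATES = [0, 1]
ACTIONS = [0, 1]
GAMMA = 0.9   # assumed discount factor
S0 = 0        # single, fixed starting state

# T[s][a] is a probability distribution over next states.
T = {
    0: {0: [0.8, 0.2], 1: [0.1, 0.9]},
    1: {0: [0.5, 0.5], 1: [0.0, 1.0]},
}

# R[s][a] is the (deterministic) reward for taking action a in state s.
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 2.0, 1: 0.0},
}

# Stochastic policy: PI[s] is a probability distribution over actions.
PI = {
    0: [0.5, 0.5],
    1: [0.9, 0.1],
}


def sample_trajectory(horizon, rng):
    """Sample (state, action, reward) triples by following PI from S0.

    The horizon is finite here only so that sampling terminates.
    """
    s = S0
    trajectory = []
    for _ in range(horizon):
        a = rng.choices(ACTIONS, weights=PI[s])[0]    # a ~ pi(. | s)
        r = R[s][a]
        s_next = rng.choices(STATES, weights=T[s][a])[0]  # s' ~ T(. | s, a)
        trajectory.append((s, a, r))
        s = s_next
    return trajectory


def discounted_return(trajectory, gamma=GAMMA):
    """Sum of gamma^t * r_t over the sampled (truncated) trajectory."""
    return sum((gamma ** t) * r for t, (_, _, r) in enumerate(trajectory))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    traj = sample_trajectory(horizon=10, rng=rng)
    print(traj)
    print(discounted_return(traj))
```

Sampling many such trajectories gives an empirical view of the distribution over state-action sequences that the policy induces; every trajectory starts in `S0` because the start state is fixed.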