awacke1 committed on
Commit
ed205d3
·
1 Parent(s): 1b7e859

Create TwoTranscriptQuotesFromIlyaSutskever.md

TwoTranscriptQuotesFromIlyaSutskever.md ADDED
@@ -0,0 +1,69 @@
+
+ 1:42
+ program that does very, very well on your data, then you will achieve the best
+ 1:48
+ generalization possible. With a little bit of modification you can turn it into a precise theorem,
+ 1:54
+ and on a very intuitive level it's easy to see why it should be the case. If you
+ 2:01
+ have some data and you're able to find a shorter program which generates this
+ 2:06
+ data, then you've essentially extracted all conceivable regularity from
+ 2:11
+ this data into your program, and then you can use this object to make the best predictions possible. If you have
+ 2:19
+ data which is so complex that there is no way to express it as a shorter program,
+ 2:25
+ then it means that your data is totally random; there is no way to extract any regularity from it whatsoever. Now, there
+ 2:32
+ is a little-known mathematical theory behind this, and the proofs of these statements are actually not even that hard,
+ 2:38
+ but the one minor, slight disappointment is that it's actually not possible, at
+ 2:44
+ least given today's tools and understanding, to find the best short program that
+
+
+
+
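The intuition in the first quote (data that admits a shorter description contains regularity, while data that admits none is effectively random) can be illustrated with an off-the-shelf compressor. This is only a rough sketch: `zlib` is a crude, computable stand-in for "the shortest program", not the best short program the speaker refers to, and the helper name `compression_ratio` and the sample data are assumptions made for illustration.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size.

    A ratio well below 1 means the compressor found regularity;
    a ratio near (or slightly above) 1 means it found none.
    """
    return len(zlib.compress(data, 9)) / len(data)


# Highly regular data: a short pattern repeated many times.
regular = b"abc" * 3000

# Data from the OS random source: no structure for zlib to exploit.
random_bytes = os.urandom(9000)

print("regular:", compression_ratio(regular))
print("random: ", compression_ratio(random_bytes))
```

The regular input shrinks to a small fraction of its size, whereas the random input does not shrink at all. A general-purpose compressor only gives an upper bound on description length, so a poor ratio does not prove that no shorter program exists.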
+ 5
+ to talk a little bit about reinforcement learning. So reinforcement learning is a framework; it's a framework for evaluating
+ 6:53
+ agents in their ability to achieve goals in complicated stochastic environments.
+ 6:58
+ You've got an agent which is plugged into an environment, as shown in the figure right here, and for any given
+ 7:06
+ agent you can simply run it many times and compute its average reward. Now, the
+ 7:13
+ thing that's interesting about the reinforcement learning framework is that there exist interesting, useful
+ 7:20
+ reinforcement learning algorithms. The framework existed for a long time; it
+ 7:25
+ became interesting once we realized that good algorithms exist. Now, these are not perfect algorithms, but they
+ 7:31
+ are good enough to do interesting things, and the mathematical
+ 7:37
+ problem is one where you need to maximize the expected reward. Now, one
+ 7:44
+ important way in which the reinforcement learning framework is not quite complete is that it assumes that the reward is
+ 7:50
+ given by the environment. You see this picture: the agent sends an action, while
+ 7:56
+ the environment sends it both the observation and the reward backwards. That's what the environment
+ 8:01
+ communicates back. The way in which this is not the case in the real world is that we figure out
+ 8:11
+ what the reward is from the observation. We reward ourselves; we are not told.
+ 8:16
+ The environment doesn't say, "Hey, here's some negative reward." It's our interpretation of our senses that lets us determine what
+ 8:23
+ the reward is, and there is only one real, true reward in life, and this is
+ 8:28
+ existence or nonexistence, and everything else is a corollary of that. So, well, what
+ 8:35
+ should our agent be? You already know the answer: it should be a neural network, because whenever you want to do
+ 8:41
+ something, the answer is going to be a neural network. And you want the agent to map observations to actions, so you let
+ 8:47
+ it be parametrized with a neural net and you apply a learning algorithm. So I want to explain to you how reinforcement
+ 8:53
+ learning works. This is model-free reinforcement learning, the reinforcement learning that has actually been used in practice everywhere, but it's
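The evaluation loop described in the second quote (plug an agent into a stochastic environment, run it many times, and compute its average reward) can be sketched as follows. Everything here is a hypothetical toy: the `NoisyMatchEnv` environment, its reward probabilities, and the two hand-written agents are assumptions for illustration, and the agents are plain functions rather than the neural network the speaker recommends. As in the framework described, the environment returns both the observation and the reward.

```python
import random
from typing import Callable, Tuple

# An agent maps an observation to an action (both are 0 or 1 here).
Agent = Callable[[int], int]


class NoisyMatchEnv:
    """Hypothetical toy environment with noisy rewards.

    Each step shows an observation in {0, 1}. Matching it with the
    action yields reward 1 with probability 0.8, otherwise 0.2.
    """

    def __init__(self, rng: random.Random, horizon: int = 10) -> None:
        self.rng = rng
        self.horizon = horizon
        self.t = 0
        self.obs = 0

    def reset(self) -> int:
        self.t = 0
        self.obs = self.rng.randint(0, 1)
        return self.obs

    def step(self, action: int) -> Tuple[int, float, bool]:
        p = 0.8 if action == self.obs else 0.2
        reward = 1.0 if self.rng.random() < p else 0.0
        self.t += 1
        done = self.t >= self.horizon
        self.obs = self.rng.randint(0, 1)
        # The environment sends back both the observation and the reward.
        return self.obs, reward, done


def average_reward(agent: Agent, episodes: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected total reward per episode."""
    env = NoisyMatchEnv(random.Random(seed))
    total = 0.0
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            obs, reward, done = env.step(agent(obs))
            total += reward
    return total / episodes


def matching_agent(obs: int) -> int:
    return obs  # uses the observation


def constant_agent(obs: int) -> int:
    return 0  # ignores the observation


print("matching:", average_reward(matching_agent))
print("constant:", average_reward(constant_agent))
```

Under these assumed probabilities the matching agent should average about 8 reward per 10-step episode and the constant agent about 5. Maximizing this estimated quantity over the agent's parameters is the optimization problem the quote refers to.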