Create TwoTranscriptQuotesFromIlyaSutskever.md
TwoTranscriptQuotesFromIlyaSutskever.md
ADDED
@@ -0,0 +1,69 @@
1:42
program that does very very well on your data then you will achieve the best
1:48
generalization possible with a little bit of modification you can turn it into a precise theorem
1:54
and on a very intuitive level it's easy to see why it should be the case if you
2:01
have some data and you're able to find a shorter program which generates this
2:06
data then you've essentially extracted all conceivable regularity from
2:11
this data into your program and then you can use this object to make the best predictions possible like if you have
2:19
data which is so complex that there is no way to express it as a shorter program
2:25
then it means that your data is totally random there is no way to extract any regularity from it whatsoever now there
2:32
is a little-known mathematical theory behind this and the proofs of these statements are actually not even that hard
2:38
but the one minor slight disappointment is that it's actually not possible at
2:44
least given today's tools and understanding to find the best short program that

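The intuition in the excerpt above (data that a shorter program can generate contains regularity, while data with no shorter description is effectively random) can be illustrated with a rough sketch. This is not the construction from the talk: finding the truly shortest program is not feasible, as the speaker notes, so the sketch assumes an off-the-shelf compressor (Python's `zlib`) as a crude stand-in. The function name `compressed_ratio` and the two sample inputs are hypothetical and chosen only for illustration.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size.

    A lower ratio means the compressor found more regularity.
    zlib is only a rough proxy for "the shortest program".
    """
    return len(zlib.compress(data, 9)) / len(data)


# Highly regular data: a 10-byte pattern repeated 1,000 times.
regular = b"0123456789" * 1000

# Data with no regularity to extract: 10,000 bytes from the OS random source.
random_bytes = os.urandom(10_000)

regular_ratio = compressed_ratio(regular)
random_ratio = compressed_ratio(random_bytes)

print(f"regular data ratio: {regular_ratio:.3f}")
print(f"random data ratio:  {random_ratio:.3f}")
```

The regular input should shrink to a small fraction of its size, while the random input should stay at about its original size, which matches the claim that incompressible data offers no regularity to exploit.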
5
to talk a little bit about reinforcement learning so reinforcement learning is a framework it's a framework of evaluating
6:53
agents in their ability to achieve goals in complicated stochastic environments
6:58
you've got an agent which is plugged into an environment as shown in the figure right here and for any given
7:06
agent you can simply run it many times and compute its average reward now the
7:13
thing that's interesting about the reinforcement learning framework is that there exist interesting useful
7:20
reinforcement learning algorithms the framework existed for a long time it
7:25
became interesting once we realized that good algorithms exist now these are not perfect algorithms but they
7:31
are good enough to do interesting things and the mathematical
7:37
problem is one where you need to maximize the expected reward now one
7:44
important way in which the reinforcement learning framework is not quite complete is that it assumes that the reward is
7:50
given by the environment you see this picture the agent sends an action while
7:56
the environment sends it both the observation and the reward back that's what the environment
8:01
communicates back the way in which this is not the case in the real world is that we figure out
8:11
what the reward is from the observation we reward ourselves we are not told
8:16
the environment doesn't say hey here's some negative reward it's our interpretation of our senses that lets us determine what
8:23
the reward is and there is only one real true reward in life and this is
8:28
existence or nonexistence and everything else is a corollary of that so well what
8:35
should our agent be you already know the answer should be a neural network because whenever you want to do
8:41
something dense it's going to be a neural network and you want the agent to map observations to actions so you let
8:47
it be parametrized with a neural net and you apply a learning algorithm so I want to explain to you how reinforcement
8:53
learning works this is model free reinforcement learning the reinforcement learning that has actually been used in practice everywhere but it's
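
The evaluation step described in this excerpt (run a given agent many times and compute its average reward, with the goal of maximizing expected reward) can be sketched as follows. Everything here is an assumption made for illustration and does not come from the talk: the one-step environment, its payoff probabilities (0.2 and 0.8), and the names `run_episode`, `average_reward`, `always_zero`, `always_one`, and `coin_flip` are all hypothetical. The agents are fixed rules, not neural networks, and no learning takes place.

```python
import random


def run_episode(agent, rng: random.Random) -> float:
    """Run one episode of a toy one-step stochastic environment.

    Assumed payoffs: action 0 yields reward 1 with probability 0.2,
    action 1 yields reward 1 with probability 0.8, otherwise reward 0.
    """
    action = agent(rng)
    success_prob = 0.8 if action == 1 else 0.2
    return 1.0 if rng.random() < success_prob else 0.0


def average_reward(agent, episodes: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the agent's expected reward."""
    rng = random.Random(seed)
    total = sum(run_episode(agent, rng) for _ in range(episodes))
    return total / episodes


def always_zero(rng: random.Random) -> int:
    """Agent that always picks action 0."""
    return 0


def always_one(rng: random.Random) -> int:
    """Agent that always picks action 1."""
    return 1


def coin_flip(rng: random.Random) -> int:
    """Agent that picks an action uniformly at random."""
    return rng.randrange(2)


for agent in (always_zero, always_one, coin_flip):
    print(agent.__name__, round(average_reward(agent), 3))
```

In this toy setting the estimates should land near 0.2, 0.8, and 0.5 respectively, so the agent that always picks action 1 has the highest expected reward. A learning algorithm would search for such an agent instead of having it written by hand.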