Spaces: awacke1/Transcript-AI-Learner-From-Youtube


awacke1 committed on Apr 15, 2023

Commit

ed205d3

1 Parent(s): 1b7e859

Create TwoTranscriptQuotesFromIlyaSutskever.md


Files changed (1)

TwoTranscriptQuotesFromIlyaSutskever.md +69 -0

TwoTranscriptQuotesFromIlyaSutskever.md ADDED

	@@ -0,0 +1,69 @@

+1:42
+program that does very very well on your data then you will achieve the best
+1:48
+generalization possible with a little bit of modification you can turn it into a precise theorem
+1:54
+and on a very intuitive level it's easy to see why it should be the case if you
+2:01
+have some data and you're able to find a shorter program which generates this
+2:06
+data then you've essentially extracted all conceivable regularity from
+2:11
+this data into your program and then you can use these objects to make the best predictions possible like if you have
+2:19
+data which is so complex that there is no way to express it as a shorter program
+2:25
+then it means that your data is totally random there is no way to extract any regularity from it whatsoever now there
+2:32
+is a little-known mathematical theory behind this and the proofs of these statements are actually not even that hard
+2:38
+but the one minor slight disappointment is that it's actually not possible at
+2:44
+least given today's tools and understanding to find the best short program that
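The link between "a shorter program" and "regularity" in the quote above can be illustrated with a rough sketch. Finding the truly shortest program is not computable in general, so this example swaps in an off-the-shelf compressor (`zlib`) as a crude, computable stand-in; that substitution, the helper name `compressed_ratio`, and the two sample inputs are assumptions made for illustration and do not come from the talk.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Return compressed size / original size (smaller means more regularity was found)."""
    return len(zlib.compress(data, 9)) / len(data)


# 10,000 bytes with an obvious repeating pattern: a very short description exists.
regular = b"abcd" * 2500

# 10,000 bytes from the OS entropy source: no pattern for the compressor to exploit.
random_bytes = os.urandom(10_000)

regular_ratio = compressed_ratio(regular)
random_ratio = compressed_ratio(random_bytes)

print(f"regular data compresses to {regular_ratio:.1%} of its size")
print(f"random data compresses to {random_ratio:.1%} of its size")
```

The patterned input shrinks to a small fraction of its size, while the random input does not shrink at all, which mirrors the claim that data with no shorter description has no regularity to extract.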
+5
+to talk a little bit about reinforcement learning so reinforcement learning is a framework it's a framework for evaluating
+6:53
+agents in their ability to achieve goals in complicated stochastic environments
+6:58
+you've got an agent which is plugged into an environment as shown in the figure right here and for any given
+7:06
+agent you can simply run it many times and compute its average reward now the
+7:13
+thing that's interesting about the reinforcement learning framework is that there exist interesting useful
+7:20
+reinforcement learning algorithms the framework existed for a long time it
+7:25
+became interesting once we realized that good algorithms exist now these are not perfect algorithms but they
+7:31
+are good enough to do interesting things and the mathematical
+7:37
+problem is one where you need to maximize the expected reward now one
+7:44
+important way in which the reinforcement learning framework is not quite complete is that it assumes that the reward is
+7:50
+given by the environment you see this picture the agent sends an action while
+7:56
+the environment sends it back both the observation and the reward that's what the environment
+8:01
+communicates back the way in which this is not the case in the real world is that we figure out
+8:11
+what the reward is from the observation we reward ourselves we are not told
+8:16
+the environment doesn't say hey here's some negative reward it's our interpretation of our senses that lets us determine what
+8:23
+the reward is and there is only one real true reward in life and this is
+8:28
+existence or nonexistence and everything else is a corollary of that so well what
+8:35
+should our agent be you already know the answer should be a neural network because whenever you want to do
+8:41
+something dense it's going to be a neural network and you want the agent to map observations to actions so you let
+8:47
+it be parametrized with a neural net and you apply a learning algorithm so I want to explain to you how reinforcement
+8:53
+learning works this is model-free reinforcement learning the reinforcement learning that has actually been used in practice everywhere but it's
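The evaluation step described in the second quote (plug an agent into a stochastic environment, run it many times, and compute its average reward) might be sketched as follows. The toy environment, its reward probabilities, and the names `run_episode`, `average_reward`, `copy_policy`, and `constant_policy` are all hypothetical and chosen only for illustration. The policies here are plain functions rather than neural networks, and no learning algorithm is applied.

```python
import random
from typing import Callable

# A policy maps an observation to an action (both are 0 or 1 in this toy setup).
Policy = Callable[[int], int]


def run_episode(policy: Policy, rng: random.Random, steps: int = 20) -> float:
    """Run one episode in a toy stochastic environment and return the total reward."""
    total = 0.0
    for _ in range(steps):
        observation = rng.randint(0, 1)  # the environment emits an observation
        action = policy(observation)     # the agent responds with an action
        # The environment, not the agent, hands out the reward, and does so noisily:
        # a matching action pays off with probability 0.8, a mismatch with 0.1.
        p_reward = 0.8 if action == observation else 0.1
        total += 1.0 if rng.random() < p_reward else 0.0
    return total


def average_reward(policy: Policy, episodes: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected total reward per episode."""
    rng = random.Random(seed)
    return sum(run_episode(policy, rng) for _ in range(episodes)) / episodes


def copy_policy(observation: int) -> int:
    """Always answer with the observation itself."""
    return observation


def constant_policy(observation: int) -> int:
    """Ignore the observation and always answer 0."""
    return 0


copy_avg = average_reward(copy_policy)
constant_avg = average_reward(constant_policy)

print(f"copy policy average reward:     {copy_avg:.2f}")
print(f"constant policy average reward: {constant_avg:.2f}")
```

Under these assumed probabilities the expected total reward per 20-step episode is 16 for the copying policy and 9 for the constant one, so the averaged estimates should land near those values. A learning algorithm would search over policies to push this average upward.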