WEBVTT
0:00:01.561 --> 0:00:05.186
Okay, so, um.
0:00:08.268 --> 0:00:17.655
Welcome to today's presentation of the second
class on machine translation, where today we'll
0:00:17.655 --> 0:00:25.044
cover a rather specific topic and talk
about linguistic backgrounds.
0:00:26.226 --> 0:00:34.851
We'll cover three different parts of
the lecture.
0:00:35.615 --> 0:00:42.538
We'll first do a very, very brief introduction
to linguistic background: what
0:00:42.538 --> 0:00:49.608
is language, what are ways of describing language,
and what theories are behind it; very, very
0:00:49.608 --> 0:00:50.123
short.
0:00:50.410 --> 0:00:57.669
I don't know; some of you have, I think, listened
to NLP in the last semester or so.
0:00:58.598 --> 0:01:02.553
So there we gave a much longer explanation.
0:01:02.553 --> 0:01:08.862
Here it is kept short, because we are now talking about
machine translation.
0:01:09.109 --> 0:01:15.461
So it's really focused on the parts which
are important when we talk about machine translation.
0:01:15.755 --> 0:01:19.377
Though for everybody who has listened to that
already, it's a bit of a repetition.
0:01:19.377 --> 0:01:19.683
Maybe.
0:01:19.980 --> 0:01:23.415
But it's really trying to look at this:
0:01:23.415 --> 0:01:31.358
what are the properties of languages, and how
can they influence translation.
0:01:31.671 --> 0:01:38.928
We'll use that in the second part to discuss
why machine translation is hard, given what we
0:01:38.928 --> 0:01:40.621
know about language.
0:01:40.940 --> 0:01:47.044
We will see that there are two main things: one
is that languages might express ideas and
0:01:47.044 --> 0:01:53.279
information differently, and if they are expressed
differently in different languages we have to
0:01:53.279 --> 0:01:54.920
somehow do the transfer.
0:01:55.135 --> 0:02:02.771
And it's not purely that we know which words are
used for it; it's not that simple, and it can be very
0:02:02.771 --> 0:02:03.664
different.
0:02:04.084 --> 0:02:10.088
And the other problem we mentioned last time
about biases is that there's not always the
0:02:10.088 --> 0:02:12.179
same amount of information in both languages.
0:02:12.592 --> 0:02:18.206
So it can be that there's more information
in the one, or you cannot express that little information
0:02:18.206 --> 0:02:19.039
on the target.
0:02:19.039 --> 0:02:24.264
We had that also, for example, with the example
of the rice plant: in German, we would just
0:02:24.264 --> 0:02:24.820
say rice.
0:02:24.904 --> 0:02:33.178
or in English, while in other languages you
have to distinguish between the rice plant and rice
0:02:33.178 --> 0:02:33.724
as a food.
0:02:34.194 --> 0:02:40.446
And then it's not always possible to directly
infer this from the surface form.
0:02:41.781 --> 0:02:48.501
And if we make it to the last point (otherwise
we'll do that next Tuesday, or we'll partly
0:02:48.501 --> 0:02:55.447
do it there): we'll briefly describe
the three main approaches of rule-based, that is,
0:02:55.447 --> 0:02:59.675
linguistically motivated, ways of doing machine
translation.
0:02:59.779 --> 0:03:03.680
We mentioned them last time like the direct
translation.
0:03:03.680 --> 0:03:10.318
The translation by transfer, and the interlingua-based
approach; we'll do that in a bit more detail today.
0:03:10.590 --> 0:03:27.400
But very briefly, because this is not a focus
of this class, and then more next week.
0:03:29.569 --> 0:03:31.757
Why do we think this is important?
0:03:31.757 --> 0:03:37.259
On the one hand, of course, we are dealing
with natural language, so therefore it might
0:03:37.259 --> 0:03:43.074
be good to spend a bit of time in understanding
what we are really dealing with because this
0:03:43.074 --> 0:03:45.387
is challenging and where the problems come from.
0:03:45.785 --> 0:03:50.890
And on the other hand, this was the first
way of doing machine translation.
0:03:51.271 --> 0:04:01.520
Therefore, it's interesting to understand
what was the idea behind that and also to later
0:04:01.520 --> 0:04:08.922
see what is done differently and to understand
where some models differ.
0:04:13.453 --> 0:04:20.213
When we're talking about linguistics, we can
of course do that on different levels, and there are
0:04:20.213 --> 0:04:21.352
different ways.
0:04:21.521 --> 0:04:26.841
On the right side here you are seeing the
basic levels of linguistics.
0:04:27.007 --> 0:04:31.431
So we have at the bottom the phonetics and
phonology.
0:04:31.431 --> 0:04:38.477
Phones we will not cover this year, because we
are mainly focusing on text input, where we
0:04:38.477 --> 0:04:42.163
directly have characters and then words.
0:04:42.642 --> 0:04:52.646
Then, what we touch on today, at least mentioning
what it is, is morphology, which is the first
0:04:52.646 --> 0:04:53.424
level.
0:04:53.833 --> 0:04:59.654
I already mentioned it a bit on Tuesday that
of course there are some languages where this
0:04:59.654 --> 0:05:05.343
is very, very basic and there are not really
a lot of rules for how you can build words.
0:05:05.343 --> 0:05:11.099
But since I assume you all have some basic
knowledge of German, you know there are a lot more
0:05:11.099 --> 0:05:12.537
challenges than that.
0:05:13.473 --> 0:05:20.030
You know, maybe if you're a native speaker
that's quite easy and everything is clear,
0:05:20.030 --> 0:05:26.969
but if you have to learn it, things like the endings
of a word; we are famous for doing Komposita
0:05:26.969 --> 0:05:29.103
and putting words together.
0:05:29.103 --> 0:05:31.467
So this is like the first level.
0:05:32.332 --> 0:05:40.268
Then we have the syntax, which is both on
the word and on the sentence level, and that's
0:05:40.268 --> 0:05:43.567
about the structure of the sentence.
0:05:43.567 --> 0:05:46.955
What are the functions of some words?
0:05:47.127 --> 0:05:51.757
You might remember part-of-speech tags from
your high school time.
0:05:51.757 --> 0:05:57.481
There are noun and adjective and things
like that, and this is something helpful.
0:05:57.737 --> 0:06:03.933
Just imagine: in the beginning it was
not only used for rule-based but also for statistical
0:06:03.933 --> 0:06:10.538
machine translation, for example, the reordering
between languages was quite a challenging task.
0:06:10.770 --> 0:06:16.330
Especially if you have long range reorderings
and there part-of-speech information is very
0:06:16.330 --> 0:06:16.880
helpful.
0:06:16.880 --> 0:06:20.301
You know, in German you have to move the
verb
0:06:20.260 --> 0:06:26.599
to the second position; if you have Spanish
you have to change the noun and the adjective
0:06:26.599 --> 0:06:30.120
so information from part of speech could be
very helpful.
0:06:30.410 --> 0:06:38.621
Then you have a syntax-based structure where
you have a full syntax tree in the beginning
0:06:38.621 --> 0:06:43.695
and then it came into statistical machine translation.
0:06:44.224 --> 0:06:50.930
And it got more and more important for statistical
machine translation that you are really trying
0:06:50.930 --> 0:06:53.461
to model the whole syntax tree of a
0:06:53.413 --> 0:06:57.574
sentence in order to better match how to do
that in, um,
0:06:57.574 --> 0:07:04.335
the target language. A bit, yeah: the syntax-based
statistical machine translation had a
0:07:04.335 --> 0:07:05.896
bit of a problem.
0:07:05.896 --> 0:07:08.422
It got better and better and was
0:07:08.368 --> 0:07:13.349
just on the way of getting better in some
languages than traditional statistical models.
0:07:13.349 --> 0:07:18.219
But then the neural models came up and they
were just so much better in modelling that
0:07:18.219 --> 0:07:19.115
all implicitly.
0:07:19.339 --> 0:07:23.847
So they never really were used in practice
so much.
0:07:24.304 --> 0:07:34.262
And then we'll talk about the semantics, so
what is the meaning of the words?
0:07:34.262 --> 0:07:40.007
We saw last time that words can have different meanings.
0:07:40.260 --> 0:07:46.033
And yeah, how you represent meaning, of course,
is very challenging.
0:07:45.966 --> 0:07:53.043
And formalizing this is
typically done in quite limited domains, because
0:07:53.043 --> 0:08:00.043
doing that for all possible words
has not really been achieved yet and is very
0:08:00.043 --> 0:08:00.898
challenging.
0:08:02.882 --> 0:08:09.436
About pragmatics, so pragmatics is then what
is meaning in the context of the current situation.
0:08:09.789 --> 0:08:16.202
So one famous example is this one: for example,
you say the light is red.
0:08:16.716 --> 0:08:21.795
The traffic light is red: typically
you don't want to tell the other person,
0:08:21.795 --> 0:08:27.458
if you're sitting in a car, the surprising fact
'oh, the light is red'; typically you mean:
0:08:27.458 --> 0:08:30.668
okay you should stop and you shouldn't pass
the light.
0:08:30.850 --> 0:08:40.994
So that is the meaning of the sentence 'the light
is red' in the context of sitting in the car.
0:08:42.762 --> 0:08:51.080
So let's start with the morphology, since that
is where we are starting, and one
0:08:51.080 --> 0:08:53.977
easy first thing is this:
0:08:53.977 --> 0:09:02.575
of course we have to split the sentence into
words, or join characters, so that we have words.
0:09:02.942 --> 0:09:09.017
Because in most of our work on machine translation
we'll deal with some type of words.
0:09:09.449 --> 0:09:15.970
In neural machine translation, people are also working
on character-based models and subwords, but
0:09:15.970 --> 0:09:20.772
getting the basic units of the sentence is a very
important first step.
0:09:21.421 --> 0:09:32.379
And for many languages that is quite simple;
in German, it's not that hard to determine
0:09:32.379 --> 0:09:33.639
the word boundaries.
0:09:34.234 --> 0:09:46.265
In tokenization, the main challenge, if
we are doing corpus-based methods, is that we are
0:09:46.265 --> 0:09:50.366
also dealing with punctuation as normal words.
0:09:50.770 --> 0:10:06.115
And there of course it's getting a bit more
challenging.
0:10:13.173 --> 0:10:17.426
So that is maybe the main thing where, for
example, if you think of German
0:10:17.426 --> 0:10:19.528
tokenization, it's easy to get every word.
0:10:19.779 --> 0:10:26.159
You split it at a space, but then you would
have the dot at the end joined to the last word,
0:10:26.159 --> 0:10:30.666
and of course you don't want that, because it's
a different word.
0:10:30.666 --> 0:10:37.046
The last word would not be 'go', but 'go.';
what you can do is always split off the dot.
0:10:37.677 --> 0:10:45.390
Can you really always do that, or might it
sometimes be better to keep the dot attached?
0:10:47.807 --> 0:10:51.001
For example, email addresses or abbreviations
here.
0:10:51.001 --> 0:10:56.284
For example, doctor, maybe it doesn't make
sense to split off the dot, because then you
0:10:56.284 --> 0:11:01.382
would assume a new sentence starts here,
but it's just the 'Dr.' dot from doctor.
0:11:01.721 --> 0:11:08.797
Or if you have numbers, like he's the seventh
person, the German 'siebter' written '7.', then you don't want
0:11:08.797 --> 0:11:09.610
to split.
0:11:09.669 --> 0:11:15.333
So there are some things where it could be
a bit more difficult, but it's not really challenging.
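NOTE
A minimal sketch of such rule-based tokenization in Python; the abbreviation list and the patterns here are illustrative assumptions, not a complete tool:

import re
ABBREVIATIONS = {"Dr.", "Prof.", "etc."}  # illustrative, not exhaustive
def tokenize(sentence):
    tokens = []
    for chunk in sentence.split():  # first split at spaces
        # keep the dot attached for abbreviations and ordinals like "7."
        if chunk in ABBREVIATIONS or re.fullmatch(r"\d+\.", chunk):
            tokens.append(chunk)
        elif chunk and chunk[-1] in ".,!?":
            tokens.append(chunk[:-1])  # split off final punctuation
            tokens.append(chunk[-1])
        else:
            tokens.append(chunk)
    return tokens
print(tokenize("Dr. Smith is the 7. person to go."))
# ['Dr.', 'Smith', 'is', 'the', '7.', 'person', 'to', 'go', '.']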
0:11:16.796 --> 0:11:23.318
In other languages it's getting a lot more
challenging, especially in Asian languages
0:11:23.318 --> 0:11:26.882
where often there are no spaces between words.
0:11:27.147 --> 0:11:32.775
So you just have the sequence of characters.
0:11:32.775 --> 0:11:38.403
The quick brown fox jumps over the lazy dog.
0:11:38.999 --> 0:11:44.569
And then it still might be helpful to work
on something like words.
0:11:44.569 --> 0:11:48.009
Then you need to have a bit more complex segmentation.
0:11:48.328 --> 0:11:55.782
And here you see we are again having our typical
problem.
0:11:55.782 --> 0:12:00.408
That means that there is ambiguity.
0:12:00.600 --> 0:12:02.104
So you're seeing here.
0:12:02.104 --> 0:12:08.056
We have exactly the same sequence of characters
or here, but depending on how we split it,
0:12:08.056 --> 0:12:12.437
it means he is your servant or he is the one
who used your things.
0:12:12.437 --> 0:12:15.380
Or here we have round eyes and take the air.
0:12:15.895 --> 0:12:22.953
So then of course yeah this type of tokenization
gets more important because you could introduce
0:12:22.953 --> 0:12:27.756
already errors, and you can imagine if you're
doing it wrong here:
0:12:27.756 --> 0:12:34.086
if you once make a wrong decision it's quite
difficult to recover from it.
0:12:34.634 --> 0:12:47.088
And so in these cases, looking at how we're
doing tokenization is an important issue.
0:12:47.127 --> 0:12:54.424
And then it might be helpful to do things
like character-based models where we treat each
0:12:54.424 --> 0:12:56.228
character as a symbol,
0:12:56.228 --> 0:13:01.803
and, for example, make this decision later
or never really make it.
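NOTE
A toy sketch of why a wrong segmentation decision is hard to recover from, using greedy longest-match over an assumed word list; real segmenters are statistical, this is only illustrative:

LEXICON = {"the", "them", "theme", "me", "men"}
def segment(text):
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest match first
            if text[i:j] in LEXICON:
                tokens.append(text[i:j]); i = j
                break
        else:
            tokens.append(text[i]); i += 1  # unknown character, emit as-is
    return tokens
print(segment("themen"))  # greedy picks ['theme', 'n'] and blocks 'the'+'men'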
0:13:06.306 --> 0:13:12.033
The other thing is that if we have words,
they might not be the optimal unit to
0:13:12.033 --> 0:13:18.155
work with because it can be that we should
look into the internal structure of words because
0:13:18.155 --> 0:13:20.986
if we have a morphologically rich language:
0:13:21.141 --> 0:13:27.100
That means we have a lot of different types
of words, and if you have many different
0:13:27.100 --> 0:13:32.552
types of words, that on the other hand means
of course each of these words we have seen
0:13:32.552 --> 0:13:33.757
very infrequently.
0:13:33.793 --> 0:13:39.681
So if you only have ten words and you have
a large corpus, each word occurs more often.
0:13:39.681 --> 0:13:45.301
If you have three million different words,
then each of them will occur less often.
0:13:45.301 --> 0:13:51.055
As you hopefully know from machine learning,
it's helpful if you have seen each example
0:13:51.055 --> 0:13:51.858
very often.
0:13:52.552 --> 0:13:54.524
And so why does it help?
0:13:54.524 --> 0:13:56.495
Why does this happen?
0:13:56.495 --> 0:14:02.410
Yeah, in some languages we have quite complex
information inside a word.
0:14:02.410 --> 0:14:09.271
So here's a word from Finnish, 'talossaniko'
or something like that, and it means 'in my
0:14:09.271 --> 0:14:10.769
house' as a question.
0:14:11.491 --> 0:14:15.690
So you have all this information attached
to the word.
0:14:16.036 --> 0:14:20.326
And that is of course the extreme case; that's
why typically, for example, Finnish is a
0:14:20.326 --> 0:14:20.831
language
0:14:20.820 --> 0:14:26.725
where machine translation quality is less
good because generating all these different
0:14:26.725 --> 0:14:33.110
morphological variants is a challenge; and
the additional challenge is that, while Finnish is
0:14:33.110 --> 0:14:39.564
not really low-resource, in low-resource
languages you quite often have more difficult
0:14:39.564 --> 0:14:40.388
morphology.
0:14:40.440 --> 0:14:43.949
I mean, English is an example of a relatively
easy one.
0:14:46.066 --> 0:14:54.230
And so in general we can say that words are
composed of morphemes, and morphemes are
0:14:54.230 --> 0:15:03.069
the smallest meaning-carrying units, so normally
it means: all morphemes should have some type
0:15:03.069 --> 0:15:04.218
of meaning.
0:15:04.218 --> 0:15:09.004
For example, a single letter here does not really have a meaning.
0:15:09.289 --> 0:15:12.005
'Un-' has some type of meaning.
0:15:12.005 --> 0:15:14.371
It's changing the meaning.
0:15:14.371 --> 0:15:21.468
The '-ness' has the meaning that it's making,
out of an adjective, a noun; and 'happy' is the stem.
0:15:21.701 --> 0:15:31.215
So each of these parts conveys some meaning,
but you cannot split them further up and still have
0:15:31.215 --> 0:15:32.156
some meaning.
0:15:32.312 --> 0:15:36.589
You see that of course a little bit more is
happening.
0:15:36.589 --> 0:15:43.511
Typically the 'y' is turning into an 'i', so there
can be some variation, but these are typical
0:15:43.511 --> 0:15:46.544
examples of what we have as morphemes.
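NOTE
A naive sketch of splitting off such morphemes; the affix lists and the i-to-y repair are assumptions for illustration (real analyzers use finite-state morphology):

PREFIXES, SUFFIXES = ("un", "re"), ("ness", "ing", "ed")
def morphemes(word):
    parts = []
    for p in PREFIXES:
        if word.startswith(p):
            parts.append(p); word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s):
            stem = word[:-len(s)]
            if stem.endswith("i"):  # undo the y -> i spelling change
                stem = stem[:-1] + "y"
            return parts + [stem, s]
    return parts + [word]
print(morphemes("unhappiness"))  # ['un', 'happy', 'ness']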
0:16:02.963 --> 0:16:08.804
That is, of course, a problem and that's the
question of how you do your splitting.
0:16:08.804 --> 0:16:15.057
But that problem we have anyway always because
even full words can have different meanings
0:16:15.057 --> 0:16:17.806
depending on the context they're used in.
0:16:18.038 --> 0:16:24.328
So we always have to somewhat have a model
which can infer or represent the meaning of
0:16:24.328 --> 0:16:25.557
the word in the context.
0:16:25.825 --> 0:16:30.917
But you are right that this problem might
get even more severe if you're splitting up.
0:16:30.917 --> 0:16:36.126
Therefore, it might not be the best to go
for the very extreme and represent each letter
0:16:36.126 --> 0:16:41.920
and have a model which is only on letters because,
of course, a letter can have a lot of different
0:16:41.920 --> 0:16:44.202
meanings depending on where it's used.
0:16:44.524 --> 0:16:50.061
And yeah, there is no right solution like
what is the right splitting.
0:16:50.061 --> 0:16:56.613
It depends on the language, the application,
and the amount of data you're having.
0:16:56.613 --> 0:17:01.058
For example, typically it means: the less
data you have,
0:17:01.301 --> 0:17:12.351
the more splitting you should do; if you have
more data, then you can better distinguish them.
0:17:13.653 --> 0:17:19.065
Then there are different types of morphemes:
we typically have one stem morpheme; it's
0:17:19.065 --> 0:17:21.746
like 'house' or 'Tisch', so the main meaning.
0:17:21.941 --> 0:17:29.131
And then you can have functional or bound
morphemes, which can be prefix,
0:17:29.131 --> 0:17:34.115
suffix, infix or circumfix, so it can be before,
can be after.
0:17:34.114 --> 0:17:39.416
It can be inside or it can be around it, something
like 'gekauft' there.
0:17:39.416 --> 0:17:45.736
Typically you would say that it's not like
two morphemes, 'ge' and 't', because they both
0:17:45.736 --> 0:17:50.603
describe the function together: 'ge' and 't'
jointly mark the 'kauf'.
0:17:53.733 --> 0:18:01.209
What are people using them for? You can use
them for inflection, to describe something like
0:18:01.209 --> 0:18:03.286
tense, count, person, case.
0:18:04.604 --> 0:18:09.238
That is yeah, if you know German, this is
commonly used in German.
0:18:10.991 --> 0:18:16.749
But of course there are a lot more complicated
things, I think, in some languages.
0:18:16.749 --> 0:18:21.431
I mean, in German the verb only agrees in count and
person with the subject.
0:18:21.431 --> 0:18:27.650
In other languages, for example,
it can also be determined by the first and the
0:18:27.650 --> 0:18:28.698
second object.
0:18:28.908 --> 0:18:35.776
So if you buy an apple or a
house, it's not only that the
0:18:35.776 --> 0:18:43.435
'kauft' depends on me, like in German, but
it can also depend on whether it's an apple
0:18:43.435 --> 0:18:44.492
or a house.
0:18:44.724 --> 0:18:48.305
And then of course you have an exploding number
of word forms.
0:18:49.409 --> 0:19:04.731
Furthermore, it can be used to do derivations
so you can make other types of words from it.
0:19:05.165 --> 0:19:06.254
And then yeah.
0:19:06.254 --> 0:19:12.645
This is compounding: creating new words by joining
them, like rainbow or waterproof; but for example
0:19:12.645 --> 0:19:19.254
in German like 'Einkaufswagen', 'eiskalt' and
so on, where you can do that
0:19:19.254 --> 0:19:22.014
with nouns and, in German, adjectives as well.
0:19:22.282 --> 0:19:29.077
Then of course you might have additional challenges
like the Fugen-element, where you have to add this extra 's'.
0:19:32.452 --> 0:19:39.021
Yeah, then there are of course additional
special things.
0:19:39.639 --> 0:19:48.537
You sometimes have to put extra stuff in because
of phonology; so it's 'dishes' as the plural, not 'dishs'.
0:19:48.537 --> 0:19:56.508
The third person singular in English
is normally 's', but with 'goes', for example, it's
0:19:56.508 --> 0:19:57.249
an 'es'.
0:19:57.277 --> 0:20:04.321
In German you can also have other things:
'Mutter' becomes 'Mütter', so you're changing
0:20:04.321 --> 0:20:11.758
the umlaut in order to express the plural; and
in other languages there is for example vowel harmony,
0:20:11.758 --> 0:20:17.315
where the vowels inside are changing depending
on which form you have.
0:20:17.657 --> 0:20:23.793
Which makes things more difficult, since splitting
a word into its parts doesn't really work anymore.
0:20:23.793 --> 0:20:28.070
So for 'Mutter' and 'Mütter', for example, that
is not really possible.
0:20:28.348 --> 0:20:36.520
The nice thing, more as a
general observation, is that irregular things are
0:20:36.520 --> 0:20:39.492
typically happening with words which occur often.
0:20:39.839 --> 0:20:52.177
So you can have enough examples for them, while
the regular things you can do by some type
0:20:52.177 --> 0:20:53.595
of rules.
0:20:55.655 --> 0:20:57.326
Yeah, this can be done.
0:20:57.557 --> 0:21:02.849
So there are tasks on this: how to do automatic
inflection, how to analyze them.
0:21:02.849 --> 0:21:04.548
So you give it a word, and
0:21:04.548 --> 0:21:10.427
it's telling you what the possible forms are
of that, like how they are built, and so on.
0:21:10.427 --> 0:21:15.654
And for at least the, ah, high-resource languages,
there are a lot of tools for that.
0:21:15.654 --> 0:21:18.463
Of course, if you now want to do that for
0:21:18.558 --> 0:21:24.281
some language which is very low-resourced,
it might be very difficult and there might be
0:21:24.281 --> 0:21:25.492
no tool for them.
0:21:28.368 --> 0:21:37.652
Good; before we are going to the next part
about part of speech, are there any questions
0:21:37.652 --> 0:21:38.382
about this?
0:22:01.781 --> 0:22:03.187
Yeah, we'll come to that a bit.
0:22:03.483 --> 0:22:09.108
So it's a very good and difficult question,
and especially we'll see that later if you
0:22:09.108 --> 0:22:14.994
just put in words it would be very bad because
words are put into neural networks just as
0:22:14.994 --> 0:22:15.844
some digits.
0:22:15.844 --> 0:22:21.534
Each word is mapped onto an integer and you
put it in so it doesn't really know any more
0:22:21.534 --> 0:22:22.908
about the structure.
0:22:23.543 --> 0:22:29.898
What we will see, therefore: the most successful
approach which is mostly done is a subword
0:22:29.898 --> 0:22:34.730
unit, where we split words; but we will come to this later.
0:22:34.730 --> 0:22:40.154
I don't know if you have seen this in advance.
0:22:40.154 --> 0:22:44.256
We'll cover this on Tuesday.
0:22:44.364 --> 0:22:52.316
So there is an algorithm called byte pair
encoding, which is about splitting words into
0:22:52.316 --> 0:22:52.942
parts.
0:22:53.293 --> 0:23:00.078
So it's doing the splitting of words, not
morphologically motivated but more based on
0:23:00.078 --> 0:23:00.916
frequency.
0:23:00.940 --> 0:23:11.312
However, it performs very well and that's
why it's used and there is a bit of correlation.
0:23:11.312 --> 0:23:15.529
Sometimes the count-based splits agree with morphological ones.
0:23:15.695 --> 0:23:20.709
So we're splitting words and we're splitting
especially words which are infrequent and that's
0:23:20.709 --> 0:23:23.962
maybe a good motivation why that's good for
neural networks.
0:23:23.962 --> 0:23:28.709
That means if you have seen a word very often
you don't need to split it and it's easier
0:23:28.709 --> 0:23:30.043
to just process it fast.
0:23:30.690 --> 0:23:39.218
While if you have seen the words infrequently,
it is good to split it into parts so it can
0:23:39.218 --> 0:23:39.593
generalize.
0:23:39.779 --> 0:23:47.729
So there is some way of doing it, but linguists
would say this is not a morphological analysis.
0:23:47.729 --> 0:23:53.837
That is true, but we are splitting words into
parts if they are rarely seen.
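NOTE
A minimal sketch of the byte pair encoding training loop, simplified; the toy corpus and the number of merges are assumptions:

from collections import Counter
vocab = {("l","o","w"): 5, ("l","o","w","e","r"): 2,
         ("n","e","w","e","s","t"): 6, ("w","i","d","e","s","t"): 3}
def best_pair(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        for pair in zip(word, word[1:]):
            pairs[pair] += freq
    return max(pairs, key=pairs.get)  # most frequent adjacent symbol pair
def apply_merge(vocab, pair):
    new_vocab = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1]); i += 2  # merge the pair
            else:
                out.append(word[i]); i += 1
        new_vocab[tuple(out)] = freq
    return new_vocab
for _ in range(4):  # learn 4 merge operations
    pair = best_pair(vocab)
    vocab = apply_merge(vocab, pair)
    print("merged", pair)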
0:23:59.699 --> 0:24:06.324
Yes, so another important thing about words
are the part-of-speech tags.
0:24:06.324 --> 0:24:14.881
These are the common ones: noun, verb, adjective,
adverb, determiner, pronoun, preposition, and
0:24:14.881 --> 0:24:16.077
conjunction.
0:24:16.077 --> 0:24:26.880
There are some more; they are not the same
in all languages, but for example there is this
0:24:26.880 --> 0:24:38.104
universal tag set which tries to define this type
of part-of-speech tags for many languages.
0:24:38.258 --> 0:24:42.018
And then, of course, it's helping you for
generalization.
0:24:42.018 --> 0:24:48.373
There are rules for how a language deals with verbs and
nouns, especially if you look at sentence structure.
0:24:48.688 --> 0:24:55.332
And so if you know the part of speech tag
you can easily generalize and get these
0:24:55.332 --> 0:24:58.459
rules, or apply these rules, as you know:
0:24:58.459 --> 0:25:02.680
The verb in English is always at the second
position.
0:25:03.043 --> 0:25:10.084
So you know how to deal with verbs independently
of which words you are now really looking at.
0:25:12.272 --> 0:25:18.551
And that, again, can be ambiguous.
0:25:18.598 --> 0:25:27.171
So there are some words which can have several
part-of-speech tags.
0:25:27.171 --> 0:25:38.686
An example is the word 'can', which
can be the can of beans or can do something.
0:25:38.959 --> 0:25:46.021
Often this also happens in English with related words.
0:25:46.021 --> 0:25:55.256
'Access' can be the noun 'access' or the verb 'to access' something.
0:25:56.836 --> 0:26:02.877
Most words have only one single part of speech
tag, but there are some where it's a bit more
0:26:02.877 --> 0:26:03.731
challenging.
0:26:03.731 --> 0:26:09.640
The nice thing is: the ones which are ambiguous
are often words which occur more often,
0:26:09.640 --> 0:26:12.858
while for really rare words it's not that common.
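NOTE
A minimal sketch of automatic tagging with an off-the-shelf tool, NLTK here; the download calls are one-time setup and the tags come from the Penn Treebank set:

import nltk
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
tokens = nltk.word_tokenize("I can open the can of beans.")
print(nltk.pos_tag(tokens))
# the ambiguous word gets two different tags in one sentence:
# ('can', 'MD') as modal verb, then ('can', 'NN') as noun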
0:26:13.473 --> 0:26:23.159
If you look at these classes you can distinguish
open classes, where new words can appear, so
0:26:23.159 --> 0:26:25.790
we can invent new nouns.
0:26:26.926 --> 0:26:31.461
But then there are the closed classes, which
are, for example, determiners or pronouns.
0:26:31.461 --> 0:26:35.414
For example, it's not that you can easily
develop your new pronoun.
0:26:35.414 --> 0:26:38.901
So there is a fixed list of pronouns and we
are using that.
0:26:38.901 --> 0:26:44.075
So it's not like tomorrow there is
something happening and then people are using
0:26:44.075 --> 0:26:44.482
a new
0:26:45.085 --> 0:26:52.426
pronoun or new conjunctions, like 'and',
because it's not that you normally invent a
0:26:52.426 --> 0:26:52.834
new one.
0:27:00.120 --> 0:27:03.391
And this is additional to the part-of-speech tags.
0:27:03.391 --> 0:27:09.012
Then some of these part-of-speech classes have
different properties.
0:27:09.389 --> 0:27:21.813
So, for example, for nouns and adjectives
we can have singular and plural. In other languages,
0:27:21.813 --> 0:27:29.351
there is a dual, so that a word is not only
singular or plural, but can also be a
0:27:29.351 --> 0:27:31.257
dual in its meaning.
0:27:31.631 --> 0:27:36.246
You have the gender: masculine, feminine,
neuter we know.
0:27:36.246 --> 0:27:43.912
In other languages there is animate and inanimate,
and you have the cases; in German you have
0:27:43.912 --> 0:27:46.884
nominative, genitive, accusative.
0:27:47.467 --> 0:27:57.201
And then in other languages you also
have more, like Latin with the ablative.
0:27:57.497 --> 0:28:03.729
So there's more, it's just like, yeah;
and there you have no one-to-one correspondence,
0:28:03.729 --> 0:28:09.961
so it can be that there are some cases which
are only in the one language and do not happen
0:28:09.961 --> 0:28:11.519
in the other language.
0:28:13.473 --> 0:28:20.373
For verbs we have tenses of course: like walk,
is walking, walked, have walked, had walked, will
0:28:20.373 --> 0:28:21.560
walk and so on.
0:28:21.560 --> 0:28:28.015
Interestingly for example in Japanese this
can also happen for adjectives though there
0:28:28.015 --> 0:28:32.987
is a difference between something is white
or something was white.
0:28:35.635 --> 0:28:41.496
There is this continuous aspect, which we
don't really have that commonly in German, and
0:28:41.496 --> 0:28:47.423
I guess that's if you're German and learning
English that's something to learn: 'she sings' versus
0:28:47.423 --> 0:28:53.350
she is singing and of course we can express
that but it's not commonly used and normally
0:28:53.350 --> 0:28:55.281
we're not marking this aspect.
0:28:55.455 --> 0:28:57.240
Also about tenses.
0:28:57.240 --> 0:29:05.505
If you use past tense in English you will also
use past tenses in German, so we have similar
0:29:05.505 --> 0:29:09.263
tenses, but the use might be different.
0:29:14.214 --> 0:29:20.710
There is uncertainty, like the mood:
indicative or subjunctive,
0:29:20.710 --> 0:29:26.742
'if he were here'; and there are voices, active and
passive.
0:29:27.607 --> 0:29:34.024
That you know, that is like both in German
and English there, but there is something like
0:29:34.024 --> 0:29:35.628
the middle voice in Greek:
0:29:35.628 --> 0:29:42.555
'I get myself taught'; so there are other phenomena
which might only happen in one language.
0:29:42.762 --> 0:29:50.101
These are, yeah, the different syntactic
structures that you can have in a language,
0:29:50.101 --> 0:29:57.361
and there are the two issues: it might
be that some are only in some languages, others
0:29:57.361 --> 0:29:58.376
don't exist.
0:29:58.358 --> 0:30:05.219
And on the other hand there is also matching,
so it might be that in some situations you
0:30:05.219 --> 0:30:07.224
use different structures.
0:30:10.730 --> 0:30:13.759
The next would be then about semantics.
0:30:13.759 --> 0:30:16.712
Do you have any questions before that?
0:30:19.819 --> 0:30:31.326
I'll just continue, but ask if something is unclear.
Beside the structure, we typically have more
0:30:31.326 --> 0:30:39.863
ambiguities, so it can be that words themselves
have different meanings.
0:30:40.200 --> 0:30:48.115
And we are typically talking about polysemy
and homonymy, where polysemy means that a word
0:30:48.115 --> 0:30:50.637
can have different, related meanings.
0:30:50.690 --> 0:30:58.464
So if you have the English word interest,
it can be that you are interested in something.
0:30:58.598 --> 0:31:07.051
Or it can be the financial interest rate,
but it is somehow related because if you are
0:31:07.051 --> 0:31:11.002
getting some interest rates there is some relation.
0:31:11.531 --> 0:31:18.158
But there is homonymy, where the meanings
really are not related.
0:31:18.458 --> 0:31:24.086
So 'can' and 'can' don't really have anything
in common, so it's really very different.
0:31:24.324 --> 0:31:29.527
And of course that's not completely clear,
so there is not a clear definition; for example,
0:31:29.527 --> 0:31:34.730
for 'bank' you can say it's related,
but others can argue against that; so
0:31:34.730 --> 0:31:39.876
there are some clear cases, like 'interest',
some which are vague, and then there
0:31:39.876 --> 0:31:43.439
are some where it's very clear again that the meanings
are different.
0:31:45.065 --> 0:31:49.994
And in order to translate them, of course,
we might need the context to disambiguate.
0:31:49.994 --> 0:31:54.981
That's typically where we can disambiguate,
and that's not only for lexical semantics,
0:31:54.981 --> 0:32:00.198
that's generally very often that if you want
to disambiguate, context can be very helpful.
0:32:00.198 --> 0:32:03.981
So: in which sentence is it used, what is the general knowledge,
who is speaking?
0:32:04.944 --> 0:32:09.867
You can do that externally by some disambiguation
task.
0:32:09.867 --> 0:32:14.702
A machine translation system will also do it
internally.
0:32:16.156 --> 0:32:21.485
And sometimes you're lucky and you don't need
to do it because you just have the same ambiguity
0:32:21.485 --> 0:32:23.651
in the source and the target language.
0:32:23.651 --> 0:32:26.815
And then it doesn't matter if you think about
the mouse.
0:32:26.815 --> 0:32:31.812
As I said, you don't really need to know if
it's a computer mouse or the living mouse you
0:32:31.812 --> 0:32:36.031
translate from German to English because it
has exactly the same ambiguity.
0:32:40.400 --> 0:32:46.764
There are also relations between words, like
synonyms, antonyms, hyponyms, like the is-
0:32:46.764 --> 0:32:50.019
a relation, and the part-of relation, like door and house.
0:32:50.019 --> 0:32:55.569
Big and small are antonyms, and synonyms are
words which mean something similar.
0:32:56.396 --> 0:33:03.252
There are resources which try to express all
this linguistic information, like WordNet
0:33:03.252 --> 0:33:10.107
or GermaNet, where you have a graph with words
and how they are related to each other.
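NOTE
A minimal sketch of querying WordNet through NLTK; the download is one-time setup and the printed relations are indicative:

import nltk
from nltk.corpus import wordnet as wn
nltk.download("wordnet")
for synset in wn.synsets("interest")[:3]:  # polysemy: several senses per word
    print(synset.name(), "-", synset.definition())
print(wn.synset("dog.n.01").hypernyms())        # "is-a" links, e.g. canine.n.02
print(wn.synset("house.n.01").part_meronyms())  # "part-of" links, parts of a house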
0:33:11.131 --> 0:33:12.602
Which can be helpful.
0:33:12.602 --> 0:33:18.690
Typically these things were more used in tasks
where there is less data; there are a lot
0:33:18.690 --> 0:33:24.510
of tasks in NLP where you have very limited
data because you really need to hand-annotate
0:33:24.510 --> 0:33:24.911
that.
0:33:25.125 --> 0:33:28.024
Machine translation has a big advantage.
0:33:28.024 --> 0:33:31.842
There's naturally a lot of text translated
out there.
0:33:32.212 --> 0:33:39.519
Typically in machine translation we have, compared
to other tasks, a significant amount of data.
0:33:39.519 --> 0:33:46.212
People have looked into integrating WordNet
or things like that, but it is rarely used
0:33:46.212 --> 0:33:49.366
in like commercial systems or something.
0:33:52.692 --> 0:33:55.626
So this was based on the words.
0:33:55.626 --> 0:34:03.877
We have morphology, syntax, and semantics,
and then of course it makes sense to also look
0:34:03.877 --> 0:34:06.169
at the bigger structure.
0:34:06.169 --> 0:34:08.920
That means information on the sentence level.
0:34:08.948 --> 0:34:17.822
Of course, we don't really have morphology
there, because morphology is about the structure
0:34:17.822 --> 0:34:26.104
of words, but we have syntax on the sentence
level and the semantic representation.
0:34:28.548 --> 0:34:35.637
When we are thinking about the sentence structure,
then the sentence is, of course, first a sequence
0:34:35.637 --> 0:34:37.742
of words terminated by a dot.
0:34:37.742 --> 0:34:42.515
Jane bought the house and we can say something
about the structure.
0:34:42.515 --> 0:34:47.077
It's typically subject, verb, and then one
or several objects.
0:34:47.367 --> 0:34:51.996
And the number of objects, for example, is
then determined by the verb.
0:34:52.232 --> 0:34:54.317
It's called the valency.
0:34:54.354 --> 0:35:01.410
So you have intransitive verbs which don't
get any object, it's just to sleep.
0:35:02.622 --> 0:35:05.912
For example, there is no object 'sleeps the bed'.
0:35:05.912 --> 0:35:14.857
You cannot say that. And there are transitive
verbs where you have to put one or more objects,
0:35:14.857 --> 0:35:16.221
and you always have to put them: the
0:35:16.636 --> 0:35:19.248
sentence is not correct if you don't put the
object.
0:35:19.599 --> 0:35:33.909
So if you have 'to buy' something, you have to
say what you bought; or 'give someone something', then.
0:35:34.194 --> 0:35:40.683
Here you see a bit what may be interesting:
the relation between word order and morphology.
0:35:40.683 --> 0:35:47.243
Of course it's not that strong, but for example
in English you always have to first say whom
0:35:47.243 --> 0:35:49.453
you gave it to and then what you gave.
0:35:49.453 --> 0:35:53.304
So the structure is very clear and cannot
be changed.
0:35:54.154 --> 0:36:00.801
German, for example, has a possibility of
determining what you gave and whom you gave
0:36:00.801 --> 0:36:07.913
it, because there is morphology and you can
give 'what you gave' a different form than 'to whom
do what you gave a different form than to whom
0:36:07.913 --> 0:36:08.685
you gave it'.
0:36:11.691 --> 0:36:18.477
And that is a general tendency that if you
have morphology then typically the word order
0:36:18.477 --> 0:36:25.262
is more free, while in English
you cannot express this information through
0:36:25.262 --> 0:36:26.482
the morphology.
0:36:26.706 --> 0:36:30.238
You typically have to express them through
the word order.
0:36:30.238 --> 0:36:32.872
It's not as free; it's more restricted.
0:36:35.015 --> 0:36:40.060
Yeah, the first part is typically the noun
phrase, the subject, and that can not only
0:36:40.060 --> 0:36:43.521
be a single noun, but of course it can be a
longer phrase.
0:36:43.521 --> 0:36:48.860
So if you have Jane the woman, it can be Jane,
it can be the woman, it can a woman, it can
0:36:48.860 --> 0:36:52.791
be the young woman or the young woman who lives
across the street.
0:36:53.073 --> 0:36:56.890
All of these are the subjects, so this can
be already very, very long.
0:36:57.257 --> 0:36:58.921
And this also complicates things.
0:36:58.921 --> 0:37:05.092
The verb is on the second position in a bit
more complicated way because if you have now
0:37:05.092 --> 0:37:11.262
the young woman who lives across the street
runs to somewhere or so then yeah runs is at
0:37:11.262 --> 0:37:16.185
the second position in this tree but the first
position is quite long.
0:37:16.476 --> 0:37:19.277
And so it's not just counting:
0:37:19.277 --> 0:37:22.700
'the second word is always the verb'.
0:37:26.306 --> 0:37:32.681
Additional to these simple things, there's
more complex stuff.
0:37:32.681 --> 0:37:43.104
Jane bought the house from Jim without hesitation,
or Jane bought the house in the posh neighborhood
0:37:43.104 --> 0:37:44.925
across the river.
0:37:45.145 --> 0:37:51.694
And these often lead to additional ambiguities
because it's not always completely clear to
0:37:51.694 --> 0:37:53.565
what this prepositional phrase attaches.
0:37:54.054 --> 0:37:59.076
That we'll see; and you have, of course,
subclauses and so on.
0:38:01.061 --> 0:38:09.926
And then there is a theory behind it which
was very important for rule based machine translation
0:38:09.926 --> 0:38:14.314
because that's exactly what you're doing there.
0:38:14.314 --> 0:38:18.609
You would take the sentence and do the syntactic analysis.
0:38:18.979 --> 0:38:28.432
So we can have these constituents, which
describe the basic parts of the sentence.
0:38:28.468 --> 0:38:35.268
And we can create the sentence structure as
a context-free grammar, which you hopefully
0:38:35.268 --> 0:38:42.223
remember from basic computer science, which
is a tuple of non-terminals, terminal symbols,
0:38:42.223 --> 0:38:44.001
production rules, and
0:38:43.943 --> 0:38:50.218
the start symbol; and you can then describe
a sentence by this phrase structure grammar:
0:38:51.751 --> 0:38:59.628
So a simple example would be something like
that: you have a lexicon, Jane is a noun, Frays
0:38:59.628 --> 0:39:02.367
is a noun, Telescope is a noun.
0:39:02.782 --> 0:39:10.318
And then you have these production rules: a sentence is
a noun phrase and a verb phrase.
0:39:10.318 --> 0:39:18.918
The noun phrase can either be a determiner and a
noun, or it can be a noun phrase and a prepositional
0:39:18.918 --> 0:39:19.628
phrase.
0:39:19.919 --> 0:39:25.569
Or a prepositional phrase and a prepositional
phrase is a preposition and a noun phrase.
0:39:26.426 --> 0:39:27.622
We're looking at this.
0:39:27.622 --> 0:39:30.482
What is the valency of the verb we're describing
here?
0:39:33.513 --> 0:39:36.330
How many objects would in this case the verb
have?
0:39:46.706 --> 0:39:48.810
We're looking at the verb phrase.
0:39:48.810 --> 0:39:54.358
The verb phrase is a verb and a noun phrase,
so one object here, so this would be a
0:39:54.358 --> 0:39:55.378
valency of one.
0:39:55.378 --> 0:40:00.925
If you have intransitive verbs, it would be
'verb phrase is just a verb', and if you have
0:40:00.925 --> 0:40:03.667
two, it would be verb, noun phrase, noun phrase.
0:40:08.088 --> 0:40:15.348
And yeah, the challenge, or
what you have to do, is like this: given a natural
0:40:15.348 --> 0:40:23.657
language sentence, you want to parse it to
get this type of parse tree; you know this from programming languages,
0:40:23.657 --> 0:40:30.198
where you also need to parse the code in order
to get the representation.
0:40:30.330 --> 0:40:39.356
However, there is one challenge if you parse
natural language compared to computer language.
0:40:43.823 --> 0:40:56.209
So there are different ways of how you can
express things, and there can be different parse trees
0:40:56.209 --> 0:41:00.156
belonging to the same input.
0:41:00.740 --> 0:41:05.241
So if you have 'Jane buys a house', that's
an easy example.
0:41:05.241 --> 0:41:07.491
So you do the lexicon look up.
0:41:07.491 --> 0:41:13.806
'Jane' can be a noun phrase, 'buys' is a verb,
a is a determiner, and a house is a noun.
0:41:15.215 --> 0:41:18.098
And then you can now use the grammar rules
of here.
0:41:18.098 --> 0:41:19.594
There is no rule for that.
0:41:20.080 --> 0:41:23.564
Here we have no rules, but here we have a
rule.
0:41:23.564 --> 0:41:27.920
A noun can be a noun phrase, so we have mapped
that to the noun phrase.
0:41:28.268 --> 0:41:34.012
Then we can map this to the verb phrase.
0:41:34.012 --> 0:41:47.510
We have 'verb, noun phrase' to verb phrase, and
then we can map this to a sentence representation.
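NOTE
A minimal sketch of this toy grammar and parse in NLTK; the rules mirror the ones above and their coverage is an assumption:

import nltk
grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> 'Jane' | Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'a' | 'the'
N  -> 'house' | 'telescope'
V  -> 'buys'
P  -> 'with'
""")
parser = nltk.ChartParser(grammar)
for tree in parser.parse("Jane buys a house".split()):
    print(tree)  # (S (NP Jane) (VP (V buys) (NP (Det a) (N house))))
# structural ambiguity: a PP yields two parses (attached to VP or to NP)
trees = list(parser.parse("Jane buys a house with a telescope".split()))
print(len(trees))  # 2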
0:41:49.069 --> 0:41:53.042
We can have that even more complex.
0:41:53.042 --> 0:42:01.431
The woman who won the lottery yesterday bought
the house across the street.
0:42:01.431 --> 0:42:05.515
The structure gets more complicated.
0:42:05.685 --> 0:42:12.103
You now see that the verb phrase is at the
second position, but the noun phrase is
0:42:12.052 --> 0:42:18.655
quite big in here; and the PP phrases, it's
sometimes difficult where to put them because
0:42:18.655 --> 0:42:25.038
they can be put to the noun phrase, but in
other sentences they can also be put to the
0:42:25.038 --> 0:42:25.919
verb phrase.
0:42:36.496 --> 0:42:38.250
Yeah.
0:42:43.883 --> 0:42:50.321
Yes, so then either it can have two tags,
noun or noun phrase, or you can have the extra
0:42:50.321 --> 0:42:50.755
rule.
0:42:50.755 --> 0:42:57.409
The noun phrase can not only be a determiner
and a noun, but it can also be just a noun.
0:42:57.717 --> 0:43:04.360
Then of course either you introduce additional
rules for what is possible, or you have the problem
0:43:04.360 --> 0:43:11.446
that you get parse trees which are not correct,
and then you have to add some type of probability
0:43:11.446 --> 0:43:13.587
for which tree is more probable.
0:43:16.876 --> 0:43:23.280
But of course some things you also can't really
model easily with this type of grammar.
0:43:23.923 --> 0:43:32.095
There, for example, the agreement is not straightforward
to do, so that for subject and verb you can check
0:43:32.095 --> 0:43:38.866
that the number and person
agreement is correct:
0:43:38.866 --> 0:43:41.279
if it's a singular subject,
0:43:41.561 --> 0:43:44.191
it's also a singular verb;
0:43:44.604 --> 0:43:49.242
and if it's a plural subject,
it's a plural verb.
0:43:49.489 --> 0:43:56.519
Things like that are, yeah, like the agreement
between determiner, adjective and noun, so they also
0:43:56.519 --> 0:43:57.717
have to agree.
0:43:57.877 --> 0:44:05.549
Things like that cannot be easily done with
this type of grammar or this subcategorization
0:44:05.549 --> 0:44:13.221
that you check whether the verb is transitive
or intransitive, and that Jane sleeps is OK,
0:44:13.221 --> 0:44:16.340
but Jane sleeps the house is not OK.
0:44:16.436 --> 0:44:21.073
And 'Jane bought the house' is okay, but 'Jane bought'
is not okay.
0:44:23.183 --> 0:44:29.285
Furthermore, this long range dependency might
be difficult and which word orders are allowed
0:44:29.285 --> 0:44:31.056
and which are not allowed.
0:44:31.571 --> 0:44:40.011
This is also not direct; so you can say 'Maria
gibt dem Mann das Buch', 'dem Mann gibt Maria das
0:44:40.011 --> 0:44:47.258
Buch', 'das Buch gibt Maria dem Mann', aber 'Maria
dem Mann gibt das Buch' is somewhat off.
0:44:47.227 --> 0:44:55.191
Yeah, which one of these is possible
and which not is sometimes not possible to model
0:44:55.191 --> 0:44:56.164
simply.
0:44:56.876 --> 0:45:05.842
Therefore, people have done more complex stuff
like this unification grammar and tried to
0:45:05.842 --> 0:45:09.328
model both the categories of the verb
0:45:09.529 --> 0:45:13.367
and the agreement, that it's, e.g., third person and
singular.
0:45:13.367 --> 0:45:20.028
You're joining that so you're annotating this
thing with more information and then you have
0:45:20.028 --> 0:45:25.097
more complex syntactic structures in order
to model also these types.
0:45:28.948 --> 0:45:33.137
Yeah, why is this difficult?
0:45:33.873 --> 0:45:39.783
We have different ambiguities and that makes
it difficult, so words have different part
0:45:39.783 --> 0:45:43.610
of speech tags; and you have 'time flies like
an arrow'.
0:45:43.583 --> 0:45:53.554
It can mean that the animals, the flies, like
an arrow, or it can mean that the time
0:45:53.554 --> 0:45:59.948
is flying very fast, is going away very fast,
like an arrow.
0:46:00.220 --> 0:46:10.473
And if you want to do a parse tree, these two
meanings have different part-of-speech tags,
0:46:10.473 --> 0:46:13.008
so 'flies' is the verb in one reading.
0:46:13.373 --> 0:46:17.999
And of course that is a different semantics,
and so that is very different.
0:46:19.499 --> 0:46:23.361
And otherwise there is structural
0:46:23.243 --> 0:46:32.419
ambiguity, so that some part of the sentence
can have different rules; the famous thing
0:46:32.419 --> 0:46:34.350
is this PP attachment.
0:46:34.514 --> 0:46:39.724
So: the cop saw the burglar with the binoculars.
0:46:39.724 --> 0:46:48.038
Then 'with the binoculars' can be attached to 'saw',
or it can be attached to the burglar.
0:46:48.448 --> 0:46:59.897
And so the first one is more probable:
that he saw the thief with them, and not that the thief
0:46:59.897 --> 0:47:01.570
has the binoculars.
0:47:01.982 --> 0:47:13.356
And this, of course, makes things difficult
while parsing, since the structure is implicitly
0:47:13.356 --> 0:47:16.424
defining the semantics.
0:47:20.120 --> 0:47:29.736
Therefore, we would then go directly to semantics,
but maybe there are some questions about syntax and
0:47:29.736 --> 0:47:31.373
how that works.
0:47:33.113 --> 0:47:46.647
Then we'll do a bit more about semantics,
since so far we only described the structure of the
0:47:46.647 --> 0:47:48.203
sentence.
0:47:48.408 --> 0:47:55.584
And for the meaning of the sentence we typically
have the compositionality of meaning.
0:47:55.584 --> 0:48:03.091
The meaning of the full sentence is determined
by the meaning of the individual words, and
0:48:03.091 --> 0:48:06.308
they together form the meaning of the sentence.
0:48:06.686 --> 0:48:17.936
For words that is partly true but not always;
I mean, for things like rainbow, joining rain
0:48:17.936 --> 0:48:19.086
and bow.
0:48:19.319 --> 0:48:26.020
But this is not always the case, while for sentences
typically that is happening because you can't
0:48:26.020 --> 0:48:30.579
directly determine the full meaning, but you
split it into parts.
0:48:30.590 --> 0:48:36.164
Sometimes only some parts, like 'kick the
bucket', the expression.
0:48:36.164 --> 0:48:43.596
Of course you cannot get the meaning of kick
the bucket by looking at the individual words, or
0:48:43.596 --> 0:48:46.130
in German 'er biss ins Gras'.
0:48:47.207 --> 0:48:53.763
You cannot get that he died by looking at
the individual words of 'biss ins Gras', but together
0:48:53.763 --> 0:48:54.611
they have this meaning.
0:48:55.195 --> 0:49:10.264
And there are different ways of describing
meaning that people have tried; some are more commonly
0:49:10.264 --> 0:49:13.781
used for some tasks.
0:49:14.654 --> 0:49:20.073
We will come to that; the first thing would be something
like first order logic.
0:49:20.073 --> 0:49:27.297
If you have Peter loves Jane then you have
this meaning, and you're having the representation
0:49:27.297 --> 0:49:33.005
that you have a 'loves' predicate between Peter
and Jane, and you try to construct that.
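NOTE
As a small illustration, the target first-order form would be something like loves(peter, jane); a quantified sentence such as "everyone loves Jane" would become: for all x, person(x) implies loves(x, jane). The predicate names here are assumptions for illustration.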
0:49:32.953 --> 0:49:40.606
You will see this is a lot more complex
than only doing syntax, namely also
0:49:40.606 --> 0:49:43.650
doing this type of representation.
0:49:44.164 --> 0:49:47.761
The other thing is to try to do frame semantics.
0:49:47.867 --> 0:49:55.094
That means that you try to represent the knowledge
about the world and you have these ah frames.
0:49:55.094 --> 0:49:58.372
For example, you might have a frame to buy.
0:49:58.418 --> 0:50:05.030
And the meaning is that you have a commercial
transaction.
0:50:05.030 --> 0:50:08.840
You have a person who is selling.
0:50:08.969 --> 0:50:10.725
You have a person who's buying.
0:50:11.411 --> 0:50:16.123
You have something that is purchased, you might
have a price, and so on.
0:50:17.237 --> 0:50:22.698
And then what you are doing in semantic parsing
with frame semantics: you first try to determine
0:50:22.902 --> 0:50:30.494
which frames are occurring in the sentence;
so if it's something with buying, you
0:50:30.494 --> 0:50:33.025
would first try to identify:
0:50:33.025 --> 0:50:40.704
oh, here we have the frame BUY, which does
not always have to be indicated by the verb
0:50:40.704 --> 0:50:42.449
'buy' itself; it can be 'sell' or other words.
0:50:42.582 --> 0:50:52.515
And then you try to find out which elements
of the frame are in the sentence, and try
0:50:52.515 --> 0:50:54.228
to align them.
0:50:56.856 --> 0:51:01.121
Yeah, you have, for example, to buy and sell.
0:51:01.121 --> 0:51:07.239
If you model that with frames, they
have the same elements.
0:51:09.829 --> 0:51:15.018
In addition, over the sentence, you have
also phenomena beyond the sentence level.
0:51:15.018 --> 0:51:20.088
We're coming to this later because it's a
special challenge for machine translation.
0:51:20.088 --> 0:51:22.295
There is, for example, co reference.
0:51:22.295 --> 0:51:27.186
That means if you first mention it, it's like
the President of the United States.
0:51:27.467 --> 0:51:30.107
And later you would refer to him maybe as
he.
0:51:30.510 --> 0:51:36.966
And that is especially challenging in machine
translation because you're not always using
0:51:36.966 --> 0:51:38.114
the same thing.
0:51:38.114 --> 0:51:44.355
Of course, for the president, it's 'he' and
'er' in German, but for other things it might
0:51:44.355 --> 0:51:49.521
be different depending on the gender in the language
that you refer to it in.
0:51:55.435 --> 0:52:03.866
So much for the background; next, we
want to look, based on the knowledge we have
0:52:03.866 --> 0:52:04.345
now, at this:
0:52:04.345 --> 0:52:10.285
why is machine translation difficult? Before that,
are there any more questions?
0:52:16.316 --> 0:52:22.471
The first type of problem is what we refer
to as translation divergences.
0:52:22.471 --> 0:52:30.588
That means that we have the same information
in source and target, but the problem is that
0:52:30.588 --> 0:52:33.442
they are expressed differently.
0:52:33.713 --> 0:52:42.222
So it is not the same way, and we cannot
translate these things easily just by
0:52:42.222 --> 0:52:44.924
simple replacement; it's a bit more complex.
0:52:45.325 --> 0:52:51.324
So an example, if it's only structure: in
English, 'the delicious ...',
0:52:51.324 --> 0:52:59.141
The adjective is before the noun, while in
Spanish you have to put it after the noun,
0:52:59.141 --> 0:53:02.413
and so you have to change the word order.
0:53:02.983 --> 0:53:10.281
So there are different ways of divergence,
so there can be structural divergence, which
0:53:10.281 --> 0:53:10.613
is:
0:53:10.550 --> 0:53:16.121
the word order, so that the order is different;
so in German we have that especially in the
0:53:16.121 --> 0:53:19.451
the subclause; in English in the
subclause
0:53:19.451 --> 0:53:24.718
the verb is also at the second position, while in
German it's at the end, and so you have to
0:53:24.718 --> 0:53:25.506
move it all.
0:53:25.465 --> 0:53:27.222
Um, all over.
0:53:27.487 --> 0:53:32.978
It can be that it's a completely different
grammatical role.
0:53:33.253 --> 0:53:35.080
So:
0:53:35.595 --> 0:53:37.458
You have 'you like her'.
0:53:38.238 --> 0:53:41.472
And, eh, in
0:53:41.261 --> 0:53:47.708
English. In Spanish it's 'ella te gusta', which
means 'she' is now no longer the object,
0:53:47.708 --> 0:53:54.509
but she is the subject here, and 'you' are now accusative,
and the verb is then 'pleases'; so you really
0:53:54.509 --> 0:53:58.689
use a different sentence structure and you
have to change that.
0:53:59.139 --> 0:54:03.624
It can also be a head switch.
0:54:03.624 --> 0:54:09.501
In English you say the baby just ate.
0:54:09.501 --> 0:54:16.771
In Spanish, literally, you say 'the baby finishes eating'.
0:54:16.997 --> 0:54:20.803
So the eating is no longer the verb, but the finishing
is the verb.
0:54:21.241 --> 0:54:30.859
So you have to learn that; you cannot always
have the same structures in your input and
0:54:30.859 --> 0:54:31.764
output.
0:54:36.856 --> 0:54:42.318
There are lexical divergences, like 'to swim across' or 'to cross
swimming.
0:54:43.243 --> 0:54:57.397
You have categorical divergences, like an adjective getting
turned into a noun; so you have, for example, 'to decide' turning into
0:54:57.397 --> 0:55:00.162
'make a decision'.
0:55:00.480 --> 0:55:15.427
That is the one challenge and the even bigger
challenge is referred to as translation mismatches.
0:55:17.017 --> 0:55:19.301
That can be a lexical mismatch.
0:55:19.301 --> 0:55:21.395
That's the fish we talked about.
0:55:21.395 --> 0:55:27.169
If it's like the, the fish you eat or the
fish which is living: those are two different words
0:55:27.169 --> 0:55:27.931
in Spanish.
0:55:28.108 --> 0:55:34.334
And then that's partly sometimes even not
known, so even the human might not be able
0:55:34.334 --> 0:55:34.627
to
0:55:34.774 --> 0:55:40.242
infer it: you maybe need to see the context,
you maybe need to have the sentences around,
0:55:40.242 --> 0:55:45.770
so one problem is that at least traditional
machine translation works on a sentence level,
0:55:45.770 --> 0:55:51.663
so we take each sentence and translate it independently
of everything else, but that's, of course,
0:55:51.663 --> 0:55:52.453
not correct.
0:55:52.532 --> 0:55:59.901
We will look into some ways of looking at and
doing document-based machine translation later.
0:56:00.380 --> 0:56:06.793
Gender information might be a problem:
so in English it's player and you don't know
0:56:06.793 --> 0:56:10.139
if it's Spieler Spielerin or if it's not known.
0:56:10.330 --> 0:56:15.770
But in the English, if you now generate German,
you should know what the reader knows.
0:56:15.770 --> 0:56:21.830
Does he know the gender or does he not know
the gender and then generate the right one?
0:56:22.082 --> 0:56:38.333
So just imagine a commentator if he's talking
about the player and you can see if it's male
0:56:38.333 --> 0:56:40.276
or female.
0:56:40.540 --> 0:56:47.801
So in general the problem is that if you
have less information and you need more information
0:56:47.801 --> 0:56:51.928
in your target, this translation doesn't really
work.
0:56:55.175 --> 0:56:59.180
Another problem is, we just talked about
it:
0:56:59.119 --> 0:57:01.429
the co-reference.
0:57:01.641 --> 0:57:08.818
So if you refer to an object and that can
be across sentence boundaries then you have
0:57:08.818 --> 0:57:14.492
to use the right pronoun and you cannot just
translate the pronoun.
0:57:14.492 --> 0:57:18.581
If the baby does not thrive on raw milk boil
it.
0:57:19.079 --> 0:57:28.279
And if you are now using it and just take
the typical translation, it will be 'es', and that
0:57:28.279 --> 0:57:31.065
will be, ah, wrong.
0:57:31.291 --> 0:57:35.784
No, that will even be right, because it is
'das Baby'.
0:57:35.784 --> 0:57:42.650
Yes, but I mean, you have to determine that
and it might be wrong at some point.
0:57:42.650 --> 0:57:48.753
So getting this, um, yeah, it will be wrong,
yes, that is right, yeah.
0:57:48.908 --> 0:57:55.469
Because in English both, baby and milk,
are referred to by 'it', so if you
0:57:55.469 --> 0:58:02.180
use 'it', it will refer to the first one,
so it's correct; but in German it will be
0:58:02.180 --> 0:58:06.101
'es', and so if you translate it as 'es' it will
be the baby.
0:58:06.546 --> 0:58:13.808
But you have to use 'sie', because milk is feminine,
although that is really very uncommon because
0:58:13.808 --> 0:58:18.037
maybe because the milk is an object and so it should
be neuter.
0:58:18.358 --> 0:58:25.176
Of course, I agree there might be a situation
which is a bit created and not a common thing,
0:58:25.176 --> 0:58:29.062
but you can see that these things are not that
easy.
0:58:29.069 --> 0:58:31.779
Another example is this: Dr.
0:58:31.779 --> 0:58:37.855
McLean often brings his dog Champion to visit
with his patients.
0:58:37.855 --> 0:58:41.594
He loves to give big wet sloppy kisses.
0:58:42.122 --> 0:58:58.371
And there, of course, it's also important
whether 'he' refers to the dog or to the doctor.
0:58:59.779 --> 0:59:11.260
Another challenging example is that we
don't have a fixed language and that was referred
0:59:11.260 --> 0:59:16.501
to in morphology: we can build new words.
0:59:16.496 --> 0:59:23.787
So we can in all languages build new words
by just concatenating parts, like 'Brexit',
0:59:23.787 --> 0:59:30.570
similar coinages. And then, of course,
languages don't exist
0:59:30.570 --> 0:59:31.578
in isolation.
0:59:32.012 --> 0:59:41.591
In German you can now use the word 'download'
somewhere and you can also apply a morphological
0:59:41.591 --> 0:59:43.570
operation on that.
0:59:43.570 --> 0:59:48.152
I guess there is not even one agreed correct form.
0:59:48.508 --> 0:59:55.575
So you have to deal with these things,
and yeah, the same happens in social media.
0:59:55.996 --> 1:00:00.215
This word, maybe most of you have forgotten
it already.
1:00:00.215 --> 1:00:02.517
This was ten years ago or so.
1:00:02.517 --> 1:00:08.885
I don't know, there was a volcano in Iceland,
Eyjafjallajökull, which stopped Europeans from flying around.
1:00:09.929 --> 1:00:14.706
So there are always new words coming up, and
you have to deal with them.
1:00:18.278 --> 1:00:24.041
Yeah, one last thing, so some of these examples
we have seen are a bit artificial.
1:00:24.041 --> 1:00:30.429
So one example that is very commonly used to show
machine translation doesn't really work is 'the box
1:00:30.429 --> 1:00:31.540
was in the pen'.
1:00:32.192 --> 1:00:36.887
And maybe you would be surprised, at least
when you read it.
1:00:36.887 --> 1:00:39.441
How can a box be inside a pen?
1:00:40.320 --> 1:00:44.175
Does anybody have an explanation for that,
how the sentence can still be correct?
1:00:47.367 --> 1:00:51.692
Maybe it's directly clear to you, maybe your
English is good enough, yeah.
1:00:54.654 --> 1:01:07.377
Yes, like at a farm or for small children,
and such an enclosure is also called a pen, like a pen on a
1:01:07.377 --> 1:01:08.254
farm.
1:01:08.368 --> 1:01:12.056
And then, so you see, okay:
1:01:12.056 --> 1:01:16.079
to distinguish these two meanings is quite difficult.
1:01:16.436 --> 1:01:23.620
But at least when I saw it, I wasn't completely
convinced because it's maybe not the sentence
1:01:23.620 --> 1:01:29.505
you're using in your daily life, and some of
these constructions seem a bit artificial.
1:01:29.509 --> 1:01:35.155
They are very good in showing where the problem
is, but the question is, does it really matter
1:01:35.155 --> 1:01:35.995
in real life?
1:01:35.996 --> 1:01:42.349
And therefore here are some examples from
our lecture translator that
1:01:42.349 --> 1:01:43.605
really occurred.
1:01:43.605 --> 1:01:49.663
They maybe looked simple, but you will see
that some of them still happen.
1:01:50.050 --> 1:01:53.948
They are partly about splitting words,
and that is where they happen.
1:01:54.294 --> 1:01:56.816
So Um.
1:01:56.596 --> 1:02:03.087
We had a text about the numeral system, in
German the 'Zahlensystem', which got split
1:02:03.087 --> 1:02:07.041
into subword parts because otherwise we can't translate it.
1:02:07.367 --> 1:02:14.927
And then it did only an approximate match and
was talking about the binary payment system
1:02:14.927 --> 1:02:23.270
because the payment system, the 'Zahlsystem', was a lot more
common in the training data than the numeral system.
1:02:23.823 --> 1:02:29.900
And so there you see like rare words, which
don't occur that often.
1:02:29.900 --> 1:02:38.211
They are very challenging to deal with, because
we are sometimes good at inferring them, but
1:02:38.211 --> 1:02:41.250
for others that's very difficult.
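A toy sketch of how such subword splitting can mislead a system; the vocabulary and the greedy segmenter are purely illustrative, not the actual model.

```python
# Sketch: a rare compound is segmented into known subwords, and the
# pieces can then be re-interpreted as a more frequent, similar word.

VOCAB = {"Zahl", "Zahlen", "system", "Zahlsystem"}  # toy subword inventory

def segment(word):
    # Greedy longest-match segmentation into known subwords.
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character, fall back
            i += 1
    return pieces

print(segment("Zahlensystem"))  # ['Zahlen', 'system']
# If "Zahlsystem" (payment system) is far more frequent in training
# than "Zahlensystem" (numeral system), the model may drift toward it.
```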
1:02:44.344 --> 1:02:49.605
Another challenge is that, of course, dealing
with context is very difficult.
1:02:50.010 --> 1:02:56.448
This is also an example, a bit older, from
the lecture translator: we were translating
1:02:56.448 --> 1:03:01.813
a maths lecture, and it was always talking
about the omens of the numbers.
1:03:02.322 --> 1:03:11.063
Which doesn't make any sense at all, but the
German word 'Vorzeichen' can of course mean both the
1:03:11.063 --> 1:03:12.408
sign of a number and the omen.
1:03:12.732 --> 1:03:22.703
And if you do not have the right domain knowledge
in there and encoded, it might use the wrong domain
1:03:22.703 --> 1:03:23.869
knowledge.
1:03:25.705 --> 1:03:31.205
A more recent version of that is like here
from a paper about translation.
1:03:31.205 --> 1:03:36.833
We had this pivot-based translation, where you
translate maybe first to English and then to another language
1:03:36.833 --> 1:03:39.583
because you don't have enough training data for the direct pair.
1:03:40.880 --> 1:03:48.051
And we did that from Dutch to German; I guess
you can follow it even if you don't understand Dutch, if you speak
1:03:48.051 --> 1:03:48.710
German.
1:03:48.908 --> 1:03:56.939
So we have this Dutch 'een voorbeeld geven',
where 'geven' corresponds to German 'geben'.
1:03:56.939 --> 1:04:05.417
It's correctly translated into English as 'setting
an example'. However, when we then translate to German, the system didn't
1:04:05.417 --> 1:04:11.524
get the full context, and in German you normally
don't set an example, but you give an example,
1:04:11.524 --> 1:04:16.740
and so yes, going through another language
introduces additional errors.
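A sketch of pivot translation with toy phrase tables mirroring the example above; the tables and the function are hypothetical.

```python
# Sketch: composing two independent systems; the second system never
# sees the Dutch source, so idiomatic context is lost.

NL_EN = {"een voorbeeld geven": "set an example"}   # idiomatic English
EN_DE = {"set an example": "ein Beispiel setzen"}   # literal, wrong German

def pivot_translate(phrase, first=NL_EN, second=EN_DE):
    return second[first[phrase]]

print(pivot_translate("een voorbeeld geven"))
# -> "ein Beispiel setzen"; idiomatic German would be "ein Beispiel geben"
```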
1:04:19.919 --> 1:04:27.568
Good, so much for this. Are there more questions
about why this is difficult?
1:04:30.730 --> 1:04:35.606
Then we'll start with this one.
1:04:35.606 --> 1:04:44.596
I have to leave a bit early today in a quarter
of an hour.
1:04:44.904 --> 1:04:58.403
If you look at linguistic approaches to
machine translation, they are typically described
1:04:58.403 --> 1:05:03.599
by this triangle: so we can do a direct translation,
where you take the source language.
1:05:03.599 --> 1:05:09.452
You do not apply a lot of the analysis we were
discussing today, like syntax representation or
1:05:09.452 --> 1:05:11.096
semantic representation.
1:05:11.551 --> 1:05:14.678
But you directly translate to your target
text.
1:05:14.678 --> 1:05:16.241
That's the direct approach here.
1:05:16.516 --> 1:05:19.285
Then there is a transfer based approach.
1:05:19.285 --> 1:05:23.811
There you transfer everything over and
then do the generation of the target text.
1:05:24.064 --> 1:05:28.354
And you can do that at two levels: more at
the syntax level,
1:05:28.354 --> 1:05:34.683
that means you only do syntactic analysis,
like running a parser or so, or at the semantic
1:05:34.683 --> 1:05:37.848
level, where you do semantic parsing.
1:05:38.638 --> 1:05:51.489
Then there is an interlingua based approach
where you don't do any transfer anymore, but
1:05:51.489 --> 1:05:55.099
you only do an analysis.
1:05:57.437 --> 1:06:02.790
So how does the direct translation now
look
1:06:03.043 --> 1:06:07.031
like? It's one of the earliest approaches.
1:06:07.327 --> 1:06:18.485
So you do maybe some morphological analysis,
but not a lot, and then you do this bilingual
1:06:18.485 --> 1:06:20.202
word mapping.
1:06:20.540 --> 1:06:25.067
You might do some reordering and generation here.
1:06:25.067 --> 1:06:32.148
These two steps are not really big, but you
are working directly on the words.
1:06:32.672 --> 1:06:39.237
And of course this might be a first, easy solution,
but it ignores all the challenges we have seen: that
1:06:39.237 --> 1:06:41.214
the structure is different,
1:06:41.214 --> 1:06:45.449
that you have to reorder and look at the
agreement.
1:06:45.449 --> 1:06:47.638
That's why it was only the first approach.
1:06:47.827 --> 1:06:54.618
So if we have different word order, structural
shifts or idiomatic expressions, that doesn't
1:06:54.618 --> 1:06:55.208
really work.
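A minimal sketch of direct translation; the four-word dictionary is of course illustrative only.

```python
# Sketch: direct translation is little more than a bilingual word
# mapping with no reordering and no structural analysis.

DICTIONARY = {"the": "die", "soup": "Suppe", "is": "ist", "delicious": "lecker"}

def direct_translate(sentence):
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(DICTIONARY.get(w, w) for w in sentence.lower().split())

print(direct_translate("The soup is delicious"))  # die Suppe ist lecker
# Works for this parallel structure, but fails as soon as word order,
# agreement, or idioms differ between the languages.
```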
1:06:57.797 --> 1:07:05.034
Then there are these rule based approaches
which were more commonly used.
1:07:05.034 --> 1:07:15.249
They might still be used somewhere: I mean, most
systems are now neural networks, but I wouldn't
1:07:15.249 --> 1:07:19.254
be sure there is no rule-based system out there.
1:07:19.719 --> 1:07:25.936
And in this transfer-based approach we have
these steps nicely visualized in the
1:07:26.406 --> 1:07:32.397
triangle: we have the analysis of the source
sentence, where we then get some type of abstract
1:07:32.397 --> 1:07:33.416
representation.
1:07:33.693 --> 1:07:40.010
Then we are doing the transfer of the representation
of the source sentence into the representation
1:07:40.010 --> 1:07:40.263
of the target sentence.
1:07:40.580 --> 1:07:46.754
And then we have the generation, where we take
this abstract representation and then produce the
1:07:46.754 --> 1:07:47.772
surface forms.
1:07:47.772 --> 1:07:54.217
For example, it might be that there are no
morphological variants in the abstract representation,
1:07:54.217 --> 1:07:56.524
and we have to handle the agreement there.
1:07:56.656 --> 1:08:00.077
Which components do we need for this?
1:08:01.061 --> 1:08:08.854
You need a monolingual source and target lexicon
and the corresponding grammars in order to
1:08:08.854 --> 1:08:12.318
do both the analysis and the generation.
1:08:12.412 --> 1:08:18.584
Then you need the bilingual dictionary in
order to do the lexical translation and the
1:08:18.584 --> 1:08:25.116
bilingual transfer rules in order to transfer
the grammar, for example of German, into the
1:08:25.116 --> 1:08:28.920
grammar of English, and that enables you to
do the transfer.
1:08:29.269 --> 1:08:32.579
So an example is something like this here.
1:08:32.579 --> 1:08:38.193
So if you're doing a syntactic transfer, it
means you're starting with 'John eats
1:08:38.193 --> 1:08:38.408
an
1:08:38.408 --> 1:08:43.014
apple': you do the analysis, then you have this
type of graph here.
1:08:43.014 --> 1:08:48.340
Therefore you need your monolingual lexicon
and your monolingual grammar.
1:08:48.748 --> 1:08:59.113
Then you're doing the transfer where you're
transferring this representation into this
1:08:59.113 --> 1:09:01.020
representation.
1:09:01.681 --> 1:09:05.965
So how could this type of translation then
look?
1:09:07.607 --> 1:09:08.276
So.
1:09:08.276 --> 1:09:14.389
We have the example of 'a delicious soup' and
'una sopa deliciosa'.
1:09:14.894 --> 1:09:22.173
This is your source language tree and this
is your target language tree and then the rules
1:09:22.173 --> 1:09:26.092
that you need are these ones to do the transfer.
1:09:26.092 --> 1:09:31.211
So if you have a noun phrase on the source side,
it also maps to a noun phrase on the target side.
1:09:31.691 --> 1:09:44.609
You see here that the switch is happening,
so what is in the second position here ends up in the first
1:09:44.609 --> 1:09:46.094
position.
1:09:46.146 --> 1:09:52.669
Then you have the translation of the determiner
and of the words, so the dictionary entries.
1:09:53.053 --> 1:10:07.752
And with these types of rules you can then
do these mappings and do the transfer between
1:10:07.752 --> 1:10:11.056
the representations.
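Here is a minimal sketch of such a transfer rule in code; the tuple-based tree format and the lexicon are illustrative, not a real grammar formalism.

```python
# Sketch: an English NP [DET ADJ N] maps to a Spanish NP [DET N ADJ],
# plus dictionary lookups for the leaves.

LEXICON = {"a": "una", "delicious": "deliciosa", "soup": "sopa"}

def transfer_np(tree):
    # tree: ("NP", det, adj, noun) in English constituent order
    _, det, adj, noun = tree
    # Transfer rule: NP(det, adj, noun) -> NP(det, noun, adj)
    return ("NP", LEXICON[det], LEXICON[noun], LEXICON[adj])

print(transfer_np(("NP", "a", "delicious", "soup")))
# -> ('NP', 'una', 'sopa', 'deliciosa')
```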
1:10:25.705 --> 1:10:32.505
I think it more depends on the amount of expertise
you have in representing them.
1:10:32.505 --> 1:10:35.480
The rules will get more difficult.
1:10:36.136 --> 1:10:42.445
For example, with these rule-based systems, I think
it more depends on how difficult the structure
1:10:42.445 --> 1:10:42.713
is.
1:10:42.713 --> 1:10:48.619
So for generating German they were for quite
a long time quite successful, because modeling
1:10:48.619 --> 1:10:52.579
all the German phenomena which are in there
was difficult.
1:10:52.953 --> 1:10:56.786
And that can be done there, and it wasn't
easy to learn that just from data.
1:10:59.019 --> 1:11:07.716
I think even if you think about Chinese and
English or so, if you have the trees there
1:11:07.716 --> 1:11:10.172
are quite some rules involved.
1:11:15.775 --> 1:11:23.370
Another thing is you can also try to do something
like that on the semantic level, which means the analysis
1:11:23.370 --> 1:11:24.905
gets more complex.
1:11:25.645 --> 1:11:31.047
The transfer gets maybe a bit easier, because
the semantic representations
1:11:31.047 --> 1:11:36.198
of the languages are more similar, and therefore
the analysis and generation get more difficult again.
1:11:36.496 --> 1:11:45.869
So typically if you go higher in your triangle
the analysis and generation are more work while the transfer is less work.
1:11:49.729 --> 1:11:56.023
So it can be then, for example, like with 'gustar':
we have again that the order changes.
1:11:56.023 --> 1:12:02.182
So you see the transfer rule for 'like' is that
the first argument is here and the second is
1:12:02.182 --> 1:12:06.514
there, while on the 'gustar' side the second
argument
1:12:06.466 --> 1:12:11.232
is in the first position and the first
argument is in the second position.
1:12:11.511 --> 1:12:14.061
So there you do, yeah, also a reordering.
1:12:14.354 --> 1:12:20.767
In principle it is more that you have
a different type of formalism for representing
1:12:20.767 --> 1:12:27.038
your sentence and therefore you need to do
more on one side and less on the other side.
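The argument swap can be sketched as a semantic transfer rule; the predicate-argument format below is illustrative.

```python
# Sketch: the predicate "like" swaps its arguments when mapped to
# Spanish "gustar" ("I like soup" becomes roughly "soup pleases me").

RULES = {"like": ("gustar", "swap")}  # English predicate -> Spanish, order

def transfer_predicate(pred, arg1, arg2):
    es_pred, order = RULES[pred]
    if order == "swap":
        arg1, arg2 = arg2, arg1
    return (es_pred, arg1, arg2)

print(transfer_predicate("like", "John", "soup"))
# -> ('gustar', 'soup', 'John'), i.e. "a John le gusta la sopa"
```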
1:12:32.852 --> 1:12:42.365
So in general, for transfer-based approaches
you have to first select how to represent
1:12:42.365 --> 1:12:44.769
the syntactic structure.
1:12:45.165 --> 1:12:55.147
There are these variable abstraction levels,
and then you have the three components. The
1:12:55.147 --> 1:13:04.652
disadvantage is that on the one hand you need
normally a lot of experts, monolingual and bilingual experts,
1:13:04.652 --> 1:13:08.371
who define how to do the analysis and the transfer.
1:13:08.868 --> 1:13:18.860
And if you're adding a new language, you have
to build analysis and generation for it, and the
1:13:18.860 --> 1:13:19.970
transfer.
1:13:20.400 --> 1:13:27.074
So if you add one language to an existing
system, of course you have to
1:13:27.074 --> 1:13:29.624
do the transfer to all the other languages.
1:13:32.752 --> 1:13:39.297
Therefore, the other idea which people were
interested in is the interlingua based machine
1:13:39.297 --> 1:13:40.232
translation.
1:13:40.560 --> 1:13:47.321
Where the idea is that we have this intermediate
language with this abstract language independent
1:13:47.321 --> 1:13:53.530
representation and so the important thing is
it's language independent so it's really the
1:13:53.530 --> 1:13:59.188
same for all languages and it's pure meaning
and there is no ambiguity in there.
1:14:00.100 --> 1:14:05.833
That allows this nice translation without
transfer, so you just do an analysis into your
1:14:05.833 --> 1:14:11.695
representation, and afterwards you do
the generation into the target language.
1:14:13.293 --> 1:14:16.953
And that of course makes especially the multilingual case attractive.
1:14:16.953 --> 1:14:19.150
It's somehow like a dream.
1:14:19.150 --> 1:14:25.519
If you want to add a language you just need
to add one analysis tool and one generation
1:14:25.519 --> 1:14:25.959
tool.
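The scaling argument is easy to verify with a little arithmetic; the component counts below follow directly from the definitions.

```python
# Transfer needs one component per translation direction; an
# interlingua needs only analysis + generation per language.

def transfer_components(n):
    return n * (n - 1)      # one transfer module per ordered pair

def interlingua_components(n):
    return 2 * n            # analysis + generation per language

for n in (2, 5, 10, 20):
    print(n, transfer_components(n), interlingua_components(n))
# For 20 languages: 380 transfer modules vs. only 40 with an interlingua.
```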
1:14:29.249 --> 1:14:32.279
Which is not the case in the other scenario.
1:14:33.193 --> 1:14:40.547
However, the big challenge in this case is
the interlingua representation itself, because
1:14:40.547 --> 1:14:47.651
you need to represent all different types of
knowledge in there in order to do that.
1:14:47.807 --> 1:14:54.371
And also like world knowledge, so something
like: an apple is a fruit, and a property of a
1:14:54.371 --> 1:14:57.993
fruit is that it is edible, and stuff like that.
1:14:58.578 --> 1:15:06.286
So that is why this is typically only
done for small domains.
1:15:06.326 --> 1:15:13.106
So for special applications like hotel
reservations people have looked into
1:15:13.106 --> 1:15:18.348
that, but they have typically not done it for
open-domain translation.
1:15:18.718 --> 1:15:31.640
So the disadvantage is that you need to represent
all the world knowledge in your interlingua.
1:15:32.092 --> 1:15:40.198
And that is not possible at the moment, and
never was possible so far.
1:15:40.198 --> 1:15:47.364
Typically these systems were for small domains
like hotel reservations.
1:15:51.431 --> 1:15:57.926
But of course this idea is why some people
are interested in the question:
1:15:57.926 --> 1:16:04.950
if you now build a neural system where you
learn the representation in your neural network,
1:16:04.950 --> 1:16:07.442
is that some type of artificial
1:16:08.848 --> 1:16:09.620
interlingua?
1:16:09.620 --> 1:16:15.025
However, what we at least found out until
now is that there's often very language specific
1:16:15.025 --> 1:16:15.975
information in it.
1:16:16.196 --> 1:16:19.648
And it might be important and essential.
1:16:19.648 --> 1:16:26.552
You don't have all the information in your
input, so you typically can't resolve
1:16:26.552 --> 1:16:32.412
all ambiguities inside there, because you might
not have all the information.
1:16:32.652 --> 1:16:37.870
So in English you don't know if it's a living
fish or the fish which you're eating, and if
1:16:37.870 --> 1:16:43.087
you're translating to German you also don't
have to resolve this problem because you have
1:16:43.087 --> 1:16:45.610
the same ambiguity in your target language.
1:16:45.610 --> 1:16:50.828
So why would you put in the effort of finding
out if it's the one fish or the other if it's
1:16:50.828 --> 1:16:52.089
not necessary at all?
1:16:54.774 --> 1:16:59.509
Yeah Yeah.
1:17:05.585 --> 1:17:15.019
With the semantic transfer, the representation is
not the same for both languages, so you still represent the
1:17:15.019 --> 1:17:17.127
semantics per language.
1:17:17.377 --> 1:17:23.685
So you have a semantic representation like in
the 'gustar' example, but it's not the same semantic
1:17:23.685 --> 1:17:28.134
representation for both languages, and that's
the main difference.
1:17:35.515 --> 1:17:44.707
Okay, then these are the most important things
for today: what is language, and how do rule
1:17:44.707 --> 1:17:46.205
based systems work.
1:17:46.926 --> 1:17:59.337
And if there are no more questions, thank you
for joining; we have a bit of a shorter
1:17:59.337 --> 1:18:00.578
lecture today.