Hey, so welcome to today's lecture on machine translation. This week we'll have a bit of a different focus: for the last two weeks or so we have been looking at how we can improve our systems by having more data or other data sources, or by using them more efficiently. We'll have a bit more of that next week, together with context-based translation.
With that we are shifting away from the idea of treating each sentence independently when we translate. Maybe you remember from the beginning of the course: there are phenomena in machine translation that you cannot get right if you only look at a single sentence.
Today, however, we want to look more at the challenges that arise specifically when we apply machine translation in practice.
This block will be a total of four different lectures. Today we ask what types of biases exist in machine translation and how we can try to mitigate them; but of course the first focus has to be on assessing them.
And this of course gets more and more important the more often this type of technology is applied. When it was mainly a basic research tool used in a research environment, it was not directly that important.
But once you apply it, the question is: does it perform the same for everybody, or is its performance worse for some people than for others? Does it have specific challenges? We see that especially in translation.
There we have one major challenge: grammatical gender, which is not the same in all languages. In English it is often not clear, when you talk about some person, whether they are male or female, while in other languages, as you have hopefully learned, this has to be expressed explicitly.
Just as a brief overview: besides this aspect of applying machine translation, we will cover two other aspects. On Thursday we'll look into adaptation: how can we adapt to specific situations? We have seen that systems perform well when the test case is similar to the training case, but in practical applications it's not always possible to collect the best-fitting training data, so in that case we need to adapt.
And the third larger group of applications will be speech translation: what do we have to change in our machine translation systems if we are not translating text but want to translate speech? That will take up the remaining lectures.
So what are we talking about when we talk about bias, from a definition point of view? Bias means we introduce systematic errors when testing, and thereby encourage the selection of specific answers.
The most prominent case, and the one analyzed most in the research community, is bias based on gender. One example: "She works in a hospital; my friend is a nurse." A system is not directly able to assess whether the friend is male or female.
Although in this example it is possible to disambiguate based on the context, this relation is of course not that easy to learn.
So the system might instead learn shortcut connections, for example that in your training data most of the doctors are male. That is the most widely analyzed kind of bias, and we'll focus on it in this lecture. However, a system can of course have a lot of other biases too, which have been partly investigated in other fields, but for machine translation not that much.
For example, bias can be based on origins. There is a somewhat prominent example from sentiment analysis; sentiment analysis means you predict whether a text is positive or negative, as you see it in reviews.
And you can show that with baseline models, if the name in the text is Mohammed, the sentiment in a lot of systems will be more negative than if it's a traditional European name. With foods it is similar: one type of restaurant is rated positive and another negative.
There are other aspects too. We have done some experiments on Vietnamese, and you can, for example, observe that if the person is German the system will address them formally, while if the person is North Korean it will use an informal form.
So these are also possible types of bias; however, they are difficult to assess. In translation especially, the bias around gender is the most challenging, because gender is treated differently in different languages.
Why is this challenging? One reason is that there is a translation mismatch, and that is the most challenging situation: there is different information in the source language than in the target language.
So if we have the English word "player", there is no information about the gender in it. However, if you want to translate it into German, you cannot easily generate a word without gender information: it's either "Spieler" or "Spielerin". You could write something like "Spieler:in", but that sounds a bit weird if you're talking about a specific person; then you should use the appropriate form.
Translation is most challenging, as always, in this situation where you have less information on the source side but need more information on the target side.
It's similar with Japanese, for example, where there are several formality levels, while German has only two and English has essentially no grammatical formality level. Now you have to estimate the formality level; of course that takes some guessing, and it's not directly possible from the sentence alone.
What systems nowadays do is at least assess that this is a situation where they don't have enough information, and then show both translations: here the suggestion is that it can be "doctor" (masculine) or "doctora" (feminine) in Spanish.
So that is one possibility. However, it is of course very challenging to find out: are these really two different meanings, or not?
You could do that rule-based to a large extent; I don't know exactly how they did it. If you focus only on gender, you can of course check whether the gender on the source and the target side differs.
But if you want to do it more generally, it's not that easy, because there are always several possible translations, and it's not clear whether these are really different meanings or only slight variations.
Another reason why there is bias is that the system typically tries to take the most simple route, and in your training data there are unintended shortcuts, or clues that exist only in the training data because you sampled it in a particular way.
In our example, "She works in a hospital; my friend is a nurse," it might be that the friend is translated as female simply because the system has learned from the training data that a doctor is male and a nurse is female. And of course, since we do maximum-likelihood training, as we do in general, we always pick the most probable output. That means that if this correlation holds in the majority of your training data, your predictions will always be the same; the model typically amplifies the bias.
What does it mean if we have this type of bias and apply the systems? The benefit of machine translation rises: more and more people can profit from the ability to talk to people in other languages, and so on. But the more often it is used, the more important the problems of the system become. People have only recently started to analyze these problems, partly because as long as a system is hardly used, the bias is not that important; if the quality is bad anyway,
or the output is mixed up all the time, as we have seen in old systems, then the issue is not bias: you first need to build a system that works at all.
So only with the wide application of good-quality systems does this become important; and then of course you should look into how to evaluate it, in order to first become aware of the challenges. That is a general idea, not only about bias.
Of course we have learned about BLEU scores, that is, how to evaluate the overall quality, and those metrics are very important, be it BLEU or any other. However, they only give us a general overview. If we want to improve our systems, it's important that we also do a more detailed analysis, with targeted, challenging test sets, in order to see how good these systems really are at specific phenomena.
One last reminder on that: if you build such a challenge set, it is typically good to also keep track of the general performance. Normally you don't expect to improve the general quality: if you build a system that mitigates some bias, the aim is that it improves on the challenging bias test set. On the aggregated metrics you don't need to get better, because they don't really measure that aspect well; but if you significantly drop in overall performance, then something is wrong.
What are, in general, the harms people report, and why should you care? The first are representational harms: the system reproduces and thereby even amplifies stereotypes. That is not what you want to achieve when deploying this technology: it should not work less well for some groups.
Secondly, there is what is referred to as allocational harms: the system might not perform as well for some users as for others.
Another example we'd like to look at: sometimes the translation depends on who is speaking. Here you have it in French: the word for "happy" has to be expressed differently, "heureux" or "heureuse", depending on whether the speaker is male or female.
From the text it's nearly impossible, in fact impossible, to guess that, so the system always selects one form. And since we do greedy search, it will always generate the same one; so for one group of speakers you will have worse performance. That is of course not what we want: on average the system might be good, but it doesn't have the ability to be correct for everybody.
You can argue whether this is a bias problem or an interface problem, because you could say: well, let the user specify the gender. But even then, we still have a system that generates unusable output if you don't tell it what you want. So in this case we simply don't have enough information in the input, and you have to adapt your system so that it can either access that information or output several alternatives.
There are different ways of improving on this. The first step is to find out where the problems are. Then there are different ways of addressing them, and they of course differ depending on whether we are in a situation where the information is available.
Is it the first case, where we do have the information somewhere? Or is it a situation where we don't have the information either, and should maybe give the system the opportunity to output several options, or to say "I don't know"? This is still open research. And even when systems do have enough information, they sometimes ignore it and just follow the learned shortcut. Detecting whether we have enough information for a good translation, or whether information is missing, is something there is research on, but it's not that easy to solve.
But before we come to how we can address or change this, and before we look at how we can assess it, I wanted to do a bit of a review of how gender is represented in different languages.
Of course you can make a more fine-grained classification; it's not that everything within a group is exactly the same, but in general there are a few large groups. The first are genderless languages. There, for example, you don't even distinguish "he" and "she": there is just one written word for it, so you cannot tell from a sentence whether it refers to a man or a woman.
Of course there are some exceptions where a difference between male and female is made; such languages still have different words for brother and sister, for example. But normally you cannot infer whether the speaker is male, or whether they are speaking about a male or a female person.
Examples of such languages are Finnish and Turkish; there are more, but these are two of them. Then we have notional gender languages, where there is some gender information, but only a limited amount.
An example of this is English, which is a nice example here because most people know it. You have some lexical gender and some pronominal gender: words like "mum" and "dad", and the pronouns "she", "he", and "him". And very few nouns are marked, like "actor" and "actress"; but in general most words are not marked. "Teacher", "lecturer", "friend": in all these words the gender is not marked, and so you cannot infer it.
So the initial Turkish sentence here would be translated to either "he is a good friend" or "she is a good friend". In English you then have the gender information in the pronoun, but "a good friend" itself stays unmarked.
And then finally there are the grammatical gender languages, where each noun has a gender; that's the case in Spanish, for example. This gender is mostly a formal property, but at least if you're talking about a human it normally agrees with the person.
Of course the assignment is often arbitrary: there is no clear reason why the sun should be feminine in German, and in other languages it's different. And then you also have agreement with this gender, which makes things more complicated.
In "he is a good friend", the word for "good" also changes depending on whether the friend is male or female; so the sentence carries a lot of gender information. But do you always get it correctly? It might be that it simply isn't there in the English source.
And since this is the case, you often need to express the gender even though you might not know it. There are some ways in German to mark gender-neutral forms. But from the machine learning point of view that is again quite challenging, because you only want to use the neutral form when appropriate: if the gender is known to the reader, you should use not the neutral form but the male or female one. So assessing what is known to the reader is a challenge which needs to be addressed in some way.
Why does all this happen? There are roughly three reasons. One is, of course, the data. For example, look at the Europarl corpus, which is an important resource for machine translation: only about thirty percent of the speakers are female. So if you train a model on that data and translate into French, the male version of a first-person sentence will simply occur far more often, around seventy percent of the time, and the model trained on this data will be skewed accordingly.
And of course this will stay in the data for a very long time: even if there are more female speakers in the European Parliament now, we are training on historical data, so for a long time it will not be reflected in the model.
Then, besides this pre-existing data bias, there are technical biases which amplify it. One we already addressed: with greedy or beam search you always output the most probable translation. So if there is a bias in your model, decoding will amplify it: if the same source sentence, say "I am happy", is translated with the male form in most of your training data and the female form in the rest, the decoder will produce the male version every single time. So purely by this algorithmic design you amplify the bias.
Another case is multilingual machine translation, for example when you use a pivot language: if you first translate into English, the gender information may get lost, and when you then translate on into Spanish it cannot be recovered. While a direct system would not have this particular bias, you might introduce it because you have good reasons for a modular system: you don't have enough training data, or it performs better on average. But by making this choice you introduce an additional type of bias into your system.
And then there is what people refer to as emergent bias: that arises when you use a system for a different use case than intended. In general it will perform worse there, but it can also become even more problematic. The extreme case would be a system trained only on male speakers: it will of course perform worse on female speakers. So you can get this type of problem whenever you use a system in a situation different from the one it was originally built for.
With this we come to evaluation: before we look into how we can improve the systems, we look at how we can evaluate this; maybe at the moment most work is on this evaluation side.
One line of evaluation looks at stereotypes: how much does a system rely on stereotypes? So take a Hungarian sentence which should be translated as "he is an engineer" or "she is an engineer"; you cannot guess which, because as we saw, "he" and "she" are not distinguished in Hungarian.
Then you can build a test set with these kinds of gender-ambiguous occupation sentences. You have statistics on how occupations are distributed by gender, so you can automatically generate the sentences: you put in jobs that are mostly done by men and jobs that are mostly done by women, and then you check how your system translates them. That is one way of evaluating stereotypes, and one of the most famous benchmarks, called WinoMT, does exactly this. The second type of evaluation is about gender preservation. That is exactly what we saw beforehand: the relevant information, for example the gender of the speaker, is not in the text itself; how well does a system deal with that?
We'll see there is, for example, one benchmark on this for Arabic, and one with audio. The audio case is interesting if you already think ahead to speech translation: from the speech signal you should be able to make a much better guess whether it's a male or a female speaker. Current systems, however, mostly first transcribe and then translate the text, so this information is lost.
So how do these benchmarks look? The first one is the occupation test, which looks like a simple templated test set: "I've known {her | him | <proper name>} for a long time; my friend works as a(n) {occupation}." All sentences in it look like that.
In this template the gender-ambiguous word is "friend", and what you check later is how it gets translated. The gender can be inferred from "her" or "him", or, if it's a proper name, from the name; and then you can compare. Does the system get distracted because the job description is nearer to "friend" in the sentence? That you can then assess automatically.
Of course, as said at the beginning, you shouldn't rely only on that: if you only evaluate on one sentence template, you can easily trick the evaluation. But it can give you very important insights. Any questions? Yes?
The question is whether we actually want the system to agree with stereotypes, because that increases precision. No: there are two things here, and accordingly two ways of evaluating. If we translate the Hungarian example as "he is an engineer", that is indeed probably the most likely translation in most cases. But the one case is where the system uses the stereotype although there is conflicting information in the sentence, telling us that in this case the engineer is female.
So anything was. | |
0:34:22.342 --> 0:34:29.281 | |
Information yes, so that is the one in the | |
other case. | |
0:34:29.281 --> 0:34:38.744 | |
Typically it's not evaluated in that, but | |
in that time you really want it. | |
0:34:38.898 --> 0:34:52.732 | |
That's why most of those cases you have evaluated | |
in scenarios where you have context information. | |
0:34:53.453 --> 0:34:58.878 | |
How to deal with the other thing is even more | |
challenging to one case where it is the case | |
0:34:58.878 --> 0:35:04.243 | |
is what I said before is when it's about the | |
speaker so that the speech translation test. | |
0:35:04.584 --> 0:35:17.305 | |
And there they try to look in a way that can | |
you use, so use the audio also as input. | |
Yes? The question is: if we have a reference where it's "she is an engineer", are there efforts to adjust the metric or the training so that translations go in the correct direction? Here this is only done for evaluation; you are not pushing the model toward anything.
If you wanted to do that in training, you wouldn't do it this way, and I'm not aware of any model that does it directly, because you would have to find out whether the gender is knowable in each scenario or not. So at least I'm not aware of work that directly tries to assess that during training.
0:36:13.813 --> 0:36:18.518 | |
Mean there is data augmentation in the way | |
that is done. | |
0:36:18.518 --> 0:36:23.966 | |
Think we'll have that later, so what you can | |
do is generate more. | |
0:36:24.144 --> 0:36:35.355 | |
You can do that automatically or there's ways | |
of biasing so that you can try to make your | |
0:36:35.355 --> 0:36:36.600 | |
training. | |
0:36:36.957 --> 0:36:46.228 | |
That's typically not done with focusing on | |
scenarios where you check before or do have | |
0:36:46.228 --> 0:36:47.614 | |
information. | |
But I agree that for the truly ambiguous cases it's not clear, and the normal evaluation setup doesn't fit. Maybe you could say the system shouldn't always produce the same form, but should follow a distribution like the training data, because otherwise we're amplifying. But current systems can't predict both; that's why, as we saw at the beginning, commercial systems have this extra interface where they propose both options.
The other benchmark is WinoMT, and it started from a challenge set for coreference resolution. Coreference resolution means we have pronouns like "her" or "him" and need to find out what they refer to. So in "the doctor asked the nurse to help her in the procedure", "her" does not refer to the nurse but to the doctor. And there you have of course the same types of stereotypes and the same types of biases as in machine translation.
You might think this is also ambiguous, and maybe that judgment is itself biased. Syntactically it is ambiguous; but if you ask somebody to help, then the "her" has to refer to the person asking, so the nurse should help the doctor.
Of the time. | |
0:38:57.469 --> 0:39:03.906 | |
The doctor is female and says please have | |
me in the procedure, but the other. | |
0:39:04.904 --> 0:39:09.789 | |
Oh, you mean that it's helping the third person. | |
0:39:12.192 --> 0:39:16.140 | |
Yeah, agree that it could also be yes. | |
0:39:16.140 --> 0:39:19.077 | |
Don't know how easy that is. | |
0:39:19.077 --> 0:39:21.102 | |
Only know the test. | |
Then, I guess, you would need situational context, where you know the situation, for example that some other person is having problems. Yes, in such cases there is additional ambiguity.
You see that pure text models are not always enough. There is a lot of work on this; we will not cover it in this lecture, but there are things like multimodal machine translation, where you try to add pictures or similar signals to have more context.
So how is this evaluated? You translate the test set with your system; suppose it does the stereotyping, so the doctor comes out male and the nurse female. Then you use word alignment and check whether the gender of the translated word matches the annotated gender of the entity; that is how the WinoMT evaluation works. As you see, you only evaluate the entity whose gender is known from the sentence; for the other one, the nurse here, you don't evaluate, because in this sentence the nurse could be male or female and you cannot tell. The benchmarks are currently designed so that you only evaluate what is known.
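To make the procedure concrete, here is a minimal sketch of this alignment-based check. It is not the official WinoMT code: the `translate` and `align` callables and the tiny German gender lexicon are stand-ins for a real MT system, a word aligner such as fast_align, and a proper morphological analyzer.

```python
# Minimal sketch of a WinoMT-style gender accuracy evaluation.
# Hypothetical data format: (source_tokens, entity_index, gold_gender).

def target_gender(word):
    """Toy gender oracle for a few German words; a real evaluation
    would use a morphological analyzer for the target language."""
    fem = {"die", "ärztin", "krankenpflegerin"}
    masc = {"der", "arzt", "krankenpfleger"}
    w = word.lower()
    if w in fem:
        return "female"
    if w in masc:
        return "male"
    return None

def evaluate(examples, translate, align):
    """translate(src_tokens) -> tgt_tokens;
    align(src_tokens, tgt_tokens) -> dict {src_index: tgt_index}."""
    correct = total = 0
    for src, idx, gold in examples:
        tgt = translate(src)
        alignment = align(src, tgt)
        if idx not in alignment:
            continue  # unaligned entity: cannot be judged, skip it
        pred = target_gender(tgt[alignment[idx]])
        if pred is not None:
            total += 1
            correct += int(pred == gold)
    return correct / max(total, 1)  # gender accuracy
```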
Then you can look at results. The first thing people look at is accuracy: how often does the system get the gender correct? These numbers are a bit older and there is more work by now, but accuracy is the first column. Since each test sentence occurs twice, once with "him" and once with "her", chance level is fifty percent. Except that one system here seems to be quite good, getting nearly everything right.
What you can also do is look at the difference between the cases where you need to predict female and the cases where you need to predict male. Systems are more often correct on the male forms than on the female forms; you see that for all but one of the systems, so one would assume that for that one language they applied some kind of mitigation method.
Where the difference goes slightly the other way, it's not a lot lower; there's one system where I don't know why, because if it always produced the same form you would expect otherwise. It seems counterintuitive, and I don't know exactly how, but it's true.
There are very few such cases. I also don't know the details for Russian, which is mainly where you have the very low numbers, something like forty-five; that can be more about the data and sampling. I don't know whether they have additional genders or a neutral form there; I don't think so.
Then you typically see an even stronger bias if you distinguish not between male and female accuracy, but between stereotypical and anti-stereotypical cases. For each occupation you can check, based on statistics, which gender is the most common one; you put the stereotypical cases on one side and the anti-stereotypical ones on the other, and you see that, not in all but in a lot of cases, the gap is even higher than between male and female.
[Inaudible remark from the audience about a specific occupation.] Yes, but I guess here it's mainly the job description that matters.
Then there is the Arabic Parallel Gender Corpus, which is more about assessing gender preservation for the speaker. It was built from OpenSubtitles, a corpus of subtitles created by volunteers. For first-person words like "I", "me", "myself", and "mine", they annotated the Arabic sentences as to whether the speaker is feminine or masculine, or whether it's ambiguous; and from the male and female cases they generated both types of translations.
And then there is a somewhat different test set as the last one, referred to as MuST-SHE, a corpus based on TED talks. It is interesting because you also have the audio signal, and it is divided into two categories.
The first category is where the gender can only be determined based on the speaker, something like "I am a good speaker": from the text alone you cannot get that right reliably; however, if you have the audio signal you should make a much better guess. So with it, one can evaluate especially machine translation and speech translation systems which take this into account. The second category is where you can determine the gender based on the textual context; in this case the data is not artificial.
These examples come from real data, so it's not artificially created; of course that is a lot more work, since you have to find such sentences in the corpus and use them as a test set. An example is "she got together with two of her dearest friends, this older woman...", where the gender of "friends" can be got from the context, but some systems might ignore that. So you have these two types of benchmarks in there, and you want to determine which cases your system handles.
So this is how we can evaluate it. The next question is how we can improve our systems, because that is normally why we do evaluation in the first place.
One idea is what is referred to as debiased modeling: the idea is to somehow change the model in a way that reduces the bias. And one natural approach is: if we give the system more information, it doesn't need to guess without it; we can use the information to disambiguate.
The first option is to do that on the sentence level: for example, if you know the speaker, you can annotate each sentence with whether the speaker is male or female. Here we see a technique that is very successful in neural machine translation and other kinds of neural networks: tagging. Since in neural machine translation we no longer have a hard correspondence between input and output tokens, the nice thing is that you can normally put extra tokens into your input, and given enough data the model learns to use them. So what you can do here is add a token at the front saying "female" or "male", depending on the speaker.
Of course, for a human "FEMALE Madam ..." is no longer a correct sentence; but if you do the same thing consistently in training, the model learns not to translate the token itself but to use it to disambiguate. This type of tagging is a very commonly used method to feed in extra information.
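As a minimal sketch, the preprocessing for such tagging could look as follows; the tag strings `<F>` and `<M>` are an assumption, and any reserved tokens work as long as training and inference use the same ones.

```python
# Sketch of sentence-level speaker tagging as a preprocessing step.

def add_speaker_tag(source, speaker_gender):
    """Prepend a reserved speaker-gender token to the source sentence."""
    tag = {"female": "<F>", "male": "<M>"}.get(speaker_gender, "<UNK>")
    return f"{tag} {source}"

print(add_speaker_tag("I am happy.", "female"))  # "<F> I am happy."
```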
So this is first of all a very easy approach: you don't have to change your architecture at all. The same has been done, for example, for formality in German, whether to produce "du" or "Sie". And we'll see it on Thursday: it's a very common approach for domain adaptation, where you put in a domain tag beforehand, saying this sentence is from Twitter or the like.
Of course, the model only learns to use the tag if it has seen it in training and it helped there, but you don't need a perfectly balanced distribution. However, it's still challenging to get this annotation: for the speaker it only works if you really have metadata about who is speaking, because otherwise you only have the text, and you cannot easily see whether the speaker was male or female; this information has been removed from the data.
Does anybody have an idea how to manage that, and still get labels for whether the speaker is male or female? We can do a small trick: we can just look at the target side. This is of course only possible when the target side marks the gender.
So for your training data you can annotate the tag based on the target side. In some target languages you cannot tell, but in Spanish, for example, you can, because the forms differ; there you can use the grammatical gender. At test time, of course, the tag still has to come from somewhere else, for example from an interface decision.
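Here is a hedged sketch of this trick, deriving the training-time tag from the Spanish target side via morphological features. The spaCy model name and the very rough heuristic (take the first gender-marked adjective) are assumptions; a real labeler would check that the adjective actually predicates the first person, or use alignment.

```python
# Sketch: label speaker gender from the Spanish target sentence,
# e.g. "Estoy cansada." -> female, "Estoy cansado." -> male.
# Assumes the spaCy model "es_core_news_sm" is installed.
import spacy

nlp = spacy.load("es_core_news_sm")

def speaker_gender_from_target(spanish_sentence):
    doc = nlp(spanish_sentence)
    for tok in doc:
        # very rough: first gender-marked adjective decides the label
        if tok.pos_ == "ADJ":
            gender = tok.morph.get("Gender")
            if gender == ["Fem"]:
                return "female"
            if gender == ["Masc"]:
                return "male"
    return "unknown"
```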
You can of course do it even more fine-grained: you can try to attach this information to each word, so you're not doing it for the full sentence. For each word you mark whether its gender is unknown, female, or male. You can compute a word alignment, which is of course not always perfect, but then you can roughly annotate each source word with the gender of the word it aligns to. Now you have the type of input where you have one extra piece of information per word: on the one hand the word itself, on the other the gender label. This has been used before in other scenarios, so it doesn't have to be gender; in general it can be any extra information.
0:54:30.090 --> 0:54:39.981 | |
And people refer to that or have used that | |
as a factored translation model, so what you | |
0:54:39.981 --> 0:54:42.454 | |
may do is you factor. | |
0:54:42.742 --> 0:54:45.612 | |
You have the word itself. | |
0:54:45.612 --> 0:54:48.591 | |
You might have the gender. | |
0:54:48.591 --> 0:54:55.986 | |
You could have more information like don't | |
know the paddle speech. | |
0:54:56.316 --> 0:54:58.564 | |
And then you have an embedding for each of | |
them. | |
0:54:59.199 --> 0:55:03.599 | |
And you congratulate them, and then you have | |
years of congratulated a bedding. | |
0:55:03.563 --> 0:55:09.947 | |
Which says okay, this is a female plumber | |
or a male plumber or so on. | |
0:55:09.947 --> 0:55:18.064 | |
This has additional information and then you | |
can train this factory model where you have | |
0:55:18.064 --> 0:55:22.533 | |
the ability to give the model extra information. | |
Of course, if you train directly this way, you always need this information at test time as well; that might not be ideal if you want to use the translation system when you sometimes don't have it. So, any idea what machine learning technique you can use to deal with that? We have mentioned it already many times. Dropout: you sometimes put the information in and sometimes not, applying dropout to these factor inputs. Then the system learns to deal with both cases: if it doesn't have the information, it does the best it can; but if it has the information, it can use it and make a better-grounded decision.
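A minimal PyTorch sketch of such a factored embedding with factor dropout follows; all sizes and the factor inventory (0 = unknown, 1 = female, 2 = male) are illustrative assumptions, not a fixed recipe.

```python
# Sketch: factored embeddings (word + gender factor, concatenated),
# with factor dropout so the model also works without annotation.
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    def __init__(self, vocab=32000, dim=480, n_factors=3, factor_dim=32,
                 unk_factor=0, p_drop=0.3):
        super().__init__()
        self.word = nn.Embedding(vocab, dim)
        self.factor = nn.Embedding(n_factors, factor_dim)  # 0=unk, 1=F, 2=M
        self.unk_factor = unk_factor
        self.p_drop = p_drop

    def forward(self, word_ids, factor_ids):
        if self.training:
            # factor dropout: randomly hide the annotation during training
            mask = torch.rand_like(factor_ids, dtype=torch.float) < self.p_drop
            factor_ids = torch.where(
                mask, torch.full_like(factor_ids, self.unk_factor), factor_ids)
        # concatenate word and factor embeddings per token
        return torch.cat([self.word(word_ids), self.factor(factor_ids)], dim=-1)
```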
Then there are of course more ways of doing debiased modeling. One we only want to mention here, because you'll have a full lecture on it next week: context-based machine translation. If you translate several sentences together, there are of course more situations where you can disambiguate, because the information might not be in the current sentence but in the previous one or even earlier.
For the speaker gender maybe not, but when the text itself refers to the person, you can use coreferences: pronouns often refer to things in the previous sentence, so you can use them to disambiguate. And that can be done very easily; you'll see more advanced options, but the main idea is simple. Neural machine translation is a sequence-to-sequence model which can learn any mapping from input sequences to output sequences, so nothing stops you from feeding in several sentences: you can do, for example, five-to-five translation, or five-to-one, and so on.
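A minimal sketch of the simplest variant, concatenating a window of previous sentences to each input; the `<SEP>` separator token is an assumption and must be used identically in training and inference.

```python
# Sketch: turn a document into context-augmented inputs for a
# sentence-level model by prepending previous sentences.

def with_context(sentences, window=2, sep=" <SEP> "):
    """Prepend up to `window` previous sentences to each source sentence."""
    inputs = []
    for i, sent in enumerate(sentences):
        context = sentences[max(0, i - window):i]
        inputs.append(sep.join(context + [sent]))
    return inputs

doc = ["My friend is happy.", "She works in a hospital.", "I met her today."]
print(with_context(doc))
```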
This is of course not a method dedicated only to bias, but the hope is the following: we have seen that bias often arises in situations where we don't have enough context, so if we generally increase the context, it should also help here. And of course it helps in other situations where you need context to disambiguate.
Take "I'm going to the bank": it's not directly clear from this sentence whether it's the financial institution or the bank for sitting on, but it may become clear from what you say afterwards.
And then there is in general a very large amount of work on debiasing the word embeddings. I think that partly comes from the fact that early research was often done by inspecting word embeddings and checking whether they are biased; people found that there is indeed some bias in there, and then the idea is: if you already remove it from the word embeddings, then maybe your downstream system will not have that strong a bias. So how can that work? Or maybe first: how do word embeddings encode bias at all?
You can take the word embeddings and compare the distance of a word to "she" and to "he". And there are interesting findings: for example, if you plot, for each occupation, the share of women doing it against the similarity of the occupation word to "she", it's not a perfect correlation, but you see a clear trend: jobs with a high share of women are also more similar to the word "she".
Maybe "secretary" is a bit difficult, because while the job is in general more often done by women, there is also the Secretary of State, the equivalent of a German minister, which the occupation statistics don't count that often, but which occurs quite often in text data; so different meanings get mixed.
So how can you now try to remove this type of bias? One approach is the idea of hard-debiasing word embeddings. If you remember word embeddings, we had this picture where you take the difference between "man" and "woman", add it to "king", and land near "queen". Here the idea is: we want to remove this gender information from words which should not have gender. The word "engineer" carries no information about gender, so we should remove this component from it.
1:02:07.347 --> 1:02:16.772 | |
Of course, you first need to find out where | |
these inflammations are and you can. | |
1:02:17.037 --> 1:02:23.603 | |
However, normally if you do the difference | |
like the subspace by only one example, it's | |
1:02:23.603 --> 1:02:24.659 | |
not the best. | |
1:02:24.924 --> 1:02:31.446 | |
So you can do the same thing for things like | |
brother and sister, man and dad, and then you | |
1:02:31.446 --> 1:02:38.398 | |
can somehow take the average of these differences | |
saying this is a vector which maps a male from | |
1:02:38.398 --> 1:02:39.831 | |
to the female form. | |
1:02:40.660 --> 1:02:50.455 | |
And then you can try to neutralize this gender | |
information on this dimension. | |
1:02:50.490 --> 1:02:57.951 | |
You can find it's subspace or dimensional. | |
1:02:57.951 --> 1:03:08.882 | |
In two dimensions it would be a line, but here it is
a direction in a high-dimensional space, and then you project
1:03:08.728 --> 1:03:13.104 | |
the representation so that this type
of gender information is removed.
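A rough sketch of this neutralizing step, assuming embeddings are already given; note that the original hard-debiasing work identifies the gender subspace with a PCA over several definitional pairs, while this simplified version just averages the difference vectors as described here:

import numpy as np

def gender_direction(emb, pairs):
    # Average the difference vectors of definitional pairs
    # (he/she, man/woman, ...) and normalize to unit length.
    diffs = [emb[m] - emb[f] for m, f in pairs]
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)

def neutralize(vec, g):
    # Remove the component of `vec` that lies along the gender
    # direction, so nothing remains in that subspace.
    return vec - (vec @ g) * g

rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(50) for w in
       ["he", "she", "man", "woman", "brother", "sister", "engineer"]}

g = gender_direction(emb, [("he", "she"), ("man", "woman"),
                           ("brother", "sister")])
emb["engineer"] = neutralize(emb["engineer"], g)
print("gender component after neutralizing:", emb["engineer"] @ g)  # ~0.0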
1:03:15.595 --> 1:03:18.178 | |
This is, of course, quite a strong intervention, so the question is:
1:03:18.178 --> 1:03:19.090 | |
How well does it work?
1:03:19.090 --> 1:03:20.711 | |
We will see results on that in a moment.
1:03:20.880 --> 1:03:28.256 | |
But the idea is: after learning the word embeddings
and before using them for
1:03:28.256 --> 1:03:29.940 | |
machine translation. | |
1:03:29.940 --> 1:03:37.315 | |
We are trying to remove the gender information | |
from the jobs and then have a representation | |
1:03:37.315 --> 1:03:38.678 | |
which hopefully carries less bias.
1:03:40.240 --> 1:03:45.047 | |
A similar idea is the one of gender-neutral
GloVe.
1:03:45.047 --> 1:03:50.248 | |
GloVe is another technique to learn word embeddings.
1:03:50.750 --> 1:03:52.870 | |
I think we discussed one briefly:
1:03:52.870 --> 1:03:56.182 | |
it was word2vec, which was one of the first
ones.
1:03:56.456 --> 1:04:04.383 | |
But of course there are other methods for
training word embeddings, and GloVe is
1:04:04.383 --> 1:04:04.849 | |
one. | |
1:04:04.849 --> 1:04:07.460 | |
The idea here is that we train the embeddings so that
1:04:07.747 --> 1:04:19.007 | |
the gender information is at least somehow separated,
so that part of the vector is gender
1:04:19.007 --> 1:04:20.146 | |
neutral. | |
1:04:20.300 --> 1:04:29.247 | |
What you need for that is three sets of words:
male words, female words, and neutral words.
1:04:29.769 --> 1:04:39.071 | |
And then you're trying to learn vectors
where some dimensions carry no gender information.
1:04:39.179 --> 1:04:51.997 | |
So the idea is that we can learn a representation
where we at least know that this part is gender
1:04:51.997 --> 1:04:56.123 | |
neutral and the other part carries the gender information.
1:05:00.760 --> 1:05:03.793 | |
How can we do that? | |
1:05:03.793 --> 1:05:12.435 | |
How can we change the system to learn something
specific?
1:05:12.435 --> 1:05:20.472 | |
In nearly all cases this works via the loss
function.
1:05:20.520 --> 1:05:26.206 | |
And that is a more general approach in machine
translation:
1:05:26.206 --> 1:05:30.565 | |
you have a general loss function with which you are learning the task.
1:05:31.111 --> 1:05:33.842 | |
Here is the same idea. | |
1:05:33.842 --> 1:05:44.412 | |
You have the general loss function in order | |
to learn good embeddings and then you try to | |
1:05:44.412 --> 1:05:48.687 | |
introduce additional loss functions.
1:05:48.969 --> 1:05:58.213 | |
Yes, that's a good question: if in the training data
1:05:58.213 --> 1:06:07.149 | |
all nurses are female, how do you make sure
that the algorithm puts 'nurse' into the neutral category?
1:06:07.747 --> 1:06:12.448 | |
So this is only for the first step of
learning the word embeddings.
1:06:12.448 --> 1:06:18.053 | |
Then the idea is if you have word embeddings | |
where the gender is separate and then you train | |
1:06:18.053 --> 1:06:23.718 | |
on top of that machine translation where you | |
don't change the embeddings, it should hopefully | |
1:06:23.718 --> 1:06:25.225 | |
be less and less biased. | |
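As a sketch, in a PyTorch-style setup this "don't change the embeddings" step could look like the following; the vocabulary size and the vectors are placeholders:

import torch
import torch.nn as nn

# Load the debiased vectors (stand-in tensor here) into the model's
# embedding layer and freeze it, so that MT training afterwards
# cannot re-introduce gender information through these parameters.
debiased = torch.randn(32000, 512)        # placeholder for real vectors
emb = nn.Embedding.from_pretrained(debiased, freeze=True)
print(emb.weight.requires_grad)           # False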
1:06:25.865 --> 1:06:33.465 | |
And in order to train that, yes, you need additional
information. This information needs to be
1:06:33.465 --> 1:06:40.904 | |
hand-defined and it can't be fully general, so
you need to have lists: these are nouns for male persons,
1:06:40.904 --> 1:06:44.744 | |
these are nouns for females, and these are neutral.
1:06:49.429 --> 1:06:52.575 | |
So in the first step, of course, we still want
to have good word embeddings.
1:06:54.314 --> 1:07:04.100 | |
So you have the normal objective function | |
of the word embedding. | |
1:07:04.100 --> 1:07:09.519 | |
It's based on something like the co-occurrence similarity of words.
1:07:09.849 --> 1:07:19.751 | |
How exactly it is derived is not that important
here, because we're not interested in GloVe itself,
1:07:19.751 --> 1:07:23.195 | |
the point is just that you have some base loss function.
1:07:23.195 --> 1:07:26.854 | |
Of course, you have to keep that. | |
1:07:27.167 --> 1:07:37.481 | |
And then there are three more loss functions
that you can add. For the first one, you take the
1:07:37.481 --> 1:07:51.341 | |
average value of all the male words and the | |
average word embedding of all the female words. | |
1:07:51.731 --> 1:08:00.066 | |
So the good thing about this is that we don't always
need to have, for one word, both the male and the
1:08:00.066 --> 1:08:05.837 | |
female version; we only need a set of male words
and a set of female words.
1:08:06.946 --> 1:08:21.719 | |
So this is just saying: on the gender-neutral dimensions,
these two averages should be similar to each other.
1:08:21.719 --> 1:08:25.413 | |
There shouldn't be a difference between them there.
1:08:30.330 --> 1:08:40.081 | |
The second loss is the opposite.
1:08:40.081 --> 1:08:45.969 | |
This is for the gender dimension:
1:08:45.945 --> 1:09:01.206 | |
there the averages should not be the same; the female
words should be at the other end from the male words.
1:09:01.681 --> 1:09:06.959 | |
This is like on these dimensions, the male | |
should be on the one and the female on the | |
1:09:06.959 --> 1:09:07.388 | |
other. | |
1:09:07.627 --> 1:09:16.123 | |
Here the gender information should remain, so you're
pushing all the male words to the one side and the female words to
1:09:16.123 --> 1:09:17.150 | |
the other. | |
1:09:21.541 --> 1:09:23.680 | |
Then, as the third loss, there are the neutral words.
1:09:23.680 --> 1:09:30.403 | |
The neutral words should be in the middle between the
1:09:30.403 --> 1:09:32.008 | |
male and the female. | |
1:09:32.012 --> 1:09:48.261 | |
So you compute the middle point between all
male and female words and add a loss that pushes
1:09:48.261 --> 1:09:51.691 | |
the neutral words toward this middle point.
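A toy numpy sketch of these three extra loss terms, following the description above; the dimension split and the word lists are illustrative assumptions, and in the real gender-neutral GloVe setup these terms are weighted and added to the base GloVe objective:

import numpy as np

D, DG = 50, 1                 # embedding size; last DG dims reserved for gender
male, female = ["he", "man", "brother"], ["she", "woman", "sister"]
neutral = ["engineer", "nurse"]

rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(D) for w in male + female + neutral}

def extra_losses(emb):
    m = np.mean([emb[w] for w in male], axis=0)
    f = np.mean([emb[w] for w in female], axis=0)
    # (1) On the gender-neutral dimensions the male and female
    #     averages should coincide.
    l_same = np.sum((m[:-DG] - f[:-DG]) ** 2)
    # (2) On the gender dimensions they should be pushed apart
    #     (negated distance, so minimizing the loss increases it).
    l_apart = -np.sum((m[-DG:] - f[-DG:]) ** 2)
    # (3) Neutral words should sit at the midpoint between the
    #     two groups on the gender dimensions.
    mid = (m[-DG:] + f[-DG:]) / 2
    l_mid = sum(np.sum((emb[w][-DG:] - mid) ** 2) for w in neutral)
    return l_same, l_apart, l_mid

print(extra_losses(emb))      # added (with weights) to the GloVe loss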
1:09:52.912 --> 1:09:56.563 | |
And then you're learning them, and then you | |
can apply them in different ways. | |
1:09:57.057 --> 1:10:03.458 | |
So you have this a bit like in pre-training.
1:10:03.458 --> 1:10:10.372 | |
You can use the pre-trained embeddings on
the encoder,
1:10:10.372 --> 1:10:23.117 | |
or you can use them on the decoder. And then you can analyze
what happens: instead of training them directly, you
1:10:23.117 --> 1:10:30.504 | |
have this additional loss, which tries
to optimize for less bias.
1:10:32.432 --> 1:10:42.453 | |
And then it was evaluated exactly on the sentences
we had at the beginning: 'I've known
1:10:42.453 --> 1:10:44.600 | |
her for a long time. | |
1:10:44.600 --> 1:10:48.690 | |
My friend works as an accounting clerk.'
1:10:48.788 --> 1:10:58.049 | |
So all these examples are not very difficult
to translate, but the question is how often
1:10:58.049 --> 1:10:58.660 | |
the gender is resolved correctly.
1:11:01.621 --> 1:11:06.028 | |
And it's not that complicated, as you see
here; even the baseline,
1:11:06.366 --> 1:11:10.772 | |
where you're doing nothing, is working quite well.
It's most challenging,
1:11:10.772 --> 1:11:16.436 | |
it seems, in the situation where the gender is given by
a name. For 'he' and 'him' the system has learned the
1:11:16.436 --> 1:11:22.290 | |
correlation, which is maybe not surprising,
because this correlation occurs more often
1:11:22.290 --> 1:11:23.926 | |
than with any single name.
1:11:24.044 --> 1:11:31.749 | |
The information that the sentence is talking about
Mary, and that Mary is female, is a lot
1:11:31.749 --> 1:11:34.177 | |
harder to extract than a pronoun.
1:11:34.594 --> 1:11:40.495 | |
So you see already in the baseline that this
case is not working well.
1:11:43.403 --> 1:11:47.159 | |
And for all the other cases it's working very | |
well. | |
1:11:47.787 --> 1:11:53.921 | |
The best results overall are achieved here with
hard debiasing on both the encoder and the decoder.
1:11:57.077 --> 1:12:09.044 | |
It makes sense that hard debiasing on the
decoder alone doesn't really work, because there you
1:12:09.044 --> 1:12:12.406 | |
need the gender information to generate the correct forms.
1:12:14.034 --> 1:12:17.406 | |
For GloVe it seems to already work here.
1:12:17.406 --> 1:12:20.202 | |
That's maybe surprising.
1:12:20.260 --> 1:12:28.263 | |
So there is no clear winner; for some settings we don't
have numbers, and what works well in one case doesn't really work well in the other.
1:12:28.263 --> 1:12:30.513 | |
So which one to use then depends on the case.
1:12:33.693 --> 1:12:44.720 | |
Then, as a last way of improving that, there is a
bit of what we had mentioned before.
1:12:44.720 --> 1:12:48.493 | |
That is what is referred to as adapting the data.
1:12:48.488 --> 1:12:59.133 | |
One problem is the bias in the data, so you
can adapt your data: you can just try to
1:12:59.133 --> 1:13:01.485 | |
find an equal amount of male and female examples.
1:13:01.561 --> 1:13:11.368 | |
So you adapt your data and then you fine-tune
your model on this smaller, more balanced
1:13:11.368 --> 1:13:12.868 | |
data set.
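One simple way to balance such data is counterfactual augmentation, adding gender-swapped copies of the training sentences; the swap table below is a toy assumption, and a real system would need morphology-aware swapping on both language sides:

# Toy swap table; a real one must cover source and target language.
SWAP = {"he": "she", "she": "he", "his": "her", "her": "his",
        "man": "woman", "woman": "man"}

def swap_gender(sentence: str) -> str:
    # Replace each gendered token with its counterpart.
    return " ".join(SWAP.get(tok, tok) for tok in sentence.split())

corpus = ["he works as a doctor", "she works as a nurse"]
balanced = corpus + [swap_gender(s) for s in corpus]
print(balanced)   # then fine-tune the MT model on this balanced set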
1:13:18.298 --> 1:13:19.345 | |
This is all fine if
1:13:19.345 --> 1:13:21.605 | |
we have access to the data and to the model:
1:13:21.605 --> 1:13:23.038 | |
we can improve our model.
1:13:24.564 --> 1:13:31.328 | |
One situation we haven't talked a lot about,
but which is
1:13:31.328 --> 1:13:37.942 | |
getting ever more important, is that you want
to work with a model which you don't own, but
1:13:37.942 --> 1:13:42.476 | |
you want to improve the model without having
access to it.
1:13:42.862 --> 1:13:49.232 | |
Nowadays there are a lot of companies who | |
are not developing their own system but they're | |
1:13:49.232 --> 1:13:52.983 | |
using some external service or something like that for machine translation.
1:13:53.313 --> 1:13:59.853 | |
So there is interest in this, because you might not be
able to fine-tune these models completely.
1:14:00.080 --> 1:14:09.049 | |
So the question is: can you do some type of
black-box adaptation, where you take
1:14:09.049 --> 1:14:19.920 | |
the black-box system but try to improve it
in some way from the outside? There are some ways of
1:14:19.920 --> 1:14:21.340 | |
doing that. | |
1:14:21.340 --> 1:14:30.328 | |
One is called black-box injection, and that's
similar to what is referred to as prompting.
1:14:30.730 --> 1:14:39.793 | |
So the problem is: if you just have the sentences,
you don't have information about the speakers.
1:14:39.793 --> 1:14:43.127 | |
So how can you put this information in?
1:14:43.984 --> 1:14:53.299 | |
And what we know from large language models:
we just prompt them, and you can do that here too.
1:14:53.233 --> 1:14:59.545 | |
Instead of translating directly 'I love you', you
translate 'She said to him: I love you', and then of course
1:14:59.545 --> 1:15:01.210 | |
you have to strip the added context away again.
1:15:01.181 --> 1:15:06.629 | |
I mean, you cannot prevent the model from | |
translating that, but you should be able to | |
1:15:06.629 --> 1:15:08.974 | |
see what is the translation of this. | |
1:15:08.974 --> 1:15:14.866 | |
One can strip that away, and now the system
hopefully had the information that
1:15:14.866 --> 1:15:15.563 | |
it needs:
1:15:15.563 --> 1:15:17.020 | |
the speaker is female.
1:15:18.198 --> 1:15:23.222 | |
Because you're no longer translating 'I love
you', but you're translating the sentence 'She
1:15:23.222 --> 1:15:24.261 | |
said to him: I love you'.
1:15:24.744 --> 1:15:37.146 | |
And so you insert this information as contextual | |
information around it and don't have to change | |
1:15:37.146 --> 1:15:38.567 | |
the model. | |
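A minimal sketch of this black-box injection; `translate` stands for the inaccessible MT system, and the naive stripping rule is just an assumption for illustration:

def inject_and_strip(src: str, speaker_female: bool, translate) -> str:
    # Wrap the sentence in context that reveals the speaker's gender.
    prefix = "She said to him:" if speaker_female else "He said to her:"
    out = translate(f"{prefix} {src}")      # call the black-box system
    # Strip the translated context again; we naively assume it ends
    # at the first colon of the output.
    return out.split(":", 1)[-1].strip()

# Toy stand-in for the black-box MT system:
fake_mt = lambda s: (s.replace("She said to him:", "Sie sagte zu ihm:")
                      .replace("I love you", "Ich liebe dich"))
print(inject_and_strip("I love you", True, fake_mt))   # Ich liebe dich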
1:15:41.861 --> 1:15:56.946 | |
The last idea is to do what is referred to as
lattice rescoring. The idea there is: you first
1:15:56.946 --> 1:16:01.156 | |
generate a translation. | |
1:16:01.481 --> 1:16:18.547 | |
And now you have an additional component which
adds alternatives at positions where gender information
1:16:18.547 --> 1:16:21.133 | |
might be lost. | |
1:16:21.261 --> 1:16:29.687 | |
It's just a graph, a simplified lattice, where
there's always one word between
1:16:29.687 --> 1:16:31.507 | |
two nodes.
1:16:31.851 --> 1:16:35.212 | |
So you have something like 'Sie ist ein Arzt'
or 'Sie ist eine Ärztin'.
1:16:35.535 --> 1:16:41.847 | |
And then you can generate all possible variants. | |
1:16:41.847 --> 1:16:49.317 | |
Then, of course, we're not done, because we
still have to pick the final output.
1:16:50.530 --> 1:16:56.999 | |
Then you can re-score the variants with a gender-debiased
model.
1:16:56.999 --> 1:17:03.468 | |
So you might ask: why don't we directly use
this debiased model for the translation?
1:17:03.468 --> 1:17:10.354 | |
The thing is that this model is only focused
on gender debiasing.
1:17:10.530 --> 1:17:16.470 | |
For example, if it's just trained
on some synthetic data, it will not be that
1:17:16.470 --> 1:17:16.862 | |
good as a full translation system.
1:17:16.957 --> 1:17:21.456 | |
But what we can do is rescore the possible
translations in here with it.
1:17:21.721 --> 1:17:31.090 | |
And here, of course, the general structure of the
translation and the word choices are already fixed.
1:17:31.051 --> 1:17:42.226 | |
You're only using the second component
in order to re-rank the gender variants and then
1:17:42.226 --> 1:17:45.490 | |
get the best translation. | |
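A sketch of the variant generation and rescoring; the alternatives table and the scoring model are placeholders, and a real system would build the lattice from the MT output itself:

import itertools

# Toy table of gender alternatives per target word.
ALTS = {"Er": ["Er", "Sie"], "Arzt": ["Arzt", "Ärztin"]}

def variants(tokens):
    # Expand every ambiguous token into all of its alternatives.
    options = [ALTS.get(t, [t]) for t in tokens]
    return [" ".join(v) for v in itertools.product(*options)]

def rescore(hypotheses, score):
    # `score` is the gender-debiased model; it only has to rank the
    # variants, not produce a full translation on its own.
    return max(hypotheses, key=score)

hyps = variants("Er ist Arzt".split())
print(hyps)                      # 4 gender variants of the translation
# best = rescore(hyps, debiased_model.score)   # placeholder scorer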
1:17:45.925 --> 1:17:58.479 | |
As the last one, there is post-processing.
1:17:58.538 --> 1:18:02.830 | |
I mean, the lattice rescoring was already one way of
post-processing: generating the lattice and rescoring it.
1:18:03.123 --> 1:18:08.407 | |
But you can also have post-processing, for example
only on the target side, where you have additional
1:18:08.407 --> 1:18:12.236 | |
components that check the gender, which
maybe only know about gender.
1:18:12.236 --> 1:18:17.089 | |
So it's not a machine translation component
but more like a grammar checker, which can
1:18:17.089 --> 1:18:19.192 | |
be used as post-processing to do that.
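A toy sketch of such a target-side check; the small gender lexicon here is purely an assumption for illustration, while a real checker would need full morphological analysis:

# Flag German article-noun pairs whose grammatical gender disagrees.
GENDER = {"Arzt": "m", "Ärztin": "f"}
ARTICLE = {"der": "m", "die": "f"}

def check_agreement(sentence):
    toks = sentence.split()
    issues = []
    for art, noun in zip(toks, toks[1:]):
        # Only inspect adjacent article-noun pairs we know about.
        if art.lower() in ARTICLE and noun in GENDER:
            if ARTICLE[art.lower()] != GENDER[noun]:
                issues.append((art, noun))
    return issues

print(check_agreement("die Arzt sagt hallo"))   # [('die', 'Arzt')]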
1:18:19.579 --> 1:18:22.926 | |
Think about it a bit like when you use ChatGPT.
1:18:22.926 --> 1:18:25.892 | |
There's also a lot of post processing. | |
1:18:25.892 --> 1:18:32.661 | |
If you used it directly, it might tell you
how to build a bomb, but there are some checks,
1:18:32.661 --> 1:18:35.931 | |
either before or after the model, to prevent such things.
1:18:36.356 --> 1:18:40.580 | |
So often, around the actual model in an application,
1:18:40.580 --> 1:18:44.714 | |
there might be extra pre- and post-processing.
1:18:48.608 --> 1:18:52.589 | |
And yeah, with this we're at the end of
1:18:52.512 --> 1:19:09.359 | |
this lecture, where we focused on bias, but
I think a lot of the techniques we have
1:19:09.359 --> 1:19:11.418 | |
seen here are more general.
1:19:11.331 --> 1:19:17.664 | |
On the one hand, we saw that evaluating
just pure BLEU scores might not always be enough.
1:19:17.677 --> 1:19:18.947 | |
I mean, it's very important to
1:19:20.000 --> 1:19:30.866 | |
always do that, but if some specific behavior
is important to you, then you
1:19:30.866 --> 1:19:35.696 | |
might have to do dedicated evaluations. | |
1:19:36.036 --> 1:19:44.296 | |
If it is now translating 'the President' and, as in
German, always chooses the male form, I guess that is not very
1:19:44.296 --> 1:19:45.476 | |
appropriate. | |
1:19:45.785 --> 1:19:53.591 | |
So if certain characteristics of your system
are essential, it might be important to have a dedicated
1:19:53.591 --> 1:19:54.620 | |
evaluation. | |
1:19:55.135 --> 1:20:02.478 | |
And then, if you have that, it might of course
also be important to develop dedicated techniques.
1:20:02.862 --> 1:20:10.988 | |
We have seen today some ways to mitigate biases,
but I hope you see that a lot of these techniques
1:20:10.988 --> 1:20:13.476 | |
can also be used to mitigate other,
1:20:13.573 --> 1:20:31.702 | |
at least related, problems: you can adjust the
training data or the loss in the same way for other things.
1:20:33.253 --> 1:20:36.022 | |
Before we finish, do we have any
more questions?
1:20:41.761 --> 1:20:47.218 | |
Then thanks a lot, and we will see each
other again on Thursday.