WEBVTT

0:00:01.561 --> 0:00:05.186
Okay So Um.

0:00:08.268 --> 0:00:17.655
Welcome to today's presentation, the second
class on machine translation, where today we'll

0:00:17.655 --> 0:00:25.044
do a bit of a specific topic and we'll talk
about linguistic backgrounds.

0:00:26.226 --> 0:00:34.851
We'll cover three different parts in
this lecture.

0:00:35.615 --> 0:00:42.538
We'll do first a very, very brief introduction
about linguistic background in a way that what

0:00:42.538 --> 0:00:49.608
is language, what are ways of describing language,
what are the theories behind it, very, very

0:00:49.608 --> 0:00:50.123
short.

0:00:50.410 --> 0:00:57.669
I don't know, some of you have listened, I think,
to the NLP lecture in the last semester or so.

0:00:58.598 --> 0:01:02.553
So there we did a lot longer explanation.

0:01:02.553 --> 0:01:08.862
Here it's just shorter, because here we are talking
about machine translation.

0:01:09.109 --> 0:01:15.461
So it's really focused on the parts which
are important when we talk about machine translation.

0:01:15.755 --> 0:01:19.377
Though for everybody who has listened to that
already, it's a bit of a repetition.

0:01:19.377 --> 0:01:19.683
Maybe.

0:01:19.980 --> 0:01:23.415
But it's really trying to look.

0:01:23.415 --> 0:01:31.358
These are properties of languages and how
can they influence translation.

0:01:31.671 --> 0:01:38.928
We'll use that in the second part to discuss
why machine translation is difficult, given what we

0:01:38.928 --> 0:01:40.621
know about language.

0:01:40.940 --> 0:01:47.044
We will see that there are two main things:
one is that languages might express ideas and

0:01:47.044 --> 0:01:53.279
information differently, and if they are expressed
differently in different languages we have to

0:01:53.279 --> 0:01:54.920
do somehow the transfer.

0:01:55.135 --> 0:02:02.771
And it's not purely that we know there's words
used for it, but it's not that simple and very

0:02:02.771 --> 0:02:03.664
different.

0:02:04.084 --> 0:02:10.088
And the other problem we mentioned last time
about biases is that there's not always the

0:02:10.088 --> 0:02:12.179
same amount of information in both languages.

0:02:12.592 --> 0:02:18.206
So it can be that there's some more information
in the one or you can't express that few information

0:02:18.206 --> 0:02:19.039
on the target.

0:02:19.039 --> 0:02:24.264
We had that also, for example, with the example
of the rice plant: in German, we would just

0:02:24.264 --> 0:02:24.820
say rice.

0:02:24.904 --> 0:02:33.178
Or in English, while in other languages you
have to distinguish between the rice plant or rice

0:02:33.178 --> 0:02:33.724
as a.

0:02:34.194 --> 0:02:40.446
And then it's not always possible to directly
infer this on the surface.

0:02:41.781 --> 0:02:48.501
And if we make it to the last point otherwise
we'll do that next Tuesday or we'll partly

0:02:48.501 --> 0:02:55.447
do it only there: we'll describe briefly
the three main approaches of rule-based, so

0:02:55.447 --> 0:02:59.675
linguistically motivated, machine
translation.

0:02:59.779 --> 0:03:03.680
We mentioned them last time like the direct
translation.

0:03:03.680 --> 0:03:10.318
The translation by transfer and the interlingua-based;
we'll do that a bit more in detail today.

0:03:10.590 --> 0:03:27.400
But very briefly because this is not a focus
of this class and then next week because.

0:03:29.569 --> 0:03:31.757
Why do we think this is important?

0:03:31.757 --> 0:03:37.259
On the one hand, of course, we are dealing
with natural language, so therefore it might

0:03:37.259 --> 0:03:43.074
be good to spend a bit of time in understanding
what we are really dealing with because this

0:03:43.074 --> 0:03:45.387
is challenging these other problems.

0:03:45.785 --> 0:03:50.890
And on the other hand, this was the first
way of how we're doing machine translation.

0:03:51.271 --> 0:04:01.520
Therefore, it's interesting to understand
what was the idea behind that and also to later

0:04:01.520 --> 0:04:08.922
see what is done differently and to understand
when some models.

0:04:13.453 --> 0:04:20.213
When we're talking about linguistics, we can
of course do that on different levels and there's

0:04:20.213 --> 0:04:21.352
different ways.

0:04:21.521 --> 0:04:26.841
On the right side here you are seeing the
basic levels of linguistics.

0:04:27.007 --> 0:04:31.431
So we have at the bottom the phonetics and
phonology.

0:04:31.431 --> 0:04:38.477
Phonetics and phonology we will not cover this year because we
are mainly focusing on text input where we

0:04:38.477 --> 0:04:42.163
are directly having characters and then words.

0:04:42.642 --> 0:04:52.646
Then what we touch on today, at least mentioning
what it is, is morphology, which is the first

0:04:52.646 --> 0:04:53.424
level.

0:04:53.833 --> 0:04:59.654
Already mentioned it a bit on Tuesday that
of course there are some languages where this

0:04:59.654 --> 0:05:05.343
is very, very basic and there is not really
a lot of rules of how you can build words.

0:05:05.343 --> 0:05:11.099
But since I assume you all have some basic
knowledge of German there is like a lot more

0:05:11.099 --> 0:05:12.537
challenges than that.

0:05:13.473 --> 0:05:20.030
You know, maybe if you're a native speaker
that's quite easy and everything is clear,

0:05:20.030 --> 0:05:26.969
but if you have to learn it, like the endings
of a word; and we are famous for doing composita

0:05:26.969 --> 0:05:29.103
and putting words together.

0:05:29.103 --> 0:05:31.467
So this is like the first level.

0:05:32.332 --> 0:05:40.268
Then we have the syntax, which is both on
the word and on the sentence level, and that's

0:05:40.268 --> 0:05:43.567
about the structure of the sentence.

0:05:43.567 --> 0:05:46.955
What are the functions of some words?

0:05:47.127 --> 0:05:51.757
You might remember part-of-speech tags from
your high school time.

0:05:51.757 --> 0:05:57.481
There are noun and adjective and things
like that, and this is something helpful.

0:05:57.737 --> 0:06:03.933
Just imagine in the beginning that it was
not only used for rule based but for statistical

0:06:03.933 --> 0:06:10.538
machine translation, for example, the reordering
between languages was quite a challenging task.

0:06:10.770 --> 0:06:16.330
Especially if you have long-range reorderings,
and there part-of-speech information is very

0:06:16.330 --> 0:06:16.880
helpful.

0:06:16.880 --> 0:06:20.301
You know, in German you have to move the verb

0:06:20.260 --> 0:06:26.599
To the second position, if you have Spanish
you have to change the noun and the adjective

0:06:26.599 --> 0:06:30.120
so information from part of speech could be
very helpful.

0:06:30.410 --> 0:06:38.621
Then you have a syntax base structure where
you have a full syntax tree in the beginning

0:06:38.621 --> 0:06:43.695
and then it came into statistical machine translation.

0:06:44.224 --> 0:06:50.930
And it got more and more important for statistical
machine translation that you are really trying

0:06:50.930 --> 0:06:53.461
to model the whole syntax tree of a

0:06:53.413 --> 0:06:57.574
sentence in order to better match how to do
that

0:06:57.574 --> 0:07:04.335
in the target language. But yeah, syntax-based
statistical machine translation had a

0:07:04.335 --> 0:07:05.896
bit of a problem.

0:07:05.896 --> 0:07:08.422
It got better and better and was.

0:07:08.368 --> 0:07:13.349
Just on the way of getting better in some
languages than traditional statistical models.

0:07:13.349 --> 0:07:18.219
But then the neural models came up and they
were just so much better in modelling that

0:07:18.219 --> 0:07:19.115
all implicitly.

0:07:19.339 --> 0:07:23.847
So they never really were used in practice
that much.

0:07:24.304 --> 0:07:34.262
And then we'll talk about the semantics, so
what is the meaning of the words?

0:07:34.262 --> 0:07:40.007
We saw last time that words can have different meanings.

0:07:40.260 --> 0:07:46.033
And yeah, how you represent meaning, of course,
is very challenging.

0:07:45.966 --> 0:07:53.043
And normally, formalizing this is
typically done in quite limited domains, because

0:07:53.043 --> 0:08:00.043
doing that for all possible words
has not really been achieved yet; it is very

0:08:00.043 --> 0:08:00.898
challenging.

0:08:02.882 --> 0:08:09.436
Then about pragmatics: pragmatics is the
meaning in the context of the current situation.

0:08:09.789 --> 0:08:16.202
So one famous example is there, for example,
if you say the light is red.

0:08:16.716 --> 0:08:21.795
The traffic light is red: typically
you don't want to tell the other person

0:08:21.795 --> 0:08:27.458
if you're sitting in a car that it's surprising,
oh, the light is red, but typically you mean:

0:08:27.458 --> 0:08:30.668
okay you should stop and you shouldn't pass
the light.

0:08:30.850 --> 0:08:40.994
So the meaning of this sentence, the light,
is red in the context of sitting in the car.

0:08:42.762 --> 0:08:51.080
So let's start with the morphology; that is
where we are starting, and one

0:08:51.080 --> 0:08:53.977
easy first thing is this:

0:08:53.977 --> 0:09:02.575
of course we have to split the sentence into
words, or join characters, so that we have words.

0:09:02.942 --> 0:09:09.017
Because in most of our work we'll deal like
machine translation with some type of words.

0:09:09.449 --> 0:09:15.970
In neural machine translation, people are also working
on character-based models and subwords, but a

0:09:15.970 --> 0:09:20.772
basic unit like the words of a sentence is a very
important first step.

0:09:21.421 --> 0:09:32.379
And for many languages that is quite simple;
in German, it's not that hard to determine

0:09:32.379 --> 0:09:33.639
the word.

0:09:34.234 --> 0:09:46.265
In tokenization, the main challenge is if
we are doing corpus-based methods that we are

0:09:46.265 --> 0:09:50.366
also dealing with punctuation attached to normal words.

0:09:50.770 --> 0:10:06.115
And there of course it's getting a bit more
challenging.

0:10:13.173 --> 0:10:17.426
So that is maybe the main thing: for example,
if you think of German

0:10:17.426 --> 0:10:19.528
tokenization, it's easy to get every word.

0:10:19.779 --> 0:10:26.159
You split it at a space, but then you would
have the dot at the end joined to the last word,

0:10:26.159 --> 0:10:30.666
and of course you don't want that, because it's
a different word.

0:10:30.666 --> 0:10:37.046
The last word would not be 'go' but 'go.',
but what you can do is always split off the dots.

0:10:37.677 --> 0:10:45.390
Can you really always do that, or might it
sometimes be better to keep the dot attached?

0:10:47.807 --> 0:10:51.001
For example, email addresses or abbreviations
here.

0:10:51.001 --> 0:10:56.284
For example, doctor, maybe it doesn't make
sense to split up the dot because then you

0:10:56.284 --> 0:11:01.382
would assume a new sentence always starts there,
but it's just the 'Dr.' from doctor.

0:11:01.721 --> 0:11:08.797
Or if you have ordinal numbers, like 'he's
the seventh', written '7.' in German, then you don't want

0:11:08.797 --> 0:11:09.610
to split.

0:11:09.669 --> 0:11:15.333
So there are some things where it could be
a bit more difficult, but it's not really challenging.
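
To make the dot handling concrete, here is a minimal Python sketch of such a rule-based tokenizer; the abbreviation list and the example sentence are made up for illustration, and a real tokenizer would use much larger, language-specific resources.

```python
import re

# Toy abbreviation list; a real tokenizer uses a large, language-specific one.
ABBREVIATIONS = {"Dr.", "Prof.", "e.g.", "z.B."}

def tokenize(sentence):
    """Split on spaces, then split off a final dot unless the token is a known
    abbreviation or an ordinal number written with a dot (German '7.')."""
    tokens = []
    for tok in sentence.split():
        if tok.endswith(".") and tok not in ABBREVIATIONS and not re.fullmatch(r"\d+\.", tok):
            tokens.extend([tok[:-1], "."])
        else:
            tokens.append(tok)
    return tokens

print(tokenize("Dr. Meier kommt am 7. Mai. Er geht."))
# ['Dr.', 'Meier', 'kommt', 'am', '7.', 'Mai', '.', 'Er', 'geht', '.']
```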

0:11:16.796 --> 0:11:23.318
In other languages it's getting a lot more
challenging, especially in Asian languages

0:11:23.318 --> 0:11:26.882
where often there are no spaces between words.

0:11:27.147 --> 0:11:32.775
So you just have the sequence of characters.

0:11:32.775 --> 0:11:38.403
The quick brown fox jumps over the lazy dog.

0:11:38.999 --> 0:11:44.569
And then it still might be helpful to work
on something like words.

0:11:44.569 --> 0:11:48.009
Then you need a bit more complex segmentation.

0:11:48.328 --> 0:11:55.782
And here you see we are again having our typical
problem.

0:11:55.782 --> 0:12:00.408
That means that there is ambiguity.

0:12:00.600 --> 0:12:02.104
So you're seeing here.

0:12:02.104 --> 0:12:08.056
We have exactly the same sequence of characters
or here, but depending on how we split it,

0:12:08.056 --> 0:12:12.437
it means he is your servant or he is the one
who used your things.

0:12:12.437 --> 0:12:15.380
Or here we have round eyes and take the air.

0:12:15.895 --> 0:12:22.953
So then of course yeah this type of tokenization
gets more important because you could introduce

0:12:22.953 --> 0:12:27.756
errors already, and you can imagine if you're
doing it wrong here:

0:12:27.756 --> 0:12:34.086
If you once do a wrong decision it's quite
difficult to recover from a wrong decision.

0:12:34.634 --> 0:12:47.088
And so in these cases, looking at how we're
doing tokenization is an important issue.

0:12:47.127 --> 0:12:54.424
And then it might be helpful to do things
like character-based models where we treat each

0:12:54.424 --> 0:12:56.228
character as a symbol.

0:12:56.228 --> 0:13:01.803
For example, to make this decision later,
or never really make it.
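
For languages written without spaces, a classic baseline is dictionary-based maximum matching; the sketch below only illustrates how a greedy segmenter can commit to a wrong split. The tiny dictionary and the sentence are the standard "research student" illustration, not taken from the lecture slides.

```python
def max_match(text, vocab, max_len=4):
    """Greedy left-to-right longest-match segmentation."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:       # take the longest dictionary entry
                tokens.append(text[i:j])
                i = j
                break
        else:                            # unknown: emit a single character
            tokens.append(text[i])
            i += 1
    return tokens

vocab = {"研究", "研究生", "生命", "起源", "命"}   # toy dictionary
print(max_match("研究生命起源", vocab))
# ['研究生', '命', '起源']  -- greedy choice; the intended reading is 研究 / 生命 / 起源
```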

0:13:06.306 --> 0:13:12.033
The other thing is that if we have words,
they might not be the optimal unit to

0:13:12.033 --> 0:13:18.155
work with because it can be that we should
look into the internal structure of words because

0:13:18.155 --> 0:13:20.986
if we have a morphologically rich language.

0:13:21.141 --> 0:13:27.100
That means we have a lot of different types
of words, and if you have a lot of many different

0:13:27.100 --> 0:13:32.552
types of words, it on the other hand means
of course each of these words we have seen

0:13:32.552 --> 0:13:33.757
very infrequently.

0:13:33.793 --> 0:13:39.681
So if you only have ten words and you have
a large corpus, each word occurs more often.

0:13:39.681 --> 0:13:45.301
If you have three million different words,
then each of them will occur less often.

0:13:45.301 --> 0:13:51.055
Hopefully you know, from machine learning,
it's helpful if you have seen each example

0:13:51.055 --> 0:13:51.858
very often.

0:13:52.552 --> 0:13:54.524
And so why does it help?

0:13:54.524 --> 0:13:56.495
Why does it help happen?

0:13:56.495 --> 0:14:02.410
Yeah, in some languages we have quite complex
information inside a word.

0:14:02.410 --> 0:14:09.271
So here's a word from Finnish, 'talossanikinko'
or something like that, and it means 'in my

0:14:09.271 --> 0:14:10.769
house too', as a question.

0:14:11.491 --> 0:14:15.690
So you have all these information attached
to the word.

0:14:16.036 --> 0:14:20.326
And that is of course an extreme case; that's
why, for example, Finnish is typically a

0:14:20.326 --> 0:14:20.831
language

0:14:20.820 --> 0:14:26.725
where machine translation quality is less
good, because generating all these different

0:14:26.725 --> 0:14:33.110
morphological variants is a challenge. An
additional point: Finnish is typically

0:14:33.110 --> 0:14:39.564
not really low-resource, but in low-resource
languages you quite often have more difficult

0:14:39.564 --> 0:14:40.388
morphology.

0:14:40.440 --> 0:14:43.949
I mean, English is an example of a relatively
easy one.

0:14:46.066 --> 0:14:54.230
And so in general we can say that words are
composed of morphemes, and morphemes are

0:14:54.230 --> 0:15:03.069
the smallest meaning-carrying units, so normally
it means: all morphemes should have some type

0:15:03.069 --> 0:15:04.218
of meaning.

0:15:04.218 --> 0:15:09.004
For example, here does not really have a meaning.

0:15:09.289 --> 0:15:12.005
'Un' has some type of meaning.

0:15:12.005 --> 0:15:14.371
It's changing the meaning.

0:15:14.371 --> 0:15:21.468
The 'ness' has the meaning that it's making
a noun out of an adjective, and 'happy' is the stem.

0:15:21.701 --> 0:15:31.215
So each of these parts conveys some meaning,
but you cannot split them further up and have

0:15:31.215 --> 0:15:32.156
something meaningful.

0:15:32.312 --> 0:15:36.589
You see that of course a little bit more is
happening.

0:15:36.589 --> 0:15:43.511
Typically the 'y' is changed into an 'i', so there
can be some variation, but these are typical

0:15:43.511 --> 0:15:46.544
examples of what we have as morphemes.

0:16:02.963 --> 0:16:08.804
That is, of course, a problem and that's the
question why how you do your splitting.

0:16:08.804 --> 0:16:15.057
But that problem we have anyway always because
even full words can have different meanings

0:16:15.057 --> 0:16:17.806
depending on the context they're used in.

0:16:18.038 --> 0:16:24.328
So we always have to somewhat have a model
which can infer or represent the meaning of

0:16:24.328 --> 0:16:25.557
the word in the context.

0:16:25.825 --> 0:16:30.917
But you are right that this problem might
get even more severe if you're splitting up.

0:16:30.917 --> 0:16:36.126
Therefore, it might not be the best to go
for the very extreme and represent each letter

0:16:36.126 --> 0:16:41.920
and have a model which is only on letters because,
of course, a letter can have a lot of different

0:16:41.920 --> 0:16:44.202
meanings depending on where it's used.

0:16:44.524 --> 0:16:50.061
And yeah, there is no right solution like
what is the right splitting.

0:16:50.061 --> 0:16:56.613
It depends on the language, the application,
and the amount of data you have.

0:16:56.613 --> 0:17:01.058
For example, it typically means: the less
data you have,

0:17:01.301 --> 0:17:12.351
the more splitting you should do; if you have
more data, then you can better distinguish different words.

0:17:13.653 --> 0:17:19.065
Then there are different types of morphemes.
We typically have one stem morpheme: it's

0:17:19.065 --> 0:17:21.746
like 'house' or 'Tisch', so the main meaning.

0:17:21.941 --> 0:17:29.131
And then you can have functional or bound
morphemes, which can be a prefix,

0:17:29.131 --> 0:17:34.115
suffix, infix or circumfix, so it can be before,
it can be after.

0:17:34.114 --> 0:17:39.416
It can be inside, or it can be around it, something
like 'gekauft' there.

0:17:39.416 --> 0:17:45.736
Typically you would say that it's not like
two morphemes, 'ge' and 't', because they both

0:17:45.736 --> 0:17:50.603
describe the function; together 'ge' and 't'
are marking the participle of 'kauf'.

0:17:53.733 --> 0:18:01.209
What are people using them for? You can use
them for inflection, to describe something like

0:18:01.209 --> 0:18:03.286
tense, count, person, case.

0:18:04.604 --> 0:18:09.238
That is yeah, if you know German, this is
commonly used in German.

0:18:10.991 --> 0:18:16.749
But of course there is a lot more complicated
things: I think in in some languages it also.

0:18:16.749 --> 0:18:21.431
I mean, in German it only depends on count and
person of the subject

0:18:21.431 --> 0:18:27.650
for the verb; in other languages, for example,
it can also depend on the first and on the

0:18:27.650 --> 0:18:28.698
second object.

0:18:28.908 --> 0:18:35.776
So if you buy an apple or a house, it's
not only that the

0:18:35.776 --> 0:18:43.435
'kauft' depends on me, like in German, but
it can also depend on whether it's an apple

0:18:43.435 --> 0:18:44.492
or a house.

0:18:44.724 --> 0:18:48.305
And then of course you have an exploding number
of word forms.

0:18:49.409 --> 0:19:04.731
Furthermore, it can be used to do derivations
so you can make other types of words from it.

0:19:05.165 --> 0:19:06.254
And then yeah.

0:19:06.254 --> 0:19:12.645
This is like creating new words by joining
them, like 'rainbow', 'waterproof', but for example

0:19:12.645 --> 0:19:19.254
in German like 'Einkaufswagen', 'eiskalt' and
so on, where you can do that

0:19:19.254 --> 0:19:22.014
with nouns and adjectives in German.

0:19:22.282 --> 0:19:29.077
Then of course you might have additional challenges
like the Fugen-s, where you have to add an extra 's'.

0:19:32.452 --> 0:19:39.021
Yeah, then there is a yeah of course additional
special things.

0:19:39.639 --> 0:19:48.537
You sometimes have to put extra stuff because
of phonology, take for example the plural or

0:19:48.537 --> 0:19:56.508
the third person singular: in English it
is normally an 's', but for 'goes', for example, it is

0:19:56.508 --> 0:19:57.249
an 'es'.

0:19:57.277 --> 0:20:04.321
In German you can also have other things:
'Mutter' becomes 'Mütter', so you're changing

0:20:04.321 --> 0:20:11.758
to an umlaut in order to express the plural; and
in other languages there is for example vowel harmony,

0:20:11.758 --> 0:20:17.315
where the vowels inside are changing depending
on which form you have.

0:20:17.657 --> 0:20:23.793
Which makes things more difficult, because then splitting
a word into its parts doesn't really work anymore.

0:20:23.793 --> 0:20:28.070
So for 'Mutter' and 'Mütter', for example, that
is not really possible.

0:20:28.348 --> 0:20:36.520
The nice thing, and this is more like a
general observation, is that irregular things are often

0:20:36.520 --> 0:20:39.492
happening for words which occur frequently.

0:20:39.839 --> 0:20:52.177
So that you can have enough examples, while
the regular things you can do by some type

0:20:52.177 --> 0:20:53.595
of rules.
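
A tiny sketch of that division of labor, with a made-up exception table: frequent irregular forms are memorized, and the regular rest is handled by rules.

```python
# Irregular (frequent) forms are stored; regular forms follow simple rules.
IRREGULAR_PLURALS = {"mouse": "mice", "child": "children", "Mutter": "Mütter"}

def pluralize(noun):
    if noun in IRREGULAR_PLURALS:                 # lookup for irregular words
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "ch", "sh")):     # phonologically motivated 'es'
        return noun + "es"
    return noun + "s"                             # default regular rule

print(pluralize("house"), pluralize("bus"), pluralize("mouse"))  # houses buses mice
```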

0:20:55.655 --> 0:20:57.326
Yeah, this can be done.

0:20:57.557 --> 0:21:02.849
So there are tasks on this: how to do automatic
inflection, how to analyze them.

0:21:02.849 --> 0:21:04.548
So you give it a word.

0:21:04.548 --> 0:21:10.427
It's telling you what are the possible forms
of that, like how they are built, and so on.

0:21:10.427 --> 0:21:15.654
And at least for the high-resource languages,
there are a lot of tools for that.

0:21:15.654 --> 0:21:18.463
Of course, if you now want to do that for

0:21:18.558 --> 0:21:24.281
some language which is very low-resource, it
might be very difficult and there might be

0:21:24.281 --> 0:21:25.492
no tool for them.

0:21:28.368 --> 0:21:37.652
Good, before we go to the next part
about part of speech, are there any questions

0:21:37.652 --> 0:21:38.382
about this?

0:22:01.781 --> 0:22:03.187
Yeah, we'll come to that a bit.

0:22:03.483 --> 0:22:09.108
So it's a very good and difficult question,
and we'll especially see that later: if you

0:22:09.108 --> 0:22:14.994
just put in words it would be very bad, because
words are put into neural networks just as

0:22:14.994 --> 0:22:15.844
some numbers.

0:22:15.844 --> 0:22:21.534
Each word is mapped to a number and you
put it in, so it doesn't really know anything more

0:22:21.534 --> 0:22:22.908
about the structure.

0:22:23.543 --> 0:22:29.898
What we will see: the most successful
approach, which is mostly used, is subword

0:22:29.898 --> 0:22:34.730
units, where we split words. But we will get to this.

0:22:34.730 --> 0:22:40.154
I don't know if you have seen it in an advanced lecture already.

0:22:40.154 --> 0:22:44.256
We'll cover this on Tuesday.

0:22:44.364 --> 0:22:52.316
So there is an algorithm called byte pair
encoding, which is about splitting words into

0:22:52.316 --> 0:22:52.942
parts.

0:22:53.293 --> 0:23:00.078
So it's doing the splitting of words but not
morphologically motivated but more based on

0:23:00.078 --> 0:23:00.916
frequency.

0:23:00.940 --> 0:23:11.312
However, it performs very well and that's
why it's used, and there is a bit of correlation:

0:23:11.312 --> 0:23:15.529
sometimes the frequency-based splits agree with the morphological ones.

0:23:15.695 --> 0:23:20.709
So we're splitting words and we're splitting
especially words which are infrequent and that's

0:23:20.709 --> 0:23:23.962
maybe a good motivation why that's good for
neural networks.

0:23:23.962 --> 0:23:28.709
That means if you have seen a word very often
you don't need to split it and it's easier

0:23:28.709 --> 0:23:30.043
to just process it fast.

0:23:30.690 --> 0:23:39.218
While if you have seen a word infrequently,
it is good to split it into parts so the model can

0:23:39.218 --> 0:23:39.593
still deal with it.

0:23:39.779 --> 0:23:47.729
So there is some way of doing it, but linguists
would say this is not a morphological analysis.

0:23:47.729 --> 0:23:53.837
That is true, but we are splitting words into
parts if they have not been seen often.
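
Here is a compact sketch of the core merge loop of byte pair encoding as usually described; the word frequencies are invented, and real toolkits (e.g. subword-nmt, SentencePiece) additionally handle word-boundary markers and the application of learned merges to new text.

```python
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs over a {word (space-separated symbols): freq} vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge(pair, vocab):
    """Merge every occurrence of the pair into a single new symbol."""
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# Words start as sequences of characters; the counts are made up.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
for _ in range(5):
    counts = pair_counts(vocab)
    best = max(counts, key=counts.get)        # most frequent adjacent pair
    vocab = merge(best, vocab)
    print(best, "->", "".join(best))
# learns merges like ('e','s'), ('es','t'), ('l','o'), ('lo','w'), ...
```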

0:23:59.699 --> 0:24:06.324
Yes, so another important thing about words
are the part-of-speech tags.

0:24:06.324 --> 0:24:14.881
These are the common ones: noun, verb, adjective,
adverb, determiner, pronoun, preposition, and

0:24:14.881 --> 0:24:16.077
conjunction.

0:24:16.077 --> 0:24:26.880
There are some more. They are not the same
in all languages, but for example there is this

0:24:26.880 --> 0:24:38.104
universal tag set which tries to define this type
of part-of-speech tags for many languages.

0:24:38.258 --> 0:24:42.018
And then, of course, it's helping you for
generalization.

0:24:42.018 --> 0:24:48.373
Some rules of a language deal with verbs and
nouns in general, especially if you look at sentence structure.

0:24:48.688 --> 0:24:55.332
And so if you know the part-of-speech tag
you can easily generalize and get these

0:24:55.332 --> 0:24:58.459
rules or apply these rules, as you know:

0:24:58.459 --> 0:25:02.680
The verb in English is always at the second
position.

0:25:03.043 --> 0:25:10.084
So you know how to deal with verbs independently
of which words you are now really looking at.

0:25:12.272 --> 0:25:18.551
And that, again, can be ambiguous.

0:25:18.598 --> 0:25:27.171
So there are some words which can have several
part-of-speech tags.

0:25:27.171 --> 0:25:38.686
An example is the word 'can', which
can be the can of beans or 'can do something'.

0:25:38.959 --> 0:25:46.021
This often also happens in English with related words:

0:25:46.021 --> 0:25:55.256
'access' can be the access to something or to access something.

0:25:56.836 --> 0:26:02.877
Most words have only one single part-of-speech
tag, but there are some where it's a bit more

0:26:02.877 --> 0:26:03.731
challenging.

0:26:03.731 --> 0:26:09.640
The nice thing is that the ambiguous ones
are often words which occur more often,

0:26:09.640 --> 0:26:12.858
while for really rare words it's not that frequent.
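
As a small illustration of how context resolves the tag, here is a sketch using NLTK's off-the-shelf English tagger (assuming NLTK and its pre-trained tokenizer and tagger models are installed; the exact tags may vary).

```python
import nltk

tokens = nltk.word_tokenize("I can open the can.")
print(nltk.pos_tag(tokens))
# Roughly: [('I','PRP'), ('can','MD'), ('open','VB'), ('the','DT'), ('can','NN'), ('.','.')]
# The same word 'can' is tagged as a modal verb in one position and as a noun in the other.
```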

0:26:13.473 --> 0:26:23.159
If you look at these classes you can distinguish
open classes where new words can happen so

0:26:23.159 --> 0:26:25.790
we can invent new nouns.

0:26:26.926 --> 0:26:31.461
But then there are the closed classes, which
are, for example, determiners or pronouns.

0:26:31.461 --> 0:26:35.414
For example, it's not that you can easily
develop your new pronoun.

0:26:35.414 --> 0:26:38.901
So there is a fixed list of pronouns and we
are using that.

0:26:38.901 --> 0:26:44.075
So it's not like tomorrow there is
something happening and then people are using

0:26:44.075 --> 0:26:44.482
a new

0:26:45.085 --> 0:26:52.426
pronoun or new conjunctions, like 'and',
because it's not that you normally invent a

0:26:52.426 --> 0:26:52.834
new one.

0:27:00.120 --> 0:27:03.391
And in addition to part-of-speech tags:

0:27:03.391 --> 0:27:09.012
some of these parts of speech have
different properties.

0:27:09.389 --> 0:27:21.813
So, for example, for nouns and adjectives
we can have singular and plural. In other languages,

0:27:21.813 --> 0:27:29.351
there is a dual, so that a word is not only
singular or plural, but can also be a

0:27:29.351 --> 0:27:31.257
dual if it refers to two things.

0:27:31.631 --> 0:27:36.246
You have the gender: masculine, feminine,
neuter we know.

0:27:36.246 --> 0:27:43.912
In other languages there is animate and inanimate,
and you have the cases, like in German you have

0:27:43.912 --> 0:27:46.884
nominative, genitive, dative, accusative.

0:27:47.467 --> 0:27:57.201
And then in other languages you have even more,
for example Latin with additional cases like the ablative.

0:27:57.497 --> 0:28:03.729
So there's like more, it's just like yeah,
and there you have no one to one correspondence,

0:28:03.729 --> 0:28:09.961
so it can be that there are some cases which
are only in the one language and do not happen

0:28:09.961 --> 0:28:11.519
in the other language.

0:28:13.473 --> 0:28:20.373
For verbs we have tenses of course, like 'walk',
'is walking', 'walked', 'have walked', 'had walked', 'will

0:28:20.373 --> 0:28:21.560
walk', and so on.

0:28:21.560 --> 0:28:28.015
Interestingly, in Japanese for example this
can also happen for adjectives, so there

0:28:28.015 --> 0:28:32.987
is a difference between something is white
or something was white.

0:28:35.635 --> 0:28:41.496
There is the continuous aspect, which we do
not really have that commonly in German, and

0:28:41.496 --> 0:28:47.423
I guess that's if you're German and learning
English that's something like she sings and

0:28:47.423 --> 0:28:53.350
she is singing and of course we can express
that but it's not commonly used and normally

0:28:53.350 --> 0:28:55.281
we're not doing this aspect.

0:28:55.455 --> 0:28:57.240
Also about tenses.

0:28:57.240 --> 0:29:05.505
If you use past tense in English you will also
use past tenses in German, so we have similar

0:29:05.505 --> 0:29:09.263
tenses, but the use might be different.

0:29:14.214 --> 0:29:20.710
There is also the mood, like the indicative
or the subjunctive:

0:29:20.710 --> 0:29:26.742
'if he were here'. And there are voices, active and
passive.

0:29:27.607 --> 0:29:34.024
That, you know, exists both in German
and English, but there is something in

0:29:34.024 --> 0:29:35.628
the middle voice in Greek:

0:29:35.628 --> 0:29:42.555
'I get myself taught'; so there are other phenomena
which might only happen in one language.

0:29:42.762 --> 0:29:50.101
These are, yeah, the different syntactic
structures that you can have in a language,

0:29:50.101 --> 0:29:57.361
and there are two things: it might
be that some exist only in some language and others

0:29:57.361 --> 0:29:58.376
don't exist.

0:29:58.358 --> 0:30:05.219
And on the other hand there is also matching,
so it might be that in some situations you

0:30:05.219 --> 0:30:07.224
use different structures.

0:30:10.730 --> 0:30:13.759
The next would be then about semantics.

0:30:13.759 --> 0:30:16.712
Do you have any questions before that?

0:30:19.819 --> 0:30:31.326
I'll just continue, but ask if something is unclear.
Besides the structure, we typically have more

0:30:31.326 --> 0:30:39.863
ambiguities, so it can be that words themselves
have different meanings.

0:30:40.200 --> 0:30:48.115
And we are typically talking about polysemy
and homonymy, where polysemy means that a word

0:30:48.115 --> 0:30:50.637
can have different meanings.

0:30:50.690 --> 0:30:58.464
So if you have the English word interest,
it can be that you are interested in something.

0:30:58.598 --> 0:31:07.051
Or it can be like the financial interest rate,
but it is somehow related, because if you are

0:31:07.051 --> 0:31:11.002
getting some interest there is some connection.

0:31:11.531 --> 0:31:18.158
But there are homonyms where the meanings
really are not related.

0:31:18.458 --> 0:31:24.086
So the 'can' of beans and 'can' as in being able
don't really have anything in common; they're really very different.

0:31:24.324 --> 0:31:29.527
And of course that's not completely clear,
so there is not a clear definition. For example,

0:31:29.527 --> 0:31:34.730
for 'bank' it can be that you say the meanings are related,
but others can argue they are not; so

0:31:34.730 --> 0:31:39.876
there are some clear cases, like 'interest',
there are some which are vague, and then there

0:31:39.876 --> 0:31:43.439
are some where it's very clear again that the
meanings are different.

0:31:45.065 --> 0:31:49.994
And in order to translate them, of course,
we might need the context to disambiguate.

0:31:49.994 --> 0:31:54.981
That's typically where we can disambiguate,
and that's not only for lexical semantics,

0:31:54.981 --> 0:32:00.198
that's generally very often that if you want
to disambiguate, context can be very helpful.

0:32:00.198 --> 0:32:03.981
So: in which sentence is it used, what general
knowledge do we have, who is speaking?

0:32:04.944 --> 0:32:09.867
You can do that externally by some disambiguation
task.

0:32:09.867 --> 0:32:14.702
A machine translation system will also do it
internally.

0:32:16.156 --> 0:32:21.485
And sometimes you're lucky and you don't need
to do it because you just have the same ambiguity

0:32:21.485 --> 0:32:23.651
in the source and the target language.

0:32:23.651 --> 0:32:26.815
And then it doesn't matter if you think about
the mouse.

0:32:26.815 --> 0:32:31.812
As I said, you don't really need to know if
it's a computer mouse or the living mouse you

0:32:31.812 --> 0:32:36.031
translate from German to English because it
has exactly the same ambiguity.

0:32:40.400 --> 0:32:46.764
There are also relations between words like
synonyms, antonyms, hyponyms: like the 'is-a'

0:32:46.764 --> 0:32:50.019
relation and the 'part-of' relation, like door and house.

0:32:50.019 --> 0:32:55.569
'Big' and 'small' are antonyms, and synonyms
are words which mean something similar.

0:32:56.396 --> 0:33:03.252
There are resources which try to express all
this linguistic information, like WordNet

0:33:03.252 --> 0:33:10.107
or GermaNet, where you have a graph with words
and how they are related to each other.

0:33:11.131 --> 0:33:12.602
Which can be helpful.

0:33:12.602 --> 0:33:18.690
Typically these things were used more in tasks
where there is less data; there are a lot

0:33:18.690 --> 0:33:24.510
of tasks in NLP where you have very limited
data because you really need to hand-annotate

0:33:24.510 --> 0:33:24.911
that.

0:33:25.125 --> 0:33:28.024
Machine translation has a big advantage.

0:33:28.024 --> 0:33:31.842
There's naturally a lot of text translated
out there.

0:33:32.212 --> 0:33:39.519
Typically in machine translation we have, compared
to other tasks, a significant amount of data.

0:33:39.519 --> 0:33:46.212
People have looked into integrating WordNet
or things like that, but it is rarely used

0:33:46.212 --> 0:33:49.366
in commercial systems or the like.
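
As a concrete look at such a resource, here is a small sketch querying WordNet through NLTK (assuming the NLTK 'wordnet' data is installed; the printed senses and relations are only indicative).

```python
from nltk.corpus import wordnet as wn

# Several senses of the polysemous word "interest"
for syn in wn.synsets("interest")[:3]:
    print(syn.name(), "-", syn.definition())

dog = wn.synset("dog.n.01")
print([h.name() for h in dog.hypernyms()])      # the 'is-a' relation (hypernyms)
print([p.name() for p in dog.part_meronyms()])  # the 'part-of' relation (meronyms)
```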

0:33:52.692 --> 0:33:55.626
So this was based on the words.

0:33:55.626 --> 0:34:03.877
We have morphology, syntax, and semantics,
and then of course it makes sense to also look

0:34:03.877 --> 0:34:06.169
at the bigger structure.

0:34:06.169 --> 0:34:08.920
That means information on the sentence level.

0:34:08.948 --> 0:34:17.822
Of course, we don't really have morphology
there, because morphology is about the structure

0:34:17.822 --> 0:34:26.104
of words, but we have syntax on the sentence
level and the semantic representation.

0:34:28.548 --> 0:34:35.637
When we are thinking about the sentence structure,
then the sentence is, of course, first a sequence

0:34:35.637 --> 0:34:37.742
of words terminated by a dot.

0:34:37.742 --> 0:34:42.515
Jane bought the house and we can say something
about the structure.

0:34:42.515 --> 0:34:47.077
It's typically subject, verb, and then one
or several objects.

0:34:47.367 --> 0:34:51.996
And the number of objects, for example, is
then determined by the verb.

0:34:52.232 --> 0:34:54.317
It's called the valency.

0:34:54.354 --> 0:35:01.410
So you have intransitive verbs which don't
get any object, it's just to sleep.

0:35:02.622 --> 0:35:05.912
For example, there is no object: 'sleeps the bed'.

0:35:05.912 --> 0:35:14.857
You cannot say that. And there are transitive
verbs where you have to put one or more objects,

0:35:14.857 --> 0:35:16.221
and you always have to.

0:35:16.636 --> 0:35:19.248
The sentence is not correct if you don't put the
object.

0:35:19.599 --> 0:35:33.909
So if you have to buy something you have to
say 'bought this', or 'give someone something'.

0:35:34.194 --> 0:35:40.683
Here you see something that may be interesting:
the relation between word order and morphology.

0:35:40.683 --> 0:35:47.243
Of course it's not that strong, but for example
in English you always have to first say who

0:35:47.243 --> 0:35:49.453
you gave it and what you gave.

0:35:49.453 --> 0:35:53.304
So the structure is very clear and cannot
be changed.

0:35:54.154 --> 0:36:00.801
German, for example, has a possibility of
determining what you gave and whom you gave

0:36:00.801 --> 0:36:07.913
it because there is a morphology and you can
do what you gave a different form than to whom

0:36:07.913 --> 0:36:08.685
you gave.

0:36:11.691 --> 0:36:18.477
And that is a general tendency that if you
have morphology then typically the word order

0:36:18.477 --> 0:36:25.262
is more free and possible, while in English
you cannot express this information through

0:36:25.262 --> 0:36:26.482
the morphology.

0:36:26.706 --> 0:36:30.238
You typically have to express them through
the word order.

0:36:30.238 --> 0:36:32.872
It's not as free, but it's more restricted.

0:36:35.015 --> 0:36:40.060
Yeah, the first part is typically the noun
phrase, the subject, and that can not only

0:36:40.060 --> 0:36:43.521
be a single noun, but of course it can be a
longer phrase.

0:36:43.521 --> 0:36:48.860
So if you have Jane the woman, it can be Jane,
it can be the woman, it can be a woman, it can

0:36:48.860 --> 0:36:52.791
be the young woman or the young woman who lives
across the street.

0:36:53.073 --> 0:36:56.890
All of these are the subjects, so this can
be already very, very long.

0:36:57.257 --> 0:36:58.921
And they also put this.

0:36:58.921 --> 0:37:05.092
The verb is on the second position in a bit
more complicated way because if you have now

0:37:05.092 --> 0:37:11.262
the young woman who lives across the street
runs to somewhere or so then yeah runs is at

0:37:11.262 --> 0:37:16.185
the second position in this tree but the first
position is quite long.

0:37:16.476 --> 0:37:19.277
And so it's not just counting words:

0:37:19.277 --> 0:37:22.700
it's not that the second word is always the verb.

0:37:26.306 --> 0:37:32.681
Additional to these simple things, there's
more complex stuff.

0:37:32.681 --> 0:37:43.104
Jane bought the house from Jim without hesitation,
or Jane bought the house in the posh neighborhood

0:37:43.104 --> 0:37:44.925
across the river.

0:37:45.145 --> 0:37:51.694
And these often lead to additional ambiguities
because it's not always completely clear to

0:37:51.694 --> 0:37:53.565
which part this prepositional phrase attaches.

0:37:54.054 --> 0:37:59.076
That we will see, and you have, of course,
subclauses and so on.

0:38:01.061 --> 0:38:09.926
And then there is a theory behind it which
was very important for rule based machine translation

0:38:09.926 --> 0:38:14.314
because that's exactly what you're doing there.

0:38:14.314 --> 0:38:18.609
You would take the sentence and do the syntactic analysis.

0:38:18.979 --> 0:38:28.432
So we have these constituents, which describe
the basic parts of the language.

0:38:28.468 --> 0:38:35.268
And we can create the sentence structure as
a context free grammar, which you hopefully

0:38:35.268 --> 0:38:42.223
remember from basic computer science, which
is a tuple of non-terminals, terminal symbols,

0:38:42.223 --> 0:38:44.001
production rules,

0:38:43.943 --> 0:38:50.218
and the start symbol, and you can then describe
a sentence by this phrase structure grammar.

0:38:51.751 --> 0:38:59.628
So a simple example would be something like
that: you have a lexicon, Jane is a noun, Frays

0:38:59.628 --> 0:39:02.367
is a noun, Telescope is a noun.

0:39:02.782 --> 0:39:10.318
And then you have these production rules: a sentence
is a noun phrase and a verb phrase.

0:39:10.318 --> 0:39:18.918
The noun phrase can either be a determiner and a
noun, or it can be a noun phrase and a prepositional

0:39:18.918 --> 0:39:19.628
phrase.

0:39:19.919 --> 0:39:25.569
Or a prepositional phrase; and a prepositional
phrase is a preposition and a noun phrase.

0:39:26.426 --> 0:39:27.622
We're looking at this.

0:39:27.622 --> 0:39:30.482
What is the valency of the verb we're describing
here?

0:39:33.513 --> 0:39:36.330
How many objects would in this case the verb
have?

0:39:46.706 --> 0:39:48.810
We're looking at the verb phrase.

0:39:48.810 --> 0:39:54.358
The verb phrase is a verb and a noun phrase,
so one object here, so this would be a

0:39:54.358 --> 0:39:55.378
valency of one.

0:39:55.378 --> 0:40:00.925
If you have an intransitive verb, the verb phrase
would be just a verb, and if you have

0:40:00.925 --> 0:40:03.667
two objects, it would be verb, noun phrase, noun phrase.

0:40:08.088 --> 0:40:15.348
And yeah, then the challenge, or what you have
to do, is this: given a natural

0:40:15.348 --> 0:40:23.657
language sentence, you want to parse it to get this
type of parse tree; you know that from programming languages

0:40:23.657 --> 0:40:30.198
where you also need to parse the code in order
to get the representation.

0:40:30.330 --> 0:40:39.356
However, there is one challenge if you parse
natural language compared to computer language.

0:40:43.823 --> 0:40:56.209
So there are different ways of how you can
express things and there can be different parse trees

0:40:56.209 --> 0:41:00.156
belonging to the same input.

0:41:00.740 --> 0:41:05.241
So if you have 'Jane buys a house', that's
an easy example.

0:41:05.241 --> 0:41:07.491
So you do the lexicon look up.

0:41:07.491 --> 0:41:13.806
Jane is a noun, 'buys' is a verb,
'a' is a determiner, and 'house' is a noun.

0:41:15.215 --> 0:41:18.098
And then you can now use the grammar rules
of here.

0:41:18.098 --> 0:41:19.594
There is no rule for that.

0:41:20.080 --> 0:41:23.564
Here we have no rules, but here we have a
rule.

0:41:23.564 --> 0:41:27.920
A noun is a noun phrase, so we have mapped
that to a noun phrase.

0:41:28.268 --> 0:41:34.012
Then we can map this to the verb phrase.

0:41:34.012 --> 0:41:47.510
We have the rule verb plus noun phrase goes to verb phrase,
and then we can map this to a sentence, representing the full parse.
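
As a rough illustration of how such a phrase structure grammar and parse could be written down, here is a minimal sketch using the NLTK toolkit; the toolkit and the exact rule set are my own choice, not something prescribed in the lecture.

# Minimal sketch of the phrase structure grammar and the parse of "Jane buys a house".
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | N | NP PP
VP -> V NP
PP -> P NP
Det -> 'a' | 'the'
N  -> 'Jane' | 'house' | 'telescope'
V  -> 'buys'
P  -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("Jane buys a house".split()):
    # prints the bracketed tree: (S (NP (N Jane)) (VP (V buys) (NP (Det a) (N house))))
    print(tree)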

0:41:49.069 --> 0:41:53.042
We can have that even more complex.

0:41:53.042 --> 0:42:01.431
The woman who won the lottery yesterday bought
the house across the street.

0:42:01.431 --> 0:42:05.515
The structure gets more complicated.

0:42:05.685 --> 0:42:12.103
You now see that the verb phrase is at the
second position, but the noun phrase is quite

0:42:12.052 --> 0:42:18.655
big in here, and for the PP phrases it's
sometimes difficult where to put them because

0:42:18.655 --> 0:42:25.038
they can be put to the noun phrase, but in
other sentences they can also be put to the

0:42:25.038 --> 0:42:25.919
verb phrase.

0:42:36.496 --> 0:42:38.250
Yeah.

0:42:43.883 --> 0:42:50.321
Yes, so then either it can have two tags,
noun or noun phrase, or you can have the extra

0:42:50.321 --> 0:42:50.755
rule.

0:42:50.755 --> 0:42:57.409
The noun phrase can not only be a determiner
and a noun, but it can also be just a noun.

0:42:57.717 --> 0:43:04.360
Then of course either you introduce additional
rules for what is possible, or you have the problem

0:43:04.360 --> 0:43:11.446
that you get parse trees which are not correct,
and then you have to add some type of probability

0:43:11.446 --> 0:43:13.587
to decide which parse is more probable.

0:43:16.876 --> 0:43:23.280
But of course some things also can't really
be modeled easily with this type of trees.

0:43:23.923 --> 0:43:32.095
For example, the agreement is not straightforward
to model, so that for subject and verb you can check

0:43:32.095 --> 0:43:38.866
that the person and number agreement is correct:

0:43:38.866 --> 0:43:41.279
if it's a singular subject,

0:43:41.561 --> 0:43:44.191
you need a singular verb,

0:43:44.604 --> 0:43:49.242
and if it's a plural subject, you need a plural verb.

0:43:49.489 --> 0:43:56.519
Things like that, and also the agreement between
determiner, adjective, and noun, so they also

0:43:56.519 --> 0:43:57.717
have to agree.

0:43:57.877 --> 0:44:05.549
Things like that cannot be easily done with
this type of grammar or this subcategorization

0:44:05.549 --> 0:44:13.221
that you check whether the verb is transitive
or intransitive, and that Jane sleeps is OK,

0:44:13.221 --> 0:44:16.340
but Jane sleeps the house is not OK.

0:44:16.436 --> 0:44:21.073
And, the other way round, 'Jane bought the house' is okay,
but 'Jane bought' without the object is not okay.

0:44:23.183 --> 0:44:29.285
Furthermore, this long range dependency might
be difficult and which word orders are allowed

0:44:29.285 --> 0:44:31.056
and which are not allowed.

0:44:31.571 --> 0:44:40.011
This is also not directly modeled: so in German you can say
'Maria gibt dem Mann das Buch', 'Dem Mann gibt Maria das

0:44:40.011 --> 0:44:47.258
Buch', 'Das Buch gibt Maria dem Mann', but 'Maria
dem Mann gibt das Buch' is somewhat off.

0:44:47.227 --> 0:44:55.191
Yeah, which one of these is possible and
which is not is sometimes not simple

0:44:55.191 --> 0:44:56.164
to model.

0:44:56.876 --> 0:45:05.842
Therefore, people have done more complex stuff
like this unification grammar and tried to

0:45:05.842 --> 0:45:09.328
model not only the categories but also features of the words.

0:45:09.529 --> 0:45:13.367
For the agreement it has to be, for example, third
person and singular.

0:45:13.367 --> 0:45:20.028
You're unifying that, so you're annotating these
symbols with more information, and then you have

0:45:20.028 --> 0:45:25.097
more complex syntactic structures in order
to model also these phenomena.
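
A minimal sketch of what feature unification for agreement could look like in code; the dictionary-based encoding is only an illustration of the idea, not a full unification grammar.

# Toy sketch of feature unification for agreement checking.
def unify(f1, f2):
    """Merge two feature dicts; return None on a clash (agreement failure)."""
    result = dict(f1)
    for key, value in f2.items():
        if key in result and result[key] != value:
            return None          # e.g. singular vs. plural -> no agreement
        result[key] = value
    return result

subject  = {"cat": "NP", "person": 3, "number": "sg"}   # "Jane"
verb_ok  = {"person": 3, "number": "sg"}                # "sleeps"
verb_bad = {"person": 3, "number": "pl"}                # "sleep"

print(unify(subject, verb_ok))    # merged features: agreement succeeds
print(unify(subject, verb_bad))   # None: subject-verb agreement fails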

0:45:28.948 --> 0:45:33.137
Yeah, why is this difficult?

0:45:33.873 --> 0:45:39.783
We have different ambiguities and that makes
it difficult, so words have different part

0:45:39.783 --> 0:45:43.610
of speech tags, and if you have 'time flies like
an arrow':

0:45:43.583 --> 0:45:53.554
it can mean that the animals, the flies,
like an arrow, or it can mean that the time

0:45:53.554 --> 0:45:59.948
is flying very fast, is going away very fast,
like an arrow.

0:46:00.220 --> 0:46:10.473
And if you want to do a parse tree, these two
meanings have a different part of speech tag,

0:46:10.473 --> 0:46:13.008
so 'flies' is the verb in one reading and a noun in the other.

0:46:13.373 --> 0:46:17.999
And of course that is a different semantic,
and so that is very different.

0:46:19.499 --> 0:46:23.361
And otherwise there is structural

0:46:23.243 --> 0:46:32.419
ambiguity, so that some part of the sentence
can be attached by different rules; the famous case

0:46:32.419 --> 0:46:34.350
is the PP attachment.

0:46:34.514 --> 0:46:39.724
So: the cops saw the burglar with the binoculars.

0:46:39.724 --> 0:46:48.038
Then 'with the binoculars' can be attached to 'saw',
or it can be attached to 'the burglar'.

0:46:48.448 --> 0:46:59.897
And here it's more probable that they saw the
thief through the binoculars, and not that the thief

0:46:59.897 --> 0:47:01.570
has them.

0:47:01.982 --> 0:47:13.356
And this, of course, makes things difficult
because while parsing and determining the structure you are implicitly

0:47:13.356 --> 0:47:16.424
defining the semantics.
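
Continuing the NLTK sketch from above (again only an assumed toolkit, not the lecture's own tooling), the PP-attachment ambiguity shows up directly as multiple parse trees for the same input.

# Toy grammar that licenses both attachments of the prepositional phrase.
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N  -> 'cops' | 'burglar' | 'binoculars'
V  -> 'saw'
P  -> 'with'
""")

sentence = "the cops saw the burglar with the binoculars".split()
trees = list(nltk.ChartParser(grammar).parse(sentence))
print(len(trees))   # 2: the PP attaches to the verb phrase or to the noun phrase
for t in trees:
    print(t)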

0:47:20.120 --> 0:47:29.736
Therefore, we would then go directly to semantics,
but maybe there are some questions about syntax and

0:47:29.736 --> 0:47:31.373
how that works.

0:47:33.113 --> 0:47:46.647
Then we'll do a bit more about semantics,
so now we only describe the structure of the

0:47:46.647 --> 0:47:48.203
sentence.

0:47:48.408 --> 0:47:55.584
And for the meaning of the sentence we typically
have the compositionality of meaning.

0:47:55.584 --> 0:48:03.091
The meaning of the full sentence is determined
by the meaning of the individual words, and

0:48:03.091 --> 0:48:06.308
they together form the meaning of the sentence.

0:48:06.686 --> 0:48:17.936
For words that is partly true but not always
I mean, for things like 'rainbow', joining 'rain'

0:48:17.936 --> 0:48:19.086
and bow.

0:48:19.319 --> 0:48:26.020
But this is not always the case, while for sentences
typically that is happening because you can't

0:48:26.020 --> 0:48:30.579
directly determine the full meaning, but you
split it into parts.

0:48:30.590 --> 0:48:36.164
Sometimes only in some parts like kick the
bucket the expression.

0:48:36.164 --> 0:48:43.596
Of course you cannot get the meaning of kick
the bucket' by looking at the individual words, or,

0:48:43.596 --> 0:48:46.130
in German, 'ins Gras beißen'.

0:48:47.207 --> 0:48:53.763
You cannot get that he died by looking at
the individual words of 'biss ins Gras', but that

0:48:53.763 --> 0:48:54.611
is the meaning they have.

0:48:55.195 --> 0:49:10.264
And there are different ways of describing
that; some people have tried that, and some are more commonly

0:49:10.264 --> 0:49:13.781
used for some tasks, which we

0:49:14.654 --> 0:49:20.073
will come to. So the first thing would be something
like first-order logic.

0:49:20.073 --> 0:49:27.297
If you have Peter loves Jane then you have
this meaning, and you then have the representation

0:49:27.297 --> 0:49:33.005
that there is a love relation between Peter
and Jane, and you try to construct that.

0:49:32.953 --> 0:49:40.606
You will see this is a lot more complex than
only doing syntax, because you are also

0:49:40.606 --> 0:49:43.650
doing this type of representation.
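
A very small sketch of how such a predicate-argument structure could be represented; the Predicate class and the naive subject-verb-object mapping are purely illustrative.

# Toy representation of the first-order-logic reading of "Peter loves Jane": love(peter, jane).
from dataclasses import dataclass

@dataclass(frozen=True)
class Predicate:
    name: str
    args: tuple

def parse_simple_svo(subject: str, verb: str, obj: str) -> Predicate:
    """Map a subject-verb-object triple to a two-place predicate."""
    return Predicate(name=verb, args=(subject.lower(), obj.lower()))

print(parse_simple_svo("Peter", "love", "Jane"))
# Predicate(name='love', args=('peter', 'jane'))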

0:49:44.164 --> 0:49:47.761
The other thing is to try to do frame semantics.

0:49:47.867 --> 0:49:55.094
That means that you try to represent the knowledge
about the world and you have these ah frames.

0:49:55.094 --> 0:49:58.372
For example, you might have a frame to buy.

0:49:58.418 --> 0:50:05.030
And the meaning is that you have a commercial
transaction.

0:50:05.030 --> 0:50:08.840
You have a person who is selling.

0:50:08.969 --> 0:50:10.725
You have a person who is buying.

0:50:11.411 --> 0:50:16.123
You have something that is priced, you might
have a price, and so on.

0:50:17.237 --> 0:50:22.698
And then what you are doing in semantic parsing
with frame semantics you first try to determine.

0:50:22.902 --> 0:50:30.494
Which frames are happening in the sentence,
so if it's something with buying, you
would try to first identify:
0:50:30.494 --> 0:50:33.025
would try to first identify.

0:50:33.025 --> 0:50:40.704
oh, here we have the frame 'buy', which does
not always have to be indicated by the verb

0:50:40.704 --> 0:50:42.449
'buy' itself; it can be 'sell' or other ways.

0:50:42.582 --> 0:50:52.515
And then you try to find out which elements
of these frame are in the sentence and try

0:50:52.515 --> 0:50:54.228
to align them.
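
A toy sketch of frame identification and role alignment; the frame name, trigger list, and role inventory loosely follow a FrameNet-style 'Commerce_buy' frame and are illustrative only, as is the very naive alignment heuristic.

# Toy frame-semantic parsing for "X bought Y from Z" style sentences.
COMMERCE_FRAME = {
    "name": "Commerce_buy",
    "triggers": {"buy", "bought", "purchase", "sell", "sold"},
    "roles": ["Buyer", "Seller", "Goods", "Money"],
}

def identify_frame(tokens):
    """Return the frame if one of its trigger words occurs in the sentence."""
    return COMMERCE_FRAME if COMMERCE_FRAME["triggers"] & set(tokens) else None

def align_roles(tokens):
    """Very naive role alignment based on the position of 'bought' and 'from'."""
    roles = {}
    if "bought" in tokens:
        v = tokens.index("bought")
        roles["Buyer"] = " ".join(tokens[:v])
        tail = tokens[v + 1:]
        roles["Seller"] = tail[tail.index("from") + 1] if "from" in tail else None
        goods_end = tail.index("from") if "from" in tail else len(tail)
        roles["Goods"] = " ".join(tail[:goods_end])
    return roles

tokens = "Jane bought the house from Jim".split()
print(identify_frame(tokens)["name"])   # Commerce_buy
print(align_roles(tokens))              # {'Buyer': 'Jane', 'Seller': 'Jim', 'Goods': 'the house'}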

0:50:56.856 --> 0:51:01.121
Yeah, you have, for example, to buy and sell.

0:51:01.121 --> 0:51:07.239
If you have a model that has frames, they
have the same elements.

0:51:09.829 --> 0:51:15.018
In addition over like sentence, then you have
also a phenomenon beyond sentence level.

0:51:15.018 --> 0:51:20.088
We're coming to this later because it's a
special challenge for machine translation.

0:51:20.088 --> 0:51:22.295
There is, for example, co reference.

0:51:22.295 --> 0:51:27.186
That means if you first mention it, it's like
the President of the United States.

0:51:27.467 --> 0:51:30.107
And later you would refer to him maybe as
he.

0:51:30.510 --> 0:51:36.966
And that is especially challenging in machine
translation because you're not always using

0:51:36.966 --> 0:51:38.114
the same thing.

0:51:38.114 --> 0:51:44.355
Of course, for the president, it's 'he' in English
and 'er' in German, but for other things it might

0:51:44.355 --> 0:51:49.521
be different depending on the grammatical gender
in the language in which you refer to it.

0:51:55.435 --> 0:52:03.866
So much for the background; next we
want to look, based on the knowledge we have

0:52:03.866 --> 0:52:04.345
now,

0:52:04.345 --> 0:52:10.285
at why machine translation is difficult, before
we continue.

0:52:16.316 --> 0:52:22.471
The first type of problem is what we refer
to as translation divergences.

0:52:22.471 --> 0:52:30.588
That means that we have the same information
in source and target, but the problem is that

0:52:30.588 --> 0:52:33.442
they are expressed differently.

0:52:33.713 --> 0:52:42.222
So it is not expressed in the same way, and we cannot
translate these things simply; we need to do

0:52:42.222 --> 0:52:44.924
something a bit more complex.

0:52:45.325 --> 0:52:51.324
So an example, if it's only about structure: in
English, 'the delicious soup':

0:52:51.324 --> 0:52:59.141
The adjective is before the noun, while in
Spanish you have to put it after the noun,

0:52:59.141 --> 0:53:02.413
and so you have to change the word order.

0:53:02.983 --> 0:53:10.281
So there are different ways of divergence,
so there can be structural divergence, which

0:53:10.281 --> 0:53:10.613
is the word order.

0:53:10.550 --> 0:53:16.121
So the order is different; in German we have
that especially in

0:53:16.121 --> 0:53:19.451
the subordinate clause: while in English, in the
subordinate clause,

0:53:19.451 --> 0:53:24.718
the verb is also at the second position, in
German it's at the end, and so you have to

0:53:24.718 --> 0:53:25.506
move it

0:53:25.465 --> 0:53:27.222
all over.

0:53:27.487 --> 0:53:32.978
It can be that it's a completely different
grammatical role.

0:53:33.253 --> 0:53:35.080
So:

0:53:35.595 --> 0:53:37.458
You have 'you like her'

0:53:38.238 --> 0:53:41.472
in English.

0:53:41.261 --> 0:53:47.708
In Spanish it's 'ella te gusta', which means
she pleases you, so now 'she' is no longer the object

0:53:47.708 --> 0:53:54.509
but the subject here, and 'you' is now in the accusative,
and then it's 'pleases', so you really

0:53:54.509 --> 0:53:58.689
use a different sentence structure and you
have to change it.

0:53:59.139 --> 0:54:03.624
It can also be head switching.

0:54:03.624 --> 0:54:09.501
In English you say the baby just ate.

0:54:09.501 --> 0:54:16.771
In Spanish you literally say the baby finishes eating.

0:54:16.997 --> 0:54:20.803
So the eating is no longer the main verb; the finishing
is the verb.

0:54:21.241 --> 0:54:30.859
So you have to learn that you cannot always
have the same structures in your input and

0:54:30.859 --> 0:54:31.764
output.

0:54:36.856 --> 0:54:42.318
There are lexical divergences, like 'to swim across'
versus 'to cross swimming'.

0:54:43.243 --> 0:54:57.397
You have categorial divergence, like an adjective
turning into a noun: so you have 'a little bread', or to

0:54:57.397 --> 0:55:00.162
make a decision.

0:55:00.480 --> 0:55:15.427
That is the one challenge and the even bigger
challenge is what is referred to as translation mismatches.

0:55:17.017 --> 0:55:19.301
That can be their lexical mismatch.

0:55:19.301 --> 0:55:21.395
That's the fish we talked about.

0:55:21.395 --> 0:55:27.169
If it's like the, the fish you eat or the
fish which is living, these are two different words

0:55:27.169 --> 0:55:27.931
in Spanish.

0:55:28.108 --> 0:55:34.334
And then that's partly sometimes even not
known, so even the human might not be able

0:55:34.334 --> 0:55:34.627
to

0:55:34.774 --> 0:55:40.242
infer it; you maybe need to see the context,
you maybe need to have the sentences around,

0:55:40.242 --> 0:55:45.770
so one problem is that at least traditional
machine translation works on a sentence level,

0:55:45.770 --> 0:55:51.663
so we take each sentence and translate it independent
of everything else, but that's, of course,

0:55:51.663 --> 0:55:52.453
not correct.

0:55:52.532 --> 0:55:59.901
We will look into some ways of doing
document-based machine translation later.

0:56:00.380 --> 0:56:06.793
Then gender information might be a problem,
so in English it's player and you don't know

0:56:06.793 --> 0:56:10.139
if it's 'Spieler' or 'Spielerin', or if it's not known.

0:56:10.330 --> 0:56:15.770
But from the English, if you now generate German,
you should know:

0:56:15.770 --> 0:56:21.830
is the gender known or not known, and then
generate the right form?

0:56:22.082 --> 0:56:38.333
So just imagine a commentator: if he's talking
about the player, you can see if it's male

0:56:38.333 --> 0:56:40.276
or female.

0:56:40.540 --> 0:56:47.801
So in general the problem is that if you
have less information and you need more information

0:56:47.801 --> 0:56:51.928
in your target, this translation doesn't really
work.

0:56:55.175 --> 0:56:59.180
Another problem is what we just talked about:

0:56:59.119 --> 0:57:01.429
coreference.

0:57:01.641 --> 0:57:08.818
So if you refer to an object and that can
be across sentence boundaries then you have

0:57:08.818 --> 0:57:14.492
to use the right pronoun and you cannot just
translate the pronoun.

0:57:14.492 --> 0:57:18.581
'If the baby does not thrive on raw milk, boil
it.'

0:57:19.079 --> 0:57:28.279
And if you now just take the typical translation
of 'it', it will be 'es', and that

0:57:28.279 --> 0:57:31.065
will be wrong.

0:57:31.291 --> 0:57:35.784
No, that would even be right, because it is
'das Baby'.

0:57:35.784 --> 0:57:42.650
Yes, but I mean, you have to determine that
and it might be wrong at some point.

0:57:42.650 --> 0:57:48.753
So getting this right is the issue; yeah, it can be wrong,
yes, that is right, yeah.

0:57:48.908 --> 0:57:55.469
Because in English both, the baby and the milk,
are referred to with 'it', so if you

0:57:55.469 --> 0:58:02.180
use 'es' it will refer to the first one mentioned,
so it seems correct, but in German it will be

0:58:02.180 --> 0:58:06.101
'es', and so if you translate 'it' as 'es' it will
refer to the baby.

0:58:06.546 --> 0:58:13.808
But you have to use 'sie', because milk, 'die Milch', is feminine,
although that is really very uncommon because

0:58:13.808 --> 0:58:18.037
maybe the milk is just an object and so you
would rather expect 'es'.
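
A minimal sketch of the decision the system has to make here; the small gender table is illustrative, but the German genders themselves (das Baby, die Milch) are standard.

# Choosing the German pronoun for English "it" requires the antecedent and its German gender.
GENDER = {"Baby": "neuter", "Milch": "feminine"}
PRONOUN = {"neuter": "es", "feminine": "sie", "masculine": "er"}

def translate_it(antecedent_de: str) -> str:
    """Return the German pronoun matching the gender of the antecedent noun."""
    return PRONOUN[GENDER[antecedent_de]]

print(translate_it("Milch"))   # 'sie' -> boil the milk (intended reading)
print(translate_it("Baby"))    # 'es'  -> would wrongly point at the baby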

0:58:18.358 --> 0:58:25.176
Of course, I agree this might be a situation
which is a bit constructed and not a common thing,

0:58:25.176 --> 0:58:29.062
but you can see that these things are not that
easy.

0:58:29.069 --> 0:58:31.779
Another example is this: Dr.

0:58:31.779 --> 0:58:37.855
McLean often brings his dog champion to visit
with his patients.

0:58:37.855 --> 0:58:41.594
He loves to give big wet sloppy kisses.

0:58:42.122 --> 0:58:58.371
And there, of course, it's also important
whether 'he' refers to the dog or to the doctor.

0:58:59.779 --> 0:59:11.260
Another challenging example is that we
don't have a fixed language; that was referred

0:59:11.260 --> 0:59:16.501
to in morphology, and we can build new words.

0:59:16.496 --> 0:59:23.787
So we can in all languages build new words
by just concatenating parts, like 'Brexit' and

0:59:23.787 --> 0:59:30.570
things like that. And then, of course, also
languages don't exist

0:59:30.570 --> 0:59:31.578
in isolation.

0:59:32.012 --> 0:59:41.591
In German you can now use the word 'download'
somewhere and you can also apply a morphological

0:59:41.591 --> 0:59:43.570
operation to it.

0:59:43.570 --> 0:59:48.152
I guess there is not even an agreed correct form.

0:59:48.508 --> 0:59:55.575
But so you have to deal with these things,
and yeah, especially in social media.

0:59:55.996 --> 1:00:00.215
This word, maybe most of you have forgotten
it already.

1:00:00.215 --> 1:00:02.517
This was ten years ago or so.

1:00:02.517 --> 1:00:08.885
I don't know there was a volcano in Iceland
which stopped Europeans flying around.

1:00:09.929 --> 1:00:14.706
So there is always new words coming up and
you have to deal with.

1:00:18.278 --> 1:00:24.041
Yeah, one last thing, so some of these examples
we have seen are a bit artificial.

1:00:24.041 --> 1:00:30.429
So one example of what is very common, where machine
translation doesn't really work, is 'this box

1:00:30.429 --> 1:00:31.540
was in the pen'.

1:00:32.192 --> 1:00:36.887
And maybe you would be surprised, at least
when I read it.

1:00:36.887 --> 1:00:39.441
How can a box be inside a pen?

1:00:40.320 --> 1:00:44.175
Does anybody have a solution for that while
the sentence is still correct?

1:00:47.367 --> 1:00:51.692
Maybe it's directly clear for you, maybe your
English was aside, yeah.

1:00:54.654 --> 1:01:07.377
Yes, like an enclosure at a farm or for small children,
that is also called a pen, or a pen on a

1:01:07.377 --> 1:01:08.254
farm.

1:01:08.368 --> 1:01:12.056
And then this is what can be meant here, and to
distinguish these two meanings is quite difficult.
1:01:12.056 --> 1:01:16.079
To infer these two meanings is quite difficult.

1:01:16.436 --> 1:01:23.620
But at least when I saw it, I wasn't completely
convinced because it's maybe not the sentence

1:01:23.620 --> 1:01:29.505
you're using in your daily life, and some of
these constructions seem to be.

1:01:29.509 --> 1:01:35.155
They are very good in showing where the problem
is, but the question is, does it really imply

1:01:35.155 --> 1:01:35.995
in real life?

1:01:35.996 --> 1:01:42.349
And therefore here are some examples that
we had with the lecture translator, that

1:01:42.349 --> 1:01:43.605
really occurred.

1:01:43.605 --> 1:01:49.663
They maybe looked simple, but you will see
that some of them still are happening.

1:01:50.050 --> 1:01:53.948
And they are partly about splitting words,
and that is where they happen.

1:01:54.294 --> 1:01:56.816
So Um.

1:01:56.596 --> 1:02:03.087
We had a text about the numeral system, in
German the 'Zahlensystem', which got split

1:02:03.087 --> 1:02:07.041
into subword parts because otherwise we can't translate it.

1:02:07.367 --> 1:02:14.927
And then it did only an approximate match and
was talking about the binary payment system,

1:02:14.927 --> 1:02:23.270
the 'Zahlungssystem', because the payment system was a lot
more common in the training data than the numeral system.

1:02:23.823 --> 1:02:29.900
And so there you see like rare words, which
don't occur that often.

1:02:29.900 --> 1:02:38.211
They are very challenging to deal with because
we are good at inferring the meaning sometimes, but

1:02:38.211 --> 1:02:41.250
for others that's very difficult.

1:02:44.344 --> 1:02:49.605
Another challenge is that, of course, the
context is very difficult to capture.

1:02:50.010 --> 1:02:56.448
This is also an example, a bit older, from
the lecture translator: we were translating

1:02:56.448 --> 1:03:01.813
a math lecture, and it was always talking
about the 'omens' of the numbers.

1:03:02.322 --> 1:03:11.063
Which doesn't make any sense at all, but the
German word 'Vorzeichen' can of course mean both

1:03:11.063 --> 1:03:12.408
the sign of a number and the omen.

1:03:12.732 --> 1:03:22.703
And if you don't have the right domain knowledge
encoded in there, it might use the wrong domain

1:03:22.703 --> 1:03:23.869
knowledge.

1:03:25.705 --> 1:03:31.205
A more recent version of that is like here
from a paper where it's about translating.

1:03:31.205 --> 1:03:36.833
We had this pivot based translation where
you translate maybe to English and to another

1:03:36.833 --> 1:03:39.583
because you have not enough training data.

1:03:40.880 --> 1:03:48.051
And we did that from Dutch to German; you can guess
the meaning even if you don't understand Dutch, if you speak

1:03:48.051 --> 1:03:48.710
German.

1:03:48.908 --> 1:03:56.939
So we have this Dutch 'een voorbeeld geven', which
corresponds to German 'geben', and in English

1:03:56.939 --> 1:04:05.417
it's correctly translated as 'setting an example'. However,
when we then translate to German, it didn't

1:04:05.417 --> 1:04:11.524
get the full context, and in German you normally
don't set an example, but you give an example,

1:04:11.524 --> 1:04:16.740
and so yes, going through another language
you introduce additional errors there.

1:04:19.919 --> 1:04:27.568
Good, so much for this. Are there more questions
about why this is difficult?

1:04:30.730 --> 1:04:35.606
Then we'll start with this one.

1:04:35.606 --> 1:04:44.596
I have to leave a bit early today in a quarter
of an hour.

1:04:44.904 --> 1:04:58.403
If you look at linguistic approaches to
machine translation, they are typically described

1:04:58.403 --> 1:05:03.599
by this triangle. So we can do a direct translation, where you
take the source language.

1:05:03.599 --> 1:05:09.452
You do not apply a lot of the analysis we were
discussing today about syntax representation,

1:05:09.452 --> 1:05:11.096
semantic representation.

1:05:11.551 --> 1:05:14.678
But you directly translate to your target
text.

1:05:14.678 --> 1:05:16.241
That's the direct approach here.

1:05:16.516 --> 1:05:19.285
Then there is a transfer based approach.

1:05:19.285 --> 1:05:23.811
There you transfer the representation over and
then generate the target text.

1:05:24.064 --> 1:05:28.354
And you can do that at two levels, more at
the syntax level.

1:05:28.354 --> 1:05:34.683
That means you only do syntactic analysis,
like a parse tree or so, or at the semantic

1:05:34.683 --> 1:05:37.848
level where you do semantic parsing, for example with frames.

1:05:38.638 --> 1:05:51.489
Then there is an interlingua based approach
where you don't do any transfer anymore, but

1:05:51.489 --> 1:05:55.099
you only do an analysis and a generation.

1:05:57.437 --> 1:06:02.790
So how does the direct translation now look

1:06:03.043 --> 1:06:07.031
like? It's one of the earliest approaches.

1:06:07.327 --> 1:06:18.485
So you do maybe some morphological analysis,
but not a lot, and then you do this bilingual

1:06:18.485 --> 1:06:20.202
word mapping.

1:06:20.540 --> 1:06:25.067
You might do some generation steps here.

1:06:25.067 --> 1:06:32.148
These two components are not really big, but you
are essentially working on the word level.

1:06:32.672 --> 1:06:39.237
And of course this might be a first easy solution,
but all the challenges we have seen, that

1:06:39.237 --> 1:06:41.214
the structure is different,

1:06:41.214 --> 1:06:45.449
that you have to reorder, that you have to look
at the agreement, are not handled here.

1:06:45.449 --> 1:06:47.638
That's the issue with this first approach.

1:06:47.827 --> 1:06:54.618
So if we have different word order, structural
shifts, or idiomatic expressions, that doesn't

1:06:54.618 --> 1:06:55.208
really work.
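
A toy sketch of such a direct, word-by-word translation with a tiny bilingual dictionary (the dictionary entries are illustrative); it shows why reordering and the divergences above are not handled.

# Direct translation: dictionary lookup word by word, no reordering.
EN_ES = {"a": "una", "delicious": "deliciosa", "soup": "sopa"}

def direct_translate(sentence: str) -> str:
    # 1) (skipped) light morphological analysis  2) bilingual word mapping
    return " ".join(EN_ES.get(w, w) for w in sentence.lower().split())

print(direct_translate("a delicious soup"))
# -> "una deliciosa sopa": the English word order is kept,
#    while Spanish would need "una sopa deliciosa"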

1:06:57.797 --> 1:07:05.034
Then there are these rule based approaches
which were more commonly used.

1:07:05.034 --> 1:07:15.249
They might still be used somewhere; I mean, most
systems are now neural networks, but I wouldn't

1:07:15.249 --> 1:07:19.254
be sure there is no rule-based system out there.

1:07:19.719 --> 1:07:25.936
And in this transfer based approach we have
these steps, nicely visualized in the

1:07:26.406 --> 1:07:32.397
triangle: so we have the analysis of the source
sentence where we then get some type of abstract

1:07:32.397 --> 1:07:33.416
representation.

1:07:33.693 --> 1:07:40.010
Then we are doing the transfer of the representation
of the source sentence into the representation

1:07:40.010 --> 1:07:40.263
of the target sentence.

1:07:40.580 --> 1:07:46.754
And then we have the generation where we take
this abstract representation and do then the

1:07:46.754 --> 1:07:47.772
surface forms.

1:07:47.772 --> 1:07:54.217
For example, it might be that there are no
morphological variants in the abstract representation

1:07:54.217 --> 1:07:56.524
and we have to handle the agreement there.

1:07:56.656 --> 1:08:00.077
Which components do you need?

1:08:01.061 --> 1:08:08.854
You need monolingual source and target lexicon
and the corresponding grammars in order to

1:08:08.854 --> 1:08:12.318
do both the analysis and the generation.

1:08:12.412 --> 1:08:18.584
Then you need the bilingual dictionary in
order to do the lexical translation and the

1:08:18.584 --> 1:08:25.116
bilingual transfer rules in order to transfer
the grammar, for example in German, into the

1:08:25.116 --> 1:08:28.920
grammar in English, and that enables you to
do that.

1:08:29.269 --> 1:08:32.579
So an example is something like this here.

1:08:32.579 --> 1:08:38.193
So if you're doing a syntactic transfer it
means you're starting with 'John eats

1:08:38.193 --> 1:08:38.408
an apple'.

1:08:38.408 --> 1:08:43.014
You do the analysis, then you have this
type of graph here.

1:08:43.014 --> 1:08:48.340
Therefore you need your monolingual lexicon
and your monolingual grammar.

1:08:48.748 --> 1:08:59.113
Then you're doing the transfer where you're
transferring this representation into this

1:08:59.113 --> 1:09:01.020
representation.

1:09:01.681 --> 1:09:05.965
So how could this type of translation then
look?

1:09:07.607 --> 1:09:08.276
Like this:

1:09:08.276 --> 1:09:14.389
We have the example of a delicious soup and
'una sopa deliciosa'.

1:09:14.894 --> 1:09:22.173
This is your source language tree and this
is your target language tree and then the rules

1:09:22.173 --> 1:09:26.092
that you need are these ones to do the transfer.

1:09:26.092 --> 1:09:31.211
So if you have a noun phrase that also goes
to the noun phrase.

1:09:31.691 --> 1:09:44.609
You see here that the switch is happening,
so what is at the second position here ends up at the first

1:09:44.609 --> 1:09:46.094
position.

1:09:46.146 --> 1:09:52.669
Then you have the translation of determiner
of the words, so the dictionary entries.

1:09:53.053 --> 1:10:07.752
And with these types of rules you can then
do these mappings and do the transfer between

1:10:07.752 --> 1:10:11.056
the representations.
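
A toy sketch of a syntactic transfer rule for exactly this adjective-noun switch; the nested-tuple tree encoding and the tiny lexicon are my own illustration, not the lecture's formalism.

# Structural transfer: English NP (DET ADJ N) -> Spanish NP (DET N ADJ), plus dictionary lookup.
LEX = {"a": "una", "delicious": "deliciosa", "soup": "sopa"}

def transfer(tree):
    label, children = tree
    if label == "NP" and [c[0] for c in children] == ["DET", "ADJ", "N"]:
        det, adj, n = children
        children = [det, n, adj]           # swap adjective and noun
    return (label, [transfer(c) if isinstance(c, tuple) else LEX.get(c, c)
                    for c in children])

src = ("NP", [("DET", ["a"]), ("ADJ", ["delicious"]), ("N", ["soup"])])
print(transfer(src))
# ('NP', [('DET', ['una']), ('N', ['sopa']), ('ADJ', ['deliciosa'])])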

1:10:25.705 --> 1:10:32.505
I think it more depends on the amount of expertise
you have in representing them.

1:10:32.505 --> 1:10:35.480
The rules will get more difficult.

1:10:36.136 --> 1:10:42.445
For example, with these rule-based systems, I think
it more depends on how difficult the structure

1:10:42.445 --> 1:10:42.713
is.

1:10:42.713 --> 1:10:48.619
So for generating German they were for quite a long
time quite successful, because modeling

1:10:48.619 --> 1:10:52.579
all the German phenomena which are in there
was difficult.

1:10:52.953 --> 1:10:56.786
And that can be done there, and it wasn't
easy to learn that just from data.

1:10:59.019 --> 1:11:07.716
I think even if you think about Chinese and
English or so, if you have the trees there

1:11:07.716 --> 1:11:10.172
are quite some rules needed.

1:11:15.775 --> 1:11:23.370
Another thing is you can also try to do something
like that on the semantic level, which means the analysis

1:11:23.370 --> 1:11:24.905
gets more complex.

1:11:25.645 --> 1:11:31.047
The transfer maybe gets a bit easier, because
the semantic representations

1:11:31.047 --> 1:11:36.198
between languages are more similar, but the analysis
and generation get more difficult again.

1:11:36.496 --> 1:11:45.869
So typically, if you go higher in the triangle, the analysis
and generation are more work while the transfer is less work.

1:11:49.729 --> 1:11:56.023
So it can be then, for example, 'like' and 'gustar':
we have again that the order changes.

1:11:56.023 --> 1:12:02.182
So you see the transfer rule for like is that
the first argument is here and the second is

1:12:02.182 --> 1:12:06.514
there, while on the 'gustar' side here
the second argument

1:12:06.466 --> 1:12:11.232
is in the first position and the first
argument is in the second position.

1:12:11.511 --> 1:12:14.061
So yeah, also there you do reordering.

1:12:14.354 --> 1:12:20.767
From the principle it is more like you have
a different type of formalism of representing

1:12:20.767 --> 1:12:27.038
your sentence and therefore you need to do
more on one side and less on the other side.

1:12:32.852 --> 1:12:42.365
So in general, for transfer-based approaches,
you have to first select how to represent

1:12:42.365 --> 1:12:44.769
the syntactic structure.

1:12:45.165 --> 1:12:55.147
There are these variable abstraction levels
and then you have the three components. The

1:12:55.147 --> 1:13:04.652
disadvantage is that on the one hand you normally
need a lot of experts, monolingual experts

1:13:04.652 --> 1:13:08.371
who do the analysis, and experts who define how to do the transfer.

1:13:08.868 --> 1:13:18.860
And if you're adding a new language, you have
to build the analysis and generation, and the

1:13:18.860 --> 1:13:19.970
transfer.

1:13:20.400 --> 1:13:27.074
So if you add one language to an existing
system, of course you have to

1:13:27.074 --> 1:13:29.624
do the transfer to all the other languages.

1:13:32.752 --> 1:13:39.297
Therefore, the other idea which people were
interested in is the interlingua based machine

1:13:39.297 --> 1:13:40.232
translation.

1:13:40.560 --> 1:13:47.321
Where the idea is that we have this intermediate
language with this abstract language independent

1:13:47.321 --> 1:13:53.530
representation and so the important thing is
it's language independent so it's really the

1:13:53.530 --> 1:13:59.188
same for all language and it's a pure meaning
and there is no ambiguity in there.

1:14:00.100 --> 1:14:05.833
That allows this nice translation without
transfer, so you just do an analysis into your

1:14:05.833 --> 1:14:11.695
representation, and there afterwards you do
the generation into the other target language.

1:14:13.293 --> 1:14:16.953
And that of course is especially attractive for multilingual translation.

1:14:16.953 --> 1:14:19.150
It's somehow like a dream.

1:14:19.150 --> 1:14:25.519
If you want to add a language you just need
to add one analysis tool and one generation

1:14:25.519 --> 1:14:25.959
tool.

1:14:29.249 --> 1:14:32.279
Which is not the case in the other scenario.
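
The scaling argument can be made concrete with a back-of-the-envelope count: with N languages, a transfer-based setup needs N analysis and N generation components plus a transfer component for every ordered language pair, while an interlingua setup only needs the N analysis and N generation components.

# Component counts for N languages (a rough illustration of the scaling).
def transfer_components(n):      # analysis + generation per language, transfer per ordered pair
    return 2 * n + n * (n - 1)

def interlingua_components(n):   # only analysis + generation per language
    return 2 * n

for n in (2, 5, 10):
    print(n, transfer_components(n), interlingua_components(n))
# 2 -> 6 vs 4;  5 -> 30 vs 10;  10 -> 110 vs 20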

1:14:33.193 --> 1:14:40.547
However, the big challenge is in this case
the interlingua based representation because

1:14:40.547 --> 1:14:47.651
you need to represent all different types of
knowledge in there in order to do that.

1:14:47.807 --> 1:14:54.371
And also like world knowledge, so something
like an apple is a fruit and has the properties of a

1:14:54.371 --> 1:14:57.993
fruit, so it is eatable, and stuff like that.

1:14:58.578 --> 1:15:06.286
So that is why this is typically always only
done for small amounts of data.

1:15:06.326 --> 1:15:13.106
So what people have done for special applications
like hotel reservation people have looked into

1:15:13.106 --> 1:15:18.348
that, but they have typically not done it
for arbitrary input.

1:15:18.718 --> 1:15:31.640
So the disadvantage is you need to represent
all the world knowledge in your interlingua.

1:15:32.092 --> 1:15:40.198
And that is not possible at the moment or
never was possible so far.

1:15:40.198 --> 1:15:47.364
Typically they were for small domains for
hotel reservation.

1:15:51.431 --> 1:15:57.926
But of course this idea is why some people
are interested in the question:

1:15:57.926 --> 1:16:04.950
if you now do a neural system where you learn
the representation in your neural network,

1:16:04.950 --> 1:16:07.442
is that some type of artificial

1:16:08.848 --> 1:16:09.620
interlingua?

1:16:09.620 --> 1:16:15.025
However, what we at least found out until
now is that there's often very language specific

1:16:15.025 --> 1:16:15.975
information in these representations.

1:16:16.196 --> 1:16:19.648
And they might be important and essential.

1:16:19.648 --> 1:16:26.552
You don't have all the information in your
input, so you typically can't resolve

1:16:26.552 --> 1:16:32.412
all ambiguities inside there because you might
not have all information.

1:16:32.652 --> 1:16:37.870
So in English you don't know if it's a living
fish or the fish which you're eating, and if

1:16:37.870 --> 1:16:43.087
you're translating to German you also don't
have to resolve this problem because you have

1:16:43.087 --> 1:16:45.610
the same ambiguity in your target language.

1:16:45.610 --> 1:16:50.828
So why would you put in effort to find
out if it's the one fish or the other fish if it's

1:16:50.828 --> 1:16:52.089
not necessary at all?

1:16:54.774 --> 1:16:59.509
Yeah Yeah.

1:17:05.585 --> 1:17:15.019
The semantic representation in the transfer is not the same for
both languages, so you still represent the

1:17:15.019 --> 1:17:17.127
semantics per language.

1:17:17.377 --> 1:17:23.685
So you have the semantic representation of 'like' and
of 'gustar', but that's not one shared semantic

1:17:23.685 --> 1:17:28.134
representation for both languages, and that's
the main difference.

1:17:35.515 --> 1:17:44.707
Okay, then these are the most important things
for today: what language is and how rule-

1:17:44.707 --> 1:17:46.205
based systems work.

1:17:46.926 --> 1:17:59.337
And if there are no more questions, thank you
for joining, we have today a bit of a shorter

1:17:59.337 --> 1:18:00.578
lecture.