002 Exploring the Dataset (SQuAD), English subtitles

So in this lecture we will discuss the SQuAD dataset, which is the dataset we'll be using for this task. SQuAD simply stands for Stanford Question Answering Dataset.

Now, as you already know, the task of question answering is still constrained in a few ways. We can't yet give a neural network a big database of knowledge and just ask it any question we want. Instead, we do what is called extractive question answering. What this means is that we're going to give the network a pair of texts, namely the question and the context, which contains the answer to the question. In that way, the answer is always a substring of the context. Note also that because of this, the network never has to actually generate any text, so we don't require an encoder-decoder setup.

Now there are some details that will become important when you want to actually write the code. Firstly, let's look at how we will load in the data. As you can see, we just call the standard function load_dataset, passing in the string "squad". The dataset comes with five columns, which are id, title, context, question, and answers. Note that the title is pretty much irrelevant. Interestingly, id is not irrelevant, which may seem strange since we've ignored it up until this point. We'll discuss this more when the time comes.

So let's look at some examples. Here's a context. It says: architecturally, the school has a Catholic character; atop the Main Building's gold dome is a golden statue of the Virgin Mary, and so on and so forth. The question is: what is in front of the Notre Dame Main Building? And the corresponding answer is: a copper statue of Christ.

Note that the answer seems to have a funny format. Firstly, the text is stored in a list. We'll see why that makes sense shortly. Furthermore, we see that in addition to the text, we also get the position of the start of the answer in terms of characters. As you recall, a string is simply an array of characters, so if you think of the context as an array of characters, this would be the index of the start of the answer.

So for this example, the corresponding title happens to be University of Notre Dame. As you can see, this is irrelevant for finding the answer to the question.
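For reference, here is a minimal sketch of that loading step, assuming the Hugging Face datasets library is what the lecture uses; the variable names are mine, not taken from the lecture code.

```python
# Minimal sketch: load SQuAD with the Hugging Face datasets library
# (pip install datasets). Field names follow the "squad" dataset card.
from datasets import load_dataset

raw = load_dataset("squad")
print(raw)                    # DatasetDict with "train" and "validation" splits

sample = raw["train"][0]
print(sample.keys())          # id, title, context, question, answers

# The answers field holds parallel lists: the answer text(s) and the
# character offset(s) of each answer inside the context string.
answer_text = sample["answers"]["text"][0]
start = sample["answers"]["answer_start"][0]

# Sanity check: slicing the context at that character index recovers the answer.
assert sample["context"][start:start + len(answer_text)] == answer_text
```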
What should strike you as interesting is that the answers column is plural. This implies that there can potentially be multiple answers to the same question. This also explains why the answer data is stored in lists. Now, how can this be? Well, consider the question: where did Super Bowl 50 take place? One possible answer is Santa Clara, California. This is a true fact. But another possible answer is Levi's Stadium. This is also a true fact. So how can one question have multiple answers? Well, here's the context where this came from. It says: the game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. Depending on how you interpret this question, both answers would be valid. So this is an example of where the same question can have multiple valid answers.

Now, oddly, this dataset is built such that for some questions, the exact same answer can appear multiple times. I'm not sure why that is. Finally, note that this only happens for the validation set. For the train set, although the column is called answers, there is only one answer per sample. Since our neural network's loss function is only built for one target per input, this is a good thing, since it means we don't have to do any extra work to split up multiple answers into separate training samples.
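If you want to verify that claim about the splits yourself, a quick check along these lines should do it; this code is not shown in the lecture, and it again assumes the Hugging Face "squad" dataset.

```python
# Count how many answers each sample has in the train and validation splits.
# Per the lecture, train should always have exactly one answer per sample,
# while validation can have several, sometimes with repeated texts.
from collections import Counter

from datasets import load_dataset

raw = load_dataset("squad")

train_counts = Counter(len(a["text"]) for a in raw["train"]["answers"])
valid_counts = Counter(len(a["text"]) for a in raw["validation"]["answers"])
print("train:", train_counts)       # expected: every count is 1
print("validation:", valid_counts)  # expected: a mix of 1, 2, 3, ... answers

# Find one validation sample whose answer texts contain exact duplicates.
for a in raw["validation"]["answers"]:
    if len(a["text"]) != len(set(a["text"])):
        print("duplicate answers:", a["text"])
        break
```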
