021 Action Selection Policies: English subtitle transcript

Hello and welcome back to the course on artificial intelligence. I hope you're enjoying the course so far. Today we're talking about action selection policies. All right, let's get straight into it.

Previously we talked about adding a neural network to our simple Q-learning, and by now we're getting quite deep into deep Q-learning. We've talked about the learning part quite a bit, including adding some elements to it, and today we're talking about the other part: the acting. So let's have a look.

Here is what we discussed about acting: you input the values, the parameters, the vector describing the state the agent is currently in within its environment, and then, once all the learning is done (or even before the learning is done), we basically get all the Q-values out. We're not interested in the learning right now; we're focused on the acting. So once we have these Q-values, how do we decide which one to use?

Well, if you think about it, these Q-values are simply predictions. So, as we did in the simple Q-learning algorithm, we just selected the one with the best, the highest, value. Once we have the action with the highest Q-value, we just take that action, because it brings us the highest value. We know that the Q-value is calculated as the immediate reward we expect to receive plus the discount factor times the value of the next state, and it's a recursive calculation. So why wouldn't you just take the best value and call it a day?

But as you can see here, it's not that simple: here we're using a softmax function, and this is where action selection policies come in. In reality we don't have to use just a softmax function; we can have different action selection policies, for example epsilon-greedy, epsilon-soft, and softmax. Those are the most commonly used action selection policies, though of course there are others. For instance, the most basic one is the very simple policy of just selecting the best action, the one with the highest Q-value. But why doesn't that policy always fly, and why do we have different types of action selection policies? Well, it all boils down to exploration versus exploitation.
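As a minimal sketch of the purely greedy rule just described, assuming the Q-values for the current state arrive as a NumPy array (the array contents, variable names, and the gamma value are illustrative, not taken from the course code):

```python
import numpy as np

gamma = 0.9  # discount factor (illustrative value)

def greedy_action(q_values):
    """Pick the action with the highest predicted Q-value."""
    return int(np.argmax(q_values))

# Example: predicted Q-values for four possible actions in the current state
q_values = np.array([1.2, 3.5, 0.7, 2.1])
action = greedy_action(q_values)  # -> 1, the action with the highest Q-value

# The recursive target each Q-value is approximating:
#   Q(s, a) ~ immediate reward + gamma * max over a' of Q(s', a')
reward = 1.0
q_next = np.array([0.5, 2.0, 1.0, 0.3])
target = reward + gamma * np.max(q_next)  # 1.0 + 0.9 * 2.0 = 2.8
```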
Exploration versus exploitation is the core of reinforcement learning, and we already talked about it a little bit. When your agent is operating in an environment, it might predict certain Q-values which look good, and it might turn out great, or it might turn out that those predictions were off and the agent will be forced to explore. So if, for instance, it predicts that Q2 is the best one, then it takes action 2, and it gets a very negative reward. The environment is forcing the agent to go and explore, because now it's going to learn: actually, I thought action 2 was going to be very good, but it turned out very bad. The results are not good, so the network can update itself, and next time it's in that state it probably won't select that action, not if it turned out very unfavorable. You might think it would need a couple of penalties or punishments to learn that it's a bad action, but it may learn quite soon to take a different action, the one that now has the best value. So sometimes the environment forces the agent to explore different actions.

But sometimes the agent might find itself stuck in a local maximum. Through its initial exploration it might have found: oh, this is a pretty good action, I'm going to go right here. And that's a decent selection. The problem is that it only thinks this is the best action because it hasn't explored enough: it has explored going up, going left, and going right, but it hasn't explored going down from that specific state it's in. Now it's biased towards this action; it thinks it's a good action, so it keeps taking it and keeps getting a good reward. But what if that unexplored action would have been even better, so much better that, if the agent knew about it, it would actually switch to it? Because it got stuck in a local maximum and keeps getting these decent rewards, it just keeps getting reinforced; the environment keeps telling it that this is a good action to take, so keep doing it. But in reality there is this other action, one it hasn't found yet or hasn't even explored, that would have been much better.
So what we want is an action selection policy that allows our agent not to get stuck in a local maximum. Yes, it's important to keep taking the good actions; that's the exploitation part, and we want to exploit what we've found. But at the same time we still want to explore, and we never want to stop exploring. It's like in life: you never want to stop learning, because if you stop learning, you die. There's that saying that when you're not growing, you're dying, or something like that. So you want to keep learning, your agent wants to keep learning, and that's where these action selection policies come in.

We've got three listed here. The first one is epsilon-greedy. It's a very simple one. It sounds pretty complex in the sense that it's got a cool name, and things with fancy names usually sound complicated, but it's actually not. Epsilon, which you'll come across in other places too, is just a parameter of the policy. Out of all the Q-values, we select the one with the highest Q-value all of the time, except for epsilon percent of the time. For instance, if you set epsilon to 10 percent, or 0.1, then 10 percent of the time the action is selected at random: 90 percent of the time you're still selecting the best action based on the highest Q-value, but 10 percent of the time you're picking an action uniformly at random, absolutely any action. Or if you set epsilon to 0.05, that means 95 percent of the time the agent takes the action with the highest value, but 5 percent of the time it still selects a random action, so it keeps going out there and exploring. That's also why it's called epsilon-greedy: you're greedily selecting the good action except for that little epsilon fraction of the time. The lower the epsilon, the more greedily you're selecting the optimal action, and the fewer chances you're leaving for exploration. Epsilon-soft is the opposite: you select at random 1 minus epsilon of the time.
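Here's a minimal sketch of both policies as just described: epsilon-greedy, plus epsilon-soft in the inverted form described above. It assumes the Q-values come in as a NumPy array; the names are illustrative rather than taken from the course code, and the concrete numbers continue right after the sketch.

```python
import numpy as np

rng = np.random.default_rng()

def epsilon_greedy(q_values, epsilon=0.1):
    """Best action most of the time; a uniformly random action epsilon of the time."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit

def epsilon_soft(q_values, epsilon=0.1):
    """The inverted version described above: the best action only epsilon of the time."""
    if rng.random() < epsilon:
        return int(np.argmax(q_values))          # exploit (rarely)
    return int(rng.integers(len(q_values)))      # explore (most of the time)
```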
So if your epsilon is 0.1, or 10 percent, then only 10 percent of the time are you taking the best action, and 90 percent of the time you're selecting a random action. Very simple: the two are just inverted versions of each other.

Softmax is kind of the next step from that, a more advanced version of the epsilon-greedy algorithm, I would say, although they both have merit and they both have their place. We're going to be using softmax in our coding, in the practical side of things, so that's why we'll talk about softmax in a bit more detail. Let's have a look and move on. Hopefully epsilon-greedy is clear by now; it's a pretty straightforward algorithm: select the best action most of the time, and sometimes go and explore. And now we also see why it's important to do that exploration: so that we don't end up in local maxima in our optimization process. So now let's talk a bit more about softmax.

There's a tutorial on softmax at the end of the course, I think it's in Annex 2, where we talk about the concept of softmax, so you can refresh your memory there. That annex is about convolutional neural networks, which, by the way, we're not covering in this section; in this section we're still using a vector as the input. But in the next section of the course, when we're creating an AI to play Doom, we are going to be using convolutional neural networks. So it could be beneficial for you to look at convolutional neural networks first and then take the softmax tutorial, or you can learn a bit more about softmax after you go through the convolutional neural networks annex later on.

But here's a quick refresher. We've got a convolutional neural network which decides whether an image is a dog or a cat. There's a voting process between these final neurons, and this one says the image has the features that belong to a dog: the fluffy ears, the pointed face, the type of eyes, the way the eyes look. So there's a 95 percent chance that it's a dog and a 5 percent chance that it's a cat. But the question, which is what we talk about in that tutorial, is: how do we get these values to add up to one?
Well, whatever our whole neural network, the convolutional part plus the fully connected layers, spits out, those are the values we apply the softmax function to. That tutorial is where we introduced the formula for the softmax function, this is what it looks like, and that's how we got these values. So, as a quick refresher: this is the softmax formula. What it does is take however many outputs you have, it doesn't matter how many, and squash them all into values between 0 and 1, regardless of how big they are, just through its formula. You can see there's a total sum in the denominator, so each value ends up between 0 and 1, and all of the values always add up to 1.

That's very beneficial for us, because when we're using the softmax function, we get these Q-values and we want to select the best one. In reality, the values coming out of the network are just arbitrary numbers: they don't have to add up to one and they don't have to be between 0 and 1, they're just numbers. But when we apply softmax, we don't just select the best one; we get new numbers that are in the range between 0 and 1 and that also add up to 1. And what else do we know that adds up to one? Probabilities. Probabilities always have to add up to 1. So that is why we can say: here we've got Q-values, but after the softmax, all of a sudden, we've got probabilities. We can say that the likelihood of this being the best action is 90 percent, this one 5 percent, this one 2 percent, this one 3 percent, because we know that the higher the Q-value, the better the action. If we squash the Q-values into the 0-to-1 range, they become probabilities and we can treat them as such. And that is how the action is selected, and that's how we come up with Q2.

But if you look at it closely, this isn't a strict 100 percent, and these others are not 0 percent; they're 5 percent, 2 percent, 3 percent. So the most natural way to apply softmax, in order to preserve exploration in the algorithm, is to use these exact probabilities as how often we're going to take each action.
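A minimal sketch of the softmax function and of sampling an action from the probabilities it produces, again with illustrative numbers and names rather than the course's actual code:

```python
import numpy as np

rng = np.random.default_rng()

def softmax(q_values):
    """Squash arbitrary Q-values into probabilities: each in (0, 1), summing to 1."""
    z = q_values - np.max(q_values)  # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def softmax_action(q_values):
    """Sample an action with probability equal to its softmax value."""
    probs = softmax(q_values)
    return int(rng.choice(len(q_values), p=probs))

# Example: four raw Q-values turned into probabilities that add up to 1
q_values = np.array([1.0, 4.0, 0.2, 0.5])
print(softmax(q_values))  # approximately [0.05, 0.91, 0.02, 0.03]
```

Over many selections the empirical frequencies approach these probabilities, so the best action is still taken roughly 90 percent of the time while the others keep getting occasional tries, which is exactly the behavior described next.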
These probabilities effectively represent the distribution of the actions we're going to take. So softmax makes it very easy for us to combine exploitation and exploration: the best action always has the highest probability, because it has the highest Q-value, and we simply use these probabilities as our distribution. We'll take Q2 90 percent of the time, but 5 percent of the time we'll still take Q1, 2 percent of the time Q3, and 3 percent of the time Q4.

The beauty here is also that, as these Q-values update, as the agent passes through the network more and more and becomes more familiar with the environment, the probabilities update too. The agent might find that this value is actually lower, or that one is actually higher, and so the probabilities change as the agent learns. So even though Q2 is the favorite here, 5 percent of the time, to be precise, we'll still select action 1, action 3 about 2 percent of the time, and action 4 about 3 percent of the time. Every action gets a chance to play in this process, as long as we have enough iterations and the agent goes through these states lots and lots of times. That's how any deep reinforcement learning algorithm works: you want to do this many, many times so that the agent learns from experience.

As you can see, it's a very natural transition. We're not just randomly selecting actions, as in the epsilon-greedy algorithm; we're selecting them based on their softmax values, so there's some logic behind it. It's not just picking a random action 10 percent of the time; the exploration itself is guided by the Q-values we've learned so far. And that's the action selection policy we're going to be using in this course. You're welcome to check out the epsilon-greedy action selection policy if you like, but we're going to be predominantly using the softmax action selection policy. And I've got an interesting reading for you.
It's called "Adaptive Epsilon-Greedy Exploration in Reinforcement Learning Based on Value Differences", a 2010 paper. It's interesting because the author, Michel Tokic (I'm not sure exactly how to pronounce the name), introduces a different type of algorithm, an adjusted epsilon-greedy algorithm called VDBE, or epsilon-greedy VDBE, as you can see here, and he actually compares it against epsilon-greedy and softmax. The main idea behind it is to adjust the value of epsilon depending on the state the agent is in: if the agent is very certain about the state it's in, then epsilon should be smaller, so there is less exploration; if the agent is uncertain, epsilon should be higher, so there is more exploration. (A rough sketch of this idea follows after the wrap-up below.)

It is a 2010 paper, and I'm not sure whether this proposed algorithm is widely used or has been broadly accepted in the community, or whether artificial intelligence has since moved away from the suggestion. Nevertheless, it will definitely help you reinforce your knowledge of the action selection policies we discussed, epsilon-greedy and softmax, give you an opportunity to compare them side by side, and show you the direction in which people think when they want to improve artificial intelligence. If you're ever planning on creating really interesting algorithms that push the edge of artificial intelligence, that push the envelope in this space, then this could be a good way to see how people approach improving on the norms of the field, or at least the norms that existed back in 2010.

So there we go. Hopefully you enjoyed today's tutorial on action selection policies. We learned about epsilon-greedy, epsilon-soft, and softmax, and now you're even more prepared for the practical side of things. On that note, I look forward to seeing you next time. And until then, enjoy AI.
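As a footnote to that reading: below is a rough, simplified sketch of the general idea, namely shrinking epsilon for a state when its value estimates have settled and growing it when they are still changing a lot. This is only an illustration of the concept, not the exact VDBE update rule from the paper, and the function name, constants, and numbers are all made up for the example.

```python
import numpy as np

def update_epsilon(epsilon, td_error, sigma=1.0, delta=0.5):
    """Illustration only: a large TD error means the value estimate is still moving
    (the agent is uncertain about this state), so push epsilon up; a tiny TD error
    means the estimate has settled, so let epsilon decay toward zero."""
    surprise = 1.0 - np.exp(-abs(td_error) / sigma)  # maps |TD error| into [0, 1)
    return (1.0 - delta) * epsilon + delta * surprise

epsilon = 0.5
epsilon = update_epsilon(epsilon, td_error=3.0)   # big surprise: epsilon rises to about 0.73
epsilon = update_epsilon(epsilon, td_error=0.02)  # value settled: epsilon decays to about 0.37
```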
