2. ADVANCED PATTERN MATCHING (REGULAR EXPRESSIONS)

Now let's discuss the different wildcards of regular expressions. There are many wildcards; we have selected only a few important ones.

The first one is the OR operator, which is denoted by the pipe symbol (|). The second is the asterisk sign (*), which denotes repetition of the previous item zero or more times. Then there is the plus symbol (+), which denotes repetition of the previous item one or more times. Don't worry, we will discuss this in more detail in our examples. Next is the question mark (?), which denotes repetition of the previous item zero or one time.

Then there are curly brackets. In curly brackets we can mention any number m, and this denotes repetition of the previous item exactly m times. So, for example, if you want the character a to come exactly three times, you write a{3}. If you want a to come m times or more, you write curly brackets, m, comma, that is {m,}; this denotes repetition of the previous item m or more times. Similarly, if you want to apply both a lower limit and an upper limit on the repetition, you will write
curly brackets m, n, that is {m,n}.

Now, as I told you earlier, the tilde operator matches the pattern against any part of the string. To denote the start of the string you have to write the caret (^) or hat symbol, and to denote the end of the string you have to write the dollar ($) symbol.

In regular expressions you can also use character sets. You can mention all the possible characters in square brackets, and SQL will treat them as an OR operation. For example, if I write [abc], that place can contain a or b or c. There are shorthands as well: to mention all the characters from a to z you can just write [a-z], and to mention numeric characters you can write [0-9].

The operator to match using a regular expression is the tilde sign (~). The tilde sign is case sensitive. You can also write a tilde followed by an asterisk (~*) to make the match case insensitive.

Now let us look at some examples. Suppose we want to identify all the customers whose customer name starts with a. In this case we will write tilde and then the * sign, which means we are matching with the case-insensitive operator.
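The wildcards described above can be tried out in a short sketch. It uses Python's re module as a stand-in for PostgreSQL's ~ and ~* operators, which is an assumption made for illustration and not what the lecture runs; the helper name pg_match is hypothetical. For simple patterns like these the two engines should behave the same way.

```python
import re


def pg_match(text, pattern, ignore_case=False):
    """Hypothetical helper: rough stand-in for `text ~ pattern` (or `~*`)."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.search succeeds when the pattern matches any part of the string,
    # which is how ~ behaves when no ^ or $ anchor is given.
    return re.search(pattern, text, flags) is not None


# (pattern, text, ignore_case) triples, one per wildcard from the lecture.
examples = [
    ("^ab*c$", "ac", False),       # * allows zero b's            -> True
    ("^ab+c$", "ac", False),       # + needs at least one b       -> False
    ("^ab?c$", "abbc", False),     # ? allows at most one b       -> False
    ("^a{3}$", "aaa", False),      # {3} means exactly three      -> True
    ("^a{2,}$", "a", False),       # {2,} means two or more       -> False
    ("^a{2,3}$", "aaaa", False),   # {2,3} means two to three     -> False
    ("[abc]", "xyzb", False),      # unanchored set, partial match -> True
    ("cat|dog", "hotdog", False),  # | is the OR operator         -> True
    ("^[a-z]+$", "Alice", False),  # case sensitive, like ~       -> False
    ("^[a-z]+$", "Alice", True),   # case insensitive, like ~*    -> True
]

for pattern, text, ignore_case in examples:
    print(f"{pattern!r:12} {text!r:10} {pg_match(text, pattern, ignore_case)}")
```

Anchoring most of the patterns with ^ and $ keeps each quantifier's effect visible; without the anchors, a partial match elsewhere in the string would hide the difference.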
Then, like any other string, we start writing the rule inside single quotes. We have to mention the start of the string, so we write the ^ symbol, and my first character should be a, so I write the ^ symbol and then the character a. This a can repeat itself one or more times, which is why I write the plus symbol.

After my first character there can be anything, so I mention a character set. My character set contains all the letters of the English alphabet, a to z, and it also contains a space, since there is a space between the first name and the last name. So my second character can be anything from a to z, or it can be a space. Now I want to replicate the same rule for the other positions as well, which is why I use the plus symbol again. Then I have to mention the end of the string, so I write the $ symbol. The complete pattern is ^a+[a-z\s]+$.

Do note that the symbol for a space is backslash s (\s). Similarly, the symbol for a dot is backslash dot (\.), and the symbol for a dash is backslash dash (\-). So for all the special characters you have to write a backslash before them.

So let's go to pgAdmin
to write this. You will write SELECT * FROM customer WHERE, then the customer name column, then the tilde operator with an asterisk sign to make it case insensitive. Then, in single quotes, I write hat and a, to denote that the string starts with a. This a can repeat itself any number of times, so I add a + sign. Then I define the character set, which contains all the characters from a to z and a space. The characters of this set can repeat any number of times, which is why I add a plus sign. Then I have to mention the end of the string, which is why I add the $ sign.

Let's run this query. Right now we are not getting our desired result, so we have to change this forward slash (/) to a backslash (\) and rerun the code. Now you can see that all the customer names start with a, and after that there can be any number of characters from a to z, or spaces.

Now suppose, in the second example, we want all the customers whose first name starts with a, b, c, or d. In this example we will use the OR operator.
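As a rough check of the first example's pattern, here is a sketch that again uses Python's re module in place of PostgreSQL's ~* operator. The sample names are hypothetical and not taken from the course data set, and the column name customer_name in the query string is a guess, since the lecture only says "customer name".

```python
import re

# Pattern dictated in the lecture; re.IGNORECASE plays the role of ~*.
STARTS_WITH_A = re.compile(r"^a+[a-z\s]+$", re.IGNORECASE)

# Assumed PostgreSQL form of the query (column name is a guess).
QUERY = r"SELECT * FROM customer WHERE customer_name ~* '^a+[a-z\s]+$';"

# Hypothetical sample names for illustration only.
names = [
    "Alex Avila",        # starts with A, only letters and a space
    "aaron bergman",     # lower case is fine because the match ignores case
    "Brad Thomas",       # does not start with a
    "Anne-Marie Pryor",  # the hyphen is not in the character set
    "A",                 # nothing after the leading a, but [a-z\s]+ needs one character
]

matches = [name for name in names if STARTS_WITH_A.search(name)]

print(QUERY)
print(matches)
```

The last two sample names show what the $ anchor and the second plus sign add: every character after the leading a must come from the set, and there must be at least one such character.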
We will put the pipe operator between a, b, c, and d, and put this in parentheses: (a|b|c|d). We can also use the character set [abcd] instead of the OR operator and parentheses.

So let's write this query in pgAdmin. We will write SELECT * FROM customer WHERE, then the customer name column, then the tilde operator with an asterisk sign to make it case insensitive. Then, in single quotes, I write the start of the string, then a, b, c, d in parentheses with the OR operator. This can repeat any number of times, which is why I put a plus symbol. Then the next character set can take any value from a to z, and a space as well; remember to write backslash s for the space. Then comes the + sign for the repetition. Then I end the string and run the code.

You can see we have all the customers whose first name starts with either a, b, c, or d. Remember, if you wanted to write the same query using LIKE, you would have to use four LIKE statements. With regex we can do this in a single query.

Now, in the third example, I
want the first name and the last name of my customer to be exactly four characters each, and the first name should start with a, b, c, or d. To do this, I write the start of the string, then I mention a, b, c, d using the OR operator. Here I do not write the plus symbol, because I want exactly one character. After that I write [a-z]; this is the character set, and I want this character set to repeat exactly three times. So my first character should be a, b, c, or d, and the next three characters can be anything from a to z. After the first name I want a space in between, which is why I write backslash s. After that I want my last name to contain exactly four characters, so I write [a-z], the character set, and I want it to repeat itself 4 times.

So let's write this example in pgAdmin. To mention the start of the string I write the ^ symbol. After that I want my first name to start with a, b, c, or d, which is why I write all these characters with the OR symbol, the pipe symbol. After that I want any character from a to z, and I want it to repeat 3 times, so I put 3 in curly brackets. After that I want a space.
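The second and third examples can be compared side by side in a short sketch. As before, Python's re module stands in for PostgreSQL's ~* operator, and the sample names are hypothetical. The third pattern is assembled from the description above: start of string, one of a to d, three letters, a space, four letters, end of string.

```python
import re

FLAGS = re.IGNORECASE  # stands in for the case-insensitive ~* operator

# Second example: first name starts with a, b, c or d, written two ways.
with_alternation = re.compile(r"^(a|b|c|d)+[a-z\s]+$", FLAGS)
with_char_set = re.compile(r"^[abcd]+[a-z\s]+$", FLAGS)

# Third example: exactly four letters, a space, exactly four letters.
four_and_four = re.compile(r"^(a|b|c|d)[a-z]{3}\s[a-z]{4}$", FLAGS)

# Hypothetical sample names for illustration only.
names = ["Brad Thom", "Cari Sayre", "Dave Hall", "Emma Bell", "Bill Eplett"]

alt_matches = [n for n in names if with_alternation.search(n)]
set_matches = [n for n in names if with_char_set.search(n)]
exact_matches = [n for n in names if four_and_four.search(n)]

print("alternation:", alt_matches)
print("character set:", set_matches)
print("4 + 4 letters:", exact_matches)
```

The alternation and the character set select the same rows here, which is the equivalence the lecture mentions. The third pattern drops the plus signs in favour of fixed counts, so names of any other length are rejected.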
So I write backslash s. After that I want a to z four times, to ensure that my last name contains only 4 characters, so I put 4 in the curly brackets. After that I want my string to end, which is why I put the $ symbol.

Now let's run this. You can see we are getting all the customers whose first name and last name contain only four characters each, and whose first names start with a, b, c, or d.

Now suppose we have some email IDs in our customer names. For this example we are using another table, the users table. We have already loaded data into this table. For your practice we have provided a Notepad file containing the queries to create this table. In this table we want to identify the valid email IDs that are present in the name column.

So first let's go to pgAdmin and view the table: SELECT * FROM users. You can see that we have only one column, the name column, and in that column we have a few email IDs as
well. So now we want to identify only the valid email IDs from this data.

A valid email ID should contain an alphanumeric string; it can also contain a dot or a dash. Then there should be an @ sign. After that there can be an alphanumeric string for the domain name; it can contain a dash as well. After that there should be a dot. For example, in gmail.com there is a .com at the end; that's why we put a condition on the dot. After the dot there can be 2 to 5 characters, so we write [a-z] and require it to repeat between two and five times. So we do not accept a single character, and we do not accept a length of more than 5.

So let's write this in pgAdmin: SELECT * FROM users WHERE, then the name column; to make it case insensitive I write tilde star. Then, in single quotes, to describe the first part of our email ID, we mention all the possible characters, that is a to
z, 0 to 9, dot, dash, or underscore, and we want this to repeat any number of times. After that we want the @ symbol, and after that we want another character set of a to z, 0 to 9, or dash, since a domain name contains only alphanumeric characters and can also contain a dash. We want this to repeat any number of times, which is why we put a +. After that we want a dot to come after the domain name, which is why we write backslash dot, and after that another character set of a to z, which we want to repeat between 2 and 5 times.

So this query will retrieve only valid email IDs from the database. Let's run this query. You can see we are getting only two valid email IDs. There was an email ID that started with @, and we have filtered it out using this query.

So regular expressions are very useful, and you can use them in other advanced string functions as well. That's all for this lecture.
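The email pattern described above can be sketched the same way, with Python's re module standing in for PostgreSQL's ~* operator and with hypothetical sample values in place of the users table. The pattern is written as dictated, without ^ or $; the anchored variant is an addition for comparison, not something the lecture shows.

```python
import re

# Pattern as dictated: local part, @, domain, dot, 2 to 5 letters.
# Inside square brackets the dot needs no backslash, and the dash is
# placed last so it is read literally rather than as a range.
EMAIL = re.compile(r"[a-z0-9._-]+@[a-z0-9-]+\.[a-z]{2,5}", re.IGNORECASE)

# Assumed variant with anchors, so the whole value must be an email ID.
EMAIL_ANCHORED = re.compile(
    r"^[a-z0-9._-]+@[a-z0-9-]+\.[a-z]{2,5}$", re.IGNORECASE
)

# Hypothetical sample values for illustration only.
values = [
    "john.doe@gmail.com",
    "@gmail.com",          # nothing before the @ sign
    "Mary Smith",          # a plain name, no @ sign
    "a_b-c@my-site.org",
    "bob@localhost",       # no dot after the domain name
]

valid = [v for v in values if EMAIL.search(v)]
print(valid)

# Without anchors the pattern only has to match part of the value, so the
# upper limit of 5 is not enforced on its own: "museu" satisfies {2,5}.
long_tld = "user@example.museum"
print(EMAIL.search(long_tld) is not None)
print(EMAIL_ANCHORED.search(long_tld) is not None)
```

If the intent is to reject endings longer than five letters, as the lecture states, adding ^ and $ is one way to get that behaviour, at the cost of also rejecting values that contain extra text around the email ID.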
