1
00:00:00,000 --> 00:00:09,000
Welcome back to the lecture Machine Vision. Today we want to discuss chapter number four on curve fitting.
2
00:00:09,000 --> 00:00:12,000
What does that mean?
3
00:00:12,000 --> 00:00:21,000
Let us assume we make pictures like that. For instance in a factory we want to control the output of a machine.
4
00:00:21,000 --> 00:00:25,000
So we make pictures of the objects that are produced.
5
00:00:25,000 --> 00:00:33,000
Based on that we do some edge detection to determine the boundaries of the object.
6
00:00:33,000 --> 00:00:37,000
As a result we get an image as the one shown on the right hand side.
7
00:00:37,000 --> 00:00:42,000
An image that is black for all pixels except for the edge pixels.
8
00:00:42,000 --> 00:00:47,000
Now we can represent this object as a set of edge pixels.
9
00:00:47,000 --> 00:00:57,000
However for further processing doing all computations on the basis of individual pixels might be relatively slow.
10
00:00:57,000 --> 00:01:03,000
And it might not allow us to do some higher-level reasoning on the shape of the object.
11
00:01:03,000 --> 00:01:12,000
For that reason we might like to represent this object not just as a set of pixels but as a geometric shape.
12
00:01:12,000 --> 00:01:21,000
For instance as a polygon, and represent the polygon just by the coordinates of its vertices.
13
00:01:21,000 --> 00:01:38,000
So we could reduce the amount of memory that is required for representing this shape to just six coordinates rather than hundreds or even thousands of pixels.
14
00:01:38,000 --> 00:01:45,000
Now we want to discuss how we can get from this edge image to the geometric representation.
15
00:01:45,000 --> 00:01:56,000
For that we want to start with the simplest elements that we can use namely lines and line segments.
16
00:01:56,000 --> 00:02:04,000
But before we do that let's repeat a little bit how we can represent two dimensional geometric objects.
17
00:02:04,000 --> 00:02:11,000
So this should be just a repetition of what you should already know from your lectures in mathematics.
18
00:02:11,000 --> 00:02:16,000
The first thing that we need is a dot product.
19
00:02:16,000 --> 00:02:26,000
The dot product of two vectors say p and q is defined as a sum over the pairwise products of the elements of the two vectors.
20
00:02:26,000 --> 00:02:37,000
So if p and q are two dimensional vectors and the elements of p are p1 and p2 and the elements of q are q1 and q2,
21
00:02:37,000 --> 00:02:43,000
then the dot product is just p1 times q1 plus p2 times q2.
22
00:02:43,000 --> 00:02:52,000
Within this lecture I will always use the notation with the special brackets that you find here for the dot product.
23
00:02:52,000 --> 00:02:57,000
I could also write p transpose times q, which leads to the same result,
24
00:02:57,000 --> 00:03:06,000
but to make clear that we are faced with a dot product I use this special notation with the brackets.
25
00:03:06,000 --> 00:03:13,000
The dot product has some mathematical properties. For instance it is linear in both of its arguments.
26
00:03:13,000 --> 00:03:23,000
That means if we apply a linear combination of vectors p and r with prefactors alpha and beta as the first argument,
27
00:03:23,000 --> 00:03:34,000
and another linear combination of vectors q and s with scalar prefactors gamma and delta as the second argument of the dot product,
28
00:03:34,000 --> 00:03:39,000
then we can multiply out the dot product and we get four terms.
29
00:03:39,000 --> 00:03:50,000
Each of those terms contains just a pairwise dot product of two of the basic vectors that are used in the linear combinations
30
00:03:50,000 --> 00:03:57,000
and all the prefactors are multiplied out and put in front of the dot products.
31
00:03:57,000 --> 00:04:11,000
Another important property which leads to a geometric interpretation of the dot product is that the dot product of p and q is equal to
32
00:04:11,000 --> 00:04:22,000
the Euclidean length of the vector p times the Euclidean length of vector q times the cosine of the angle between vector p and q.
33
00:04:22,000 --> 00:04:30,000
That means there is a relationship between the angle between two vectors and their dot product.
34
00:04:30,000 --> 00:04:40,000
And from that it follows that the dot product of a vector with itself is nothing else than the squared length of the vector
35
00:04:40,000 --> 00:04:49,000
because the cosine of zero degrees is one, and that the dot product of two vectors which are orthogonal is always zero
36
00:04:49,000 --> 00:04:56,000
because the cosine of 90 degrees is zero.
37
00:04:56,000 --> 00:05:04,000
So that means if you have two vectors p and q, and p is a unit vector, that means it has length one.
38
00:05:04,000 --> 00:05:11,000
And we calculate the dot product of p and q and multiply it with this unit vector p.
39
00:05:11,000 --> 00:05:26,000
What happens? Well, what we get is a kind of projection of the vector q onto the direction that is given by vector p.
40
00:05:26,000 --> 00:05:36,000
So the vector given by the dot product of p and q, times p, is drawn as the red vector in the diagram.
41
00:05:36,000 --> 00:05:46,000
So the dot product of p and q provides the length of this vector because the vector p has unit length.
42
00:05:46,000 --> 00:05:55,000
So we can see that the dot product can be used to calculate some orthogonal projection of a vector onto a certain direction.
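A minimal sketch of this projection in Python, assuming numpy; the function name is purely illustrative:

```python
import numpy as np

def project_onto_unit(p, q):
    """Orthogonal projection of q onto the direction of the unit vector p."""
    # <p, q> is the signed length of the projection; multiplying by p
    # turns it back into a vector along p (the red vector in the diagram).
    return np.dot(p, q) * p

p = np.array([1.0, 0.0])          # a unit vector
q = np.array([3.0, 4.0])
print(project_onto_unit(p, q))    # [3. 0.]
```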
43
00:05:55,000 --> 00:06:02,000
Based on this we can define lines and line segments.
44
00:06:02,000 --> 00:06:12,000
So the first definition of a line segment with two end points p and q is given as a set of points x
45
00:06:12,000 --> 00:06:18,000
which can be composed as one minus tau times p plus tau times q.
46
00:06:18,000 --> 00:06:22,000
where tau is a real number between zero and one.
47
00:06:22,000 --> 00:06:36,000
That means the definition of a line segment is given by a set of points and all these points are represented in this way.
48
00:06:36,000 --> 00:06:44,000
We can extend that line segment to a full line, that means a line that has an infinite extension,
49
00:06:44,000 --> 00:06:53,000
by relaxing the values of tau to the real numbers, so we do not restrict them anymore to the interval between zero and one,
50
00:06:53,000 --> 00:07:00,000
but to all real numbers. And then we get a representation of the full straight line.
51
00:07:00,000 --> 00:07:06,000
For all values of tau between zero and one we get the line segment between p and q.
52
00:07:06,000 --> 00:07:16,000
For all values of tau less than zero we get those elements of the line which are outside of the line segment between p and q,
53
00:07:16,000 --> 00:07:27,000
but closer to p than q. And for values of tau greater than one we get all those points which are outside of the line segment between p and q,
54
00:07:27,000 --> 00:07:35,000
but which are closer to q than to p.
55
00:07:35,000 --> 00:07:43,000
For some purposes it is useful to be able to calculate the perpendicular point of a certain point onto such a line.
56
00:07:43,000 --> 00:07:56,000
So we assume that a certain point r is given. This point might be part of the line, but it might also be not an element of the line.
57
00:07:56,000 --> 00:08:10,000
And we want to calculate that point on the line that is closest to r. And that point is the point where the angle between the vector that connects r and l
58
00:08:10,000 --> 00:08:21,000
and the vector that connects p and q is equal to 90°. That means this perpendicular point, say l, must be part of the line.
59
00:08:21,000 --> 00:08:31,000
So l must have a representation that is given by one minus tau times p plus tau times q for a certain value of tau.
60
00:08:31,000 --> 00:08:43,000
Furthermore, the vector from r to l should be orthogonal to the vector of p to q. That means the dot product of l minus r and p minus q must be equal to zero.
61
00:08:43,000 --> 00:08:55,000
Now we can substitute in the dot product the right hand side of the first equation. That means instead of l we write one minus tau times p plus tau times q.
62
00:08:55,000 --> 00:09:02,000
Then we multiply out the dot product and we get one equation that depends only on tau.
63
00:09:02,000 --> 00:09:16,000
We resolve this equation with respect to tau and get as result that tau is equal to the dot product of p minus r and p minus q over the dot product of p minus q and p minus q.
64
00:09:16,000 --> 00:09:29,000
So in the denominator you clearly see that we are faced with the squared Euclidean length of the vector between p and q.
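As a sketch of this formula, assuming numpy (the helper name is invented for the example), the perpendicular point can be computed directly:

```python
import numpy as np

def perpendicular_point(p, q, r):
    """Foot of the perpendicular from point r onto the line through p and q."""
    d = p - q
    # tau = <p - r, p - q> / <p - q, p - q>; the denominator is the
    # squared Euclidean length of the vector between p and q.
    tau = np.dot(p - r, d) / np.dot(d, d)
    return (1.0 - tau) * p + tau * q

p, q = np.array([0.0, 0.0]), np.array([4.0, 0.0])
r = np.array([1.0, 2.0])
print(perpendicular_point(p, q, r))   # [1. 0.]
```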
65
00:09:29,000 --> 00:09:41,000
Besides the representation of a line that we have derived on the slide before, there is a second way to represent a line in the two dimensional space, called the normal form.
66
00:09:41,000 --> 00:09:53,000
For that we need a vector n that is orthogonal to the line and has a length one. So it is an orthogonal unit vector.
67
00:09:53,000 --> 00:10:01,000
For that vector we know it is orthogonal, so the dot product of n and the vector q minus p is equal to zero.
68
00:10:01,000 --> 00:10:14,000
We observe that the dot product of n and an arbitrary point x on the line is equal to the dot product of n and (one minus tau) times p plus tau times q.
69
00:10:14,000 --> 00:10:22,000
So we just substituted x by the definition of such a point that we have derived on the slide before.
70
00:10:22,000 --> 00:10:32,000
Now we can multiply out the dot product and get the dot product of n and p plus tau times the dot product of n and q minus p.
71
00:10:32,000 --> 00:10:41,000
The dot product of n and q minus p is equal to zero, as we have seen, because n is orthogonal to that vector.
72
00:10:41,000 --> 00:10:45,000
So what remains is just the dot product of n and p.
73
00:10:45,000 --> 00:10:59,000
If we reformulate that, then we get that for each point x on the line it holds that the dot product of n and x minus the dot product of n and p is equal to zero.
74
00:10:59,000 --> 00:11:20,000
If we substitute minus the dot product of n and p by a new constant, say c, then we get the standard representation, normal form representation of a line, zero equals the dot product of n and x plus c.
75
00:11:20,000 --> 00:11:25,000
At the bottom right we see a sketch of the situation.
76
00:11:25,000 --> 00:11:33,000
We have the line, we have the two points p and q on the line that were used on the slide before to represent the line,
77
00:11:33,000 --> 00:11:44,000
and we have the unit normal vector n that points into a direction that is orthogonal to the line.
78
00:11:44,000 --> 00:11:53,000
The representation in normal form has some advantages over the representation that we have seen before,
79
00:11:53,000 --> 00:12:01,000
because it allows us to calculate the distance of an arbitrary point to the line in a very easy way.
80
00:12:01,000 --> 00:12:09,000
Let us assume we want to calculate the distance of a point capital R to that line.
81
00:12:09,000 --> 00:12:16,000
With the standard definition of a line, we first would have to calculate the perpendicular point L,
82
00:12:16,000 --> 00:12:27,000
and afterwards calculate the vector that connects L and R and evaluate the Euclidean length of this vector.
83
00:12:27,000 --> 00:12:37,000
With the normal form, we can simplify that and get as result the dot product of n and R.
84
00:12:37,000 --> 00:12:46,000
R is a vector that starts in the origin of the coordinate system and points to the point capital R.
85
00:12:46,000 --> 00:12:52,000
The dot product of n and R plus c.
86
00:12:52,000 --> 00:12:58,000
This is a signed number. The sign tells us whether we are on the left hand side or on the right hand side of the line.
87
00:12:58,000 --> 00:13:05,000
If we are just interested in the distance, we just take the absolute value of this number.
88
00:13:05,000 --> 00:13:09,000
How can we derive that from the plot on the right hand side?
89
00:13:09,000 --> 00:13:28,000
On the right hand side we see the point capital R and the point capital L, which is the perpendicular point to R.
90
00:13:28,000 --> 00:13:34,000
We also see the lowercase vector r that connects the origin of the coordinate system with the point capital R.
91
00:13:34,000 --> 00:13:38,000
We see the vector n, the normal vector.
92
00:13:38,000 --> 00:13:44,000
We see the line that starts in the origin and follows the direction of n.
93
00:13:44,000 --> 00:14:01,000
We know that calculating the dot product of a vector and a unit vector is the same as projecting that vector onto the direction that is given by the unit vector.
94
00:14:01,000 --> 00:14:16,000
So calculating the dot product of n and the vector R is nothing else than calculating the length between the origin of the coordinate system and the point R'.
95
00:14:16,000 --> 00:14:31,000
Furthermore, we know that we are interested in the distance between the points R and L, which is exactly the same as the distance between the points R' and L'.
96
00:14:31,000 --> 00:14:40,000
So what we need is to know how large is the distance between the origin and the point L'.
97
00:14:40,000 --> 00:14:53,000
However, we know that L' is an element of the line. So we know that it has exactly the same orthogonal distance as the point P or the point Q.
98
00:14:53,000 --> 00:15:08,000
That means if we project the point P or Q or L or L' or any other point of the line onto the direction that is given by n, we will always get the same number, the same length.
99
00:15:08,000 --> 00:15:21,000
So this is the same as the dot product of n and P or the dot product of n and L or the dot product of n and Q.
100
00:15:21,000 --> 00:15:31,000
And we know that this is already -c. And therefore we can derive the formula that is shown here.
101
00:15:31,000 --> 00:15:40,000
Furthermore, you might sometimes be interested in getting an arbitrary point on the line if you know the line just represented in normal form.
102
00:15:40,000 --> 00:15:53,000
How do we get that? Well, what could we do? We could pick an arbitrary point - doesn't matter where - in space and then project that point onto the line.
103
00:15:53,000 --> 00:16:01,000
The simplest way is to take the origin as such a point and project the origin onto the line.
104
00:16:01,000 --> 00:16:11,000
That means we are searching a point that has a representation tau times n, where tau might be a real number.
105
00:16:11,000 --> 00:16:23,000
And we are searching for such a value of tau that the dot product of tau times n and n is equal to -c.
106
00:16:23,000 --> 00:16:32,000
And when we resolve this equation we get that the value of tau needs to be chosen as -c.
107
00:16:32,000 --> 00:16:41,000
This yields then one point, namely the point -c times n. And we know that this is an element of the line.
108
00:16:41,000 --> 00:16:55,000
And if we look more carefully at the drawing, we will find that this point -c times n is exactly the point L' in the drawing.
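A small sketch of these two normal-form computations, assuming numpy; the line y = 2 and all names are examples chosen only for illustration:

```python
import numpy as np

def signed_distance(n, c, r):
    """Signed distance of point r from the line <n, x> + c = 0, n a unit vector."""
    # The sign tells us on which side of the line r lies;
    # abs() gives the plain distance.
    return np.dot(n, r) + c

def point_on_line(n, c):
    """Projection of the origin onto the line: the point -c * n."""
    return -c * n

n = np.array([0.0, 1.0])   # unit normal of the horizontal line y = 2,
c = -2.0                   # written in normal form as <n, x> - 2 = 0
print(signed_distance(n, c, np.array([5.0, 7.0])))   # 5.0
print(point_on_line(n, c))                           # [0. 2.]
```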
109
00:16:55,000 --> 00:17:03,000
To finalize this brief repetition of the basics of two-dimensional geometry, let us have a look at vectors.
110
00:17:03,000 --> 00:17:13,000
If we are faced with an arbitrary two-dimensional vector V with elements V1 and V2, we want to calculate a unit normal vector for that vector.
111
00:17:13,000 --> 00:17:21,000
How can we do that? The first step is to normalize the vector. That means to make it a vector of length 1.
112
00:17:21,000 --> 00:17:33,000
This can be done by dividing the vector by its vector length. That means calculating 1 over the Euclidean length of V times the vector.
113
00:17:33,000 --> 00:17:44,000
The second step is to rotate it by 90° either to the left or to the right. In the two-dimensional case this is rather easy.
114
00:17:44,000 --> 00:17:53,000
If we have a vector V1, V2 and we want to rotate it by 90°, we get the vector -V2, V1.
115
00:17:53,000 --> 00:18:00,000
If we want to rotate it by -90°, we get the vector V2, -V1.
116
00:18:00,000 --> 00:18:17,000
That means a unit normal vector for vector V has the shape 1 over the length of vector V times -V2, V1.
117
00:18:17,000 --> 00:18:24,000
Every vector of course has a polar representation in the two-dimensional space.
118
00:18:24,000 --> 00:18:32,000
Each vector V can be represented as R times the vector (cos(phi), sin(phi)).
119
00:18:32,000 --> 00:18:43,000
That means the vector (cos(phi), sin(phi)) is a unit vector that points into the direction of that vector V and has unit length.
120
00:18:43,000 --> 00:18:54,000
The scalar R is a non-negative number and it tells us how far away from the origin that point is.
121
00:18:54,000 --> 00:19:03,000
If we are given a vector V with entries V1 and V2, we might be interested in calculating its polar representation.
122
00:19:03,000 --> 00:19:13,000
Determining its distance from the origin is rather simple, so we just need to calculate the Euclidean length of vector V.
123
00:19:13,000 --> 00:19:17,000
But how do we get the angle phi?
124
00:19:17,000 --> 00:19:22,000
We have to consider trigonometric calculations here.
125
00:19:22,000 --> 00:19:34,000
The first way that you would find is to somehow use the arctangent to calculate that angle.
126
00:19:34,000 --> 00:19:42,000
That's possible, but it requires some case distinction to get the angle phi directly.
127
00:19:42,000 --> 00:19:52,000
The simpler way is to use a specific function that is known as atan2, the arctangent with two arguments,
128
00:19:52,000 --> 00:20:06,000
that is defined to take the two components V1 and V2 of the two-dimensional vector and that provides the angle phi into which this vector (V1, V2) points.
129
00:20:06,000 --> 00:20:16,000
The function atan2 is a standard function, so you can use it in mathematical formulas as well as in programming languages.
130
00:20:16,000 --> 00:20:25,000
That means every time you need to know the angle of a two-dimensional vector, use this function atan2,
131
00:20:25,000 --> 00:20:39,000
rather than using the arctangent of some ratio of numbers and then adding some case distinction.
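Both steps (normalize, then rotate by 90 degrees) and the polar representation via atan2 can be sketched as follows, assuming numpy; the function names are illustrative:

```python
import numpy as np

def unit_normal(v):
    """Unit normal vector of a 2D vector v: normalize, then rotate by 90 degrees."""
    v1, v2 = v / np.linalg.norm(v)    # step 1: scale to length 1
    return np.array([-v2, v1])        # step 2: rotate by +90 degrees

def polar(v):
    """Polar representation (r, phi) of a 2D vector, using atan2."""
    # atan2 handles all quadrants, so no manual case distinction is needed.
    return np.linalg.norm(v), np.arctan2(v[1], v[0])

v = np.array([1.0, 1.0])
print(unit_normal(v))    # [-0.70710678  0.70710678]
print(polar(v))          # (1.4142..., 0.7853...), i.e. (sqrt(2), pi/4)
```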
132
00:20:39,000 --> 00:20:47,000
Now we have all the prerequisites for our further processing of the edge bitmaps.
133
00:20:47,000 --> 00:20:53,000
We assume that we started with a gray value image and extracted the gray value edges.
134
00:20:53,000 --> 00:21:01,000
Now we know all the pixels that are gray level edge pixels, but we want to get to a shape description.
135
00:21:01,000 --> 00:21:08,000
The first way that we want to consider is to use a technique called the Hough Transform,
136
00:21:08,000 --> 00:21:19,000
which transforms the content of the edge image into a different space and extracts the dominant lines in that space.
137
00:21:19,000 --> 00:21:30,000
From the results of previous slides we know that we can represent each line in the two-dimensional space in its normal form.
138
00:21:30,000 --> 00:21:42,000
If we represent the normal vector as the vector with elements cosine phi and sine phi and the position of a point as the vector x, y,
139
00:21:42,000 --> 00:21:52,000
then we can represent each line as x times cosine phi plus y times sine phi plus c equals zero.
140
00:21:52,000 --> 00:21:56,000
That's just the normal form of a line.
141
00:21:56,000 --> 00:22:09,000
Furthermore, we can limit ourselves to angles phi between 0 and 180° and values of c from the real numbers.
142
00:22:09,000 --> 00:22:16,000
Why don't we need to consider larger angles between 180° and 360°?
143
00:22:16,000 --> 00:22:26,000
Well, the answer is easy, because each line has two representations, one with a normal vector pointing to the left and one with a normal vector pointing to the right.
144
00:22:26,000 --> 00:22:35,000
Here we restrict ourselves to just one of these representations by limiting the angles between 0 and 180°.
145
00:22:35,000 --> 00:22:44,000
But still we can represent each line in such a representation.
146
00:22:44,000 --> 00:22:55,000
That means we can represent all the lines in the two-dimensional space by two numbers, the pair of phi and c.
147
00:22:55,000 --> 00:22:59,000
Now let us plot this parameter space.
148
00:22:59,000 --> 00:23:06,000
On the left-hand side we see an image space, the original image space in which we detect edge pixels.
149
00:23:06,000 --> 00:23:11,000
It has two coordinate axes x and y as usual in an image.
150
00:23:11,000 --> 00:23:17,000
On the right-hand side we find the parameter space. It has the two axes phi and c.
151
00:23:17,000 --> 00:23:27,000
And each line in the image is represented by one point in that parameter space.
152
00:23:27,000 --> 00:23:37,000
If we consider one point in the image, indicated by the red circle, then we can find several lines that are passing through this point.
153
00:23:37,000 --> 00:23:43,000
For instance this vertical line. It has a certain representation in parameter space.
154
00:23:43,000 --> 00:23:51,000
For instance this line has an angle of phi equal to 0 and a certain offset c.
155
00:23:51,000 --> 00:24:02,000
And that pair of parameters is represented in the parameter space at a certain position that is indicated by the blue circle.
156
00:24:02,000 --> 00:24:08,000
But that's not the only line that passes through the red point.
157
00:24:08,000 --> 00:24:13,000
We might also consider that line that has a different angle and a different offset.
158
00:24:13,000 --> 00:24:21,000
So it has a different representation in the parameter space, which maybe is located at that position.
159
00:24:21,000 --> 00:24:31,000
Like that we go on and consider further lines in the image space, which refer to further points in the parameter space.
160
00:24:31,000 --> 00:24:38,000
For instance the horizontal line in the image space has a representation in parameter space at that point.
161
00:24:38,000 --> 00:24:49,000
And that diagonal line has a representation in parameter space at another position.
162
00:24:49,000 --> 00:25:00,000
And even those four lines are not all. There are infinitely many lines that are passing through the red point of interest.
163
00:25:00,000 --> 00:25:03,000
And all of these have a representation in parameter space.
164
00:25:03,000 --> 00:25:10,000
If we calculate those points we find that those are located on a sine curve.
165
00:25:10,000 --> 00:25:17,000
A sine curve with a certain amplitude and a certain phase shift.
166
00:25:17,000 --> 00:25:23,000
Now let's consider another point in the image. Another edge point that we have found.
167
00:25:23,000 --> 00:25:28,000
For instance the point on the left hand side that is indicated by a red circle.
168
00:25:28,000 --> 00:25:33,000
Again we can consider the different lines that are passing through that point.
169
00:25:33,000 --> 00:25:42,000
And we get another sine curve in the parameter space, as the locus of all the lines that pass through that point.
170
00:25:42,000 --> 00:25:45,000
All the lines that intersect at that point.
171
00:25:45,000 --> 00:25:55,000
And we observe that the two sine curves that we obtained in the parameter space are intersecting at a single parameter set.
172
00:25:55,000 --> 00:26:05,000
And that's obviously the parameter set that represents the line that connects the two red points in the image space.
173
00:26:05,000 --> 00:26:08,000
So what did we find?
174
00:26:08,000 --> 00:26:18,000
If we observe lines in the image space, those lines are represented in the parameter space as points.
175
00:26:18,000 --> 00:26:25,000
If we consider points in the image space we can represent them in the parameter space as sine curves.
176
00:26:25,000 --> 00:26:34,000
And the sine curves implicitly represent all the lines that intersect in the point of interest in the image space.
177
00:26:34,000 --> 00:26:41,000
And if we consider several points on the same line in the image space,
178
00:26:41,000 --> 00:26:51,000
then what we can find in the parameter space are several sine curves that are all intersecting in the same point in the parameter space.
179
00:26:51,000 --> 00:26:58,000
Namely the point that refers to the line in the image space on which all the original points are located.
180
00:26:58,000 --> 00:27:02,000
And this gives the idea of the Hough Transform.
181
00:27:02,000 --> 00:27:08,000
For each point, for each edge point in the image we create the sine curve in the parameter space.
182
00:27:08,000 --> 00:27:14,000
And then we consider the points of intersections of the sine curves.
183
00:27:14,000 --> 00:27:26,000
And those points where many sine curves intersect are likely to represent lines that are very dominant in the image.
184
00:27:26,000 --> 00:27:29,000
So here's the basic procedure.
185
00:27:29,000 --> 00:27:40,000
We first take all the edge pixels, and for each edge pixel we calculate the sine curve and draw it, or somehow represent it.
186
00:27:40,000 --> 00:27:51,000
So for instance here we have this parameter space, this phi and c, and the blue line, the blue curve represents one edge pixel.
187
00:27:51,000 --> 00:28:00,000
The green represents another edge pixel, the red one represents another edge pixel and so on and so forth.
188
00:28:00,000 --> 00:28:07,000
After we have done that we check for points of intersection in the parameter space.
189
00:28:07,000 --> 00:28:13,000
So in this case we might have found one very dominant point of intersection.
190
00:28:13,000 --> 00:28:23,000
That means many sine curves are intersecting in that point for a certain parameter value of phi and c.
191
00:28:23,000 --> 00:28:33,000
And then this parameter value reveals the line on which all those points are located in the image.
192
00:28:33,000 --> 00:28:37,000
In practice, however, things turn out not to be that easy.
193
00:28:37,000 --> 00:28:47,000
Because we often do not get just a unique point of intersection, but we get an area where many points of intersections can be found.
194
00:28:47,000 --> 00:28:57,000
Due to numerical imprecision in determining the positions of the edge pixels, etc., the sine curves do not intersect in a single point.
195
00:28:57,000 --> 00:29:08,000
But as we can see here, all these points of intersection are very close by, but they are not exactly the same.
196
00:29:08,000 --> 00:29:21,000
Furthermore, it might happen that the image contains several dominant lines, so that we get several of these dominant areas of points of intersection.
197
00:29:21,000 --> 00:29:24,000
And then we would need to extract those.
198
00:29:24,000 --> 00:29:28,000
Furthermore, we get points of intersection which do not indicate dominant lines.
199
00:29:28,000 --> 00:29:34,000
So where only, for instance, two sine curves are intersecting, which are not supported by more than two points,
200
00:29:34,000 --> 00:29:43,000
and therefore cannot be considered to be really dominant, apparent lines in the image.
201
00:29:43,000 --> 00:29:50,000
So how can we find those areas with a lot of points of intersection close by?
202
00:29:50,000 --> 00:29:55,000
The idea is to use something that is called an accumulator array.
203
00:29:55,000 --> 00:30:03,000
So for that purpose, we split up the whole parameter space into small cells.
204
00:30:03,000 --> 00:30:14,000
So for instance, we might consider the width of each cell to be, for instance, 1 degree or 10 degrees or 0.1 degree.
205
00:30:14,000 --> 00:30:26,000
And in the vertical direction, we might consider the spacing to be, for instance, 1 pixel, 10 pixels, or 0.1 pixels in height.
206
00:30:26,000 --> 00:30:30,000
It depends on which precision we want to get.
207
00:30:30,000 --> 00:30:37,000
So now we have a lot of cells, and we partition the space, the parameter space, in many cells.
208
00:30:37,000 --> 00:30:44,000
For each cell, we implement a counter.
209
00:30:44,000 --> 00:30:53,000
So what we do then is that we count how many of the sine curves are passing through each of these cells.
210
00:30:53,000 --> 00:31:01,000
And then afterwards, we can check the counter values and search for the local maxima of the counter values.
211
00:31:01,000 --> 00:31:13,000
And those local maxima reveal the dominant areas of intersection of sine curves.
212
00:31:13,000 --> 00:31:23,000
So in this case, we would have one global maximum where five of the sine curves intersect.
213
00:31:23,000 --> 00:31:30,000
And we might find a second local maximum, which is still dominant.
214
00:31:30,000 --> 00:31:32,000
Where three sine curves intersect.
215
00:31:32,000 --> 00:31:39,000
And then apart from that, we might also find some other areas where only two sine curves intersect.
216
00:31:39,000 --> 00:31:43,000
So now we pick the most dominant local maxima.
217
00:31:43,000 --> 00:31:48,000
That means those maxima, which are supported by very many sine curves.
218
00:31:48,000 --> 00:31:56,000
So in practice, very many means 50 or 100 or 200 or 500 maybe.
219
00:31:56,000 --> 00:32:03,000
And treat those as the dominant lines in the image.
220
00:32:03,000 --> 00:32:05,000
So this yields the Hough Transform.
221
00:32:05,000 --> 00:32:11,000
So in total, summing up, first we create an accumulator array of adequate precision.
222
00:32:11,000 --> 00:32:18,000
Adequate means, well, the more cells we have, the more computational effort we need, but the more accurate the result will be.
223
00:32:18,000 --> 00:32:23,000
The smaller the number of cells is, the less computational effort we need.
224
00:32:23,000 --> 00:32:28,000
However, the precision of course suffers from the smaller resolution.
225
00:32:28,000 --> 00:32:33,000
And then we initialize the counters for each of the cells with zero.
226
00:32:33,000 --> 00:32:36,000
In step number two, we go through all the edge pixels.
227
00:32:36,000 --> 00:32:42,000
For each edge pixel, we calculate the sine curve that refers to it in parameter space.
228
00:32:42,000 --> 00:32:52,000
And then we check through which accumulator cells the sine curve passes.
229
00:32:52,000 --> 00:32:57,000
And increment the counter for those accumulator cells.
230
00:32:57,000 --> 00:33:02,000
And at the end, we search the accumulator array for the dominant local maxima.
231
00:33:02,000 --> 00:33:07,000
And these provide the dominant lines in the image.
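The whole procedure can be sketched in a few lines, assuming numpy; the cell sizes, the bound on the offset, and all names here are assumptions made for this illustration, not the implementation provided for the assignments:

```python
import numpy as np

def hough_lines(edge_pixels, n_phi=180, c_step=1.0, c_max=600.0):
    """Minimal Hough accumulator: one cell per degree and per c_step of offset."""
    acc = np.zeros((n_phi, int(2 * c_max / c_step) + 1), dtype=np.int32)  # step 1
    phis = np.deg2rad(np.arange(n_phi))
    for x, y in edge_pixels:                          # step 2: one sine curve
        c = -(x * np.cos(phis) + y * np.sin(phis))    # per edge pixel
        idx = np.round((c + c_max) / c_step).astype(int)
        acc[np.arange(n_phi), idx] += 1               # increment crossed cells
    return acc                                        # step 3: search for maxima

# Edge pixels on the vertical line x = 10 yield a peak near phi = 0, c = -10.
acc = hough_lines([(10, y) for y in range(50)])
phi_idx, c_idx = np.unravel_index(acc.argmax(), acc.shape)
print(phi_idx, c_idx - 600)    # 0 -10
```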
232
00:33:07,000 --> 00:33:11,000
So this procedure is known as the Hough Transform in computer vision.
233
00:33:11,000 --> 00:33:20,000
And sometimes in a more generic measurement system sense, it is also known as a Radon transform.
234
00:33:20,000 --> 00:33:27,000
But for us, the term Hough Transform is sufficient to know.
235
00:33:27,000 --> 00:33:30,000
So with the Hough Transform, we can determine lines, i.e. straight lines.
236
00:33:30,000 --> 00:33:37,000
The Hough Transform does not provide endpoints of those lines or line segments for us.
237
00:33:37,000 --> 00:33:45,000
So that means, if we are interested in line segments rather than lines, we need to do some further processing,
238
00:33:45,000 --> 00:33:48,000
which is part of one of the assignments.
239
00:33:48,000 --> 00:33:56,000
So what we need to do for that purpose is somehow determine all the points that refer to the line, that support the line.
240
00:33:56,000 --> 00:34:03,000
Then check where we find those points densely, and where we have gaps.
241
00:34:03,000 --> 00:34:11,000
And based on that, do some reasoning to determine the endpoints of the line segment.
242
00:34:11,000 --> 00:34:14,000
Let's have a look at some examples.
243
00:34:14,000 --> 00:34:18,000
Again, this image with a cup on a mousepad.
244
00:34:18,000 --> 00:34:27,000
When we use the edge extraction with a Canny operator, we get an edge image that looks like that.
245
00:34:27,000 --> 00:34:32,000
Then we transform everything into the Hough parameter space.
246
00:34:32,000 --> 00:34:38,000
And visualizing this accumulator array as an image yields a result that looks like that.
247
00:34:38,000 --> 00:34:49,000
The horizontal axis provides the angles between 0 and a little bit more than 180 degrees.
248
00:34:49,000 --> 00:34:55,000
And the vertical axis shows the c values.
249
00:34:55,000 --> 00:35:06,000
Each pixel in this visualization refers to one accumulator cell.
250
00:35:06,000 --> 00:35:12,000
And the brightness refers to the counter value of the accumulator cell.
251
00:35:12,000 --> 00:35:16,000
That means the brighter the pixel is, the larger the counter value is.
252
00:35:16,000 --> 00:35:21,000
And we clearly see there are four dominant local maxima.
253
00:35:21,000 --> 00:35:29,000
And those four dominant local maxima refer to four lines in the image, namely those that are colored here.
254
00:35:29,000 --> 00:35:36,000
That means the blue, the cyan, the magenta and the green line segment here.
255
00:35:36,000 --> 00:35:41,000
Each of those four refers to one of the four dominant maxima in the image.
256
00:35:41,000 --> 00:35:49,000
Beside that, of course, we also see that there are some other local maxima, but they are not as dominant as those four.
257
00:35:49,000 --> 00:36:03,000
And therefore, if you want to represent this scene with lines, then these four lines are the most important ones.
258
00:36:03,000 --> 00:36:07,000
Let's talk about the properties of the Hough Transform.
259
00:36:07,000 --> 00:36:13,000
As I've already said, the result depends on the size and the precision of the accumulator array.
260
00:36:13,000 --> 00:36:20,000
The more cells we have, the smaller each cell is, the more accurate the result will be.
261
00:36:20,000 --> 00:36:25,000
However, the more computational effort we need.
262
00:36:25,000 --> 00:36:31,000
Determining significant peaks in the accumulator array might be difficult in practice.
263
00:36:31,000 --> 00:36:39,000
I found that there are many bad implementations of the Hough Transform in computer vision libraries,
264
00:36:39,000 --> 00:36:44,000
and only a few really good implementations of the Hough Transform.
265
00:36:44,000 --> 00:36:51,000
Therefore, if you are working with the bad ones, you might be frustrated very easily.
266
00:36:51,000 --> 00:37:00,000
If you are using better ones, the results are nicer and the quality of the results is better.
267
00:37:00,000 --> 00:37:05,000
So check which kind of implementation you are using.
268
00:37:05,000 --> 00:37:11,000
For the assignments, we will provide our own implementation of the Hough Transform,
269
00:37:11,000 --> 00:37:18,000
which somehow circumvents some of these problems and yields, I think, nice results.
270
00:37:18,000 --> 00:37:26,000
Maybe it's not the quickest implementation, but at least you do not get frustrated that easily.
271
00:37:26,000 --> 00:37:35,000
For instance, if you try, in comparison, the implementation provided by MATLAB,
272
00:37:35,000 --> 00:37:39,000
you will easily start to become frustrated with the Hough Transform.
273
00:37:39,000 --> 00:37:46,000
But that's not a matter of the method itself, but more of the implementation.
274
00:37:46,000 --> 00:37:52,000
One property of the Hough Transform is that it ignores all gradient information.
275
00:37:52,000 --> 00:38:05,000
So it adds pixels individually, and it does not consider any knowledge about the direction of the edge that passes through that point.
276
00:38:05,000 --> 00:38:12,000
However, especially if you are dealing with edge pixels, usually we know the orientation of the edge,
277
00:38:12,000 --> 00:38:20,000
because from the gradient at that position we can at least derive a rough approximation of the edge orientation.
278
00:38:20,000 --> 00:38:30,000
But this information is completely ignored by the Hough Transform, at least in its standard implementation.
279
00:38:30,000 --> 00:38:41,000
And in natural scenes, if you go outside and take a picture of some natural scene,
280
00:38:41,000 --> 00:38:52,000
then it seems that the accumulator array is easily flooded, and the apparent lines, or the dominant lines, are not that easily found.
281
00:38:52,000 --> 00:38:58,000
So the Hough Transform works well on technical images where the relevant objects are clearly visible
282
00:38:58,000 --> 00:39:08,000
and the background is not that dominant. For natural scenes, the Hough Transform doesn't work that well in practice.
283
00:39:08,000 --> 00:39:16,000
There are some extensions of the Hough Transform. The standard form is implemented for detecting lines,
284
00:39:16,000 --> 00:39:22,000
but there are also extensions to extract, for instance, circles or ellipses.
285
00:39:22,000 --> 00:39:29,000
They are not discussed here in the lecture, but the extension is rather easy.
286
00:39:29,000 --> 00:39:40,000
So only a small and easy transfer of the idea is needed to derive those methods.
287
00:39:40,000 --> 00:39:47,000
For circles, the Hough Transform is used rather often. For ellipses, things become more difficult.
288
00:39:47,000 --> 00:39:50,000
The parameter space becomes five-dimensional.
289
00:39:50,000 --> 00:39:59,000
For ellipses, the Hough Transform is not used that often.
290
00:39:59,000 --> 00:40:03,000
Then there is another extension called the randomized Hough Transform.
291
00:40:03,000 --> 00:40:15,000
It avoids some of the problems of the standard Hough Transform, but I don't introduce it in the lecture due to time reasons.
292
00:40:15,000 --> 00:40:25,000
Furthermore, there is one version called the generalized Hough Transform that allows us to extract any kind of shape.
293
00:40:25,000 --> 00:40:39,000
You provide a certain template of a shape, and then the generalized Hough Transform searches the image for that kind of shape.
294
00:40:39,000 --> 00:40:50,000
Besides using the Hough Transform to extract lines from an image, there are also alternative ways to get lines and other structures from an image.
295
00:40:50,000 --> 00:40:56,000
The second approach that we want to discuss consists of several steps.
296
00:40:56,000 --> 00:41:10,000
The first step is to somehow transform the representation of edge pixels from an edge bitmap to a list of edge pixels.
297
00:41:10,000 --> 00:41:20,000
That means we somehow collect all the pixels in such a way that we have those organized in lists of adjacent edge pixels.
298
00:41:20,000 --> 00:41:31,000
Based on that representation, we do some polyline segmentation. That means we somehow segment these lists of edge pixels into sublists,
299
00:41:31,000 --> 00:41:42,000
where each sublist represents one polyline segment.
300
00:41:42,000 --> 00:41:50,000
The last step is to fit the lines as well as possible to the edge pixels.
301
00:41:50,000 --> 00:42:02,000
Let's follow this path. First, discuss the edge following, then the polyline segmentation, and at the end the line fitting step.
302
00:42:02,000 --> 00:42:09,000
Edge following. The basic idea of edge following is that we travel through the image,
303
00:42:09,000 --> 00:42:20,000
we collect all the edge pixels which we have found in the image, and then arrange those in lists of pixel coordinates,
304
00:42:20,000 --> 00:42:30,000
where we arrange those lists in such a way that neighboring elements in the list refer to neighboring pixels in the image.
305
00:42:30,000 --> 00:42:42,000
As a result, we get a set of lists of edge pixels. Each list of edge pixels represents one contour,
306
00:42:42,000 --> 00:42:50,000
and the whole set of all these lists represents all the contours which we have found.
307
00:42:50,000 --> 00:43:04,000
On the right hand side, for instance, one contour that will be represented afterwards as a list of edge pixels is the large contour around the mouse pad.
308
00:43:04,000 --> 00:43:10,000
This contour is shown in the image at the bottom with a dark blue color.
309
00:43:10,000 --> 00:43:24,000
All these pixels finally are arranged in one list, and the list is ordered in the sense that neighboring elements in the list refer to neighboring elements in the image.
310
00:43:24,000 --> 00:43:36,000
Beside that large contour, there are other contours that we can find in the edge image, and each of these contours becomes one list of edge pixels.
311
00:43:36,000 --> 00:43:49,000
For instance, the upper bound of the cup is composed out of two different contours, and each of these two contours becomes one edge list.
312
00:43:49,000 --> 00:43:57,000
In the image below, those are represented with two different shades of magenta color.
313
00:43:57,000 --> 00:44:09,000
Besides that, some more contours are found, and those will become individual additional lists of pixels.
314
00:44:09,000 --> 00:44:16,000
The procedure is so simple that I don't even write down an algorithm here.
315
00:44:16,000 --> 00:44:21,000
The idea is we travel through the image from top to bottom, from left to right.
316
00:44:21,000 --> 00:44:27,000
Once we find an edge pixel, we start a new list of edge pixels.
317
00:44:27,000 --> 00:44:43,000
Then we use the gradient information to follow the edge, and we collect all the edge pixels that we find into that list, until there are no more edge pixels that we can add.
318
00:44:43,000 --> 00:44:51,000
As a result, we get these lists of edge pixels.
319
00:44:51,000 --> 00:44:58,000
If we apply this method to a little bit more complicated scenes, we get the result as shown here.
320
00:44:58,000 --> 00:45:09,000
In these two images, black pixels means there is no edge at those positions, while all the other colors refer to edge pixels.
321
00:45:09,000 --> 00:45:21,000
Each contour that is found, subsumed, or collected in one list of edge pixels is colored with the same color.
322
00:45:21,000 --> 00:45:27,000
On the right hand side, it is very clear that we get several lists of edge pixels.
323
00:45:27,000 --> 00:45:37,000
For instance, one that is colorized on the image with dark green pixels.
324
00:45:37,000 --> 00:45:42,000
And others, which are colorized with different colors.
325
00:45:42,000 --> 00:45:56,000
At that point, it is worth mentioning that these lists of edge pixels are not yet fixed to a certain object.
326
00:45:56,000 --> 00:46:05,000
Or it might happen that several edges, which belong to different objects or to different areas, are somehow connected with each other.
327
00:46:05,000 --> 00:46:07,000
One example can be seen here.
328
00:46:07,000 --> 00:46:17,000
The list of pixels starts here, then it goes here, here, here, here, here, downward here, and here again.
329
00:46:17,000 --> 00:46:24,000
As we can see, these edges that are collected here belong to different areas.
330
00:46:24,000 --> 00:46:31,000
For instance, some belong to the boundary between the white field marking and the robot,
331
00:46:31,000 --> 00:46:38,000
while others belong to the boundary between the field marking and the soccer field.
332
00:46:38,000 --> 00:46:44,000
There is no guarantee that this list contains some semantic information.
333
00:46:44,000 --> 00:46:51,000
Since we collected all these edge pixels in the same list, we need to split them up.
334
00:46:51,000 --> 00:46:56,000
As we can see in this example, even though the lists are not only representing lines,
335
00:46:56,000 --> 00:47:02,000
we at least get some information about which pixels belong together or might belong together.
336
00:47:02,000 --> 00:47:06,000
A list might also contain several parts, which are put together.
337
00:47:06,000 --> 00:47:12,000
To represent them as polylines, we need to split those lists of pixels up into several parts,
338
00:47:12,000 --> 00:47:22,000
where we can represent each part, each segment, as a line segment.
339
00:47:22,000 --> 00:47:34,000
That's our task. We want to subdivide the pixel list in such a way that the sub-lists can be represented by individual line segments.
340
00:47:34,000 --> 00:47:37,000
There are several algorithms to do that.
341
00:47:37,000 --> 00:47:44,000
Here we only want to introduce one, the so-called Ramer-Douglas-Peucker algorithm.
342
00:47:44,000 --> 00:47:49,000
How does it proceed? It's rather straightforward, I would say.
343
00:47:49,000 --> 00:47:58,000
The assumption of the Ramer-Douglas-Peucker algorithm is that one list of pixels represents one line.
344
00:47:58,000 --> 00:48:02,000
That means when we connect the two endpoints in the list,
345
00:48:02,000 --> 00:48:08,000
then the line that we get should somehow fit to all the points in the list.
346
00:48:08,000 --> 00:48:13,000
And if that's not fulfilled, then we somehow split up this list into parts,
347
00:48:13,000 --> 00:48:22,000
so that those parts more likely can be represented by individual line segments.
348
00:48:22,000 --> 00:48:24,000
So, how does it proceed?
349
00:48:24,000 --> 00:48:30,000
Well, as I said, we first connect the first point of the list with the last point of the list,
350
00:48:30,000 --> 00:48:34,000
indicated here by the cyan line.
351
00:48:34,000 --> 00:48:43,000
Then we calculate the orthogonal distance of all pixels in that list to this line that we just estimated.
352
00:48:43,000 --> 00:48:54,000
Some of these distances will be small and support the assumption that those pixels belong to that line that connects the first and the last pixel.
353
00:48:54,000 --> 00:49:01,000
Other distances drawn in orange in this image,
354
00:49:01,000 --> 00:49:05,000
they are too large to support this hypothesis.
355
00:49:05,000 --> 00:49:14,000
That means if we find one point with a distance that is larger than a certain acceptance threshold,
356
00:49:14,000 --> 00:49:22,000
then we argue that we somehow need to split up this list of pixels into parts.
357
00:49:22,000 --> 00:49:32,000
So, for that purpose, we select the pixel that has the largest distance from the line between the endpoints,
358
00:49:32,000 --> 00:49:38,000
and we split the list of pixels at that point into two sub-lists.
359
00:49:38,000 --> 00:49:44,000
And then we iterate this algorithm on each of the sub-lists.
360
00:49:44,000 --> 00:49:59,000
So, for instance, after this partitioning step, we repeat the first step and create the line between the first point and the corner point that we just selected.
361
00:49:59,000 --> 00:50:04,000
This yields the first sub-list.
362
00:50:04,000 --> 00:50:11,000
Again, we check the distance between the points in that sub-list and the estimated line.
363
00:50:11,000 --> 00:50:20,000
And in this case, we find that the distance is small enough, that all the distances are below an acceptance threshold.
364
00:50:20,000 --> 00:50:24,000
In that case, we stop the splitting process.
365
00:50:24,000 --> 00:50:34,000
For the second sub-list that was generated, that starts at the new corner point and ends at the endpoint of the list of edge pixels,
366
00:50:34,000 --> 00:50:40,000
we estimate the line by connecting the corner point with the endpoint.
367
00:50:40,000 --> 00:50:48,000
Then we calculate again the distances, and we find that some of the distances are too large.
368
00:50:48,000 --> 00:50:59,000
Again, we select the point that maximizes the distance to the estimated line, which is shown with a red ring.
369
00:50:59,000 --> 00:51:05,000
And at that point, we split up this list into another two sub-lists.
370
00:51:05,000 --> 00:51:12,000
And then we again start this algorithm on each of the two sub-lists that were generated.
371
00:51:12,000 --> 00:51:22,000
That means we create a line that connects the two corner points that define that sub-list, namely the first and the last point of the sub-list.
372
00:51:22,000 --> 00:51:32,000
Then we check whether the distances of all intermediate points are small enough, below an acceptance threshold, and if so, we stop the splitting process.
373
00:51:32,000 --> 00:51:41,000
If not, then we select the point that has the largest distance from that line and split up this list into two sub-lists.
374
00:51:41,000 --> 00:51:50,000
And like that we go on until we cannot split anything anymore.
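A compact recursive sketch of this splitting step, assuming numpy; it ignores degenerate cases such as coincident endpoints, and eps plays the role of the acceptance threshold:

```python
import numpy as np

def rdp(points, eps):
    """Ramer-Douglas-Peucker: reduce a pixel list to polyline vertices."""
    p, q = points[0], points[-1]
    v = q - p
    n = np.array([-v[1], v[0]]) / np.linalg.norm(v)   # unit normal of chord p-q
    dist = np.abs((points - p) @ n)                   # orthogonal distances
    i = int(dist.argmax())
    if dist[i] <= eps:               # all points close enough: keep the chord
        return [p, q]
    left = rdp(points[:i + 1], eps)  # otherwise split at the farthest point
    right = rdp(points[i:], eps)     # and recurse on both sub-lists
    return left[:-1] + right         # merge, dropping the duplicated vertex

pts = np.array([[0, 0], [1, 0.1], [2, 0], [3, 3], [4, 6]], dtype=float)
print(rdp(pts, eps=0.5))             # keeps the corner at (2, 0)
```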
375
00:51:50,000 --> 00:51:57,000
So if we apply this algorithm on our example image, we will find a result that looks like the right image.
376
00:51:57,000 --> 00:52:10,000
So the left image is the input, where each color refers to one list of edge pixels, and the right picture is the output of the Ramer-Douglas-Peucker algorithm.
377
00:52:10,000 --> 00:52:19,000
So again, each color refers to one sub-list that is a result of that splitting approach.
378
00:52:19,000 --> 00:52:25,000
As we can see, the round shapes are split up into a lot of small segments.
379
00:52:25,000 --> 00:52:30,000
Each of those segments can be represented with a line segment.
380
00:52:30,000 --> 00:52:44,000
And the long straight edges are preserved.
381
00:52:44,000 --> 00:52:49,000
Another example, for the other two example images, is shown here.
382
00:52:49,000 --> 00:53:03,000
So now each sub-list of edge pixels that was generated is shown by a red line, and the end points are shown by yellow pixels.
383
00:53:03,000 --> 00:53:17,000
So the yellow pixels are the end points, and the red lines represent the lists of edge pixels that are represented by the respective line segments.
384
00:53:17,000 --> 00:53:30,000
As we can see, the complicated structures are now split into several line segments.
385
00:53:30,000 --> 00:53:39,000
If we zoom in at the university main building, we see how many small segments are created.
386
00:53:39,000 --> 00:53:54,000
We can see that, for instance, at the windows, we get individual line segments for the structures that we find here.
387
00:53:54,000 --> 00:54:00,000
Okay, so now we did the edge following, and we did the polyline segmentation.
388
00:54:00,000 --> 00:54:08,000
And now, as a last step, we do some line fitting.
389
00:54:08,000 --> 00:54:15,000
With the polyline segmentation, we obtained some list of edge pixels.
390
00:54:15,000 --> 00:54:28,000
And we obtained some first initial guess of a line that represents those pixels, namely the line that connects the two end points of the list of edge pixels.
391
00:54:28,000 --> 00:54:39,000
However, this estimate might be suboptimal. It is acceptable somehow, but there might be other lines which fit better to these pixels.
392
00:54:39,000 --> 00:54:52,000
As well, the result of the Hough Transform might also be suboptimal, especially if we use a cell partitioning of the accumulator array that has too large cells.
393
00:54:52,000 --> 00:55:00,000
On the right-hand side, we see an example. So here, the black points are the edge pixels.
394
00:55:00,000 --> 00:55:05,000
The red line shows which of those are neighbors in the list of edge pixels.
395
00:55:05,000 --> 00:55:11,000
And the cyan line shows the line that connects the two end points.
396
00:55:11,000 --> 00:55:20,000
And maybe this is not really optimal. Especially on the right-hand side, it is far away from most of the pixels.
397
00:55:20,000 --> 00:55:29,000
And we might consider that another line, say that one here in yellow, might fit better than the cyan line did.
398
00:55:29,000 --> 00:55:37,000
But how can we estimate such an optimal line?
399
00:55:37,000 --> 00:55:47,000
Well, we know that we have to represent the line with a normal vector, n, and an offset parameter, c.
400
00:55:47,000 --> 00:55:54,000
So what we need to do is, we need to somehow determine optimal values for n and c.
401
00:55:54,000 --> 00:55:58,000
But what does optimal mean in this case?
402
00:55:58,000 --> 00:56:04,000
Well, let us consider again a set of edge pixels and a certain line.
403
00:56:04,000 --> 00:56:10,000
Say this is our first guess. It does not matter how we created this first guess.
404
00:56:10,000 --> 00:56:16,000
So just a first guess, which we want to evaluate. How good is it?
405
00:56:16,000 --> 00:56:24,000
What we can do is calculate the orthogonal distance of each of those points from the line.
406
00:56:24,000 --> 00:56:28,000
Represented by the yellow dashed lines.
407
00:56:28,000 --> 00:56:34,000
And of course, for a good line, those distances would be small.
408
00:56:34,000 --> 00:56:38,000
In the best case, they would all be zero.
409
00:56:38,000 --> 00:56:45,000
But in practice, it happens that you never find a line that makes all those distances zero.
410
00:56:45,000 --> 00:56:48,000
But somehow, you have to find a compromise.
411
00:56:48,000 --> 00:57:03,000
So now we have to rotate this line a little bit and shift it in such a way that we find the best compromise among the distances that we get.
412
00:57:03,000 --> 00:57:13,000
So we want to find parameters n and c that minimize those distances.
413
00:57:13,000 --> 00:57:21,000
Since we cannot minimize all the distances at the same time, we need to formulate a compromise.
414
00:57:21,000 --> 00:57:28,000
And this can be done by summing up the squared distances, d_i squared, over all the pixels.
415
00:57:28,000 --> 00:57:38,000
And stating that we want to find the normal vector n and offset parameter c such that this sum becomes minimal.
416
00:57:38,000 --> 00:57:41,000
So we might treat that as a kind of search problem.
417
00:57:41,000 --> 00:57:51,000
We are varying the normal vector n and the parameter c over a large number of possible values.
418
00:57:51,000 --> 00:58:04,000
And for each of those values, we check the distances of each pixel and then we select the solution with the smallest sum of square distances.
419
00:58:04,000 --> 00:58:10,000
If we do that, we have to consider that n needs to be a unit vector.
420
00:58:10,000 --> 00:58:12,000
So a vector of length 1.
421
00:58:12,000 --> 00:58:20,000
And this needs to be stated as an additional constraint in this optimization problem.
422
00:58:20,000 --> 00:58:30,000
So typically we add that as a constraint subject to the dot product of n with itself must be equal to 1.
423
00:58:30,000 --> 00:58:34,000
So the Euclidean length of n must be equal to 1.
424
00:58:34,000 --> 00:58:39,000
So this states a mathematical optimization problem with constraints.
425
00:58:39,000 --> 00:58:43,000
It states what we want to have.
426
00:58:43,000 --> 00:58:48,000
It does not necessarily or directly state how we get that.
427
00:58:48,000 --> 00:58:53,000
But it already states what is our goal of calculation.
428
00:58:53,000 --> 00:58:59,000
This method is known as the total least squares approach.
429
00:58:59,000 --> 00:59:05,000
So sum of squares, so sum of square distances.
430
00:59:05,000 --> 00:59:10,000
And we want to minimize it, so we want to find the least sum of squares.
431
00:59:10,000 --> 00:59:16,000
And the word total comes from the fact that we minimize the orthogonal distances.
432
00:59:16,000 --> 00:59:26,000
So the next question is how can we solve this optimization problem?
433
00:59:26,000 --> 00:59:34,000
That means how can we find the values of n and c without blindly searching over values?
434
00:59:34,000 --> 00:59:46,000
The theory of mathematical optimization states that we have to build the so-called Lagrange function.
435
00:59:46,000 --> 00:59:53,000
The Lagrange function is a function that is composed out of the original function that we want to minimize.
436
00:59:53,000 --> 00:59:56,000
That means the sum over the d_i squared.
437
00:59:56,000 --> 01:00:00,000
And adds an additional term for each constraint.
438
01:00:00,000 --> 01:00:11,000
First we have to reformulate the constraints in such a way that they have the form: some function that depends on n and c equals 0.
439
01:00:11,000 --> 01:00:17,000
We can get that by subtracting 1 in the constraint.
440
01:00:17,000 --> 01:00:27,000
And then we add that constraint to the Lagrange function and multiply it with an additional variable, which is called the Lagrange variable.
441
01:00:27,000 --> 01:00:35,000
In this case it is called lambda. So for each constraint we get one of these Lagrange multipliers.
442
01:00:35,000 --> 01:00:48,000
So the Lagrange function in this case becomes the sum of the d_i squared, minus lambda times (the dot product of n with itself, minus 1).
443
01:00:48,000 --> 01:01:04,000
So if we substitute d_i by the way we evaluate it, namely as the dot product of n and xi, plus c, then we get the result in the line below.
444
01:01:04,000 --> 01:01:23,000
Now optimization theory states that the solution of our optimization problem can be found by zeroing the partial derivatives of the Lagrange function with respect to the original variables of the problem.
445
01:01:23,000 --> 01:01:32,000
That means we have to calculate the partial derivative of the Lagrange function with respect to the parameter c.
446
01:01:32,000 --> 01:01:36,000
And with respect to the normal vector n.
447
01:01:36,000 --> 01:01:48,000
And for each of these derivatives we must zero them and then if we are lucky we can resolve the system of equations with respect to the unknown variables n and c.
448
01:01:48,000 --> 01:01:56,000
So let's do that. First let's start with the partial derivative of the Lagrange function with respect to c.
449
01:01:56,000 --> 01:02:23,000
Just using the basic mathematical rules for calculating derivatives, we get that this derivative is 2 times the sum of the dot products of n and xi, summing up over all pixels i, plus 2 times the number of pixels N times c.
450
01:02:23,000 --> 01:02:26,000
And this should be equal to zero.
451
01:02:26,000 --> 01:02:47,000
So if we zero it and resolve the resulting equation with respect to the unknown variable c, we get that c must be equal to minus 1 over N times the sum of the dot products of n and xi.
452
01:02:47,000 --> 01:03:04,000
Or if we like to rewrite it, we can also write it as minus 1 over N times the dot product of n and the sum of all xi.
453
01:03:04,000 --> 01:03:27,000
And if we want to rewrite that again, it's nothing else than minus the dot product of n and the average position of the pixels, since 1 over N times the sum of xi yields just the average position.
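In formulas, the two steps just described are:

$$\frac{\partial L}{\partial c} \;=\; 2\sum_{i=1}^{N}\langle n, x_i\rangle + 2Nc \;=\; 0 \quad\Longrightarrow\quad c \;=\; -\Bigl\langle n,\ \tfrac{1}{N}\textstyle\sum_{i=1}^{N} x_i \Bigr\rangle \;=\; -\langle n, \bar{x}\rangle,$$

where $\bar{x}$ is the average pixel position.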
454
01:03:27,000 --> 01:03:37,000
In the next step we calculate the partial derivatives of the Lagrange function with respect to the two elements of the vector n.
455
01:03:37,000 --> 01:03:48,000
That means with respect to n1 and n2. Again just using the basic rules for calculating derivatives we get the two equations here.
456
01:03:48,000 --> 01:03:53,000
We set both of those terms to zero.
457
01:03:53,000 --> 01:04:05,000
After that we substitute c by our previous result. That means by minus the dot product of n and the average x position or the average pixel position.
458
01:04:05,000 --> 01:04:29,000
So 1 over N times the sum of xi. And afterwards we rearrange all the terms in such a way that we have terms like something times n1 plus something else times n2 equals lambda times n1 or lambda times n2, respectively.
459
01:04:29,000 --> 01:04:45,000
And by doing that we get the terms at the bottom of the slide. So as we can see those prefactors which are abbreviated with alpha, beta and gamma they only depend on the pixel coordinates.
460
01:04:45,000 --> 01:04:57,000
They only depend on the xi values. Please note that here for the xi values the second index refers to the first or second entry of the xi vector.
461
01:04:57,000 --> 01:05:07,000
So i refers to the index of the pixel that we address and 1 and 2 as the second index refers to the position in the vector.
462
01:05:07,000 --> 01:05:13,000
So the first element in the vector or the second element in the vector.
463
01:05:13,000 --> 01:05:25,000
So these terms alpha, beta and gamma only depend on the pixel coordinates that we know that we can derive or that we can calculate from the list of pixel positions that we have.
464
01:05:25,000 --> 01:05:47,000
So at the end, if we simplify everything, we get the equations: the row vector alpha, beta times n equals lambda times the first element of n, and the row vector beta, gamma times n equals lambda times n2.
465
01:05:47,000 --> 01:06:02,000
So by rewriting that as a matrix vector equation we get that the matrix, the symmetric matrix that contains the entries alpha, beta, beta and gamma times the normal vector n is equal to lambda times n.
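In matrix-vector notation, the condition just derived reads:

$$\begin{pmatrix} \alpha & \beta \\ \beta & \gamma \end{pmatrix} n \;=\; \lambda\, n$$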
466
01:06:02,000 --> 01:06:09,000
So we have to search for an appropriate vector n and an appropriate value lambda.
467
01:06:09,000 --> 01:06:21,000
And if you go back to your math class you should have learned that these kinds of problems are known as eigenvalue eigenvector problems in mathematics.
468
01:06:21,000 --> 01:06:33,000
So n is known as a so-called eigenvector and lambda as a so-called eigenvalue.
469
01:06:33,000 --> 01:06:54,000
So that means what we have to do is solve this eigenvector-eigenvalue problem and find an appropriate eigenvalue lambda and a corresponding eigenvector n.
470
01:06:54,000 --> 01:07:10,000
From the mathematical theory of eigenvalues and eigenvectors it turns out that if we have found an eigenvector n for this matrix, then we can also scale this eigenvector as we like.
471
01:07:10,000 --> 01:07:19,000
So if n is an eigenvector then 2 times n is also an eigenvector and 3 times n and 4 times n and 0.1 times n as well.
472
01:07:19,000 --> 01:07:28,000
Since we know that n in our case must be a unit vector we can choose an eigenvector that has unit length.
473
01:07:28,000 --> 01:07:40,000
That means we search an eigenvector and if it doesn't have unit length then we scale it in such a way that the scaled vector has unit length.
474
01:07:40,000 --> 01:07:51,000
However, still there are two solutions. For these kinds of matrices, as we find them here, we always find two real eigenvalues.
475
01:07:51,000 --> 01:08:03,000
So two numbers lambda1 and lambda2 and corresponding eigenvectors for these two values, which solve the system of equations.
476
01:08:03,000 --> 01:08:14,000
For these kinds of matrices it turns out that all the eigenvalues are real-valued and that all the eigenvalues are non-negative.
477
01:08:14,000 --> 01:08:18,000
They might be zero, they might be positive, but they are never negative.
478
01:08:18,000 --> 01:08:33,000
That is a result from mathematics which I don't want to derive here, which is maybe also a little bit beyond the scope of a basic mathematics class, but it's still true.
479
01:08:33,000 --> 01:08:41,000
So if you are interested in the result then consider the literature for it to check how this result is derived.
480
01:08:41,000 --> 01:08:57,000
But the fact is: these kinds of matrices always have as many eigenvalues as there are rows and columns, and they are all non-negative real numbers.
481
01:08:57,000 --> 01:09:02,000
So which of those eigenvalues should we choose?
482
01:09:02,000 --> 01:09:17,000
Well a little bit of geometric analysis turns out that the smaller eigenvalue, so in this case lambda2, minimizes the distances.
483
01:09:17,000 --> 01:09:23,000
While lambda1, the larger eigenvalue, maximizes the distances somehow.
484
01:09:23,000 --> 01:09:28,000
And we don't want to maximize the distances, we want to minimize them.
485
01:09:28,000 --> 01:09:45,000
So we choose the smaller eigenvalue lambda2 and for that eigenvalue lambda2 we calculate the corresponding eigenvector with unit length which provides the vector n.
486
01:09:45,000 --> 01:09:52,000
So calculating eigenvectors and eigenvalues is not that difficult for a 2x2 matrix.
487
01:09:52,000 --> 01:10:06,000
If you are interested, look in the mathematics literature; then you will find that you just have to solve a quadratic equation.
488
01:10:06,000 --> 01:10:14,000
And that's always possible and therefore it is not that difficult to solve that here.
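To spell this out (standard linear algebra, not shown on the slide): the quadratic equation is the characteristic polynomial of the symmetric 2x2 matrix, and its two solutions are

$$\lambda_{1,2} \;=\; \frac{(\alpha+\gamma) \pm \sqrt{(\alpha-\gamma)^2 + 4\beta^2}}{2}.$$

The term under the square root is never negative, which already shows that both eigenvalues are real.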
489
01:10:14,000 --> 01:10:25,000
If you don't want to derive it yourself, you find software libraries that are doing this job for you in more or less all software packages on linear algebra.
490
01:10:25,000 --> 01:10:33,000
Of course including Matlab which also provides a method to do that.
491
01:10:33,000 --> 01:10:42,000
So now let's summarize this total least squares method to estimate the best line that fits a certain set of points.
492
01:10:42,000 --> 01:10:51,000
The first step is that we need to collect the positions of all the pixels that we want to consider.
493
01:10:51,000 --> 01:10:55,000
So what we have to do is we have to calculate some sums.
494
01:10:55,000 --> 01:11:09,000
The sum over all x positions, the sum over all y positions, the sum over the squares of x positions, the sum over the squares of y positions and the sum of the mixed x, y product.
495
01:11:09,000 --> 01:11:19,000
And these sums are needed to establish these entries alpha, beta and gamma for the matrix that we need to create.
496
01:11:19,000 --> 01:11:34,000
Then in the second step we create this matrix, alpha, beta, beta, gamma and calculate the eigenvalues lambda and the eigenvectors that are corresponding to these eigenvalues.
497
01:11:34,000 --> 01:11:46,000
We select the smaller among the two eigenvalues and pick the appropriate eigenvector for that eigenvalue and normalize it to unit length.
498
01:11:46,000 --> 01:11:52,000
Once we have done that we calculate the offset value c based on n.
499
01:11:52,000 --> 01:12:01,000
So just by taking c equal to minus the dot product of n and the average pixel position.
500
01:12:01,000 --> 01:12:09,000
And then you are done. Then you know the line that represents a certain set of pixels.
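As an illustration of the summarized procedure, here is a minimal sketch in Python with NumPy. The function name and interface are my own choice; the matrix is built from centered coordinates, which is equivalent to the alpha, beta, gamma sums after substituting c:

```python
import numpy as np

def fit_line_tls(points):
    """Total least squares line fit.

    points: (N, 2) array of pixel positions x_i.
    Returns (n, c) with unit normal n and offset c of the line {x : <n, x> + c = 0}.
    """
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)        # average pixel position
    d = points - mean                 # centered coordinates

    # Entries of the symmetric 2x2 matrix, built from sums over the pixels.
    alpha = np.sum(d[:, 0] ** 2)
    beta = np.sum(d[:, 0] * d[:, 1])
    gamma = np.sum(d[:, 1] ** 2)
    M = np.array([[alpha, beta], [beta, gamma]])

    # eigh returns eigenvalues in ascending order and unit-length eigenvectors,
    # so column 0 is the eigenvector of the smaller eigenvalue.
    eigenvalues, eigenvectors = np.linalg.eigh(M)
    n = eigenvectors[:, 0]

    c = -np.dot(n, mean)              # c = -<n, average position>
    return n, c
```

A short usage note: for points scattered around a vertical line, fit_line_tls returns a normal close to (1, 0), a case that ordinary least squares could not represent.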
501
01:12:09,000 --> 01:12:18,000
But still you don't know where the endpoints are, if you are interested in a line segment and not an infinitely extended line.
502
01:12:18,000 --> 01:12:27,000
If you want to do that then you need additional techniques which you will find in the assignments and the exercise classes.
503
01:12:27,000 --> 01:12:40,000
For that we analyze where the pixels are located along the line and then select which points are the endpoints.
504
01:12:40,000 --> 01:12:44,000
So have a look at the result of this calculation.
505
01:12:44,000 --> 01:12:50,000
So here is a small clipping of this example image of the main building of KIT.
506
01:12:50,000 --> 01:12:59,000
The left upper image shows the result of the polyline segmentation step of the Ramer-Douglas-Peucker algorithm.
507
01:12:59,000 --> 01:13:12,000
And for instance at that position here we find that the line that is actually creating these edges is vertical.
508
01:13:12,000 --> 01:13:19,000
But these lines that connect endpoints are not vertical but they are skewed a little bit.
509
01:13:19,000 --> 01:13:28,000
After we have fitted all those lines again with the total least squares approach we find that the situation has improved.
510
01:13:28,000 --> 01:13:39,000
And that now the lines that we have estimated are really aligned with the structure that they should represent.
511
01:13:39,000 --> 01:13:45,000
So it is a small improvement but it is an improvement.
512
01:13:45,000 --> 01:13:53,000
To summarize this part of total least squares let us make a comparison.
513
01:13:53,000 --> 01:13:57,000
You might know already least squares methods from other lectures.
514
01:13:57,000 --> 01:14:03,000
For instance from statistics lectures in which as well least squares methods are often used.
515
01:14:03,000 --> 01:14:08,000
Those are typically so-called ordinary least squares methods.
516
01:14:08,000 --> 01:14:19,000
And the question is what makes the difference between the total least squares method that we consider here and these ordinary least squares methods that you might know from other lectures.
517
01:14:19,000 --> 01:14:27,000
Well, the task itself is already a little bit different in the two methods.
518
01:14:27,000 --> 01:14:38,000
In the total least squares approach we deal with points in the two dimensional plane and we don't care about the orientation of the coordinate system in that plane.
519
01:14:38,000 --> 01:14:44,000
So we don't treat the x-axis in a different way than the y-axis.
520
01:14:44,000 --> 01:14:50,000
The result should be independent of the choice of the coordinate axes.
521
01:14:50,000 --> 01:15:02,000
While in the ordinary least squares we are not fitting a line in a two dimensional plane but we fit a linear function to a set of points.
522
01:15:02,000 --> 01:15:06,000
And here x and y have different meanings.
523
01:15:06,000 --> 01:15:11,000
So x is the independent variable, y is the dependent variable.
524
01:15:11,000 --> 01:15:17,000
And we want to find the relationship in which way y depends on x.
525
01:15:17,000 --> 01:15:20,000
So we treat y as a function of x.
526
01:15:20,000 --> 01:15:23,000
So that's something different.
527
01:15:23,000 --> 01:15:25,000
We are not interested in the line as such.
528
01:15:25,000 --> 01:15:33,000
We are interested in a linear function and the line is just the graph that visualizes the linear function.
529
01:15:33,000 --> 01:15:46,000
As a consequence in the ordinary least squares approach, the graph that we get, the function that we get, of course depends on the choice of the coordinate system.
530
01:15:46,000 --> 01:15:50,000
So that's relevant.
531
01:15:50,000 --> 01:15:57,000
So the method is anisotropic, if you want to state it in one word.
532
01:15:57,000 --> 01:16:04,000
While in the total least squares the result is independent of the choice of coordinate systems in the two dimensional plane.
533
01:16:04,000 --> 01:16:07,000
So this result is isotropic.
534
01:16:07,000 --> 01:16:11,000
Yeah, and what do we minimize?
535
01:16:11,000 --> 01:16:18,000
In the total least squares as we have seen we minimize the orthogonal distance of the points to the line.
536
01:16:18,000 --> 01:16:22,000
In the ordinary least squares we do not minimize an orthogonal distance.
537
01:16:22,000 --> 01:16:38,000
What we minimize is the distance between the point and the linear function just considering the difference in y values, not considering differences in x values.
538
01:16:38,000 --> 01:16:53,000
And that, of course, is an important difference so that the result that you would get with ordinary least squares looks different than what you get with total least squares.
539
01:16:53,000 --> 01:17:04,000
And furthermore with ordinary least squares you will not be able to fit something like vertical lines.
540
01:17:04,000 --> 01:17:09,000
With total least squares no problem.
541
01:17:09,000 --> 01:17:16,000
So if you are dealing with least squares methods always check which case you are facing.
542
01:17:16,000 --> 01:17:21,000
Are you facing a case where the coordinate system doesn't matter?
543
01:17:21,000 --> 01:17:25,000
Then choose total least squares.
544
01:17:25,000 --> 01:17:38,000
If you are treating a case where you have a functional relationship between y and x then choose ordinary least squares.
545
01:17:38,000 --> 01:17:42,000
The total least squares approach has many advantages.
546
01:17:42,000 --> 01:17:45,000
It can be calculated very efficiently.
547
01:17:45,000 --> 01:17:47,000
It yields nice results.
548
01:17:47,000 --> 01:17:54,000
However, it suffers from one problem because it is not robust with respect to outliers.
549
01:17:54,000 --> 01:17:55,000
What is an outlier?
550
01:17:55,000 --> 01:18:01,000
Well, an outlier is a point that deviates very much from the line.
551
01:18:01,000 --> 01:18:10,000
That means if we consider the points which you find below, total least squares performs well and there is no problem.
552
01:18:10,000 --> 01:18:14,000
But there is also not a single outlier there.
553
01:18:14,000 --> 01:18:20,000
If we move one point to that position, this point becomes an outlier.
554
01:18:20,000 --> 01:18:25,000
It is somehow an erroneous point, if you like.
555
01:18:25,000 --> 01:18:34,000
And total least squares is not good at dealing with these points, because the estimate that it would yield looks more like that.
556
01:18:34,000 --> 01:18:41,000
And as you can see, the line is skewed and does not really fit the majority of points anymore.
557
01:18:41,000 --> 01:18:47,000
So what we need is a method that is more robust with respect to these outliers.
558
01:18:47,000 --> 01:18:54,000
So that a single outlier does not have such a strong impact on the solution.
559
01:18:54,000 --> 01:18:57,000
There are several ways to implement these ideas.
560
01:18:57,000 --> 01:19:01,000
One idea is to reduce the influence of outliers.
561
01:19:01,000 --> 01:19:10,000
That means those methods still consider outliers and outliers still have an influence, but not as strong as with total least squares.
562
01:19:10,000 --> 01:19:15,000
One of those methods is called M estimators and it is discussed in the lecture.
563
01:19:15,000 --> 01:19:19,000
Another way is to ignore outliers completely.
564
01:19:19,000 --> 01:19:21,000
There are as well several methods.
565
01:19:21,000 --> 01:19:25,000
One of them is called RANSAC, which will be discussed in the lecture.
566
01:19:25,000 --> 01:19:33,000
So first of all let's have a look at M estimators.
567
01:19:33,000 --> 01:19:41,000
When we consider total least squares we minimize the sum over the squared distances.
568
01:19:41,000 --> 01:19:49,000
So the squared distance is plotted on the right hand side with a red curve.
569
01:19:49,000 --> 01:19:53,000
This one here.
570
01:19:53,000 --> 01:20:06,000
And obviously, the larger the distance is, that means the farther away from zero on the horizontal axis we are, the more strongly di^2 increases.
571
01:20:06,000 --> 01:20:14,000
That means for small deviations di^2 still is small.
572
01:20:14,000 --> 01:20:22,000
For large distances di, di^2 becomes huge.
573
01:20:22,000 --> 01:20:28,000
Now we might think why do we use the square function here?
574
01:20:28,000 --> 01:20:33,000
Can't we use functions that do not grow that quickly as the square function?
575
01:20:33,000 --> 01:20:36,000
And this gives the idea for M estimators.
576
01:20:36,000 --> 01:20:50,000
So instead of using the square function, we use another function, say rho, which is often called the loss function of the estimator, and apply rho to di.
577
01:20:50,000 --> 01:20:56,000
And for rho we use functions that do not grow as quickly as the square function.
578
01:20:56,000 --> 01:21:04,000
But still in the vicinity of zero we want to preserve the nice properties of the square function,
579
01:21:04,000 --> 01:21:15,000
so we search for those functions which have similar behavior for small values and which do not grow that quickly for large values of di.
580
01:21:15,000 --> 01:21:21,000
On the right hand side we see several possible options and their formulas.
581
01:21:21,000 --> 01:21:28,000
So the standard choice, as in total least squares, would be the square function, which is shown here.
582
01:21:28,000 --> 01:21:36,000
So the farther away from zero we get, the more strongly the square function grows.
583
01:21:36,000 --> 01:21:42,000
The first way to reduce this influence is to use the so-called Huber function.
584
01:21:42,000 --> 01:21:45,000
The Huber function is drawn in blue.
585
01:21:45,000 --> 01:21:54,000
The Huber function is exactly the same as the squared error term in the vicinity of zero.
586
01:21:54,000 --> 01:22:03,000
But from a certain point on, from here on, it does not grow quadratically, but only linearly.
587
01:22:03,000 --> 01:22:07,000
So this can be also seen here in the formula.
588
01:22:07,000 --> 01:22:18,000
In the vicinity of zero, in the first of these two cases, the function grows with the square of d.
589
01:22:18,000 --> 01:22:29,000
And in the second case, if we are away from zero, then the growth depends linearly on d.
590
01:22:29,000 --> 01:22:40,000
The parameter k, that defines the switching point between quadratic and linear behavior,
591
01:22:40,000 --> 01:22:44,000
has to be chosen manually according to the problem.
592
01:22:44,000 --> 01:22:53,000
So k has to be chosen in such a way that typical inliers have distances smaller than k,
593
01:22:53,000 --> 01:22:57,000
and typical outliers have distances larger than k.
594
01:22:57,000 --> 01:23:03,000
And obviously the growth of the Huber function is smaller than the growth of the square function.
595
01:23:03,000 --> 01:23:17,000
That means points with large distances, outliers, have smaller impact on the solution than for the squared loss function.
596
01:23:17,000 --> 01:23:29,000
We can even further decrease the influence of large outliers using other error terms.
597
01:23:29,000 --> 01:23:35,000
For instance, one choice would be the Cauchy loss function that is plotted in green.
598
01:23:35,000 --> 01:23:43,000
In the vicinity of zero, it approximates the squared error term.
599
01:23:43,000 --> 01:23:51,000
But for larger values of d, it does not grow that much anymore.
600
01:23:51,000 --> 01:24:03,000
So the growth itself, so the slope of the function decreases for larger values of d.
601
01:24:03,000 --> 01:24:11,000
That means here, gross outliers have even less influence on the result than for the Huber loss function.
602
01:24:11,000 --> 01:24:21,000
And another choice, which is even more strict than Cauchy, is the Tukey loss function plotted in violet,
603
01:24:21,000 --> 01:24:30,000
where points which are far away do not have a stronger impact than points, for instance, at this position.
604
01:24:30,000 --> 01:24:37,000
So here the influence of gross outliers is limited, in fact completely bounded.
605
01:24:37,000 --> 01:24:45,000
All these functions have some parameters, so for instance the kappa in the case of the Cauchy function,
606
01:24:45,000 --> 01:24:51,000
the k in case of the Huber function, the a in case of the Tukey function,
607
01:24:51,000 --> 01:25:03,000
and those values have to be chosen appropriately depending on the problem.
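To make the three choices concrete, here is a small sketch in Python with NumPy. The parameter names k, kappa and a follow the lecture; the concrete formulas are common textbook variants that match the described behavior (quadratic near zero, damped growth further out), so treat them as assumptions rather than the exact slide formulas:

```python
import numpy as np

def huber(d, k=1.0):
    # Quadratic up to |d| = k, then linear growth; continuous at the switch point.
    d = np.abs(d)
    return np.where(d <= k, d**2, k * (2.0 * d - k))

def cauchy(d, kappa=1.0):
    # Approximately d^2 near zero, only logarithmic growth for large |d|.
    return kappa**2 * np.log1p((d / kappa) ** 2)

def tukey(d, a=1.0):
    # Biweight: bounded loss, so gross outliers beyond |d| = a all count the same.
    d = np.abs(d)
    return np.where(d <= a, (a**2 / 6.0) * (1.0 - (1.0 - (d / a) ** 2) ** 3), a**2 / 6.0)
```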
608
01:25:03,000 --> 01:25:08,000
That was the M estimator. The second option is to ignore outliers completely.
609
01:25:08,000 --> 01:25:15,000
And how can we implement that? Well, we could also say it's a kind of M estimator,
610
01:25:15,000 --> 01:25:22,000
but this M estimator completely ignores points which are farther away than a certain threshold.
611
01:25:22,000 --> 01:25:29,000
That means we could say we want to minimize the sum over sigma of di,
612
01:25:29,000 --> 01:25:36,000
where sigma is a function that is equal to zero if the distance is smaller than a threshold,
613
01:25:36,000 --> 01:25:41,000
or it's one if the distance is larger than a threshold.
614
01:25:41,000 --> 01:25:49,000
That means what we do with this kind of loss function is we just count the outliers.
615
01:25:49,000 --> 01:25:56,000
We just count those points which are farther away from the line than a certain threshold theta.
616
01:25:56,000 --> 01:26:04,000
Theta again is a parameter that we have to choose appropriately, depending on the application.
617
01:26:04,000 --> 01:26:08,000
So here the distance itself doesn't matter at all.
618
01:26:08,000 --> 01:26:15,000
It just matters whether it's larger or smaller than the threshold.
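As a formula, this counting loss is:

$$\sigma(d) \;=\; \begin{cases} 0 & \text{if } |d| \le \theta \\ 1 & \text{if } |d| > \theta \end{cases} \qquad\text{and we minimize}\qquad \sum_{i=1}^{N} \sigma(d_i),$$

which is exactly the number of outliers.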
619
01:26:15,000 --> 01:26:22,000
So for this case here, that we can see here at the bottom,
620
01:26:22,000 --> 01:26:31,000
we can argue that for this choice of line, if we assume a certain threshold size theta,
621
01:26:31,000 --> 01:26:39,000
which is shown here by the dashed lines, so the dashed lines have a distance from the solid line of exactly theta,
622
01:26:39,000 --> 01:26:42,000
we just count the number of inliers and the number of outliers.
623
01:26:42,000 --> 01:26:50,000
So in this case the number of outliers is two. So the error term will yield an error of two.
624
01:26:50,000 --> 01:27:01,000
For that choice that we can see here, well if we count how many points are outside of this band between the two dashed lines,
625
01:27:01,000 --> 01:27:10,000
we find it's eight points. So the error term is eight. That's a bad fit.
626
01:27:10,000 --> 01:27:15,000
This idea is known in literature as the RANSAC approach.
627
01:27:15,000 --> 01:27:21,000
RANSAC stands for random sample consensus. So why random sample consensus?
628
01:27:21,000 --> 01:27:27,000
Because the algorithm that implements this idea uses some randomization.
629
01:27:27,000 --> 01:27:33,000
It randomly picks two points from the set of edge pixels.
630
01:27:33,000 --> 01:27:38,000
Then it creates a line that passes through these two points.
631
01:27:38,000 --> 01:27:43,000
Afterwards it checks how many outliers we have for that line.
632
01:27:43,000 --> 01:27:49,000
And this process is repeated again and again. And since we always pick randomly two points,
633
01:27:49,000 --> 01:27:54,000
the solutions that we get in each trial will be different.
634
01:27:54,000 --> 01:28:03,000
And at the end we check which of the trials yielded the best solution, the smallest number of outliers.
635
01:28:03,000 --> 01:28:06,000
And that's the solution of the RANSAC algorithm.
636
01:28:06,000 --> 01:28:12,000
Of course, if we like, we can add one additional step.
637
01:28:12,000 --> 01:28:17,000
Once we have found the best solution, we know which points are outliers and which points are inliers.
638
01:28:17,000 --> 01:28:30,000
Based on that, we could do a total least squares fit only on the inliers to obtain an even better solution.
639
01:28:30,000 --> 01:28:36,000
Here the procedure is shown as a kind of processing flowchart.
640
01:28:36,000 --> 01:28:45,000
So we start in the first trial, pick randomly two points, estimate the line parameters, evaluate the error and obtain the number of outliers.
641
01:28:45,000 --> 01:28:51,000
Then we repeat that in a second trial, independent of the first trial and so on and so on.
642
01:28:51,000 --> 01:28:53,000
Up to the K-th trial.
643
01:28:53,000 --> 01:29:00,000
Usually based on an assumption of how many outliers we typically expect to have,
644
01:29:00,000 --> 01:29:15,000
we determine the number of trials in such a way that the probability of randomly picking two outliers all the time in all the trials is so small that we can neglect it.
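One standard way to make this precise (not spelled out here): if w is the expected fraction of inliers, one trial picks two inliers with probability w^2, and all K trials fail with probability (1 - w^2)^K. Choosing

$$K \;\ge\; \frac{\log(1-p)}{\log(1-w^2)}$$

ensures that with probability at least p, say p = 0.99, at least one trial uses two inliers.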
645
01:29:15,000 --> 01:29:26,000
After we have done that, we pick the trial with the smallest number of outliers and that yields the line parameters.
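Here is a minimal sketch of this procedure in Python with NumPy; the function name and interface are my own choice, and the line through the two picked points is represented in the normal form used throughout this chapter:

```python
import numpy as np

def ransac_line(points, theta, num_trials, seed=None):
    """RANSAC line fit: repeatedly fit a line through two randomly picked
    points and keep the trial with the fewest outliers.

    points: (N, 2) array; theta: acceptance threshold.
    Returns (n, c) of the best line {x : <n, x> + c = 0}.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    best_line, fewest_outliers = None, np.inf

    for _ in range(num_trials):
        # Randomly pick two distinct points and build the line through them.
        i, j = rng.choice(len(points), size=2, replace=False)
        direction = points[j] - points[i]
        length = np.linalg.norm(direction)
        if length == 0.0:
            continue                      # degenerate pick, skip this trial
        n = np.array([-direction[1], direction[0]]) / length  # unit normal
        c = -np.dot(n, points[i])

        # Count the points farther away from the line than theta.
        outliers = np.count_nonzero(np.abs(points @ n + c) > theta)
        if outliers < fewest_outliers:
            fewest_outliers, best_line = outliers, (n, c)

    return best_line
```

As mentioned above, one could append a final total least squares fit on the inliers of the winning trial to refine the result.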
646
01:29:26,000 --> 01:29:31,000
Here is an example again, a sequence of points.
647
01:29:31,000 --> 01:29:40,000
In the first trial, we might pick the two red points and fit a line to them and then check how many outliers do we get.
648
01:29:40,000 --> 01:29:45,000
In this case, for that choice of theta, we get six outliers.
649
01:29:45,000 --> 01:29:54,000
So six points lie outside of this band that is bounded by the dashed lines.
650
01:29:54,000 --> 01:30:03,000
Independent of this first choice, we repeat the process with a second trial, and in the second trial we randomly pick another two points.
651
01:30:03,000 --> 01:30:09,000
So these might be the points which are now shown in red.
652
01:30:09,000 --> 01:30:16,000
Again, we connect these two points, fit a line to them and count the number of outliers.
653
01:30:16,000 --> 01:30:26,000
In that case, we find seven points outside of the band which is bounded by the two dashed lines.
654
01:30:26,000 --> 01:30:36,000
In a third trial, we might pick these two points shown in red, estimate a line and count the outliers equal to three.
655
01:30:36,000 --> 01:30:44,000
And in a fourth trial, we might pick those two points, where unfortunately one of them is an outlier.
656
01:30:44,000 --> 01:30:50,000
We again estimate the line that connects both, count the number of outliers and get ten.
657
01:30:50,000 --> 01:31:00,000
So at the end, we select the trial with the smallest number of outliers and we find, yeah, it was the trial number three.
658
01:31:00,000 --> 01:31:11,000
That means that becomes the overall solution of the RANSAC estimate.
659
01:31:11,000 --> 01:31:18,000
Let us compare the properties of the two estimators for robust estimation, the M estimator and the RANSAC.
660
01:31:18,000 --> 01:31:29,000
The idea in the M estimator is to use a loss function that puts less influence on points that are far away on the outliers.
661
01:31:29,000 --> 01:31:38,000
So the outliers still have an influence on the solution, but their influence is smaller than in the sum of squares approach.
662
01:31:38,000 --> 01:31:42,000
In RANSAC, we completely ignore the outliers.
663
01:31:42,000 --> 01:31:47,000
We just consider inliers and ignore the outliers completely.
664
01:31:47,000 --> 01:31:52,000
So it doesn't matter how far an outlier is away from the line.
665
01:31:52,000 --> 01:32:00,000
The influence does not depend on that. The influence is just zero.
666
01:32:00,000 --> 01:32:10,000
The parameters which we need to determine to use these methods are: for the M estimator, we have to choose the error term or the loss function
667
01:32:10,000 --> 01:32:14,000
and maybe the width parameter of the loss function.
668
01:32:14,000 --> 01:32:25,000
For RANSAC, we need to choose an acceptance threshold and we need to determine the number of trials for which we run RANSAC.
669
01:32:25,000 --> 01:32:39,000
And which algorithm do we use? For the M estimator it becomes a numerical optimizer that minimizes a kind of weighted least squares.
670
01:32:39,000 --> 01:32:49,000
So the implementation of the M estimator is traced back to the total least squares method, but with some additional weights for all the points.
671
01:32:49,000 --> 01:33:01,000
And that yields a numeric algorithm that calculates the solution in an iterative manner.
672
01:33:01,000 --> 01:33:08,000
For RANSAC, we make repeated guesses from pairs of points and then check the number of outliers.
673
01:33:08,000 --> 01:33:24,000
So we have implemented, together with Ahivi, a demo tool that allows you to play around with these methods and to check how the solutions change.
674
01:33:24,000 --> 01:33:31,000
Unfortunately, in this slideshow, it is hard to integrate this tool.
675
01:33:31,000 --> 01:33:47,000
Therefore, I will create an extra video with just this demo tool and put it on ILIAS. So please check this extra video with the demo tool.
676
01:33:47,000 --> 01:33:56,000
Now we have seen how we can fit lines and how we can determine polylines from edge images.
677
01:33:56,000 --> 01:34:01,000
However, lines are not the only structures that are relevant in practice.
678
01:34:01,000 --> 01:34:08,000
Often we want to fit circles or pieces of circles and ellipses to structures.
679
01:34:08,000 --> 01:34:15,000
So the question is how can we fit those structures to edge lists?
680
01:34:15,000 --> 01:34:22,000
Well, have a look at those on the subsequent slides.
681
01:34:22,000 --> 01:34:28,000
First of all, we need to know how we can determine or how can we represent circles and ellipses.
682
01:34:28,000 --> 01:34:35,000
So for lines we already know that there is a normal form which is very suitable for the estimation task.
683
01:34:35,000 --> 01:34:39,000
For circles, the representation must be different.
684
01:34:39,000 --> 01:34:46,000
In a classical way, a circle is represented as a set of points that all have the same distance from a center.
685
01:34:46,000 --> 01:34:53,000
So what we need to know is the center point and the radius of the circle.
686
01:34:53,000 --> 01:34:56,000
So this classical representation would look like that.
687
01:34:56,000 --> 01:35:08,000
So if we want to consider all points x, y, which are on a certain circle, which is defined by its midpoint, m1, m2, and its radius r,
688
01:35:08,000 --> 01:35:20,000
then we would say the set of all points for which x minus m1 squared plus y minus m2 squared minus r squared equals 0 describes all those points.
689
01:35:20,000 --> 01:35:32,000
Or in other words, the set of all points x, y, which have a Euclidean distance from the center equal to r, describes that circle.
690
01:35:32,000 --> 01:35:41,000
If we want to determine the distance of a point from the circle, what do we do?
691
01:35:41,000 --> 01:35:51,000
Say we want to consider this point vector x and calculate the smallest distance of that point from the circle.
692
01:35:51,000 --> 01:36:00,000
What we do is we calculate the distance of that point from the center point of the circle and subtract the radius.
693
01:36:00,000 --> 01:36:14,000
That means the Euclidean distance of a point x, y from the circle becomes the square root of x minus m1 squared plus y minus m2 squared minus r.
694
01:36:14,000 --> 01:36:20,000
This is a signed number, of course, so the sign tells us whether we are inside or outside of the circle.
695
01:36:20,000 --> 01:36:26,000
If we are not interested in the sign itself, we take the absolute value of this number.
696
01:36:26,000 --> 01:36:38,000
So the first term with the square root is just the Euclidean distance between the point x and the center point, and this distance is compared with the radius of the circle.
697
01:36:38,000 --> 01:36:44,000
This is known as the Euclidean distance of a point from the circle.
698
01:36:44,000 --> 01:36:52,000
However, in practice it is useful to consider also another kind of distance measure,
699
01:36:52,000 --> 01:37:01,000
which is not a Euclidean distance for sure, but a related distance, which is also known as the algebraic distance.
700
01:37:01,000 --> 01:37:12,000
In that distance we do not calculate the Euclidean distance between x and m, between the point x and the center point, but the squared Euclidean distance,
701
01:37:12,000 --> 01:37:16,000
and subtract the squared radius.
702
01:37:16,000 --> 01:37:22,000
Of course, for all points on the circle this distance also is equal to zero.
703
01:37:22,000 --> 01:37:33,000
And for all the points which are not exactly on the circle, this distance grows with increasing distance from the circle.
704
01:37:33,000 --> 01:37:34,000
That makes sense.
705
01:37:34,000 --> 01:37:38,000
But indeed it is not the same as the Euclidean distance.
706
01:37:38,000 --> 01:37:49,000
But the advantage is that we get rid of the square root, and square roots are hard to deal with in efficient calculations.
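Side by side, the two measures look like this (a small sketch; variable names are my own):

```python
import numpy as np

def euclidean_circle_distance(x, y, m1, m2, r):
    # Signed Euclidean distance: distance from the center minus the radius.
    return np.sqrt((x - m1)**2 + (y - m2)**2) - r

def algebraic_circle_distance(x, y, m1, m2, r):
    # Squared distance from the center minus the squared radius; no square root.
    return (x - m1)**2 + (y - m2)**2 - r**2
```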
707
01:37:49,000 --> 01:38:04,000
If we compare these two distance measurements, we find that the distances that are calculated for certain points look like that.
708
01:38:04,000 --> 01:38:13,000
Here on the plot we see on the horizontal axis the distance or the position of a real point.
709
01:38:13,000 --> 01:38:19,000
Zero would mean the point is located in the center, at the center point of the circle.
710
01:38:19,000 --> 01:38:25,000
R means it is located on the circle, so at a distance of R from the center.
711
01:38:25,000 --> 01:38:31,000
And values larger than R mean the point is outside of the circle.
712
01:38:31,000 --> 01:38:42,000
And the vertical axis, or the two vertical axes, better to say, represent the distance measure that is given by either the Euclidean distance in blue,
713
01:38:42,000 --> 01:38:45,000
or the algebraic distance in red.
714
01:38:45,000 --> 01:38:53,000
For the Euclidean distance we obtain a result that is intuitive and matches what we expect.
715
01:38:53,000 --> 01:39:03,000
So the error increases in a linear way: the farther away from the circle we get, the larger the Euclidean distance is.
716
01:39:03,000 --> 01:39:14,000
Compared to that, the algebraic distance increases faster for points outside of the circle and increases not that fast for points inside of the circle.
717
01:39:14,000 --> 01:39:21,000
However, for points which are close to the circle, both distances are very similar.
718
01:39:21,000 --> 01:39:30,000
So both distance measures do not deviate considerably as long as we are close to the circle itself.
719
01:39:30,000 --> 01:39:39,000
But for points which are far away from the circle, both distances differ considerably.
720
01:39:39,000 --> 01:39:57,000
Now let's use again some method to estimate the circle parameters from a set of points, from a set of noisy points which are scattered around a circle.
721
01:39:57,000 --> 01:40:06,000
So here we find the points in black, and as we can see they deviate rather strongly from the real circle.
722
01:40:06,000 --> 01:40:11,000
So the impact of the algebraic distance here becomes apparent.
723
01:40:11,000 --> 01:40:18,000
And if we start to minimize the sum of squared distances of all the points,
724
01:40:18,000 --> 01:40:25,000
in order to obtain the parameters of the circle, we get two different solutions.
725
01:40:25,000 --> 01:40:38,000
If we use the Euclidean distance, we get a radius that is smaller than what we get when we use the algebraic distance to solve the minimization task.
726
01:40:38,000 --> 01:40:49,000
Of course, this is what we expect, because in the algebraic distance points that are outside of the circle have more impact on the solution,
727
01:40:49,000 --> 01:40:57,000
because the algebraic distance grows more rapidly for points outside of the circle than for points inside of the circle.
728
01:40:57,000 --> 01:41:01,000
And then we get a deviation like that, as we can see it here.
729
01:41:01,000 --> 01:41:09,000
However, if the points are not scattered around in such a strong manner as we can see it here in the example,
730
01:41:09,000 --> 01:41:18,000
but if the points are just scattered a little bit around the real circle, then the influence of the algebraic distance remains rather small,
731
01:41:18,000 --> 01:41:22,000
and the results are almost the same.
732
01:41:22,000 --> 01:41:31,000
That means if the points are spread a lot, then it is better to use the Euclidean distance, although the computation is more difficult.
733
01:41:31,000 --> 01:41:40,000
If the points are not spread very much around the circle, then the algebraic distance is very suitable.
734
01:41:40,000 --> 01:41:46,000
And the advantage of the algebraic distance is that the computation of the circle is much easier.
735
01:41:46,000 --> 01:41:58,000
So, if we want to minimize the squared Euclidean distance for a circle, then we cannot solve that analytically.
736
01:41:58,000 --> 01:42:03,000
We do not find a closed-form solution, but we need a numerical optimization.
737
01:42:03,000 --> 01:42:10,000
That means an iterative approach that needs several iterations to find the solution.
738
01:42:10,000 --> 01:42:16,000
For the algebraic distance, that is not the case, so we find a closed-form solution.
739
01:42:16,000 --> 01:42:20,000
For that purpose, we rewrite the algebraic distance a little bit.
740
01:42:20,000 --> 01:42:26,000
Instead of writing x minus m1 squared plus y minus m2 squared minus r squared,
741
01:42:26,000 --> 01:42:32,000
we can multiply these brackets and then rearrange all the terms.
742
01:42:32,000 --> 01:42:43,000
Then we get an equal representation: first, terms that do not depend on the parameters of the circle itself, that is x squared plus y squared,
743
01:42:43,000 --> 01:42:53,000
then some terms which do not depend on x and y, this is m1 squared plus m2 squared minus r squared,
744
01:42:53,000 --> 01:43:03,000
then some terms that depend on x, which is minus 2m1 times x, and some terms that depend on y, which is minus 2m2 times y.
745
01:43:03,000 --> 01:43:19,000
Now, if we rename minus 2 times m1 as a, minus 2 times m2 as b, and m1 squared plus m2 squared minus r squared as c,
746
01:43:19,000 --> 01:43:27,000
then we get an algebraic representation that has a form ax plus by plus c plus x squared plus y squared.
747
01:43:27,000 --> 01:43:34,000
We find that there is a unique mapping between the old parameters of the circle m1, m2, r,
748
01:43:34,000 --> 01:43:39,000
and the new parameters with which we represent the circle a, b, and c.
749
01:43:39,000 --> 01:43:51,000
So a is just minus 2 times m1, b is equal to minus 2 times m2, and c is equal to m1 squared plus m2 squared minus r squared.
750
01:43:51,000 --> 01:44:05,000
So there is a unique mapping between these two parameter sets, so that we can represent a circle either using m1, m2, and r, or a, b, and c.
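The mapping in both directions is:

$$a = -2m_1, \qquad b = -2m_2, \qquad c = m_1^2 + m_2^2 - r^2$$
$$m_1 = -\frac{a}{2}, \qquad m_2 = -\frac{b}{2}, \qquad r = \sqrt{m_1^2 + m_2^2 - c}$$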
751
01:44:05,000 --> 01:44:18,000
So now what we need to do is we want to minimize the sum of the squared algebraic distance of all the points that we consider for the estimation of the circle.
752
01:44:18,000 --> 01:44:27,000
So we sum up over all the points, so i is the index for each point, and then we take the squared algebraic distance of that point.
753
01:44:27,000 --> 01:44:36,000
Now we can again calculate the partial derivatives of this term with respect to a, b, and c,
754
01:44:36,000 --> 01:44:43,000
zero those partial derivatives, rewrite everything and arrange it as a matrix vector multiplication.
755
01:44:43,000 --> 01:44:52,000
And after some steps of calculation, which are not really difficult, but which are a little bit too lengthy to show them here on the slide,
756
01:44:52,000 --> 01:45:04,000
we get as a result a matrix on the left hand side, a 3 by 3 matrix, times the vector of unknown parameters a, b, and c, equals some vector on the right hand side.
757
01:45:04,000 --> 01:45:14,000
And we see that the matrix on the left hand side and the vector on the right hand side are built out of the coordinates of the pixels that we want to consider.
758
01:45:14,000 --> 01:45:27,000
So that means once we know the pixels that we want to use to estimate the circle parameters, we can first create this matrix and this vector on the right hand side.
759
01:45:27,000 --> 01:45:37,000
Then we get a system of linear equations and as long as this matrix has full rank, which is typically fulfilled if we consider at least three points,
760
01:45:37,000 --> 01:45:48,000
then we can resolve the system of equations and get solutions for the unknown parameters a, b, and c.
761
01:45:48,000 --> 01:46:02,000
Once we have found a, b, and c in such a way, we can calculate m1, m2, and r in a straightforward manner, as it is shown here.
762
01:46:02,000 --> 01:46:15,000
So that means we find a closed form solution in terms of a linear system of equations to calculate the unknown parameters of the circle.
763
01:46:15,000 --> 01:46:21,000
And that is computationally rather efficient.
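Here is a minimal sketch of this closed-form circle fit in Python with NumPy. The normal equations are written out from the derivative conditions described above, so the matrix may be arranged differently on the slide; the function name is my own choice:

```python
import numpy as np

def fit_circle_algebraic(points):
    """Closed-form circle fit minimizing the sum of squared algebraic distances.

    points: (N, 2) array.  Returns center (m1, m2) and radius r.
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    z = x**2 + y**2

    # Normal equations: partial derivatives w.r.t. a, b, c set to zero.
    M = np.array([
        [np.sum(x * x), np.sum(x * y), np.sum(x)],
        [np.sum(x * y), np.sum(y * y), np.sum(y)],
        [np.sum(x),     np.sum(y),     float(len(pts))],
    ])
    rhs = -np.array([np.sum(x * z), np.sum(y * z), np.sum(z)])

    a, b, c = np.linalg.solve(M, rhs)  # requires full rank, i.e. at least 3 suitable points

    # Map the parameters a, b, c back to center and radius.
    m1, m2 = -a / 2.0, -b / 2.0
    r = np.sqrt(m1**2 + m2**2 - c)
    return m1, m2, r
```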
764
01:46:21,000 --> 01:46:26,000
Let's have a look at how this kind of circle estimation works in practice.
765
01:46:26,000 --> 01:46:37,000
And for that purpose, let's consider some work that has been done in our group many years ago by Christoph Speck, a former PhD student.
766
01:46:37,000 --> 01:46:47,000
He actually was working in the field of forensic computer vision and one of his tasks was to analyze bullet casings.
767
01:46:47,000 --> 01:47:03,000
So bullet casings are say parts of bullets of guns and once you shoot with a gun, the bullet cases get some typical structures impressed by the gun.
768
01:47:03,000 --> 01:47:11,000
So that means if someone was murdered and police comes, they are typically interested in analyzing these bullet casings
769
01:47:11,000 --> 01:47:24,000
because the structures that you find on the bullet cases give some hint or give some cues on the gun with which the bullet casing was shot.
770
01:47:24,000 --> 01:47:34,000
That means once you find a gun, you can use that gun in an experimental shooting and then collect the bullet casings there as well
771
01:47:34,000 --> 01:47:38,000
and then compare whether the structures that you find are the same.
772
01:47:38,000 --> 01:47:49,000
And then you know whether the bullet was shot from that gun that you found or whether that gun was not the one that was used for shooting.
773
01:47:49,000 --> 01:47:58,000
So in this case, to give you an idea, mainly those structures here in the center matter.
774
01:47:58,000 --> 01:48:04,000
The structures which you see outside here are not that relevant, but the structures inside.
775
01:48:04,000 --> 01:48:10,000
Because they are created by the gun when the bullet is shot.
776
01:48:10,000 --> 01:48:15,000
And as you can see, we find a lot of circular structures here.
777
01:48:15,000 --> 01:48:27,000
And if you have a suitable microscope or a camera with suitable magnification, we get pictures like that.
778
01:48:27,000 --> 01:48:34,000
And one of the tasks to analyze these structures is to estimate the circles.
779
01:48:34,000 --> 01:48:45,000
On the right hand side, we get the circles that are estimated using first some kind of Hough transform for circles to get a basic idea where circles are located
780
01:48:45,000 --> 01:48:52,000
and afterwards use some circle fitting with the algebraic distance to get the final fit.
781
01:48:52,000 --> 01:48:58,000
And we clearly see that those circles are nicely fit by this method.
782
01:48:58,000 --> 01:49:02,000
Now let's extend our findings to ellipses.
783
01:49:02,000 --> 01:49:05,000
Ellipses can be seen as generalized circles.
784
01:49:05,000 --> 01:49:08,000
They also have a center like circles.
785
01:49:08,000 --> 01:49:15,000
They have radii, but ellipses have two radii, two different radii, not just one like circles.
786
01:49:15,000 --> 01:49:19,000
And they have a turning angle that describes the orientation of the ellipse.
787
01:49:19,000 --> 01:49:25,000
The figure on the right hand side illustrates these properties of ellipses.
788
01:49:25,000 --> 01:49:29,000
So here we have the center point of the ellipse.
789
01:49:29,000 --> 01:49:31,000
Then we have two main axes.
790
01:49:31,000 --> 01:49:41,000
The axis with a larger radius R1 and the orthogonal axis with a smaller radius R2.
791
01:49:41,000 --> 01:49:52,000
Furthermore, these two axes do not necessarily have the same orientation as the coordinate axes of that plane,
792
01:49:52,000 --> 01:49:55,000
but they might be turned with respect to those coordinate axes.
793
01:49:55,000 --> 01:50:07,000
And this turning between the coordinate axis and the main axis of the ellipse is given by this turning angle theta.
794
01:50:07,000 --> 01:50:14,000
That means in total we need five parameters R1, R2, the two parameters of the vector M,
795
01:50:14,000 --> 01:50:20,000
and the angle theta to describe an ellipse.
796
01:50:20,000 --> 01:50:28,000
However, in practice it is sometimes convenient to use other parameterizations of ellipses.
797
01:50:28,000 --> 01:50:36,000
So as we have seen it for circles, where the usage of the radius and the center is just one way to represent a circle,
798
01:50:36,000 --> 01:50:40,000
there are also other ways to represent an ellipse.
799
01:50:40,000 --> 01:50:47,000
And the general way to do it is to use six parameters A, H, B, G, F and C.
800
01:50:47,000 --> 01:51:00,000
And to state that all points that meet this equation A times x squared plus H times xy plus B times y squared plus G times x plus F times y plus C equals 0
801
01:51:00,000 --> 01:51:05,000
are those points which are located on the ellipse.
802
01:51:05,000 --> 01:51:12,000
We need one additional constraint, namely that 4 times AB minus H squared is greater than 0.
803
01:51:12,000 --> 01:51:17,000
Only if this condition is met we are really facing an ellipse.
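Written out, this representation is the set of all points (x, y) with

$$A x^2 + H\,xy + B y^2 + G x + F y + C = 0, \qquad 4AB - H^2 > 0.$$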
804
01:51:17,000 --> 01:51:29,000
If this constraint is not met, then we might be faced with for instance a hyperbola, which we are not interested in here.
805
01:51:29,000 --> 01:51:34,000
So let's again stick to this representation with six parameters.
806
01:51:34,000 --> 01:51:40,000
However, we have seen already that five parameters are actually enough to represent an ellipse.
807
01:51:40,000 --> 01:51:44,000
But in this modified representation we have six parameters.
808
01:51:44,000 --> 01:51:50,000
So there is one degree of freedom in the choice of parameters.
809
01:51:50,000 --> 01:51:55,000
That means if we find a suitable parameter set A, H, B, G, F, C,
810
01:51:55,000 --> 01:52:00,000
we could multiply all these parameters with the same positive constant
811
01:52:00,000 --> 01:52:09,000
and we would still get a representation of the same ellipse, just with a different parameter set.
812
01:52:09,000 --> 01:52:15,000
So one ellipse has several representations of this form.
813
01:52:15,000 --> 01:52:20,000
This is not really nice if we want to find the best parameter set,
814
01:52:20,000 --> 01:52:33,000
because it introduces some problems in the optimization algorithm that should find the best fit.
815
01:52:33,000 --> 01:52:40,000
Therefore we want to somehow restrict these parameters in such a way
816
01:52:40,000 --> 01:52:47,000
that we get rid of this additional degree of freedom.
817
01:52:47,000 --> 01:52:50,000
There are several possibilities how we could do that.
818
01:52:50,000 --> 01:52:57,000
One way would be to state that A must be equal to 1, or that A plus B must be equal to 1,
819
01:52:57,000 --> 01:53:02,000
or that the Euclidean length of the parameter vector is equal to 1.
820
01:53:02,000 --> 01:53:08,000
In some papers you even find the possibility to set C equal to 1.
821
01:53:08,000 --> 01:53:13,000
However, the last version is a little bit tricky,
822
01:53:13,000 --> 01:53:17,000
because that solution is not invariant with respect to translation.
823
01:53:17,000 --> 01:53:25,000
So there are some ellipses which cannot be represented with this choice of C.
824
01:53:25,000 --> 01:53:31,000
There are several algorithms to solve this problem, how we can fit an ellipse to a set of points.
825
01:53:31,000 --> 01:53:38,000
Meanwhile the most popular one is the approach of Fitzgibbon, Pilu and Fisher.
826
01:53:38,000 --> 01:53:44,000
I don't want to go into details, because the mathematical details are a little bit more difficult.
827
01:53:44,000 --> 01:53:47,000
It is based on a generalized eigenvalue problem,
828
01:53:47,000 --> 01:53:55,000
so it becomes a little bit more difficult than what we have seen in the line fit problem,
829
01:53:55,000 --> 01:54:00,000
where we had an ordinary eigenvalue problem; now we have a generalized eigenvalue problem,
830
01:54:00,000 --> 01:54:05,000
and that is then more difficult to solve numerically.
831
01:54:05,000 --> 01:54:10,000
I don't want to go into these details, but I just want to mention there is this approach,
832
01:54:10,000 --> 01:54:18,000
and of course you also find implementations of it in most software libraries for computer vision.
833
01:54:18,000 --> 01:54:22,000
So I recommend to use this method if you need it.
834
01:54:22,000 --> 01:54:29,000
It is based on the idea to use the constraint 4AB minus H squared equal to 1,
835
01:54:29,000 --> 01:54:40,000
to get rid of this degree of freedom, and based on that it is possible to get a nice mathematical modeling.
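To give an impression of the structure of the method, here is a compact sketch in Python with NumPy. It follows the published idea (design matrix, scatter matrix, the constraint 4AB - H^2 = 1 as a quadratic form, generalized eigenvalue problem) but omits the numerical safeguards of production implementations, so treat it as an illustration only; in practice one would rather call a library routine:

```python
import numpy as np

def fit_ellipse_direct(points):
    """Sketch of the direct ellipse fit of Fitzgibbon, Pilu and Fisher.

    points: (N, 2) array.  Returns the conic parameters (A, H, B, G, F, C)
    normalized so that 4AB - H^2 = 1.
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]

    # Design matrix: one row [x^2, xy, y^2, x, y, 1] per point.
    D = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    S = D.T @ D                      # scatter matrix

    # Constraint matrix: a^T Con a = 4AB - H^2.
    Con = np.zeros((6, 6))
    Con[0, 2] = Con[2, 0] = 2.0
    Con[1, 1] = -1.0

    # Generalized eigenproblem S a = lambda * Con a, rewritten via inv(S) @ Con;
    # exactly one eigenvalue is positive, and its eigenvector is the solution.
    eigenvalues, eigenvectors = np.linalg.eig(np.linalg.inv(S) @ Con)
    a = eigenvectors[:, np.argmax(eigenvalues.real)].real

    # Rescale so that the constraint 4AB - H^2 = 1 holds exactly.
    return a / np.sqrt(a @ Con @ a)
```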
836
01:54:40,000 --> 01:54:48,000
When we use this method on the example picture with a cup on the mousepad,
837
01:54:48,000 --> 01:54:53,000
and apply it to those structures which are really elliptical,
838
01:54:53,000 --> 01:54:57,000
we get the results that are shown on the picture at the bottom.
839
01:54:57,000 --> 01:55:07,000
So the colored ellipses are those that are fitted to the edge lists that we find.
840
01:55:07,000 --> 01:55:16,000
And if we for instance consider the violet, or the green, or also the cyan ellipse that we estimated,
841
01:55:16,000 --> 01:55:21,000
we can see that they really fit well to what we expect.
842
01:55:21,000 --> 01:55:30,000
For the red ellipse that's a little bit maybe unexpected, because it also adapts to non-elliptical parts of the cup.
843
01:55:30,000 --> 01:55:40,000
But still it's not really bad what we get.
844
01:55:40,000 --> 01:55:48,000
So that's all we want to discuss about line fitting, or about curve fitting in general.
845
01:55:48,000 --> 01:55:51,000
So let's summarize our findings.
846
01:55:51,000 --> 01:55:57,000
First we introduced the Hough transform as a method to extract lines from an image.
847
01:55:57,000 --> 01:56:06,000
The idea is to consider all the edge pixels, and for each edge pixel,
848
01:56:06,000 --> 01:56:11,000
create all the lines that pass through that edge pixel,
849
01:56:11,000 --> 01:56:24,000
and then use the parameter space and increment all the cells in this accumulator array in the parameter space,
850
01:56:24,000 --> 01:56:29,000
which represent lines that pass through this point.
851
01:56:29,000 --> 01:56:35,000
And afterwards we are searching for local maxima in this accumulator array,
852
01:56:35,000 --> 01:56:41,000
and those local maxima provide us the dominant lines in the image.
853
01:56:41,000 --> 01:56:44,000
So a very general approach.
854
01:56:44,000 --> 01:56:51,000
It does not use gradient information, it is somehow reasonable, it yields reasonable results.
855
01:56:51,000 --> 01:56:59,000
However, it is not considering any local information of a pixel and its vicinity.
856
01:56:59,000 --> 01:57:08,000
It's just considering that there are pixels, but it does not consider neighborhoods of pixels,
857
01:57:08,000 --> 01:57:15,000
it does not consider gradient orientation or something like that.
858
01:57:15,000 --> 01:57:23,000
The other way was to collect edge pixels in a list of edge pixels,
859
01:57:23,000 --> 01:57:31,000
using the neighborhood and using the gradient information to collect edge pixels, which are somehow belonging to the same edge.
860
01:57:31,000 --> 01:57:37,000
First doing some edge following and then applying the Ramer-Douglas-Peucker algorithm
861
01:57:37,000 --> 01:57:41,000
to split up those edge element lists into smaller pieces,
862
01:57:41,000 --> 01:57:49,000
where we can assume that each of the smaller sub-lists can be represented by an individual line.
863
01:57:49,000 --> 01:57:56,000
And then we were discussing line estimation techniques, starting with the total least squares approach,
864
01:57:56,000 --> 01:58:01,000
that just minimizes the sum of squared distances of points from the line.
865
01:58:01,000 --> 01:58:07,000
This yields somehow very good results, but it suffers from outliers.
866
01:58:07,000 --> 01:58:15,000
So if we are faced with outliers in the data, then this approach does not perform as well as we would expect.
867
01:58:15,000 --> 01:58:19,000
To avoid these problems with outliers, we got to know M-estimators,
868
01:58:19,000 --> 01:58:28,000
which are somehow changing the error term that is used, the loss function, to reduce the influence of outliers.
869
01:58:28,000 --> 01:58:36,000
And we discussed RANSAC, which just counts outliers and otherwise ignores them completely,
870
01:58:36,000 --> 01:58:41,000
but does not distinguish between an outlier that is far away and that is very far away,
871
01:58:41,000 --> 01:58:47,000
but only considers there is an outlier or there isn't.
872
01:58:47,000 --> 01:58:53,000
Finally, we discussed the case of circles and ellipses.
873
01:58:53,000 --> 01:59:02,000
We have introduced several ways to represent circles and ellipses as parametric curves,
874
01:59:02,000 --> 01:59:11,000
and we have then derived two different distance measures, the Euclidean distance and the algebraic distance.
875
01:59:11,000 --> 01:59:18,000
We have found for circles that it is easier, computationally easier, to deal with the algebraic distance,
876
01:59:18,000 --> 01:59:23,000
but the results are more accurate if we would use the Euclidean distance.
877
01:59:23,000 --> 01:59:30,000
But at least if the points that we consider for circle estimation are not scattered too much,
878
01:59:30,000 --> 01:59:34,000
as long as they are still very close to the circle,
879
01:59:34,000 --> 01:59:41,000
the algebraic distance approximates the Euclidean distance sufficiently well.
880
01:59:41,000 --> 01:59:52,000
If the points are scattered very much, then the algebraic distance overestimates the radius of the circle that we get.
881
01:59:52,000 --> 02:00:04,000
For ellipses, we just introduced or mentioned the algorithm with which we can estimate the ellipses without going into details.