1
00:00:00,000 --> 00:00:09,000
Welcome back to the lecture Machine Vision. Today we want to discuss chapter number four on curve fitting.
2
00:00:09,000 --> 00:00:12,000
What does that mean?
3
00:00:12,000 --> 00:00:21,000
Let us assume we make pictures like that. For instance in a factory we want to control the output of a machine.
4
00:00:21,000 --> 00:00:25,000
So we make pictures of the objects that are produced.
5
00:00:25,000 --> 00:00:33,000
Based on that we do some edge detection to determine the boundaries of the object.
6
00:00:33,000 --> 00:00:37,000
As a result we get an image as the one shown on the right hand side.
7
00:00:37,000 --> 00:00:42,000
An image that is black for all pixels except for the edge pixels.
8
00:00:42,000 --> 00:00:47,000
Now we can represent this object as a set of edge pixels.
9
00:00:47,000 --> 00:00:57,000
However for further processing doing all computations on the basis of individual pixels might be relatively slow.
10
00:00:57,000 --> 00:01:03,000
And it might not allow us to do some higher-level reasoning on the shape of the object.
11
00:01:03,000 --> 00:01:12,000
For that reason we might like to represent this object not just as a set of pixels but as a geometric shape.
12
00:01:12,000 --> 00:01:21,000
For instance as a polygon, and represent the polygon just by the coordinates of its vertices.
13
00:01:21,000 --> 00:01:38,000
So we could reduce the amount of memory that is required for representing this shape to just six coordinates rather than hundreds or even thousands of pixels.
14
00:01:38,000 --> 00:01:45,000
Now we want to discuss how we can get from this edge image to the geometric representation.
15
00:01:45,000 --> 00:01:56,000
For that we want to start with the simplest elements that we can use namely lines and line segments.
16
00:01:56,000 --> 00:02:04,000
But before we do that let's repeat a little bit how we can represent two dimensional geometric objects.
17
00:02:04,000 --> 00:02:11,000
So this should be just a repetition of what you should already know from your lectures in mathematics.
18
00:02:11,000 --> 00:02:16,000
The first thing that we need is a dot product.
19
00:02:16,000 --> 00:02:26,000
The dot product of two vectors say p and q is defined as a sum over the pairwise products of the elements of the two vectors.
20
00:02:26,000 --> 00:02:37,000
So if p and q are two dimensional vectors and the elements of p are p1 and p2 and the elements of q are q1 and q2,
21
00:02:37,000 --> 00:02:43,000
then the dot product is just p1 times q1 plus p2 times q2.
22
00:02:43,000 --> 00:02:52,000
Within this lecture I will always use the notation with the special brackets that you find here for the dot product.
23
00:02:52,000 --> 00:02:57,000
I could also write p transpose times q, which leads to the same result,
24
00:02:57,000 --> 00:03:06,000
but to make clear that we are faced with a dot product I use this special notation with the brackets.
25
00:03:06,000 --> 00:03:13,000
The dot product has some mathematical properties. For instance it is linear in both of its arguments.
26
00:03:13,000 --> 00:03:23,000
That means if we apply a linear combination of vectors p and r with prefactors alpha and beta as the first argument,
27
00:03:23,000 --> 00:03:34,000
and another linear combination of vectors q and s with scalar prefactors gamma and delta as the second argument of the dot product,
28
00:03:34,000 --> 00:03:39,000
then we can multiply out the dot product and we get four terms.
29
00:03:39,000 --> 00:03:50,000
Each of those terms contains just a pairwise dot product of two of the basic vectors that are used in the linear combinations
30
00:03:50,000 --> 00:03:57,000
and all the prefactors are multiplied out and put in front of the dot products.
31
00:03:57,000 --> 00:04:11,000
Another important property which leads to a geometric interpretation of the dot product is that the dot product of p and q is equal to
32
00:04:11,000 --> 00:04:22,000
the Euclidean length of the vector p times the Euclidean length of vector q times the cosine of the angle between vector p and q.
33
00:04:22,000 --> 00:04:30,000
That means there is a relationship between the angle between two vectors and their dot product.
34
00:04:30,000 --> 00:04:40,000
And from that it follows that the dot product of a vector with itself is nothing else than the squared length of the vector
35
00:04:40,000 --> 00:04:49,000
because the cosine of zero degrees is one, and that the dot product of two vectors which are orthogonal is always zero
36
00:04:49,000 --> 00:04:56,000
because the cosine of 90 degrees is zero.
37
00:04:56,000 --> 00:05:04,000
So that means if you have two vectors p and q, and p is a unit vector, that means it has length one.
38
00:05:04,000 --> 00:05:11,000
And we calculate the dot product of p and q and multiply it with this unit vector p.
39
00:05:11,000 --> 00:05:26,000
What happens? Well, what we get is a kind of projection of the vector q onto the direction that is given by vector p.
40
00:05:26,000 --> 00:05:36,000
So the vector given by the dot product of p and q, times p, is drawn as the red vector in the diagram.
41
00:05:36,000 --> 00:05:46,000
So the dot product of p and q provides the length of this vector because the vector p has unit length.
42
00:05:46,000 --> 00:05:55,000
So we can see that the dot product can be used to calculate some orthogonal projection of a vector onto a certain direction.
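A minimal sketch of this projection in Python, assuming numpy; the function name is purely illustrative:

```python
import numpy as np

def project_onto_unit(p, q):
    """Orthogonal projection of q onto the direction of the unit vector p."""
    # <p, q> is the signed length of the projection; multiplying by p
    # turns it back into a vector along p (the red vector in the diagram).
    return np.dot(p, q) * p

p = np.array([1.0, 0.0])          # a unit vector
q = np.array([3.0, 4.0])
print(project_onto_unit(p, q))    # [3. 0.]
```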
43
00:05:55,000 --> 00:06:02,000
Based on this we can define lines and line segments.
44
00:06:02,000 --> 00:06:12,000
So the first definition of a line segment with two end points p and q is given as a set of points x
45
00:06:12,000 --> 00:06:18,000
which can be composed as one minus tau times p plus tau times q.
46
00:06:18,000 --> 00:06:22,000
where tau is a real number between zero and one.
47
00:06:22,000 --> 00:06:36,000
That means the definition of a line segment is given by a set of points and all these points are represented in this way.
48
00:06:36,000 --> 00:06:44,000
We can extend that line segment to a full line, that means a line that has an infinite extension,
49
00:06:44,000 --> 00:06:53,000
by relaxing the values of tau to the real numbers, so we do not restrict them anymore to the interval between zero and one,
50
00:06:53,000 --> 00:07:00,000
but to all real numbers. And then we get a representation of the full straight line.
51
00:07:00,000 --> 00:07:06,000
For all values of tau between zero and one we get the line segment between p and q.
52
00:07:06,000 --> 00:07:16,000
For all values of tau less than zero we get those elements of the line which are outside of the line segment between p and q,
53
00:07:16,000 --> 00:07:27,000
but closer to p than q. And for values of tau greater than one we get all those points which are outside of the line segment between p and q,
54
00:07:27,000 --> 00:07:35,000
but which are closer to q than to p.
55
00:07:35,000 --> 00:07:43,000
For some purposes it is useful to be able to calculate the perpendicular point of a certain point onto such a line.
56
00:07:43,000 --> 00:07:56,000
So we assume that a certain point r is given. This point might be part of the line, but it might also be not an element of the line.
57
00:07:56,000 --> 00:08:10,000
And we want to calculate that point on the line that is closest to r. And that point is the point where the angle between the vector that connects r and l
58
00:08:10,000 --> 00:08:21,000
and the vector that connects p and q is equal to 90°. That means this perpendicular point, say l, must be part of the line.
59
00:08:21,000 --> 00:08:31,000
So l must have a representation that is given by one minus tau times p plus tau times q for a certain value of tau.
60
00:08:31,000 --> 00:08:43,000
Furthermore, the vector from r to l should be orthogonal to the vector of p to q. That means the dot product of l minus r and p minus q must be equal to zero.
61
00:08:43,000 --> 00:08:55,000
Now we can substitute in the dot product the right hand side of the first equation. That means instead of l we write one minus tau times p plus tau times q.
62
00:08:55,000 --> 00:09:02,000
Then we multiply out the dot product and we get one equation that depends only on tau.
63
00:09:02,000 --> 00:09:16,000
We resolve this equation with respect to tau and get as result that tau is equal to the dot product of p minus r and p minus q over the dot product of p minus q and p minus q.
64
00:09:16,000 --> 00:09:29,000
So in the denominator you clearly see that we are faced with the squared Euclidean length of the vector between p and q.
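As a sketch of this formula, assuming numpy (the helper name is invented for the example), the perpendicular point can be computed directly:

```python
import numpy as np

def perpendicular_point(p, q, r):
    """Foot of the perpendicular from point r onto the line through p and q."""
    d = p - q
    # tau = <p - r, p - q> / <p - q, p - q>; the denominator is the
    # squared Euclidean length of the vector between p and q.
    tau = np.dot(p - r, d) / np.dot(d, d)
    return (1.0 - tau) * p + tau * q

p, q = np.array([0.0, 0.0]), np.array([4.0, 0.0])
r = np.array([1.0, 2.0])
print(perpendicular_point(p, q, r))   # [1. 0.]
```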
65
00:09:29,000 --> 00:09:41,000
Besides the representation of a line that we have derived on the slide before, there is a second way to represent a line in the two dimensional space, called the normal form.
66
00:09:41,000 --> 00:09:53,000
For that we need a vector n that is orthogonal to the line and has a length one. So it is an orthogonal unit vector.
67
00:09:53,000 --> 00:10:01,000
For that vector we know it is orthogonal, so the dot product of n and the vector q minus p is equal to zero.
68
00:10:01,000 --> 00:10:14,000
We observe that the dot product of n and an arbitrary point x on the line is equal to the dot product of n and (one minus tau) times p plus tau times q.
69
00:10:14,000 --> 00:10:22,000
So we just substituted x by the definition of such a point that we have derived on the slide before.
70
00:10:22,000 --> 00:10:32,000
Now we can multiply out the dot product and get the dot product of n and p plus tau times the dot product of n and q minus p.
71
00:10:32,000 --> 00:10:41,000
The dot product of n and q minus p is equal to zero, as we have seen, because n is orthogonal to that vector.
72
00:10:41,000 --> 00:10:45,000
So what remains is just the dot product of n and p.
73
00:10:45,000 --> 00:10:59,000
If we reformulate that, then we get that for each point x on the line it holds that the dot product of n and x minus the dot product of n and p is equal to zero.
74
00:10:59,000 --> 00:11:20,000
If we substitute minus the dot product of n and p by a new constant, say c, then we get the standard representation, normal form representation of a line, zero equals the dot product of n and x plus c.
75
00:11:20,000 --> 00:11:25,000
At the bottom right we see a sketch of the situation.
76
00:11:25,000 --> 00:11:33,000
We have the line, we have the two points p and q on the line that were used on the slide before to represent the line,
77
00:11:33,000 --> 00:11:44,000
and we have the unit normal vector n that points into a direction that is orthogonal to the line.
78
00:11:44,000 --> 00:11:53,000
The representation in normal form has some advantages over the representation that we have seen before,
79
00:11:53,000 --> 00:12:01,000
because it allows us to calculate the distance of an arbitrary point to the line in a very easy way.
80
00:12:01,000 --> 00:12:09,000
Let us assume we want to calculate the distance of a point capital R to that line.
81
00:12:09,000 --> 00:12:16,000
With the standard definition of a line, we first would have to calculate the perpendicular point L,
82
00:12:16,000 --> 00:12:27,000
and afterwards calculate the vector that connects L and R and evaluate the Euclidean length of this vector.
83
00:12:27,000 --> 00:12:37,000
With the normal form, we can simplify that and get as result the dot product of n and R.
84
00:12:37,000 --> 00:12:46,000
R is a vector that starts in the origin of the coordinate system and points to the point capital R.
85
00:12:46,000 --> 00:12:52,000
The dot product of n and R plus c.
86
00:12:52,000 --> 00:12:58,000
This is a signed number. The sign tells us whether we are on the left hand side or on the right hand side of the line.
87
00:12:58,000 --> 00:13:05,000
If we are just interested in the distance, we just take the absolute value of this number.
88
00:13:05,000 --> 00:13:09,000
How can we derive that from the plot on the right hand side?
89
00:13:09,000 --> 00:13:28,000
On the right hand side we see the point capital R and the point capital L, which is the perpendicular point to R.
90
00:13:28,000 --> 00:13:34,000
We also see the lowercase vector r that connects the origin of the coordinate system with the point capital R.
91
00:13:34,000 --> 00:13:38,000
We see the vector n, the normal vector.
92
00:13:38,000 --> 00:13:44,000
We see the line that starts in the origin and follows the direction of n.
93
00:13:44,000 --> 00:14:01,000
We know that calculating the dot product of a vector and a unit vector is the same as projecting that vector onto the direction that is given by the unit vector.
94
00:14:01,000 --> 00:14:16,000
So calculating the dot product of n and the vector R is nothing else than calculating the length between the origin of the coordinate system and the point R'.
95
00:14:16,000 --> 00:14:31,000
Furthermore, we know that we are interested in the distance between the points R and L, which is exactly the same as the distance between the points R' and L'.
96
00:14:31,000 --> 00:14:40,000
So what we need is to know how large is the distance between the origin and the point L'.
97
00:14:40,000 --> 00:14:53,000
However, we know that L' is an element of the line. So we know that it has exactly the same orthogonal distance as the point P or the point Q.
98
00:14:53,000 --> 00:15:08,000
That means if we project the point P or Q or L or L' or any other point of the line onto the direction that is given by n, we will always get the same number, the same length.
99
00:15:08,000 --> 00:15:21,000
So this is the same as the dot product of n and P or the dot product of n and L or the dot product of n and Q.
100
00:15:21,000 --> 00:15:31,000
And we know that this is already -c. And therefore we can derive the formula that is shown here.
101
00:15:31,000 --> 00:15:40,000
Furthermore, you might sometimes be interested in getting an arbitrary point on the line if you know the line just represented in normal form.
102
00:15:40,000 --> 00:15:53,000
How do we get that? Well, what could we do? We could pick an arbitrary point - doesn't matter where - in space and then project that point onto the line.
103
00:15:53,000 --> 00:16:01,000
The simplest way is to take the origin as such a point and project the origin onto the line.
104
00:16:01,000 --> 00:16:11,000
That means we are searching a point that has a representation tau times n, where tau might be a real number.
105
00:16:11,000 --> 00:16:23,000
And we are searching for such a value of tau that the dot product of tau times n and n is equal to -c.
106
00:16:23,000 --> 00:16:32,000
And when we resolve this equation we get that the value of tau needs to be chosen as -c.
107
00:16:32,000 --> 00:16:41,000
This yields then one point, namely the point -c times n. And we know that this is an element of the line.
108
00:16:41,000 --> 00:16:55,000
And if we look more carefully at the drawing, we will find that this point -c times n is exactly the point L' in the drawing.
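A small sketch of these two normal-form computations, assuming numpy; the line y = 2 and all names are examples chosen only for illustration:

```python
import numpy as np

def signed_distance(n, c, r):
    """Signed distance of point r from the line <n, x> + c = 0, n a unit vector."""
    # The sign tells us on which side of the line r lies;
    # abs() gives the plain distance.
    return np.dot(n, r) + c

def point_on_line(n, c):
    """Projection of the origin onto the line: the point -c * n."""
    return -c * n

n = np.array([0.0, 1.0])   # unit normal of the horizontal line y = 2,
c = -2.0                   # written in normal form as <n, x> - 2 = 0
print(signed_distance(n, c, np.array([5.0, 7.0])))   # 5.0
print(point_on_line(n, c))                           # [0. 2.]
```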
109
00:16:55,000 --> 00:17:03,000
To finalize this brief repetition of the basics of two-dimensional geometry, let us have a look at vectors.
110
00:17:03,000 --> 00:17:13,000
If we are faced with an arbitrary two-dimensional vector V with elements V1 and V2, we want to calculate a unit normal vector for that vector.
111
00:17:13,000 --> 00:17:21,000
How can we do that? The first step is to normalize the vector. That means to make it a vector of length 1.
112
00:17:21,000 --> 00:17:33,000
This can be done by dividing the vector by its vector length. That means calculating 1 over the Euclidean length of V times the vector.
113
00:17:33,000 --> 00:17:44,000
The second step is to rotate it by 90° either to the left or to the right. In the two-dimensional case this is rather easy.
114
00:17:44,000 --> 00:17:53,000
If we have a vector V1, V2 and we want to rotate it by 90°, we get the vector -V2, V1.
115
00:17:53,000 --> 00:18:00,000
If we want to rotate it by -90°, we get the vector V2, -V1.
116
00:18:00,000 --> 00:18:17,000
That means a unit normal vector for vector V has the shape 1 over the length of vector V times -V2, V1.
117
00:18:17,000 --> 00:18:24,000
Every vector of course has a polar representation in the two-dimensional space.
118
00:18:24,000 --> 00:18:32,000
Each vector V can be represented as R times the vector (cos(phi), sin(phi)).
119
00:18:32,000 --> 00:18:43,000
That means the vector (cos(phi), sin(phi)) is a unit vector that points into the direction of that vector V and has unit length.
120
00:18:43,000 --> 00:18:54,000
The scalar R is a non-negative number and it tells us how far away from the origin that point is.
121
00:18:54,000 --> 00:19:03,000
If we are given a vector V with entries V1 and V2, we might be interested in calculating its polar representation.
122
00:19:03,000 --> 00:19:13,000
Determining its distance from the origin is rather simple, so we just need to calculate the Euclidean length of vector V.
123
00:19:13,000 --> 00:19:17,000
But how do we get the angle phi?
124
00:19:17,000 --> 00:19:22,000
We have to consider trigonometric calculations here.
125
00:19:22,000 --> 00:19:34,000
The first way that you would find is to somehow use the arctangent to calculate that angle.
126
00:19:34,000 --> 00:19:42,000
That's possible, but it requires some case distinction to get the angle phi directly.
127
00:19:42,000 --> 00:19:52,000
The simpler way is to use a specific function that is known as atan2, the arctangent with two arguments,
128
00:19:52,000 --> 00:20:06,000
that is defined to take the two components V1 and V2 of the two-dimensional vector and that provides the angle phi into which this vector (V1, V2) points.
129
00:20:06,000 --> 00:20:16,000
The function atan2 is a standard function, so you can use it in mathematical formulas as well as in programming languages.
130
00:20:16,000 --> 00:20:25,000
That means every time you need to know the angle of a two-dimensional vector, use this function atan2,
131
00:20:25,000 --> 00:20:39,000
rather than using the arctangent of some ratio of numbers and then adding some case distinction.
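Both steps (normalize, then rotate by 90 degrees) and the polar representation via atan2 can be sketched as follows, assuming numpy; the function names are illustrative:

```python
import numpy as np

def unit_normal(v):
    """Unit normal vector of a 2D vector v: normalize, then rotate by 90 degrees."""
    v1, v2 = v / np.linalg.norm(v)    # step 1: scale to length 1
    return np.array([-v2, v1])        # step 2: rotate by +90 degrees

def polar(v):
    """Polar representation (r, phi) of a 2D vector, using atan2."""
    # atan2 handles all quadrants, so no manual case distinction is needed.
    return np.linalg.norm(v), np.arctan2(v[1], v[0])

v = np.array([1.0, 1.0])
print(unit_normal(v))    # [-0.70710678  0.70710678]
print(polar(v))          # (1.4142..., 0.7853...), i.e. (sqrt(2), pi/4)
```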
132
00:20:39,000 --> 00:20:47,000
Now we have all the prerequisites for our further processing of the edge bitmaps.
133
00:20:47,000 --> 00:20:53,000
We assume that we started with a gray value image and extracted the gray value edges.
134
00:20:53,000 --> 00:21:01,000
Now we know all the pixels that are gray level edge pixels, but we want to get to a shape description.
135
00:21:01,000 --> 00:21:08,000
The first way that we want to consider is to use a technique called the Hough Transform,
136
00:21:08,000 --> 00:21:19,000
which transforms the content of the edge image into a different space and extracts the dominant lines in that space.
137
00:21:19,000 --> 00:21:30,000
From the results of previous slides we know that we can represent each line in the two-dimensional space in its normal form.
138
00:21:30,000 --> 00:21:42,000
If we represent the normal vector as the vector with elements cosine phi and sine phi and the position of a point as the vector x, y,
139
00:21:42,000 --> 00:21:52,000
then we can represent each line as x times cosine phi plus y times sine phi plus c equals zero.
140
00:21:52,000 --> 00:21:56,000
That's just the normal form of a line.
141
00:21:56,000 --> 00:22:09,000
Furthermore, we can limit ourselves to angles phi between 0 and 180° and values of c from the real numbers.
142
00:22:09,000 --> 00:22:16,000
Why don't we need to consider larger angles between 180° and 360°?
143
00:22:16,000 --> 00:22:26,000
Well, the answer is easy, because each line has two representations, one with a normal vector pointing to the left and one with a normal vector pointing to the right.
144
00:22:26,000 --> 00:22:35,000
Here we restrict ourselves to just one of these representations by limiting the angles between 0 and 180°.
145
00:22:35,000 --> 00:22:44,000
But still we can represent each line in such a representation.
146
00:22:44,000 --> 00:22:55,000
That means we can represent all the lines in the two-dimensional space by two numbers, the pair of phi and c.
147
00:22:55,000 --> 00:22:59,000
Now let us plot this parameter space.
148
00:22:59,000 --> 00:23:06,000
On the left-hand side we see an image space, the original image space in which we detect edge pixels.
149
00:23:06,000 --> 00:23:11,000
It has two coordinate axes x and y as usual in an image.
150
00:23:11,000 --> 00:23:17,000
On the right-hand side we find the parameter space. It has the two axes phi and c.
151
00:23:17,000 --> 00:23:27,000
And each line in the image is represented by one point in that parameter space.
152
00:23:27,000 --> 00:23:37,000
If we consider one point in the image, indicated by the red circle, then we can find several lines that are passing through this point.
153
00:23:37,000 --> 00:23:43,000
For instance this vertical line. It has a certain representation in parameter space.
154
00:23:43,000 --> 00:23:51,000
For instance this line has an angle of phi equal to 0 and a certain offset c.
155
00:23:51,000 --> 00:24:02,000
And that pair of parameters is represented in the parameter space at a certain position that is indicated by the blue circle.
156
00:24:02,000 --> 00:24:08,000
But that's not the only line that passes through the red point.
157
00:24:08,000 --> 00:24:13,000
We might also consider that line that has a different angle and a different offset.
158
00:24:13,000 --> 00:24:21,000
So it has a different representation in the parameter space, which maybe is located at that position.
159
00:24:21,000 --> 00:24:31,000
Like that we go on and consider further lines in the image space, which refer to further points in the parameter space.
160
00:24:31,000 --> 00:24:38,000
For instance the horizontal line in the image space has a representation in parameter space at that point.
161
00:24:38,000 --> 00:24:49,000
And that diagonal line has a representation in parameter space at another position.
162
00:24:49,000 --> 00:25:00,000
And even those four lines are not all. There are infinitely many lines that are passing through the red point of interest.
163
00:25:00,000 --> 00:25:03,000
And all of these have a representation in parameter space.
164
00:25:03,000 --> 00:25:10,000
If we calculate those points we find that those are located on a sine curve.
165
00:25:10,000 --> 00:25:17,000
A sine curve with a certain amplitude and a certain phase shift.
166
00:25:17,000 --> 00:25:23,000
Now let's consider another point in the image. Another edge point that we have found.
167
00:25:23,000 --> 00:25:28,000
For instance the point on the left hand side that is indicated by a red circle.
168
00:25:28,000 --> 00:25:33,000
Again we can consider the different lines that are passing through that point.
169
00:25:33,000 --> 00:25:42,000
And we get another sine curve in the parameter space, as the locus of all the lines that pass through that point.
170
00:25:42,000 --> 00:25:45,000
All the lines that intersect at that point.
171
00:25:45,000 --> 00:25:55,000
And we observe that the two sine curves that we obtained in the parameter space are intersecting at a single parameter set.
172
00:25:55,000 --> 00:26:05,000
And that's obviously the parameter set that represents the line that connects the two red points in the image space.
173
00:26:05,000 --> 00:26:08,000
So what did we find?
174
00:26:08,000 --> 00:26:18,000
If we observe lines in the image space, those lines are represented in the parameter space as points.
175
00:26:18,000 --> 00:26:25,000
If we consider points in the image space we can represent them in the parameter space as sine curves.
176
00:26:25,000 --> 00:26:34,000
And the sine curves implicitly represent all the lines that intersect in the point of interest in the image space.
177
00:26:34,000 --> 00:26:41,000
And if we consider several points on the same line in the image space,
178
00:26:41,000 --> 00:26:51,000
then what we can find in the parameter space are several sine curves that are all intersecting in the same point in the parameter space.
179
00:26:51,000 --> 00:26:58,000
Namely the point that refers to the line in the image space on which all the original points are located.
180
00:26:58,000 --> 00:27:02,000
And this gives the idea of the Hough Transform.
181
00:27:02,000 --> 00:27:08,000
For each point, for each edge point in the image we create the sine curve in the parameter space.
182
00:27:08,000 --> 00:27:14,000
And then we consider the points of intersections of the sine curves.
183
00:27:14,000 --> 00:27:26,000
And those points where many sine curves intersect are likely to represent lines that are very dominant in the image.
184
00:27:26,000 --> 00:27:29,000
So here's the basic procedure.
185
00:27:29,000 --> 00:27:40,000
We first take all the edge pixels, and for each edge pixel we calculate the sine curve and draw it, or somehow represent it.
186
00:27:40,000 --> 00:27:51,000
So for instance here we have this parameter space, this phi and c, and the blue line, the blue curve represents one edge pixel.
187
00:27:51,000 --> 00:28:00,000
The green represents another edge pixel, the red one represents another edge pixel and so on and so forth.
188
00:28:00,000 --> 00:28:07,000
After we have done that we check for points of intersection in the parameter space.
189
00:28:07,000 --> 00:28:13,000
So in this case we might have found one very dominant point of intersection.
190
00:28:13,000 --> 00:28:23,000
That means many sine curves are intersecting in that point for a certain parameter value of phi and c.
191
00:28:23,000 --> 00:28:33,000
And then this parameter value reveals the line on which all those points are located in the image.
192
00:28:33,000 --> 00:28:37,000
In practice, however, things turn out not to be that easy.
193
00:28:37,000 --> 00:28:47,000
Because we often do not get just a unique point of intersection, but we get an area where many points of intersections can be found.
194
00:28:47,000 --> 00:28:57,000
Due to numerical imprecision in determining the positions of the edge pixels, etc., the sine curves do not intersect in a single point.
195
00:28:57,000 --> 00:29:08,000
But as we can see here, all these points of intersection are very close by, but they are not exactly the same.
196
00:29:08,000 --> 00:29:21,000
Furthermore, it might happen that the image contains several dominant lines, so that we get several of these dominant areas of points of intersection.
197
00:29:21,000 --> 00:29:24,000
And then we would need to extract those.
198
00:29:24,000 --> 00:29:28,000
Furthermore, we get points of intersection which do not indicate dominant lines.
199
00:29:28,000 --> 00:29:34,000
So where only, for instance, two sine curves are intersecting, which are not supported by more than two points,
200
00:29:34,000 --> 00:29:43,000
and therefore cannot be considered to be really dominant, apparent lines in the image.
201
00:29:43,000 --> 00:29:50,000
So how can we find those areas with a lot of points of intersection close by?
202
00:29:50,000 --> 00:29:55,000
The idea is to use something that is called an accumulator array.
203
00:29:55,000 --> 00:30:03,000
So for that purpose, we split up the whole parameter space into small cells.
204
00:30:03,000 --> 00:30:14,000
So for instance, we might consider the width of each cell to be, for instance, 1 degree or 10 degrees or 0.1 degree.
205
00:30:14,000 --> 00:30:26,000
And in the vertical direction, we might consider the spacing to be, for instance, 1 pixel, 10 pixels, or 0.1 pixels in height.
206
00:30:26,000 --> 00:30:30,000
It depends on which precision we want to get.
207
00:30:30,000 --> 00:30:37,000
So now we have a lot of cells, and we partition the space, the parameter space, in many cells.
208
00:30:37,000 --> 00:30:44,000
For each cell, we implement a counter.
209
00:30:44,000 --> 00:30:53,000
So what we do then is that we count how many of the sine curves are passing through each of these cells.
210
00:30:53,000 --> 00:31:01,000
And then afterwards, we can check the counter values and search for the local maxima of the counter values.
211
00:31:01,000 --> 00:31:13,000
And those local maxima reveal the dominant areas of intersection of sine curves.
212
00:31:13,000 --> 00:31:23,000
So in this case, we would have one global maximum where five of the sine curves intersect.
213
00:31:23,000 --> 00:31:30,000
And we might find a second local maximum, which is still dominant.
214
00:31:30,000 --> 00:31:32,000
Where three sine curves intersect.
215
00:31:32,000 --> 00:31:39,000
And then apart from that, we might also find some other areas where only two sine curves intersect.
216
00:31:39,000 --> 00:31:43,000
So now we pick the most dominant local maxima.
217
00:31:43,000 --> 00:31:48,000
That means those maxima, which are supported by very many sine curves.
218
00:31:48,000 --> 00:31:56,000
So in practice, very many means 50 or 100 or 200 or 500 maybe.
219
00:31:56,000 --> 00:32:03,000
And treat those as the dominant lines in the image.
220
00:32:03,000 --> 00:32:05,000
So this yields the Hough Transform.
221
00:32:05,000 --> 00:32:11,000
So in total, summing up, first we create an accumulator array of adequate precision.
222
00:32:11,000 --> 00:32:18,000
Adequate means, well, the more cells we have, the more computational effort we need, but the more accurate the result will be.
223
00:32:18,000 --> 00:32:23,000
The smaller the number of cells is, the less computational effort we need.
224
00:32:23,000 --> 00:32:28,000
However, the precision of course suffers from the smaller resolution.
225
00:32:28,000 --> 00:32:33,000
And then we initialize the counters for each of the cells with zero.
226
00:32:33,000 --> 00:32:36,000
In step number two, we go through all the edge pixels.
227
00:32:36,000 --> 00:32:42,000
For each edge pixel, we calculate the sine curve that refers to it in parameter space.
228
00:32:42,000 --> 00:32:52,000
And then we check through which accumulator cells the sine curve passes.
229
00:32:52,000 --> 00:32:57,000
And increment the counter for those accumulator cells.
230
00:32:57,000 --> 00:33:02,000
And at the end, we search the accumulator array for the dominant local maxima.
231
00:33:02,000 --> 00:33:07,000
And these provide the dominant lines in the image.
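The whole procedure can be sketched in a few lines, assuming numpy; the cell sizes, the bound on the offset, and all names here are assumptions made for this illustration, not the implementation provided for the assignments:

```python
import numpy as np

def hough_lines(edge_pixels, n_phi=180, c_step=1.0, c_max=600.0):
    """Minimal Hough accumulator: one cell per degree and per c_step of offset."""
    acc = np.zeros((n_phi, int(2 * c_max / c_step) + 1), dtype=np.int32)  # step 1
    phis = np.deg2rad(np.arange(n_phi))
    for x, y in edge_pixels:                          # step 2: one sine curve
        c = -(x * np.cos(phis) + y * np.sin(phis))    # per edge pixel
        idx = np.round((c + c_max) / c_step).astype(int)
        acc[np.arange(n_phi), idx] += 1               # increment crossed cells
    return acc                                        # step 3: search for maxima

# Edge pixels on the vertical line x = 10 yield a peak near phi = 0, c = -10.
acc = hough_lines([(10, y) for y in range(50)])
phi_idx, c_idx = np.unravel_index(acc.argmax(), acc.shape)
print(phi_idx, c_idx - 600)    # 0 -10
```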
232
00:33:07,000 --> 00:33:11,000
So this procedure is known as the Hough Transform in computer vision.
233
00:33:11,000 --> 00:33:20,000
And sometimes in a more generic measurement system sense, it is also known as a Radon transform.
234
00:33:20,000 --> 00:33:27,000
But for us, the term Hough Transform is sufficient to know.
235
00:33:27,000 --> 00:33:30,000
So with the Hough Transform, we can determine lines, i.e. straight lines.
236
00:33:30,000 --> 00:33:37,000
The Hough Transform does not provide endpoints of those lines or line segments for us.
237
00:33:37,000 --> 00:33:45,000
So that means, if we are interested in line segments rather than lines, we need to do some further processing,
238
00:33:45,000 --> 00:33:48,000
which is part of one of the assignments.
239
00:33:48,000 --> 00:33:56,000
So what we need to do for that purpose is somehow determine all the points that refer to the line, that support the line.
240
00:33:56,000 --> 00:34:03,000
Then check where we find those points densely, and where we have gaps.
241
00:34:03,000 --> 00:34:11,000
And based on that, do some reasoning to determine the endpoints of the line segment.
242
00:34:11,000 --> 00:34:14,000
Let's have a look at some examples.
243
00:34:14,000 --> 00:34:18,000
Again, this image with a cup on a mousepad.
244
00:34:18,000 --> 00:34:27,000
When we use the edge extraction with a Canny operator, we get an edge image that looks like that.
245
00:34:27,000 --> 00:34:32,000
Then we transform everything into the Hough parameter space.
246
00:34:32,000 --> 00:34:38,000
And visualizing this accumulator array as an image yields a result that looks like that.
247
00:34:38,000 --> 00:34:49,000
The horizontal axis provides the angles between 0 and a little bit more than 180 degrees.
248
00:34:49,000 --> 00:34:55,000
And the vertical axis shows the c values.
249
00:34:55,000 --> 00:35:06,000
Each pixel in this visualization refers to one accumulator cell.
250
00:35:06,000 --> 00:35:12,000
And the brightness refers to the counter value of the accumulator cell.
251
00:35:12,000 --> 00:35:16,000
That means the brighter the pixel is, the larger the counter value is.
252
00:35:16,000 --> 00:35:21,000
And we clearly see there are four dominant local maxima.
253
00:35:21,000 --> 00:35:29,000
And those four dominant local maxima refer to four lines in the image, namely those that are colored here.
254
00:35:29,000 --> 00:35:36,000
That means the blue, the cyan, the magenta and the green line segment here.
255
00:35:36,000 --> 00:35:41,000
Each of those four refers to one of the four dominant maxima in the image.
256
00:35:41,000 --> 00:35:49,000
Beside that, of course, we also see that there are some other local maxima, but they are not as dominant as those four.
257
00:35:49,000 --> 00:36:03,000
And therefore, if you want to represent this scene with lines, then these four lines are the most important ones.
258
00:36:03,000 --> 00:36:07,000
Let's talk about the properties of the Hough Transform.
259
00:36:07,000 --> 00:36:13,000
As I've already said, the result depends on the size and the precision of the accumulator array.
260
00:36:13,000 --> 00:36:20,000
The more cells we have, the smaller each cell is, the more accurate the result will be.
261
00:36:20,000 --> 00:36:25,000
However, the more computational effort we need.
262
00:36:25,000 --> 00:36:31,000
Determining significant peaks in the accumulator array might be difficult in practice.
263
00:36:31,000 --> 00:36:39,000
I found that there are many bad implementations of the Hough Transform in computer vision libraries,
264
00:36:39,000 --> 00:36:44,000
and only a few really good implementations of the Hough Transform.
265
00:36:44,000 --> 00:36:51,000
Therefore, if you are working with the bad ones, you might be frustrated very easily.
266
00:36:51,000 --> 00:37:00,000
If you are using better ones, the results are nicer and the quality of the results is better.
267
00:37:00,000 --> 00:37:05,000
So check which kind of implementation you are using.
268
00:37:05,000 --> 00:37:11,000
For the assignments, we will provide our own implementation of the Hough Transform,
269
00:37:11,000 --> 00:37:18,000
which somehow circumvents some of these problems and yields, I think, nice results.
270
00:37:18,000 --> 00:37:26,000
Maybe it's not the quickest implementation, but at least you do not get frustrated that easily.
271
00:37:26,000 --> 00:37:35,000
For instance, if you try, in comparison, the implementation provided by MATLAB,
272
00:37:35,000 --> 00:37:39,000
you will easily start to become frustrated with the Hough Transform.
273
00:37:39,000 --> 00:37:46,000
But that's not a matter of the method itself, but more of the implementation.
274
00:37:46,000 --> 00:37:52,000
One property of the Hough Transform is that it ignores all gradient information.
275
00:37:52,000 --> 00:38:05,000
So it adds pixels individually, and it does not consider any knowledge about the direction of the edge that passes through that point.
276
00:38:05,000 --> 00:38:12,000
However, especially if you are dealing with edge pixels, usually we know the orientation of the edge,
277
00:38:12,000 --> 00:38:20,000
because from the gradient at that position we can at least derive a rough approximation of the edge orientation.
278
00:38:20,000 --> 00:38:30,000
But this information is completely ignored by the Hough Transform, at least in its standard implementation.
279
00:38:30,000 --> 00:38:41,000
And in natural scenes, if you go outside and take a picture of some natural scene,
280
00:38:41,000 --> 00:38:52,000
then it seems that the accumulator array is easily flooded, and the apparent lines, or the dominant lines, are not that easily found.
281
00:38:52,000 --> 00:38:58,000
So the Hough Transform works well on technical images where the relevant objects are clearly visible
282
00:38:58,000 --> 00:39:08,000
and the background is not that dominant. For natural scenes, the Hough Transform doesn't work that well in practice.
283
00:39:08,000 --> 00:39:16,000
There are some extensions of the Hough Transform. The standard form is implemented for detecting lines,
284
00:39:16,000 --> 00:39:22,000
but there are also extensions to extract, for instance, circles or ellipses.
285
00:39:22,000 --> 00:39:29,000
They are not discussed here in the lecture, but the extension is rather easy.
286
00:39:29,000 --> 00:39:40,000
So only a small and easy transfer of the idea is needed to derive those methods.
287
00:39:40,000 --> 00:39:47,000
For circles, the Hough Transform is used rather often. For ellipses, things become more difficult.
288
00:39:47,000 --> 00:39:50,000
The parameter space becomes five-dimensional.
289
00:39:50,000 --> 00:39:59,000
For ellipses, the Hough Transform is not used that often.
290
00:39:59,000 --> 00:40:03,000
Then there is another extension called the randomized Hough Transform.
291
00:40:03,000 --> 00:40:15,000
It avoids some of the problems of the standard Hough Transform, but I don't introduce it in the lecture due to time reasons.
292
00:40:15,000 --> 00:40:25,000
Furthermore, there is one version called the generalized Hough Transform that allows us to extract any kind of shape.
293
00:40:25,000 --> 00:40:39,000
You provide a certain template of a shape, and then the generalized Hough Transform searches the image for that kind of shape.
294
00:40:39,000 --> 00:40:50,000
Besides using the Hough Transform to extract lines from an image, there are also alternative ways to get lines and other structures from an image.
295
00:40:50,000 --> 00:40:56,000
The second approach that we want to discuss consists of several steps.
296
00:40:56,000 --> 00:41:10,000
The first step is to somehow transform the representation of edge pixels from an edge bitmap to a list of edge pixels.
297
00:41:10,000 --> 00:41:20,000
That means we somehow collect all the pixels in such a way that we have those organized in lists of adjacent edge pixels.
298
00:41:20,000 --> 00:41:31,000
Based on that representation, we do some polyline segmentation. That means we somehow segment these lists of edge pixels into sublists,
299
00:41:31,000 --> 00:41:42,000
where each sublist represents one polyline segment.
300
00:41:42,000 --> 00:41:50,000
The last step is to fit the lines as well as possible to the edge pixels.
301
00:41:50,000 --> 00:42:02,000
Let's follow this path. First, discuss the edge following, then the polyline segmentation, and at the end the line fitting step.
302
00:42:02,000 --> 00:42:09,000
Edge following. The basic idea of edge following is that we travel through the image,
303
00:42:09,000 --> 00:42:20,000
we collect all the edge pixels which we have found in the image, and then arrange those in lists of pixel coordinates,
304
00:42:20,000 --> 00:42:30,000
where we arrange those lists in such a way that neighboring elements in the list refer to neighboring pixels in the image.
305
00:42:30,000 --> 00:42:42,000
As a result, we get a set of lists of edge pixels. Each list of edge pixels represents one contour,
306
00:42:42,000 --> 00:42:50,000
and the whole set of all these lists represents all the contours which we have found.
307
00:42:50,000 --> 00:43:04,000
On the right hand side, for instance, one contour that will be represented afterwards as a list of edge pixels is the large contour around the mouse pad.
308
00:43:04,000 --> 00:43:10,000
This contour is shown in the image at the bottom with a dark blue color.
309
00:43:10,000 --> 00:43:24,000
All these pixels finally are arranged in one list, and the list is ordered in the sense that neighboring elements in the list refer to neighboring elements in the image.
310
00:43:24,000 --> 00:43:36,000
Beside that large contour, there are other contours that we can find in the edge image, and each of these contours becomes one list of edge pixels.
311
00:43:36,000 --> 00:43:49,000
For instance, the upper bound of the cup is composed out of two different contours, and each of these two contours becomes one edge list.
312
00:43:49,000 --> 00:43:57,000
In the image below, those are represented with two different shades of magenta color.
313
00:43:57,000 --> 00:44:09,000
Besides that, some more contours are found, and those will become individual additional lists of pixels.
314
00:44:09,000 --> 00:44:16,000
The procedure is so simple that I don't even write down an algorithm here.
315
00:44:16,000 --> 00:44:21,000
The idea is we travel through the image from top to bottom, from left to right.
316
00:44:21,000 --> 00:44:27,000
Once we find an edge pixel, we start a new list of edge pixels.
317
00:44:27,000 --> 00:44:43,000
Then we use the gradient information to follow the edge, and we collect all the edge pixels that we find into that list, until there are no more edge pixels that we can add.
318
00:44:43,000 --> 00:44:51,000
As a result, we get these lists of edge pixels.
319
00:44:51,000 --> 00:44:58,000
If we apply this method to a little bit more complicated scenes, we get the result as shown here.
320
00:44:58,000 --> 00:45:09,000
In these two images, black pixels means there is no edge at those positions, while all the other colors refer to edge pixels.
321
00:45:09,000 --> 00:45:21,000
Each contour that is found, subsumed, or collected in one list of edge pixels is colored with the same color.
322
00:45:21,000 --> 00:45:27,000
On the right hand side, it is very clear that we get several lists of edge pixels.
323
00:45:27,000 --> 00:45:37,000
For instance, one that is colorized on the image with dark green pixels.
324
00:45:37,000 --> 00:45:42,000
And others, which are colorized with different colors.
325
00:45:42,000 --> 00:45:56,000
At that point, it is worth mentioning that these lists of edge pixels are not yet fixed to a certain object.
326
00:45:56,000 --> 00:46:05,000
Or it might happen that several edges, which belong to different objects or to different areas, are somehow connected with each other.
327
00:46:05,000 --> 00:46:07,000
One example can be seen here.
328
00:46:07,000 --> 00:46:17,000
The list of pixels starts here, then it goes here, here, here, here, here, downward here, and here again.
329
00:46:17,000 --> 00:46:24,000
As we can see, these edges that are collected here belong to different areas.
330
00:46:24,000 --> 00:46:31,000
For instance, some belong to the boundary between the white field marking and the robot,
331
00:46:31,000 --> 00:46:38,000
while others belong to the boundary between the field marking and the soccer field.
332
00:46:38,000 --> 00:46:44,000
There is no guarantee that this list contains some semantic information.
333
00:46:44,000 --> 00:46:51,000
Since we collected all these edge pixels in the same list, we need to split them up.
334
00:46:51,000 --> 00:46:56,000
As we can see in this example, even though the lists are not only representing lines,
335
00:46:56,000 --> 00:47:02,000
we at least get some information about which pixels belong together or might belong together.
336
00:47:02,000 --> 00:47:06,000
A list might also contain several parts, which are put together.
337
00:47:06,000 --> 00:47:12,000
To represent them as polylines, we need to split those lists of pixels up into several parts,
338
00:47:12,000 --> 00:47:22,000
where we can represent each part, each segment, as a line segment.
339
00:47:22,000 --> 00:47:34,000
That's our task. We want to subdivide the pixel list in such a way that the sub-lists can be represented by individual line segments.
340
00:47:34,000 --> 00:47:37,000
There are several algorithms to do that.
341
00:47:37,000 --> 00:47:44,000
Here we only want to introduce one, the so-called Ramer-Douglas-Peucker algorithm.
342
00:47:44,000 --> 00:47:49,000
How does it proceed? It's rather straightforward, I would say.
343
00:47:49,000 --> 00:47:58,000
The assumption of the Ramer-Douglas-Peucker algorithm is that one list of pixels represents one line.
344
00:47:58,000 --> 00:48:02,000
That means when we connect the two endpoints in the list,
345
00:48:02,000 --> 00:48:08,000
then the line that we get should somehow fit to all the points in the list.
346
00:48:08,000 --> 00:48:13,000
And if that's not fulfilled, then we somehow split up this list into parts,
347
00:48:13,000 --> 00:48:22,000
so that those parts more likely can be represented by individual line segments.
348
00:48:22,000 --> 00:48:24,000
So, how does it proceed?
349
00:48:24,000 --> 00:48:30,000
Well, as I said, we first connect the first point of the list with the last point of the list,
350
00:48:30,000 --> 00:48:34,000
indicated here by the cyan line.
351
00:48:34,000 --> 00:48:43,000
Then we calculate the orthogonal distance of all pixels in that list to this line that we just estimated.
352
00:48:43,000 --> 00:48:54,000
Some of these distances will be small and support the assumption that those pixels belong to that line that connects the first and the last pixel.
353
00:48:54,000 --> 00:49:01,000
Other distances drawn in orange in this image,
354
00:49:01,000 --> 00:49:05,000
they are too large to support this hypothesis.
355
00:49:05,000 --> 00:49:14,000
That means if we find one point with a distance that is larger than a certain acceptance threshold,
356
00:49:14,000 --> 00:49:22,000
then we argue that we somehow need to split up this list of pixels into parts.
357
00:49:22,000 --> 00:49:32,000
So, for that purpose, we select the pixel that has the largest distance from the line between the endpoints,
358
00:49:32,000 --> 00:49:38,000
and we split the list of pixels at that point into two sub-lists.
359
00:49:38,000 --> 00:49:44,000
And then we iterate this algorithm on each of the sub-lists.
360
00:49:44,000 --> 00:49:59,000
So, for instance, after this partitioning step, we repeat the first step and create the line between the first point and the corner point that we just selected.
361
00:49:59,000 --> 00:50:04,000
This yields the first sub-list.
362
00:50:04,000 --> 00:50:11,000
Again, we check the distance between the points in that sub-list and the estimated line.
363
00:50:11,000 --> 00:50:20,000
And in this case, we find that the distance is small enough, that all the distances are below an acceptance threshold.
364
00:50:20,000 --> 00:50:24,000
In that case, we stop the splitting process.
365
00:50:24,000 --> 00:50:34,000
For the second sub-list that was generated, that starts at the new corner point and ends at the endpoint of the list of edge pixels,
366
00:50:34,000 --> 00:50:40,000
we estimate the line by connecting the corner point with the endpoint.
367
00:50:40,000 --> 00:50:48,000
Then we calculate again the distances, and we find that some of the distances are too large.
368
00:50:48,000 --> 00:50:59,000
Again, we select the point that maximizes the distance to the estimated line, which is shown with a red ring.
369
00:50:59,000 --> 00:51:05,000
And at that point, we split up this list into another two sub-lists.
370
00:51:05,000 --> 00:51:12,000
And then we again start this algorithm on each of the two sub-lists that were generated.
371
00:51:12,000 --> 00:51:22,000
That means we create a line that connects the two corner points that define that sub-list, namely the first and the last point of the sub-list.
372
00:51:22,000 --> 00:51:32,000
Then we check whether the distances of all intermediate points are small enough, below an acceptance threshold, and if so, we stop the splitting process.
373
00:51:32,000 --> 00:51:41,000
If not, then we select the point that has the largest distance from that line and split up this list into two sub-lists.
374
00:51:41,000 --> 00:51:50,000
And like that we go on until we cannot split anything anymore.
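A compact recursive sketch of this splitting step, assuming numpy; it ignores degenerate cases such as coincident endpoints, and eps plays the role of the acceptance threshold:

```python
import numpy as np

def rdp(points, eps):
    """Ramer-Douglas-Peucker: reduce a pixel list to polyline vertices."""
    p, q = points[0], points[-1]
    v = q - p
    n = np.array([-v[1], v[0]]) / np.linalg.norm(v)   # unit normal of chord p-q
    dist = np.abs((points - p) @ n)                   # orthogonal distances
    i = int(dist.argmax())
    if dist[i] <= eps:               # all points close enough: keep the chord
        return [p, q]
    left = rdp(points[:i + 1], eps)  # otherwise split at the farthest point
    right = rdp(points[i:], eps)     # and recurse on both sub-lists
    return left[:-1] + right         # merge, dropping the duplicated vertex

pts = np.array([[0, 0], [1, 0.1], [2, 0], [3, 3], [4, 6]], dtype=float)
print(rdp(pts, eps=0.5))             # keeps the corner at (2, 0)
```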
375
00:51:50,000 --> 00:51:57,000
So if we apply this algorithm on our example image, we will find a result that looks like the right image.
376
00:51:57,000 --> 00:52:10,000
So the left image is the input, where each color refers to one list of edge pixels, and the right picture is the output of the Ramer-Douglas-Peucker algorithm.
377
00:52:10,000 --> 00:52:19,000
So again, each color refers to one sub-list that is a result of that splitting approach.
378
00:52:19,000 --> 00:52:25,000
As we can see, the round shapes are split up into a lot of small segments.
379
00:52:25,000 --> 00:52:30,000
Each of those segments can be represented with a line segment.
380
00:52:30,000 --> 00:52:44,000
And the long straight edges are preserved.
381
00:52:44,000 --> 00:52:49,000
Another example, for the other two example images, is shown here.
382
00:52:49,000 --> 00:53:03,000
So now each sub-list of edge pixels that was generated is shown by a red line, and the end points are shown by yellow pixels.
383
00:53:03,000 --> 00:53:17,000
So the yellow pixels are the end points, and the red lines represent the lists of edge pixels that are represented by the respective line segments.
384
00:53:17,000 --> 00:53:30,000
As we can see, the complicated structures are now split into several line segments.
385
00:53:30,000 --> 00:53:39,000
If we zoom in at the university main building, we see how many small segments are created.
386
00:53:39,000 --> 00:53:54,000
We can see that, for instance, at the windows, we get individual line segments for the structures that we find here.
387
00:53:54,000 --> 00:54:00,000
Okay, so now we did the edge following, and we did the polyline segmentation.
388
00:54:00,000 --> 00:54:08,000
And now, as a last step, we do some line fitting.
389
00:54:08,000 --> 00:54:15,000
With the polyline segmentation, we obtained some list of edge pixels.
390
00:54:15,000 --> 00:54:28,000
And we obtained some first initial guess of a line that represents those pixels, namely the line that connects the two end points of the list of edge pixels.
391
00:54:28,000 --> 00:54:39,000
However, this estimate might be suboptimal. It is acceptable somehow, but there might be other lines which fit better to these pixels.
392
00:54:39,000 --> 00:54:52,000
As well, the result of the Hough Transform might also be suboptimal, especially if we use a cell partitioning of the accumulator array that has too large cells.
393
00:54:52,000 --> 00:55:00,000
On the right-hand side, we see an example. So here, the black points are the edge pixels.
394
00:55:00,000 --> 00:55:05,000
The red line shows which of those are neighbors in the list of edge pixels.
395
00:55:05,000 --> 00:55:11,000
And the cyan line shows the line that connects the two end points.
396
00:55:11,000 --> 00:55:20,000
And maybe this is not really optimal. Especially on the right-hand side, it is far away from most of the pixels.
397
00:55:20,000 --> 00:55:29,000
And we might consider that another line, say that one here in yellow, might fit better than the cyan line did.
398
00:55:29,000 --> 00:55:37,000
But how can we estimate such an optimal line?
399
00:55:37,000 --> 00:55:47,000
Well, we know that we have to represent the line with a normal vector, n, and an offset parameter, c.
400
00:55:47,000 --> 00:55:54,000
So what we need to do is, we need to somehow determine optimal values for n and c.
401
00:55:54,000 --> 00:55:58,000
But what does optimal mean in this case?
402
00:55:58,000 --> 00:56:04,000
Well, let us consider again a set of edge pixels and a certain line.
403
00:56:04,000 --> 00:56:10,000
Say this is our first guess. It does not matter how we created this first guess.
404
00:56:10,000 --> 00:56:16,000
So just a first guess, which we want to evaluate. How good is it?
405
00:56:16,000 --> 00:56:24,000
What we can do is calculate the orthogonal distance of each of those points from the line.
406
00:56:24,000 --> 00:56:28,000
Represented by the yellow dashed lines.
407
00:56:28,000 --> 00:56:34,000
And of course, for a good line, those distances would be small.
408
00:56:34,000 --> 00:56:38,000
In the best case, they would all be zero.
409
00:56:38,000 --> 00:56:45,000
But in practice, it happens that you never find a line that makes all those distances zero.
410
00:56:45,000 --> 00:56:48,000
But somehow, you have to find a compromise.
411
00:56:48,000 --> 00:57:03,000
So now we have to rotate this line a little bit and shift it in such a way that we find the best compromise among the distances that we get.
412
00:57:03,000 --> 00:57:13,000
So we want to find parameters n and c that minimize those distances.
413
00:57:13,000 --> 00:57:21,000
Since we cannot minimize all the distances at the same time, we need to formulate a compromise.
414
00:57:21,000 --> 00:57:28,000
And this can be done by summing up the squared distances, d_i squared, over all the pixels.
415
00:57:28,000 --> 00:57:38,000
And stating that we want to find the normal vector n and offset parameter c such that this sum becomes minimal.
416
00:57:38,000 --> 00:57:41,000
So we might treat that as a kind of search problem.
417
00:57:41,000 --> 00:57:51,000
We are varying the normal vector n and the parameter c over a large number of possible values.
418
00:57:51,000 --> 00:58:04,000
And for each of those values, we check the distances of each pixel and then we select the solution with the smallest sum of square distances.
419
00:58:04,000 --> 00:58:10,000
If we do that, we have to consider that n needs to be a unit vector.
420
00:58:10,000 --> 00:58:12,000
So a vector of length 1.
421
00:58:12,000 --> 00:58:20,000
And this needs to be stated as an additional constraint in this optimization problem.
422
00:58:20,000 --> 00:58:30,000
So typically we add that as a constraint subject to the dot product of n with itself must be equal to 1.
423
00:58:30,000 --> 00:58:34,000
So the Euclidean length of n must be equal to 1.
424
00:58:34,000 --> 00:58:39,000
So this states a mathematical optimization problem with constraints.
425
00:58:39,000 --> 00:58:43,000
It states what we want to have.
426
00:58:43,000 --> 00:58:48,000
It does not necessarily or directly state how we get that.
427
00:58:48,000 --> 00:58:53,000
But it already states what is our goal of calculation.
428
00:58:53,000 --> 00:58:59,000
This method is known as the total least squares approach.
429
00:58:59,000 --> 00:59:05,000
So sum of squares, so sum of square distances.
430
00:59:05,000 --> 00:59:10,000
And we want to minimize it, so we want to find the least sum of squares.
431
00:59:10,000 --> 00:59:16,000
And the word total comes from the fact that we minimize the orthogonal distances.
432
00:59:16,000 --> 00:59:26,000
So the next question is how can we solve this optimization problem?
433
00:59:26,000 --> 00:59:34,000
That means how can we find the values of n and c without blindly searching over values?
434
00:59:34,000 --> 00:59:46,000
The theory of mathematical optimization states that we have to build the so-called Lagrange function.
435
00:59:46,000 --> 00:59:53,000
The Lagrange function is a function that is composed out of the original function that we want to minimize.
436
00:59:53,000 --> 00:59:56,000
That means the sum over the d_i squared.
437
00:59:56,000 --> 01:00:00,000
And adds an additional term for each constraint.
438
01:00:00,000 --> 01:00:11,000
First we have to reformulate the constraints in such a way that they have the form: some function that depends on n and c equals 0.
439
01:00:11,000 --> 01:00:17,000
We can get that by subtracting 1 in the constraint.
440
01:00:17,000 --> 01:00:27,000
And then we add that constraint to the Lagrange function and multiply it with an additional variable, which is called the Lagrange variable.
441
01:00:27,000 --> 01:00:35,000
In this case it is called lambda. So for each constraint we get one of these Lagrange multipliers.
442
01:00:35,000 --> 01:00:48,000
So the Lagrange function in this case becomes the sum of the d_i squared, minus lambda times (the dot product of n with itself, minus 1).
443
01:00:48,000 --> 01:01:04,000
So if we substitute d_i by the way we evaluate it, namely as the dot product of n and xi, plus c, then we get the result in the line below.
444
01:01:04,000 --> 01:01:23,000
Now optimization theory states that the solution of our optimization problem can be found by zeroing the partial derivatives of the Lagrange function with respect to the original variables of the problem.
445
01:01:23,000 --> 01:01:32,000
That means we have to calculate the partial derivative of the Lagrange function with respect to the parameter c.
446
01:01:32,000 --> 01:01:36,000
And with respect to the normal vector n.
447
01:01:36,000 --> 01:01:48,000
And for each of these derivatives we must zero them and then if we are lucky we can resolve the system of equations with respect to the unknown variables n and c.
448
01:01:48,000 --> 01:01:56,000
So let's do that. First let's start with the partial derivative of the Lagrange function with respect to c.
449
01:01:56,000 --> 01:02:23,000
Just using the basic mathematical rules for calculating derivatives, we get that this derivative is 2 times the sum of the dot products of n and xi, summing up over all pixels i, plus 2 times the number of pixels N times c.
450
01:02:23,000 --> 01:02:26,000
And this should be equal to zero.
451
01:02:26,000 --> 01:02:47,000
So if we zero it and resolve the resulting equation with respect to the unknown variable c, we get that c must be equal to minus 1 over N times the sum of the dot products of n and xi.
452
01:02:47,000 --> 01:03:04,000
Or if we like to rewrite it, we can also write it as minus 1 over N times the dot product of n and the sum of all xi.
453
01:03:04,000 --> 01:03:27,000
And if we want to rewrite that again, it's nothing else than minus the dot product of n and the average position of the pixels, since 1 over N times the sum of xi yields just the average position.
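In formulas, the two steps just described are:

$$\frac{\partial L}{\partial c} \;=\; 2\sum_{i=1}^{N}\langle n, x_i\rangle + 2Nc \;=\; 0 \quad\Longrightarrow\quad c \;=\; -\Bigl\langle n,\ \tfrac{1}{N}\textstyle\sum_{i=1}^{N} x_i \Bigr\rangle \;=\; -\langle n, \bar{x}\rangle,$$

where $\bar{x}$ is the average pixel position.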
454
01:03:27,000 --> 01:03:37,000
In the next step we calculate the partial derivatives of the Lagrange function with respect to the two elements of the vector n.
455
01:03:37,000 --> 01:03:48,000
That means with respect to n1 and n2. Again just using the basic rules for calculating derivatives we get the two equations here.
456
01:03:48,000 --> 01:03:53,000
We set both of those terms to zero.
457
01:03:53,000 --> 01:04:05,000
After that we substitute c by our previous result. That means by minus the dot product of n and the average x position or the average pixel position.
458
01:04:05,000 --> 01:04:29,000
So 1 over N times the sum of xi. And afterwards we rearrange all the terms in such a way that we have terms like something times n1 plus something else times n2 equals lambda times n1 or lambda times n2, respectively.
459
01:04:29,000 --> 01:04:45,000
And by doing that we get the terms at the bottom of the slide. So as we can see those prefactors which are abbreviated with alpha, beta and gamma they only depend on the pixel coordinates.
460
01:04:45,000 --> 01:04:57,000
They only depend on the xi values. Please note that here for the xi values the second index refers to the first or second entry of the xi vector.
461
01:04:57,000 --> 01:05:07,000
So i refers to the index of the pixel that we address and 1 and 2 as the second index refers to the position in the vector.
462
01:05:07,000 --> 01:05:13,000
So the first element in the vector or the second element in the vector.
463
01:05:13,000 --> 01:05:25,000
So these terms alpha, beta and gamma only depend on the pixel coordinates that we know that we can derive or that we can calculate from the list of pixel positions that we have.
464
01:05:25,000 --> 01:05:47,000
So at the end, if we simplify everything, we get the equations: the row vector alpha, beta times n equals lambda times the first element of n, and the row vector beta, gamma times n equals lambda times n2.
465
01:05:47,000 --> 01:06:02,000
So by rewriting that as a matrix vector equation we get that the matrix, the symmetric matrix that contains the entries alpha, beta, beta and gamma times the normal vector n is equal to lambda times n.
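In matrix-vector notation, the condition just derived reads:

$$\begin{pmatrix} \alpha & \beta \\ \beta & \gamma \end{pmatrix} n \;=\; \lambda\, n$$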
466
01:06:02,000 --> 01:06:09,000
So we have to search for an appropriate vector n and an appropriate value lambda.
467
01:06:09,000 --> 01:06:21,000
And if you go back to your math class you should have learned that these kinds of problems are known as eigenvalue eigenvector problems in mathematics.
468
01:06:21,000 --> 01:06:33,000
So n is known as a so-called eigenvector and lambda as a so-called eigenvalue.
469
01:06:33,000 --> 01:06:54,000
So that means what we have to do is solve this eigenvector-eigenvalue problem and find an appropriate eigenvalue lambda and a corresponding eigenvector n.
470
01:06:54,000 --> 01:07:10,000
From the mathematical theory of eigenvalues and eigenvectors it turns out that if we have found an eigenvector n for this matrix, then we can also scale this eigenvector as we like.
471
01:07:10,000 --> 01:07:19,000
So if n is an eigenvector then 2 times n is also an eigenvector and 3 times n and 4 times n and 0.1 times n as well.
472
01:07:19,000 --> 01:07:28,000
Since we know that n in our case must be a unit vector we can choose an eigenvector that has unit length.
473
01:07:28,000 --> 01:07:40,000
That means we search an eigenvector and if it doesn't have unit length then we scale it in such a way that the scaled vector has unit length.
474
01:07:40,000 --> 01:07:51,000
However, still there are two solutions. For these kinds of matrices, as we find them here, we always find two real eigenvalues.
475
01:07:51,000 --> 01:08:03,000
So two numbers lambda1 and lambda2 and corresponding eigenvectors for these two values, which solve the system of equations.
476
01:08:03,000 --> 01:08:14,000
For these kinds of matrices it turns out that all the eigenvalues are real-valued and that all the eigenvalues are non-negative.
477
01:08:14,000 --> 01:08:18,000
They might be zero, they might be positive, but they are never negative.
478
01:08:18,000 --> 01:08:33,000
That is a result from mathematics which I don't want to derive here, which is maybe also a little bit beyond the scope of a basic mathematics class, but it's still true.
479
01:08:33,000 --> 01:08:41,000
So if you are interested in the result then consider the literature for it to check how this result is derived.
480
01:08:41,000 --> 01:08:57,000
But the fact is: these kinds of matrices always have as many eigenvalues as there are rows and columns, and they are all non-negative real numbers.
481
01:08:57,000 --> 01:09:02,000
So which of those eigenvalues should we choose?
482
01:09:02,000 --> 01:09:17,000
Well a little bit of geometric analysis turns out that the smaller eigenvalue, so in this case lambda2, minimizes the distances.
483
01:09:17,000 --> 01:09:23,000
While lambda1, the larger eigenvalue, maximizes the distances somehow.
484
01:09:23,000 --> 01:09:28,000
And we don't want to maximize the distances, we want to minimize them.
485
01:09:28,000 --> 01:09:45,000
So we choose the smaller eigenvalue lambda2 and for that eigenvalue lambda2 we calculate the corresponding eigenvector with unit length which provides the vector n.
486
01:09:45,000 --> 01:09:52,000
So calculating eigenvectors and eigenvalues is not that difficult for a 2x2 matrix.
487
01:09:52,000 --> 01:10:06,000
If you are interested, look in the mathematics literature; then you will find that you just have to solve a quadratic equation.
488
01:10:06,000 --> 01:10:14,000
And that's always possible and therefore it is not that difficult to solve that here.
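To spell this out (standard linear algebra, not shown on the slide): the quadratic equation is the characteristic polynomial of the symmetric 2x2 matrix, and its two solutions are

$$\lambda_{1,2} \;=\; \frac{(\alpha+\gamma) \pm \sqrt{(\alpha-\gamma)^2 + 4\beta^2}}{2}.$$

The term under the square root is never negative, which already shows that both eigenvalues are real.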
489
01:10:14,000 --> 01:10:25,000
If you don't want to derive it yourself, you find software libraries that are doing this job for you in more or less all software packages on linear algebra.
490
01:10:25,000 --> 01:10:33,000
Of course including Matlab which also provides a method to do that.
491
01:10:33,000 --> 01:10:42,000
So now let's summarize this total least squares method to estimate the best line that fits a certain set of points.
492
01:10:42,000 --> 01:10:51,000
The first step is that we need to collect the positions of all the pixels that we want to consider.
493
01:10:51,000 --> 01:10:55,000
So what we have to do is we have to calculate some sums.
494
01:10:55,000 --> 01:11:09,000
The sum over all x positions, the sum over all y positions, the sum over the squares of x positions, the sum over the squares of y positions and the sum of the mixed x, y product.
495
01:11:09,000 --> 01:11:19,000
And these sums are needed to establish these entries alpha, beta and gamma for the matrix that we need to create.
496
01:11:19,000 --> 01:11:34,000
Then in the second step we create this matrix, alpha, beta, beta, gamma and calculate the eigenvalues lambda and the eigenvectors that are corresponding to these eigenvalues.
497
01:11:34,000 --> 01:11:46,000
We select the smaller among the two eigenvalues and pick the appropriate eigenvector for that eigenvalue and normalize it to unit length.
498
01:11:46,000 --> 01:11:52,000
Once we have done that we calculate the offset value c based on n.
499
01:11:52,000 --> 01:12:01,000
So just by taking c equal to minus the dot product of n and the average pixel position.
500
01:12:01,000 --> 01:12:09,000
And then you are done. Then you know the line that represents a certain set of pixels.
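As an illustration of the summarized procedure, here is a minimal sketch in Python with NumPy. The function name and interface are my own choice; the matrix is built from centered coordinates, which is equivalent to the alpha, beta, gamma sums after substituting c:

```python
import numpy as np

def fit_line_tls(points):
    """Total least squares line fit.

    points: (N, 2) array of pixel positions x_i.
    Returns (n, c) with unit normal n and offset c of the line {x : <n, x> + c = 0}.
    """
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)        # average pixel position
    d = points - mean                 # centered coordinates

    # Entries of the symmetric 2x2 matrix, built from sums over the pixels.
    alpha = np.sum(d[:, 0] ** 2)
    beta = np.sum(d[:, 0] * d[:, 1])
    gamma = np.sum(d[:, 1] ** 2)
    M = np.array([[alpha, beta], [beta, gamma]])

    # eigh returns eigenvalues in ascending order and unit-length eigenvectors,
    # so column 0 is the eigenvector of the smaller eigenvalue.
    eigenvalues, eigenvectors = np.linalg.eigh(M)
    n = eigenvectors[:, 0]

    c = -np.dot(n, mean)              # c = -<n, average position>
    return n, c
```

A short usage note: for points scattered around a vertical line, fit_line_tls returns a normal close to (1, 0), a case that ordinary least squares could not represent.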
501
01:12:09,000 --> 01:12:18,000
But still you don't know where the endpoints are, if you are interested in a line segment and not an infinitely extended line.
502
01:12:18,000 --> 01:12:27,000
If you want to do that then you need additional techniques which you will find in the assignments and the exercise classes.
503
01:12:27,000 --> 01:12:40,000
For that we analyze where the pixels are located along the line and then select which points are the endpoints.
504
01:12:40,000 --> 01:12:44,000
So have a look at the result of this calculation.
505
01:12:44,000 --> 01:12:50,000
So here is a small clipping of this example image of the main building of KIT.
506
01:12:50,000 --> 01:12:59,000
The left upper image shows the result of the polyline segmentation step of the Ramer-Douglas-Peucker algorithm.
507
01:12:59,000 --> 01:13:12,000
And for instance at that position here we find that the line that is actually creating these edges is vertical.
508
01:13:12,000 --> 01:13:19,000
But these lines that connect endpoints are not vertical but they are skewed a little bit.
509
01:13:19,000 --> 01:13:28,000
After we have fitted all those lines again with the total least squares approach we find that the situation has improved.
510
01:13:28,000 --> 01:13:39,000
And that now the lines that we have estimated are really aligned with the structure that they should represent.
511
01:13:39,000 --> 01:13:45,000
So it is a small improvement but it is an improvement.
512
01:13:45,000 --> 01:13:53,000
To summarize this part of total least squares let us make a comparison.
513
01:13:53,000 --> 01:13:57,000
You might know already least squares methods from other lectures.
514
01:13:57,000 --> 01:14:03,000
For instance from statistics lectures in which as well least squares methods are often used.
515
01:14:03,000 --> 01:14:08,000
Those are typically so-called ordinary least squares methods.
516
01:14:08,000 --> 01:14:19,000
And the question is what makes the difference between the total least squares method that we consider here and these ordinary least squares methods that you might know from other lectures.
517
01:14:19,000 --> 01:14:27,000
Well, the task itself is already a little bit different in the two methods.
518
01:14:27,000 --> 01:14:38,000
In the total least squares approach we deal with points in the two dimensional plane and we don't care about the orientation of the coordinate system in that plane.
519
01:14:38,000 --> 01:14:44,000
So we don't treat the x-axis in a different way than the y-axis.
520
01:14:44,000 --> 01:14:50,000
The result should be independent of the choice of the coordinate axes.
521
01:14:50,000 --> 01:15:02,000
While in the ordinary least squares we are not fitting a line in a two dimensional plane but we fit a linear function to a set of points.
522
01:15:02,000 --> 01:15:06,000
And here x and y have different meanings.
523
01:15:06,000 --> 01:15:11,000
So x is the independent variable, y is the dependent variable.
524
01:15:11,000 --> 01:15:17,000
And we want to find the relationship in which way y depends on x.
525
01:15:17,000 --> 01:15:20,000
So we treat y as a function of x.
526
01:15:20,000 --> 01:15:23,000
So that's something different.
527
01:15:23,000 --> 01:15:25,000
We are not interested in the line as such.
528
01:15:25,000 --> 01:15:33,000
We are interested in a linear function and the line is just the graph that visualizes the linear function.
529
01:15:33,000 --> 01:15:46,000
As a consequence in the ordinary least squares approach, the graph that we get, the function that we get, of course depends on the choice of the coordinate system.
530
01:15:46,000 --> 01:15:50,000
So that's relevant.
531
01:15:50,000 --> 01:15:57,000
So the method is anisotropic, if you want to state it in one word.
532
01:15:57,000 --> 01:16:04,000
While in the total least squares the result is independent of the choice of coordinate systems in the two dimensional plane.
533
01:16:04,000 --> 01:16:07,000
So this result is isotropic.
534
01:16:07,000 --> 01:16:11,000
Yeah, and what do we minimize?
535
01:16:11,000 --> 01:16:18,000
In the total least squares as we have seen we minimize the orthogonal distance of the points to the line.
536
01:16:18,000 --> 01:16:22,000
In the ordinary least squares we do not minimize an orthogonal distance.
537
01:16:22,000 --> 01:16:38,000
What we minimize is the distance between the point and the linear function just considering the difference in y values, not considering differences in x values.
538
01:16:38,000 --> 01:16:53,000
And that, of course, is an important difference so that the result that you would get with ordinary least squares looks different than what you get with total least squares.
539
01:16:53,000 --> 01:17:04,000
And furthermore with ordinary least squares you will not be able to fit something like vertical lines.
540
01:17:04,000 --> 01:17:09,000
With total least squares no problem.
541
01:17:09,000 --> 01:17:16,000
So if you are dealing with least squares methods always check which case you are facing.
542
01:17:16,000 --> 01:17:21,000
Are you facing a case where the coordinate system doesn't matter?
543
01:17:21,000 --> 01:17:25,000
Then choose total least squares.
544
01:17:25,000 --> 01:17:38,000
If you are treating a case where you have a functional relationship between y and x then choose ordinary least squares.
545
01:17:38,000 --> 01:17:42,000
The total least squares approach has many advantages.
546
01:17:42,000 --> 01:17:45,000
It can be calculated very efficiently.
547
01:17:45,000 --> 01:17:47,000
It yields nice results.
548
01:17:47,000 --> 01:17:54,000
However, it suffers from one problem because it is not robust with respect to outliers.
549
01:17:54,000 --> 01:17:55,000
What is an outlier?
550
01:17:55,000 --> 01:18:01,000
Well, an outlier is a point that deviates very much from the line.
551
01:18:01,000 --> 01:18:10,000
That means if we consider the points which you find below, total least squares performs well and there is no problem.
552
01:18:10,000 --> 01:18:14,000
But there is also not a single outlier there.
553
01:18:14,000 --> 01:18:20,000
If we move one point to that position, this point becomes an outlier.
554
01:18:20,000 --> 01:18:25,000
It is somehow an erroneous point, if you like.
555
01:18:25,000 --> 01:18:34,000
And total least squares is not good at dealing with these points, because the estimate that it would yield looks more like that.
556
01:18:34,000 --> 01:18:41,000
And as you can see, the line is skewed and does not really fit the majority of points anymore.
557
01:18:41,000 --> 01:18:47,000
So what we need is a method that is more robust with respect to these outliers.
558
01:18:47,000 --> 01:18:54,000
So that a single outlier does not have such a strong impact on the solution.
559
01:18:54,000 --> 01:18:57,000
There are several ways to implement these ideas.
560
01:18:57,000 --> 01:19:01,000
One idea is to reduce the influence of outliers.
561
01:19:01,000 --> 01:19:10,000
That means those methods still consider outliers and outliers still have an influence, but not as strong as with total least squares.
562
01:19:10,000 --> 01:19:15,000
One of those methods is called M estimators and it is discussed in the lecture.
563
01:19:15,000 --> 01:19:19,000
Another way is to ignore outliers completely.
564
01:19:19,000 --> 01:19:21,000
There are as well several methods.
565
01:19:21,000 --> 01:19:25,000
One of them is called RANSAC, which will be discussed in the lecture.
566
01:19:25,000 --> 01:19:33,000
So first of all let's have a look at M estimators.
567
01:19:33,000 --> 01:19:41,000
When we consider total least squares we minimize the sum over the squared distances.
568
01:19:41,000 --> 01:19:49,000
So the squared distance is plotted on the right hand side with a red curve.
569
01:19:49,000 --> 01:19:53,000
This one here.
570
01:19:53,000 --> 01:20:06,000
And obviously, the larger the distance is, that means the farther away from zero on the horizontal axis we are, the more strongly di^2 increases.
571
01:20:06,000 --> 01:20:14,000
That means for small deviations di^2 still is small.
572
01:20:14,000 --> 01:20:22,000
For large distances di, di^2 becomes huge.
573
01:20:22,000 --> 01:20:28,000
Now we might think why do we use the square function here?
574
01:20:28,000 --> 01:20:33,000
Can't we use functions that do not grow that quickly as the square function?
575
01:20:33,000 --> 01:20:36,000
And this gives the idea for M estimators.
576
01:20:36,000 --> 01:20:50,000
So instead of using the square function, we use another function, say rho, which is often called the loss function of the estimator, and apply rho to di.
577
01:20:50,000 --> 01:20:56,000
And for rho we use functions that do not grow as quickly as the square function.
578
01:20:56,000 --> 01:21:04,000
But still in the vicinity of zero we want to preserve the nice properties of the square function,
579
01:21:04,000 --> 01:21:15,000
so we search for those functions which have similar behavior for small values and which do not grow that quickly for large values of di.
580
01:21:15,000 --> 01:21:21,000
On the right hand side we see several possible options and their formulas.
581
01:21:21,000 --> 01:21:28,000
So the standard choice, as in total least squares, would be the square function, which is shown here.
582
01:21:28,000 --> 01:21:36,000
So the farther away from zero we get, the more strongly the square function grows.
583
01:21:36,000 --> 01:21:42,000
The first way to reduce this influence is to use the so-called Huber function.
584
01:21:42,000 --> 01:21:45,000
The Huber function is drawn in blue.
585
01:21:45,000 --> 01:21:54,000
The Huber function is exactly the same as the squared error term in the vicinity of zero.
586
01:21:54,000 --> 01:22:03,000
But from a certain point on, from here on, it does not grow quadratically, but only linearly.
587
01:22:03,000 --> 01:22:07,000
So this can be also seen here in the formula.
588
01:22:07,000 --> 01:22:18,000
In the vicinity of zero, in the first of these two cases, the function grows with the square of d.
589
01:22:18,000 --> 01:22:29,000
And in the second case, if we are away from zero, then the growth depends linearly on d.
590
01:22:29,000 --> 01:22:40,000
The parameter k, that defines the switching point between quadratic and linear behavior,
591
01:22:40,000 --> 01:22:44,000
has to be chosen manually according to the problem.
592
01:22:44,000 --> 01:22:53,000
So k has to be chosen in such a way that typical inliers have distances smaller than k,
593
01:22:53,000 --> 01:22:57,000
and typical outliers have distances larger than k.
594
01:22:57,000 --> 01:23:03,000
And obviously the growth of the Huber function is smaller than the growth of the square function.
595
01:23:03,000 --> 01:23:17,000
That means points with large distances, outliers, have smaller impact on the solution than for the squared loss function.
596
01:23:17,000 --> 01:23:29,000
We can even further decrease the influence of large outliers using other error terms.
597
01:23:29,000 --> 01:23:35,000
For instance, one choice would be the Cauchy loss function that is plotted in green.
598
01:23:35,000 --> 01:23:43,000
In the vicinity of zero, it approximates the squared error term.
599
01:23:43,000 --> 01:23:51,000
But for larger values of d, it does not grow that much anymore.
600
01:23:51,000 --> 01:24:03,000
So the growth itself, so the slope of the function decreases for larger values of d.
601
01:24:03,000 --> 01:24:11,000
That means here, gross outliers have even less influence on the result than for the Huber loss function.
602
01:24:11,000 --> 01:24:21,000
And another choice, which is even more strict than Cauchy, is the Tukey loss function plotted in violet,
603
01:24:21,000 --> 01:24:30,000
where points which are far away do not have a stronger impact than points, for instance, at this position.
604
01:24:30,000 --> 01:24:37,000
So here the influence of gross outliers is limited, in fact completely bounded.
605
01:24:37,000 --> 01:24:45,000
All these functions have some parameters, so for instance the kappa in the case of the Cauchy function,
606
01:24:45,000 --> 01:24:51,000
the k in case of the Huber function, the a in case of the Tukey function,
607
01:24:51,000 --> 01:25:03,000
and those values have to be chosen appropriately depending on the problem.
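To make the three choices concrete, here is a small sketch in Python with NumPy. The parameter names k, kappa and a follow the lecture; the concrete formulas are common textbook variants that match the described behavior (quadratic near zero, damped growth further out), so treat them as assumptions rather than the exact slide formulas:

```python
import numpy as np

def huber(d, k=1.0):
    # Quadratic up to |d| = k, then linear growth; continuous at the switch point.
    d = np.abs(d)
    return np.where(d <= k, d**2, k * (2.0 * d - k))

def cauchy(d, kappa=1.0):
    # Approximately d^2 near zero, only logarithmic growth for large |d|.
    return kappa**2 * np.log1p((d / kappa) ** 2)

def tukey(d, a=1.0):
    # Biweight: bounded loss, so gross outliers beyond |d| = a all count the same.
    d = np.abs(d)
    return np.where(d <= a, (a**2 / 6.0) * (1.0 - (1.0 - (d / a) ** 2) ** 3), a**2 / 6.0)
```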
608
01:25:03,000 --> 01:25:08,000
That was the M estimator. The second option is to ignore outliers completely.
609
01:25:08,000 --> 01:25:15,000
And how can we implement that? Well, we could also say it's a kind of M estimator,
610
01:25:15,000 --> 01:25:22,000
but this M estimator completely ignores points which are farther away than a certain threshold.
611
01:25:22,000 --> 01:25:29,000
That means we could say we want to minimize the sum over sigma of di,
612
01:25:29,000 --> 01:25:36,000
where sigma is a function that is equal to zero if the distance is smaller than a threshold,
613
01:25:36,000 --> 01:25:41,000
or it's one if the distance is larger than a threshold.
614
01:25:41,000 --> 01:25:49,000
That means what we do with this kind of loss function is we just count the outliers.
615
01:25:49,000 --> 01:25:56,000
We just count those points which are farther away from the line than a certain threshold theta.
616
01:25:56,000 --> 01:26:04,000
Theta again is a parameter that we have to choose appropriately, depending on the application.
617
01:26:04,000 --> 01:26:08,000
So here the distance itself doesn't matter at all.
618
01:26:08,000 --> 01:26:15,000
It just matters whether it's larger or smaller than the threshold.
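As a formula, this counting loss is:

$$\sigma(d) \;=\; \begin{cases} 0 & \text{if } |d| \le \theta \\ 1 & \text{if } |d| > \theta \end{cases} \qquad\text{and we minimize}\qquad \sum_{i=1}^{N} \sigma(d_i),$$

which is exactly the number of outliers.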
619
01:26:15,000 --> 01:26:22,000
So for this case here, that we can see here at the bottom,
620
01:26:22,000 --> 01:26:31,000
we can argue that for this choice of line, if we assume a certain threshold size theta,
621
01:26:31,000 --> 01:26:39,000
which is shown here by the dashed lines, so the dashed lines have a distance from the solid line of exactly theta,
622
01:26:39,000 --> 01:26:42,000
we just count the number of inliers and the number of outliers.
623
01:26:42,000 --> 01:26:50,000
So in this case the number of outliers is two. So the error term will yield an error of two.
624
01:26:50,000 --> 01:27:01,000
For that choice that we can see here, well if we count how many points are outside of this band between the two dashed lines,
625
01:27:01,000 --> 01:27:10,000
we find it's eight points. So the error term is eight. That's a bad fit.
626
01:27:10,000 --> 01:27:15,000
This idea is known in literature as the RANSAC approach.
627
01:27:15,000 --> 01:27:21,000
RANSAC stands for random sample consensus. So why random sample consensus?
628
01:27:21,000 --> 01:27:27,000
Because the algorithm that implements this idea uses some randomization.
629
01:27:27,000 --> 01:27:33,000
It randomly picks two points from the set of edge pixels.
630
01:27:33,000 --> 01:27:38,000
Then it creates a line that passes through these two points.
631
01:27:38,000 --> 01:27:43,000
Afterwards it checks how many outliers we have for that line.
632
01:27:43,000 --> 01:27:49,000
And this process is repeated again and again. And since we always pick randomly two points,
633
01:27:49,000 --> 01:27:54,000
the solutions that we get in each trial will be different.
634
01:27:54,000 --> 01:28:03,000
And at the end we check which of the trials yielded the best solution, the smallest number of outliers.
635
01:28:03,000 --> 01:28:06,000
And that's the solution of the RANSAC algorithm.
636
01:28:06,000 --> 01:28:12,000
Of course, if we like, we can add one additional step.
637
01:28:12,000 --> 01:28:17,000
Once we have found the best solution, we know which points are outliers and which points are inliers.
638
01:28:17,000 --> 01:28:30,000
Based on that, we could do a total least squares fit only on the inliers to obtain an even better solution.
639
01:28:30,000 --> 01:28:36,000
Here the procedure is shown as a kind of processing flowchart.
640
01:28:36,000 --> 01:28:45,000
So we start in the first trial, pick randomly two points, estimate the line parameters, evaluate the error and obtain the number of outliers.
641
01:28:45,000 --> 01:28:51,000
Then we repeat that in a second trial, independent of the first trial and so on and so on.
642
01:28:51,000 --> 01:28:53,000
Up to the K-th trial.
643
01:28:53,000 --> 01:29:00,000
Usually based on an assumption of how many outliers we typically expect to have,
644
01:29:00,000 --> 01:29:15,000
we determine the number of trials in such a way that the probability of randomly picking two outliers all the time in all the trials is so small that we can neglect it.
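One standard way to make this precise (not spelled out here): if w is the expected fraction of inliers, one trial picks two inliers with probability w^2, and all K trials fail with probability (1 - w^2)^K. Choosing

$$K \;\ge\; \frac{\log(1-p)}{\log(1-w^2)}$$

ensures that with probability at least p, say p = 0.99, at least one trial uses two inliers.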
645
01:29:15,000 --> 01:29:26,000
After we have done that, we pick the trial with the smallest number of outliers and that yields the line parameters.
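Here is a minimal sketch of this procedure in Python with NumPy; the function name and interface are my own choice, and the line through the two picked points is represented in the normal form used throughout this chapter:

```python
import numpy as np

def ransac_line(points, theta, num_trials, seed=None):
    """RANSAC line fit: repeatedly fit a line through two randomly picked
    points and keep the trial with the fewest outliers.

    points: (N, 2) array; theta: acceptance threshold.
    Returns (n, c) of the best line {x : <n, x> + c = 0}.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    best_line, fewest_outliers = None, np.inf

    for _ in range(num_trials):
        # Randomly pick two distinct points and build the line through them.
        i, j = rng.choice(len(points), size=2, replace=False)
        direction = points[j] - points[i]
        length = np.linalg.norm(direction)
        if length == 0.0:
            continue                      # degenerate pick, skip this trial
        n = np.array([-direction[1], direction[0]]) / length  # unit normal
        c = -np.dot(n, points[i])

        # Count the points farther away from the line than theta.
        outliers = np.count_nonzero(np.abs(points @ n + c) > theta)
        if outliers < fewest_outliers:
            fewest_outliers, best_line = outliers, (n, c)

    return best_line
```

As mentioned above, one could append a final total least squares fit on the inliers of the winning trial to refine the result.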
646
01:29:26,000 --> 01:29:31,000
Here is an example again, a sequence of points.
647
01:29:31,000 --> 01:29:40,000
In the first trial, we might pick the two red points and fit a line to them and then check how many outliers do we get.
648
01:29:40,000 --> 01:29:45,000
In this case, for that choice of theta, we get six outliers.
649
01:29:45,000 --> 01:29:54,000
So six points lie outside of this band that is bounded by the dashed lines.
650
01:29:54,000 --> 01:30:03,000
Independent of this first choice, we repeat the process with a second trial, and in the second trial we randomly pick another two points.
651
01:30:03,000 --> 01:30:09,000
So these might be the points which are now shown in red.
652
01:30:09,000 --> 01:30:16,000
Again, we connect these two points, fit a line to them and count the number of outliers.
653
01:30:16,000 --> 01:30:26,000
In that case, we find seven points outside of the band which is bounded by the two dashed lines.
654
01:30:26,000 --> 01:30:36,000
In a third trial, we might pick these two points shown in red, estimate a line and count the outliers equal to three.
655
01:30:36,000 --> 01:30:44,000
And in a fourth trial, we might pick those two points, where unfortunately one of them is an outlier.
656
01:30:44,000 --> 01:30:50,000
We again estimate the line that connects both, count the number of outliers and get ten.
657
01:30:50,000 --> 01:31:00,000
So at the end, we select the trial with the smallest number of outliers and we find, yeah, it was the trial number three.
658
01:31:00,000 --> 01:31:11,000
That means that becomes the overall solution of the RANSAC estimate.
659
01:31:11,000 --> 01:31:18,000
Let us compare the properties of the two estimators for robust estimation, the M estimator and the RANSAC.
660
01:31:18,000 --> 01:31:29,000
The idea in the M estimator is to use a loss function that puts less influence on points that are far away on the outliers.
661
01:31:29,000 --> 01:31:38,000
So the outliers still have an influence on the solution, but their influence is smaller than in the sum of squares approach.
662
01:31:38,000 --> 01:31:42,000
In RANSAC, we completely ignore the outliers.
663
01:31:42,000 --> 01:31:47,000
We just consider inliers and ignore the outliers completely.
664
01:31:47,000 --> 01:31:52,000
So it doesn't matter how far an outlier is away from the line.
665
01:31:52,000 --> 01:32:00,000
The influence does not depend on that. The influence is just zero.
666
01:32:00,000 --> 01:32:10,000
The parameters which we need to determine to use these methods are: for the M estimator, we have to choose the error term or the loss function
667
01:32:10,000 --> 01:32:14,000
and maybe the width parameter of the loss function.
668
01:32:14,000 --> 01:32:25,000
For RANSAC, we need to choose an acceptance threshold and we need to determine the number of trials for which we run RANSAC.
669
01:32:25,000 --> 01:32:39,000
And which algorithm do we use? For the M estimator it becomes a numerical optimizer that minimizes a kind of weighted least squares.
670
01:32:39,000 --> 01:32:49,000
So the implementation of the M estimator is traced back to the total least squares method, but with some additional weights for all the points.
671
01:32:49,000 --> 01:33:01,000
And that yields a numeric algorithm that calculates the solution in an iterative manner.
672
01:33:01,000 --> 01:33:08,000
For RANSAC, we make repeated guesses from pairs of points and then check the number of outliers.
673
01:33:08,000 --> 01:33:24,000
So we have implemented, together with Ahivi, a demo tool that allows you to play around with these methods and to check how the solutions change.
674
01:33:24,000 --> 01:33:31,000
Unfortunately, in this slideshow, it is hard to integrate this tool.
675
01:33:31,000 --> 01:33:47,000
Therefore, I will create an extra video with just this demo tool and put it on ILIAS. So please check this extra video with the demo tool.
676
01:33:47,000 --> 01:33:56,000
Now we have seen how we can fit lines and how we can determine polylines from edge images.
677
01:33:56,000 --> 01:34:01,000
However, lines are not the only structures that are relevant in practice.
678
01:34:01,000 --> 01:34:08,000
Often we want to fit circles or pieces of circles and ellipses to structures.
679
01:34:08,000 --> 01:34:15,000
So the question is how can we fit those structures to edge lists?
680
01:34:15,000 --> 01:34:22,000
Well, have a look at those on the subsequent slides.
681
01:34:22,000 --> 01:34:28,000
First of all, we need to know how we can determine or how can we represent circles and ellipses.
682
01:34:28,000 --> 01:34:35,000
So for lines we already know that there is a normal form which is very suitable for the estimation task.
683
01:34:35,000 --> 01:34:39,000
For circles, the representation must be different.
684
01:34:39,000 --> 01:34:46,000
In a classical way, a circle is represented as a set of points that all have the same distance from a center.
685
01:34:46,000 --> 01:34:53,000
So what we need to know is the center point and the radius of the circle.
686
01:34:53,000 --> 01:34:56,000
So this classical representation would look like that.
687
01:34:56,000 --> 01:35:08,000
So if we want to consider all points x, y, which are on a certain circle, which is defined by its midpoint, m1, m2, and its radius r,
688
01:35:08,000 --> 01:35:20,000
then we would say the set of all points for which x minus m1 squared plus y minus m2 squared minus r squared equals 0 describes all those points.
689
01:35:20,000 --> 01:35:32,000
Or in other words, the set of all points x, y, which have a Euclidean distance from the center equal to r, describes that circle.
690
01:35:32,000 --> 01:35:41,000
If we want to determine the distance of a point from the circle, what do we do?
691
01:35:41,000 --> 01:35:51,000
Say we want to consider this point vector x and calculate the smallest distance of that point from the circle.
692
01:35:51,000 --> 01:36:00,000
What we do is we calculate the distance of that point from the center point of the circle and subtract the radius.
693
01:36:00,000 --> 01:36:14,000
That means the Euclidean distance of a point x, y from the circle becomes the square root of x minus m1 squared plus y minus m2 squared minus r.
694
01:36:14,000 --> 01:36:20,000
This is a signed number, of course, so the sign tells us whether we are inside or outside of the circle.
695
01:36:20,000 --> 01:36:26,000
If we are not interested in the sign itself, we take the absolute value of this number.
696
01:36:26,000 --> 01:36:38,000
So the first term with the square root is just the Euclidean distance between the point x and the center point, and this distance is compared with the radius of the circle.
697
01:36:38,000 --> 01:36:44,000
This is known as the Euclidean distance of a point from the circle.
698
01:36:44,000 --> 01:36:52,000
However, in practice it is useful to consider also another kind of distance measure,
699
01:36:52,000 --> 01:37:01,000
which is not a Euclidean distance for sure, but a related distance, which is also known as the algebraic distance.
700
01:37:01,000 --> 01:37:12,000
In that distance we do not calculate the Euclidean distance between x and m, between the point x and the center point, but the squared Euclidean distance,
701
01:37:12,000 --> 01:37:16,000
and subtract the squared radius.
702
01:37:16,000 --> 01:37:22,000
Of course, for all points on the circle this distance also is equal to zero.
703
01:37:22,000 --> 01:37:33,000
And for all the points which are not exactly on the circle, this distance grows with increasing distance from the circle.
704
01:37:33,000 --> 01:37:34,000
That makes sense.
705
01:37:34,000 --> 01:37:38,000
But indeed it is not the same as the Euclidean distance.
706
01:37:38,000 --> 01:37:49,000
But the advantage is that we get rid of the square root, and square roots are hard to deal with in efficient calculations.
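Side by side, the two measures look like this (a small sketch; variable names are my own):

```python
import numpy as np

def euclidean_circle_distance(x, y, m1, m2, r):
    # Signed Euclidean distance: distance from the center minus the radius.
    return np.sqrt((x - m1)**2 + (y - m2)**2) - r

def algebraic_circle_distance(x, y, m1, m2, r):
    # Squared distance from the center minus the squared radius; no square root.
    return (x - m1)**2 + (y - m2)**2 - r**2
```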
707
01:37:49,000 --> 01:38:04,000
If we compare these two distance measurements, we find that the distances that are calculated for certain points look like that.
708
01:38:04,000 --> 01:38:13,000
Here on the plot we see on the horizontal axis the distance or the position of a real point.
709
01:38:13,000 --> 01:38:19,000
Zero would mean the point is located in the center, at the center point of the circle.
710
01:38:19,000 --> 01:38:25,000
R means it is located on the circle, so at a distance of R from the center.
711
01:38:25,000 --> 01:38:31,000
And values larger than R mean the point is outside of the circle.
712
01:38:31,000 --> 01:38:42,000
And the vertical axis, or the two vertical axes, better to say, represent the distance measure that is given by either the Euclidean distance in blue,
713
01:38:42,000 --> 01:38:45,000
or the algebraic distance in red.
714
01:38:45,000 --> 01:38:53,000
For the Euclidean distance we obtain a result that is intuitive and matches what we expect.
715
01:38:53,000 --> 01:39:03,000
So the error increases in a linear way: the farther away from the circle we get, the larger the Euclidean distance is.
716
01:39:03,000 --> 01:39:14,000
Compared to that, the algebraic distance increases faster for points outside of the circle and increases not that fast for points inside of the circle.
717
01:39:14,000 --> 01:39:21,000
However, for points which are close to the circle, both distances are very similar.
718
01:39:21,000 --> 01:39:30,000
So both distance measures do not deviate considerably as long as we are close to the circle itself.
719
01:39:30,000 --> 01:39:39,000
But for points which are far away from the circle, both distances differ considerably.
720
01:39:39,000 --> 01:39:57,000
Now let's use again some method to estimate the circle parameters from a set of points, from a set of noisy points which are scattered around a circle.
721
01:39:57,000 --> 01:40:06,000
So here we find the points in black, and as we can see they deviate rather strongly from the real circle.
722
01:40:06,000 --> 01:40:11,000
So the impact of the algebraic distance here becomes apparent.
723
01:40:11,000 --> 01:40:18,000
And if we start to minimize the sum of squared distances of all the points,
724
01:40:18,000 --> 01:40:25,000
in order to obtain the parameters of the circle, we get two different solutions.
725
01:40:25,000 --> 01:40:38,000
If we use the Euclidean distance, we get a radius that is smaller than what we get when we use the algebraic distance to solve the minimization task.
726
01:40:38,000 --> 01:40:49,000
Of course, this is what we expect, because in the algebraic distance points that are outside of the circle have more impact on the solution,
727
01:40:49,000 --> 01:40:57,000
because the algebraic distance grows more rapidly for points outside of the circle than for points inside of the circle.
728
01:40:57,000 --> 01:41:01,000
And then we get a deviation like that, as we can see it here.
729
01:41:01,000 --> 01:41:09,000
However, if the points are not scattered around in such a strong manner as we can see it here in the example,
730
01:41:09,000 --> 01:41:18,000
but if the points are just scattered a little bit around the real circle, then the influence of the algebraic distance remains rather small,
731
01:41:18,000 --> 01:41:22,000
and the results are almost the same.
732
01:41:22,000 --> 01:41:31,000
That means if the points are spread a lot, then it is better to use the Euclidean distance, although the computation is more difficult.
733
01:41:31,000 --> 01:41:40,000
If the points are not spread very much around the circle, then the algebraic distance is very suitable.
734
01:41:40,000 --> 01:41:46,000
And the advantage of the algebraic distance is that the computation of the circle is much easier.
735
01:41:46,000 --> 01:41:58,000
So, if we want to minimize the squared Euclidean distance for a circle, then we cannot solve that analytically.
736
01:41:58,000 --> 01:42:03,000
We do not find a closed-form solution, but we need a numerical optimization.
737
01:42:03,000 --> 01:42:10,000
That means an iterative approach that needs several iterations to find the solution.
738
01:42:10,000 --> 01:42:16,000
For the algebraic distance, that is not the case, so we find a closed-form solution.
739
01:42:16,000 --> 01:42:20,000
For that purpose, we rewrite the algebraic distance a little bit.
740
01:42:20,000 --> 01:42:26,000
Instead of writing x minus m1 squared plus y minus m2 squared minus r squared,
741
01:42:26,000 --> 01:42:32,000
we can multiply these brackets and then rearrange all the terms.
742
01:42:32,000 --> 01:42:43,000
Then we get an equal representation: first, terms that do not depend on the parameters of the circle itself, that is x squared plus y squared,
743
01:42:43,000 --> 01:42:53,000
then some terms which do not depend on x and y, this is m1 squared plus m2 squared minus r squared,
744
01:42:53,000 --> 01:43:03,000
then some terms that depend on x, which is minus 2m1 times x, and some terms that depend on y, which is minus 2m2 times y.
745
01:43:03,000 --> 01:43:19,000
Now, if we rename minus 2 times m1 as a, minus 2 times m2 as b, and m1 squared plus m2 squared minus r squared as c,
746
01:43:19,000 --> 01:43:27,000
then we get an algebraic representation that has a form ax plus by plus c plus x squared plus y squared.
747
01:43:27,000 --> 01:43:34,000
We find that there is a unique mapping between the old parameters of the circle m1, m2, r,
748
01:43:34,000 --> 01:43:39,000
and the new parameters with which we represent the circle a, b, and c.
749
01:43:39,000 --> 01:43:51,000
So a is just minus 2 times m1, b is equal to minus 2 times m2, and c is equal to m1 squared plus m2 squared minus r squared.
750
01:43:51,000 --> 01:44:05,000
So there is a unique mapping between these two parameter sets, so that we can represent a circle either using m1, m2, and r, or a, b, and c.
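The mapping in both directions is:

$$a = -2m_1, \qquad b = -2m_2, \qquad c = m_1^2 + m_2^2 - r^2$$
$$m_1 = -\frac{a}{2}, \qquad m_2 = -\frac{b}{2}, \qquad r = \sqrt{m_1^2 + m_2^2 - c}$$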
751
01:44:05,000 --> 01:44:18,000
So now what we need to do is we want to minimize the sum of the squared algebraic distance of all the points that we consider for the estimation of the circle.
752
01:44:18,000 --> 01:44:27,000
So we sum up over all the points, so i is the index for each point, and then we take the squared algebraic distance of that point.
753
01:44:27,000 --> 01:44:36,000
Now we can again calculate the partial derivatives of this term with respect to a, b, and c,
754
01:44:36,000 --> 01:44:43,000
zero those partial derivatives, rewrite everything and arrange it as a matrix vector multiplication.
755
01:44:43,000 --> 01:44:52,000
And after some steps of calculation, which are not really difficult, but which are a little bit too lengthy to show them here on the slide,
756
01:44:52,000 --> 01:45:04,000
we get as a result a matrix on the left hand side, a 3 by 3 matrix, times the vector of unknown parameters a, b, and c, equals some vector on the right hand side.
757
01:45:04,000 --> 01:45:14,000
And we see that the matrix on the left hand side and the vector on the right hand side are built out of the coordinates of the pixels that we want to consider.
758
01:45:14,000 --> 01:45:27,000
So that means once we know the pixels that we want to use to estimate the circle parameters, we can first create this matrix and this vector on the right hand side.
759
01:45:27,000 --> 01:45:37,000
Then we get a system of linear equations and as long as this matrix has full rank, which is typically fulfilled if we consider at least three points,
760
01:45:37,000 --> 01:45:48,000
then we can resolve the system of equations and get solutions for the unknown parameters a, b, and c.
761
01:45:48,000 --> 01:46:02,000
Once we have found a, b, and c in such a way, we can calculate m1, m2, and r in a straightforward manner, as it is shown here.
762
01:46:02,000 --> 01:46:15,000
So that means we find a closed form solution in terms of a linear system of equations to calculate the unknown parameters of the circle.
763
01:46:15,000 --> 01:46:21,000
And that is computationally rather efficient.
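Here is a minimal sketch of this closed-form circle fit in Python with NumPy. The normal equations are written out from the derivative conditions described above, so the matrix may be arranged differently on the slide; the function name is my own choice:

```python
import numpy as np

def fit_circle_algebraic(points):
    """Closed-form circle fit minimizing the sum of squared algebraic distances.

    points: (N, 2) array.  Returns center (m1, m2) and radius r.
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    z = x**2 + y**2

    # Normal equations: partial derivatives w.r.t. a, b, c set to zero.
    M = np.array([
        [np.sum(x * x), np.sum(x * y), np.sum(x)],
        [np.sum(x * y), np.sum(y * y), np.sum(y)],
        [np.sum(x),     np.sum(y),     float(len(pts))],
    ])
    rhs = -np.array([np.sum(x * z), np.sum(y * z), np.sum(z)])

    a, b, c = np.linalg.solve(M, rhs)  # requires full rank, i.e. at least 3 suitable points

    # Map the parameters a, b, c back to center and radius.
    m1, m2 = -a / 2.0, -b / 2.0
    r = np.sqrt(m1**2 + m2**2 - c)
    return m1, m2, r
```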
764
01:46:21,000 --> 01:46:26,000
Let's have a look at how this kind of circle estimation works in practice.
765
01:46:26,000 --> 01:46:37,000
And for that purpose, let's consider some work that has been done in our group many years ago by Christoph Speck, a former PhD student.
766
01:46:37,000 --> 01:46:47,000
He actually was working in the field of forensic computer vision and one of his tasks was to analyze bullet casings.
767
01:46:47,000 --> 01:47:03,000
So bullet casings are say parts of bullets of guns and once you shoot with a gun, the bullet cases get some typical structures impressed by the gun.
768
01:47:03,000 --> 01:47:11,000
So that means if someone was murdered and police comes, they are typically interested in analyzing these bullet casings
769
01:47:11,000 --> 01:47:24,000
because the structures that you find on the bullet cases give some hint or give some cues on the gun with which the bullet casing was shot.
770
01:47:24,000 --> 01:47:34,000
That means once you find a gun, you can use that gun in an experimental shooting and then collect the bullet casings there as well
771
01:47:34,000 --> 01:47:38,000
and then compare whether the structures that you find are the same.
772
01:47:38,000 --> 01:47:49,000
And then you know whether the bullet was shot from that gun that you found or whether that gun was not the one that was used for shooting.
773
01:47:49,000 --> 01:47:58,000
So in this case, to give you an idea, mainly those structures here in the center matter.
774
01:47:58,000 --> 01:48:04,000
The structures which you see outside here are not that relevant, but the structures inside.
775
01:48:04,000 --> 01:48:10,000
Because they are created by the gun when the bullet is shot.
776
01:48:10,000 --> 01:48:15,000
And as you can see, we find a lot of circular structures here.
777
01:48:15,000 --> 01:48:27,000
And if you have a suitable microscope or a camera with suitable magnification, we get pictures like that.
778
01:48:27,000 --> 01:48:34,000
And one of the tasks to analyze these structures is to estimate the circles.
779
01:48:34,000 --> 01:48:45,000
On the right hand side, we get the circles that are estimated using first some kind of Hough transform for circles to get a basic idea where circles are located
780
01:48:45,000 --> 01:48:52,000
and afterwards use some circle fitting with the algebraic distance to get the final fit.
781
01:48:52,000 --> 01:48:58,000
And we clearly see that those circles are nicely fit by this method.
782
01:48:58,000 --> 01:49:02,000
Now let's extend our findings to ellipses.
783
01:49:02,000 --> 01:49:05,000
Ellipses can be seen as generalized circles.
784
01:49:05,000 --> 01:49:08,000
They also have a center like circles.
785
01:49:08,000 --> 01:49:15,000
They have radii, but ellipses have two radii, two different radii, not just one like circles.
786
01:49:15,000 --> 01:49:19,000
And they have a turning angle that describes the orientation of the ellipse.
787
01:49:19,000 --> 01:49:25,000
The figure on the right hand side illustrates these properties of ellipses.
788
01:49:25,000 --> 01:49:29,000
So here we have the center point of the ellipse.
789
01:49:29,000 --> 01:49:31,000
Then we have two main axes.
790
01:49:31,000 --> 01:49:41,000
The axis with a larger radius R1 and the orthogonal axis with a smaller radius R2.
791
01:49:41,000 --> 01:49:52,000
Furthermore, these two axes do not necessarily have the same orientation as the coordinate axes of that plane,
792
01:49:52,000 --> 01:49:55,000
but they might be turned with respect to those coordinate axes.
793
01:49:55,000 --> 01:50:07,000
And this turning between the coordinate axis and the main axis of the ellipse is given by this turning angle theta.
794
01:50:07,000 --> 01:50:14,000
That means in total we need five parameters R1, R2, the two parameters of the vector M,
795
01:50:14,000 --> 01:50:20,000
and the angle theta to describe an ellipse.
796
01:50:20,000 --> 01:50:28,000
However, in practice it is sometimes convenient to use other parameterizations of ellipses.
797
01:50:28,000 --> 01:50:36,000
So as we have seen it for circles, where the usage of the radius and the center is just one way to represent a circle,
798
01:50:36,000 --> 01:50:40,000
there are also other ways to represent an ellipse.
799
01:50:40,000 --> 01:50:47,000
And the general way to do it is to use six parameters A, H, B, G, F and C.
800
01:50:47,000 --> 01:51:00,000
And to state that all points that meet this equation A times x squared plus H times xy plus B times y squared plus G times x plus F times y plus C equals 0
801
01:51:00,000 --> 01:51:05,000
are those points which are located on the ellipse.
802
01:51:05,000 --> 01:51:12,000
We need one additional constraint, namely that 4 times AB minus H squared is greater than 0.
803
01:51:12,000 --> 01:51:17,000
Only if this condition is met we are really facing an ellipse.
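Written out, this representation is the set of all points (x, y) with

$$A x^2 + H\,xy + B y^2 + G x + F y + C = 0, \qquad 4AB - H^2 > 0.$$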
804
01:51:17,000 --> 01:51:29,000
If this constraint is not met, then we might be faced with for instance a hyperbola, which we are not interested in here.
805
01:51:29,000 --> 01:51:34,000
So let's again stick to this representation with six parameters.
806
01:51:34,000 --> 01:51:40,000
However, we have seen already that five parameters are actually enough to represent an ellipse.
807
01:51:40,000 --> 01:51:44,000
But in this modified representation we have six parameters.
808
01:51:44,000 --> 01:51:50,000
So there is one degree of freedom in the choice of parameters.
809
01:51:50,000 --> 01:51:55,000
That means if we find a suitable parameter set A, H, B, G, F, C,
810
01:51:55,000 --> 01:52:00,000
we could multiply all these parameters with the same positive constant
811
01:52:00,000 --> 01:52:09,000
and we would still get a representation of the same ellipse, just with a different parameter set.
812
01:52:09,000 --> 01:52:15,000
So one ellipse has several representations of this form.
813
01:52:15,000 --> 01:52:20,000
This is not really nice if we want to find the best parameter set,
814
01:52:20,000 --> 01:52:33,000
because it introduces some problems in the optimization algorithm that should find the best fit.
815
01:52:33,000 --> 01:52:40,000
Therefore we want to somehow restrict these parameters in such a way
816
01:52:40,000 --> 01:52:47,000
that we get rid of this additional degree of freedom.
817
01:52:47,000 --> 01:52:50,000
There are several possibilities how we could do that.
818
01:52:50,000 --> 01:52:57,000
One way would be to state that A must be equal to 1, or that A plus B must be equal to 1,
819
01:52:57,000 --> 01:53:02,000
or that the Euclidean length of the parameter vector is equal to 1.
820
01:53:02,000 --> 01:53:08,000
In some papers you even find the possibility to set C equal to 1.
821
01:53:08,000 --> 01:53:13,000
However, the last version is a little bit tricky,
822
01:53:13,000 --> 01:53:17,000
because that solution is not invariant with respect to translation.
823
01:53:17,000 --> 01:53:25,000
So there are some ellipses which cannot be represented with this choice of C.
824
01:53:25,000 --> 01:53:31,000
There are several algorithms to solve this problem, how we can fit an ellipse to a set of points.
825
01:53:31,000 --> 01:53:38,000
Meanwhile the most popular one is the approach of Fitzgibbon, Pilu and Fisher.
826
01:53:38,000 --> 01:53:44,000
I don't want to go into details, because the mathematical details are a little bit more difficult.
827
01:53:44,000 --> 01:53:47,000
It is based on a generalized eigenvalue problem,
828
01:53:47,000 --> 01:53:55,000
so it becomes a little bit more difficult than what we have seen in the line fit problem,
829
01:53:55,000 --> 01:54:00,000
where we had an ordinary eigenvalue problem; now we have a generalized eigenvalue problem,
830
01:54:00,000 --> 01:54:05,000
and that is then more difficult to solve numerically.
831
01:54:05,000 --> 01:54:10,000
I don't want to go into these details, but I just want to mention there is this approach,
832
01:54:10,000 --> 01:54:18,000
and of course you also find implementations of it in most software libraries for computer vision.
833
01:54:18,000 --> 01:54:22,000
So I recommend to use this method if you need it.
834
01:54:22,000 --> 01:54:29,000
It is based on the idea to use the constraint 4AB minus H squared equal to 1,
835
01:54:29,000 --> 01:54:40,000
to get rid of this degree of freedom, and based on that it is possible to get a nice mathematical modeling.
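To give an impression of the structure of the method, here is a compact sketch in Python with NumPy. It follows the published idea (design matrix, scatter matrix, the constraint 4AB - H^2 = 1 as a quadratic form, generalized eigenvalue problem) but omits the numerical safeguards of production implementations, so treat it as an illustration only; in practice one would rather call a library routine:

```python
import numpy as np

def fit_ellipse_direct(points):
    """Sketch of the direct ellipse fit of Fitzgibbon, Pilu and Fisher.

    points: (N, 2) array.  Returns the conic parameters (A, H, B, G, F, C)
    normalized so that 4AB - H^2 = 1.
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]

    # Design matrix: one row [x^2, xy, y^2, x, y, 1] per point.
    D = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    S = D.T @ D                      # scatter matrix

    # Constraint matrix: a^T Con a = 4AB - H^2.
    Con = np.zeros((6, 6))
    Con[0, 2] = Con[2, 0] = 2.0
    Con[1, 1] = -1.0

    # Generalized eigenproblem S a = lambda * Con a, rewritten via inv(S) @ Con;
    # exactly one eigenvalue is positive, and its eigenvector is the solution.
    eigenvalues, eigenvectors = np.linalg.eig(np.linalg.inv(S) @ Con)
    a = eigenvectors[:, np.argmax(eigenvalues.real)].real

    # Rescale so that the constraint 4AB - H^2 = 1 holds exactly.
    return a / np.sqrt(a @ Con @ a)
```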
836
01:54:40,000 --> 01:54:48,000
When we use this method on the example picture with a cup on the mousepad,
837
01:54:48,000 --> 01:54:53,000
and apply it to those structures which are really elliptical,
838
01:54:53,000 --> 01:54:57,000
we get the results that are shown on the picture at the bottom.
839
01:54:57,000 --> 01:55:07,000
So the colored ellipses are those that are fitted to the edge lists that we find.
840
01:55:07,000 --> 01:55:16,000
And if we for instance consider the violet, or the green, or also the cyan ellipse that we estimated,
841
01:55:16,000 --> 01:55:21,000
we can see that they really fit well to what we expect.
842
01:55:21,000 --> 01:55:30,000
For the red ellipse that's a little bit maybe unexpected, because it also adapts to non-elliptical parts of the cup.
843
01:55:30,000 --> 01:55:40,000
But still it's not really bad what we get.
844
01:55:40,000 --> 01:55:48,000
So that's all we want to discuss about line fitting, or about curve fitting in general.
845
01:55:48,000 --> 01:55:51,000
So let's summarize our findings.
846
01:55:51,000 --> 01:55:57,000
First we introduced the Hough transform as a method to extract lines from an image.
847
01:55:57,000 --> 01:56:06,000
The idea is to consider all the edge pixels, and for each edge pixel,
848
01:56:06,000 --> 01:56:11,000
create all the lines that pass through that edge pixel,
849
01:56:11,000 --> 01:56:24,000
and then use the parameter space and increment all the cells in this accumulator array in the parameter space,
850
01:56:24,000 --> 01:56:29,000
which represent lines that pass through this point.
851
01:56:29,000 --> 01:56:35,000
And afterwards we are searching for local maxima in this accumulator array,
852
01:56:35,000 --> 01:56:41,000
and those local maxima provide us the dominant lines in the image.
853
01:56:41,000 --> 01:56:44,000
So a very general approach.
854
01:56:44,000 --> 01:56:51,000
It does not use gradient information, it is somehow reasonable, it yields reasonable results.
855
01:56:51,000 --> 01:56:59,000
However, it is not considering any local information of a pixel and its vicinity.
856
01:56:59,000 --> 01:57:08,000
It's just considering that there are pixels, but it does not consider neighborhoods of pixels,
857
01:57:08,000 --> 01:57:15,000
it does not consider gradient orientation or something like that.
858
01:57:15,000 --> 01:57:23,000
The other way was to collect edge pixels in a list of edge pixels,
859
01:57:23,000 --> 01:57:31,000
using the neighborhood and using the gradient information to collect edge pixels, which are somehow belonging to the same edge.
860
01:57:31,000 --> 01:57:37,000
First doing some edge following and then applying the Ramer-Douglas-Peucker algorithm
861
01:57:37,000 --> 01:57:41,000
to split up those edge element lists into smaller pieces,
862
01:57:41,000 --> 01:57:49,000
where we can assume that each of the smaller sub-lists can be represented by an individual line.
863
01:57:49,000 --> 01:57:56,000
And then we were discussing line estimation techniques, starting with the total least squares approach,
864
01:57:56,000 --> 01:58:01,000
that just minimizes the sum of squared distances of points from the line.
865
01:58:01,000 --> 01:58:07,000
This yields somehow very good results, but it suffers from outliers.
866
01:58:07,000 --> 01:58:15,000
So if we are faced with outliers in the data, then this approach does not perform as well as we would expect.
867
01:58:15,000 --> 01:58:19,000
To avoid these problems with outliers, we got to know M-estimators,
868
01:58:19,000 --> 01:58:28,000
which are somehow changing the error term that is used, the loss function, to reduce the influence of outliers.
869
01:58:28,000 --> 01:58:36,000
And we discussed RANSAC, which just counts outliers and otherwise ignores them completely,
870
01:58:36,000 --> 01:58:41,000
but does not distinguish between an outlier that is far away and that is very far away,
871
01:58:41,000 --> 01:58:47,000
but only considers there is an outlier or there isn't.
872
01:58:47,000 --> 01:58:53,000
Finally, we discussed the case of circles and ellipses.
873
01:58:53,000 --> 01:59:02,000
We have introduced several ways to represent circles and ellipses as parametric curves,
874
01:59:02,000 --> 01:59:11,000
and we have then derived two different distance measures, the Euclidean distance and the algebraic distance.
875
01:59:11,000 --> 01:59:18,000
We have found for circles that it is easier, computationally easier, to deal with the algebraic distance,
876
01:59:18,000 --> 01:59:23,000
but the results are more accurate if we would use the Euclidean distance.
877
01:59:23,000 --> 01:59:30,000
But at least if the points that we consider for circle estimation are not scattered too much,
878
01:59:30,000 --> 01:59:34,000
as long as they are still very close to the circle,
879
01:59:34,000 --> 01:59:41,000
the algebraic distance approximates the Euclidean distance sufficiently well.
880
01:59:41,000 --> 01:59:52,000
If the points are scattered very much, then the algebraic distance overestimates the radius of the circle that we get.
881
01:59:52,000 --> 02:00:04,000
For ellipses, we just introduced or mentioned the algorithm with which we can estimate the ellipses without going into details.