/  Deep Learning Interview questions and answers   /  What Is Geometric Transformation?

What Is Geometric Transformation?

A geometric transformation is any bijection of a set having some geometric structure to itself or another such set. Specifically, “A geometric transformation is a function whose domain and range are sets of points. Most often the domain and range of a geometric transformation are both R2 or both R3. Often geometric transformations are required to be 1-1 functions, so that they have inverses.” [1] The study of geometry may be approached via the study of these transformations.

There are two basic steps in geometric transformations:

1. A spatial transformation of the physical rearrangement of pixels in the image, and

2. a grey level interpolation, which assigns grey levels to the transformed image

Spatial transformation

Pixel coordinates (x,y) undergo geometric distortion to produce an image with coordinates (x’,y’):

where r and s are functions depending on x and y.

Grey Level Interpolation

The problem we have to consider here is that, in general, the distortion correction equations will produce values x’ and y’ that are not integers. We end up with a set of grey levels for non integer positions in the image. We want to determine what grey levels should be assigned to the integer pixel locations in the output image.

The simplest approach is to assign the grey value for F(x,y) to the pixel having closest integer

coordinates to . The problem with this is that some pixels may be assigned two grey values, and some may not be assigned a grey level at all – depending on how the integer rounding turns out.

The way to solve this is to look at the problem the other way round. Consider integer pixel locations in the output image and calculate where they must have come from in the input image. That is, work out the inverse image transformation.

These locations in the input image will not (in general) have integer coordinates. However, we do know the grey levels of the 4 surrounding integer pixel positions. All we have to do is interpolate across these known intensities to determine the correct grey level of the position when the output pixel came from.

Various interpolation schemes can be used. A common one is bilinear interpolation, given by

v(x,y) = c1x + c2y + c3xy + c4,

where v(x,y) is the grey value at position (x,y).

Thus we have four coefficients to solve for. We use the known grey values of the 4 pixels surrounding the come from’ location to solve for the coefficients.

We need to solve the equation

or, in short,

[V] = [M][C],

which implies

[C] = [M]-1[V].

This has to be done for every pixel location in the output image and is thus a lot of computation! Alternatively, one could simply use the integer pixel position closest to the come from location’. This is adequate for most cases.

Translation

A translation moves an object to a different position on the screen. You can translate a point in 2D by adding translation coordinate (tx, ty) to the original coordinate (X, Y) to get the new coordinate (X’, Y’).

From the above figure, you can write that −

X’ = X + tx

Y’ = Y + ty

The pair (tx, ty) is called the translation vector or shift vector. The above equations can also be represented using the column vectors.

$P = \frac{[X]}{[Y]}$ p’ = $\frac{[X’]}{[Y’]}$T = $\frac{[t_{x}]}{[t_{y}]}$

We can write it as −

P’ = P + T

Rotation

In rotation, we rotate the object at particular angle θ (theta) from its origin. From the following figure, we can see that the point P(X, Y) is located at angle φ from the horizontal X coordinate with distance r from the origin.

Let us suppose you want to rotate it at the angle θ. After rotating it to a new location, you will get a new point P’ (X’, Y’).

standard trigonometric the original coordinate of point P(X, Y) can be represented as −

$X = r \, cos \, \phi …… (1)$

$Y = r \, sin \, \phi …… (2)$

Same way we can represent the point P’ (X’, Y’) as −

${x}’= r \: cos \: \left ( \phi \: + \: \theta \right ) = r\: cos \: \phi \: cos \: \theta \: − \: r \: sin \: \phi \: sin \: \theta ……. (3)$

${y}’= r \: sin \: \left ( \phi \: + \: \theta \right ) = r\: cos \: \phi \: sin \: \theta \: + \: r \: sin \: \phi \: cos \: \theta ……. (4)$

Substituting equation (1) & (2) in (3) & (4) respectively, we will get

${x}’= x \: cos \: \theta − \: y \: sin \: \theta$

${y}’= x \: sin \: \theta + \: y \: cos \: \theta$

Representing the above equation in matrix form,

$$[X’ Y’] = [X Y] \begin{bmatrix} cos\theta & sin\theta \\ −sin\theta & cos\theta \end{bmatrix}OR$$

P’ = P . R

Where R is the rotation matrix

$$R = \begin{bmatrix} cos\theta & sin\theta \\ −sin\theta & cos\theta \end{bmatrix}$$

The rotation angle can be positive and negative.

For positive rotation angle, we can use the above rotation matrix. However, for negative angle rotation, the matrix will change as shown below −

$$R = \begin{bmatrix} cos(−\theta) & sin(−\theta) \\ -sin(−\theta) & cos(−\theta) \end{bmatrix}$$

$$=\begin{bmatrix} cos\theta & −sin\theta \\ sin\theta & cos\theta \end{bmatrix} \left (\because cos(−\theta ) = cos \theta \; and\; sin(−\theta ) = −sin \theta \right )$$

Scaling

To change the size of an object, scaling transformation is used. In the scaling process, you either expand or compress the dimensions of the object. Scaling can be achieved by multiplying the original coordinates of th

e object with the scaling factor to get the desired result.

Let us assume that the original coordinates are (X, Y), the scaling factors are (SX, SY), and the produced coordinates are (X’, Y’). This can be mathematically represented as shown below −

X’ = X . SX and Y’ = Y . SY

The scaling factor SX, SY scales the object in X and Y direction respectively. The above equations can also be represented in matrix form as below −

$$\binom{X’}{Y’} = \binom{X}{Y} \begin{bmatrix} S_{x} & 0\\ 0 & S_{y} \end{bmatrix}$$

OR

P’ = P . S

Where S is the scaling matrix. The scaling process is shown in the following figure.

If we provide values less than 1 to the scaling factor S, then we can reduce the size of the object. If we provide values greater than 1, then we can increase the size of the object.