In recent years, eye-tracking devices have become an extremely popular research tool for obtaining information about eye movement and eye position over a given time interval. Data obtained in this way can be used, e.g., in research on interface usability and on the effectiveness of advertising messages, or as a touchless computer interface. The aim of this work is to present methods of eye tracking using a standard webcam.
Chapter 1
Introduction
Eye tracking (called "okulografia" in Polish) is a collection of research techniques for obtaining information about eye movement and eye position over a given period of time and, where possible, about the point of gaze. Data obtained in this way can be used, e.g., in research on the usability of interfaces, on the readability of text or on the effectiveness of an advertising message.
Every day our eyes are used intensively for many purposes: reading, watching entertainment videos, observing our surroundings and learning new things. However, not everyone realizes how complicated the functioning of the human visual system is.
Oculography, i.e. tracking eye movements (eye tracking), is a technique that has been used for over 100 years in areas such as psychology, medicine, human-computer interaction, marketing and many others.
1.1 Problem statement
The goal of this master's thesis is to develop and implement a system that tracks the eyes of a PC user in real time using a webcam. The system is intended to estimate the point on the monitor on which the seated user is currently focusing.
The most important assumption set by the author is the development of a universal application that does not require any special equipment to operate. The ideal solution should consist only of a personal computer and a single ordinary webcam.
There are many commercial systems on the market that allow gaze tracking. The biggest disadvantage of existing solutions is the need to own special equipment, ranging from dedicated head-mounted devices to infrared cameras. It is easy to see the advantages of a system that would not require specialized equipment: such a solution could easily become widespread. However, using an ordinary webcam makes eye-tracking accuracy much lower than that of commercial systems.
Another important requirement is that the system operate in real time, so that every point of focus can be tracked. This requires selecting algorithms whose computational complexity is not too high. Great emphasis was also placed on the robustness of the image processing algorithms to changing lighting conditions.
Creating a system able to meet these requirements is not a trivial task, as shown by the fact that there is still no commercial solution to this problem.
Chapter 2
Existing solutions
There are many methods for recording human visual activity, ranging from simple direct observation, through invasive mechanical methods, to measuring the electric potential difference between the two sides of the eyeball.
A huge amount of research has been done in the field of eye tracking. As a result, there are many solutions that differ significantly from each other. The choice of the optimal method depends on the purpose of the research. The intended use of the system determines the required accuracy, resolution, measurement frequency, ease and convenience of use, and also the price.
Eye tracking systems can be divided into several groups: by the position of the device relative to the head (mobile and stationary), by the type of data obtained, and by the method of determining the fixation point (or the eye movement itself).
The following is an overview of the methods used to measure eye position.
Figure 2.3: 3D Video-Oculography system [26]
Three-dimensional gaze direction
The third type of eye tracking is so-called 3D oculography. In addition to data on the horizontal and vertical deviation of the eye, it provides information on the rotation (torsion) of the eyeball around its own axis. Some devices also allow head movements and positions to be monitored, which can later be combined with the eye rotation data. A solution of this type is offered by the German company SensoMotoric Instruments GmbH: the head-mounted 3D Video-Oculography® device (3D VOG), Fig. 2.3.
The data obtained are most often used in research on binocular vision and on the vestibular system.
2.2 Methods of measuring eye position
2.2.1 Electro-oculography (EOG)
The technique consists in measuring differences in skin potential using electrodes attached around the eyes. Eye position can be detected with high accuracy by recording very small differences in skin potential. However, this technique is quite cumbersome and not suitable for everyday use, because it requires close contact between the electrodes and the user's skin. This method was the most widely used solution 40 years ago, but it is still in use [3].
The method is based on measuring differences in bioelectric potential in the ocular region. The amplitude of the signal is used to calculate the magnitude of rapid changes in eye position (saccades). This is possible because the potential of the front of the eyeball differs from the potential of its posterior part. Systems based on this method are very susceptible to noise caused by the activity of other facial muscles. Electro-oculography is used most often in medicine.
Changes in the electric field caused by eye movement make it possible to monitor the position of the eye.
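To illustrate how a signal amplitude can be turned into an eye rotation, the following minimal Python sketch assumes (this is not stated in the text) that the EOG voltage varies approximately linearly with horizontal gaze angle within the calibrated range. The gain and offset are estimated from two calibration targets at known angles; the function names and the microvolt values in the test are hypothetical.

```python
import numpy as np


def fit_eog_calibration(v_a, v_b, angle_a_deg, angle_b_deg):
    """Gain and offset of an assumed linear voltage-to-angle relation,
    estimated from two fixations at known angles."""
    gain = (angle_b_deg - angle_a_deg) / (v_b - v_a)
    offset = angle_a_deg - gain * v_a
    return gain, offset


def eog_to_angle(signal_uv, gain, offset):
    """Convert EOG samples (microvolts) to horizontal gaze angles (degrees)."""
    return gain * np.asarray(signal_uv, dtype=float) + offset


def saccade_amplitude(before_uv, after_uv, gain):
    """Saccade amplitude from the mean signal levels of the fixations before
    and after it; the offset cancels out in the difference."""
    return gain * (np.mean(after_uv) - np.mean(before_uv))
```

Averaging over each fixation, rather than using single samples, reduces the influence of the muscle noise mentioned above.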
2.2.2 Techniques using contact lenses
Special contact lenses make accurate measurement of eye movement possible. These lenses contain small induction coils. The exact position of the lenses is determined by recording the changes in the electromagnetic field caused by eye movement. One of the main problems is compensating for head movements, which requires head-mounted devices. The disadvantages are the restriction of movement and the cumbersome equipment, which limits the use of this method to laboratory experiments.
2.2.3 Optical techniques based on video recording
2.2.3.1 Using infrared light
Illuminating the eye with infrared light along its optical axis makes it much easier to locate the pupil [4]. The light entering the pupil is almost entirely reflected back by the retina, so the camera registers the pupil as a bright circle; this phenomenon is analogous to the red-eye effect. When the infrared light source is placed off the optical axis of the eye, the camera records the pupil as a dark area, Fig. 2.5.
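The two illumination modes are sometimes combined; the text does not describe this, so the following is only an illustration. A minimal NumPy sketch, assuming two aligned grayscale frames, one captured with on-axis IR light (bright pupil) and one with off-axis IR light (dark pupil): their difference suppresses almost everything except the pupil, whose centroid is then taken. The function name and the threshold are hypothetical.

```python
import numpy as np


def pupil_from_bright_dark(bright, dark, threshold):
    """Pupil centre (x, y) from a bright-pupil frame and a dark-pupil frame.

    Both frames show the same scene; only the pupil changes brightness
    between them, so it dominates the difference image."""
    diff = bright.astype(np.int32) - dark.astype(np.int32)  # avoid uint8 wrap-around
    mask = diff > threshold
    if not mask.any():
        return None  # no pupil candidate found
    ys, xs = np.nonzero(mask)
    return xs.mean(), ys.mean()
```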
The method consists in determining the relative position of the center of the pupil and the reflection of the infrared light source from the cornea. Reflections of light from the various structures of the eyeball are called Purkinje images, Fig. 2.4.
Eye trackers of this kind determine the eye position by recording the position of the light reflection from the surface of the cornea (the so-called first Purkinje image, P1, also called the glint) relative to the center of the pupil. To increase measurement accuracy, the number of recorded reflections (glints) produced by the device can be increased to up to four.
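The quantity these trackers measure, the vector between the pupil center and the glint, can be sketched in a few lines of NumPy. This is a minimal illustration assuming a dark-pupil IR image with a single glint and fixed intensity thresholds (hypothetical names and values); practical systems fit an ellipse to the pupil edge instead of taking a plain centroid.

```python
import numpy as np


def centroid(mask):
    """Mean (x, y) position of the True pixels of a boolean mask."""
    ys, xs = np.nonzero(mask)
    return np.array([xs.mean(), ys.mean()])


def pupil_glint_vector(eye, pupil_max, glint_min):
    """Pupil-center minus glint-center vector in a dark-pupil IR eye image.

    Pixels darker than pupil_max are treated as the pupil, pixels brighter
    than glint_min as the corneal reflection (P1). Glint pixels lying on the
    pupil are simply left out of the pupil centroid (a simplification)."""
    eye = np.asarray(eye, dtype=float)
    return centroid(eye < pupil_max) - centroid(eye > glint_min)
```

Because the glint stays roughly fixed while the pupil moves with eye rotation, this difference is less sensitive to small head translations than the pupil position alone.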
There are also devices with greater measurement precision, called dual Purkinje eye trackers, which use the reflection of light from the surface of the cornea (the first Purkinje image, P1) relative to the reflection from the posterior surface of the lens (the fourth Purkinje image, P4). Such systems are very accurate, but their biggest disadvantage is that the fourth Purkinje image is very weak, so the examined person must not move their head during the experiment. This is usually achieved by using a chin rest or a special bite bar.
Figure 2.4: Four types of Purkinje images
Figure 2.5: Eye image recorded using the SMI device
2.2.3.2 Without the use of infrared light
The image is recorded using a standard camera. Systems in this group measure the shape and position of the iris relative to a selected part of the face (e.g. the eye corners). Very often part of the iris is obscured by the eyelid, which may significantly distort the estimated vertical position of the eye. A solution is to track the pupil instead of the iris, since a smaller part of it is covered by the eyelid. However, the contrast between the pupil and the rest of the eye is much lower than for the iris.
A different approach is to use a neural network. The training set consists of a sequence of images of the eye together with the point on the monitor at which the person is looking at that moment. Calibrating such systems is very time-consuming, because a large number of input images is required to train the neural network properly [21].
Systems based on a standard video camera usually assume that the user's head remains completely still during the measurement.
Allowing free head movement requires methods that precisely determine the current position of the head (translations and rotations).
2.3 Review of existing solutions
There is a whole range of commercial eye tracking systems on the market. The leading products are Tobii [25] and SMI [26]. These are remote systems, i.e. no devices need to be mounted on the user's head. They use a camera recording images in the infrared band together with infrared light sources. The point the user is currently looking at is determined by examining the change in the relative position of the center of the pupil and the reflection of the infrared emitter's light from the surface of the cornea.
However, there is no commercial system that uses a standard camera. There are, however, many research centers conducting intensive research aimed at creating a prototype of such a system. One example is COGAIN [27], which, among other things, works on a low-cost eye-tracking system using a standard camera.
Currently, the only available research project that tracks gaze with a standard camera is opengazer [22]. It is an open-source program. Its main disadvantage is the assumption that the user's head remains completely stationary while the program is running: even slight movement makes recalibration of the system necessary.
Chapter 3
Theoretical basics
3.1 Visual system
To build an eye tracking system, it is necessary to know the basic issues related to the human visual system. This knowledge is needed to understand current methods of determining gaze direction, as well as to identify the existing limitations that must be considered when creating a gaze tracking system.
3.1.1 Description of the visual system
The human eye consists of a lens with an adjustable focal length, an iris (diaphragm) that regulates the diameter of the opening (pupil) through which light enters, and a light-sensitive retina inside the eye. Photoreceptors in the retina convert light into an electrical potential, which produces a nerve impulse transported to the cerebral cortex through the optic nerve.
The retina contains two types of photoreceptors: cones and rods. Rods are very sensitive to light intensity, which makes it possible to see in conditions of very low light intensity. However, these receptors cannot register color, which is why everything seen at night appears in shades of gray. Unlike rods, cones need much brighter light to produce nerve impulses, but they are able to distinguish colors. There are three types of cones, sensitive to different wavelengths of light: red, green and blue.
Figure 3.1: Eye structure
Due to the high contrast between the sclera and the iris, gaze direction can be assessed visually. One of the reasons for this structure of the eye is that humans are social beings, and being able to tell where others are looking is very helpful in communication.
The yellow spot (macula) is responsible for image sharpness. It is the largest cluster of cones, about two millimeters in diameter, located in the center of the retina. The remaining part of the retina accounts for between 15 and 50 percent of the photoreceptors. As a result, the sharply seen part of the image is only about one degree wide. It is not possible to focus on an object smaller than the yellow spot, which makes it impossible to determine gaze direction more accurately than about one degree.
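To see what the one-degree limit means on a monitor, the visual angle can be converted to a length on the screen with elementary trigonometry. The viewing distance of 600 mm and the 530 mm wide, 1920-pixel monitor below are hypothetical values chosen for illustration.

```python
import math


def visual_angle_to_mm(angle_deg, distance_mm):
    """Extent on a plane at distance_mm subtended by angle_deg,
    centered on the line of sight."""
    return 2.0 * distance_mm * math.tan(math.radians(angle_deg) / 2.0)


def mm_to_px(size_mm, screen_width_mm, screen_width_px):
    """Convert a length on the screen from millimeters to pixels."""
    return size_mm * screen_width_px / screen_width_mm


size_mm = visual_angle_to_mm(1.0, 600.0)   # about 10.5 mm
size_px = mm_to_px(size_mm, 530.0, 1920)   # about 38 px
```

Under these assumptions, even an ideal gaze estimate cannot be expected to distinguish on-screen targets closer together than a few tens of pixels.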
3.1.2 Eye movement
Eye movements serve two primary purposes. First, they stabilize the image on the retina, compensating for movements of the head or of objects in the field of view. Second, they position the eye relative to the scene so that the part of the image analyzed at a given moment is projected onto the central part of the retina, which has the highest sensitivity and the greatest ability to process detail. Because of the non-uniform structure of the retina, fine image details (e.g. letter shapes when reading) can be perceived only within a small area of about one degree of visual angle near the center of the retina.
During normal human activity, the eyes spend most of the time in a state of fixation, i.e. relative rest. Visual information is acquired from the surroundings during fixations. The duration of a fixation depends on how the information is processed; in total it ranges from 0.15 s to 1.5 s. On average, 4 to 6 times per second a saccade is performed: a very fast, jerky movement that changes the eye position between successive fixations. A saccade usually lasts approx. 0.03 to 0.06 s.
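Fixations and saccades are commonly separated with a simple velocity threshold (the I-VT scheme). The text does not prescribe a method, so the following is a minimal sketch assuming gaze samples in degrees of visual angle at a constant sampling rate, with a threshold of 30 deg/s chosen for illustration. The function names are hypothetical.

```python
import numpy as np


def classify_ivt(gaze_deg, rate_hz, threshold_deg_s=30.0):
    """Label each gaze sample as fixation (True) or saccade (False)."""
    gaze = np.asarray(gaze_deg, dtype=float)                          # shape (N, 2)
    speed = np.linalg.norm(np.diff(gaze, axis=0), axis=1) * rate_hz  # deg/s
    is_fix = speed < threshold_deg_s
    # sample k gets the label of the interval ending at it; sample 0 copies sample 1
    return np.concatenate([is_fix[:1], is_fix])


def fixation_durations(is_fix, rate_hz):
    """Durations in seconds of consecutive runs of fixation samples."""
    durations, run = [], 0
    for flag in is_fix:
        if flag:
            run += 1
        elif run:
            durations.append(run / rate_hz)
            run = 0
    if run:
        durations.append(run / rate_hz)
    return durations
```

Durations obtained this way can be checked against the 0.15 s to 1.5 s range given above; much shorter runs usually indicate noise rather than real fixations.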
3.2 Optical flow
Optical flow is a vector field that transforms one image of a sequence into another image of that sequence by moving areas of the first image (for which the field is defined) according to the corresponding vectors of the field. In short, optical flow is a set of translations (in the form of a field) that transforms an image of the sequence into the next image of the sequence [3].
There are many methods for determining optical flow. They can be divided into three main groups:
- gradient methods, based on analyzing the (spatial and temporal) derivatives of the image intensity,
- frequency-domain methods, based on filtering the image information in the frequency domain,
- correlation methods.
The Lucas-Kanade algorithm
This work uses the Lucas-Kanade algorithm, which is a gradient method.
The operation of the algorithm is based on three basic assumptions:
1. The brightness of the image does not change much between frames
sequence.
2. The speed of motion of the objects in the image is low.
3. Points that are a short distance away move
similarly.
The image brightness is described by a time-dependent function:
$$ f(x, y, t) \equiv I(x(t), y(t), t) $$
The requirement that the brightness of the image does not change over time is expressed by the equation:
$$ I(x(t), y(t), t) = I(x(t + dt), y(t + dt), t + dt) $$
This means that the intensity of the tracked pixel does not change over time, so the derivative of f along the trajectory of the pixel vanishes:
$$ \frac{d f(x, y, t)}{d t} = 0 $$
Applying the chain rule to this condition and denoting by u the velocity in the x direction and by v the velocity in the y direction, the optical flow constraint can be written as:
$$ -\frac{\partial I}{\partial t} = \frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v $$
Unfortunately, this is a single equation with two unknowns, so additional information is needed to solve it. This is where the third assumption is used: points that are close to each other move similarly. Thanks to this, the equation for one point is supplemented with the equations of the surrounding pixels, and the resulting overdetermined system is solved with the least-squares method.
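The least-squares step can be written out for a single pixel. The NumPy sketch below assumes two grayscale frames given as arrays, spatial derivatives approximated with `np.gradient` (central differences), the temporal derivative approximated by the frame difference, and a square window around the pixel. The function name and window size are illustrative, not the implementation used in this work.

```python
import numpy as np


def lucas_kanade_point(prev, curr, x, y, half_window=7):
    """Estimate the flow (u, v) at pixel (x, y) by least squares
    over a (2 * half_window + 1)^2 neighbourhood."""
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    Iy, Ix = np.gradient(prev)      # derivatives along rows (y) and columns (x)
    It = curr - prev                # temporal derivative
    win = (slice(y - half_window, y + half_window + 1),
           slice(x - half_window, x + half_window + 1))
    # one equation Ix*u + Iy*v = -It per pixel of the window
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

The system is well conditioned only when the window contains gradients in more than one direction; on a uniform area or along a straight edge, A^T A is (nearly) singular and the flow cannot be determined reliably.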
Because of these assumptions, the method can detect only very small motion between successive frames. To improve its performance, an image pyramid is used: the algorithm is run successively on images of reduced resolution.
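A minimal sketch of building such a pyramid, assuming simple 2x2 block averaging between levels (library implementations such as OpenCV's `pyrDown` apply Gaussian smoothing instead). A motion of 8 pixels at full resolution shrinks to 1 pixel at level 3, small enough for the Lucas-Kanade assumptions; the estimate found at a coarse level is doubled when it is passed to the next finer level as a starting point. The function names are hypothetical.

```python
import numpy as np


def build_pyramid(image, levels):
    """List of images: level 0 is the original, each next level has half the resolution."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        img = pyramid[-1]
        h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # crop to even size
        img = img[:h, :w]
        # average each 2x2 block of pixels
        pyramid.append(0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                               + img[0::2, 1::2] + img[1::2, 1::2]))
    return pyramid


def displacement_at_level(d_full_res, level):
    """Size of a full-resolution displacement as seen at a given pyramid level."""
    return d_full_res / 2 ** level
```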
Chapter 4
Description of the proposed solution
The presented gaze tracking system is based on recording the image of the eye with a standard webcam. A diagram of the system is shown in Fig. 4.1. To simplify further considerations, it is assumed that the camera is placed on top of the monitor, at its center. Most commercial systems require the camera to be located below the monitor, which gives a slightly better view of the eye. However, the purpose of this work was to create a universal solution. A standard webcam is usually located above the monitor; laptops with a built-in camera, whose location cannot be changed, are a particular example.
The purpose of the work is to determine the point on the monitor at which the user is looking at a given moment. To obtain such data, it is necessary to determine the relative position of the eye and the head.
The first stage is the initialization of the algorithm, during which the head model is created. The initialization phase is followed by the tracking phase. The 3D position of the head is determined using optical flow between successive camera frames and the POSIT algorithm [13]. The change in gaze direction is determined by examining the differences in distance between the center of the eyeball and the center of the pupil. These data are obtained using a range of image processing methods described later in this work. To obtain the presented results, many algorithms were tested, some of which are presented later in the work.
Figure 4.1: System diagram
After the relative position of the eyes and the head has been determined and a geometric model has been created, the gaze direction is mapped onto the plane of the monitor. To determine the necessary mapping coefficients, a calibration process is required at the beginning of each program session. Calibration consists in the user following a point displayed at specific locations on the monitor.
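The text does not specify the form of the mapping. One common choice, assumed here purely for illustration, is a second-order polynomial fitted by least squares to the eye feature vectors (for example the eye-to-pupil offset) recorded while the user looks at each calibration point. The function names and the nine-point grid used in the test are hypothetical.

```python
import numpy as np


def design_matrix(features):
    """Second-order polynomial terms of the eye feature vector (dx, dy)."""
    dx, dy = features[:, 0], features[:, 1]
    return np.column_stack([np.ones_like(dx), dx, dy, dx * dy, dx ** 2, dy ** 2])


def fit_calibration(features, screen_points):
    """Least-squares coefficients mapping eye features to screen coordinates."""
    A = design_matrix(np.asarray(features, dtype=float))
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(screen_points, dtype=float), rcond=None)
    return coeffs  # shape (6, 2): one column for screen x, one for screen y


def map_gaze(coeffs, features):
    """Screen coordinates for one or more eye feature vectors."""
    return design_matrix(np.atleast_2d(np.asarray(features, dtype=float))) @ coeffs
```

With six coefficients per screen axis, at least six calibration points are needed; a 3x3 grid of points gives some redundancy against noisy samples.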
Chapter 5
Tracking head position
Many of the available gaze tracking systems assume that the head is immobilized. This is neither a convenient nor a practical solution. A person is unable to keep their head completely still for a long period of time without a support on which to rest the chin or forehead. Such devices are used in ophthalmology during eye examinations. Involuntary head movement is strongly influenced by breathing. To determine where a person sitting in front of a computer is looking without imposing restrictive limits on head movement, very precise tracking of the 3D face position is necessary.
Head pose estimation is a very important area of research in human-computer interaction (HCI). There are many methods for estimating head pose with a single camera. They can be divided into two main groups: methods based on a head model and methods based on facial features. Model-based methods estimate the pose by determining the 2D-3D correspondences between features; the pose is then computed from these correspondences.
Methods based on facial properties assume that there is a certain relationship between the 3D pose and certain properties of the facial image. They determine this relationship by training a neural network on a large number of images with a known face pose. The disadvantage of this approach is that it performs poorly for new faces whose images were not used during training.
This work presents two different approaches. The first, inspired by [10], uses the AAM algorithm [28], one of the most popular head tracking solutions in recent years. The second is a method using a sinusoidal head model and the POSIT algorithm [13] to determine the current position. A combination of both methods was presented in [30].
5.1 Active Appearance Model
The Active Appearance Model (AAM) [28] is a method for detecting and tracking arbitrary objects whose model is created during a learning phase from a properly prepared set of input images. AAM has a very wide range of applications, from the segmentation of medical images, through face recognition, to head tracking. The algorithm can model any object whose shape and texture are known. AAM is a data-driven method, which means that the parameters controlling its operation do not need to be selected manually: the algorithm tunes itself to the data during initialization. Its main disadvantage is that its success depends heavily on a proper selection of the set of images, with the object's shape annotated, used during model creation. The training set may consist of hundreds of images, and annotating the shape of the object correctly can be laborious. It is possible to automate the annotation of shapes in the training set, but this issue is outside the scope of this work. The model is built using principal component analysis (PCA), and the modes of the model are assumed to have a Gaussian distribution, which in extreme cases can lead to unrealistic deformations of the model shape.
The active appearance model consists of a static shape model and a texture model.
5.1.1 Static shape model
The active shape model is a structure containing information about the mean shape of an object of a given type (e.g. a face) and data describing the most characteristic variations of this shape observed in the training set. The model shape can be deformed by algorithms that try to fit it to the real shape while not allowing unnatural deformations. Creating the model begins with building the object template, i.e. the set of points representing a given shape. This is called the Point Distribution Model (PDM). The characteristic points must then be marked in the corresponding places on all N training images. In this way, a set of training shapes is obtained, written as vectors containing the coordinates of the characteristic points. The size and position of the object are removed from these vectors by a special normalization procedure, so that only information about the shape remains. Automatic placement of characteristic points on static images is a very complex problem, so the safest way to obtain a set of example shapes is to mark them manually, which is undoubtedly a tedious and time-consuming task.
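The normalization step can be sketched with NumPy as follows. Removing position and size follows the text; the optional rotation alignment (orthogonal Procrustes analysis) is an addition commonly used when building shape models and is not described in the source. Landmarks are assumed to be given as an (n, 2) array; the function names are hypothetical.

```python
import numpy as np


def normalize_shape(points):
    """Remove position and size from an (n, 2) array of landmarks."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)           # position: centroid at the origin
    return centered / np.linalg.norm(centered)  # size: unit Frobenius norm


def align_rotation(shape, reference):
    """Rotate a normalized shape to best match a normalized reference
    (orthogonal Procrustes); this step is an assumption, not from the text."""
    u, _, vt = np.linalg.svd(shape.T @ reference)
    r = u @ vt
    if np.linalg.det(r) < 0:  # exclude reflections
        u[:, -1] *= -1
        r = u @ vt
    return shape @ r


def to_vector(shape):
    """Stack landmarks as s = [x1, ..., xn, y1, ..., yn]."""
    return np.concatenate([shape[:, 0], shape[:, 1]])
```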
The shape is defined as a set of 2D points forming a mesh spread over the tracked object. The landmarks can be placed on the image automatically or manually by the user. Mathematically, the shape s is expressed as a 2n-dimensional vector:
$$ s = [x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n]^T $$
The training set is first normalized, and then its statistical parameters are determined using principal component analysis (PCA). PCA requires computing the eigenvalues and eigenvectors of the covariance matrix C:
$$ C = \frac{1}{N - 1} S S^T, \qquad S = [s_1 - s_0, s_2 - s_0, \ldots, s_N - s_0] $$
where N is the number of images in the training set and $s_0$ is the mean shape. PCA is used here to reduce the dimensionality of the data: only the eigenvectors corresponding to the largest eigenvalues are used. The number t of retained eigenvectors depends on the input data set. This makes it possible to approximate a shape instance s as a linear combination of the eigenvectors of the covariance matrix:
$$ s \approx s_0 + \Phi_s b_s $$
where $b_s$ is the parameter vector given by
$$ b_s = \Phi_s^T (s - s_0) $$
and $\Phi_s$ is the matrix whose columns are the t eigenvectors.
Using the shape model created with PCA, it is possible to generate new shapes similar to those in the training set.
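A minimal NumPy sketch of these formulas, assuming the normalized training shapes are stacked as rows of an (N, 2n) array. Because C is symmetric, `np.linalg.eigh` is used; it returns eigenvalues in ascending order, so they are re-sorted. The function names are hypothetical.

```python
import numpy as np


def build_shape_model(shapes, t):
    """Mean shape s_0, matrix Phi_s of the t leading eigenvectors, and their eigenvalues."""
    shapes = np.asarray(shapes, dtype=float)     # shape (N, 2n)
    s0 = shapes.mean(axis=0)
    S = (shapes - s0).T                          # columns s_i - s_0
    C = S @ S.T / (shapes.shape[0] - 1)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:t]        # t largest eigenvalues
    return s0, eigvecs[:, order], eigvals[order]


def shape_params(s, s0, phi):
    """b_s = Phi_s^T (s - s_0)."""
    return phi.T @ (s - s0)


def reconstruct(s0, phi, b):
    """s ~ s_0 + Phi_s b_s."""
    return s0 + phi @ b
```

Limiting each component of b to a few standard deviations (square roots of the returned eigenvalues) is a common way to keep generated shapes plausible.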
5.1.2 Static texture model
The texture model is created from a set of training images annotated with the 2D face mesh. The mesh is determined using Delaunay triangulation [29] on the set of characteristic points described in the previous section.
The texture g of each input image is defined as the vector of intensities of the pixels inside the mesh spread over the characteristic points:
$$ g = [g_1, g_2, \ldots, g_m]^T $$
The texture model describes the differences in appearance between the input images. To create such a model, all training images must be aligned so that corresponding parts of the face overlap. This can be achieved by warping each face onto the base shape $s_0$ determined in the previous step. The transformation of the training images to this common shape is determined using affine mappings of the individual triangles of the face mesh.
The texture of a training image is transformed into the reference form as follows:
1. For each pixel of the face, the 2D mesh triangle in which it lies is found.
2. An affine transformation of that triangle is computed so that it matches the corresponding triangle of the base shape.
3. The transformed triangle is copied to the output image.
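A minimal NumPy sketch of warping one mesh triangle, assuming grayscale images and nearest-neighbour sampling. Unlike the forward description above, it uses backward mapping: every destination pixel inside the target triangle is mapped back into the source image, which avoids holes in the output. The function names are hypothetical.

```python
import numpy as np


def affine_from_triangles(src_tri, dst_tri):
    """2x3 matrix A such that A @ [x, y, 1] maps each src vertex onto the dst vertex."""
    src = np.hstack([np.asarray(src_tri, dtype=float), np.ones((3, 1))])
    return np.linalg.solve(src, np.asarray(dst_tri, dtype=float)).T


def warp_triangle(src_img, dst_img, src_tri, dst_tri):
    """Copy the src_tri region of src_img into the dst_tri region of dst_img."""
    a, b, c = np.asarray(dst_tri, dtype=float)
    h, w = dst_img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    # barycentric test: which destination pixels lie inside dst_tri
    lam = np.linalg.solve(np.column_stack([b - a, c - a]), (pts - a).T)
    inside = (lam[0] >= 0) & (lam[1] >= 0) & (lam[0] + lam[1] <= 1)
    back = affine_from_triangles(dst_tri, src_tri)       # destination -> source
    homog = np.hstack([pts[inside], np.ones((inside.sum(), 1))])
    src_xy = np.rint(homog @ back.T).astype(int)
    src_xy[:, 0] = np.clip(src_xy[:, 0], 0, src_img.shape[1] - 1)
    src_xy[:, 1] = np.clip(src_xy[:, 1], 0, src_img.shape[0] - 1)
    dst_pts = pts[inside].astype(int)
    dst_img[dst_pts[:, 1], dst_pts[:, 0]] = src_img[src_xy[:, 1], src_xy[:, 0]]
    return dst_img
```

Running this for every triangle of the mesh produces the shape-normalized texture; bilinear interpolation instead of rounding would give smoother results.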
After initial normalization, the input data are processed with principal component analysis (PCA) in the same way as for the shape model. The columns of the matrix G are the normalized texture vectors g of the training images, from which the covariance matrix is calculated. A new texture is generated as a linear combination of the eigenvectors of the covariance matrix:
$$ g = g_0 + \Phi_g b_g $$
where $b_g$ is a parameter vector.
A common approach to determining the 3D head position is to use the AAM (Active Appearance Model) algorithm. The main idea is to create a 2D model of the appearance of the face by training it on prepared images with a 2D mesh outlining the face, eyes, nose and mouth. The model prepared in this way is then fitted to new images by locating the features specified during model learning. By examining the distortion of the 2D mesh, the 3D position can be approximated. An example is the work [14].
The method was tested using a ready-made implementation of the algorithm contained in the open-source library The AAM-API [11]. Its advantages are high efficiency and accuracy, but only when the model is prepared for the specific person. When the model is created from a large database of people, the searched features are matched incorrectly. The need to place the mesh manually on the user's face means that such a program would require complicated configuration. This was the reason this approach was abandoned.
5.2 Determination of the head position using a 3D model
The head position is determined by six degrees of freedom: three rotation angles (Fig. 5.1) and three components of the translation M(x, y, z). Head rotation can be described by three Euler angles: about the z axis (roll, θ), then about the y axis (yaw, β) and finally about the x axis (pitch, α).
Figure 5.1: Head model with rotation axes
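The rotation matrix R is determined from the three Euler angles:
$$ R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} $$
$$ R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix} $$
$$ R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix} $$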
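These matrices translate directly into NumPy. The composition order R_z(θ) R_y(β) R_x(α) follows the transformation T used for the camera model in this section; the function names are illustrative.

```python
import numpy as np


def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_y(beta):
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_x(alpha):
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def head_rotation(alpha, beta, theta):
    """R = R_z(theta) R_y(beta) R_x(alpha): pitch applied first, roll last."""
    return rot_z(theta) @ rot_y(beta) @ rot_x(alpha)
```

A quick sanity check for any such implementation is that R R^T equals the identity and det R equals 1.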
The algorithm for determining the 3D position of an object is based on a simplified camera model called the pinhole camera (camera obscura). The idea is to find the model parameters for which the projection of the object's features best matches the locations of these features in the image. Using the pinhole camera model and assuming no distortion caused by lens imperfections, the projection $\vec{u}$ of a point $\vec{a}$ of the 3D model onto the image plane can be described as follows:
$$ \vec{b} = T\vec{a} $$
$$ u_x = f\frac{b_x}{b_z}, \qquad u_y = f\frac{b_y}{b_z} $$
where T is the transformation matrix in homogeneous coordinates. The matrix T is a composition of the following geometric operations: rotations about the coordinate axes by the angles θ, β and α, and translation by the vector M:
$$ T = M(x, y, z)\,R_z(\theta)\,R_y(\beta)\,R_x(\alpha) $$
The factor f is the focal length of the lens. The image coordinates in pixels, $\vec{q}$, are calculated as follows:
$$ q_x = \frac{u_x}{s_x} + c_x, \qquad q_y = \frac{u_y}{s_y} + c_y $$
The coefficient s is the physical distance between the centers of two adjacent pixels of the camera sensor, and c is the offset between the optical axis and the center of the sensor. To simplify the calculations, $\vec{c}$ can be assumed to be zero, which corresponds to the sensor being centered exactly on the optical axis of the lens. The value of f can be determined experimentally during camera calibration. For further calculations, a fixed value of f was assumed. Fixing f at a constant value does not significantly affect the rest of the algorithm, because it is sufficient to determine relative changes in head position. After these simplifications, the current head position $\vec{p}$ is described by six variables:
$$ \vec{p} = \{x, y, z, \alpha, \beta, \theta\} $$
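The complete projection chain, from a model point to pixel coordinates, can be sketched as follows under the simplifications above (no lens distortion, c = 0 by default). The units in the test are hypothetical: f in millimeters and s in millimeters per pixel.

```python
import numpy as np


def pose_matrix(p):
    """4x4 homogeneous T = M(x, y, z) R_z(theta) R_y(beta) R_x(alpha)
    for p = (x, y, z, alpha, beta, theta)."""
    x, y, z, alpha, beta, theta = p
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    ct, st = np.cos(theta), np.sin(theta)
    rz = np.array([[ct, -st, 0], [st, ct, 0], [0, 0, 1]])
    ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    T = np.eye(4)
    T[:3, :3] = rz @ ry @ rx
    T[:3, 3] = [x, y, z]
    return T


def project(points_3d, p, f, s=(1.0, 1.0), c=(0.0, 0.0)):
    """Pixel coordinates q of 3D model points a for the head pose p."""
    pts = np.asarray(points_3d, dtype=float)
    a = np.hstack([pts, np.ones((len(pts), 1))])
    b = (pose_matrix(p) @ a.T).T              # b = T a, camera coordinates
    u = f * b[:, :2] / b[:, 2:3]              # u_x = f b_x / b_z, u_y = f b_y / b_z
    return u / np.asarray(s) + np.asarray(c)  # q = u / s + c
```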
In general, projecting 3D object points onto a 2D plane is a non-linear operation. However, assuming only small changes between the known position determined in the previous frame and the current position, this operation can be well approximated by a linear function. Under this assumption, the parameters describing the position can be determined iteratively:
$$ \vec{p}_{i+1} = \vec{p}_i - \vec{d} $$
In each iteration, the correction vector $\vec{d}$ is calculated so as to minimize the error vector $\vec{e}$, whose components are the differences between the projections of the model points and the positions of the corresponding features in the image. Using Newton's method, with the Jacobian matrix denoted by J, $\vec{d}$ is given by the equation:
$$ J\vec{d} = \vec{e} $$
This equation can be solved using the pseudoinverse:
$$ \vec{d} = (J^T J)^{-1} J^T \vec{e} $$
Levenberg-Marquardt algorithm
In the method presented above, all parameters are treated with the same weight during optimization. However, a rotation of 0.5 radian about the z axis can, for example, change the projection much more than a translation of 50 mm. It is therefore necessary to normalize the parameters, taking into account the standard deviation of each parameter (each column of the matrix J). To this end, a diagonal weight matrix W is introduced whose diagonal elements are inversely proportional to the standard deviations σ:
$$ W_{ii} = \frac{1}{\sigma_{p_i}} $$
To improve the convergence of the method, a parameter λ controlling the weight of the stabilization term is added:
$$ \vec{d} = (J^T J + \lambda W^T W)^{-1} J^T \vec{e} $$
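A minimal sketch of this update, assuming a residual function that returns e = (projected model points minus observed features), a Jacobian approximated by forward differences (an assumption; analytic derivatives could be used instead), and a fixed λ rather than one adapted in each iteration as in the full Levenberg-Marquardt scheme. The function names are hypothetical.

```python
import numpy as np


def numerical_jacobian(residual, p, eps=1e-6):
    """Forward-difference approximation of J = d e / d p."""
    r0 = residual(p)
    J = np.zeros((r0.size, p.size))
    for k in range(p.size):
        dp = np.zeros_like(p)
        dp[k] = eps
        J[:, k] = (residual(p + dp) - r0) / eps
    return J


def weighted_lm(residual, p0, sigma, lam=1e-3, n_iter=50):
    """Iterate p <- p - d with d = (J^T J + lam W^T W)^-1 J^T e, W_ii = 1 / sigma_i."""
    p = np.asarray(p0, dtype=float).copy()
    W = np.diag(1.0 / np.asarray(sigma, dtype=float))
    for _ in range(n_iter):
        e = residual(p)
        J = numerical_jacobian(residual, p)
        d = np.linalg.solve(J.T @ J + lam * W.T @ W, J.T @ e)
        p = p - d
        if np.linalg.norm(d) < 1e-12:
            break
    return p
```

The cost noted below is visible here: every iteration evaluates the residual once per parameter just to build J.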
Computing the Jacobian J is a costly operation, which makes this method poorly suited to real-time use. This was the reason for abandoning this approach in favor of the POSIT algorithm.
5.2.1 POSIT algorithm
The POSIT algorithm (Pose from Orthography and Scaling with Iterations) [13] is used to estimate the three-dimensional pose of a known object. It was presented in 1992 as a method for determining the pose (defined by the translation vector T and the rotation matrix R) of a 3D object of known dimensions. Determining the pose requires at least four points on the surface of the object that do not lie in one plane. The algorithm consists of two parts: an initial pose estimate and iterative refinement of the result. The first part of the algorithm (POS) assumes that the selected points of the object lie at the same distance from the camera, so that differences in the object's apparent size caused by changes in distance to the camera are negligible. Assuming that the points lie at the same distance means that the object is far enough from the camera for differences in depth to be ignored (the weak-perspective assumption). With this approach, knowing the camera parameters, the pose of the object can be estimated approximately using scaled orthographic projection. Such an estimate is usually not accurate enough, so the result is refined iteratively: the points of the 3D object are projected using the pose computed in the previous iteration, and the result is used as the starting point for the first stage of the algorithm.
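The two stages described above can be sketched in pure Python as follows. This is an illustrative version of the POSIT iteration, not the implementation used in this work: the function names are hypothetical, image coordinates are assumed to be measured relative to the principal point (in the same units as the focal length), and a fixed number of iterations replaces a convergence test. With all corrections ε_i set to zero, the first pass is exactly the POS (scaled orthographic) estimate; later passes use the pose from the previous iteration to correct the image points.

```python
import math


def _inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def posit(object_pts, image_pts, focal, iterations=20):
    """Estimate (R, T) from >= 4 non-coplanar model points and their images.

    object_pts[0] is the reference point M0 (model coordinates);
    image_pts are (x, y) relative to the principal point.
    """
    m0 = object_pts[0]
    A = [[p[k] - m0[k] for k in range(3)] for p in object_pts[1:]]  # vectors M0Mi
    n = len(A)
    AtA = [[sum(row[r] * row[c] for row in A) for c in range(3)] for r in range(3)]
    inv = _inv3(AtA)
    # B = (A^T A)^-1 A^T, the pseudo-inverse of A (3 x n)
    B = [[sum(inv[r][k] * A[i][k] for k in range(3)) for i in range(n)] for r in range(3)]
    x0, y0 = image_pts[0]
    eps = [0.0] * n  # eps = 0 gives the initial POS (scaled orthographic) estimate
    for _ in range(iterations):
        xs = [image_pts[i + 1][0] * (1 + eps[i]) - x0 for i in range(n)]
        ys = [image_pts[i + 1][1] * (1 + eps[i]) - y0 for i in range(n)]
        I = [sum(B[r][i] * xs[i] for i in range(n)) for r in range(3)]
        J = [sum(B[r][i] * ys[i] for i in range(n)) for r in range(3)]
        s1 = math.sqrt(sum(v * v for v in I))
        s2 = math.sqrt(sum(v * v for v in J))
        r1 = [v / s1 for v in I]
        r2 = [v / s2 for v in J]
        r3 = _cross(r1, r2)
        z0 = focal / ((s1 + s2) / 2)  # scale s = f / Z0
        # Perspective correction used in the next iteration
        eps = [sum(A[i][k] * r3[k] for k in range(3)) / z0 for i in range(n)]
    T = [x0 * z0 / focal, y0 * z0 / focal, z0]
    return [r1, r2, r3], T
```

For a small tetrahedron of model points placed about 200 units in front of a camera with f = 500, the corrections ε_i are of the order of the object depth divided by its distance, and the recovered translation converges to the true value after a few iterations.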
This method makes it possible to determine the 3D pose of an object from the image of a single camera. It requires the 2D image positions of at least four points that do not lie in one plane, together with their 3D coordinates in the object model. The basic step of the algorithm does not take perspective into account; however, this has no effect on the result if the object is sufficiently far from the camera, because the influence of perspective on the object's appearance is then negligible.
The described algorithm is a very good choice when the application must run in real time, because its computational load is small. Its main drawback is the need to define a reference point, whose 3D coordinates must be zero. An incorrectly determined projection of the reference point has a very large impact on the estimation of the object's current pose. A further problem arises when the projection of the reference point is not visible. This can happen with a large head rotation, for example when the nose covers features on one side of the face. The solution to this problem is to determine the reference location of the feature from a set of points in its neighborhood.
The algorithm that determines the 3D head position consists of two main steps:
1. Initialization of the algorithm. Face and eye detection is performed. Before further analysis, the face region must be separated from its surroundings. Then a set of features to be tracked in the next stage is determined. Based on these features and the known position of the face, a 3D head model is created.
2. Tracking of the 3D head position. This stage is repeated iteratively after the algorithm has been initialized. Based on the detected change in the positions of the points determined in the previous stage, the current head position in 3D space, described by the translation vector T and the rotation matrix R, is computed.
The individual phases of the algorithm are described in detail below.
5.2.2 Face and eye detection
The face detection method proposed by Paul Viola and Michael Jones [12] was used. It is one of the most popular face detection methods and is characterized by high speed and very good effectiveness. The method locates objects in the image using a previously trained classifier. The classifier is trained on images that contain the searched object and on images in which the object does not occur. From the prepared set of images a cascade classifier is built, which is then used to locate the object in previously unseen images.
The work uses the implementation of the algorithm from the Open Computer Vision Library (OpenCV) [20]. This is an open-source project created by Intel. It includes ready-made classifiers for detecting faces and eyes.
The face and eye detection algorithm proceeds as follows:
1. The image is searched using the face classifier. The result is the location of all faces found. Only the largest face, i.e. the person closest to the camera, is considered in further analysis.
2. Using the previously determined face location, a second classifier, trained for eye detection, is applied. To reduce the CPU load, only the upper half of the face is searched. This also reduces the risk of false eye detections, e.g. at the mouth or nose.
3. The size and position of the face and eyes are checked. Both eyes must be of similar size and lie at approximately the same height. If these conditions are not met, the whole detection process is repeated from the beginning on a new frame.
Figure 5.2: Face and eye detection
The result of the algorithm operation is shown in Fig. 5.2.
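The consistency check in step 3 can be sketched as below. The rectangles are assumed to come from a cascade detector such as OpenCV's `CascadeClassifier.detectMultiScale`, which returns (x, y, w, h) boxes; the detection itself is not shown. The tolerance values (a size ratio of 1.5 and a vertical offset of 10% of the face height) are hypothetical and would have to be tuned.

```python
def largest_face(faces):
    """faces: list of (x, y, w, h) rectangles; return the largest (closest person)."""
    return max(faces, key=lambda r: r[2] * r[3]) if faces else None


def eyes_consistent(face, eyes, max_size_ratio=1.5, max_dy_frac=0.1):
    """Check the detected eyes against the face, as in step 3 above.

    face, eyes: (x, y, w, h) rectangles in image coordinates.
    The two tolerance parameters are hypothetical defaults.
    """
    fx, fy, fw, fh = face
    # Only eyes whose center lies in the upper half of the face are accepted
    upper = [e for e in eyes if fy <= e[1] + e[3] / 2 < fy + fh / 2]
    if len(upper) != 2:
        return False
    (x1, y1, w1, h1), (x2, y2, w2, h2) = upper
    similar_size = max(w1, w2) / min(w1, w2) <= max_size_ratio
    same_height = abs((y1 + h1 / 2) - (y2 + h2 / 2)) <= max_dy_frac * fh
    return similar_size and same_height
```

If `eyes_consistent` returns False, the caller simply moves on to the next video frame, which matches the "start again on a new frame" rule.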
Combining the face and eye classifiers gives much better results than using the face classifier alone. Experience shows that using only the face classifier can give incorrect results, e.g. an image region that does not contain a face may be wrongly classified as one. This usually happens when the lighting conditions differ from those in the images used to train the classifier. Incorrect results were obtained particularly often with lateral lighting of the face. Combining the two classifiers and mutually checking the positions and sizes of the eyes gives much greater confidence in the correctness of the algorithm. The price to pay is that sometimes a face present in the input image will not be found. However, this is not a major problem, because the program processes video sequences and several initial frames can be skipped. The detection is repeated until a face and eyes are found that satisfy the conditions on their mutual position.
5.2.3 Determination of the features to be tracked
Commercial systems for detecting facial and head movements use markers made of reflective material glued to the face. This solution is very accurate and resistant to interference, but it requires expensive equipment. The assumption of this work was to create a system that does not require a complicated hardware configuration. The presented solution is based only on a standard web camera, without any additional markers to improve the performance of the algorithms.
To track changes in head position, the Lucas-Kanade optical flow method described in the theoretical part was used. It requires a set of features whose change in position is tracked in subsequent frames of the video sequence. The features must be selected so that their change in position can be determined unambiguously in subsequent frames. The Lucas-Kanade algorithm works well if the tracked points are located where pronounced edges are visible. Selecting the right set of features is a very important issue, because it strongly affects the subsequent tracking of changes in face position. The literature offers many different approaches to determining good points for tracking. One of the main criteria for choosing the method was computational complexity. By assumption, the program should run in real time, so a complicated method that would put too heavy a load on the processor could not be chosen.
One of the most commonly used definitions of a corner was introduced by Harris [12]. This definition is based on the matrix of second-order derivatives (the Hessian) of the image intensity. The Hessian at a point p(x, y) is defined as follows:
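$$H(p) = \begin{bmatrix} \dfrac{\partial^2 I}{\partial x^2} & \dfrac{\partial^2 I}{\partial x \partial y} \\[6pt] \dfrac{\partial^2 I}{\partial x \partial y} & \dfrac{\partial^2 I}{\partial y^2} \end{bmatrix}$$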
The autocorrelation matrix M is obtained by summing the derivative values over a window around the given point:
$$M(x, y) = \begin{bmatrix} \sum\limits_{-K \le i, j \le K} I_x^2(x+i, y+j) & \sum\limits_{-K \le i, j \le K} I_x(x+i, y+j)\, I_y(x+i, y+j) \\[6pt] \sum\limits_{-K \le i, j \le K} I_x(x+i, y+j)\, I_y(x+i, y+j) & \sum\limits_{-K \le i, j \le K} I_y^2(x+i, y+j) \end{bmatrix}$$
Corners are located where the autocorrelation matrix has two large eigenvalues. This means that the texture around the given point changes significantly in two independent directions. Because only the eigenvalues are used, the detected corners do not change when the image is rotated. By searching for local maxima of the autocorrelation of the input image, points that can be tracked with optical flow are easily found.
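The following sketch computes the autocorrelation matrix M for one pixel of a grey image and tests the eigenvalue condition described above. As simplifying assumptions it uses central-difference derivatives, an unweighted (2K+1)×(2K+1) window, and the smaller eigenvalue of M as the corner measure (the Shi-Tomasi form of the "two large eigenvalues" test); Harris's own response det(M) − k·trace(M)² would be an equally valid choice. All names are hypothetical.

```python
import math


def gradients(img):
    """Central-difference derivatives I_x, I_y of a grey image (list of rows); 0 on the border."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return ix, iy


def autocorrelation(ix, iy, x, y, k=1):
    """Entries (a, b, c) of M = [[a, b], [b, c]] summed over the (2k+1)^2 window."""
    a = b = c = 0.0
    for j in range(-k, k + 1):
        for i in range(-k, k + 1):
            gx, gy = ix[y + j][x + i], iy[y + j][x + i]
            a += gx * gx
            b += gx * gy
            c += gy * gy
    return a, b, c


def corner_strength(img, x, y, k=1):
    """Smaller eigenvalue of M: large only if the texture varies in two directions."""
    ix, iy = gradients(img)
    a, b, c = autocorrelation(ix, iy, x, y, k)
    return (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b * b)
```

On a synthetic image containing a bright square, the measure is positive at the square's corner and zero both in flat regions and along a straight edge, where only one eigenvalue of M is large.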
It is very important that the selected features are distributed evenly over the surface of the tracked object. This can be achieved by imposing a minimum distance between neighboring points. Such a constraint is necessary because, if the autocorrelation of the object's texture has a large maximum in one location, most of the features will be placed around this maximum. In that case, the algorithm that determines the 3D pose of the object may give incorrect results.
The presented approach is based on determining natural facial features that are easy to track with optical flow. The algorithm searches for local maxima, which makes it adaptive: depending on the appearance of a person's face, it finds the optimal points for tracking. This solution is much more universal and gives better results than relying on strictly defined facial features, such as the corners of the lips or the corners of the eyes. Locating such fixed features is not always possible. If we assume, for example, that the corners of the eyes are used, and the algorithm searching for these features does not anticipate that a person may wear glasses, this can lead to worse or even invalid results.
The algorithm for determining the features to be tracked:
1. Specifying the search area. The search for features is limited to the face region determined in the previous stage. The region containing the eyes is excluded, because movements of the eyeballs disturb the correct determination of the head position.
2. Determination of features using the Harris algorithm [12].
3. Skipping points that are too close to each other. To improve the effectiveness of tracking, features located too close to each other are eliminated.
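Steps 1 and 3 can be combined into one greedy selection, sketched below: candidates (for example local maxima of the corner measure) are visited from the strongest, and a point is accepted only if it lies inside the face, outside the eye regions, and at least `min_dist` pixels from every point accepted so far. The function and its parameters are hypothetical; OpenCV's `goodFeaturesToTrack` offers comparable built-in behavior through its `minDistance` and `mask` arguments.

```python
import math


def select_features(candidates, face, eye_regions, min_dist, max_features=None):
    """candidates: list of (x, y, strength); face, eye_regions: (x, y, w, h) rectangles.

    Returns the accepted (x, y) points, strongest first.
    """
    def inside(px, py, r):
        return r[0] <= px < r[0] + r[2] and r[1] <= py < r[1] + r[3]

    accepted = []
    for x, y, _ in sorted(candidates, key=lambda c: c[2], reverse=True):
        # Step 1: stay within the face but outside the eye regions
        if not inside(x, y, face) or any(inside(x, y, e) for e in eye_regions):
            continue
        # Step 3: enforce the minimum distance to already accepted features
        if all(math.hypot(x - ax, y - ay) >= min_dist for ax, ay in accepted):
            accepted.append((x, y))
            if max_features is not None and len(accepted) >= max_features:
                break
    return accepted
```

Visiting candidates in order of decreasing strength ensures that, when two points are too close, the weaker one is the one discarded.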
5.2.4 Initialization of the head 3D model
The head is modeled using the sinusoidal mesh shown in Figure 5.4. It is a rough approximation that does not reproduce the shape of the face in detail, but it works quite satisfactorily and does not require a complicated approach.
Figure 5.3: Result of the algorithm searching for features used to determine the change in head position: (a) input image with the face and eyes marked, (b) corner strength (autocorrelation of the Hessian of the image intensity), (c) selected features
Figure 5.4: Sinusoidal head model
This simplification of the model has a number of advantages:
Ease of determining the model.
Automatic initialization: there is no need to prepare a model mesh for a given user in advance.
Versatility, resulting from the fact that the individual face profile does not have to be taken into account.
Speed of operation, resulting from simplicity.
Robustness to disturbances.
Figure 5.5: Head with superimposed 3D model grid
Mapping the 2D facial features determined in the previous step onto the 3D mesh of the model is possible thanks to the assumption that during initialization the user's head is facing straight towards the monitor. Such an assumption can be made because the model serves to estimate the change in head position relative to the position determined during initialization. An example of fitting the face mesh is shown in Figure 5.5.
The use of a simplified face model makes the system easy to initialize. Simple models such as a cylinder, an ellipsoid or a sinusoid are commonly used to track head movement, because initializing such models requires selecting significantly fewer parameters than a real face model of an individual. Experiments show that the initialization of the model is relatively robust to imprecise selection of the initial parameters, such as the exact size of the head or its current position. With a detailed model, precise initialization is very important, because a small change in the starting parameters strongly affects the further tracking of the object. Even with a precise mesh of a person's face obtained, e.g., with a laser scanner, the exact face model becomes useless if the starting position is not selected correctly during the initialization of the tracking algorithm. During initialization, the current position of the model must be determined so that the user's face mesh overlaps with the image obtained from the camera. This process can be done automatically, assuming that the camera position is known and that the person looks towards the camera during initialization. The head position can then be determined by face detection using the methods described above. The sinusoidal mesh is not a precise approximation of the real head, but with a simplified model it can be assumed that the accuracy of the start parameters will be sufficient for a correct estimation of changes in head position. With an exact face model, the initialization process would have to be carried out manually. This would be a significant obstacle and would make the system less practical, since initialization would require additional knowledge from the user.
[24] presents a comparison of a cylindrical model and an accurate model obtained with a Cyberware laser scanner in an algorithm that tracks changes in head position. The results of that work clearly show that the simplified model is fully sufficient, and its main advantage is the possibility of automatic initialization.
The sinusoidal model was used in the presented solution. It is slightly more complicated than the cylindrical model. Its main advantage is a better representation of the shape of the nose, which allows the head position to be determined with greater accuracy under strong foreshortening.
Initializing a pose-tracking algorithm based on the 3D model of the object consists in establishing the correspondence between 3D points on the surface of the object's mesh and the projections of these points onto the first frame of the video sequence. The selection of tracking points with the Harris algorithm has been described above. Given a set of such points, their locations on the model surface must be determined. In general this is not a simple task, because for a complicated model the mapping from 2D to 3D space cannot be defined analytically. One of the simpler solutions is to use the OpenGL library. Each triangle of the object's mesh is assigned a different color, and the object is then rendered in its starting position. Reading the color that covers a given point establishes the correspondence between a point in the image and its counterpart in the model. This is a general solution, suitable for a wide class of models. With the simplified sinusoidal model, which can be described by algebraic equations, such a complicated initialization is not necessary.
The projection of a point $\vec{b}$ of the 3D model onto the point $\vec{q}$ of the projection plane, assuming the simplified camera model described in section 5, is given by the equations:
$$u_x = f\,\frac{b_x}{b_z}, \qquad u_y = f\,\frac{b_y}{b_z}$$
$$q_x = \frac{u_x}{s_x} + c_x, \qquad q_y = \frac{u_y}{s_y} + c_y$$
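The projection equations translate directly into code. The sketch below also applies the rigid transform b = R m + T that places a model point m in camera coordinates for the current head pose (R, T); the function names are hypothetical, and the parameter values in the example (f = 500, s_x = s_y = 1, c = (320, 240)) are arbitrary, in the spirit of fixing the camera parameters in advance.

```python
def transform(R, T, m):
    """Model point m -> camera coordinates b = R m + T (R given as a list of rows)."""
    return [sum(R[r][k] * m[k] for k in range(3)) + T[r] for r in range(3)]


def project(b, f, sx, sy, cx, cy):
    """Pinhole projection of the camera-space point b onto the image point q."""
    bx, by, bz = b
    ux, uy = f * bx / bz, f * by / bz
    return ux / sx + cx, uy / sy + cy
```

With R equal to the identity matrix and T = (0, 0, 100), the model point (10, 20, 0) projects to (370.0, 340.0).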
These equations contain parameters describing the camera, but without loss of generality they can be assumed to be fixed in advance. Such an assumption is possible because determining the absolute position of the head is not important for the operation of the system; only the change in position between frames of the video sequence has to be determined. The appropriate coefficients are taken into account later, during calibration, so at this stage the exact camera model does not need to be determined.
There are many methods [23] for determining camera parameters, but they require an additional calibration process using specially prepared images. One of the most popular methods uses a chessboard with known square sizes. Images of the chessboard are recorded in several different positions, and the camera model is calculated from these data. With this approach, the absolute 3D position of the tracked object can be determined. However, the calibration process must be carried out individually for each camera.
Knowledge of the starting head position is assumed. Assuming that during initialization the user looks straight into the camera, the rotation matrix R can be set to the identity matrix. The translation vector T is determined from the known head position obtained automatically during face and eye detection. The x and y coordinates of T are obtained directly from the current face position, while the z coordinate is determined by examining the face size: a larger face means that the user is closer to the camera, and on this basis the z value of the vector T is determined.
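The initialization described above can be sketched as follows, by inverting the projection equations with s_x = s_y = 1. The average face width (150 units) is a hypothetical constant introduced only to convert the face size in pixels into a distance; an inaccurate value mostly rescales T, which matters little because only changes in head position are used later.

```python
def initial_pose(face_rect, f, cx, cy, face_width=150.0):
    """Initial head pose from a detected face rectangle (x, y, w, h) in pixels.

    Assumes the user looks straight at the camera (R = identity) and a
    hypothetical average face width `face_width` in model units.
    """
    x, y, w, h = face_rect
    tz = f * face_width / w        # larger face in the image -> smaller distance
    u, v = x + w / 2, y + h / 2    # face center in the image
    tx = (u - cx) * tz / f         # invert q = f * b / b_z + c (with s = 1)
    ty = (v - cy) * tz / f
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return R, [tx, ty, tz]
```

A 100×100 pixel face centered on the principal point (320, 240) with f = 500 gives T = (0, 0, 750); moving the face 100 pixels to the right gives T_x = 150.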
5.2.5 Use of reference frames
The method presented above uses information about the change in position of selected facial features only between successive frames of the video sequence. This approach gives satisfactory results as long as the number of processed frames is not too large. In longer sequences, the accumulation of the position error of features between successive frames became noticeable. The solution to this problem was the method presented in [19], which is based on a set of reference images. A reference image is an image frame with a known head position and a known set of facial features. When the head position approaches the position registered in a reference frame, the accumulated error is eliminated. As a result, the performance of the head tracking algorithm does not deteriorate over time, as was the case when only information about the change in position of facial features between successive frames was used.
During system initialization, a starting reference frame is created, and further frames are then added automatically when needed. During head tracking, when the current position differs significantly from all registered reference frames, a new frame is created with the position determined in the previous iteration of the algorithm. This makes the algorithm fully automatic: there is no need to register a set of reference frames during initialization. Using one starting frame and automatically adding further frames gives satisfactory results and is a universal solution.
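The bookkeeping for reference frames can be sketched as a small class. It is a hypothetical illustration, not the method of [19]: a pose is represented as a 6-tuple of rotation angles and translation, the distance between poses is a plain Euclidean norm (mixing angles and translations this way is a simplification), and the threshold is a free parameter.

```python
import math


class ReferenceFrames:
    """Hypothetical store of reference frames as (pose, features) pairs.

    pose: 6-tuple (rx, ry, rz, tx, ty, tz); features: any per-frame feature data.
    """

    def __init__(self, start_pose, start_features, threshold):
        self.frames = [(tuple(start_pose), start_features)]
        self.threshold = threshold

    @staticmethod
    def distance(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def nearest(self, pose):
        """Reference frame whose registered pose is closest to `pose`."""
        return min(self.frames, key=lambda fr: self.distance(fr[0], pose))

    def update(self, prev_pose, prev_features, current_pose):
        """Register a new frame (with the pose from the previous iteration)
        when the current pose is far from every registered pose."""
        if self.distance(self.nearest(current_pose)[0], current_pose) > self.threshold:
            self.frames.append((tuple(prev_pose), prev_features))
            return True
        return False
```

In a tracking loop, `nearest` would select the reference frame used to correct the accumulated drift, and `update` would be called once per iteration.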
5.2.6 Elimination of distorted data
Determining the 3D head position requires knowledge of the current positions of the facial features determined during the initialization process. The positioning algorithm is based on the least squares method. This makes it necessary to eliminate features with incorrectly determined positions, because a large error in the position of even a single feature can have a very large impact on the whole method. Changes in the positions of points are determined with optical flow. This is a precise solution, but disturbances caused by a sudden change in lighting or partial occlusion of the face can make the newly calculated locations of facial features incorrect. Optical flow determines the change in position between frames, so a single error affects all subsequent feature locations. Hence the need to implement a method for detecting and eliminating disturbances. The result of the algorithm on features with incorrectly determined positions is shown in Figure 5.6. Green points indicate features whose location was determined correctly, while red points mark those classified as incorrect data by the elimination algorithm.
The algorithm detecting incorrect optical flow results:
1. Calculation of the mean shift vector of all tracked features between successive frames.
2. Determination of the standard deviation of the shifts from the mean value.
3. Elimination of features whose deviation from the mean exceeds three times the standard deviation. Eliminated features do not take part in determining the head position in the current iteration.
4. Correction of the positions of eliminated features by projecting the corresponding points of the 3D head model. Thanks to this, these points can be used again in the next iteration.
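Steps 1-3 can be sketched as follows. The sketch assumes that the deviation of a feature is the length of the difference between its shift vector and the mean shift, and that the standard deviation is the root mean square of these lengths; the 3σ limit is the one given in step 3. Step 4 (re-projection from the 3D model) is not included, and the function name is hypothetical.

```python
import math


def reject_outliers(prev_pts, curr_pts, k=3.0):
    """Return a list of booleans: True for features kept, False for rejected ones."""
    shifts = [(cx - px, cy - py) for (px, py), (cx, cy) in zip(prev_pts, curr_pts)]
    n = len(shifts)
    mx = sum(s[0] for s in shifts) / n  # step 1: mean shift vector
    my = sum(s[1] for s in shifts) / n
    dev = [math.hypot(sx - mx, sy - my) for sx, sy in shifts]
    sigma = math.sqrt(sum(d * d for d in dev) / n)  # step 2: RMS deviation
    return [d <= k * sigma for d in dev]  # step 3: keep features within k * sigma
```

With 19 features shifted by one pixel and one feature shifted by 20 pixels, σ ≈ 4.14, so only the last feature (deviation ≈ 18.05 > 3σ ≈ 12.4) is rejected. With this definition of σ, no deviation can exceed σ√(n − 1) (Samuelson's inequality), so the 3σ test can flag a single outlier only when at least 11 features are tracked.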
This approach makes the head position algorithm robust to disturbances such as changes in facial expression, covering the face with an object, or a sudden change in scene lighting.
Figure 5.6: Elimination of incorrect optical flow results
Chapter 6
Determining the direction of vision
Detecting the human eye is a difficult task because of the low contrast between the eye and the surrounding skin. As a consequence, many existing systems use a special camera mounted at a small distance from the eye to obtain a high-resolution image.
The most popular method of determining the gaze direction is to examine the difference in position between the pupil center and the corneal reflection produced by an infrared light source. The current locations of these two points determine the gaze vector. The mapping between the gaze vector and the current point of regard is determined during calibration. This is the simplest method and is commonly used in commercial systems. It does not require complicated image processing algorithms, and the use of an external light source makes its operation independent of lighting conditions.
It is assumed that the system presented in this work should not require any additional equipment apart from a standard web camera. Therefore, determining the gaze direction required a method based solely on visible light. The current direction in which the user is looking can be determined by examining the change in the angle of the eyeball relative to the camera axis. This angle is proportional to the change in position between the center of the pupil and the center of the eyeball. It is assumed that during initialization the user is looking at the center of the screen. The position of the pupil center is then registered and treated as the reference value defining the eye axis. The change in axis position caused by head movement can be determined from the known 3D head position computed in the previous stage. An image of the head with the gaze vectors marked (shown as yellow lines) is presented in Fig. 7.9.
6.1 Determination of the sight vector
The direction in which a person looks can be determined unambiguously by examining the shift between the center of the pupil and the corners of the eye. The current head position (translation and rotation) must also be taken into account. Methods for determining these values are presented in the previous chapters.
The simplest approach to determining the change of the eye angle is to examine the shift between the pupil and the corners of the eye. Algorithms used to track eye position changes can be divided into two main groups: feature-based and model-based. The feature-based approach consists in detecting image features that depend on the current position of the eye. This requires an appropriate criterion, depending on the method used, for deciding whether the sought feature is present (e.g. in the case of binarization, the threshold value must be specified). The criterion value is usually a system parameter that must be set manually. The types of features used vary and depend on the particular method, but most often they are based on intensity levels or the image gradient.
In a sufficiently illuminated image, the pupil is a region considerably darker than the surrounding cornea. The center of the pupil can be determined as the geometric center of the region obtained after binarization with a properly selected threshold.
Figure 6.1: Face with sight vectors
6.2 Starburst algorithm
The Starburst algorithm [15] is used to detect and measure the position of the pupil and of the infrared light reflected from the cornea of the eye. It is part of the open-source openEyes project, whose goal is to create a low-cost eye-tracking system that is easily accessible to a wide range of users. Starburst is a hybrid algorithm that combines feature-based and model-based approaches. The pupil contour is computed by fitting an ellipse to a set of points lying on the border between the pupil and the cornea. The position of the fitted ellipse is then refined with a model-based technique that maximizes the ratio between the brightness of the pixels outside and inside the ellipse. Contour detection requires an initial estimate of the center. It can be taken as the location from the previous frame; for the first frame, manual initialization can be used, or it can simply be assumed that the center of the eye coincides with the center of the pupil during system initialization. Points on the contour are found by searching for the maximum change of image intensity along rays cast from the center of the eye.
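The ray-casting step can be sketched as below. It is a simplified illustration, not the openEyes implementation: pixels are sampled at rounded positions along each ray, and the point with the largest dark-to-bright jump between consecutive samples is kept. The function name, the number of rays and the maximum ray length are hypothetical.

```python
import math


def ray_edge_points(img, cx, cy, n_rays=16, max_r=None):
    """Cast n_rays rays from (cx, cy); on each, return the pixel with the largest
    dark-to-bright jump between consecutive samples (candidate pupil contour point)."""
    h, w = len(img), len(img[0])
    if max_r is None:
        max_r = min(w, h) // 2
    points = []
    for k in range(n_rays):
        angle = 2 * math.pi * k / n_rays
        dx, dy = math.cos(angle), math.sin(angle)
        prev = img[round(cy)][round(cx)]
        best, best_pt = 0, None
        for r in range(1, max_r):
            x, y = round(cx + r * dx), round(cy + r * dy)
            if not (0 <= x < w and 0 <= y < h):
                break
            jump = img[y][x] - prev
            if jump > best:
                best, best_pt = jump, (x, y)
            prev = img[y][x]
        if best_pt is not None:
            points.append(best_pt)
    return points
```

The returned points are the input to the ellipse fitting stage; a few of them will typically be wrong (e.g. at eyelashes or reflections), which is why a robust fit follows.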
An ellipse is then fitted to the detected points. To improve accuracy, the RANSAC algorithm (Random Sample Consensus) [23] is used. Its main idea is the iterative random selection of five points from the set determined in the previous stage; five points are used in each iteration because five points are needed to determine an ellipse. An ellipse is fitted to this subset, and the algorithm checks how many points from the whole set lie on its edge. A point is assumed to lie on the edge if its distance from it does not exceed an empirically chosen threshold. The largest such set of points is selected for further processing, and an ellipse is fitted to it with the least squares method. Using RANSAC makes the method much more reliable and robust to disturbances than fitting the ellipse to the whole edge set. The least squares method is very sensitive to disturbed data, so incorrectly detected points must be eliminated.
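The consensus idea can be sketched compactly by replacing the ellipse with a circle, so that three points instead of five define a model; this substitution is made only to keep the example short. Otherwise the structure follows the description above: random minimal samples, an inlier test against an empirically chosen tolerance, and a final least-squares fit (here the algebraic circle fit) to the largest consensus set. The function names, iteration count and tolerance are hypothetical.

```python
import math
import random


def circle_from_3(p1, p2, p3):
    """Circle (cx, cy, r) through three points, or None if they are collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_circle_lsq(pts):
    """Algebraic least-squares circle x^2 + y^2 + D x + E y + F = 0 (Cramer's rule)."""
    rows = [(x, y, 1.0) for x, y in pts]
    rhs = [-(x * x + y * y) for x, y in pts]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(3)]
    det = _det3(A)
    D, E, F = (_det3([[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]) / det
               for k in range(3))
    cx, cy = -D / 2, -E / 2
    return cx, cy, math.sqrt(cx * cx + cy * cy - F)


def ransac_circle(pts, iterations=200, tol=0.5, seed=0):
    """Keep the largest consensus set of random 3-point circles, then refit it."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        c = circle_from_3(*rng.sample(pts, 3))
        if c is None:
            continue
        cx, cy, r = c
        inliers = [p for p in pts if abs(math.hypot(p[0] - cx, p[1] - cy) - r) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return fit_circle_lsq(best), best
```

Only the final fit uses all consensus points, so a few gross outliers among the candidate contour points do not bias the result.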
The final stage of the Starburst algorithm refines the position of the ellipse determined by RANSAC by maximizing the sharpness of the edge along the contour.
In this work, the part of the algorithm responsible for determining the center of the pupil was tested, using the ready-made implementation published on the openEyes project website [16]. However, this method does not give good results, because at the image resolution available in the presented system the image of the eye is too small. To work correctly, the Starburst algorithm requires a high-resolution image of the eye; it is intended for use with a camera mounted a short distance from the eye.
6.3 Adaptive selection of binarization threshold values
To assess the direction in which the eyes are looking, the center of the pupil must be determined precisely. There are many sophisticated methods for locating the pupil, but most of them require a high-resolution image of the eye. Since a low-resolution camera image (640×480) is used, most of the existing methods become useless. An example is the Starburst algorithm described above.
The shape and mutual position of the iris and the pupil can be approximated by two concentric circles. By computing the parameters of the circle describing the shape of the iris, the coordinates of the pupil can be determined. Because of its properties, the iris is easier to locate than the pupil: the boundary between the iris and the sclera is much more contrasting than the boundary between the iris and the pupil.
The double image binarization method used in this work does not require high resolution. It is based on the fact that the pupil together with the iris is much darker than the white of the eye. Choosing an appropriate binarization threshold segments the image. Binarization is very sensitive to changes in lighting, which makes it impossible to choose a universal threshold for which the center of the eye is always determined correctly, regardless of lighting conditions and iris color. This led to the implementation of a procedure for automatic selection of the binarization threshold.
To increase reliability, two different thresholds are used. First, the approximate center of the pupil is determined using the first threshold. Then the image is binarized using the second threshold, with the constraint that the selected region must contain the previous estimate. This approach combines the reliability of the lower threshold with the better accuracy of the higher binarization threshold.
After binarization with the selected thresholds, the image is segmented. A common way of determining the centers of the regions separated during segmentation is to calculate their centers of mass. However, light reflected from the eye can produce bright reflections inside the pupil region, which interfere with the precise determination of the region's center. Therefore, the center is estimated by fitting an ellipse using the moments of the contour of the binarized eye image.
The implemented method is a modification of the algorithm described in [18].
Determining the center of the region with an ellipse. After segmenting the image, the contours of the separated regions are extracted. Then the moments of each contour are calculated by summing the values of all pixels contained in the contour. The general formula defining the moment of order (p, q) is:
$$m_{p,q} = \sum_{i=1}^{n} I(x_i, y_i)\, x_i^{p}\, y_i^{q}$$
The ellipse (with center at point (x, y), height h and width w) is fitted to the region for which the moments have been calculated as follows. First, the auxiliary quantities are introduced:
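$$u_{0,0} = m_{0,0}, \qquad u_{1,0} = \frac{m_{1,0}}{m_{0,0}}, \qquad u_{0,1} = \frac{m_{0,1}}{m_{0,0}}$$
$$u_{1,1} = -\frac{m_{1,1} - \dfrac{m_{1,0}\, m_{0,1}}{m_{0,0}}}{m_{0,0}}$$
$$u_{2,0} = \frac{m_{2,0} - \dfrac{m_{1,0}\, m_{1,0}}{m_{0,0}}}{m_{0,0}}, \qquad u_{0,2} = \frac{m_{0,2} - \dfrac{m_{0,1}\, m_{0,1}}{m_{0,0}}}{m_{0,0}}$$
$$\Delta = \sqrt{4\,(u_{1,1})^{2} + (u_{2,0} - u_{0,2})^{2}}$$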
The center of the ellipse (x, y) is given by:
$$x = u_{1,0}, \qquad y = u_{0,1}$$
The size (height h and width w) is determined from:
$$h = \sqrt{2\,(u_{2,0} + u_{0,2} + \Delta)}, \qquad w = \sqrt{2\,(u_{2,0} + u_{0,2} - \Delta)}$$
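The formulas above can be checked with a short sketch. The region is given as a mask (for example the filled contour, so that a bright reflection inside the pupil does not leave a hole), and each pixel contributes with weight I(x, y); the function name is hypothetical. For a uniformly filled ellipse these formulas return its semi-axis lengths: a disk of radius r gives u_{2,0} = u_{0,2} = r²/4 and therefore h = w = r.

```python
import math


def ellipse_from_moments(img, mask):
    """Fit (x, y, h, w) to the region mask[y][x] == True, weighting pixels by img[y][x]."""
    pix = [(x, y, img[y][x]) for y in range(len(img)) for x in range(len(img[0])) if mask[y][x]]

    def m(p, q):  # moment of order (p, q)
        return sum(v * x ** p * y ** q for x, y, v in pix)

    m00, m10, m01 = m(0, 0), m(1, 0), m(0, 1)
    u10, u01 = m10 / m00, m01 / m00
    u11 = -(m(1, 1) - m10 * m01 / m00) / m00
    u20 = (m(2, 0) - m10 * m10 / m00) / m00
    u02 = (m(0, 2) - m01 * m01 / m00) / m00
    delta = math.sqrt(4 * u11 ** 2 + (u20 - u02) ** 2)
    h = math.sqrt(2 * (u20 + u02 + delta))
    w = math.sqrt(max(0.0, 2 * (u20 + u02 - delta)))  # guard against rounding below 0
    return u10, u01, h, w
```

On a 21×21 grid, a digitized disk of radius 5 gives h = w ≈ 5.1, and an ellipse with semi-axes 8 and 4 gives h ≈ 8.0 and w ≈ 3.9.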
Description of the algorithm for determining the center of the pupil:
1. Initial determination of the iris radius R. The ratio of the distance between the eyes to the iris radius is approximately constant for all people and can be determined empirically.
2. The eye image obtained during face and eye detection is enlarged using the Gaussian method to 100×100 pixels. This gives better results at very low resolutions: the pupil center can be determined with sub-pixel accuracy.
3. Erosion is applied to eliminate reflections.
4. Image normalization, which makes the threshold values independent of the absolute image brightness. Normalization ensures that the darkest pixel is always 0 and the brightest 255, regardless of the lighting.
5. Selection of the first binarization threshold. Threshold values from 0 to 40 are tested iteratively. The criterion for choosing the optimal value is the distance from the center of the eye and a shape close to a circle. The iteration stops when the size of the region reaches half of the previously determined R.
6. Selection of the second binarization threshold. Next, values from 40 to 80 are tested iteratively. The criterion is again the distance from the center of the eye and a shape close to a circle. The iteration stops when the size of the region reaches the previously determined R.
7. Determination of the pupil center using the selected binarization thresholds.
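Steps 4-7 can be sketched as follows. This is a simplified, hypothetical illustration: the enlargement and erosion of steps 2-3 are assumed to have been applied already, regions are 4-connected sets of pixels at or below the threshold, the size of a region is its equivalent radius √(area/π), the circularity criterion is omitted, and the final center is the centroid of the selected region rather than the ellipse fit described above.

```python
import math
from collections import deque


def normalize(img):
    """Stretch grey levels so the darkest pixel is 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1
    return [[(v - lo) * 255.0 / span for v in row] for row in img]


def dark_regions(img, thr):
    """4-connected regions of pixels with value <= thr, as lists of (x, y)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if img[y][x] > thr or seen[y][x]:
                continue
            seen[y][x] = True
            queue, region = deque([(x, y)]), []
            while queue:
                px, py = queue.popleft()
                region.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] and img[ny][nx] <= thr:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append(region)
    return regions


def centroid(region):
    return (sum(p[0] for p in region) / len(region),
            sum(p[1] for p in region) / len(region))


def pick_region(img, thresholds, center, target_radius, must_contain=None):
    """Try thresholds in increasing order; keep the region nearest `center`;
    stop once its equivalent radius sqrt(area / pi) reaches target_radius."""
    best = None
    for thr in thresholds:
        regions = [r for r in dark_regions(img, thr)
                   if must_contain is None or must_contain in r]
        if not regions:
            continue
        best = min(regions, key=lambda r: math.dist(centroid(r), center))
        if math.sqrt(len(best) / math.pi) >= target_radius:
            break
    return best


def pupil_center(eye_img, center, R):
    """Two-threshold pupil search on an already enlarged and eroded eye image."""
    img = normalize(eye_img)                                   # step 4
    first = pick_region(img, range(0, 41), center, R / 2)     # step 5
    if first is None:
        return None
    fx, fy = centroid(first)
    seed = (round(fx), round(fy))
    second = pick_region(img, range(40, 81), center, R, must_contain=seed)  # step 6
    return centroid(second or first)                          # step 7
```

Requiring the second region to contain the first estimate is what ties the two thresholds together: the higher threshold can grow the region towards the iris boundary but cannot jump to an unrelated dark area such as an eyebrow.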
Despite its simplicity, the presented method is very effective. It copes with low image resolution as well as with variable lighting conditions. The result of the algorithm that searches for the center of the pupil is shown in Figure 6.2.
6.4 Calibration
To determine the user's gaze direction, a linear homogeneous mapping (homography) [4] is applied to the gaze vector, defined as the difference between the current pupil center and the projection of the eyeball center onto the camera image plane. The mapping matrix is denoted H. It is determined from a set of pairs relating the gaze vector to the point displayed on the monitor. Matrix H has eight degrees of freedom, so at least four such pairs are needed. To increase accuracy, ten points are recorded during the calibration process. Matrix H is computed with the least-squares method, which minimizes the difference between the mapped gaze vector and the point on the monitor.
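The least-squares estimation of H described above can be sketched in pure Python. This is a minimal sketch under stated assumptions, not the author's implementation: it fixes h33 = 1 (a common way to remove the scale ambiguity, leaving the eight unknowns), writes two linear equations per (gaze vector, screen point) pair, and solves the normal equations by Gaussian elimination. This minimizes an algebraic residual rather than the on-screen distance itself, and the normal equations are numerically fragile; a more robust version would normalize the coordinates first or use a dedicated solver such as `numpy.linalg.lstsq`. The function names are hypothetical.

```python
def solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def fit_homography(pairs):
    """pairs: [((vx, vy), (sx, sy)), ...] gaze vector -> screen point."""
    if len(pairs) < 4:
        raise ValueError("at least four calibration pairs are required")
    rows, rhs = [], []
    for (vx, vy), (sx, sy) in pairs:
        # sx * (h31 vx + h32 vy + 1) = h11 vx + h12 vy + h13, same for sy
        rows.append([vx, vy, 1.0, 0.0, 0.0, 0.0, -vx * sx, -vy * sx])
        rhs.append(sx)
        rows.append([0.0, 0.0, 0.0, vx, vy, 1.0, -vx * sy, -vy * sy])
        rhs.append(sy)
    # Normal equations (A^T A) h = A^T b for the 8 unknowns
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(8)]
    h = solve(ata, atb)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, v):
    """Map a gaze vector to a screen point (homogeneous division by w)."""
    vx, vy = v
    w = H[2][0] * vx + H[2][1] * vy + H[2][2]
    return ((H[0][0] * vx + H[0][1] * vy + H[0][2]) / w,
            (H[1][0] * vx + H[1][1] * vy + H[1][2]) / w)
```

With ten calibration points the system is overdetermined (20 equations, 8 unknowns), which is what lets the least-squares fit average out some of the measurement noise.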
Calibration of the application consists of determining the relationship between the gaze vector and the calibration point displayed on the monitor. During calibration, the user follows the moving point with their eyes. The more points are displayed, the more accurately and reliably the calibration can be computed. However, too many points make the calibration tedious, and the user can no longer keep track of the points. Experiments with different numbers of calibration points showed that 10 points, each displayed for 3 seconds and evenly distributed over the screen, give satisfactory results, and the calibration process is not burdensome when it lasts only 30 seconds.

Figure 6.2: Result of the algorithm that searches for the pupil center (panels a to d)

While following a point, the human eye constantly makes small movements around it. To reduce the size of these involuntary eyeball movements, the point on which the user concentrates changes its size or position; this makes the involuntary deviations of the eye much smaller than for static points. A scheme in which the calibration point moves continuously across the whole screen was also tested, but this approach does not give the best results. A considerably better effect is obtained when the calibration point is displayed at fixed locations for some time. The gaze vector can then be averaged for each location, and samples that deviate significantly from the others can be discarded. Usually, during the first second after the point changes position, a large deviation of the gaze vector from the mean value is recorded, because the gaze follows the displayed point with some delay. Discarding the data obtained during the first second improves the calibration results.
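The per-point filtering described above (skip the first second, average, drop samples that stand out) can be sketched as follows. This is an illustrative assumption, not the thesis code: the outlier rule (discard samples farther from the mean than `k` times the median distance) and the function name `calibration_vector` are hypothetical, and timestamps are assumed to be seconds since the point appeared at its current location.

```python
import math
from statistics import mean, median


def calibration_vector(samples, skip_s=1.0, k=2.0):
    """samples: list of (t, vx, vy), t = seconds since the point moved.
    Returns the averaged gaze vector for one calibration location."""
    # Drop the settling period: the gaze lags behind the displayed point.
    kept = [(vx, vy) for t, vx, vy in samples if t >= skip_s]
    if not kept:
        raise ValueError("no samples after the settling period")
    mx = mean(v[0] for v in kept)
    my = mean(v[1] for v in kept)
    dist = [math.hypot(vx - mx, vy - my) for vx, vy in kept]
    limit = k * median(dist)
    # Keep samples close to the mean; fall back to all if the rule removes everything.
    inliers = [v for v, d in zip(kept, dist) if d <= limit] or kept
    return (mean(v[0] for v in inliers), mean(v[1] for v in inliers))
```

The median is used for the rejection limit because, unlike the mean distance, it is not inflated by the very outliers it is meant to remove.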
Some commercial high-accuracy eye-tracking systems replace the point of changing size with small images or captions during calibration. These act as an additional stimulus that makes the eye concentrate on a specific point. However, a noticeable difference between this approach and the point of changing size can be seen only when the gaze vector is measured with very high accuracy. The solution presented in this work relies on the image acquired from a webcam. Such an image is usually so noisy that the achievable accuracy does not allow any improvement from calibrating with images or small captions to be registered. This was the reason for keeping the simple scheme of displaying a single point of varying size (Fig. 6.3).

Figure 6.3: Calibration window
Chapter 7
Tests of the developed solution
The accuracy of the results is influenced by many factors, such as the lighting level, the distance from the camera, the quality of the camera image, the angle of view of the lens, and the precision of the calibration process.
The system relies on the image from a webcam, which is often very noisy. The noise is caused by the high sensor sensitivity used when the scene is poorly lit. The size of the obtained eye image depends strongly on the distance from the camera and on the angle of view of the lens. With a wide-angle lens, the user's face and eyes occupy fewer pixels, which worsens the results.
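The effect of distance and lens angle can be quantified with a simple pinhole-camera estimate. The sketch below is an approximation that ignores lens distortion, and its inputs are illustrative assumptions: an iris diameter of roughly 12 mm (a typical adult value), a 640-pixel-wide frame and a 600 mm viewing distance. The function name is hypothetical.

```python
import math


def iris_diameter_px(image_width_px, hfov_deg, distance_mm, iris_mm=12.0):
    """Approximate iris size in the image under a pinhole camera model.
    The focal length in pixels follows from the horizontal field of view."""
    f_px = (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    return f_px * iris_mm / distance_mm


# Same camera resolution and distance, two lenses:
narrow = iris_diameter_px(640, 60, 600)  # about 11 px
wide = iris_diameter_px(640, 90, 600)    # about 6.4 px
```

The iris shrinks in inverse proportion to the distance and in proportion to 1/tan(FOV/2), which is why a wide-angle webcam leaves only a handful of pixels for the pupil search.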
The accuracy of commercial eye-tracking systems is given in degrees, so it does not depend on the size of the monitor. The error expressed in degrees is the average angular deviation between the estimated gaze direction and the actual point the user is looking at.
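An angular error translates into an on-screen error only once the viewing distance is fixed: the offset on the screen is approximately d·tan(θ). The sketch below makes this conversion explicit; the 600 mm viewing distance and 0.25 mm pixel pitch used in the examples are hypothetical values chosen for illustration, as are the function names.

```python
import math


def angular_error_to_screen(error_deg, distance_mm, pixel_pitch_mm=None):
    """On-screen offset for a given angular error at a viewing distance.
    Returns millimetres, or pixels if a pixel pitch is given."""
    mm = distance_mm * math.tan(math.radians(error_deg))
    return mm if pixel_pitch_mm is None else mm / pixel_pitch_mm


def screen_error_to_angle(offset_mm, distance_mm):
    """Inverse conversion: on-screen offset (mm) back to degrees."""
    return math.degrees(math.atan2(offset_mm, distance_mm))


# At an assumed 600 mm viewing distance:
fine = angular_error_to_screen(0.3, 600)    # about 3.1 mm
coarse = angular_error_to_screen(3.0, 600)  # about 31 mm
```

This is why angular accuracy is a convenient common measure: the same angle corresponds to more pixels on a high-density screen or at a larger viewing distance.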
SMI offers the most accurate eye-tracking system. Its specification gives a maximum accuracy of 0.3 degrees. This accuracy is obtained only when the person's head is fixed and laboratory lighting conditions are provided. Systems that use infrared light are very sensitive to daylight, so correct operation requires an artificially lit room.
Among systems based solely on a standard camera, the best accuracy is reported in work [14]. Its authors obtained an accuracy of 3 degrees in their tests. The method presented in that work requires a complicated process to initialize the head model for
individual use