Eye-tracking devices have in recent years become a very popular research tool that provides information about eye movement and eye position over a given time interval. Data obtained in this way can be used, for example, in studies of interface usability or advertising effectiveness, or as a touchless computer interface. The aim of this work is to present methods of eye tracking that use a standard webcam.

Chapter 1

Introduction

Eye tracking (also called oculography) is a collection of research techniques used to obtain information about eye movement, eye position over a given time period and (where applicable) the point of fixation. Data obtained in this way can be used, for example, in research on the usability of interfaces, the reading of text or the effectiveness of an advertising message.

Every day our eyes are used intensively for many different purposes: reading, watching entertainment videos, observing and learning new things. However, not all of us realize how complicated the functioning of the human visual system is.

Oculography, i.e. tracking eye movements (eye tracking), is a technique that has been used for over 100 years in areas such as psychology, medicine, human-computer interaction, marketing and many others.

1.1 Problem statement

The goal of this master's thesis is to develop and implement a system that tracks the eyes of a PC user in real time via a webcam. The system is intended to estimate the point on the monitor on which the person sitting in front of it is currently concentrating.

The most important assumption set by the author is the development of a universal application that does not require any special hardware configuration to operate. The ideal solution should consist of a personal computer and a single general-purpose webcam.

There are many commercial eye tracking systems on the market. The biggest disadvantage of existing solutions is the need to own special equipment, ranging from dedicated head-mounted devices to infrared cameras. It is easy to see the advantages of a system that would not require specialized equipment: it would make the solution easy to popularize. However, using an ordinary webcam makes eye tracking accuracy much lower than that of commercial systems.

Another important requirement is that the system work in real time, which makes it possible to follow the point of concentration continuously. This means selecting algorithms whose computational complexity is not too high. Great emphasis has also been placed on the robustness of the image processing algorithms to changing lighting conditions.

Creating a system that meets these requirements is not a trivial task. This is evidenced by the fact that there is still no commercial solution to this problem.

Chapter 2

Existing solutions

There are many methods for recording human visual activity, ranging from ordinary direct observation, through invasive mechanical methods, to measuring the electric potential difference between the two sides of the eyeball.

A great deal of research has been done in the field of eye tracking. As a result, there are many solutions that differ significantly from one another. The choice of the optimal method depends on the purpose of the research. The intended use of the system determines the required accuracy, resolution, measurement frequency, ease and convenience of use, and also the price.

Eye tracking systems can be classified in several ways: by the position of the device relative to the head (mobile and non-mobile), by the type of data obtained, and by the method of determining the point of fixation (or the eye movement itself).

The following is an overview of the methods used to measure eye position.

Figure 2.3: 3D Video-Oculography system [26]

Three-dimensional direction of vision

The third type of eye tracking is so-called 3D oculography. In addition to data on the horizontal and vertical deviation of the eye, it provides information on the rotation (torsion) of the eyeball around its own axis. Some devices also make it possible to monitor head movements and positions, which can later be combined with the eye rotation data. A solution of this type is offered by the German company SensoMotoric Instruments GmbH: the head-mounted 3D Video-Oculography® device (3D VOG), Fig. 2.3.

The data obtained are most often used in research on binocular vision and on the vestibular system.

2.2 Methods of measuring eye position

2.2.1 Electro-oculography (EOG)

The technique consists in measuring differences in skin potential using electrodes attached around the eyes. Eye position can be detected with high accuracy by registering very small differences in the skin potential. However, this technique is quite cumbersome and is not suitable for everyday use because it requires close contact between the electrodes and the user's skin. This method was the most widely used solution 40 years ago, but it is still in use [3].

The method is based on measuring the difference in bioelectric potentials of the muscles located in the ocular region. The amplitude of the signal is used to calculate the distance between rapid changes in eye position (known as saccades). This is possible because the potential of the front of the eyeball differs from the potential of its posterior part. Systems based on this method are very susceptible to noise caused by the activity of other facial muscles. Electro-oculography is used most often in medicine.

Changes in the electric field caused by eye movement make it possible to monitor the position of the eye.

2.2.2 Techniques using contact lenses

The use of special contact lenses makes an accurate assessment of eye movement possible. These lenses contain small induction coils. The exact position of the lens is determined by recording the changes in the electromagnetic field caused by eye movement. One of the main problems is compensating for head movements, so head-mounted devices must be used. The disadvantages are restricted movement and cumbersome equipment, which limit the use of this method to laboratory experiments.

2.2.3 Optical techniques based on video recording

2.2.3.1 Using infrared light

Infrared illumination along the optical axis of the eye makes it much easier to locate the iris [4]. The pupil reflects almost all of the light, so the camera registers it as a white circle; this phenomenon is analogous to the red-eye effect. When the infrared light source is located off the optical axis of the eye, the camera records the pupil as a dark area (Fig. 2.5).

The method consists in determining the relative position of the center of the pupil and the reflection of the infrared light source from the cornea. Light reflections from the various structures of the eyeball are called Purkinje images (Fig. 2.4).

Eye trackers of this kind determine the position of the eye by registering the position of the light reflection from the surface of the cornea (the so-called first Purkinje image, P1, also called the glint) relative to the center of the pupil. To increase measurement accuracy, the number of registered points (glints) created by the device can be increased to four.

There are also devices with greater measurement precision, called dual Purkinje eye trackers, which use the light reflection from the surface of the cornea (the first Purkinje image, P1) relative to the reflection from the posterior surface of the lens (the fourth Purkinje image, P4). Such systems are characterized by considerable accuracy, but their biggest disadvantage is that the fourth Purkinje image is very weak, so the examined person must not move the head during the experiment. This is usually achieved by using a chin rest or a special bite bar.


Figure 2.4: Four types of Purkinje images

Figure 2.5: Eye image recorded using an SMI device

2.2.3.2 Without the use of infrared light

The image is recorded using a standard camera. Systems in this group measure the shape and position of the cornea relative to a selected part of the face (e.g. the eye corners). Very often part of the cornea is obscured by the eyelid, which can distort the measured vertical position of the eye. A solution is to track the pupil instead of the cornea, which reduces the area covered by the eyelid. However, the contrast between the pupil and the rest of the eye is much lower than in the case of the cornea.

A different approach is to use a neural network. The training set is a sequence of images of the eye, each paired with the point on the monitor at which the person is looking. Calibrating such systems is very time consuming: a large number of input images is required to train the neural network properly [21].

Systems based on a standard video camera usually require the user to keep the head completely still during the study. Allowing free head movement requires implementing methods that precisely determine the current position of the head (translations and rotations).

2.3 Review of existing solutions

There is a whole range of commercial eye tracking systems on the market. The leading products are Tobii [25] and SMI [26]. These are remote systems, i.e. no device needs to be mounted on the user's head. They work by using a camera that records images in the infrared band together with infrared light sources. The point at which the user is currently looking is determined by examining the change in position between the center of the pupil and the reflection from the cornea produced by the infrared emitter.

However, there is no commercial system that works with a standard camera. There are many research centers conducting intensive research aimed at creating a prototype of such a system. One example is COGAIN [27], which, among other things, works on creating a cheap eye-tracking system using standard cameras.

Currently the only available research project for eye tracking with a standard camera is opengazer [22]. It is an open source program. Its main disadvantage is the assumption that the user's head remains completely stationary while the program is running. Even a small movement makes recalibration of the system necessary.

Chapter 3

Theoretical basics

3.1 Visual system

To build an eye tracking system it is necessary to know the basic issues related to the human visual system. This knowledge is needed to understand the method used to determine the direction of gaze, as well as to identify the existing limitations that must be considered when creating an eye tracking system.

3.1.1 Description of the visual system

The human eye consists of a lens with a variable, adjustable focal length, an iris (diaphragm) regulating the diameter of the opening (pupil) through which light enters the eye, and a photosensitive retina. Photoreceptors on the retina convert light into an electrical potential, which triggers a nerve impulse carried to the cerebral cortex through the optic nerve.

The retina consists of two types of photoreceptors: cones and rods. Rods are very sensitive to light intensity, which makes it possible to see in very low ambient light. However, these receptors are not able to register color, which means that everything seen at night appears in shades of gray. Unlike rods, cones need much brighter light to produce nerve impulses, but they are able to distinguish colors. Cones have three types of filters sensitive to different wavelengths of light: red, green and blue.

Figure 3.1: Eye structure

Due to the high contrast between the sclera and the iris, the direction of gaze can be assessed visually. One of the reasons for this construction of the eye is the fact that humans are social beings, and determining the direction of gaze is very helpful during communication.

The yellow spot (macula) is responsible for the sharpness of the image. It is the largest cluster of cones, about two millimeters in diameter, located in the center of the retina. The rest of the retina contains between 15 and 50 percent of the photoreceptors. As a result, the sharply seen part of the image is about one degree wide. It is not possible to focus on an object smaller than the yellow spot, which makes it impossible to determine the direction of gaze with an accuracy better than one degree.

3.1.2 Eye movement

Eye movements serve two primary purposes. The first is stabilizing the image on the retina to compensate for head movements or the movement of objects in the field of view. The second is positioning the eye relative to the surroundings so that the fragment of the image being analyzed at a given time is projected onto the central part of the retina, which has the highest sensitivity and the greatest capacity for processing detail. Because retinal function is not uniform, fine image details (e.g. letter shapes when reading) can be perceived only within a small area of about one degree of visual angle near the center of the retina.

During normal human activity, the eyes remain most of the time in a state of fixation, i.e. in a state of relative rest. During a fixation, visual information is taken in from the surroundings. The duration of a fixation depends on how the information is processed; in general it ranges from 0.15 s to 1.5 s. On average 4-6 times per second a saccade is performed, which is a very fast, jumping movement that changes the eye position between successive fixations. A saccade lasts approx. 0.03 to 0.06 seconds.

3.2 Optical flow

Optical flow is a vector field that makes it possible to transform a given image of a sequence into the next image of that sequence by moving regions of the first image (for which the field is defined) along the corresponding vectors of the field. In short, optical flow is a set of translations (in the form of a field) that transform one image of a sequence into the next image of the sequence [3].

There are many methods for determining optical flow. They can be divided into three main groups:

Gradient methods, based on the analysis of the spatial and temporal derivatives of the image intensity,

Frequency-domain methods, based on filtering the image information in the frequency domain,

Correlation methods.

The Lucas-Kanade algorithm

This work uses the Lucas-Kanade algorithm, which is a gradient method.

The operation of the algorithm is based on three basic assumptions:

1. The brightness of the image does not change much between consecutive frames of the sequence.

2. The speed of motion of the objects in the image is low.

3. Points that are a short distance apart move similarly.

The brightness of the image is described by a function that depends on time:

\[ f(x, y, t) \equiv I(x(t), y(t), t) \]

The requirement that the brightness of the image not change over time is expressed by the equation:

\[ I(x(t), y(t), t) = I(x(t + dt), y(t + dt), t + dt) \]

This means that the intensity of the tracked pixel does not change over time. Hence:

\[ \frac{\partial f(x, y)}{\partial t} = 0 \]

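Using this assumption and expanding the derivative with the chain rule, the optical flow constraint can be written as follows, where $u$ denotes the velocity component in the $x$ direction and $v$ the velocity component in the $y$ direction:

\[ -\frac{\partial I}{\partial t} = \frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v \]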

Unfortunately, this is a single equation with two unknowns. To solve it, additional information is needed. This is where the third assumption is used: points that are a short distance apart move similarly. Thanks to this, the equation for one point is solved using the additional surrounding pixels. The resulting overdetermined system of equations is solved with the method of least squares.
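The least-squares step can be illustrated with a short NumPy sketch. It is not the implementation used in this work: the function name `lucas_kanade_window`, the window size and the use of central differences for the derivatives are assumptions made for the example. Each pixel in the window contributes one equation $I_x u + I_y v = -I_t$, and the stacked system is solved for a single $(u, v)$.

```python
import numpy as np

def lucas_kanade_window(frame0, frame1, cx, cy, half=7):
    """Estimate the flow (u, v) of the window centred at column cx, row cy."""
    f0 = frame0.astype(np.float64)
    f1 = frame1.astype(np.float64)
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(f0)
    It = f1 - f0  # temporal derivative for a time step of one frame
    win = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
    # One brightness-constancy equation per pixel: Ix*u + Iy*v = -It.
    A = np.column_stack([Ix[win].ravel(), Iy[win].ravel()])
    b = -It[win].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

The estimate is only reliable when the window contains gradients in more than one direction; on a uniform area or along a straight edge the system is ill-conditioned.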

The assumptions of this method mean that only very small movements between successive frames can be detected. To improve its performance, an image pyramid is used. This consists in running the algorithm successively on images of reduced resolution.
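A minimal sketch of such a pyramid is given below, assuming 2x2 block averaging as the downsampling step (the source does not say which filter was used; `build_pyramid` is a name chosen for the example). A displacement of d pixels at full resolution becomes d/2 pixels one level up, so a large motion turns into a small one at the coarsest level, where it satisfies the small-motion assumption.

```python
import numpy as np

def build_pyramid(image, levels=3):
    """Return [full-res, half-res, quarter-res, ...] using 2x2 block averaging."""
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        img = pyramid[-1]
        # Crop to an even size so the image splits into whole 2x2 blocks.
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w]
        half = 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                       + img[0::2, 1::2] + img[1::2, 1::2])
        pyramid.append(half)
    return pyramid
```

In a coarse-to-fine scheme the flow estimated at the coarsest level is doubled and used as the starting guess at the next finer level.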


Chapter 4

Description of the proposed solution

The presented eye tracking system is based on recording the image of the eye with a standard webcam. A diagram of the system is shown in Fig. 4.1. To simplify further considerations, it is assumed that the camera is placed on top of the monitor, at its center. Most commercial systems require the camera to be located below the monitor, which gives a slightly better image of the eye. However, the purpose of this work was to create a universal solution. A standard webcam is located above the monitor; a particular example is a laptop with a built-in camera, whose location cannot be changed.

The purpose of the work is to determine the point on the monitor at which the user is looking at a given moment. To obtain such data it is necessary to determine the relative position of the eye and the head.

The first stage is the initialization of the algorithm, during which the head model is created. The initialization phase is followed by the tracking phase. The 3D position of the head is determined using optical flow between successive frames obtained from the camera and the POSIT algorithm [13]. The change in the direction of gaze is determined by examining the differences in distance between the center of the eyeball and the center of the pupil. These data are obtained using a range of image processing methods described later in this work. To obtain the presented results, a whole range of algorithms was tested, some of which are presented later in the work.

Figure 4.1: System diagram

Then, after the relative position of the eyes and head has been determined and a geometric model created, the gaze is mapped onto the plane of the monitor. To determine the necessary mapping coefficients, a calibration process is required at the beginning of each program session. Calibration consists in following a point displayed at specific locations on the monitor.
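How the mapping coefficients can be obtained from calibration data is sketched below. The form of the mapping is an assumption made for illustration: an affine map from a 2D eye feature (for example the offset between the eyeball center and the pupil center) to screen coordinates, fitted by least squares. The names `fit_gaze_mapping` and `map_gaze` are hypothetical.

```python
import numpy as np

def fit_gaze_mapping(eye_offsets, screen_points):
    """Fit screen ~ [dx, dy, 1] @ coeffs from calibration samples.

    eye_offsets:   (n, 2) eye feature measured while looking at each target
    screen_points: (n, 2) known target positions on the monitor, in pixels
    """
    X = np.column_stack([eye_offsets, np.ones(len(eye_offsets))])
    coeffs, *_ = np.linalg.lstsq(X, screen_points, rcond=None)
    return coeffs  # shape (3, 2)

def map_gaze(coeffs, eye_offset):
    """Map a single eye feature to an estimated screen position."""
    dx, dy = eye_offset
    return np.array([dx, dy, 1.0]) @ coeffs
```

At least three calibration targets that do not lie on one line are needed for an affine map; using more targets averages out measurement noise.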


Chapter 5

Tracking head position

A large number of the available eye tracking systems assume a stationary head. This is neither a convenient nor a practical solution. A person is unable to keep the head completely still for a long period of time without using a support on which to rest the chin or forehead. Such devices are used in ophthalmology during eye examinations. Involuntary head movement is strongly influenced by breathing. To determine the direction of gaze of a person sitting in front of a computer without imposing restrictive limits on head movement, very precise tracking of the 3D position of the face is necessary.

Head pose estimation is a very important area of research on human-computer interaction (HCI). There are many methods for estimating the pose with a single camera. These methods can be divided into two main groups: those based on a head model and those based on characteristic properties of the face. Model-based methods estimate the pose by establishing 2D-3D correspondences between features; the pose is determined from these correspondences.

Methods based on facial properties assume that there is a certain relationship between the 3D pose and certain properties of the face image. These dependencies are determined by using a large number of training images with a known face pose to train a neural network. The disadvantage of this approach is that it fails for new faces whose images were not used during training.

This work presents two different approaches. The first, inspired by [10], uses the AAM algorithm [28]. It is one of the most popular solutions for tracking head position in recent years. The second is a method that uses a sinusoidal head model and the POSIT algorithm [13] to determine the current position. A combination of both methods was presented in [30].

5.1 Active Appearance Model

The Active Appearance Model (AAM) [28] is a method for detecting and tracking arbitrary objects whose model is created during a learning phase from a suitably prepared set of input images. AAM has a very wide range of applications, from the segmentation of medical images and face recognition to head tracking. The algorithm can model virtually any object whose shape and texture are known. AAM is a data-driven method, which means that there is no need to manually select the parameters that determine its operation; the algorithm tunes itself to the data automatically during initialization. Its main disadvantage is that its success depends very strongly on the proper selection of the set of images, annotated with the shape of the object, that is used during the model creation phase. The training set can consist of hundreds of images, and annotating the shape of the object correctly can be a laborious task. It is of course possible to automate the annotation of the shape on the training set, but this issue is beyond the scope of this work. Principal component analysis (PCA) is used to build the model, and it is assumed that the sought model parameters have a Gaussian distribution, which in extreme cases can subject the shape of the model to unrealistic deformations. The active appearance model consists of a statistical shape model and a texture model.

5.1.1 Statistical shape model

The active shape model is a structure containing information about the mean shape of an object of a given type (e.g. a face) and data describing the most characteristic modifications of this shape observed in the training set. The shape of the model can be modified by algorithms that try to fit it to the real shape, without allowing unnatural deformations. Creating the model begins with building a template of the object, i.e. a set of points representing the given shape. This is called the Point Distribution Model (PDM). The characteristic points must then be marked in the corresponding places on all N training images. In this way we obtain a set of training shapes, written as vectors containing the coordinates of the characteristic points. Data on the size and position of the object are removed from the vectors by a special normalization procedure, so that only information about the shape remains. Automatic placement of characteristic points on static images is a very complex problem, so the safest method of annotating a set of example shapes is to mark them manually, which is undoubtedly a tedious and time-consuming task.

The shape is defined as a set of 2D points forming a mesh spread over the tracked object. The landmarks can be placed on the image automatically or manually by the user. Mathematically, the shape $s$ is expressed as a $2n$-dimensional vector:

\[ s = [x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n]^T \]

The training data set is normalized, and then its statistical parameters are determined using principal component analysis (PCA). PCA requires computing the eigenvalues and eigenvectors of the covariance matrix $C$, where $N$ is the number of images in the training set and $s_0$ is the mean shape:

\[ C = \frac{1}{N - 1} S S^T \]

\[ S = [s_1 - s_0, s_2 - s_0, \ldots, s_N - s_0] \]

PCA is used as a method of reducing the dimensionality of the data set. Only the eigenvectors corresponding to the largest eigenvalues are used. The number $t$ of vectors used depends on the variability of the input data set. This makes it possible to approximate a shape instance $s$ as a linear combination of eigenvectors of the covariance matrix:

\[ s \approx s_0 + \Phi_s b_s \]

where $b_s$ is a parameter vector given by

\[ b_s = \Phi_s^T (s - s_0) \]

and $\Phi_s$ is a matrix whose columns are the $t$ eigenvectors.

Using the shape model created with PCA, it is possible to generate new objects similar to those in the training set.
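The construction of the shape model can be sketched in a few lines of NumPy. This is an illustration of the formulas above rather than the AAM implementation used in this work; the function names are chosen for the example, and the shapes are assumed to be already normalized and stored one per row.

```python
import numpy as np

def build_shape_model(shapes, t):
    """shapes: (N, 2n) array with one normalised shape vector per row."""
    s0 = shapes.mean(axis=0)                  # mean shape s_0
    S = (shapes - s0).T                       # columns s_i - s_0, shape (2n, N)
    C = S @ S.T / (shapes.shape[0] - 1)       # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:t]     # indices of the t largest
    return s0, eigvecs[:, order], eigvals[order]

def shape_params(s, s0, Phi):
    """b_s = Phi^T (s - s_0)"""
    return Phi.T @ (s - s0)

def reconstruct(b, s0, Phi):
    """s ~ s_0 + Phi b_s"""
    return s0 + Phi @ b
```

New plausible shapes can be generated by calling `reconstruct` with parameter values of moderate size relative to the square roots of the corresponding eigenvalues.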

5.1.2 Statistical texture model

The texture model is created from the set of training images with the 2D face mesh applied. The mesh is determined using Delaunay triangulation [29] on the set of characteristic points described in the previous section.

The texture $g$ of each input image is defined as the intensities of the pixels inside the mesh spread over the characteristic points:

\[ g = [g_1, g_2, \ldots, g_m]^T \]

The texture model describes the differences in appearance between the training images. To create such a model, the corresponding parts of the face must coincide in all training images. This can be achieved by warping the shapes of the individual faces to the mean shape determined in the previous step. The transformation of the training images to the common shape is determined using affine mappings of the individual triangles of the face mesh.

The texture of a training image is transformed to the reference shape as follows:

1. For each pixel of the face, the triangle of the 2D mesh in which it lies is found.

2. An affine transformation of that triangle is performed so that it best matches the mean shape.

3. The transformed triangle is applied to the output image.
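The two geometric operations in the steps above can be sketched as follows: a point-in-triangle test based on barycentric coordinates, and the affine map defined by three vertex correspondences. The function names are chosen for the example and the code is a sketch, not the warping routine of any particular AAM library.

```python
import numpy as np

def barycentric(p, tri):
    """Barycentric coordinates of point p in triangle tri (3x2 array).

    The point lies inside the triangle when all three values are >= 0.
    """
    T = np.column_stack([tri[0] - tri[2], tri[1] - tri[2]])
    l1, l2 = np.linalg.solve(T, p - tri[2])
    return np.array([l1, l2, 1.0 - l1 - l2])

def triangle_affine(src_tri, dst_tri):
    """Return the 2x3 matrix A with A @ [x, y, 1] mapping src vertices to dst."""
    src = np.column_stack([src_tri, np.ones(3)])   # rows [x, y, 1]
    # Solve src @ A.T = dst_tri for the six affine coefficients.
    return np.linalg.solve(src, dst_tri).T

def warp_point(A, p):
    """Apply the affine map A to a single 2D point."""
    return A @ np.array([p[0], p[1], 1.0])
```

In a full texture warp the map is usually applied in the reverse direction: for each pixel of the reference shape the source position is computed and the source image is sampled there, which avoids holes in the output.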

After initial normalization, the input set is processed with principal component analysis (PCA) in the same way as for the shape model. The columns of the matrix $G$ are the normalized texture vectors $g$ of the training images. The covariance matrix of $G$ is then calculated. A new texture is generated as a linear combination of the eigenvectors of the covariance matrix:

\[ g = g_0 + \Phi_g b_g \]

where $b_g$ is a parameter vector.

A common approach to determining the 3D head position is to use the AAM (Active Appearance Model) algorithm. The main idea is to create a 2D appearance model of the face by training it on prepared images with a 2D mesh outlining the face, eyes, nose and mouth. The model prepared in this way is then fitted to new images, locating the features specified during model training. By examining the distortion of the 2D mesh, the 3D position can be approximated. An example is the work [14].

The method was tested using a ready-made implementation of the algorithm contained in the open source library The AAM-API [11]. Its advantages are high efficiency and accuracy, but only when the model is prepared for a specific person. When the model is created for a large group of people, the searched features are matched incorrectly. The need to manually place the mesh on the user's face means that such a program would require complicated configuration. This was the reason why this approach to tracking was abandoned.

5.2 Determination of the head position using a 3D model

The head position is determined by six degrees of freedom: three rotation angles (Fig. 5.1) and three components of the translation M(x, y, z). Head rotation can be characterized by three Euler angles: around the z axis (roll, θ), then around the y axis (yaw, β) and finally around the x axis (pitch, α).

Figure 5.1: Head model with rotation axes

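The rotation matrix $R$ is determined from the three Euler angles.

\[ R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

\[ R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix} \]

\[ R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix} \]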

The algorithm for determining the 3D position of an object is based on a simplified camera model called the camera obscura (pinhole camera). The idea is to estimate the model parameters for which the projection of the object features best fits the location of these features in the image. Using this camera model, and assuming no distortion caused by imperfections of the lens, the projection of a point $\vec{a}$ of the 3D model onto the image plane, $\vec{b}$, can be described as follows:

\[ \vec{b} = T \vec{a} \]

\[ u_x = f \frac{b_x}{b_z} \]

\[ u_y = f \frac{b_y}{b_z} \]

where $T$ is the transformation matrix in homogeneous coordinates. The matrix $T$ is a composition of the following geometric operations: rotations around the coordinate axes by the angles $\theta$, $\beta$ and $\alpha$, and translation by the vector $M$:

\[ T = M(x, y, z) \, R_z(\theta) \, R_y(\beta) \, R_x(\alpha) \]

The factor $f$ represents the focal length of the lens. The image coordinates in pixels, $\vec{q}$, are calculated as follows:

\[ q_x = \frac{u_x}{s_x} + c_x \]

\[ q_y = \frac{u_y}{s_y} + c_y \]

The coefficient $s$ is the physical distance between the centers of two adjacent pixels on the camera sensor, and $c$ is the offset between the optical axis and the center of the sensor. To simplify the calculations, it can be assumed that $\vec{c}$ is zero. This corresponds to the situation in which the sensor is placed perfectly on the optical axis of the lens. The value of $f$ can be determined experimentally during camera calibration. For further calculations, $f$ is assumed to be known. Fixing a constant value of $f$ has no significant impact on the further operation of the algorithm, because it is sufficient to determine relative changes in the position of the head. After applying these simplifications, the current head position $\vec{p}$ is described by six variables:

\[ \vec{p} = \{x, y, z, \alpha, \beta, \theta\} \]
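The projection model described above can be written down as a short NumPy sketch. It applies the rotation and translation directly instead of building the 4x4 homogeneous matrix, which gives the same result; the function names and the default values of `s` and `c` are assumptions made for the example.

```python
import numpy as np

def rotation(theta, beta, alpha):
    """R = Rz(theta) @ Ry(beta) @ Rx(alpha): roll, then yaw, then pitch."""
    ct, st = np.cos(theta), np.sin(theta)
    cb, sb = np.cos(beta), np.sin(beta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    Rz = np.array([[ct, -st, 0.0], [st, ct, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
    return Rz @ Ry @ Rx

def project(a, pose, f, s=(1.0, 1.0), c=(0.0, 0.0)):
    """Project model point a to pixel coordinates q for pose (x, y, z, alpha, beta, theta)."""
    x, y, z, alpha, beta, theta = pose
    # b = T a: rotate the model point, then translate it by M(x, y, z).
    b = rotation(theta, beta, alpha) @ np.asarray(a, dtype=float) + np.array([x, y, z])
    u = f * b[:2] / b[2]                     # perspective division
    return u / np.asarray(s) + np.asarray(c)  # metric image plane -> pixels
```

For example, with the head 100 units in front of the camera and no rotation, a model point 10 units to the right of the origin projects 50 pixels to the right of the image center when `f = 500` and `s = 1`.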

In the general case, projecting the points of a 3D object onto a 2D plane is a non-linear operation. However, assuming small changes between the known position determined in the previous frame and the current position, this operation can be well approximated by a linear function. Under this assumption, the parameters describing the position can be determined iteratively:

\[ \vec{p}_{i+1} = \vec{p}_i - \vec{d} \]

In each iteration step, the correction vector $\vec{d}$ is calculated so as to minimize the error vector $\vec{e}$, which is built from the distances between the projections of the model points and the positions of the corresponding features in the image. Using Newton's method, with the Jacobian matrix denoted by $J$, $\vec{d}$ is determined by the following equation:

\[ J \vec{d} = \vec{e} \]

This equation can be solved using the pseudoinverse:

\[ \vec{d} = (J^T J)^{-1} J^T \vec{e} \]

Levenberg-Marquardt algorithm

In the method of determining the position of an object presented above, all parameters are treated with the same weight during optimization. For example, a rotation of 0.5 radians around the z axis can have a much larger impact on the projection than a shift of 50 mm. It is therefore necessary to normalize the coefficients, taking into account the standard deviation of each row of the matrix $J$. To this end, a matrix $W$ is introduced whose diagonal elements are inversely proportional to the standard deviation $\sigma$:

\[ W_{ii} = \frac{1}{\sigma_{p_i}} \]

To improve the convergence of the method, a parameter $\lambda$ is added to control the weight of the stabilization:

\[ \vec{d} = (J^T J + \lambda W^T W)^{-1} J^T \vec{e} \]
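A single stabilized update step of this kind can be sketched as follows. The code only evaluates the formula for $\vec{d}$; computing $J$ and $\vec{e}$ for the head model is outside its scope, and the function name `damped_step` is an assumption made for the example.

```python
import numpy as np

def damped_step(J, e, sigma, lam):
    """Return d = (J^T J + lam * W^T W)^(-1) J^T e with W = diag(1 / sigma).

    J:     (m, 6) Jacobian of the projection error w.r.t. the pose parameters
    e:     (m,)   error vector
    sigma: (6,)   expected standard deviation of each pose parameter
    lam:   scalar weight of the stabilising term
    """
    W = np.diag(1.0 / np.asarray(sigma, dtype=float))
    # Solving the linear system is cheaper and more stable than inverting.
    return np.linalg.solve(J.T @ J + lam * W.T @ W, J.T @ e)
```

With `lam = 0` the step reduces to the plain pseudoinverse solution; larger values shorten the step, most strongly for parameters with a small `sigma`.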

Determining the Jacobian $J$ is a complex operation. This means that the method is not the best choice for real-time operation. This was the reason for abandoning this approach in favor of the POSIT algorithm.

5.2.1 POSIT algorithm

The POSIT algorithm (Pose from Orthography and Scaling with Iteration) [13] is used to estimate the three-dimensional pose of a known object. It was presented in 1992 as a method of determining the pose (given by the translation vector T and the orientation matrix R) of a 3D object of known dimensions. To determine the pose, at least four points (not lying in one plane) on the surface of the object must be defined. The algorithm consists of two parts: initial pose estimation and iterative refinement of the result. The first part of the algorithm (POS) assumes that the designated points of the object lie at the same distance from the camera and that differences in the apparent size of the object due to differences in distance to the camera are negligible. The assumption that the points are at the same distance means that the object is far enough from the camera that depth differences can be ignored (the weak perspective assumption). With this approach, knowing the camera parameters, the pose of the object can be initially determined using scaled orthographic projection. Such calculations are usually not accurate enough, so iterative refinement of the result is used. Using the pose calculated in the previous iteration, the points are projected back onto the 3D object. The result is used as the starting point of the first stage of the algorithm.

This method makes it possible to determine the 3D position of an object from the video stream of a single camera. It requires knowledge of the current 2D positions of at least four points that do not lie in one plane and their 3D coordinates in the object model. The algorithm does not take perspective into account. However, this has no effect on the result if the object is sufficiently distant from the camera, because then the influence of perspective on the object's appearance is negligible.
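The two stages described above (the POS estimate followed by iterative correction) can be sketched as follows. This is a simplified illustration rather than the implementation used in the work. It assumes NumPy, image coordinates measured relative to the principal point, a focal length `f` expressed in pixels, and the reference point stored as the first model point. The function name `posit` is hypothetical.

```python
import numpy as np

def posit(model, image, f, iterations=30):
    """Estimate rotation R and translation T of a known object.

    model : (N, 3) object points; model[0] is the reference point
    image : (N, 2) image points relative to the principal point
    f     : focal length in pixels
    """
    model = np.asarray(model, dtype=float)
    image = np.asarray(image, dtype=float)
    A = model[1:] - model[0]          # object vectors from the reference point
    B = np.linalg.pinv(A)             # pseudoinverse, computed once
    eps = np.zeros(len(A))            # eps = 0 gives the initial POS estimate
    for _ in range(iterations):
        xp = image[1:, 0] * (1.0 + eps) - image[0, 0]
        yp = image[1:, 1] * (1.0 + eps) - image[0, 1]
        I = B @ xp
        Jv = B @ yp
        s = np.sqrt(np.linalg.norm(I) * np.linalg.norm(Jv))  # projection scale
        i = I / np.linalg.norm(I)
        j = Jv / np.linalg.norm(Jv)
        k = np.cross(i, j)
        z0 = f / s                    # depth of the reference point
        eps = (A @ k) / z0            # perspective correction terms
    R = np.vstack([i, j, k])
    T = np.array([image[0, 0], image[0, 1], f]) / s
    return R, T

# Synthetic check: project a known pose and recover it.
model = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
cy, sy, cx, sx = np.cos(0.2), np.sin(0.2), np.cos(0.1), np.sin(0.1)
R_true = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]]) @ \
         np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
T_true = np.array([0.5, -0.3, 20.0])
cam = model @ R_true.T + T_true
image = 800.0 * cam[:, :2] / cam[:, 2:3]
R_est, T_est = posit(model, image, 800.0)
```

In the synthetic example the object is about twenty times farther from the camera than its own size, which matches the condition under which the text states that the method works well.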

The described algorithm is a very good choice when the application has to work in real time, because its computational complexity is low. Its main disadvantage is the need to define a reference point whose 3D coordinates must be zero. An incorrectly determined projection of the reference point strongly disturbs the estimate of the object's current position. An additional problem arises when the projection of the reference point is not visible. This can happen with a large head turn, for example when the nose covers the features on one side of the face. The solution to this problem is to determine the location of the reference feature from a set of points in its neighborhood.

The algorithm that determines the 3D head position consists of two main steps:

1. Initialization of the algorithm. Face and eye detection is performed. Further analysis requires extracting the face area from its surroundings. Then a set of features is determined that will be tracked in the next stage. Based on the determined features and the known position of the face, a 3D head model is created.

2. Tracking the 3D head position. This iterative stage is repeated after the algorithm has been initialized. Based on the detected change in position of the points specified in the previous stage, the current head position in 3D space is determined, described by the translation vector T and the rotation matrix R.

The individual phases of the algorithm are described in detail below.

5.2.2 Face and eye detection

The face position in the image was detected using the method proposed by Paul Viola and Michael Jones [12]. It is one of the most popular solutions for face detection. It is characterized by high speed and very good effectiveness. The method locates objects in the image using a previously created classifier. The classifier is trained using images that contain the searched object and images in which the object does not occur. From a set of images prepared in this way a cascade classifier is created, which is then used to locate the object in unknown images.

The work uses the implementation of the algorithm from the Open Computer Vision Library (OpenCV) [20]. This is an open-source project created by Intel. The project includes ready-made classifiers for detecting faces and eyes.

The course of the face and eye detection algorithm:

1. The image is searched using the face detection classifier. The result is the location of all faces found. Further analysis considers only the largest face, i.e. the person closest to the camera.

2. Within the previously determined face location, another classifier is applied to detect the eyes. To reduce CPU load, only the upper half of the face is searched. This also reduces the risk of incorrect eye detection, e.g. at the mouth or nose.

3. The size and position of the face and eyes are checked. Both eyes are required to be of similar size and to lie at a similar horizontal level. If these conditions are not met, the whole detection process is repeated from the beginning on a new frame.

Figure 5.2: Face and eye detection

The result of the algorithm operation is shown in Fig. 5.2.
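The selection and consistency checks from the steps above can be sketched in plain Python. The sketch assumes that rectangles are given as `(x, y, w, h)` tuples, as returned for example by OpenCV's `CascadeClassifier.detectMultiScale`. The function names and the tolerance values `size_tol` and `level_tol` are hypothetical and were not taken from the described system.

```python
def largest_face(faces):
    """Return the face rectangle (x, y, w, h) with the largest area."""
    return max(faces, key=lambda r: r[2] * r[3])

def upper_half(face):
    """Restrict the eye search to the upper half of the face rectangle."""
    x, y, w, h = face
    return (x, y, w, h // 2)

def eyes_consistent(eye_a, eye_b, size_tol=0.3, level_tol=0.5):
    """Check that both eyes have similar size and lie at a similar level."""
    _, ya, wa, ha = eye_a
    _, yb, wb, hb = eye_b
    mean_h = (ha + hb) / 2.0
    similar_size = abs(wa - wb) <= size_tol * max(wa, wb)
    center_a = ya + ha / 2.0
    center_b = yb + hb / 2.0
    same_level = abs(center_a - center_b) <= level_tol * mean_h
    return similar_size and same_level

faces = [(10, 10, 50, 50), (100, 40, 120, 120)]
face = largest_face(faces)        # the face closest to the camera
search_area = upper_half(face)    # region passed to the eye classifier
ok = eyes_consistent((115, 70, 30, 20), (175, 72, 28, 20))
```

If `eyes_consistent` returns `False`, the frame would be rejected and detection repeated on the next one.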

Combining the classifiers for face and eye detection gives much better results than using the face classifier alone. Experience shows that using only the face detection classifier can produce wrong results, e.g. an image area that contains no face may be classified incorrectly. This usually happens when the lighting conditions differ from those in the images used to train the classifier. Incorrect results were obtained particularly often with lateral lighting of the face. Combining the two classifiers and cross-checking the location and size of the eyes makes the algorithm much more reliable. The price to pay is that sometimes a face present in the input image will not be found. However, this is not a big problem, because the program works on video sequences and several initial frames can be skipped. The face detection algorithm is repeated iteratively until the face and eyes are detected and the conditions on their mutual position are met.

5.2.3 Determination of the features to be tracked

Commercial systems for detecting face and head movement use markers made of reflective material that are glued to the face. This solution is very accurate and resistant to interference. However, it requires expensive equipment. The assumption of the work was to create a system that does not require a complicated hardware configuration. The presented solution is based only on a standard web camera, without any additional markers to improve the performance of the algorithms.

The discrete Lucas-Kanade optical flow described in the theoretical part was used to track changes in head position. It requires a set of features whose change in position will be tracked in subsequent frames of the video sequence. The features must be selected so that the change in their position between subsequent frames can be determined unambiguously. The Lucas-Kanade algorithm works well if the tracked points are located where sharp edges occur. The selection of the right set of features is a very important issue. It has a major impact on the further tracking of changes in the face position. The literature offers many different approaches to determining good tracking points. One of the basic criteria for choosing the right method was computational complexity. By assumption the program should work in real time, so a complicated method that would put too heavy a load on the processor could not be chosen.

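One of the most commonly used definitions of a corner was introduced by Harris [12]. This definition is based on the matrix of second-order derivatives (the Hessian) of image intensity. The Hessian at point p(x, y) is determined as follows:

$$H(p) = \begin{bmatrix} \dfrac{\partial^{2} I}{\partial x^{2}} & \dfrac{\partial^{2} I}{\partial x \partial y} \\[2mm] \dfrac{\partial^{2} I}{\partial x \partial y} & \dfrac{\partial^{2} I}{\partial y^{2}} \end{bmatrix}$$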

The autocorrelation matrix M of the Hessian is determined by summing the values of the second derivatives in the neighborhood of a given point:

$$M(x, y) = \begin{bmatrix} \sum\limits_{-K \le i,j \le K} I_x^{2}(x+i, y+j) & \sum\limits_{-K \le i,j \le K} I_x(x+i, y+j)\, I_y(x+i, y+j) \\ \sum\limits_{-K \le i,j \le K} I_x(x+i, y+j)\, I_y(x+i, y+j) & \sum\limits_{-K \le i,j \le K} I_y^{2}(x+i, y+j) \end{bmatrix}$$

Corners are located where the autocorrelation matrix of the Hessian has two large eigenvalues. This means that the texture in the neighborhood of a given point changes significantly in two independent directions. Because only the eigenvalues are used, the determined corners do not change when the image is rotated. By searching for local maxima of the autocorrelation of the input image, the points that can be tracked using optical flow are easily found.
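The eigenvalue criterion can be illustrated with a small sketch. It assumes NumPy, approximates the derivatives I_x and I_y with central differences, and uses the smaller eigenvalue of M as the corner strength. The function name `corner_response` and the window parameter `K` are chosen here only for illustration.

```python
import numpy as np

def corner_response(image, K=2):
    """Smaller eigenvalue of the autocorrelation matrix M at every pixel."""
    img = np.asarray(image, dtype=float)
    Iy, Ix = np.gradient(img)                 # derivatives along rows and columns
    products = (Ix * Ix, Ix * Iy, Iy * Iy)

    def window_sum(a):
        # Sum over the (2K+1) x (2K+1) neighborhood, with zero padding.
        padded = np.pad(a, K, mode="constant")
        out = np.zeros_like(a)
        h, w = a.shape
        for di in range(2 * K + 1):
            for dj in range(2 * K + 1):
                out += padded[di:di + h, dj:dj + w]
        return out

    a, b, c = (window_sum(p) for p in products)
    # Eigenvalues of [[a, b], [b, c]]; the smaller one is large only at corners.
    return 0.5 * ((a + c) - np.sqrt((a - c) ** 2 + 4.0 * b * b))

# A bright square on a dark background: strong response at its corners only.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
resp = corner_response(img, K=2)
```

In this example the response at the corner of the square is clearly larger than on a straight edge or in a flat region, where only one or none of the eigenvalues is large.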

It is very important that the designated features are distributed evenly over the surface of the tracked object. This can be achieved by enforcing a minimum distance between neighboring points. Such a constraint is necessary because if the autocorrelation of the Hessian of the object's texture has a large maximum in one location, most features will be located around this maximum. In this case, the algorithm that determines the 3D position of the object may give incorrect results.

The approach presented is based on determining natural features of the face that are easy to track by means of optical flow. The algorithm searches for local maxima, so it adapts to the appearance of a given person's face and finds the optimal points to track. This solution is much more universal and gives better results than relying on strictly defined facial features such as the corners of the lips or the corners of the eyes. Determining fixed features is not always possible. If we assume that, for example, the corners of the eyes are used, and the algorithm searching for these features does not anticipate that a person may wear glasses, this can lead to worse or even invalid results.

Algorithm for determining the features to be tracked:

1. Specifying the search area. The search for features is limited to the area of the face determined during the previous stage. The region containing the eyes is excluded, because the movements of the eyeballs disturb the correct determination of head position.

2. Determination of features. The Harris algorithm [12] is used.

3. Skipping points that are too close to each other. To increase performance, features located too close to each other are eliminated.
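Steps 1 and 3 above can be sketched as a greedy selection. The example assumes that candidate features are given as `(strength, x, y)` tuples and that excluded regions (such as the eye areas) are `(x, y, w, h)` rectangles. The function name `select_features` and its parameters are hypothetical.

```python
def select_features(candidates, min_dist, excluded=()):
    """Keep the strongest candidates that respect a minimum mutual distance.

    candidates : iterable of (strength, x, y)
    min_dist   : minimum allowed distance between two kept points
    excluded   : rectangles (x, y, w, h) in which no feature may be placed
    """
    kept = []
    for strength, x, y in sorted(candidates, reverse=True):
        inside_excluded = any(
            x0 <= x < x0 + w and y0 <= y < y0 + h for x0, y0, w, h in excluded
        )
        if inside_excluded:
            continue
        far_enough = all(
            (x - kx) ** 2 + (y - ky) ** 2 >= min_dist ** 2 for kx, ky in kept
        )
        if far_enough:
            kept.append((x, y))
    return kept

candidates = [(0.9, 10, 10), (0.8, 12, 11), (0.7, 40, 10), (0.6, 25, 30), (0.5, 41, 11)]
points = select_features(candidates, min_dist=5, excluded=[(20, 25, 10, 10)])
```

Processing the candidates from the strongest to the weakest means that, within any cluster of nearby points, the strongest one survives.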

5.2.4 Initialization of the 3D head model

The head is modeled using a sinusoidal grid shown in Figure 5.4. This is a rough approximation that does not reproduce the detailed shape of the face, but it works quite satisfactorily and does not require a complicated approach.

Figure 5.3: Result of the algorithm searching for the features used to determine the head position change: (a) input image with the face and eyes marked, (b) corner strength (autocorrelation of the Hessian of image intensity), (c) designated features

Figure 5.4: Sinusoidal head model

This simplification of the model has a number of advantages:

- Ease of determining the model.
- Automatic initialization. There is no need to prepare the model grid for a given user in advance.
- Versatility, resulting from the fact that the individual face profile does not have to be taken into account.
- Speed of operation, resulting from simplicity.
- Resistance to interference.

Figure 5.5: Head with superimposed 3D model grid

Mapping the facial features (2D) determined in the previous step onto the 3D mesh of the model is possible thanks to the assumption that during initialization the user's head faces straight towards the monitor. Such an assumption can be made because the model serves to assess the change in head position relative to the position determined during initialization. An example of fitting the face mesh is shown in Figure 5.5.

The use of a simplified face model enables easy initialization of the system. Simple models such as a cylinder, an ellipsoid or a sinusoid are commonly used to track head movement, because initializing such models requires selecting significantly fewer parameters than in the case of a real face model of an individual person. Based on the experiments, it can be stated that the initialization of the model is relatively robust to imprecise selection of the initial parameters, such as the exact size of the head or its current position. In the case of a detailed model, precise initialization is very important, because a small change in a starting parameter has a large effect on further tracking of the object. Even with a precise mesh of a given person, obtained e.g. using a laser scanner, the exact face model becomes useless if the starting position is not selected correctly when the position tracking algorithm is initialized. During initialization the current position of the model must be determined so that the user's face mesh overlaps the image obtained from the camera. This process can be carried out automatically, assuming that the camera position is known and that the person looks towards the camera during initialization. The head position can then be determined by face detection using the methods described above. The sinusoidal grid is not a precise approximation of the real head, but with a simplified model it can be assumed that the accuracy of the starting parameters will be sufficient for a proper examination of changes in head position. If an exact face model were used, the initialization process would have to be carried out manually. This would be a significant impediment and would make the system less practical, since system initialization would require additional knowledge from the user.

In [24] a comparison is presented between a cylindrical model and an accurate model obtained with a Cyberware laser scanner, both applied in an algorithm tracking head position changes. The results of that work clearly show that the simplified model is completely sufficient, and its main advantage is the possibility of automatic initialization.

The sinusoidal model was used in the presented solution. It is slightly more complicated than the cylindrical model. Its main advantage is a better representation of the shape of the nose, which means that the head position is determined with greater accuracy in the case of large head turns.

Initializing an algorithm that tracks position changes using a 3D model of the object consists in determining the relationship between the 3D points lying on the surface of the object's mesh and the projections of these points on the starting frame of the video sequence. Determining the points to track using the Harris algorithm was described above. Given a set of such points, their location on the model surface must be determined. In general this is not a simple task, because if the model is complicated, the mapping from 2D to 3D space cannot be defined analytically in an unambiguous way. One of the simpler solutions is to use the OpenGL library. Each triangle of the object's mesh is assigned a different color. The object is then rendered in its starting position. By reading the color covering a given point, the relationship between a point in the image and its counterpart in the model is determined. This is a general solution, suitable for a wide class of models. When a simplified sinusoidal model is used, which can be described with algebraic equations, there is no need for such complicated initialization.

The projection $\vec{q}$ of a point of the 3D model onto the projection plane, assuming the simplified camera model described in section 5, is given by the equations:

$$u_x = f \frac{b_x}{b_z}$$

$$u_y = f \frac{b_y}{b_z}$$

$$q_x = \frac{u_x}{s_x} + c_x$$

$$q_y = \frac{u_y}{s_y} + c_y$$
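The projection equations translate directly into code. The sketch below is only an illustration; the function name `project` is hypothetical, and the numeric values of `f`, `s` and `c` in the example are arbitrary assumptions rather than parameters of the described system.

```python
def project(b, f, s, c):
    """Project the point b = (b_x, b_y, b_z) onto the image.

    f : focal length
    s : (s_x, s_y) pixel size
    c : (c_x, c_y) position of the image center
    """
    bx, by, bz = b
    ux = f * bx / bz
    uy = f * by / bz
    return (ux / s[0] + c[0], uy / s[1] + c[1])

# A point on the optical axis maps to the image center.
center = project((0.0, 0.0, 2.0), f=500.0, s=(1.0, 1.0), c=(320.0, 240.0))
q = project((0.2, -0.1, 2.0), f=500.0, s=(1.0, 1.0), c=(320.0, 240.0))
```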

The above equations contain parameters defining the camera, but without loss of generality it can be assumed that these parameters are constant and fixed in advance. This assumption is possible because the operation of the system does not depend on determining the absolute head position. It is only necessary to determine the change in position between frames of the video sequence, since the appropriate coefficients are taken into account during calibration. At this stage it is not necessary to determine the exact camera model.

There are many methods [23] for determining camera parameters, but they require an additional calibration process using specially prepared images. One of the most popular methods is to use a chessboard with known field sizes. The image of the chessboard is recorded in several different positions. The camera model is calculated from this data. Thanks to this approach, the absolute 3D position of the tracked object can be determined. However, the calibration process must be carried out individually for each camera.

The starting head position is assumed to be known. Assuming that the user looks straight into the camera during initialization, the rotation matrix R can be set to the identity matrix. The translation vector T, in turn, is determined from the known head position found automatically during the face and eye detection process. The x and y coordinates of the vector T are obtained directly from the current position of the face, while the z coordinate is determined by examining the size of the face. A larger face size means that the user is closer to the camera; on this basis the z component of the vector T is determined.

5.2.5 Use of reference frames

The method presented above uses information about the change in position of the selected facial features only between successive frames of the video sequence. This approach gives satisfactory results as long as the number of processed frames is not too large. In the case of longer sequences, an accumulation of error in the position of the features between successive frames became noticeable. The solution to this problem was to apply the method presented in [19]. This approach is based on the use of a set of reference images. A reference image is an image frame with a known head position and a known set of facial features. When the head position approaches the position registered in a reference frame, the accumulated error is cancelled. As a result, the performance of the head position tracking algorithm does not deteriorate over time, which was the case when only the information about changes in the position of facial features between successive frames was used.

During system initialization a starting reference frame is created, and then, when the need arises, new frames are added automatically. During head position tracking, when the current position differs significantly from the set of registered reference frames, a new frame is created with the position determined in the previous iteration of the algorithm. This approach makes the algorithm work automatically. It is not necessary to register a set of reference frames during the initialization process. Using one starting frame together with the algorithm for automatically adding further frames gives satisfactory results and is a universal solution.
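The bookkeeping of reference frames can be sketched as follows. The text does not specify how the difference between head positions is measured, so the example assumes, purely for illustration, that a pose is a vector of rotation angles and that the Euclidean distance between such vectors is compared with a hypothetical threshold `max_distance`. The class name `ReferenceFrames` is also hypothetical.

```python
import numpy as np

class ReferenceFrames:
    """Stores reference poses and adds a new one when the head moves far away."""

    def __init__(self, max_distance):
        self.max_distance = max_distance  # assumed threshold for "significantly differs"
        self.poses = []
        self.features = []

    def nearest(self, pose):
        """Return (index, distance) of the closest stored reference pose."""
        distances = [np.linalg.norm(np.subtract(pose, p)) for p in self.poses]
        index = int(np.argmin(distances))
        return index, distances[index]

    def update(self, pose, features):
        """Return (index, added): the matching frame, or a newly added one."""
        if self.poses:
            index, distance = self.nearest(pose)
            if distance <= self.max_distance:
                return index, False
        self.poses.append(np.asarray(pose, dtype=float))
        self.features.append(features)
        return len(self.poses) - 1, True

refs = ReferenceFrames(max_distance=0.3)
first = refs.update((0.0, 0.0, 0.0), "features at start")   # starting frame
close = refs.update((0.1, 0.0, 0.0), "ignored")             # near the starting frame
far = refs.update((0.5, 0.0, 0.0), "features after turn")   # new reference frame
```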


5.2.6 Elimination of distorted data

Determining the 3D position of the head requires knowledge of the current positions of the facial features determined during the initialization process. The positioning algorithm is based on the least squares method. This makes it necessary to eliminate features with incorrectly determined positions, because a large error in the position of even a single feature can have a very large impact on the result of the whole method. The change in position of the points is determined by optical flow. This is a precise solution; however, disturbances caused by a sudden change in lighting or a partial occlusion of the face can make the newly calculated locations of the facial features incorrect. Optical flow determines the change in position between frames. Therefore a single error affects the subsequent tracking of the feature location. This made it necessary to implement a method for detecting and eliminating such disturbances. The result of the algorithm that detects features with incorrectly determined positions is shown in Figure 5.6. Green points indicate features whose location was determined correctly, while red marks the points that the disturbance elimination algorithm classified as incorrect data.

The algorithm that detects incorrect optical flow results works as follows:

1. Calculation of the mean shift vector of all tracked features between successive frames.

2. Determination of the standard deviation of the shift from the mean value.

3. Elimination of features whose shift deviates from the mean by more than three times the standard deviation. Eliminated features do not take part in determining the head position in the current iteration.

4. Correction of the positions of the eliminated features by projecting the points of the 3D head model that correspond to these features. As a result, these points can be used again in the next iteration.
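Steps 1 to 3 can be sketched with NumPy as shown below. The sketch measures the deviation of each feature as the length of the difference between its shift and the mean shift, which is one possible reading of the description. The function name `flag_outliers` is hypothetical.

```python
import numpy as np

def flag_outliers(prev_pts, next_pts, k=3.0):
    """Return a boolean mask marking features with an implausible shift.

    prev_pts, next_pts : (N, 2) feature positions in two successive frames
    k                  : multiple of the standard deviation used as the limit
    """
    shifts = np.asarray(next_pts, dtype=float) - np.asarray(prev_pts, dtype=float)
    mean_shift = shifts.mean(axis=0)                       # step 1
    deviation = np.linalg.norm(shifts - mean_shift, axis=1)
    sigma = np.sqrt(np.mean(deviation ** 2))               # step 2
    return deviation > k * sigma                           # step 3

# 40 features moving by about one pixel, one of them tracked incorrectly.
prev_pts = np.array([[10.0 * (i % 8), 10.0 * (i // 8)] for i in range(40)])
next_pts = prev_pts + np.array([[1.0 + 0.01 * (i % 5), 0.02 * (i % 3)] for i in range(40)])
next_pts[7] = prev_pts[7] + np.array([30.0, -20.0])
mask = flag_outliers(prev_pts, next_pts)
```

It is worth noting that a three-sigma limit can only be exceeded when the number of features is large enough, because a single outlier also inflates the standard deviation.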

This approach makes the head position determination algorithm robust to disturbances such as changes in facial expression, covering the face with an object, or a sudden change in scene lighting.

Figure 5.6: Elimination of incorrect optical flow results

Chapter 6

Determining the direction of vision

Detection of the human eye is a difficult task due to the low contrast between the eye and the surrounding skin. As a consequence, many existing systems use a special camera mounted a short distance from the eye to obtain a high-resolution image. The most popular method of determining the direction of gaze is to examine the difference in position between the pupil center and the glint produced by an infrared light source. The current locations of these two points determine the gaze vector. The mapping between the gaze vector and the current point of gaze is determined during calibration. This is the simplest method and is commonly used in commercial systems. It does not require complicated image processing algorithms. The use of an external light source makes its operation independent of lighting conditions.

It is assumed that the system presented in this work should not require any additional equipment besides a standard webcam. Therefore a method based solely on the visible light spectrum had to be used to determine the direction of gaze. The current direction in which the user is looking can be determined by examining the change of the eyeball angle in relation to the camera axis. This angle is proportional to the change in position between the center of the pupil and the center of the eyeball. It is assumed that during system initialization the user is looking at the center of the screen. The position of the pupil center is then registered and subsequently treated as a reference value defining the eye axis. The change in axis position caused by head movement can be determined from the known 3D head position found in the previous stage. An image of the head with gaze vectors (represented by yellow lines) is shown in Fig. 7.9.

6.1 Determination of the gaze vector

The direction in which a person looks can be determined unambiguously by examining the shift between the center of the pupil and the corners of the eye. The current head position (translation and rotation) must also be taken into account. Methods for determining these values were presented in the preceding chapters.

The simplest approach to determining the change of the eye angle is to examine the shift between the pupil and the corners of the eye. Algorithms used to track changes in eye position can be divided into two main groups: those based on features and those based on an eye model. The feature-based approach consists in detecting image features that depend on the current eye position. This requires an appropriate criterion, dependent on the method used, for deciding whether the feature sought is present (e.g. in the case of binarization a threshold value must be specified). The criterion value is usually a system parameter that must be set manually. The features used vary and depend on the particular method, but most often they are based on the intensity level or the image gradient.

In a sufficiently illuminated image, the pupil is an area considerably darker than the surrounding cornea. The center of the pupil can be determined as the geometric center of the area obtained after binarization with a properly selected threshold.

Figure 6.1: Face with gaze vectors

6.2 Starburst algorithm

The Starburst algorithm [15] is used to detect and measure the position of the pupil and of the infrared light reflected from the cornea of the eye. It is part of the open-source openEyes project. The purpose of this project is to create a low-cost eye tracking system that is easily accessible to a wide range of users. Starburst is a hybrid algorithm that combines feature-based and model-based approaches. The contour surrounding the pupil is calculated by fitting an ellipse to a set of points lying on the border between the pupil and the cornea. The position of the fitted ellipse is refined using a model-based technique that maximizes the ratio between the brightness of pixels outside and inside the ellipse. Detecting the contour requires a preliminary estimate of the ellipse center. It can be taken as the location from the previous frame; for the first frame, manual initialization can be used, or it can simply be assumed that the center of the eye coincides with the center of the pupil during system initialization. Points lying on the contour are determined by searching for the maximum change in image intensity along rays cast from the center of the eye.

An ellipse is then fitted to the points found. To improve accuracy, the RANSAC (Random Sample Consensus) algorithm [23] is used. Its main idea is the iterative random selection of five points from the set determined in the previous stage. The number of points used in each iteration follows from the fact that five points are needed to determine an ellipse. An ellipse is then fitted to this sample. For the given ellipse, it is checked how many points from the whole set lie on its edge. A point is assumed to lie on the edge if its distance from it does not exceed a certain experimentally chosen threshold. The largest set of points is selected for further processing, and an ellipse is fitted to it using the least squares method. Using the RANSAC algorithm makes the method much more reliable and resistant to disturbances than when the whole set of edge points is used to calculate the ellipse. The least squares method is very sensitive to disturbed data, so incorrectly found points must be eliminated.
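The RANSAC procedure described above can be sketched as follows. The example is not the openEyes implementation. It assumes NumPy, represents the ellipse as a general conic fitted algebraically with a singular value decomposition, and approximates the distance of a point from the curve with a first-order (Sampson) estimate. All function names and the threshold value are hypothetical.

```python
import numpy as np

def fit_conic(pts):
    """Fit A x^2 + B xy + C y^2 + D x + E y + F = 0 to at least five points."""
    x, y = pts[:, 0], pts[:, 1]
    design = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    return np.linalg.svd(design)[2][-1]   # vector of the smallest singular value

def conic_distance(conic, pts):
    """First-order approximation of the distance of each point from the conic."""
    A, B, C, D, E, F = conic
    x, y = pts[:, 0], pts[:, 1]
    value = A * x * x + B * x * y + C * y * y + D * x + E * y + F
    gx = 2 * A * x + B * y + D
    gy = B * x + 2 * C * y + E
    return np.abs(value) / (np.hypot(gx, gy) + 1e-12)

def conic_center(conic):
    """Center of the conic, found where its gradient vanishes."""
    A, B, C, D, E, _ = conic
    return np.linalg.solve(np.array([[2 * A, B], [B, 2 * C]]), np.array([-D, -E]))

def ransac_ellipse(pts, threshold, iterations=200, seed=0):
    """Return the conic fitted to the largest consensus set and the inlier mask."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts), dtype=bool)
    for _ in range(iterations):
        sample = pts[rng.choice(len(pts), size=5, replace=False)]
        inliers = conic_distance(fit_conic(sample), pts) < threshold
        if inliers.sum() > best.sum():
            best = inliers
    return fit_conic(pts[best]), best     # least squares fit to all inliers

# 30 points on a rotated ellipse centered at (3, 2) plus 5 incorrect points.
t = np.linspace(0.0, 2.0 * np.pi, 30, endpoint=False)
ex, ey = 4.0 * np.cos(t), 2.0 * np.sin(t)
ellipse = np.column_stack([3.0 + ex * np.cos(0.3) - ey * np.sin(0.3),
                           2.0 + ex * np.sin(0.3) + ey * np.cos(0.3)])
outliers = np.array([[3.0, 2.0], [10.0, 10.0], [-5.0, 7.0], [3.5, 2.5], [0.0, -4.0]])
pts = np.vstack([ellipse, outliers])
conic, inliers = ransac_ellipse(pts, threshold=0.05)
center = conic_center(conic)
```

The fixed random seed only makes the example reproducible; in practice the number of iterations would be chosen according to the expected share of incorrect points.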

The final stage of the Starburst algorithm is to refine the position of the ellipse determined by the RANSAC algorithm by maximizing the sharpness of the edge contour.

In the work, the part of the algorithm responsible for determining the center of the pupil was tested. The ready-made implementation of the algorithm published on the openEyes project website [16] was used. However, this method does not give good results, because in the presented system the resolution of the eye image is too small. To work correctly, the Starburst algorithm requires a high-resolution eye image. This method is intended for use with a camera mounted a short distance from the eye.

6.3 Adaptive selection of binarization threshold values

To assess the direction in which the eyes are directed, the center of the pupil must be determined exactly. There are many sophisticated methods for locating the pupil, but most of them require a high-resolution image of the eye. Assuming that a low-resolution (640×480) camera image is used, most of the existing methods become useless. An example is the Starburst algorithm described above.

The shape and mutual position of the iris and pupil can be approximately described using two concentric circles. By calculating the parameters of the circle describing the shape of the iris, the coordinates of the pupil center can be determined. Due to its properties, the iris is easier to locate than the pupil. This is because the border between the iris and the sclera has much higher contrast than the border between the iris and the pupil.

The double binarization method used in the work does not require high resolution. It is based on the fact that the pupil and the iris are much darker than the white of the eye. By choosing an appropriate binarization threshold, the image is segmented. Binarization is very sensitive to changes in lighting, so it is not possible to choose a universal threshold for which the center of the eye is always determined correctly, irrespective of the lighting conditions and the color of the iris. This made it necessary to implement a procedure for automatic selection of the binarization threshold.

To increase reliability, two different thresholds are used. The approximate center of the pupil is determined using the first threshold. The image is then binarized using the second threshold, with the constraint that the resulting area must contain the previous estimate. This approach combines the reliability of the smaller threshold with the better accuracy of the larger one.

After binarization with the selected thresholds, the image is segmented. A common method of determining the centers of the areas separated during segmentation is to calculate the center of mass. However, light reflected from the eye can cause the pupil area to contain bright reflections that interfere with precise determination of the center of the area. Therefore an ellipse determined from the central moments of the contour of the binarized eye image was used for the estimate.

The implemented method is a modification of the algorithm described in [18].

Determining the center of the area with an ellipse

After segmenting the image, the contours of the separated areas are determined. Then the central moments of each contour are calculated. Central moments are obtained by summing the values of all pixels contained in the contour. The general formula defining the moment of order (p, q) is:

$$m_{p,q} = \sum_{i=1}^{n} I(x, y)\, x^{p} y^{q}$$

Fitting an ellipse (with center at point (x, y), height h and width w) to the area for which the central moments have been calculated proceeds as follows. The auxiliary values are introduced:

$$u_{0,0} = m_{0,0}$$

$$u_{1,0} = \frac{m_{1,0}}{m_{0,0}}$$

$$u_{0,1} = \frac{m_{0,1}}{m_{0,0}}$$

$$u_{1,1} = -\frac{m_{1,1} - \dfrac{m_{1,0}\, m_{0,1}}{m_{0,0}}}{m_{0,0}}$$

$$u_{2,0} = \frac{m_{2,0} - \dfrac{m_{1,0}\, m_{1,0}}{m_{0,0}}}{m_{0,0}}$$

$$u_{0,2} = \frac{m_{0,2} - \dfrac{m_{0,1}\, m_{0,1}}{m_{0,0}}}{m_{0,0}}$$

$$\Delta = \sqrt{4\,(u_{1,1})^{2} + (u_{2,0} - u_{0,2})(u_{2,0} - u_{0,2})}$$

The center of the ellipse (x, y) is defined as follows:

$$x = u_{1,0}, \qquad y = u_{0,1}$$

The size (height h and width w) is determined using the formulas:

$$h = \sqrt{2\,(u_{2,0} + u_{0,2} + \Delta)}, \qquad w = \sqrt{2\,(u_{2,0} + u_{0,2} - \Delta)}$$
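The moment formulas can be checked with a short sketch. It assumes NumPy and a binarized image in which the pixels of the area have non-zero values. The function name `ellipse_from_moments` is hypothetical.

```python
import numpy as np

def ellipse_from_moments(mask):
    """Return ((x, y), h, w) of the ellipse derived from image moments."""
    ys, xs = np.nonzero(mask)
    weights = np.asarray(mask)[ys, xs].astype(float)

    def m(p, q):
        # Moment of order (p, q): sum of I(x, y) * x^p * y^q over the area.
        return np.sum(weights * xs.astype(float) ** p * ys.astype(float) ** q)

    m00 = m(0, 0)
    u10 = m(1, 0) / m00
    u01 = m(0, 1) / m00
    u11 = -(m(1, 1) - m(1, 0) * m(0, 1) / m00) / m00
    u20 = (m(2, 0) - m(1, 0) ** 2 / m00) / m00
    u02 = (m(0, 2) - m(0, 1) ** 2 / m00) / m00
    delta = np.sqrt(4.0 * u11 ** 2 + (u20 - u02) ** 2)
    h = np.sqrt(2.0 * (u20 + u02 + delta))
    w = np.sqrt(2.0 * max(u20 + u02 - delta, 0.0))  # guard against rounding below zero
    return (u10, u01), h, w

# A filled disk of radius 10 centered at (20, 20).
yy, xx = np.mgrid[0:41, 0:41]
disk = ((xx - 20) ** 2 + (yy - 20) ** 2 <= 100).astype(np.uint8)
(cx, cy), h, w = ellipse_from_moments(disk)
```

For a uniformly filled ellipse these expressions evaluate to the lengths of the semi-axes, so for the disk in the example both h and w are close to the radius.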

Description of the algorithm for determining the diameter of the pupil

1. Preliminary determination of the iris radius R. The ratio of the distance between the eyes to the iris radius is approximately constant for all people and can be determined empirically.

2. The image of the eye obtained during face and eye detection is enlarged to 100×100 pixels using the Gaussian method. This gives better results for very low resolutions. The pupil center can then be determined with an accuracy better than one pixel.

3. Erosion is applied to eliminate reflections.

4. Image normalization, which makes it possible to work with the same cut-off values in every case. After normalization the darkest pixel always has the value 0 and the brightest 255, regardless of the lighting.

5. Selection of the first binarization threshold. Successive values from 0 to 40 are tested iteratively. The criterion for choosing the optimal value is the distance from the center of the eye and a shape close to a circle. The iteration is interrupted when the size reaches half of the previously determined R.


6. Selection of the second binarization threshold. Successive values from 40 to 80 are tested iteratively. The criterion for choosing the optimal value is the distance from the center of the eye and a shape close to a circle. The iteration is interrupted when the size reaches the previously determined R.

7. Determination of the pupil center using the selected binarization thresholds.
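Steps 4 to 6 can be illustrated with a deliberately simplified sketch. It shows only the min-max normalization and the stopping rule of the threshold sweep. The selection criteria named in the list (distance from the eye center and closeness to a circle) are left out, and the "size" of the dark region is assumed here to be the radius of a circle with the same area, which is an assumption of this example rather than something stated in the text. The function names are hypothetical.

```python
import math


def normalize(image):
    """Stretch pixel values so that the darkest becomes 0 and the brightest 255."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in image]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in image]


def dark_area(image, threshold):
    """Number of pixels darker than the threshold."""
    return sum(1 for row in image for v in row if v < threshold)


def sweep_threshold(image, start, stop, target_radius):
    """Return the first threshold in [start, stop) at which the dark region
    reaches target_radius, or None if it never does.

    Assumption of this sketch: region size = radius of the equal-area circle.
    """
    for t in range(start, stop):
        radius = math.sqrt(dark_area(image, t) / math.pi)
        if radius >= target_radius:
            return t
    return None
```

With an iris radius estimate `R`, the two sweeps from the list would then be `sweep_threshold(img, 0, 40, R / 2)` and `sweep_threshold(img, 40, 80, R)`, applied to the normalized image.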

The presented method, despite its simplicity, is very effective. It copes well with low image resolution as well as with variable lighting conditions. The result of the algorithm that searches for the pupil center is shown in Figure 6.2.

6.4 Calibration

To determine the direction of the user's gaze, a linear homogeneous mapping [4] is applied to the gaze vector, which is formed as the difference between the current center of the pupil and the projection of the center of the eyeball onto the camera plane. The mapping matrix is denoted H. It is determined from a set of correspondences between the gaze vector and the point displayed on the monitor. Matrix H has eight degrees of freedom, which means that at least four such pairs must be known. For increased accuracy, ten points are recorded during the calibration process. The matrix H is determined with the least squares method: the difference between the mapped gaze vector and the point on the monitor is minimized.
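The least-squares estimation of H can be sketched as follows. This is a minimal illustration, not the thesis implementation: it fixes the last element of H to 1 (which accounts for the eight degrees of freedom but assumes that element is not close to zero), and it minimizes a linearized, algebraic residual rather than the geometric distance on the screen. The names `solve_linear`, `estimate_homography` and `map_point` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_homography(gaze, screen):
    """Least-squares 3x3 matrix H (with H[2][2] = 1) mapping gaze vectors
    (x, y) to screen points (u, v). At least four pairs are required."""
    if len(gaze) != len(screen) or len(gaze) < 4:
        raise ValueError("need at least four point pairs")
    rows, rhs = [], []
    for (x, y), (u, v) in zip(gaze, screen):
        # u * (h7 x + h8 y + 1) = h1 x + h2 y + h3, and likewise for v.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    # Normal equations (A^T A) h = A^T b of the overdetermined system.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def map_point(H, x, y):
    """Apply H to a gaze vector and return the estimated screen point."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v
```

After calibration, each new gaze vector is passed through `map_point` to obtain the estimated point on the monitor. Normalizing the coordinates before solving would improve numerical conditioning for large pixel values.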

Figure 6.2: Result of the algorithm that searches for the pupil center (panels a, b, c, d)

Calibration of the application consists of determining the relationship between the gaze vector and the fixation point displayed on the monitor. During the calibration process, the user follows the displayed point with his eyes. The greater the number of points displayed, the more accurately and reliably the calibration can be computed. However, too many points make the calibration process tedious, and the user may no longer be able to keep following the points. Experiments with different numbers of calibration points showed that 10 points, each displayed for 3 seconds and evenly distributed over the screen, give satisfactory results, and the calibration process is not tiring, since it lasts only 30 seconds.

During fixation, the human eye makes constant movements around the observed point. To reduce the magnitude of these involuntary eyeball movements, the point on which the user concentrates changes its size or position. As a result, the involuntary eye deviations are much smaller than for static points. A scheme in which the calibration point moves continuously across the whole screen was also tested. This approach, however, does not give the best results. A considerably better effect is obtained when the calibration point is displayed at fixed locations for some time. This makes it possible to average the gaze vector values for each location and to omit the values that deviate significantly from the others. Usually, during the first second after the point changes position, a large deviation of the gaze vector from the mean value is recorded. This is because the gaze follows the displayed point with some delay. Omitting the data obtained during the first second improves the calibration results.
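The filtering described above can be sketched as follows. The one-second settling time comes from the text. The outlier rule (rejecting samples farther from the median than a multiple of the median distance) and the name `mean_gaze_for_point` are assumptions of this example, since the text does not say how deviating values are detected.

```python
from statistics import median


def mean_gaze_for_point(samples, shown_at, settle_time=1.0, max_dev=3.0):
    """Average the gaze vector recorded for one calibration point.

    samples  -- list of (timestamp_s, gx, gy) gaze vector measurements
    shown_at -- time at which the calibration point appeared, in seconds
    Samples from the first settle_time seconds are discarded, then samples
    lying farther than max_dev median distances from the median are dropped.
    """
    kept = [(gx, gy) for t, gx, gy in samples if t - shown_at >= settle_time]
    if not kept:
        raise ValueError("no samples after the settling period")
    mx = median(g[0] for g in kept)
    my = median(g[1] for g in kept)
    dists = [((gx - mx) ** 2 + (gy - my) ** 2) ** 0.5 for gx, gy in kept]
    scale = median(dists)
    # If scale is 0 (most samples identical), all kept samples are used.
    inliers = [g for g, d in zip(kept, dists) if scale == 0 or d <= max_dev * scale]
    n = len(inliers)
    return sum(g[0] for g in inliers) / n, sum(g[1] for g in inliers) / n
```

The averaged vector for each of the calibration locations, paired with the known screen position of the point, forms one input pair for determining the matrix H.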

In some commercial high-accuracy eye-tracking systems, the point of changing size is replaced during calibration with small images or captions. This is an additional factor that encourages the eye to concentrate on a specific location. However, a difference between this approach and the point of changing size can be observed only when the gaze vector is determined with very high accuracy. The solution presented in this work is based on acquiring the image from a webcam. Such an image is usually so noisy that the accuracy obtained does not make it possible to register any improvement in calibration from using images or small captions. This was the reason for keeping the simple scheme of displaying a single point of varying size (Fig. 6.3).

Figure 6.3: Calibration window


Chapter 7

Tests of the developed solution

The accuracy of the results obtained is influenced by many factors, such as the level of lighting, the distance from the camera, the quality of the camera image, the angle of view of the lens, and the accuracy of the calibration process. The system relies on the image from a webcam, which is often very noisy. Noise in the image is caused by the high sensor sensitivity used when the scene is poorly lit. The size of the obtained eye image is strongly influenced by the distance from the camera and by the angle of view of the lens. With a wide-angle lens, the user's face and eyes are captured at a lower resolution, which worsens the results.

The accuracy of commercial eye-tracking systems is given in degrees, so it does not depend on the size of the monitor. The error expressed in degrees is the average deviation between the estimated gaze point and the actual fixation point.

SMI offers the most accurate eye tracking system. In the system specification, the maximum accuracy is stated as 0.3 degrees. This accuracy is obtained only if the tested person's head is immobilized and laboratory lighting conditions are provided. Systems using infrared light are very sensitive to daylight, so an artificially lit room is required for them to operate correctly.
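To relate an error given in degrees to an error on the screen, the standard visual angle relation can be used. The sketch below is an illustration only: it assumes the error is measured near the point where the line of sight is perpendicular to the screen, and the parameter names (pixel pitch and viewing distance in millimetres) are hypothetical.

```python
import math


def pixel_error_to_degrees(error_px, pixel_pitch_mm, viewing_distance_mm):
    """Visual angle, in degrees, subtended at the eye by an on-screen error."""
    error_mm = error_px * pixel_pitch_mm
    return math.degrees(2 * math.atan(error_mm / (2 * viewing_distance_mm)))


def degrees_to_pixel_error(angle_deg, pixel_pitch_mm, viewing_distance_mm):
    """Inverse relation: on-screen error in pixels for a given visual angle."""
    error_mm = 2 * viewing_distance_mm * math.tan(math.radians(angle_deg) / 2)
    return error_mm / pixel_pitch_mm
```

The same angular error therefore corresponds to a larger on-screen distance when the user sits farther from the monitor.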

The best accuracy among systems based solely on a standard camera is reported in [14]. The authors obtained an accuracy of 3 degrees during the tests. The method presented in that work requires a complicated process of initializing the head model for

individual use