Research Article
Strabismus Recognition Using Eye-Tracking Data and Convolutional Neural Networks

Zenghai Chen,¹ Hong Fu,¹ Wai-Lun Lo,¹ and Zheru Chi²

¹Department of Computer Science, Chu Hai College of Higher Education, 80 Castle Peak Road, Castle Peak Bay, Tuen Mun, NT, Hong Kong
²Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Correspondence should be addressed to Hong Fu; [email protected]

Received 25 October 2017; Revised 7 January 2018; Accepted 5 February 2018; Published 26 April 2018

Academic Editor: Weide Chang

Journal of Healthcare Engineering, vol. 2018, Article ID 7692198, https://doi.org/10.1155/2018/7692198

Copyright © 2018 Zenghai Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Strabismus is one of the most common vision diseases; it can cause amblyopia and even permanent vision loss. Timely diagnosis is crucial for treating strabismus well. In contrast to manual diagnosis, automatic recognition can significantly reduce labor costs and increase diagnostic efficiency. In this paper, we propose to recognize strabismus using eye-tracking data and convolutional neural networks. In particular, an eye tracker is first used to record a subject's eye movements. A gaze deviation (GaDe) image is then proposed to characterize the subject's eye-tracking data according to the accuracies of gaze points. The GaDe image is fed to a convolutional neural network (CNN) that has been trained on a large image database called ImageNet. The outputs of the fully connected layers of the CNN are used as the GaDe image's features for strabismus recognition. A dataset containing eye-tracking data of both strabismic and normal subjects is established for experiments. Experimental results demonstrate that natural image features can be well transferred to represent eye-tracking data, and that strabismus can be effectively recognized by our proposed method.
1. Introduction
Strabismus is a common ophthalmic disease that can lead to weak 3D perception, amblyopia (also termed lazy eye), or even blindness if it is not diagnosed in time and treated well [1, 2]. More importantly, it has been shown that strabismus can cause serious psychosocial consequences in both children and adults [3–12]; these adverse consequences affect education [5], employment [6], and dating [8]. Many young strabismic patients could be treated well if diagnosis and treatment took place at an early age: a preschool child's strabismus has a much better chance of being cured than an adult's. Timely diagnosis is thus essential. Traditional strabismus diagnosis methods, for example, the cover test, the Hirschberg test, and the Maddox rod, are manually conducted by professional ophthalmologists. This makes the diagnosis expensive and consequently drives people away from professional examinations. Furthermore, ophthalmologists make decisions according to their experience, so the diagnosis results are subjective. In view of that, we propose automatic recognition of strabismus in this paper. Automatic recognition of strabismus, which can also be termed strabismus recognition, performs strabismus diagnosis without ophthalmologists. As a result, the diagnosis results are objective, and the diagnosis cost can be significantly reduced. We realize strabismus recognition by exploiting eye-tracking data, which are acquired using an eye tracker. The proposed eye-tracking-based strabismus recognition method allows us to build an objective, noninvasive, and automatic diagnosis system that could be used to carry out strabismus examination in large communities. For instance, the system could be placed in a primary school so that students can take an examination at any time.
Eye-tracking techniques have been successfully applied to solve various problems, for example, object recognition [13], content-based image retrieval [14], attention modeling [15], and image quality assessment [16], but very little research on eye tracking for strabismus diagnosis has been reported. Several works have proposed to leverage eye-tracking methodology for strabismus examination
[17–20]. Pulido [17] employed a Tobii eye tracker to acquire gaze data and conduct ophthalmic examinations, including strabismus, by calculating the deviation of the gaze data. However, Pulido proposed only a method prototype in [17]; the author had no real strabismic gaze data with which to demonstrate the prototype's performance. Model and Eizenman [18] proposed an eye-tracking-based approach for performing the Hirschberg test, a classical method to measure binocular ocular misalignment. But the performance of their approach was studied with five healthy infants only; the method's effectiveness for strabismus examination was not tested. Bakker et al. [19] developed a gaze direction measurement instrument to estimate the strabismus angle. The instrument allows for unrestrained head movements, but only three subjects participated in the experiment, which is too small a number, and there is no ground truth available for the strabismic subjects; it is hence impossible to comprehensively evaluate the instrument's performance. In our previous work [20], we developed a system based on eye tracking to acquire gaze data for strabismus diagnosis, where the diagnosis is performed by intuitively analyzing gaze deviations. But that system's effectiveness was verified with one strabismic subject and one normal subject only.
In this paper, we develop a more effective eye-tracking system than that of [20] to acquire gaze data for strabismus classification. Instead of examining strabismus by directly analyzing gaze deviations, as previous methods do, we explore a machine learning method to realize strabismus classification. One big disadvantage of the previous methods is that their accuracy is dramatically affected by every single gaze point: a noisy gaze point can cause an inaccurate examination result. By contrast, a learning method can eliminate the effect of a small number of noisy gaze points by using a large amount of data, so as to generate a more accurate result. In particular, we leverage convolutional neural networks (CNNs), a powerful deep learning algorithm, to extract features from gaze data for strabismus recognition.
With the rapid development of deep learning in recent years, CNNs have achieved numerous successes in computer vision and pattern recognition, for example, image classification [21], scene labeling [22], action recognition [23], and speech recognition [24]. With a hierarchical structure of multiple convolution-pooling layers, CNNs can encode abstract features from raw multimedia data, and they have shown especially impressive performance in learning image features. In our work, CNNs are exploited to generate useful features that characterize eye-tracking data for strabismus recognition. Concretely, a subject is asked to fixate successively on nine points while the subject's eye movements are captured by an eye tracker. The eye-tracking data are then represented by a gaze deviation (GaDe) image, which is produced according to the fixation accuracies of the subject's gaze points. After that, the GaDe image is fed to a CNN that has been trained on a large image database called ImageNet [25]. The output vectors of the fully connected (FC) layers of the CNN are used as features representing the GaDe image. Finally, the features are input to a support vector machine (SVM) for strabismus classification. It is expected that the image features of ImageNet learnt by CNNs can be well transferred to represent eye-tracking data for strabismus recognition. We build a gaze dataset using our eye-tracking system to demonstrate the proposed method's performance. The dataset is much larger than previously published strabismus datasets.
The rest of this paper is organized as follows. Section 2 describes the methods exploited for strabismus recognition. Section 3 introduces the dataset for the experiments and reports the experimental results. Section 4 concludes this paper with final remarks. Before ending this introductory section, it is worth summarizing the contributions of this paper as follows:
(i) We develop an effective eye-tracking system to acquire gaze data for strabismus recognition.
(ii) We propose a gaze deviation image to characterize eye-tracking data.
(iii) We exploit convolutional neural networks to generate features for gaze deviation image representation.
(iv) We demonstrate that natural image features learnt by convolutional neural networks can be well transferred to represent eye-tracking data, and that strabismus can be effectively recognized by our method.
2. Methodology
2.1. The Proposed Strabismus Recognition Framework. Figure 1 shows our proposed framework for strabismus recognition. The recognition procedure is conducted as follows. First, the subject is asked to look at nine points shown one after another at different positions on a screen. Meanwhile, an eye tracker mounted below the screen detects the subject's eye movements and records his or her gaze points. The recorded gaze data are then used to generate three gaze deviation (GaDe) maps, based on the fixation accuracies of the left-eye gaze points, the right-eye gaze points, and the centers of the two eyes' gaze points, respectively. The three maps are combined to form a GaDe image, with the maps serving as the R, G, and B channels of the image. After that, the GaDe image is fed to a CNN which has been trained on ImageNet, so as to produce a feature vector representing the GaDe image. Finally, the feature vector is fed to an SVM for classification, and the subject is classified as strabismic or normal.
It is worth presenting the motivations for using a CNN and the GaDe image in our method before digging into the implementation details. We use the CNN to tackle our problem for two reasons. Firstly, eye-tracking gaze data are difficult to characterize: up to now, there is still no standard feature for eye-tracking data representation. People have proposed features such as fixation time and saccade path, but these features are designed for specific tasks and are not suited to our strabismus recognition problem. Secondly, the CNN is powerful at learning discriminative features from raw images; it has shown state-of-the-art performance on various pattern recognition and image classification problems. We thus expect that the CNN can extract effective features for eye-tracking data representation. Since the CNN is good at extracting image features, we need to convert the raw gaze data to images before feature extraction by the CNN. That is why we propose GaDe images to represent the gaze data. The principle guiding the design of GaDe images is that the images should describe well the difference between normal data and strabismic data. The details of eye-tracking data acquisition, GaDe image generation, and the CNN models used are presented in the following subsections.
2.2. Eye-Tracking Data Acquisition. We use the Tobii X2-60 eye tracker (shown in Figure 1) to acquire eye-tracking gaze data. The Tobii X2-60 has a sampling rate of 60 Hz and a tracking accuracy of 0.4 degrees; both are high enough to precisely capture strabismic gaze data in our experiments. The eye tracker is attached below the monitor of a laptop to build our eye-tracking system. The laptop is a Lenovo ThinkPad T540p with a 1920 × 1080 screen resolution. The main reason we chose a laptop rather than a desktop is that it is convenient to carry the system to different environments for data acquisition. In order to position gaze points on the screen digitally, we define a coordinate system for the screen. The upper-left corner of the screen is set as the origin (0,0), with the horizontal axis denoting the x-coordinate and the vertical axis denoting the y-coordinate. The values of the lower-right, upper-right, and lower-left corners are (1,1), (1,0), and (0,1), respectively. In other words, both x and y lie in the interval [0, 1] on the screen; for example, the screen center (0.5, 0.5) corresponds to pixel (960, 540) at the 1920 × 1080 resolution. We exploit the Tobii Matlab SDK to develop our data acquisition interface.
Calibration needs to be performed before using the eye tracker to acquire gaze data. The purpose of calibration is to teach the eye-tracking system the characteristics of the subject, so that the eye tracker can precisely detect the subject's eye movements. During calibration, the subject is asked to fixate on a number of points displayed on the screen. Depending on the number of points used, there are different calibration schemes, for example, one-point, three-point, or nine-point. We adopt a nine-point calibration scheme, as it provides high tracking accuracy. The positions of the nine points on the screen are (0.1,0.1), (0.5,0.1), (0.9,0.1), (0.1,0.5), (0.5,0.5), (0.9,0.5), (0.1,0.9), (0.5,0.9), and (0.9,0.9). The result is shown after each calibration; we start real tracking tests if the calibration accuracy is acceptable, and recalibrate otherwise.
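To make the geometry concrete, the following snippet enumerates the nine calibration positions in the normalized coordinate system and converts them to pixels; the linear normalized-to-pixel mapping is our assumption based on the screen resolution given above.

```python
# Nine calibration/target positions in normalized screen coordinates,
# enumerated row by row: (0.1,0.1), (0.5,0.1), ..., (0.9,0.9).
points = [(x, y) for y in (0.1, 0.5, 0.9) for x in (0.1, 0.5, 0.9)]

# Assumed linear mapping to pixels on the 1920 x 1080 screen.
pixels = [(round(x * 1920), round(y * 1080)) for x, y in points]
print(pixels[4])  # the screen center (0.5, 0.5) -> (960, 540)
```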
A traditional method for ophthalmologists to examine strabismus is the nine-point method: the patient is asked to fixate sequentially on nine target points at a certain distance in front, while the ophthalmologist observes the patient's eye movements. This method comprehensively examines the patient's eye movements with rotations at different angles. We therefore adopt the same scheme for our gaze data acquisition interface. The nine points' positions are the same as those of the nine calibration points. Figure 2 shows the nine-point interface. We use a black background, as it helps the subject concentrate on the target points. Each point comprises a red inner circle and a white outer circle, with radii of 15 and 30 pixels, respectively. The points are displayed one by one in order; the white arrows indicate the display order. In a real test, the subject's position is adjusted so that the subject is at a fixed distance (50 cm in our test) from the screen and the subject's eye level is in the same horizontal line as the screen center. A distance of 50 cm is optimal for the Tobii X2-60 to track the subject's eye movements.
Figure 3 shows the procedure of gaze data acquisition. Each time a target point is displayed, the eye tracker records the subject's gaze points for both eyes simultaneously. The next target point is displayed once the number of effective gaze pairs acquired exceeds 100, where a gaze pair is defined as the two gaze points of the two eyes captured by the eye tracker at one sampling moment, and "effective" indicates that at least one gaze point of the pair is located close enough to the target point. That is, the distance between the target point and at least one gaze point of the pair must be smaller than an empirically predefined threshold (0.05 in this paper); the distance is the Euclidean distance defined in Section 2.3. It is worth mentioning that for some severe strabismic subjects, it is sometimes difficult to capture effective gaze points at some target points, in particular the points located at the four corners of the screen, because these subjects need to rotate their eyeballs to the extremes. In view of that, we let each target point be displayed for at most 10 seconds; the next target point is displayed after 10 seconds regardless of whether the system has collected 100 effective gaze pairs. Since the sampling rate of our eye tracker is 60 Hz, collecting 100 gaze pairs from a normal subject takes only about two seconds; hence, 10 seconds are long enough to capture gaze data for each point.

Figure 1: The proposed strabismus recognition framework.
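The acquisition logic for one target point can be sketched as follows. The paper's interface was built with the Tobii Matlab SDK; this Python version is only illustrative, and `tracker.sample()` is a hypothetical call standing in for the SDK's gaze-sampling API.

```python
import math
import time

THRESHOLD = 0.05   # normalized-distance threshold for an "effective" gaze point
MAX_PAIRS = 100    # effective gaze pairs to collect per target point
TIME_LIMIT = 10.0  # seconds allowed per target point

def distance(p, q):
    # Euclidean distance in normalized screen coordinates.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def collect_for_target(tracker, target):
    # Collect effective gaze pairs for one target, moving on after either
    # 100 effective pairs or 10 seconds, whichever comes first.
    pairs, start = [], time.time()
    while len(pairs) < MAX_PAIRS and time.time() - start < TIME_LIMIT:
        left, right = tracker.sample()  # hypothetical: current (left, right) gaze points
        # Effective: at least one gaze point of the pair lies within the threshold.
        if distance(left, target) < THRESHOLD or distance(right, target) < THRESHOLD:
            pairs.append((left, right))
    return pairs
```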
2.3. Gaze Deviation Image. The next step after gaze data acquisition is to generate a GaDe image that characterizes the gaze data. To realize that, we first calculate three GaDe maps, which will serve as the R, G, and B channels of the GaDe image, based on the fixation accuracies of the two eyes' gaze points. Let $g_{ij}$ denote the $i$th gaze pair for the $j$th target point, and let $p_{ij}^l = (x_{ij}^l, y_{ij}^l)$ and $p_{ij}^r = (x_{ij}^r, y_{ij}^r)$ denote the left-eye and right-eye gaze points of gaze pair $g_{ij}$, where $1 \le j \le 9$ and $1 \le i \le 100$, and the superscripts $l$ and $r$ indicate left and right. Let $p_j^t = (x_j^t, y_j^t)$ denote the $j$th target point. Then the fixation accuracy, in terms of Euclidean distance, for the left-eye gaze point $p_{ij}^l$ is

$$d_{ij}^l = \sqrt{(x_j^t - x_{ij}^l)^2 + (y_j^t - y_{ij}^l)^2}, \tag{1}$$

and for the right-eye gaze point $p_{ij}^r$ is

$$d_{ij}^r = \sqrt{(x_j^t - x_{ij}^r)^2 + (y_j^t - y_{ij}^r)^2}. \tag{2}$$

We define the center of the gaze pair $p_{ij}^l$ and $p_{ij}^r$ as $p_{ij}^c = (x_{ij}^c, y_{ij}^c)$, formulated simply as

$$p_{ij}^c = \left( \frac{x_{ij}^l + x_{ij}^r}{2}, \frac{y_{ij}^l + y_{ij}^r}{2} \right). \tag{3}$$
Figure 2: The nine-point gaze data acquisition interface.
Figure 3: The gaze data acquisition procedure.
Then, similarly to (1) and (2), the fixation accuracy of the center $p_{ij}^c$ is calculated as

$$d_{ij}^c = \sqrt{(x_j^t - x_{ij}^c)^2 + (y_j^t - y_{ij}^c)^2}. \tag{4}$$
For one subject, we calculate the fixation accuracies $d_{ij}^l$, $d_{ij}^r$, and $d_{ij}^c$ for all of his or her gaze pairs. Based on these three types of fixation accuracies, three GaDe maps $M^l$, $M^r$, and $M^c$ are computed, respectively. The map size equals the input size of the CNN; in this paper, two map sizes (224 × 224 and 227 × 227) are adopted. The element values of the three maps are derived from the fixation accuracies $d_{ij}^l$, $d_{ij}^r$, and $d_{ij}^c$ of all the gaze pairs, with one gaze pair contributing one element to each of the three maps. In particular, the values of gaze pair $g_{ij}$ in the three GaDe maps $M^l$, $M^r$, and $M^c$ are respectively calculated as

$$v_{ij}^l = \mathrm{round}\!\left( \frac{d_{ij}^l}{\max_{i,j} d_{ij}^l} \cdot 255 \right), \quad
v_{ij}^r = \mathrm{round}\!\left( \frac{d_{ij}^r}{\max_{i,j} d_{ij}^r} \cdot 255 \right), \quad
v_{ij}^c = \mathrm{round}\!\left( \frac{d_{ij}^c}{\max_{i,j} d_{ij}^c} \cdot 255 \right), \tag{5}$$
where $\max(\cdot)$ finds the maximum value over all gaze pairs and $\mathrm{round}(\cdot)$ rounds a value to its nearest integer. Equation (5) guarantees that the element values of the three GaDe maps are integers lying in the interval [0, 255], which is also the value range of a real digital image. The positions of $v_{ij}^l$, $v_{ij}^r$, and $v_{ij}^c$ in maps $M^l$, $M^r$, and $M^c$ are specified by $(x_{ij}^l, y_{ij}^l)$, $(x_{ij}^r, y_{ij}^r)$, and $(x_{ij}^c, y_{ij}^c)$, respectively. Elements not associated with any gaze point are assigned a value of 0. In the GaDe maps, large fixation deviations (far from the target point) receive large values; in other words, inaccurate gaze points play more important roles in a GaDe map. This makes sense, as the prominent difference between strabismic and normal people is that one or even both of a strabismic person's eyes cannot fixate well on target objects. The three GaDe maps can thus effectively characterize the properties of strabismic gaze data. Generally, a normal person's GaDe maps have only a few bright (high-intensity) points far from the nine target points, with most of the relatively dark points clustered around the targets, while a strabismic person's GaDe maps usually have a large number of bright points located far from the targets. We combine the three GaDe maps to form a GaDe image, with each map representing one color channel of the image. The GaDe image is then fed to a CNN for feature extraction.
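A compact NumPy sketch of the GaDe construction, following (1)-(5): the mapping of normalized gaze coordinates to pixel indices (multiplying by the map size) and the use of the maximum deviation when two gaze points fall on the same pixel are our assumptions, as the paper does not spell out these details.

```python
import numpy as np

def gade_image(pairs, targets, size=224):
    # maps[..., 0/1/2] hold left, right, and center deviations (the R, G, B channels).
    maps = np.zeros((size, size, 3))
    for j, t in enumerate(targets):
        for left, right in pairs[j]:
            center = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)  # (3)
            for ch, p in enumerate((left, right, center)):
                d = np.hypot(t[0] - p[0], t[1] - p[1])                     # (1), (2), (4)
                col = min(int(p[0] * size), size - 1)  # assumed coordinate-to-pixel mapping
                row = min(int(p[1] * size), size - 1)
                maps[row, col, ch] = max(maps[row, col, ch], d)
    for ch in range(3):  # normalize each map to integers in [0, 255], as in (5)
        m = maps[:, :, ch].max()
        if m > 0:
            maps[:, :, ch] = np.round(maps[:, :, ch] / m * 255)
    return maps.astype(np.uint8)
```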
2.4. Convolutional Neural Networks. A CNN is a hierarchical architecture consisting of a number of convolution layers and pooling layers. CNNs usually receive raw data, for example, image pixels, as input and extract increasingly abstract features through the hierarchical convolution-pooling layers. Take color image feature extraction as an example: an image's three color channels are fed to the first convolution layer of the CNN. The convolution results, also called convolution feature maps, are then downsampled in the pooling layer (e.g., max-pooling) that follows the first convolution layer, generating pooling feature maps. The pooling feature maps are passed to the next convolution layer and then to its pooling layer for further processing. After a number of convolution and pooling operations, the feature maps are connected to an output layer through one or more FC layers. The FC layers can be used for classification like a multilayer perceptron, with the output vector representing the different classes; alternatively, we can employ the outputs of the FC layers as a feature vector of the input image and then use a classifier, for example, an SVM, to perform classification on that feature vector. The hierarchical convolution and pooling operations make the features extracted by a CNN insensitive to translation, rotation, and scaling.
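The following minimal PyTorch snippet (our choice of framework, not the paper's) illustrates the convolution-pooling hierarchy: spatial resolution shrinks at each pooling step while the channel depth, and hence the abstraction of the features, grows.

```python
import torch
import torch.nn as nn

stack = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3 input channels -> 16 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 224 x 224 -> 112 x 112
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 112 x 112 -> 56 x 56
)
x = torch.rand(1, 3, 224, 224)  # e.g., a GaDe image
print(stack(x).shape)           # torch.Size([1, 32, 56, 56])
```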
We adopt six different CNN models that have been trained on ImageNet to generate features for representing eye-tracking gaze data. We use the pretrained CNN models as feature extractors and do not train them on eye-tracking data in this work, for two main reasons. Firstly, we do not have enough eye-tracking data to train a complicated CNN model well. A CNN model may have millions of weights to be trained, so a large dataset is necessary to tune them effectively; for instance, the CNN models we adopt have been trained on ImageNet, an image database containing more than one million training images in 1000 classes. For the strabismus classification problem, it is difficult to build a large dataset, since not many strabismic people can be found to participate in experiments; in fact, only 17 strabismic people participated in ours. It is, therefore, impractical to train a CNN model on such limited strabismic gaze data, but it is feasible to employ a pretrained CNN as a feature extractor that generates features for gaze data representation, as will be demonstrated in Section 3. Secondly, the weights of the CNN models are tuned using natural images rather than eye-tracking gaze data, and we would like to investigate whether the information extracted from the natural image domain is applicable to the eye-tracking data domain. It would be significant if natural image features could be well transferred to represent eye-tracking data, since we could then exploit the large quantities of natural images on the internet to help generate features, rather than manually designing complicated feature extraction algorithms for eye-tracking data representation. The six CNN models we adopt are AlexNet [21], VGG-F, VGG-M, VGG-S [26], VGG-16, and VGG-19 [27]. All of them have three FC layers but different numbers of convolution layers; they also differ in input size, the number of convolution filters in each layer, max-pooling size, and so on. Readers can refer to [21, 26, 27] for the architectural details of the six models. The six models have the same three FC layers: the first two FC layers use the ReLU [28] transfer function, and the final FC layer adopts the softmax transfer function. For each FC layer, we employ the input vector and the output vector of the transfer function as feature vectors of the GaDe image, which gives in total six feature vectors from the three FC layers. The six feature vectors are denoted by $l_1$, $l_2$, $l_3$, $l_4$, $l_5$, and $l_6$, with dimensions 4096, 4096, 4096, 4096, 1000, and 1000, respectively. We compare the performance of the six feature vectors for the six CNN models in Section 3. Note that the input size of AlexNet is 227 × 227, while the input sizes of the other five models are all 224 × 224; the GaDe images therefore need to be resized to 227 × 227 for AlexNet and 224 × 224 for the other five models.
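As a sketch of the feature extraction step, the snippet below loads an ImageNet-pretrained VGG-16 from torchvision (an assumption; the paper does not name its software stack) and captures six vectors analogous to $l_1$-$l_6$: the outputs of the three FC layers, the outputs of the two ReLUs, and a softmax over the final FC output.

```python
import torch
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

# In torchvision's VGG, model.classifier holds the three FC layers at indices
# 0, 3, and 6, with the ReLUs that follow the first two at indices 1 and 4.
captured = {}
def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output.detach().clone().squeeze(0)  # clone: ReLUs run in place
    return hook

for idx in (0, 1, 3, 4, 6):
    model.classifier[idx].register_forward_hook(make_hook(f"fc{idx}"))

gade = torch.rand(1, 3, 224, 224)  # stand-in for a preprocessed GaDe image
with torch.no_grad():
    model(gade)
captured["softmax"] = torch.softmax(captured["fc6"], dim=0)

# Six vectors of sizes 4096, 4096, 4096, 4096, 1000, 1000; concatenating them
# gives the "All" feature used in the experiments.
all_features = torch.cat(list(captured.values()))
```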
2.5. Baseline Method. In order to demonstrate the effectiveness of CNNs in extracting features from gaze data for strabismus recognition, we propose a baseline method for comparison. The baseline method models the normal gaze points of each target point with a multivariate Gaussian distribution, whose parameters (mean vector and covariance matrix) are calculated using the normal training data. To construct the Gaussian distribution, we represent a gaze pair $g_{ij}$ by the x-coordinate and y-coordinate differences between the target point $p_j^t$ and the pair's two gaze points $p_{ij}^l$ and $p_{ij}^r$ as follows:

$$u_{ij} = \left[\, x_j^t - x_{ij}^l,\; y_j^t - y_{ij}^l,\; x_j^t - x_{ij}^r,\; y_j^t - y_{ij}^r \,\right]^T. \tag{6}$$
Then the Gaussian probability density for gaze pair $g_{ij}$ is

$$p(u_{ij}; \mu_j, \Sigma_j) = \frac{1}{(2\pi)^2 \sqrt{|\Sigma_j|}} \exp\!\left( -\frac{1}{2} (u_{ij} - \mu_j)^T \Sigma_j^{-1} (u_{ij} - \mu_j) \right), \tag{7}$$

where $\mu_j$ and $\Sigma_j$ are the mean vector and covariance matrix of the Gaussian distribution for the $j$th target point, respectively; they are calculated using the normal training gaze pairs that belong to the $j$th target point, and $|\Sigma_j|$ denotes the determinant of $\Sigma_j$.
The baseline method performs classification as follows. Given the gaze pair $g_{ij}$, if its density value in (7) is larger than a threshold $\alpha_j$, the gaze pair is classified as normal. If the proportion of normal gaze pairs for a target point is larger than a threshold $\beta_j$, the target point is classified as normal for the subject; otherwise, it is classified as strabismic. If any of the nine target points is classified as strabismic, the subject is finally classified as strabismic. In other words, a normal subject should possess normal fixations in all nine different directions; once the fixation in one direction is abnormal, the subject is diagnosed as strabismic. This is reasonable, since some types of strabismus, such as incomitant strabismus, may cause poor fixation in a specific direction only. In a medical examination, a subject may likewise be diagnosed with strabismus once the ophthalmologist observes that the subject's two eyes do not align in a specific direction. The thresholds $\alpha_j$ and $\beta_j$ are learnt using grid search, such that the classification accuracy on the training data is maximized.
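A NumPy sketch of this baseline, under the assumption that the difference vectors (6) have already been computed per target point:

```python
import numpy as np

def fit_gaussians(normal_u):
    # normal_u[j]: (n, 4) array of difference vectors u_ij for target j,
    # taken from normal training subjects.
    return [(u.mean(axis=0), np.cov(u, rowvar=False)) for u in normal_u]

def density(u, mean, cov):
    # Multivariate Gaussian density (7) for a 4-D difference vector u.
    diff = u - mean
    norm = 1.0 / ((2 * np.pi) ** 2 * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def classify_subject(subject_u, gaussians, alphas, betas):
    # A subject is strabismic as soon as any of the nine target points
    # fails the per-point normality test.
    for j, (mean, cov) in enumerate(gaussians):
        normal = sum(density(u, mean, cov) > alphas[j] for u in subject_u[j])
        if normal / len(subject_u[j]) <= betas[j]:
            return "strabismic"
    return "normal"
```

The grid search for $\alpha_j$ and $\beta_j$ simply evaluates candidate threshold pairs over the training subjects and keeps the pair with the highest training accuracy.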
3. Experiments
3.1. Eye-Tracking Gaze Data. We cooperated with the Hong Kong Association of Squint and Double Vision Sufferers to collect strabismic data. In total, 17 members of the association suffering from strabismus consented to participate in our experiments. In addition to the 17 strabismic subjects, we invited 25 normal subjects to join the study. All subjects are adults, with ages ranging from 25 to 63, and include both males and females. They have been diagnosed by a professional ophthalmologist, and the diagnosis results are used as the ground truth in this paper. After ethics approval and informed consent, the 42 subjects followed the data acquisition procedure introduced in Section 2.2, and we finally collected 42 eye-tracking samples. The 17 strabismic subjects suffer from different types of strabismus (e.g., recessive, intermittent, and manifest) at various severities (e.g., mild, moderate, and severe). Recessive strabismus is present only when binocular vision has been interrupted, such as by covering one eye; patients of this type can still maintain fusion and usually become aware of having recessive strabismus only after being examined by an ophthalmologist. Manifest strabismus can be observed while a patient looks at an object binocularly. Intermittent strabismus is a combination of recessive and manifest strabismus. People suffering from recessive strabismus, intermittent strabismus, or mild manifest strabismus are sometimes difficult to distinguish from normal people by appearance, as their fixation deviations are small; this holds especially for recessive and intermittent strabismus.
Figure 4 shows some examples of gaze data and GaDe images. The first row displays all gaze points for the nine target points in one map, with red denoting left gaze points and blue "×" denoting right gaze points. The second row shows the corresponding GaDe images for the gaze data of the first row; for better visualization, the GaDe images have been brightened by adding a constant (50) to the gaze points' values. The first two columns are two normal samples, one with good fixation (small deviation) and one with relatively poor fixation (large deviation). The other three columns, from left to right, represent strabismic samples of recessive, intermittent, and manifest strabismus, respectively. Note that the colors in the first row distinguish the left and right gaze points, while the colors in the second row represent the R, G, and B channels of the GaDe images. Two main observations can be drawn from Figure 4. Firstly, gaze points with large deviations in the first row are highlighted in the corresponding GaDe images, and those with small deviations are suppressed; inaccurate gaze points therefore contribute more when recognizing strabismus using GaDe images. Secondly, the data distributions of the normal sample with small deviation and the manifest sample are distinctive and easy to distinguish. By contrast, the distributions of the normal sample with large deviation and the recessive or intermittent samples look similar and are difficult to distinguish intuitively. That is why we exploit CNNs to solve the problem: we expect that CNNs, as powerful abstract feature extractors, can extract distinctive features from different samples, so as to effectively classify normal data and strabismic data. It is worth mentioning that we focus on binary strabismus classification rather than recognizing different types of strabismus in this paper, the major reason being that we do not yet have enough data for each strabismus type. However, we believe that CNNs could extract useful features from GaDe images of different strabismus types if sufficient data were provided, and the proposed method could then be applied to recognizing strabismus types. We leave this task for future work, when we have acquired sufficient data for the different strabismus types.
3.2. Experimental Results. We have in total 42 samples: 25 normal and 17 strabismic. A leave-one-out evaluation scheme is adopted; that is, each time one sample is used for testing and the remaining 41 samples are used for training, yielding 42 different results, which are averaged to obtain the final performance. LIBSVM [29] is employed to implement the SVM classification. A linear SVM kernel is used for both the CNN method and the baseline method, and the SVM classifiers are trained using a two-fold cross-validation scheme. Table 1 tabulates the classification accuracies of the six CNN models (rows) using the different feature vectors (columns) extracted from the three FC layers; the final column represents the concatenation of all six feature vectors into one feature vector. The accuracy of the baseline method is 69.1%. As can be seen from Table 1, the features extracted from VGG-S perform the best overall; the highest accuracy (95.2%) is achieved with feature vector $l_4$ of VGG-S. Feature vectors $l_1$, $l_2$, $l_3$, and $l_4$ outperform $l_5$ and $l_6$ in most cases; one possible reason is that $l_1$-$l_4$, extracted from the first two FC layers, contain richer features than $l_5$ and $l_6$. Note that the concatenation of all feature vectors sometimes yields lower accuracy than some individual feature vectors, as shown in the final column. The most important finding from Table 1 is that the features extracted from the six CNN models perform much better than the baseline method, except in a few cases where feature vector $l_6$ is used. This indicates that CNN features can effectively characterize the GaDe images derived from eye-tracking data, and that CNN features can be a promising representation of eye-tracking data.
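The evaluation protocol can be sketched as follows. The paper uses LIBSVM; scikit-learn's SVC wraps the same underlying library, so this is a close, though not identical, re-creation.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def leave_one_out_accuracy(X, y):
    # X: (42, d) array of CNN feature vectors; y: (42,) labels
    # (1 = strabismic, 0 = normal).
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel="linear")  # linear kernel, as in the paper
        clf.fit(X[train_idx], y[train_idx])
        correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
    return correct / len(y)
```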
Specificity and sensitivity are two important metrics for measuring the performance of a medical classification method; a good method should score high on both. For our strabismus recognition problem, specificity is defined as the percentage of normal subjects who are correctly classified as normal, and sensitivity as the percentage of strabismic subjects who are correctly classified as strabismic. In order to study the specificity and sensitivity of our method, for each CNN model we select the result with the highest classification accuracy. According to Table 1, the highest accuracies for AlexNet, VGG-F, VGG-M, VGG-S, VGG-16, and VGG-19 are 78.6% ($l_1$), 81.0% (column All), 88.1% ($l_1$), 95.2% ($l_4$), 83.3% ($l_1$), and 83.3% (column All), respectively. We show the specificity and sensitivity of the six CNN results, as well as the baseline method, in Figure 5. Evidently, VGG-S possesses the best specificity and sensitivity: only one normal subject and one strabismic subject are misclassified by VGG-S.
Figure 4: Examples of gaze data and corresponding GaDe images. The first two columns represent normal data with small deviation and large deviation, and the third, fourth, and fifth columns represent data of recessive strabismus, intermittent strabismus, and manifest strabismus, respectively.
Table 1: Accuracies (%) of different CNN models. The accuracy of the baseline method is 69.1.

Feature    l1     l2     l3     l4     l5     l6     All
AlexNet    78.6   78.6   76.2   76.2   73.8   76.7   76.2
VGG-F      76.2   76.2   76.2   76.2   78.6   65.1   81.0
VGG-M      88.1   85.7   85.7   85.7   78.6   57.1   78.6
VGG-S      85.7   81.0   78.6   95.2   76.2   79.1   83.3
VGG-16     83.3   81.0   76.2   81.0   76.2   67.4   83.3
VGG-19     81.0   78.6   81.0   81.0   71.4   62.8   83.3
The baseline method has a high specificity (84%) but a very low sensitivity (47.1%). This means that the baseline method is insensitive to strabismic data; it tends to classify the data as normal. By contrast, the difference between the specificity and sensitivity of the CNN features is relatively small, especially for VGG-S. This substantiates two things. Firstly, the proposed GaDe images are able to effectively characterize both normal gaze data and strabismic gaze data, so the two types of eye-tracking data can be well separated. Secondly, the natural image features learnt by CNNs can be well transferred to represent GaDe images.

Figure 5: Specificity and sensitivity of different methods.
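For reference, the two metrics reduce to a small computation over the leave-one-out predictions; with 24 of 25 normal and 16 of 17 strabismic subjects correctly classified, VGG-S obtains the 96% specificity and 94.1% sensitivity shown in Figure 5.

```python
def specificity_sensitivity(y_true, y_pred):
    # Labels: 0 = normal, 1 = strabismic.
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    specificity = tn / sum(t == 0 for t in y_true)  # normal classified as normal
    sensitivity = tp / sum(t == 1 for t in y_true)  # strabismic classified as strabismic
    return specificity, sensitivity
```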
Overall, the experimental results have demonstrated that the proposed method is a promising alternative for strabismus recognition. In the future, the accuracy can be improved in two major ways. One way is to employ more advanced pretrained CNN models for better feature extraction. The other is to collect more gaze data, especially data of different strabismus types. With sufficient data, we would be able to fine-tune the CNN models, so that they could learn more discriminative features and boost the classification accuracy.
4. Conclusion
In this paper, we first design an eye-tracking system to acquire gaze data from both normal and strabismic people, and we then propose a GaDe image based on gaze points' fixation deviations to characterize eye-tracking data. Finally, we exploit CNNs that have been trained on a large real-image database to extract features from GaDe images for strabismus recognition. Experimental results show that GaDe images are effective for characterizing strabismic gaze data and that CNNs can be a powerful alternative for feature extraction from eye-tracking data. The effectiveness of our proposed method for strabismus recognition has been demonstrated.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by a grant from the Hong Kong
RGC (Project reference no. UGC/FDS13/E04/14).
References
[1] M. S. Castanes, "Major review: the underutilization of vision screening (for amblyopia, optical anomalies and strabismus) among preschool age children," Binocular Vision & Strabismus Quarterly, vol. 18, no. 4, pp. 217–232, 2003.
[2] S. M. Mojon-Azzi, A. Kunz, and D. S. Mojon, "The perception of strabismus by children and adults," Graefe's Archive for Clinical and Experimental Ophthalmology, vol. 249, no. 5, pp. 753–757, 2011.
[3] J. M. Durnian, C. P. Noonan, and I. B. Marsh, "The psychosocial effects of adult strabismus: a review," The British Journal of Ophthalmology, vol. 95, no. 4, pp. 450–453, 2011.
[4] B. G. Mohney, J. A. McKenzie, J. A. Capo, K. J. Nusz, D. Mrazek, and N. N. Diehl, "Mental illness in young adults who had strabismus as children," Pediatrics, vol. 122, no. 5, pp. 1033–1038, 2008.
[5] O. Uretmen, S. Egrilmez, S. Kose, K. Pamukcu, C. Akkin, and M. Palamar, "Negative social bias against children with strabismus," Acta Ophthalmologica Scandinavica, vol. 81, no. 2, pp. 138–142, 2003.
[6] S. M. Mojon-Azzi and D. S. Mojon, "Strabismus and employment: the opinion of headhunters," Acta Ophthalmologica, vol. 87, no. 7, pp. 784–788, 2009.
[7] M. J. Goff, A. W. Suhr, J. A. Ward, J. K. Croley, and M. A. O'Hara, "Effect of adult strabismus on ratings of official U.S. Army photographs," Journal of AAPOS, vol. 10, no. 5, pp. 400–403, 2006.
[8] S. M. Mojon-Azzi, W. Potnik, and D. S. Mojon, "Opinions of dating agents about strabismic subjects' ability to find a partner," British Journal of Ophthalmology, vol. 92, no. 6, pp. 765–769, 2008.
[9] S. M. Mojon-Azzi and D. S. Mojon, "Opinion of headhunters about the ability of strabismic subjects to obtain employment," Ophthalmologica, vol. 221, no. 6, pp. 430–433, 2007.
[10] S. E. Olitsky, S. Sudesh, A. Graziano, J. Hamblen, S. E. Brooks, and S. H. Shaha, "The negative psychosocial impact of strabismus in adults," Journal of AAPOS, vol. 3, no. 4, pp. 209–211, 1999.
[11] E. A. Paysse, E. A. Steele, K. M. McCreery, K. R. Wilhelmus, and D. K. Coats, "Age of the emergence of negative attitudes toward strabismus," Journal of AAPOS, vol. 5, no. 6, pp. 361–366, 2001.
[12] D. K. Coats, E. A. Paysse, A. J. Towler, and R. L. Dipboye, "Impact of large angle horizontal strabismus on ability to obtain employment," Ophthalmology, vol. 107, no. 2, pp. 402–405, 2000.
[13] T. Toyama, T. Kieninger, F. Shafait, and A. Dengel, "Gaze guided object recognition using a head-mounted eye tracker," in ETRA '12 Proceedings of the Symposium on Eye Tracking Research and Applications, pp. 91–98, Santa Barbara, CA, USA, March 2012.
[14] Z. Liang, H. Fu, Y. Zhang, Z. Chi, and D. Feng, "Content-based image retrieval using a combination of visual features and eye tracking data," in ETRA '10 Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, pp. 41–44, Austin, TX, USA, March 2010.
[15] Z. Liang, H. Fu, Z. Chi, and D. Feng, "Refining a region based attention model using eye tracking data," in 2010 IEEE International Conference on Image Processing, pp. 1105–1108, Hong Kong, September 2010.
[16] H. Liu and I. Heynderickx, "Visual attention in objective image quality assessment: based on eye-tracking data," IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 7, pp. 971–982, 2011.
[17] R. A. Pulido, Ophthalmic Diagnostics Using Eye Tracking Technology, [M.S. thesis], KTH Royal Institute of Technology, Sweden, 2012.
[18] D. Model and M. Eizenman, "An automated Hirschberg test for infants," IEEE Transactions on Biomedical Engineering, vol. 58, no. 1, pp. 103–109, 2011.
[19] N. M. Bakker, B. A. J. Lenseigne, S. Schutte et al., "Accurate gaze direction measurements with free head movement for strabismus angle estimation," IEEE Transactions on Biomedical Engineering, vol. 60, no. 11, pp. 3028–3035, 2013.
[20] Z. Chen, H. Fu, W. L. Lo, and Z. Chi, "Eye-tracking aided digital system for strabismus diagnosis," in 2015 IEEE International Conference on Systems, Man, and Cybernetics, pp. 2305–2309, Kowloon, October 2015.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in NIPS'12 Proceedings of the 25th International Conference on Neural Information Processing Systems, vol. 1, pp. 1097–1105, Lake Tahoe, NV, USA, December 2012.
[22] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, "Learning hierarchical features for scene labeling," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1915–1929, 2013.
[23] S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221–231, 2013.
[24] O. Abdel-Hamid, A. R. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, "Convolutional neural networks for speech recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, pp. 1533–1545, 2014.
[25] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, "ImageNet: a large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, Miami, FL, USA, June 2009.
[26] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, "Return of the devil in the details: delving deep into convolutional nets," in Proceedings British Machine Vision Conference 2014, UK, September 2014.
[27] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015.
[28] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning, pp. 807–814, Haifa, Israel, June 2010.
[29] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 1–27, 2011.