Research Article
Strabismus Recognition Using Eye-Tracking Data and Convolutional Neural Networks

Zenghai Chen,¹ Hong Fu,¹ Wai-Lun Lo,¹ and Zheru Chi²

¹Department of Computer Science, Chu Hai College of Higher Education, 80 Castle Peak Road, Castle Peak Bay, Tuen Mun, NT, Hong Kong
²Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Correspondence should be addressed to Hong Fu; [email protected]

Received 25 October 2017; Revised 7 January 2018; Accepted 5 February 2018; Published 26 April 2018

Academic Editor: Weide Chang

Journal of Healthcare Engineering, vol. 2018, Article ID 7692198, https://doi.org/10.1155/2018/7692198

Copyright © 2018 Zenghai Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Strabismus is one of the most common vision diseases; it can cause amblyopia and even permanent vision loss. Timely diagnosis is crucial for treating strabismus well. In contrast to manual diagnosis, automatic recognition can significantly reduce labor costs and increase diagnostic efficiency. In this paper, we propose to recognize strabismus using eye-tracking data and convolutional neural networks. In particular, an eye tracker is first used to record a subject's eye movements. A gaze deviation (GaDe) image is then proposed to characterize the subject's eye-tracking data according to the accuracies of gaze points. The GaDe image is fed to a convolutional neural network (CNN) that has been trained on a large image database called ImageNet. The outputs of the fully connected layers of the CNN are used as the GaDe image's features for strabismus recognition. A dataset containing eye-tracking data of both strabismic and normal subjects is established for experiments. Experimental results demonstrate that natural image features can be well transferred to represent eye-tracking data, and that strabismus can be effectively recognized by our proposed method.
1. Introduction
Strabismus is a common ophthalmic disease that can lead to weak 3D perception, amblyopia (also termed lazy eye), or even blindness if it is not diagnosed in time and treated well [1, 2]. More importantly, it has been shown that strabismus can cause serious psychosocial consequences in both children and adults [3–12]; these adverse consequences affect education [5], employment [6], and dating [8]. Many young strabismic patients could be treated well if diagnosis and treatment took place at an early age: a preschool child's strabismus has a much better chance of being cured than an adult's. Timely diagnosis is thus essential. Traditional strabismus diagnosis methods, for example, the cover test, the Hirschberg test, and the Maddox rod, are manually conducted by professional ophthalmologists. This makes the diagnosis expensive and consequently drives people away from professional examinations. Furthermore, ophthalmologists make decisions according to their experience, so the diagnosis results are subjective. In view of that, we propose automatic recognition of strabismus in this paper. Automatic recognition of strabismus, which can also be termed strabismus recognition, performs strabismus diagnosis without ophthalmologists. As a result, the diagnosis results are objective, and the diagnosis cost can be significantly reduced. We realize strabismus recognition by exploiting eye-tracking data, which are acquired using an eye tracker. The proposed eye-tracking-based strabismus recognition method allows us to build an objective, noninvasive, and automatic diagnosis system that could be used to carry out strabismus examination in large communities. For instance, the system could be placed in a primary school so that students can take an examination at any time.
Eye-tracking techniques have been successfully applied to solve various problems, for example, object recognition [13], content-based image retrieval [14], attention modeling [15], and image quality assessment [16], but very little research on eye tracking for strabismus diagnosis has been reported. Several works have proposed to leverage eye-tracking methodology for strabismus examination
[17–20]. Pulido [17] employed a Tobii eye tracker to acquire gaze data and conduct ophthalmic examinations, including strabismus, by calculating the deviation of the gaze data. However, Pulido proposed only a method prototype in [17]; the author had no real strabismic gaze data with which to demonstrate the prototype's performance. Model and Eizenman [18] proposed an eye-tracking-based approach for performing the Hirschberg test, a classical method to measure binocular ocular misalignment. But the performance of their approach was studied with five healthy infants only; the method's effectiveness for strabismus examination was not tested. Bakker et al. [19] developed a gaze direction measurement instrument to estimate the strabismus angle. The instrument allows for unrestrained head movements, but only three subjects participated in the experiment, which is too small a number, and there is no ground truth available for the strabismic subjects; it is hence impossible to comprehensively evaluate the instrument's performance. In our previous work [20], we developed a system based on eye tracking to acquire gaze data for strabismus diagnosis, where the diagnosis is performed by intuitively analyzing gaze deviations. But that system's effectiveness was verified with one strabismic subject and one normal subject only.
In this paper, we develop a more effective eye-tracking system than that of [20] to acquire gaze data for strabismus classification. Instead of examining strabismus by directly analyzing gaze deviations, as previous methods do, we explore a machine learning method to realize strabismus classification. One big disadvantage of the previous methods is that their accuracy is dramatically affected by every single gaze point: a noisy gaze point can cause an inaccurate examination result. By contrast, a learning method can eliminate the effect of a small number of noisy gaze points by using a large amount of data, so as to generate a more accurate result. In particular, we leverage convolutional neural networks (CNNs), a powerful deep learning algorithm, to extract features from gaze data for strabismus recognition.
With the rapid development of deep learning in recent years, CNNs have achieved numerous successes in computer vision and pattern recognition, for example, image classification [21], scene labeling [22], action recognition [23], and speech recognition [24]. With a hierarchical structure of multiple convolution-pooling layers, CNNs can encode abstract features from raw multimedia data, and they have shown especially impressive performance in learning image features. In our work, CNNs are exploited to generate useful features that characterize eye-tracking data for strabismus recognition. Concretely, a subject is asked to fixate successively on nine points while the subject's eye movements are captured by an eye tracker. The eye-tracking data are then represented by a gaze deviation (GaDe) image, which is produced according to the fixation accuracies of the subject's gaze points. After that, the GaDe image is fed to a CNN that has been trained on a large image database called ImageNet [25]. The output vectors of the fully connected (FC) layers of the CNN are used as features representing the GaDe image. Finally, the features are input to a support vector machine (SVM) for strabismus classification. It is expected that the image features of ImageNet learnt by CNNs can be well transferred to represent eye-tracking data for strabismus recognition. We build a gaze dataset using our eye-tracking system to demonstrate the proposed method's performance. The dataset is much larger than previously published strabismus datasets.
The rest of this paper is organized as follows. Section 2 describes the methods exploited for strabismus recognition. Section 3 introduces the dataset for the experiments and reports the experimental results. Section 4 concludes this paper with final remarks. Before ending this introductory section, it is worth summarizing the contributions of this paper as follows:
(i) We develop an effective eye-tracking system to acquire gaze data for strabismus recognition.
(ii) We propose a gaze deviation image to characterize eye-tracking data.
(iii) We exploit convolutional neural networks to generate features for gaze deviation image representation.
(iv) We demonstrate that natural image features learnt by convolutional neural networks can be well transferred to represent eye-tracking data, and that strabismus can be effectively recognized by our method.
2. Methodology
2.1. The Proposed Strabismus Recognition Framework. Figure 1 shows our proposed framework for strabismus recognition. The recognition procedure is conducted as follows. First, the subject is asked to look at nine points shown one after another at different positions on a screen. Meanwhile, an eye tracker mounted below the screen detects the subject's eye movements and records his or her gaze points. The recorded gaze data are then used to generate three gaze deviation (GaDe) maps, based on the fixation accuracies of the left-eye gaze points, the right-eye gaze points, and the centers of the two eyes' gaze points, respectively. The three maps are combined to form a GaDe image, with the maps serving as the R, G, and B channels of the image. After that, the GaDe image is fed to a CNN which has been trained on ImageNet, so as to produce a feature vector representing the GaDe image. Finally, the feature vector is fed to an SVM for classification, and the subject is classified as strabismic or normal.
It is worth presenting the motivations for using a CNN and the GaDe image in our method before digging into the implementation details. We use the CNN to tackle our problem for two reasons. Firstly, eye-tracking gaze data are difficult to characterize: up to now, there is still no standard feature for eye-tracking data representation. People have proposed features such as fixation time and saccade path, but these features are designed for specific tasks and are not suited to our strabismus recognition problem. Secondly, the CNN is powerful at learning discriminative features from raw images; it has shown state-of-the-art performance on various pattern recognition and image classification problems. We thus expect that the CNN can extract effective features for eye-tracking data representation. Since the CNN is good at extracting image features, we need to convert the raw gaze data to images before feature extraction by the CNN. That is why we propose GaDe images to represent the gaze data. The principle guiding the design of GaDe images is that the images should describe well the difference between normal data and strabismic data. The details of eye-tracking data acquisition, GaDe image generation, and the CNN models used are presented in the following subsections.
2.2. Eye-Tracking Data Acquisition. We use the Tobii X2-60 eye tracker (shown in Figure 1) to acquire eye-tracking gaze data. The Tobii X2-60 has a sampling rate of 60 Hz and a tracking accuracy of 0.4 degrees; both are high enough to precisely capture strabismic gaze data in our experiments. The eye tracker is attached below the monitor of a laptop to build our eye-tracking system. The laptop is a Lenovo ThinkPad T540p with a 1920 × 1080 screen resolution. The main reason we chose a laptop rather than a desktop is that it is convenient to carry the system to different environments for data acquisition. In order to position gaze points on the screen digitally, we define a coordinate system for the screen. The upper-left corner of the screen is set as the origin (0,0), with the horizontal axis denoting the x-coordinate and the vertical axis denoting the y-coordinate. The values of the lower-right, upper-right, and lower-left corners are (1,1), (1,0), and (0,1), respectively. In other words, both x and y lie in the interval [0, 1] on the screen; for example, the screen center (0.5, 0.5) corresponds to pixel (960, 540) at the 1920 × 1080 resolution. We exploit the Tobii Matlab SDK to develop our data acquisition interface.
Calibration needs to be performed before using the eye tracker to acquire gaze data. The purpose of calibration is to teach the eye-tracking system the characteristics of the subject, so that the eye tracker can precisely detect the subject's eye movements. During calibration, the subject is asked to fixate on a number of points displayed on the screen. Depending on the number of points used, there are different calibration schemes, for example, one-point, three-point, or nine-point. We adopt a nine-point calibration scheme, as it provides high tracking accuracy. The positions of the nine points on the screen are (0.1,0.1), (0.5,0.1), (0.9,0.1), (0.1,0.5), (0.5,0.5), (0.9,0.5), (0.1,0.9), (0.5,0.9), and (0.9,0.9). The result is shown after each calibration; we start real tracking tests if the calibration accuracy is acceptable, and recalibrate otherwise.
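To make the geometry concrete, the following snippet enumerates the nine calibration positions in the normalized coordinate system and converts them to pixels; the linear normalized-to-pixel mapping is our assumption based on the screen resolution given above.

```python
# Nine calibration/target positions in normalized screen coordinates,
# enumerated row by row: (0.1,0.1), (0.5,0.1), ..., (0.9,0.9).
points = [(x, y) for y in (0.1, 0.5, 0.9) for x in (0.1, 0.5, 0.9)]

# Assumed linear mapping to pixels on the 1920 x 1080 screen.
pixels = [(round(x * 1920), round(y * 1080)) for x, y in points]
print(pixels[4])  # the screen center (0.5, 0.5) -> (960, 540)
```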
A traditional method for ophthalmologists to examine strabismus is the nine-point method: the patient is asked to fixate sequentially on nine target points at a certain distance in front, while the ophthalmologist observes the patient's eye movements. This method comprehensively examines the patient's eye movements with rotations at different angles. We therefore adopt the same scheme for our gaze data acquisition interface. The nine points' positions are the same as those of the nine calibration points. Figure 2 shows the nine-point interface. We use a black background, as it helps the subject concentrate on the target points. Each point comprises a red inner circle and a white outer circle, with radii of 15 and 30 pixels, respectively. The points are displayed one by one in order; the white arrows indicate the display order. In a real test, the subject's position is adjusted so that the subject is at a fixed distance (50 cm in our test) from the screen and the subject's eye level is in the same horizontal line as the screen center. A distance of 50 cm is optimal for the Tobii X2-60 to track the subject's eye movements.
Figure 3 shows the procedure of gaze data acquisition. Each time a target point is displayed, the eye tracker records the subject's gaze points for both eyes simultaneously. The next target point is displayed once the number of effective gaze pairs acquired exceeds 100, where a gaze pair is defined as the two gaze points of the two eyes captured by the eye tracker at one sampling moment, and "effective" indicates that at least one gaze point of the pair is located close enough to the target point. That is, the distance between the target point and at least one gaze point of the pair must be smaller than an empirically predefined threshold (0.05 in this paper); the distance is the Euclidean distance defined in Section 2.3. It is worth mentioning that for some severe strabismic subjects, it is sometimes difficult to capture effective gaze points at some target points, in particular the points located at the four corners of the screen, because these subjects need to rotate their eyeballs to the extremes. In view of that, we let each target point be displayed for at most 10 seconds; the next target point is displayed after 10 seconds regardless of whether the system has collected 100 effective gaze pairs. Since the sampling rate of our eye tracker is 60 Hz, collecting 100 gaze pairs from a normal subject takes only about two seconds; hence, 10 seconds are long enough to capture gaze data for each point.

Figure 1: The proposed strabismus recognition framework.
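The acquisition logic for one target point can be sketched as follows. The paper's interface was built with the Tobii Matlab SDK; this Python version is only illustrative, and `tracker.sample()` is a hypothetical call standing in for the SDK's gaze-sampling API.

```python
import math
import time

THRESHOLD = 0.05   # normalized-distance threshold for an "effective" gaze point
MAX_PAIRS = 100    # effective gaze pairs to collect per target point
TIME_LIMIT = 10.0  # seconds allowed per target point

def distance(p, q):
    # Euclidean distance in normalized screen coordinates.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def collect_for_target(tracker, target):
    # Collect effective gaze pairs for one target, moving on after either
    # 100 effective pairs or 10 seconds, whichever comes first.
    pairs, start = [], time.time()
    while len(pairs) < MAX_PAIRS and time.time() - start < TIME_LIMIT:
        left, right = tracker.sample()  # hypothetical: current (left, right) gaze points
        # Effective: at least one gaze point of the pair lies within the threshold.
        if distance(left, target) < THRESHOLD or distance(right, target) < THRESHOLD:
            pairs.append((left, right))
    return pairs
```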
2.3. Gaze Deviation Image. The next step after gaze data acquisition is to generate a GaDe image that characterizes the gaze data. To realize that, we first calculate three GaDe maps, which will serve as the R, G, and B channels of the GaDe image, based on the fixation accuracies of the two eyes' gaze points. Let $g_{ij}$ denote the $i$th gaze pair for the $j$th target point, and let $p_{ij}^l = (x_{ij}^l, y_{ij}^l)$ and $p_{ij}^r = (x_{ij}^r, y_{ij}^r)$ denote the left-eye and right-eye gaze points of gaze pair $g_{ij}$, where $1 \le j \le 9$ and $1 \le i \le 100$, and the superscripts $l$ and $r$ indicate left and right. Let $p_j^t = (x_j^t, y_j^t)$ denote the $j$th target point. Then the fixation accuracy, in terms of Euclidean distance, for the left-eye gaze point $p_{ij}^l$ is

$$d_{ij}^l = \sqrt{(x_j^t - x_{ij}^l)^2 + (y_j^t - y_{ij}^l)^2}, \tag{1}$$

and for the right-eye gaze point $p_{ij}^r$ is

$$d_{ij}^r = \sqrt{(x_j^t - x_{ij}^r)^2 + (y_j^t - y_{ij}^r)^2}. \tag{2}$$

We define the center of the gaze pair $p_{ij}^l$ and $p_{ij}^r$ as $p_{ij}^c = (x_{ij}^c, y_{ij}^c)$, formulated simply as

$$p_{ij}^c = \left( \frac{x_{ij}^l + x_{ij}^r}{2}, \frac{y_{ij}^l + y_{ij}^r}{2} \right). \tag{3}$$
Figure 2: The nine-point gaze data acquisition interface.
Figure 3: The gaze data acquisition procedure.
Then, similarly to (1) and (2), the fixation accuracy of the center $p_{ij}^c$ is calculated as

$$d_{ij}^c = \sqrt{(x_j^t - x_{ij}^c)^2 + (y_j^t - y_{ij}^c)^2}. \tag{4}$$
For one subject, we calculate the fixation accuracies $d_{ij}^l$, $d_{ij}^r$, and $d_{ij}^c$ for all of his or her gaze pairs. Based on these three types of fixation accuracies, three GaDe maps $M^l$, $M^r$, and $M^c$ are computed, respectively. The map size equals the input size of the CNN; in this paper, two map sizes (224 × 224 and 227 × 227) are adopted. The element values of the three maps are derived from the fixation accuracies $d_{ij}^l$, $d_{ij}^r$, and $d_{ij}^c$ of all the gaze pairs, with one gaze pair contributing one element to each of the three maps. In particular, the values of gaze pair $g_{ij}$ in the three GaDe maps $M^l$, $M^r$, and $M^c$ are respectively calculated as

$$v_{ij}^l = \mathrm{round}\!\left( \frac{d_{ij}^l}{\max_{i,j} d_{ij}^l} \cdot 255 \right), \quad
v_{ij}^r = \mathrm{round}\!\left( \frac{d_{ij}^r}{\max_{i,j} d_{ij}^r} \cdot 255 \right), \quad
v_{ij}^c = \mathrm{round}\!\left( \frac{d_{ij}^c}{\max_{i,j} d_{ij}^c} \cdot 255 \right), \tag{5}$$
where $\max(\cdot)$ finds the maximum value over all gaze pairs and $\mathrm{round}(\cdot)$ rounds a value to its nearest integer. Equation (5) guarantees that the element values of the three GaDe maps are integers lying in the interval [0, 255], which is also the value range of a real digital image. The positions of $v_{ij}^l$, $v_{ij}^r$, and $v_{ij}^c$ in maps $M^l$, $M^r$, and $M^c$ are specified by $(x_{ij}^l, y_{ij}^l)$, $(x_{ij}^r, y_{ij}^r)$, and $(x_{ij}^c, y_{ij}^c)$, respectively. Elements not associated with any gaze point are assigned a value of 0. In the GaDe maps, large fixation deviations (far from the target point) receive large values; in other words, inaccurate gaze points play more important roles in a GaDe map. This makes sense, as the prominent difference between strabismic and normal people is that one or even both of a strabismic person's eyes cannot fixate well on target objects. The three GaDe maps can thus effectively characterize the properties of strabismic gaze data. Generally, a normal person's GaDe maps have only a few bright (high-intensity) points far from the nine target points, with most of the relatively dark points clustered around the targets, while a strabismic person's GaDe maps usually have a large number of bright points located far from the targets. We combine the three GaDe maps to form a GaDe image, with each map representing one color channel of the image. The GaDe image is then fed to a CNN for feature extraction.
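A compact NumPy sketch of the GaDe construction, following (1)-(5): the mapping of normalized gaze coordinates to pixel indices (multiplying by the map size) and the use of the maximum deviation when two gaze points fall on the same pixel are our assumptions, as the paper does not spell out these details.

```python
import numpy as np

def gade_image(pairs, targets, size=224):
    # maps[..., 0/1/2] hold left, right, and center deviations (the R, G, B channels).
    maps = np.zeros((size, size, 3))
    for j, t in enumerate(targets):
        for left, right in pairs[j]:
            center = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)  # (3)
            for ch, p in enumerate((left, right, center)):
                d = np.hypot(t[0] - p[0], t[1] - p[1])                     # (1), (2), (4)
                col = min(int(p[0] * size), size - 1)  # assumed coordinate-to-pixel mapping
                row = min(int(p[1] * size), size - 1)
                maps[row, col, ch] = max(maps[row, col, ch], d)
    for ch in range(3):  # normalize each map to integers in [0, 255], as in (5)
        m = maps[:, :, ch].max()
        if m > 0:
            maps[:, :, ch] = np.round(maps[:, :, ch] / m * 255)
    return maps.astype(np.uint8)
```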
2.4. Convolutional Neural Networks. A CNN is a hierarchical architecture consisting of a number of convolution layers and pooling layers. CNNs usually receive raw data, for example, image pixels, as input and extract increasingly abstract features through the hierarchical convolution-pooling layers. Take color image feature extraction as an example: an image's three color channels are fed to the first convolution layer of the CNN. The convolution results, also called convolution feature maps, are then downsampled in the pooling layer (e.g., max-pooling) that follows the first convolution layer, generating pooling feature maps. The pooling feature maps are passed to the next convolution layer and then to its pooling layer for further processing. After a number of convolution and pooling operations, the feature maps are connected to an output layer through one or more FC layers. The FC layers can be used for classification like a multilayer perceptron, with the output vector representing the different classes; alternatively, we can employ the outputs of the FC layers as a feature vector of the input image and then use a classifier, for example, an SVM, to perform classification on that feature vector. The hierarchical convolution and pooling operations make the features extracted by a CNN insensitive to translation, rotation, and scaling.
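The following minimal PyTorch snippet (our choice of framework, not the paper's) illustrates the convolution-pooling hierarchy: spatial resolution shrinks at each pooling step while the channel depth, and hence the abstraction of the features, grows.

```python
import torch
import torch.nn as nn

stack = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3 input channels -> 16 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 224 x 224 -> 112 x 112
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 112 x 112 -> 56 x 56
)
x = torch.rand(1, 3, 224, 224)  # e.g., a GaDe image
print(stack(x).shape)           # torch.Size([1, 32, 56, 56])
```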
We adopt six different CNN models that have been trained on ImageNet to generate features for representing eye-tracking gaze data. We use the pretrained CNN models as feature extractors and do not train them on eye-tracking data in this work, for two main reasons. Firstly, we do not have enough eye-tracking data to train a complicated CNN model well. A CNN model may have millions of weights to be trained, so a large dataset is necessary to tune them effectively; for instance, the CNN models we adopt have been trained on ImageNet, an image database containing more than one million training images in 1000 classes. For the strabismus classification problem, it is difficult to build a large dataset, since not many strabismic people can be found to participate in experiments; in fact, only 17 strabismic people participated in ours. It is, therefore, impractical to train a CNN model on such limited strabismic gaze data, but it is feasible to employ a pretrained CNN as a feature extractor that generates features for gaze data representation, as will be demonstrated in Section 3. Secondly, the weights of the CNN models are tuned using natural images rather than eye-tracking gaze data, and we would like to investigate whether the information extracted from the natural image domain is applicable to the eye-tracking data domain. It would be significant if natural image features could be well transferred to represent eye-tracking data, since we could then exploit the large quantities of natural images on the internet to help generate features, rather than manually designing complicated feature extraction algorithms for eye-tracking data representation. The six CNN models we adopt are AlexNet [21], VGG-F, VGG-M, VGG-S [26], VGG-16, and VGG-19 [27]. All of them have three FC layers but different numbers of convolution layers; they also differ in input size, the number of convolution filters in each layer, max-pooling size, and so on. Readers can refer to [21, 26, 27] for the architectural details of the six models. The six models have the same three FC layers: the first two FC layers use the ReLU [28] transfer function, and the final FC layer adopts the softmax transfer function. For each FC layer, we employ the input vector and the output vector of the transfer function as feature vectors of the GaDe image, which gives in total six feature vectors from the three FC layers. The six feature vectors are denoted by $l_1$, $l_2$, $l_3$, $l_4$, $l_5$, and $l_6$, with dimensions 4096, 4096, 4096, 4096, 1000, and 1000, respectively. We compare the performance of the six feature vectors for the six CNN models in Section 3. Note that the input size of AlexNet is 227 × 227, while the input sizes of the other five models are all 224 × 224; the GaDe images therefore need to be resized to 227 × 227 for AlexNet and 224 × 224 for the other five models.
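As a sketch of the feature extraction step, the snippet below loads an ImageNet-pretrained VGG-16 from torchvision (an assumption; the paper does not name its software stack) and captures six vectors analogous to $l_1$-$l_6$: the outputs of the three FC layers, the outputs of the two ReLUs, and a softmax over the final FC output.

```python
import torch
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

# In torchvision's VGG, model.classifier holds the three FC layers at indices
# 0, 3, and 6, with the ReLUs that follow the first two at indices 1 and 4.
captured = {}
def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output.detach().clone().squeeze(0)  # clone: ReLUs run in place
    return hook

for idx in (0, 1, 3, 4, 6):
    model.classifier[idx].register_forward_hook(make_hook(f"fc{idx}"))

gade = torch.rand(1, 3, 224, 224)  # stand-in for a preprocessed GaDe image
with torch.no_grad():
    model(gade)
captured["softmax"] = torch.softmax(captured["fc6"], dim=0)

# Six vectors of sizes 4096, 4096, 4096, 4096, 1000, 1000; concatenating them
# gives the "All" feature used in the experiments.
all_features = torch.cat(list(captured.values()))
```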
2.5. Baseline Method. In order to demonstrate the effectiveness of CNNs in extracting features from gaze data for strabismus recognition, we propose a baseline method for comparison. The baseline method models the normal gaze points of each target point with a multivariate Gaussian distribution, whose parameters (mean vector and covariance matrix) are calculated using the normal training data. To construct the Gaussian distribution, we represent a gaze pair $g_{ij}$ by the x-coordinate and y-coordinate differences between the target point $p_j^t$ and the pair's two gaze points $p_{ij}^l$ and $p_{ij}^r$ as follows:

$$u_{ij} = \left[\, x_j^t - x_{ij}^l,\; y_j^t - y_{ij}^l,\; x_j^t - x_{ij}^r,\; y_j^t - y_{ij}^r \,\right]^T. \tag{6}$$
Then the Gaussian probability density for gaze pair $g_{ij}$ is

$$p(u_{ij}; \mu_j, \Sigma_j) = \frac{1}{(2\pi)^2 \sqrt{|\Sigma_j|}} \exp\!\left( -\frac{1}{2} (u_{ij} - \mu_j)^T \Sigma_j^{-1} (u_{ij} - \mu_j) \right), \tag{7}$$

where $\mu_j$ and $\Sigma_j$ are the mean vector and covariance matrix of the Gaussian distribution for the $j$th target point, respectively; they are calculated using the normal training gaze pairs that belong to the $j$th target point, and $|\Sigma_j|$ denotes the determinant of $\Sigma_j$.
The baseline method performs classification as follows. Given the gaze pair $g_{ij}$, if its density value in (7) is larger than a threshold $\alpha_j$, the gaze pair is classified as normal. If the proportion of normal gaze pairs for a target point is larger than a threshold $\beta_j$, the target point is classified as normal for the subject; otherwise, it is classified as strabismic. If any of the nine target points is classified as strabismic, the subject is finally classified as strabismic. In other words, a normal subject should possess normal fixations in all nine different directions; once the fixation in one direction is abnormal, the subject is diagnosed as strabismic. This is reasonable, since some types of strabismus, such as incomitant strabismus, may cause poor fixation in a specific direction only. In a medical examination, a subject may likewise be diagnosed with strabismus once the ophthalmologist observes that the subject's two eyes do not align in a specific direction. The thresholds $\alpha_j$ and $\beta_j$ are learnt using grid search, such that the classification accuracy on the training data is maximized.
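A NumPy sketch of this baseline, under the assumption that the difference vectors (6) have already been computed per target point:

```python
import numpy as np

def fit_gaussians(normal_u):
    # normal_u[j]: (n, 4) array of difference vectors u_ij for target j,
    # taken from normal training subjects.
    return [(u.mean(axis=0), np.cov(u, rowvar=False)) for u in normal_u]

def density(u, mean, cov):
    # Multivariate Gaussian density (7) for a 4-D difference vector u.
    diff = u - mean
    norm = 1.0 / ((2 * np.pi) ** 2 * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def classify_subject(subject_u, gaussians, alphas, betas):
    # A subject is strabismic as soon as any of the nine target points
    # fails the per-point normality test.
    for j, (mean, cov) in enumerate(gaussians):
        normal = sum(density(u, mean, cov) > alphas[j] for u in subject_u[j])
        if normal / len(subject_u[j]) <= betas[j]:
            return "strabismic"
    return "normal"
```

The grid search for $\alpha_j$ and $\beta_j$ simply evaluates candidate threshold pairs over the training subjects and keeps the pair with the highest training accuracy.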
3. Experiments
3.1. Eye-Tracking Gaze Data. We cooperated with the Hong Kong Association of Squint and Double Vision Sufferers to collect strabismic data. In total, 17 members of the association suffering from strabismus consented to participate in our experiments. In addition to the 17 strabismic subjects, we invited 25 normal subjects to join the study. All subjects are adults, with ages ranging from 25 to 63, and include both males and females. They have been diagnosed by a professional ophthalmologist, and the diagnosis results are used as the ground truth in this paper. After ethics approval and informed consent, the 42 subjects followed the data acquisition procedure introduced in Section 2.2, and we finally collected 42 eye-tracking samples. The 17 strabismic subjects suffer from different types of strabismus (e.g., recessive, intermittent, and manifest) at various severities (e.g., mild, moderate, and severe). Recessive strabismus is present only when binocular vision has been interrupted, such as by covering one eye; patients of this type can still maintain fusion and usually become aware of having recessive strabismus only after being examined by an ophthalmologist. Manifest strabismus can be observed while a patient looks at an object binocularly. Intermittent strabismus is a combination of recessive and manifest strabismus. People suffering from recessive strabismus, intermittent strabismus, or mild manifest strabismus are sometimes difficult to distinguish from normal people by appearance, as their fixation deviations are small; this holds especially for recessive and intermittent strabismus.
Figure 4 shows some examples of gaze data and GaDe images. The first row displays all gaze points for the nine target points in one map, with red denoting left gaze points and blue "×" denoting right gaze points. The second row shows the corresponding GaDe images for the gaze data of the first row; for better visualization, the GaDe images have been brightened by adding a constant (50) to the gaze points' values. The first two columns are two normal samples, one with good fixation (small deviation) and one with relatively poor fixation (large deviation). The other three columns, from left to right, represent strabismic samples of recessive, intermittent, and manifest strabismus, respectively. Note that the colors in the first row distinguish the left and right gaze points, while the colors in the second row represent the R, G, and B channels of the GaDe images. Two main observations can be drawn from Figure 4. Firstly, gaze points with large deviations in the first row are highlighted in the corresponding GaDe images, and those with small deviations are suppressed; inaccurate gaze points therefore contribute more when recognizing strabismus using GaDe images. Secondly, the data distributions of the normal sample with small deviation and the manifest sample are distinctive and easy to distinguish. By contrast, the distributions of the normal sample with large deviation and the recessive or intermittent samples look similar and are difficult to distinguish intuitively. That is why we exploit CNNs to solve the problem: we expect that CNNs, as powerful abstract feature extractors, can extract distinctive features from different samples, so as to effectively classify normal data and strabismic data. It is worth mentioning that we focus on binary strabismus classification rather than recognizing different types of strabismus in this paper, the major reason being that we do not yet have enough data for each strabismus type. However, we believe that CNNs could extract useful features from GaDe images of different strabismus types if sufficient data were provided, and the proposed method could then be applied to recognizing strabismus types. We leave this task for future work, when we have acquired sufficient data for the different strabismus types.
3.2. Experimental Results. We have in total 42 samples: 25 normal and 17 strabismic. A leave-one-out evaluation scheme is adopted; that is, each time one sample is used for testing and the remaining 41 samples are used for training, yielding 42 different results, which are averaged to obtain the final performance. LIBSVM [29] is employed to implement the SVM classification. A linear SVM kernel is used for both the CNN method and the baseline method, and the SVM classifiers are trained using a two-fold cross-validation scheme. Table 1 tabulates the classification accuracies of the six CNN models (rows) using the different feature vectors (columns) extracted from the three FC layers; the final column represents the concatenation of all six feature vectors into one feature vector. The accuracy of the baseline method is 69.1%. As can be seen from Table 1, the features extracted from VGG-S perform the best overall; the highest accuracy (95.2%) is achieved with feature vector $l_4$ of VGG-S. Feature vectors $l_1$, $l_2$, $l_3$, and $l_4$ outperform $l_5$ and $l_6$ in most cases; one possible reason is that $l_1$-$l_4$, extracted from the first two FC layers, contain richer features than $l_5$ and $l_6$. Note that the concatenation of all feature vectors sometimes yields lower accuracy than some individual feature vectors, as shown in the final column. The most important finding from Table 1 is that the features extracted from the six CNN models perform much better than the baseline method, except in a few cases where feature vector $l_6$ is used. This indicates that CNN features can effectively characterize the GaDe images derived from eye-tracking data, and that CNN features can be a promising representation of eye-tracking data.
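The evaluation protocol can be sketched as follows. The paper uses LIBSVM; scikit-learn's SVC wraps the same underlying library, so this is a close, though not identical, re-creation.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def leave_one_out_accuracy(X, y):
    # X: (42, d) array of CNN feature vectors; y: (42,) labels
    # (1 = strabismic, 0 = normal).
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel="linear")  # linear kernel, as in the paper
        clf.fit(X[train_idx], y[train_idx])
        correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
    return correct / len(y)
```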
Specificity and sensitivity are two important metrics for measuring the performance of a medical classification method; a good method should score high on both. For our strabismus recognition problem, specificity is defined as the percentage of normal subjects who are correctly classified as normal, and sensitivity as the percentage of strabismic subjects who are correctly classified as strabismic. In order to study the specificity and sensitivity of our method, for each CNN model we select the result with the highest classification accuracy. According to Table 1, the highest accuracies for AlexNet, VGG-F, VGG-M, VGG-S, VGG-16, and VGG-19 are 78.6% ($l_1$), 81.0% (column All), 88.1% ($l_1$), 95.2% ($l_4$), 83.3% ($l_1$), and 83.3% (column All), respectively. We show the specificity and sensitivity of the six CNN results, as well as the baseline method, in Figure 5. Evidently, VGG-S possesses the best specificity and sensitivity: only one normal subject and one strabismic subject are misclassified by VGG-S.
Figure 4: Examples of gaze data and corresponding GaDe images. The first two columns represent normal data with small deviation and large deviation, and the third, fourth, and fifth columns represent data of recessive strabismus, intermittent strabismus, and manifest strabismus, respectively.
Table 1: Accuracies (%) of different CNN models. The accuracy of the baseline method is 69.1.

Feature    l1     l2     l3     l4     l5     l6     All
AlexNet    78.6   78.6   76.2   76.2   73.8   76.7   76.2
VGG-F      76.2   76.2   76.2   76.2   78.6   65.1   81.0
VGG-M      88.1   85.7   85.7   85.7   78.6   57.1   78.6
VGG-S      85.7   81.0   78.6   95.2   76.2   79.1   83.3
VGG-16     83.3   81.0   76.2   81.0   76.2   67.4   83.3
VGG-19     81.0   78.6   81.0   81.0   71.4   62.8   83.3
The baseline method has a high specificity (84%) but a very low sensitivity (47.1%). This means that the baseline method is insensitive to strabismic data; it tends to classify the data as normal. By contrast, the difference between the specificity and sensitivity of the CNN features is relatively small, especially for VGG-S. This substantiates two things. Firstly, the proposed GaDe images are able to effectively characterize both normal gaze data and strabismic gaze data, so the two types of eye-tracking data can be well separated. Secondly, the natural image features learnt by CNNs can be well transferred to represent GaDe images.

Figure 5: Specificity and sensitivity of different methods.
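For reference, the two metrics reduce to a small computation over the leave-one-out predictions; with 24 of 25 normal and 16 of 17 strabismic subjects correctly classified, VGG-S obtains the 96% specificity and 94.1% sensitivity shown in Figure 5.

```python
def specificity_sensitivity(y_true, y_pred):
    # Labels: 0 = normal, 1 = strabismic.
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    specificity = tn / sum(t == 0 for t in y_true)  # normal classified as normal
    sensitivity = tp / sum(t == 1 for t in y_true)  # strabismic classified as strabismic
    return specificity, sensitivity
```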
Overall, the experimental results have demonstrated that the proposed method is a promising alternative for strabismus recognition. In the future, the accuracy can be improved in two major ways. One way is to employ more advanced pretrained CNN models for better feature extraction. The other is to collect more gaze data, especially data of different strabismus types. With sufficient data, we would be able to fine-tune the CNN models, so that they could learn more discriminative features and boost the classification accuracy.
4. Conclusion
In this paper, we first design an eye-tracking system to acquire gaze data from both normal and strabismic people, and we then propose a GaDe image based on gaze points' fixation deviations to characterize eye-tracking data. Finally, we exploit CNNs that have been trained on a large real-image database to extract features from GaDe images for strabismus recognition. Experimental results show that GaDe images are effective for characterizing strabismic gaze data and that CNNs can be a powerful alternative for feature extraction from eye-tracking data. The effectiveness of our proposed method for strabismus recognition has been demonstrated.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by a grant from the Hong Kong
RGC (Project reference no. UGC/FDS13/E04/14).
References
[1] M. S. Castanes, "Major review: the underutilization of vision screening (for amblyopia, optical anomalies and strabismus) among preschool age children," Binocular Vision & Strabismus Quarterly, vol. 18, no. 4, pp. 217–232, 2003.
[2] S. M. Mojon-Azzi, A. Kunz, and D. S. Mojon, "The perception of strabismus by children and adults," Graefe's Archive for Clinical and Experimental Ophthalmology, vol. 249, no. 5, pp. 753–757, 2011.
[3] J. M. Durnian, C. P. Noonan, and I. B. Marsh, "The psychosocial effects of adult strabismus: a review," The British Journal of Ophthalmology, vol. 95, no. 4, pp. 450–453, 2011.
[4] B. G. Mohney, J. A. McKenzie, J. A. Capo, K. J. Nusz, D. Mrazek, and N. N. Diehl, "Mental illness in young adults who had strabismus as children," Pediatrics, vol. 122, no. 5, pp. 1033–1038, 2008.
[5] O. Uretmen, S. Egrilmez, S. Kose, K. Pamukcu, C. Akkin, and M. Palamar, "Negative social bias against children with strabismus," Acta Ophthalmologica Scandinavica, vol. 81, no. 2, pp. 138–142, 2003.
[6] S. M. Mojon-Azzi and D. S. Mojon, "Strabismus and employment: the opinion of headhunters," Acta Ophthalmologica, vol. 87, no. 7, pp. 784–788, 2009.
[7] M. J. Goff, A. W. Suhr, J. A. Ward, J. K. Croley, and M. A. O'Hara, "Effect of adult strabismus on ratings of official U.S. Army photographs," Journal of AAPOS, vol. 10, no. 5, pp. 400–403, 2006.
[8] S. M. Mojon-Azzi, W. Potnik, and D. S. Mojon, "Opinions of dating agents about strabismic subjects' ability to find a partner," British Journal of Ophthalmology, vol. 92, no. 6, pp. 765–769, 2008.
[9] S. M. Mojon-Azzi and D. S. Mojon, "Opinion of headhunters about the ability of strabismic subjects to obtain employment," Ophthalmologica, vol. 221, no. 6, pp. 430–433, 2007.
[10] S. E. Olitsky, S. Sudesh, A. Graziano, J. Hamblen, S. E. Brooks, and S. H. Shaha, "The negative psychosocial impact of strabismus in adults," Journal of AAPOS, vol. 3, no. 4, pp. 209–211, 1999.
[11] E. A. Paysse, E. A. Steele, K. M. McCreery, K. R. Wilhelmus, and D. K. Coats, "Age of the emergence of negative attitudes toward strabismus," Journal of AAPOS, vol. 5, no. 6, pp. 361–366, 2001.
[12] D. K. Coats, E. A. Paysse, A. J. Towler, and R. L. Dipboye, "Impact of large angle horizontal strabismus on ability to obtain employment," Ophthalmology, vol. 107, no. 2, pp. 402–405, 2000.
[13] T. Toyama, T. Kieninger, F. Shafait, and A. Dengel, "Gaze guided object recognition using a head-mounted eye tracker," in ETRA '12 Proceedings of the Symposium on Eye Tracking Research and Applications, pp. 91–98, Santa Barbara, CA, USA, March 2012.
[14] Z. Liang, H. Fu, Y. Zhang, Z. Chi, and D. Feng, "Content-based image retrieval using a combination of visual features and eye tracking data," in ETRA '10 Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, pp. 41–44, Austin, TX, USA, March 2010.
[15] Z. Liang, H. Fu, Z. Chi, and D. Feng, "Refining a region based attention model using eye tracking data," in 2010 IEEE International Conference on Image Processing, pp. 1105–1108, Hong Kong, September 2010.
[16] H. Liu and I. Heynderickx, "Visual attention in objective image quality assessment: based on eye-tracking data," IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 7, pp. 971–982, 2011.
[17] R. A. Pulido, Ophthalmic Diagnostics Using Eye Tracking Technology, [M.S. thesis], KTH Royal Institute of Technology, Sweden, 2012.
[18] D. Model and M. Eizenman, "An automated Hirschberg test for infants," IEEE Transactions on Biomedical Engineering, vol. 58, no. 1, pp. 103–109, 2011.
[19] N. M. Bakker, B. A. J. Lenseigne, S. Schutte et al., "Accurate gaze direction measurements with free head movement for strabismus angle estimation," IEEE Transactions on Biomedical Engineering, vol. 60, no. 11, pp. 3028–3035, 2013.
[20] Z. Chen, H. Fu, W. L. Lo, and Z. Chi, "Eye-tracking aided digital system for strabismus diagnosis," in 2015 IEEE International Conference on Systems, Man, and Cybernetics, pp. 2305–2309, Kowloon, October 2015.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in NIPS'12 Proceedings of the 25th International Conference on Neural Information Processing Systems, vol. 1, pp. 1097–1105, Lake Tahoe, NV, USA, December 2012.
[22] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, "Learning hierarchical features for scene labeling," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1915–1929, 2013.
[23] S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221–231, 2013.
[24] O. Abdel-Hamid, A. R. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, "Convolutional neural networks for speech recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, pp. 1533–1545, 2014.
[25] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, "ImageNet: a large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, Miami, FL, USA, June 2009.
[26] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, "Return of the devil in the details: delving deep into convolutional nets," in Proceedings British Machine Vision Conference 2014, UK, September 2014.
[27] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015.
[28] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning, pp. 807–814, Haifa, Israel, June 2010.
[29] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 1–27, 2011.