University of Connecticut, May 2002
Statement of Problem
To track eye movements based on pupil location. The location of the pupil will be determined relative to the corners of the eye.
We define the following parts of the eye and refer to them throughout the solution of the problem. The sclera is the white of the eye, the portion surrounding the iris. The iris is the colored part of the eye surrounding the pupil; its pattern is unique to each individual, and its outer diameter does not change. At the center of the visible eye, concentric with the iris, is the pupil. The pupil is circular and changes size with the amount of light in the environment. It is always black. The cornea covering the pupil may reflect light to a camera viewing it, but the pupil is essentially a hollow region of the eye through which light passes to the retina.
We will only be concerned with the parts of the eye visible from the outside: the sclera, iris, and pupil. The sclera will form the boundary in which we expect to find the pupil, and the iris will act as the boundary between the sclera and pupil.
We will also have to work with eyelashes, eyelids, and eyebrows. Eyelashes can directly obstruct the view of the pupil and sclera, depending on the angle from which they are viewed. Eyelids are also a concern: how far the person's eyes are open when an image is taken determines how much of the eye is available for finding the pupil. The eyebrow is of smaller significance, but it should be taken into account. It is often darker than the surrounding areas of the eye and skin and can be confused with the dark pupil and iris, so it will have to be removed from the image before attempting to find the pupil.
We plan to use standard video cameras or standard 'web' cameras connected to a PC. Any device that Windows recognizes as a capture device will be a viable camera. This includes all webcams, digital cameras, and video cameras connected through a video-in port. Some video cameras may require a separate capture device with a video-in port.
There are a few different methods we can use to capture an image of the eye once the camera has been set up to work with the computer. A simple and effective method is to mount the camera in front of the user's eye: the user wears a helmet on which the camera is mounted so that it hangs in front of the eye at a fixed, comfortable distance. This method is simple because less analysis is required to find the location of the eye; the eye's position is known and constant, allowing an iris of constant size to be detected.
Another method for capturing an image of the eye would be to place the camera on the monitor, or wherever the user will be looking. This method is much less intrusive than wearing a hat or helmet with a camera mounted on it, but it makes finding the pupil much more difficult. The user's head and eyes would move all over the camera's viewing area, and the user may turn their head or otherwise take their eyes out of the camera's view. Even assuming that this is a natural action and that the user does not expect tracking to continue while they are not facing the camera, the software would still have to find the eyes, determine whether they are in view, and, if so, find the user's gaze.
For our purposes, the first method of mounting a camera on the user's head was the easier choice. The hardware is fairly straightforward to build, and the fixed camera position avoids the need to find the user's face and eyes before locating the pupil. That is more image recognition than we were willing to do.
To begin the calibration process, the user must set up the video input: the video source must be selected, along with the correct format and size. A preview window is displayed at the center of the screen. Once the video feed is working correctly, a new dialog box is presented. The user is asked to look at the center of the screen and click the Take Snapshot button until satisfied that the image shows them looking at the center of the screen. This image is used to establish the default center; all subsequent deviation is calculated as an offset from this center.
The user is then shown the captured picture of the eye and asked to mark the center of the pupil and a point on the edge of the iris. The distance between these two points gives the iris radius, which is used later when searching for the iris in subsequent images during tracking.
Once calibration is done, the user can begin to track the movements of their eye. A stream of images is passed to the image analyzer, which finds the pupil. When the user begins tracking, a thread is started that runs all the functions necessary to find the center of the eye and project the gaze onto the screen.
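The tracking thread described above can be sketched as a worker consuming a queue of captured frames and emitting one iris center per frame. This is only an illustration of the threading structure, not our actual code; `find_iris_center` is a stand-in for the analysis described in the following paragraphs, and all names here are our own.

```python
import threading
import queue

def find_iris_center(frame):
    # Stand-in for the edge-detection and circle-search steps;
    # here it just returns the image center.
    return (frame["width"] // 2, frame["height"] // 2)

def tracking_loop(frames, results, stop_event):
    # Consume captured frames until told to stop, appending one
    # (x, y) iris center per frame.
    while not stop_event.is_set():
        try:
            frame = frames.get(timeout=0.1)
        except queue.Empty:
            continue
        results.append(find_iris_center(frame))
        frames.task_done()

frames = queue.Queue()
results = []
stop_event = threading.Event()
worker = threading.Thread(target=tracking_loop,
                          args=(frames, results, stop_event))
worker.start()
for _ in range(3):
    frames.put({"width": 320, "height": 240})
frames.join()        # wait until every queued frame is processed
stop_event.set()
worker.join()
```

The `Queue.join`/`task_done` pairing lets the capture side block until the analyzer has caught up, so frames are never silently dropped.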
We chose a Canny edge detection algorithm to begin to find the edges of the iris. This was a fairly effective process, and under the right lighting conditions, we were able to find a nearly perfect circle that outlines the iris. With the edges defined in the image, we begin to look for the iris within the image.
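A full Canny detector adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding; the simplified sketch below (our own, not the project's code) shows only the core idea of marking pixels whose gradient magnitude exceeds a threshold, which is enough to outline a dark iris on a light background.

```python
import numpy as np

def edge_map(img, threshold):
    # Simplified stand-in for Canny: central-difference gradients,
    # gradient magnitude, then a single threshold.
    gx = np.zeros(img.shape, dtype=float)
    gy = np.zeros(img.shape, dtype=float)
    gx[:, 1:-1] = (img[:, 2:].astype(float) - img[:, :-2]) / 2.0
    gy[1:-1, :] = (img[2:, :].astype(float) - img[:-2, :]) / 2.0
    mag = np.hypot(gx, gy)
    return mag > threshold

# A dark disc (the iris) on a light background
img = np.full((40, 40), 200, dtype=np.uint8)
yy, xx = np.mgrid[0:40, 0:40]
img[(yy - 20) ** 2 + (xx - 20) ** 2 <= 10 ** 2] = 40
edges = edge_map(img, threshold=30)
```

On this synthetic image the thresholded gradient marks a ring of pixels around the disc boundary and nothing in the flat interior or background.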
Since the iris radius is known, we can look for circles of that size. Searching begins by placing a circle candidate at the center of the image, or, for subsequent images, at the last known iris center. A circle is 'swept out' from the candidate center, and a donut (annulus) is formed by enclosing the pixels just on either side of the swept circle. Within that donut, each pixel is checked to see whether it is an edge pixel. The candidate center is moved each time a donut is checked, and the number of edge pixels per candidate is recorded. The candidate whose donut contains the most edge pixels becomes the center.
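The donut search above can be sketched as follows. This is our own minimal version: it scans a small window of candidate centers around the starting point and votes with edge pixels inside an annulus of the known radius; the `search` and `band` parameters are illustrative choices, not values from the original system.

```python
import numpy as np

def find_center(edges, radius, start, search=5, band=1.0):
    # For each candidate center near `start`, count edge pixels
    # inside the "donut" (annulus) at the known iris radius, and
    # keep the candidate with the most edge votes.
    h, w = edges.shape
    yy, xx = np.mgrid[0:h, 0:w]
    best, best_votes = start, -1
    cy0, cx0 = start
    for cy in range(cy0 - search, cy0 + search + 1):
        for cx in range(cx0 - search, cx0 + search + 1):
            dist = np.hypot(yy - cy, xx - cx)
            donut = np.abs(dist - radius) < band
            votes = int(np.count_nonzero(edges & donut))
            if votes > best_votes:
                best, best_votes = (cy, cx), votes
    return best

# Synthetic edge ring of radius 10 centered at (22, 18),
# found starting from the image center (20, 20).
yy, xx = np.mgrid[0:40, 0:40]
edges = np.abs(np.hypot(yy - 22, xx - 18) - 10) <= 0.5
center = find_center(edges, 10, (20, 20))
```

Only the true center's donut encloses every edge pixel of the ring, so it collects strictly the most votes.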
This produces an iris center in terms of pixels in the bitmap. We now need to project this offset onto the screen, which requires some measurements of the eye and of the screen. The size of the screen in millimeters (mm) is required. It is calculated by asking the user for the diagonal dimension of the screen in inches; for a standard 4:3 display, the width is 0.8 and the height 0.6 of the diagonal (the 3-4-5 right triangle), so the screen size in mm can be determined.
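The screen-size calculation is direct arithmetic; a sketch, assuming the 4:3 display ratios just described:

```python
MM_PER_INCH = 25.4

def screen_size_mm(diagonal_in):
    # For a 4:3 display the width is 0.8 and the height 0.6 of
    # the diagonal (3-4-5 right triangle), so the diagonal in
    # inches converts directly to width and height in mm.
    diagonal_mm = diagonal_in * MM_PER_INCH
    return 0.8 * diagonal_mm, 0.6 * diagonal_mm

w, h = screen_size_mm(17)  # a 17in monitor
```

For widescreen displays the 0.8/0.6 ratios would have to be replaced with the appropriate aspect-ratio factors.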
Next, the displacement (offset) must be calculated in mm. Since the image is in pixels, a correlation must be found between the pixel offset of the iris in the bitmap and an offset in mm. Iris size varies from person to person, but we can supply the user with a typical iris size in mm. The iris size in pixels is already known from the user's calibration clicks. With the iris size known in both pixels and mm, the correlation can be found and the iris offset determined in mm.
Using Equation 1, the offset in mm is projected onto the screen. This produces an on-screen offset in mm, which must then be converted into pixels. Since both the size of the screen in mm and its height and width in pixels are known, the offset in mm can be converted into pixels and displayed on the screen.
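The full pixel-to-screen chain can be sketched in one function. Since the body of Equation 1 is not reproduced in the text, step 2 below assumes a similar-triangles projection (pupil displacement scaled by screen distance over eyeball radius); that form and all parameter names are our own illustration.

```python
def gaze_offset_px(offset_px_img, iris_radius_px, iris_radius_mm,
                   eye_to_screen_mm, eye_radius_mm,
                   screen_mm, screen_px):
    # 1. Pixel offset in the camera image -> mm, using the iris
    #    as a known ruler (size in both pixels and mm).
    mm_per_px = iris_radius_mm / iris_radius_px
    offset_mm = offset_px_img * mm_per_px
    # 2. Project onto the screen (assumed similar-triangle form
    #    of Equation 1: displacement x on an eyeball of radius r
    #    maps to roughly x * D / r at a screen D away).
    screen_offset_mm = offset_mm * eye_to_screen_mm / eye_radius_mm
    # 3. Screen mm -> screen pixels.
    return screen_offset_mm * (screen_px / screen_mm)

# 4px iris shift, 40px/6mm iris, eye 500mm from a 400mm-wide,
# 800px-wide screen, eyeball radius 12.5mm.
shift = gaze_offset_px(4, 40, 6, 500, 12.5, 400, 800)
```

Each stage is a plain ratio, so the whole projection reduces to multiplying the image offset by three scale factors.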
Equation 1 - projected offset from center gaze
The average human eye is approximately one inch, or about 25.5 mm, in diameter (Gray 825). Knowing the depth of the eye and the distance from the eye to the screen, the gaze can be projected onto the screen by determining the displacement of the pupil from the center of the visible sclera.
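Since the body of Equation 1 does not appear in the text, the following is only a plausible small-angle reconstruction from the quantities just described; the symbols are our own.

```latex
% A pupil displacement x_p on an eyeball of radius r_e subtends a
% gaze angle \theta; at screen distance D this lands x_s from center.
\theta \approx \frac{x_p}{r_e}, \qquad
x_s = D \tan\theta \approx D\,\frac{x_p}{r_e}, \qquad
r_e \approx \tfrac{25.5}{2}\ \text{mm}
```

For small gaze angles the tangent is nearly linear, which is why the projection reduces to a single ratio.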
The axes of the eyeballs are parallel, even though the axes of the orbits are not. The optic nerve therefore connects slightly toward the nasal side of the eyeball, approximately 1 mm below and 3 mm to the nasal side of the eyeball's central axis (figure 2). This places the pivot of the eye just slightly behind the center of the eyeball.
· Gray, Henry. Gray's Anatomy. 1901.