Dataset release for Event-Based, Near-Eye Gaze Tracking Beyond 10,000 Hz
This repository includes instructions for downloading and using our 27-person, near-eye, event- and frame-based gaze-tracking dataset.
Create the conda environment from the provided environment file:

conda env create -f ebv-eye.yml
Download the 27-person dataset using the setup script (you may first need to make it executable with chmod u+x setup.sh).
Note that in the paper we only use subjects 4-27 because subjects 1-3 were recorded with a slightly different, suboptimal setup.
bash setup.sh
We provide a simple Python script that reads and visualizes our data. Run it with:
python visualize.py --data_dir ./eye_data --subject 3 --eye left --buffer 1000
The --buffer flag controls how many events are rendered as a group. Increasing it makes the rendering faster, but blockier.
This visualization is not real-time; the speed is limited by the rendering rate of matplotlib. The primary use of this visualizer is to provide a minimal example of proper data parsing and alignment.
This dataset contains synchronized, IR-illuminated left- and right-eye data from 27 subjects. The data was collected using DAVIS 346b sensors from iniVation. For additional details on the setup and data collection, please refer to Section 4 of the associated paper.
The data is stored in the ./eye_data/ directory, which contains 27 subdirectories, one per subject. Each subject directory contains two folders, 0 and 1: 0 corresponds to the left eye and 1 to the right eye. Each eye directory contains a frames directory for video data and an events.aerdat file for event data. Both formats are explained below.
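The layout above can be walked programmatically. A minimal sketch, assuming the subject subdirectories are simply named by subject number (the helper name is illustrative):

```python
from pathlib import Path

def list_recordings(data_dir="./eye_data", subjects=range(1, 28)):
    """Yield (subject, eye, frames_dir, events_file) tuples for the layout
    described above: <data_dir>/<subject>/<eye>/frames/ and
    <data_dir>/<subject>/<eye>/events.aerdat, where eye 0 is the left eye
    and eye 1 is the right eye.

    The per-subject directory names are assumed to be the bare subject
    numbers; adjust if your download uses a different naming scheme.
    """
    root = Path(data_dir)
    for subject in subjects:
        for eye in (0, 1):  # 0 = left eye, 1 = right eye
            eye_dir = root / str(subject) / str(eye)
            yield subject, eye, eye_dir / "frames", eye_dir / "events.aerdat"
```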
Event Data: The events.aerdat file contains all the event-based information in a sequential, raw binary format. Every time an event was registered by the sensor, its timestamp, x coordinate, y coordinate, and polarity were written directly to the binary file. See parser.py for an example of how to properly read in the event data.
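A sequential binary reader for the event file can be sketched as follows. The exact record layout (field order, widths, endianness) is an assumption here, chosen only for illustration; parser.py in this repository defines the authoritative format:

```python
import struct

# ASSUMED layout, for illustration only: each event packed as four
# little-endian unsigned fields — timestamp (uint64, microseconds),
# x (uint16), y (uint16), polarity (uint16). Check parser.py for the
# real field widths and order before using this on events.aerdat.
EVENT_FORMAT = "<QHHH"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

def read_events(path, max_events=None):
    """Read (timestamp, x, y, polarity) tuples from a raw binary event file."""
    events = []
    with open(path, "rb") as f:
        while max_events is None or len(events) < max_events:
            record = f.read(EVENT_SIZE)
            if len(record) < EVENT_SIZE:
                break  # end of file (or trailing partial record)
            events.append(struct.unpack(EVENT_FORMAT, record))
    return events
```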
Frame Data: The frames directory contains regular video data stored as a set of image frames. The video was captured at ~25 FPS, and each frame is an 8-bit, 346 × 260 px greyscale .png file. The filename of each frame encodes the order in which the frames were captured, where the subject was looking, and when the frame was captured (in microseconds). The filename has the format "Index_Row_Column_Stimulus_Timestamp.png", where:
Index: (int) indexes the images according to what order they were captured in. Can be used to align the left and right eyes.
Row: (int) the row of the display on which the stimulus point was shown to the subject at the time the frame was captured. This is measured in pixels, starting from the top of the screen and ending at the bottom (higher values correspond to positions lower on the screen). The vertical center is 540 px.
Column: (int) the column of the display on which the stimulus point was shown to the subject at the time the frame was captured. This is measured in pixels, starting from the left side of the screen and ending at the right side (higher values correspond to positions further right from the subject's perspective). The horizontal center is 960 px.
Stimulus: (string, {'s','p','st'}) 's' stands for "saccade." In this mode, the stimulus shown to the subject was a small dot which randomly jumped to a different point on the screen after five seconds. 'p' stands for "smooth pursuit." In this mode, the stimulus, again a small dot, smoothly moved across the screen at a fixed rate. 'st' stands for "stop." A large pause symbol was shown to users in-between 's' and 'p' stimuli, and before the experiment began.
Timestamp: (int) the time, in microseconds, at which the frame was captured, as reported by the internal clock of the DAVIS sensor. Time starts when the sensor is turned on, and clocks are synchronized between sensors using a sync connector. Thus timestamps are synchronized between both eyes, and with the corresponding .aerdat files.
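The filename fields above can be extracted with a small helper (the FrameInfo container and function name are illustrative, not part of the repository):

```python
from typing import NamedTuple

class FrameInfo(NamedTuple):
    index: int      # capture order; can align left and right eyes
    row: int        # stimulus row on the display, px from the top
    column: int     # stimulus column on the display, px from the left
    stimulus: str   # 's' (saccade), 'p' (smooth pursuit), or 'st' (stop)
    timestamp: int  # capture time in microseconds (DAVIS internal clock)

def parse_frame_filename(name):
    """Parse an 'Index_Row_Column_Stimulus_Timestamp.png' frame filename."""
    stem = name.rsplit(".", 1)[0]
    index, row, column, stimulus, timestamp = stem.split("_")
    return FrameInfo(int(index), int(row), int(column), stimulus, int(timestamp))
```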
The monitor on which the stimulus was displayed was a Sceptre 1080p X415BV-FSR. This model has a 40-inch diagonal and a 1920x1080 pixel resolution, and was placed 40 cm away from the user, with the user's eyes roughly centered on the screen.
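Given this geometry, stimulus coordinates in pixels can be converted to approximate visual angles. A minimal sketch, assuming square pixels and a viewer centered on the screen:

```python
import math

# Setup described above: 40-inch diagonal, 1920x1080 panel, 40 cm viewing
# distance, eyes roughly centered on the screen.
DIAGONAL_CM = 40 * 2.54  # 101.6 cm
W_PX, H_PX = 1920, 1080
VIEW_DIST_CM = 40.0

# Physical pixel pitch from the diagonal and the 16:9 pixel grid
# (assumes square pixels).
cm_per_px = DIAGONAL_CM / math.hypot(W_PX, H_PX)  # ~0.046 cm per pixel

def px_to_degrees(px_offset, distance_cm=VIEW_DIST_CM):
    """Visual angle (degrees) subtended by an offset of px_offset pixels
    from the screen center, for a centered viewer."""
    return math.degrees(math.atan2(px_offset * cm_per_px, distance_cm))
```

At this short viewing distance the screen spans a wide field of view: the half-width of the display (960 px) corresponds to roughly 48 degrees of visual angle.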
If you use this dataset, please cite our paper, Event Based, Near Eye Gaze Tracking Beyond 10,000 Hz:
@article{angelopoulos2020event,
title={Event Based, Near Eye Gaze Tracking Beyond 10,000 Hz},
author={Angelopoulos, Anastasios N and Martel, Julien NP and Kohli, Amit PS and Conradt, Jorg and Wetzstein, Gordon},
journal={arXiv preprint arXiv:2004.03577},
year={2020}
}