SNAG

SNAG is a multimodal dataset consisting of co-collected eye movements and spoken descriptions produced during an image-inspection task. This dataset was collected from 30 observers whose eye movements and spoken descriptions were recorded as they inspected and describing 100 general-domain images. We also provide an image annotation tool (RegionLabeler) for manually labeling image regions with words. This dataset and the annotation tool were collected and developed by a group of researchers associated with the Multidisciplinary Vision Research Lab and Computational Linguistics and Speech Processing Lab at Rochester Institute of Technology.

Raw data

Description of the dataset

The dataset consists of six folders as described below:

1. SnagImages:

This folder consists of the 100 general-domain images used in the data collection process. These images are a subset of the widely known Micorsoft Common Objects in Context (MSCOCO) dataset. The license for the MSCOCO dataset (REF) applies to these images. To see an example click: 1.jpg

2. SnagGazeData:

This folder contains eye movements for 30 observers, each viewing 100 images. It contains two folders:

(1) SnagFixation: This folder contains 3000 csv files named Aud<Image Number>_Obs<Observer Number>.csv. Each file contains fixation data for the indicated image number and observer number. Click here to see an example: Aud1_Obs31.csv.fixation.csv

(2) SnagSaccade: This folder contains 3000 csv files named Aud<Image Number>_Obs<Observer Number>.csv. Each file contains saccade data for the indicated image number and observer number. Click here to see an example: Aud1_Obs31.csv.saccade.csv

3. SnagAudioWav:

This folder contains 3000 audio files named Aud<Image Number>_Obs<Observer Number>.wav. Each file contains spoken descriptions for the indicated image number and observer number. Click to hear an example: Aud1_Obs31.wav

4. SnagTranscribedJSON:

This folder contains transcriptions of each audio file obtained with IBM Watson automatic speech recognition. The files are named Aud<Image Number>_Obs<Observer Number>.json. Each file contains the transcriptions for the indicated image number and observer number. Click to see an example: Aud1_Obs31.json

5. SnagTranscribedTXT:

This folder contains transcriptions for each audio file in TXT format. The files are named Aud<Image Number>_Obs<Observer Number>.txt. Each file contains the transcriptions for the indicated image number and observer number. Click to see an example: Aud1_Obs31.txt

6. SnagTranscribedTXT_Corrected_5Images:

This folder contains the manually corrected ASR transcriptions for audio files (for 30 observers) corresponding to 5 images i.e. 150 files named Aud<Image Number>_Obs<Observer Number>.txt. Each file contains the manually corrected transcription for the indicated image number and observer number. Click to see an example: Aud1_Obs31.txt.corrected

RegionLabeler: Image Region Annotation Software

This folder contains RegionLabeler, the image annotation user interface as briefly described in the ACL paper. This software allows a user to draw boundary around regions in the image and check off words they can be annotated with. For more information, please refer to the README.txt within the folder.

Downloads

SNAG Dataset

RegionLabeler

If you would like the MSCOCO ids for the SNAG images, please download the folder below. Each image file has the MSCOCO id alognwith the SNAG image id. Please note we do not own the images and cannot proivde you with the license to use it.

SNAGImagesWithMSCOCOId

License

SNAGLicense.pdf

RegionLabelerLicense.pdf

Citation and Contact

Please cite our paper when you use this dataset or the image anotation software:

Vaidyanathan, P., Prud’hommeaux, E., Pelz, J. B., and Alm, C. O., SNAG: Spoken Narratives and Gaze Dataset, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Vol 2, pp. 132-137 or the following bibtex:

@InProceedings{P18-2022, author = “Vaidyanathan, Preethi
and Prud’hommeaux, Emily T. and Pelz, Jeff B. and Alm, Cecilia O.”, title = “SNAG: Spoken Narratives and Gaze Dataset”,
booktitle = “Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)”,
year = “2018”,
publisher = “Association for Computational Linguistics”,
pages = “132–137”,
location = “Melbourne, Australia”,
url = “http://aclweb.org/anthology/P18-2022”
}

For any questions about this dataset or software please contact Preethi Vaidyanathan at pxv1621@rit.edu.