Visual Story Telling
THE PROJECT
This was a course project in the course CS698A: Recent Advances in Computer Vision, under Prof. Gaurav Sharma. Visual Story telling is relatively new task with hardly any prior work. The task involves mapping sequential images to sequential, human like, narrative descriptive sentences or ’stories’. Simply put, it involves understanding a sequence of images and trying to explain its contents in a story like manner. The problem was introduced recently along with a newly constructed dataset (released by MSR).
PROJECT DESCRIPTION
The project deals with the task of visual story telling i.e. constructing narratives given sequential images. In our approach, we encode the image sequence by passing it through a GRU. This is used as the initial hidden state of the story decoder network, which is also modelled as a GRU. Finally, beam search is used to produce the story for each sequence. Specific heuristics are also discussed to further improve performance. We work with a newly constructed dataset released by MSR which consists of around 80k images in 20k sequences. We use METEOR and BLEU scores as evaluation metrics for this task. The code repository is hosted at GitHub.
THE RESULTS
Although the metrics used (METEOR and BLEU) correlate well with human judgements, their ineffectiveness in such a task is notable. They often fail to assign accurate scores to the generated stories, mainly due to the numerous 'correct' stroies for each image sequence. Other qualitative results are provided in the report and presentation.