18th International Conference on MultiMedia Modeling, Klagenfurt, Austria, January 4-6, 2012

Video Browser Showdown

The Video Browser Showdown (held as a separate session of MMM 2012) is a live video browsing competition where international researchers working in the field of interactive video search evaluate and demonstrate the efficiency of their tools in the presence of the audience. The aim of the Video Browser Showdown is to evaluate video browsing tools for their efficiency at “Known Item Search” (KIS) tasks with a well-defined data set, in direct comparison with other tools. For each KIS task the searchers need to interactively find a short video clip in a one-hour video file within a specific time limit.


The Video Browser Showdown will be a moderated “special demo session” of MMM 2012, where 24 KIS tasks (2×12, see below) need to be solved. For each task the moderator presents the target clip on a shared screen that is visible to all participants. The participants use their own equipment to perform an interactive search in the specified video file (taken from a common data set). The performance of participating tools will be evaluated in terms of successful answers and average search time.

The decision for the best-performing tool is based on two runs:

  • Expert run: the participants (developers of the tools) themselves act as searchers
  • Novice run: volunteers from the audience act as searchers (after a short training phase)

The overall best-performing tool will be awarded the “Best Video Browser” certificate.

No Metadata

The videos to be used for the Video Browser Showdown will be provided without any metadata. However, participants are allowed to perform any content analysis that supports interactive browsing in the video (e.g., through novel content visualization, content clustering, or advanced seeker-bars). The search process must be interactive, i.e., no text queries are allowed.


Anyone who has a video browsing tool that allows users to interactively browse, explore, or navigate a single video file (search shouldn’t be based on automatic queries) may participate. Examples of tools of interest are: a video shot browser (e.g., temporal-based or concept-based), a video player with extended navigation/interaction means, a video content exploration tool, or a tool using advanced visualizations for improved navigation/interaction (“video surrogates”). Tools developed for interactive video search on mobile devices are also of interest.

Data set and search tasks

The data set consists of about 12 one-hour video files (assembled from freely available video content) with the following characteristics: .mp4 file format, MPEG-4 video codec (H.264/AVC), AAC-LC audio codec, standard-definition resolution (except a few files with CIF resolution). To make the Video Browser Showdown a more challenging task, each one-hour video file is assembled from diverse video files instead of being a single continuous one-hour recording. These video files will be available on this website a few weeks before the conference. The KIS search tasks (each video file will have an expert search task and a novice search task, i.e., in total there will be 24 search tasks) will be presented on-site. The goal of a KIS search task is to find a preselected segment of interest (duration ranging from a few seconds up to 30 seconds) in a one-hour video file within a specified time limit (e.g., within 3 minutes) by interactive search. The segment of interest doesn’t necessarily start and stop at shot boundaries.

Example task/video


To apply for participation, please submit a scientific paper (2-3 pages in Springer LNCS format) to one of the organizers via email by October 7, 2011. The submission should include a detailed description of the interaction with the video browsing tool and how it supports interactive search in video. Submissions will be peer-reviewed to ensure maximum quality. Accepted submissions will be published in the conference proceedings of MMM 2012.

Important dates

October 7, 2011 paper submission (2-3 pages, Springer LNCS)
October 12, 2011 notification of acceptance
October 19, 2011 camera-ready versions due
January 6, 2012 competition at MMM 2012

Number of participants

For organizational reasons, the number of active participants in the Video Browser Showdown is limited to 18; at least 6 active participants are required (otherwise the competition will be canceled).


Klaus Schoeffmann, Klagenfurt University, Austria,
Werner Bailer, JOANNEUM RESEARCH, Austria,
Cees Snoek, University of Amsterdam, Netherlands,

Frequently Asked Questions (FAQ)

  • Q: I noticed that “search shouldn’t be based on automatic queries”. What about a video browser that lets the user select a region/object of interest from a keyframe and perform a query inside the video in order to retrieve shots containing regions/objects similar to the one selected by the user?
    A: That’s absolutely ok and would be definitely of interest for the Video Browser Showdown!
  • Q: It is said that no text queries are allowed in this task. Does this mean users can only use the mouse to operate the interface to find the right video, and keyboard input is not allowed?
    A: You may use the keyboard as long as you don’t enter a query. E.g., you could use the keyboard for navigation in the video!
  • Q: How about the ASR and OCR text information? Is it prohibited to use ASR and OCR to process the dataset in advance?
    A: No, it is not prohibited to use ASR or OCR, but you are not allowed to let the user enter a text query that performs a text search based on ASR or OCR. You may, however, use information from ASR and OCR for clustering or structuring the content of the video (i.e., to facilitate navigation/browsing in the video).
  • Q: Automatic text queries are not allowed, however, is it allowed to display a clickable tag-cloud for navigation?
    A: Yes, that would be ok!
  • Q: When will the videos be available?
    A: The videos should be available in November 2011.


Information for Participants

The Video Browser Showdown will use a server for evaluation and visualization of the interactive video search tasks. This server is used to collect the results from the participants and show their score on a shared screen.

The server uses a very simple HTTP-based protocol, which every participating system needs to support. To submit a result, a client needs to send an HTTP GET request with a URI of the following form:

http://<server>:<port>/team=<team>/video=<video>/segstart=<segstart>/segstop=<segstop>

  • team…your team number (see below)
  • video…the name (without extension) of the current video. Videos will be named 1.mp4, 2.mp4, 3.mp4, etc. For every task the corresponding video will be mentioned by the moderator.
  • segstart…frame number of the first frame in the found segment
  • segstop…frame number of the last frame in the found segment

The server will inspect only the URI of the request and check whether its format is correct (valid numbers for team, video, and start/stop frames; the order of parameters must be as specified above). If there is any problem with the format, the server will answer with an error (HTTP/1.1 400 WRONG FORMAT). If the format is ok, it will answer with success (HTTP/1.1 200 OK) and return your submitted values as well as the server-based solve time (in ms).

Please note that you don’t actually need to inspect these values; they are for information purposes only. Furthermore, please implement your system so that the server address and the port can be changed easily; these will be provided at the contest.
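As a sketch of what a submission client might look like, the following Python snippet builds the submission URI and issues the GET request (the function names and the localhost address are illustrative; the parameter order follows the protocol above):

```python
from urllib.request import urlopen

def build_submission_uri(server, port, team, video, segstart, segstop):
    """Build the submission URI; the parameter order must match the protocol."""
    return (f"http://{server}:{port}"
            f"/team={team}/video={video}/segstart={segstart}/segstop={segstop}")

def submit_result(server, port, team, video, segstart, segstop):
    """Send the HTTP GET request and return the server's response body."""
    uri = build_submission_uri(server, port, team, video, segstart, segstop)
    with urlopen(uri) as resp:  # raises on HTTP 400 (wrong format)
        return resp.read().decode()

# Example: team 1 submits frames 70500-71000 of video 1 to a local test server:
# submit_result("localhost", 8080, 1, 1, 70500, 71000)
```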

For your convenience you can already download the server for testing purposes. Start the server with (requires at least JDK 6):
java -jar vbsserver.jar

For testing purposes you can use a web browser or wget, for example:
wget -O- http://localhost:8080/team=1/video=1/segstart=70500/segstop=71000

Team numbers:
1 Del Fabro, Böszörmenyi
2 Yuan et al.
3 Scott et al.
4 Bursuc et al.
5 Ventura et al.
6 Taschwer
7 Bailer et al.
8 Schoeffmann et al.


For every task each participant can get at most 100 points. The concrete number of points depends on the task solve-time as well as on the number of wrong answers:

  • For every task, the points awarded for a correct answer decrease linearly from 100 down to 50 as the solve-time approaches the maximum task solve-time.
  • When a team submits the correct answer, the time-dependent points will be divided by 2^w, where w is the number of wrong submissions of the team for the current task. For example, if the maximum task solve-time is 3 minutes and the team submits the correct answer after 90 seconds, having submitted 2 wrong answers before, the team will get 18.75 points.
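The scoring rule above can be sketched as a small function (the function and parameter names are illustrative, not part of the official server):

```python
def task_score(solve_time, max_time, wrong_submissions):
    """Points for one solved task: 100 decreasing linearly to 50 over the
    maximum solve time, then halved once per wrong submission."""
    time_points = 100.0 - 50.0 * (solve_time / max_time)
    return time_points / (2 ** wrong_submissions)

# Example from the text: 3-minute limit, correct answer after 90 s,
# 2 wrong submissions before -> (100 - 25) / 2^2 = 18.75 points
```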

More Details About the Server

  • A submitted result is considered correct if it is located within the ground-truth segment, where an overlap of 5 seconds (125 frames at 25 fps) at the beginning and the end is accepted (i.e., submitted startframe >= GTstartframe - 125 and submitted stopframe <= GTstopframe + 125).
  • A submitted segment with a length of 1 frame is ok.
  • After a correct submission for a task, no more submissions from this team and this task will be accepted by the server.
  • Please use the server (jar file) from the specified FTP space for testing purposes. There you will also find a “dryrun” directory that contains a query video and a ground-truth file (00.txt) for the server. Please note that for the competition no ground-truth files will be provided to the participants.
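The server-side correctness check can be sketched as follows (the 125-frame tolerance corresponds to 5 seconds at 25 fps; the function and variable names are illustrative):

```python
TOLERANCE_FRAMES = 125  # 5 seconds at 25 fps, at each end of the segment

def is_correct(segstart, segstop, gt_start, gt_stop):
    """A submission is accepted if it lies within the ground-truth segment
    extended by the tolerance at both ends (a 1-frame segment is allowed)."""
    return (segstart >= gt_start - TOLERANCE_FRAMES and
            segstop <= gt_stop + TOLERANCE_FRAMES)

# A 1-frame segment inside the ground truth is accepted:
# is_correct(70600, 70600, 70500, 71000) -> True
```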