Video segmentation is the most fundamental process for appropriate indexing and retrieval of video intervals. In general, video streams are composed of shots delimited by physical shot boundaries. Substantial work has been done on detecting such shot boundaries automatically (Arman et al., 1993) (Zhang et al., 1993) (Zhang et al., 1995) (Kobla et al., 1997). Through the integration of technologies such as image processing, speech/character recognition and natural language understanding, keywords can be extracted and associated with these shots for indexing (Wactlar et al., 1996). A single shot, however, rarely carries enough information to be meaningful by itself.
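To make the contrast concrete, physical shot boundaries can be located from low-level visual cues alone, for example by thresholding the colour-histogram difference between consecutive frames. The following Python sketch illustrates such a hard-cut detector; the OpenCV-based implementation and the threshold value are illustrative assumptions, not details of the systems cited above.

import cv2

def detect_shot_boundaries(video_path, threshold=0.4):
    """Flag frames whose colour histogram differs sharply from the
    previous frame's histogram -- a crude hard-cut detector."""
    cap = cv2.VideoCapture(video_path)
    boundaries = []
    prev_hist = None
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # 8x8x8-bin colour histogram, normalised so frame size is irrelevant
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            # Bhattacharyya distance: 0 = identical, 1 = maximally different
            dist = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if dist > threshold:  # threshold chosen here only for illustration
                boundaries.append(frame_idx)
        prev_hist = hist
        frame_idx += 1
    cap.release()
    return boundaries

No comparably simple low-level cue delimits a semantically meaningful interval.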
Usually, it is a semantically meaningful interval that most users are interested in retrieving. Generally, such meaningful intervals span
several consecutive shots. There hardly exists any efficient and
reliable technique, either automatic or manual, to identify all
semantically meaningful intervals within a video stream. Works such as (Smith and Davenport, 1992) (Oomoto and Tanaka, 1993) (Weiss et al., 1995) (Hjelsvold et al., 1996) suggest manually defining all such intervals in the database in advance. However, even an hour-long video may have an indefinite number of meaningful intervals. Moreover, video data is open to multiple interpretations. Therefore, given a query, what is a meaningful
interval to an annotator may not be meaningful to the user who issues
the query. In practice, manual indexing of meaningful intervals is
labour-intensive and inadequate.