DCM dataset

This dataset is composed of 772 annotated images from 27 Golden Age comic books. We collected them from the Digital Comics Museum, a free public-domain collection of digitized comic books. We selected one album per available publisher to cover as many different styles as possible. We annotated ground-truth bounding boxes for all panels and all characters (bodies and faces), whether small or large, human-like or animal-like.

The two image lists from the original paper, for the training set and the testing set, will also soon be available in the 'original_paper' folder.

Citation

@Article{jimaging4070089,
AUTHOR = {Nguyen, Nhu-Van and Rigaud, Christophe and Burie, Jean-Christophe},
TITLE = {Digital Comics Image Indexing Based on Deep Learning},
JOURNAL = {Journal of Imaging},
VOLUME = {4},
YEAR = {2018},
NUMBER = {7},
ARTICLE-NUMBER = {89},
URL = {http://www.mdpi.com/2313-433X/4/7/89},
ISSN = {2313-433X},
DOI = {10.3390/jimaging4070089}
}

Ground-truth creation guidelines

During the annotation process, we identified four types of characters: human-like, object-like, animal-like and extra (supporting-role characters). Human-like characters look like humans, such as Robin Hood or Batman. Object-like characters resemble objects, such as SpongeBob or the characters from Cars. Animal-like characters could be Garfield, the Pink Panther, etc. Extras are characters from any of the classes mentioned earlier that are not easily distinguishable or are in shadow. Note that faces have been annotated (when visible) only for the human-like class. An example of each class is given in the following image:

Figure: Examples of each annotated character class.

Panels, bodies and faces have been annotated with horizontal (axis-aligned) bounding boxes. Faces are defined as the eyebrows, eyes, nose, mouth, chin, and ears if visible, similar to other common datasets from the domain. See the examples in the following image:

Figure: Examples of annotated bounding boxes for panel (blue), character (red) and face (yellow).

The community can easily extend this dataset because it is based on public domain American comic book images (please submit contributions through a Git merge request). The Python-based annotation tool can be provided on request.

Annotation format

class_id x1 y1 x2 y2

Where class_id is one of the values in classes = {"characters": 1, "face": 7, "panel": 8}.

Note: in the current release, all character types (human-like, object-like, animal-like, ...) are combined into a single class, "characters" (class_id 1).
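As an illustration of the format above, here is a minimal Python sketch for reading annotation lines. It is not part of the dataset tooling. It assumes one box per line, whitespace-separated fields, and numeric coordinates; the names `Box` and `parse_annotations` are hypothetical, and the sample values are made up rather than taken from the ground-truth files.

```python
from typing import List, NamedTuple

# Class IDs as listed in this README.
CLASSES = {"characters": 1, "face": 7, "panel": 8}
ID_TO_NAME = {class_id: name for name, class_id in CLASSES.items()}


class Box(NamedTuple):
    """One annotated bounding box (hypothetical helper type)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def parse_annotations(text: str) -> List[Box]:
    """Parse lines of the form 'class_id x1 y1 x2 y2' into Box records.

    Assumption: fields are separated by whitespace, one box per line.
    """
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(fields)}")
        class_id = int(fields[0])
        if class_id not in ID_TO_NAME:
            raise ValueError(f"line {line_no}: unknown class_id {class_id}")
        x1, y1, x2, y2 = (float(value) for value in fields[1:])
        boxes.append(Box(ID_TO_NAME[class_id], x1, y1, x2, y2))
    return boxes


# Made-up sample values, for illustration only (not from the dataset).
sample = "8 10 20 300 400\n1 50 60 120 200\n7 70 65 100 95\n"
boxes = parse_annotations(sample)
for box in boxes:
    print(box.label, box.x1, box.y1, box.x2, box.y2)
```

Rejecting unknown class IDs, rather than skipping them silently, makes it easier to notice if a future release changes the class list.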