Cinematographic shot classification assigns a category to each shot either on the basis of the field size or on the movement performed by the camera. In this work, we focus on the camera field of view, which is determined by the portion of the subject and of the environment shown in the field of view of the camera. The automation of this task can help freelancers and studios belonging to the visual creative field in their daily activities. In our study, we took into account eight classes of film shots: long shot, medium shot, full figure, american shot, half figure, half torso, close up and extreme close up. The cinematographic shot classification is a complex task, so we combined state-of-the-art techniques to deal with it. Specifically, we finetuned three separated VGG-16 models and combined their predictions in order to obtain better performances by exploiting the stacking learning technique. Experimental results demonstrate the effectiveness of the proposed approach in performing the classification task with good accuracy. Our method was able to achieve 77% accuracy without relying on data augmentation techniques. We also evaluated our approach in terms of f1 score, precision, and recall and we showed confusion matrices to show that most of our misclassified samples belonged to a neighboring class.
Cinematographic Shot Classification with Deep Ensemble Learning / Vacchetti, B.; Cerquitelli, T.. - In: ELECTRONICS. - ISSN 2079-9292. - 11:10(2022), p. 1570. [10.3390/electronics11101570]
Cinematographic Shot Classification with Deep Ensemble Learning
Vacchetti B.;Cerquitelli T.
2022
Abstract
Cinematographic shot classification assigns a category to each shot either on the basis of the field size or on the movement performed by the camera. In this work, we focus on the camera field of view, which is determined by the portion of the subject and of the environment shown in the field of view of the camera. The automation of this task can help freelancers and studios belonging to the visual creative field in their daily activities. In our study, we took into account eight classes of film shots: long shot, medium shot, full figure, american shot, half figure, half torso, close up and extreme close up. The cinematographic shot classification is a complex task, so we combined state-of-the-art techniques to deal with it. Specifically, we finetuned three separated VGG-16 models and combined their predictions in order to obtain better performances by exploiting the stacking learning technique. Experimental results demonstrate the effectiveness of the proposed approach in performing the classification task with good accuracy. Our method was able to achieve 77% accuracy without relying on data augmentation techniques. We also evaluated our approach in terms of f1 score, precision, and recall and we showed confusion matrices to show that most of our misclassified samples belonged to a neighboring class.File | Dimensione | Formato | |
---|---|---|---|
Cinematographic_Shot_Classification_with_Deep_Ensemble_Learning__1_.pdf
accesso aperto
Descrizione: pdf del paper
Tipologia:
2a Post-print versione editoriale / Version of Record
Licenza:
Creative commons
Dimensione
2.74 MB
Formato
Adobe PDF
|
2.74 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11583/2966707