Basic Concepts
Automated generation of out-of-context captions using conditional word tokens and visual input.
Summary
The paper addresses the growing problem of misinformation spread through cheapfakes, i.e., out-of-context pairings of real images with misleading captions. It introduces the novel task of out-of-context caption generation to address the limitations of existing detection-only methods. The proposed method improves over an image-captioning baseline by controlling the semantics and context of the generated captions. Detailed methodology, experiments, results, and ethical considerations are presented.
Directory:
- Introduction
  - Misinformation challenges posed by cheapfakes.
  - Novel task of out-of-context caption generation.
- Related Work
  - Detection models for cheapfakes.
- Methodology
  - Named Entity Recognition (NER).
  - Feature Extraction & Detection Backbone.
  - Relational Graph.
  - Captioning Module.
- Implementation Details
  - Training details using PyTorch and the Adam optimizer.
- Experiments
  - Dataset used: COSMOS dataset.
  - Train/validation split for evaluation.
  - Metrics: BLEU-4, CIDEr, ROUGE, METEOR.
- Human Evaluation
  - Study of human perception of model-generated multimedia examples.
- Conclusion & Ethical Considerations
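To make the "Relational Graph" step above concrete, here is a hedged toy sketch: the paper's exact graph construction is not specified in this summary, so this version simply assumes nodes are detected object boxes and connects pairs whose boxes overlap (IoU above a threshold).

```python
# Hypothetical sketch: build a relational graph over detected object boxes.
# Nodes are detections; edges link pairs whose boxes overlap (IoU > threshold).
# This is an illustrative assumption, not the paper's exact construction.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def relational_graph(boxes, threshold=0.1):
    """Return an edge list over box indices whose pairwise IoU exceeds threshold."""
    return [(i, j) for i in range(len(boxes))
            for j in range(i + 1, len(boxes))
            if iou(boxes[i], boxes[j]) > threshold]

boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110)]
print(relational_graph(boxes))  # boxes 0 and 1 overlap; box 2 is isolated
```

A captioning module could then attend over such edges to ground object relationships in the generated text.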
Statistics
Our method improves over the image captioning baseline by 6.2% BLEU-4, 2.96% CIDEr, 11.5% ROUGE, and 7.3% METEOR.
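For readers unfamiliar with the headline metric, here is a hedged, simplified BLEU-4 sketch (geometric mean of 1- to 4-gram modified precisions with a brevity penalty, single reference, crude smoothing); real evaluations would use an established implementation such as the one in NLTK rather than this toy version.

```python
# Simplified single-reference BLEU-4: geometric mean of n-gram precisions
# (n = 1..4) times a brevity penalty. Illustrative only.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        c, r = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c & r).values())           # clipped n-gram matches
        total = max(sum(c.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zero counts
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

print(round(bleu4("a cat sits on the mat", "a cat sits on the mat"), 3))  # 1.0
```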
Quotes
"Our model is only designed for out-of-context captioning task."
"Byte-pair encoding generates more meaningful captions compared to Glove and fastText."
"The relational graph allows better capture of object relationships in the generated captions."