This work proposes a novel caption-matching method for cross-domain image retrieval, built on multimodal vision-language architectures.