Multimodal Large Language Model-based Person Re-identification: Leveraging Common Instructions and Latent Image Features for Enhanced Performance
This paper proposes MLLMReID, a novel approach that adapts multimodal large language models (MLLMs) to person re-identification (ReID). It introduces Common Instruction, which simplifies instruction design and avoids overfitting to task-specific prompts, and DirectReID, which feeds the latent image feature vectors output by the LLM into the ReID objective, directly optimizing the visual encoder for stronger person feature extraction.
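The core idea behind DirectReID, as described above, is that the gradient of a ReID loss computed on the LLM's latent image features flows back into the visual encoder. The following is a minimal sketch of that idea only, not the paper's actual architecture: the "visual encoder" and "LLM" are stand-in linear maps, and the ReID objective is assumed to be an identity cross-entropy loss (the paper may use different losses and networks).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not the paper's real modules):
# W_enc -- a linear "visual encoder" whose weights we will train
# W_llm -- a fixed linear map mimicking the LLM producing latent image features
# W_id  -- an identity-classification head serving as the ReID loss
D_in, D_feat, n_ids = 8, 4, 3
W_enc = rng.normal(scale=0.1, size=(D_in, D_feat))
W_llm = rng.normal(scale=0.1, size=(D_feat, D_feat))
W_id = rng.normal(scale=0.1, size=(D_feat, n_ids))

def forward(x):
    h = x @ W_enc                     # visual-encoder features
    z = h @ W_llm                     # latent features output by the "LLM"
    logits = z @ W_id                 # identity logits for the ReID loss
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, z, e / e.sum(axis=1, keepdims=True)

def id_loss_and_encoder_grad(x, y):
    """Cross-entropy ID loss on the LLM latents, with the gradient
    propagated back through the LLM into the visual encoder."""
    _, _, p = forward(x)
    n = len(y)
    loss = -np.log(p[np.arange(n), y]).mean()
    dlogits = p.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    dz = dlogits @ W_id.T             # grad w.r.t. LLM latent features
    dh = dz @ W_llm.T                 # grad flows through the "LLM"
    return loss, x.T @ dh             # grad w.r.t. encoder weights

# Toy batch: 6 images, each labeled with one of n_ids identities.
x = rng.normal(size=(6, D_in))
y = rng.integers(0, n_ids, size=6)

losses = []
for _ in range(50):
    loss, dW_enc = id_loss_and_encoder_grad(x, y)
    W_enc -= 0.5 * dW_enc             # directly optimize the visual encoder
    losses.append(loss)

print(f"ID loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Only the encoder weights are updated here, illustrating the claim that supervising the LLM's latent image features can train the visual encoder directly; in practice all modules would be differentiable networks trained with an autodiff framework.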