Utilizing a two-stage ultra-low-cost multimodal system to detect harmful comments and images with high precision and recall rates.