Efficient Scene Graph Generation by Extracting Relationships from Transformer Object Detectors
EGTR is a lightweight one-stage scene graph generator that extracts relationship information from the self-attention layers of a pre-trained object detector, eliminating the need for a separate triplet detector.
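To make the idea concrete, the following is a minimal NumPy sketch of how pairwise relationship scores could be read off the query and key projections of a single self-attention layer. It is an illustration under stated assumptions, not EGTR's actual implementation: the function names, the weight matrices `w_q`, `w_k`, and `w_rel`, and the single linear predicate head are all hypothetical, and random arrays stand in for the hidden states of a real detector.

```python
import numpy as np


def pairwise_relation_features(queries, keys):
    """Pair every subject query with every object key.

    queries, keys: arrays of shape (n, d), one row per detected object.
    Returns an array of shape (n, n, 2 * d) where entry [i, j] is the
    concatenation of queries[i] (subject) and keys[j] (object).
    """
    n, d = queries.shape
    subj = np.broadcast_to(queries[:, None, :], (n, n, d))  # row i holds subject i
    obj = np.broadcast_to(keys[None, :, :], (n, n, d))      # column j holds object j
    return np.concatenate([subj, obj], axis=-1)


def predict_relations(hidden, w_q, w_k, w_rel):
    """Score every (subject, object, predicate) triplet.

    hidden: (n, d) object representations entering a self-attention layer.
    w_q, w_k: (d, d) query/key projections (hypothetical stand-ins for the
              detector's own attention weights).
    w_rel: (2 * d, p) hypothetical linear head over p predicate classes.
    Returns sigmoid scores of shape (n, n, p).
    """
    queries = hidden @ w_q
    keys = hidden @ w_k
    feats = pairwise_relation_features(queries, keys)
    logits = feats @ w_rel
    return 1.0 / (1.0 + np.exp(-logits))


# Toy example: 4 objects, 8-dim features, 5 predicate classes (all made up).
rng = np.random.default_rng(0)
n, d, p = 4, 8, 5
hidden = rng.normal(size=(n, d))
w_q = 0.1 * rng.normal(size=(d, d))
w_k = 0.1 * rng.normal(size=(d, d))
w_rel = 0.1 * rng.normal(size=(2 * d, p))

scores = predict_relations(hidden, w_q, w_k, w_rel)
print(scores.shape)  # one score per (subject, object, predicate)
```

The point of the sketch is that the relation tensor is built from quantities the detector's attention layer already computes for each object pair, so no separate triplet detector is required to enumerate subject-object candidates.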