Abstract: Visual grounding for remote sensing images (RSVG) is a fundamental vision-language task that aims to locate the objects referred to by a natural language expression in remote sensing (RS) images.
To address the challenges of densely distributed small objects, blurry features, and complex backgrounds in unmanned aerial vehicle (UAV) images, as well as the inability of traditional ...