Core Concept
Sentence transformers fine-tuned on general question-answering datasets demonstrate some zero-shot ability to associate subjective queries about hiking experiences with synthetically generated route descriptions, but performance is mixed and model-dependent.
Summary
The study investigates the extent to which sentence transformers, fine-tuned on general (non-geospatial) question-answering datasets, can understand vague, subjective, and complex quasi-geospatial concepts when performing asymmetric semantic search.
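As a rough illustration of asymmetric semantic search, the sketch below ranks longer route descriptions against a short query by cosine similarity. It is not the authors' pipeline: `toy_embed`, `cosine`, and `rank_descriptions` are hypothetical names, and the bag-of-words `toy_embed` is only a stand-in for a sentence transformer encoder so that the example runs with the standard library alone.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in encoder (assumption): a sparse bag-of-words count vector.

    In the study, this role is played by a sentence transformer model.
    """
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_descriptions(query, descriptions, embed=toy_embed):
    """Return (score, description) pairs, most similar to the query first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(d)), d) for d in descriptions]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "About 18 percent of the walk is along the coast.",
        "About 45 percent of the walk goes through an urban area.",
    ]
    for score, text in rank_descriptions("an urban walk", docs):
        print(f"{score:.3f}  {text}")
```

Passing a different `embed` callable that returns the same kind of sparse vector swaps the encoder while the ranking logic stays unchanged. A dense encoder would need its own similarity function.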
The authors:
- Use 496,723 user-generated hiking routes across Great Britain and generate textual descriptions for them based on various geospatial attributes.
- Employ five sentence transformer models (based on MiniLM, DistilBERT, and MPNet architectures) fine-tuned on MS MARCO and/or a compilation of question-answering datasets.
- Test the models with 20 queries resembling questions about hiking experiences, and analyze the relevance of ranked route descriptions.
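The description-generation step can be pictured as filling a sentence template from route attributes. The following is a minimal sketch modelled on the sample descriptions in this summary, not the authors' actual generator: the `Route` class, its field names, the `PHRASES` keys, and `describe` are all assumptions made for illustration. It always writes numbers as digits, whereas the sample descriptions also spell some numbers out in words.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Route:
    """Hypothetical container for the attributes of one route."""
    length_km: int
    start: str
    end: str
    elevation_gain_m: int
    elevation_grade: float
    trend: Optional[str] = None  # "uphill" or "downhill", if predominant
    land_cover: Dict[str, int] = field(default_factory=dict)  # key -> percent


# Wording taken from the sample descriptions; the keys are assumed names.
PHRASES = {
    "wooded": "is in a wooded area",
    "urban": "goes through an urban area",
    "green": "is within green space",
    "coast": "is along the coast",
    "water": "is alongside a body of water",
}


def describe(route):
    """Build a textual description of a route from its attributes."""
    if route.start == route.end:
        opening = (f"This is a circular, {route.length_km} km walk "
                   f"that begins and ends in {route.start}.")
    else:
        opening = (f"This is a {route.length_km} km walk that begins in "
                   f"{route.start} and ends in {route.end}.")
    parts = [
        opening,
        f"Total elevation gain is {route.elevation_gain_m} metres, "
        f"and elevation grade is {route.elevation_grade}.",
    ]
    if route.trend:
        parts.append(f"The walk is predominantly {route.trend}.")
    cover = [
        f"about {pct} percent of the walk {PHRASES[key]}"
        for key, pct in route.land_cover.items()
        if pct > 0
    ]
    if cover:
        sentence = ", ".join(cover) + "."
        parts.append(sentence[0].upper() + sentence[1:])
    return " ".join(parts)


if __name__ == "__main__":
    route = Route(22, "Rampart Head, Cumberland", "Little Caldew, Cumberland",
                  222, 1.0, land_cover={"wooded": 6, "urban": 9, "coast": 18})
    print(describe(route))
```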
The results are mixed:
- The models perform well on simple queries like "a walk by the seaside" or "an urban walk", associating them with routes having longer stretches along the coast or going through urban areas.
- For more complex queries targeting easier or harder hiking experiences, the models show varying degrees of success in ranking routes based on length, elevation gain, and steepness.
- Even models fine-tuned on the same dataset can disagree on which routes can be completed in under an hour.
- The models struggle to associate "long" and "very long" walks with higher kilometre values, and often rank shorter and flatter walks as more suitable for "someone seeking greater challenges".
The authors suggest future work should explore a more systematic approach to evaluating sentence transformers and other language models for geospatial understanding, focusing on model architecture, fine-tuning datasets, geospatial description generation, and evaluation methods.
Statistics
This is a 22 km walk that begins in Rampart Head, Cumberland and ends in Little Caldew, Cumberland. Total elevation gain is 222 metres, and elevation grade is 1.0.
About 6 percent of the walk is in a wooded area, about 9 percent of the walk goes through an urban area, about 7 percent of the walk is within green space, about 18 percent of the walk is along the coast, about twenty-seven percent of the walk is alongside a body of water.
This is a nineteen km walk that begins in Millbrook, Caerffili - Caerphilly and ends in Ynysfro Reservoirs, Casnewydd - Newport. Total elevation gain is seven hundred and twenty-four metres, and elevation grade is 3.7.
The walk is predominantly downhill. About seventeen percent of the walk is in a wooded area, about forty-five percent of the walk goes through an urban area, about eight percent of the walk is within green space, about 17 percent of the walk is alongside a body of water.
Quotes
"This is a circular, 12 km walk that begins and ends in Tarrant Gunville, Dorset. Total elevation gain is one hundred and ninety-one metres, and elevation grade is 1.6."
"This is a 6 km walk that begins in Pebbly Hill, Cotswold, Gloucestershire and ends in Stow-on-the-Wold, Cotswold, Gloucestershire. Total elevation gain is 215 metres, and elevation grade is 3.5. The walk is predominantly uphill."