StableToolBench: Enhancing Stability in Large-Scale Benchmarking for Tool Learning of Language Models
StableToolBench introduces a virtual API server and a stable evaluation system to address instability in tool learning benchmarks, so that model performance can be evaluated more consistently.