PHREEQC-MCQ-200: A Diagnostic Benchmark for Tool-Augmented Scientific Simulator Agents

arxiv.org · 2d 19h ago · ai