MedCalc-Bench: Evaluating Large Language Models for Medical Calculations
MedCalc-Bench: Evaluating Large Language Models for Medical Calculations
As opposed to evaluating computation and logic-based reasoning, current bench2 marks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive rea4 soning. While such qualitative capabilities are vital to medical diagnosis, in real5 world scenarios, doctors frequently use clinical calculators that …