27–29 May 2025, Natural, Exact and Technical Sciences
Facultad de Matemática y Computación
America/Havana timezone

The Accuracy-Efficiency Frontier of Large Language Models and Inference Strategies for Healthcare QA

Not scheduled
20m
Facultad de Matemática y Computación

Facultad de Matemática y Computación, Universidad de La Habana, San Lázaro y L, Vedado, Plaza de la Revolución, La Habana, Cuba

Speaker

Daniel Alejandro Valdés Pérez (Universidad de La Habana)

Description

Large Language Models (LLMs) have emerged as powerful tools for medical question answering, capable of assisting clinical decision-making by processing and synthesising vast amounts of medical knowledge. However, deploying LLMs in healthcare requires balancing accuracy, computational efficiency, and cost. This study investigates different inference strategies, namely single-call, ensemble, and episodic chain-of-thought (ECoT), to evaluate their impact on medical reasoning performance. We benchmark a range of models, including GPT-4, Claude 3.5, Llama 3, Mixtral, Gemini, and GPT-3.5, on the MedQA USMLE-H dataset to analyse their inherent trade-offs. Our results demonstrate that GPT-4 and Claude 3.5 achieve the highest accuracy but incur substantial computational costs. Llama 3 70B presents a cost-effective alternative with competitive accuracy, while Mixtral and Gemini offer moderate performance at lower costs. In our preliminary experiments, ECoT reasoning improves accuracy for some models without introducing computational overhead, though the approach requires further optimisation. These findings provide insights into selecting optimal inference strategies for deploying LLMs in medical applications.
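
To make the three strategies concrete, the sketch below outlines how they might be implemented. It is illustrative only: `query_model` is a hypothetical placeholder for a call to any of the benchmarked model APIs, the sample count and prompt wording are arbitrary, and the study's actual episodic chain-of-thought procedure may be richer than the generic step-by-step prompt shown here.

```python
from collections import Counter


def query_model(prompt: str, temperature: float = 0.0) -> str:
    """Hypothetical stand-in for a call to an LLM API (e.g. GPT-4, Claude 3.5).

    In a real benchmark this would send `prompt` to the model endpoint and
    return the text of its answer (here, one of the MCQ option letters).
    """
    raise NotImplementedError("replace with an actual API call")


def single_call(question: str) -> str:
    """Single-call strategy: one deterministic query per question."""
    return query_model(question, temperature=0.0)


def ensemble(question: str, n: int = 5) -> str:
    """Ensemble strategy: sample n answers and take a majority vote."""
    answers = [query_model(question, temperature=0.7) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


def chain_of_thought(question: str) -> str:
    """Simplified chain-of-thought style strategy: ask the model to reason
    step by step before committing to an answer, then extract that answer
    from the last line of the reply."""
    prompt = (
        f"{question}\n\n"
        "Reason through the clinical problem step by step, "
        "then state only the letter of the final answer on the last line."
    )
    reply = query_model(prompt, temperature=0.0)
    return reply.strip().splitlines()[-1].strip()
```

The trade-off the abstract describes is visible in the structure: the ensemble strategy multiplies API calls (and therefore cost) by the number of samples, while the chain-of-thought variant lengthens each individual call instead.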

Primary author

Daniel Alejandro Valdés Pérez (Universidad de La Habana)

Co-authors

Alejandro Piad Morffis (Universidad de La Habana), Dr. Juan Pablo Consuegra Ayala (Universidad de La Habana)

Presentation materials

No materials yet.