A Prompting Framework for Natural Language Processing in the Medical Field

Thesis Project created by Anim Mondal
The increasing digitisation of healthcare through the use of technology and artificial intelligence has affected the medical field in a multitude of ways. This thesis aims to investigate whether GPT-SW3, a large language model for the Swedish language, is capable of responding to healthcare tasks accurately given prompts and context. 

To this end, an evaluation framework was created. It consists of three components: general medical questions, an evaluation of medical reasoning, and conversations between a doctor and a patient, each designed to evaluate GPT-SW3's abilities in the respective area. Each component has a ground truth that is used when evaluating the responses. Overall, the framework presented in this thesis is a good starting point for evaluating the knowledge of an LLM. Based on the findings of this thesis, it can be concluded that GPT-SW3, while showing potential, may not be universally effective for all medical tasks. This answers, to some extent, the original question of whether GPT-SW3 is useful for Swedish healthcare. The results indicate that there are instances where GPT-SW3 struggles to provide accurate responses or generate appropriate content for medical tasks, suggesting that its performance may be context-dependent. However, there are also situations where GPT-SW3 exhibits promising performance and generates relevant and informative responses. Where GPT-SW3 performed well, the success could be attributed to the questions being similar to part of the training data. It is important to note that this thesis only evaluated the responses of one LLM on certain prompts with specific parameters. The findings of this thesis do not indicate that GPT-SW3 can, or should, be used in healthcare in its current form.
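
As an illustration only, the structure described above (three components, each item paired with a ground truth) could be sketched in Python as follows. This summary does not state how responses were scored, so the token-overlap F1 metric, the names `FrameworkItem`, `token_f1` and `evaluate`, and the `generate` callable standing in for the model are all assumptions made for this sketch, not the method used in the thesis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FrameworkItem:
    """One evaluation item (hypothetical structure, not taken from the thesis)."""
    component: str      # e.g. "general_questions", "medical_reasoning", "conversation"
    prompt: str         # text sent to the model
    ground_truth: str   # reference answer the response is compared against


def token_f1(response: str, ground_truth: str) -> float:
    """Token-overlap F1 between response and ground truth (an assumed metric)."""
    pred = response.lower().split()
    gold = ground_truth.lower().split()
    if not pred or not gold:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both texts.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(items, generate):
    """Return the mean score per component.

    `generate` is any callable mapping a prompt string to a response string;
    here it stands in for a call to the language model.
    """
    totals, counts = {}, {}
    for item in items:
        score = token_f1(generate(item.prompt), item.ground_truth)
        totals[item.component] = totals.get(item.component, 0.0) + score
        counts[item.component] = counts.get(item.component, 0) + 1
    return {component: totals[component] / counts[component] for component in totals}
```

Reporting one mean score per component keeps the results for basic questions, reasoning and conversation separate, which matters when performance differs between them. A lexical-overlap metric is a crude proxy for medical correctness, so in practice it would need to be complemented by expert review.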

Based on the results, GPT-SW3 is capable of dealing with specific medical tasks and, in specific instances, shows signs of understanding. In more basic tasks, GPT-SW3 manages to provide adequate answers to some questions. In more advanced scenarios, such as conversation and reasoning, GPT-SW3 struggles to provide coherent answers that resemble the conversation a human doctor would have.
These findings highlight the need for careful evaluation and understanding of the capabilities and limitations of LLMs in the medical domain. While LLMs can be a valuable tool in certain medical applications, in their current state they cannot be relied upon without human oversight. It is crucial to consider factors such as the quality and quantity of training data, model architecture, fine-tuning techniques, and task-specific requirements when considering the use of LLMs for medical tasks.

While there have been some great advancements in natural language processing, further work on a Swedish model is needed to create a model that is useful for healthcare. Whether that work lies in fine-tuning the weights of the models or in retraining the models with domain-specific data is left for subsequent works. Further research is warranted to explore ways to improve the performance of LLMs, and GPT-SW3 specifically, for medical tasks. It is also necessary to create a more comprehensive and diverse set of tasks to ensure that the LLMs can effectively analyse and understand medical information. This may include fine-tuning on domain-specific data and including more tasks in the framework. Additionally, ethical considerations, such as bias, fairness, and interpretability, should be thoroughly examined when using LLMs in medical applications to ensure responsible and ethical use.

In conclusion, LLMs offer significant potential for medical tasks, but their performance may vary depending on the task and context. Careful evaluation, validation, and utilisation of these models, in combination with human expertise, can lead to improved outcomes in the field of medicine. Future research and development efforts should continue to refine and optimise LLMs for medical tasks to unlock their full potential while considering ethical implications.

The published thesis can be found at https://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A1766791&dswid=-8693