Methods for reducing the cost of running large ML models
Author: Anton Sandberg
University: Chalmers University of Technology
Summary & Context of my research:
Large ML models, such as GPT-3.5 and DALL-E, have shown impressive performance on a wide range of natural language processing tasks as well as large-scale classification tasks. However, training and deploying these models can be costly in terms of computational resources, memory, and energy consumption. Additionally, their long startup times can make them difficult to use in real-time or on-demand applications.
These challenges have primarily been faced by large corporations and research institutions that have the resources to train and deploy such models. However, as machine learning tools and techniques become more widely available and accessible, smaller organisations are also interested in making use of the capabilities of these models.
This thesis investigates methods for reducing the cost of running large-scale machine learning models and for making them more accessible. By focusing on techniques such as fast model loading, pruning, proximity-based model selection, and partial sequence classification, it aims to make large ML models usable by a wider range of organisations and applications, with particular attention to reducing the time it takes to load a model.
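To give a concrete sense of one of these techniques, the sketch below applies magnitude-based (L1) pruning to a small feed-forward network using PyTorch's torch.nn.utils.prune module. The network architecture and the 30% sparsity level are illustrative assumptions only, not the specific models or settings studied in the thesis.

# Minimal sketch of magnitude-based pruning with PyTorch's pruning utilities.
# The layer sizes and sparsity level are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
)

# Remove the 30% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Report the fraction of weights that are now exactly zero.
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"Sparsity: {zeros / total:.2%}")

Pruning on its own only zeroes out weights; realising memory or latency savings typically requires sparse storage formats or hardware/runtime support for sparsity, which is part of why the cost question is non-trivial.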