Weighted Ensemble Distillation in Federated Learning with Non-IID Data
Abstract:
Federated distillation (FD) is a recent algorithmic approach to federated learning (FL) that allows clients to use heterogeneous model architectures. This is achieved by distilling aggregated local model predictions on an unlabeled auxiliary dataset into a global model. While standard FL algorithms are typically based on averaging local parameter updates over multiple communication rounds, FD can be performed in a single communication round, which is favorable in terms of communication cost when local models are large and the auxiliary dataset is small. However, both FD and standard FL algorithms suffer a significant performance loss when training data is not independently and identically distributed (non-IID) across clients. This thesis investigates the use of weighting schemes to improve the performance of FD in non-IID scenarios. In particular, the sample-wise weighting scheme FedED-w2 is proposed, where client predictions on auxiliary data are weighted according to their similarity with the client's local data. Data similarity is measured by the reconstruction loss obtained when auxiliary samples are passed through an autoencoder (AE) trained on the client's local data. Image classification experiments with convolutional neural networks show that FedED-w2 exceeds the test accuracy of FL baseline algorithms by up to 15 % on the MNIST and EMNIST datasets for varying degrees of non-IID data over 10 clients. On the CIFAR-10 dataset, however, FedED-w2 performs below the FL baselines, with up to 5 % lower test accuracy in the experiments.
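To illustrate the sample-wise weighting idea described above, the following is a minimal NumPy sketch, not the thesis implementation: each client's AE reconstruction loss on an auxiliary sample is mapped to a weight (here via a softmax over negative losses with a hypothetical temperature parameter, which is an assumption; the exact weight function of FedED-w2 may differ), and the weighted average of client predictions per auxiliary sample serves as the distillation target. The function and variable names are illustrative only.

```python
import numpy as np

def weighted_ensemble_predictions(client_probs, client_recon_losses, temperature=1.0):
    """Sample-wise weighted ensemble of client predictions on auxiliary data.

    client_probs:        (num_clients, num_aux, num_classes) client soft predictions
    client_recon_losses: (num_clients, num_aux) per-sample AE reconstruction losses
    A lower reconstruction loss is taken to indicate that the auxiliary sample
    resembles that client's local data, so its prediction receives a larger weight.
    """
    client_probs = np.asarray(client_probs)
    losses = np.asarray(client_recon_losses)

    # Convert losses to similarity scores and normalize over clients per sample.
    scores = -losses / temperature
    scores -= scores.max(axis=0, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)    # (C, N), sums to 1 over clients

    # Weighted average of client predictions for each auxiliary sample.
    return np.einsum("cn,cnk->nk", weights, client_probs)

# Example usage with random placeholder data (10 clients, 512 auxiliary samples, 10 classes):
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=(10, 512))   # shape (10, 512, 10)
losses = rng.uniform(size=(10, 512))
targets = weighted_ensemble_predictions(probs, losses)
print(targets.shape)                                  # (512, 10) distillation targets
```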