Distributed Assignment with Load Balancing for DNN Inference at the Edge

dc.contributor: Aalto University [en]
dc.contributor.author: Xu, Yuzhe [en_US]
dc.contributor.author: Mohammed, Thaha [en_US]
dc.contributor.author: Francesco, Mario Di [en_US]
dc.contributor.author: Fischione, Carlo [en_US]
dc.contributor.department: Department of Computer Science [en]
dc.contributor.groupauthor: Professorship Di Francesco Mario [en]
dc.contributor.groupauthor: Computer Science Professors [en]
dc.contributor.groupauthor: Computer Science - Computing Systems (ComputingSystems) [en]
dc.contributor.organization: KTH Royal Institute of Technology [en_US]
dc.description.abstract: Inference carried out on pre-trained deep neural networks (DNNs) is particularly effective as it does not require re-training and entails no loss in accuracy. Unfortunately, resource-constrained devices such as those in the Internet of Things may need to offload the related computation to more powerful servers, particularly at the network edge. However, edge servers have limited resources compared to those in the cloud; therefore, inference offloading generally requires dividing the original DNN into different pieces that are then assigned to multiple edge servers. Related approaches in the state of the art either make strong assumptions on the system model or fail to provide strict performance guarantees. This article specifically addresses these limitations by applying distributed assignment to deep neural network inference at the edge. In particular, it devises a detailed model of DNN-based inference, suitable for realistic scenarios involving edge computing. Optimal inference offloading with load balancing is also defined as a multiple assignment problem that maximizes proportional fairness. Moreover, a distributed algorithm for DNN inference offloading is introduced to solve such a problem in polynomial time with strong optimality guarantees. Finally, extensive simulations employing different datasets and DNN architectures establish that the proposed solution significantly improves upon the state of the art in terms of inference time (1.14 to 2.62 times faster), load balance (with a Jain's fairness index of 0.9), and convergence (an order of magnitude fewer iterations). [en]
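The abstract quantifies load balance via Jain's fairness index, a standard measure from the networking literature. As context only (this is not the paper's code), a minimal sketch of that formula applied to per-server loads:

```python
def jains_index(loads):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Returns 1.0 when all loads are equal (perfect balance) and
    approaches 1/n when a single server carries all the load.
    """
    n = len(loads)
    total = sum(loads)
    sum_sq = sum(x * x for x in loads)
    return (total * total) / (n * sum_sq)

# Equal loads across servers yield the maximum index of 1.0;
# skewed loads pull it down toward 1/n.
print(jains_index([4.0, 4.0, 4.0, 4.0]))  # -> 1.0
print(jains_index([2.0, 1.0]))            # -> 0.9
```

The reported index of 0.9 thus indicates loads that are close to, but not perfectly, balanced across the edge servers.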
dc.description.version: Peer reviewed [en]
dc.identifier.citation: Xu, Y, Mohammed, T, Francesco, M D & Fischione, C 2023, 'Distributed Assignment with Load Balancing for DNN Inference at the Edge', IEEE Internet of Things Journal, vol. 10, no. 2, 9882293, pp. 1053-1065. https://doi.org/10.1109/JIOT.2022.3205410 [en]
dc.identifier.other: PURE UUID: 55312a27-d1ae-4c57-96ca-c702cc246d1c [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/55312a27-d1ae-4c57-96ca-c702cc246d1c [en_US]
dc.identifier.other: PURE LINK: https://ieeexplore.ieee.org/document/9882293/ [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85137908524&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE LINK: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9882293 [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/87979190/distributed_assignment.pdf [en_US]
dc.relation.ispartofseries: IEEE Internet of Things Journal [en]
dc.relation.ispartofseries: article number 9882293 [en]
dc.subject.keyword: Task analysis [en_US]
dc.subject.keyword: Computational modeling [en_US]
dc.subject.keyword: Internet of Things [en_US]
dc.subject.keyword: Edge computing [en_US]
dc.subject.keyword: Computer architecture [en_US]
dc.title: Distributed Assignment with Load Balancing for DNN Inference at the Edge [en]
dc.type: A1 Original article in a scientific journal [fi]