Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Mohammed, Thaha [en_US]
dc.contributor.author: Joe-Wong, Carlee [en_US]
dc.contributor.author: Babbar, Rohit [en_US]
dc.contributor.author: Francesco, Mario Di [en_US]
dc.contributor.department: Department of Computer Science [en]
dc.contributor.groupauthor: Professorship Di Francesco Mario [en]
dc.contributor.groupauthor: Professorship Babbar Rohit [en]
dc.contributor.organization: Carnegie Mellon University [en_US]
dc.date.accessioned: 2020-10-02T06:22:33Z
dc.date.available: 2020-10-02T06:22:33Z
dc.date.issued: 2020-07 [en_US]
dc.description.abstract: Deep neural networks (DNNs) are the de facto solution behind many of today's intelligent applications, ranging from machine translation to autonomous driving. DNNs are accurate but resource-intensive, especially for embedded devices such as mobile phones and smart objects in the Internet of Things. To overcome the related resource constraints, DNN inference is generally offloaded to the edge or to the cloud. This is accomplished by partitioning the DNN and distributing computations between the two ends. However, most existing solutions simply split the DNN into two parts, one running locally or at the edge, and the other in the cloud. In contrast, this article proposes a technique to divide a DNN into multiple partitions that can be processed locally by end devices or offloaded to one or multiple powerful nodes, such as in fog networks. The proposed scheme includes both an adaptive DNN partitioning scheme and a distributed algorithm to offload computations based on a matching game approach. Results obtained by using a self-driving car dataset and several DNN benchmarks show that the proposed solution significantly reduces the total latency for DNN inference compared to other distributed approaches and is 2.6 to 4.2 times faster than the state of the art. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 10
dc.format.extent: 854-863
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Mohammed, T, Joe-Wong, C, Babbar, R & Francesco, M D 2020, Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading. In INFOCOM 2020 - IEEE Conference on Computer Communications, 9155237, Proceedings - IEEE INFOCOM, vol. 2020-July, IEEE, pp. 854-863, IEEE Conference on Computer Communications, Toronto, Canada, 06/07/2020. https://doi.org/10.1109/INFOCOM41043.2020.9155237 [en]
dc.identifier.doi: 10.1109/INFOCOM41043.2020.9155237 [en_US]
dc.identifier.isbn: 9781728164120
dc.identifier.issn: 0743-166X
dc.identifier.other: PURE UUID: 2b85c288-46a4-47c4-afce-2a5ac88f3692 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/2b85c288-46a4-47c4-afce-2a5ac88f3692 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85090292658&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/51832171/Mohammed_Distributed.Final_manuscript.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/46771
dc.identifier.urn: URN:NBN:fi:aalto-202010025736
dc.language.iso: en [en]
dc.publisher: IEEE
dc.relation.ispartof: IEEE Conference on Computer Communications [en]
dc.relation.ispartofseries: INFOCOM 2020 - IEEE Conference on Computer Communications [en]
dc.relation.ispartofseries: Proceedings - IEEE INFOCOM [en]
dc.relation.ispartofseries: Volume 2020-July [en]
dc.rights: openAccess [en]
dc.subject.keyword: distributed algorithm [en_US]
dc.subject.keyword: DNN inference [en_US]
dc.subject.keyword: matching game [en_US]
dc.subject.keyword: task offloading [en_US]
dc.subject.keyword: task partitioning [en_US]
dc.title: Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
dc.type.version: acceptedVersion