Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.author | Mohammed, Thaha | en_US |
dc.contributor.author | Joe-Wong, Carlee | en_US |
dc.contributor.author | Babbar, Rohit | en_US |
dc.contributor.author | Di Francesco, Mario | en_US |
dc.contributor.department | Department of Computer Science | en |
dc.contributor.groupauthor | Professorship Di Francesco Mario | en |
dc.contributor.groupauthor | Professorship Babbar Rohit | en |
dc.contributor.organization | Carnegie Mellon University | en_US |
dc.date.accessioned | 2020-10-02T06:22:33Z | |
dc.date.available | 2020-10-02T06:22:33Z | |
dc.date.issued | 2020-07 | en_US |
dc.description.abstract | Deep neural networks (DNNs) are the de facto solution behind many of today's intelligent applications, ranging from machine translation to autonomous driving. DNNs are accurate but resource-intensive, especially for embedded devices such as mobile phones and smart objects in the Internet of Things. To overcome the related resource constraints, DNN inference is generally offloaded to the edge or to the cloud. This is accomplished by partitioning the DNN and distributing the computations between the two ends. However, most existing solutions simply split the DNN into two parts, one running locally or at the edge, and the other in the cloud. In contrast, this article proposes a technique to divide a DNN into multiple partitions that can be processed locally by end devices or offloaded to one or more powerful nodes, such as in fog networks. The proposed scheme includes both an adaptive DNN partitioning scheme and a distributed algorithm to offload computations based on a matching game approach. Results obtained by using a self-driving car dataset and several DNN benchmarks show that the proposed solution significantly reduces the total latency for DNN inference compared to other distributed approaches and is 2.6 to 4.2 times faster than the state of the art. | en |
dc.description.version | Peer reviewed | en |
dc.format.extent | 10 | |
dc.format.extent | 854-863 | |
dc.format.mimetype | application/pdf | en_US |
dc.identifier.citation | Mohammed, T, Joe-Wong, C, Babbar, R & Di Francesco, M 2020, Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading. in INFOCOM 2020 - IEEE Conference on Computer Communications, 9155237, Proceedings - IEEE INFOCOM, vol. 2020-July, IEEE, pp. 854-863, IEEE Conference on Computer Communications, Toronto, Canada, 06/07/2020. https://doi.org/10.1109/INFOCOM41043.2020.9155237 | en |
dc.identifier.doi | 10.1109/INFOCOM41043.2020.9155237 | en_US |
dc.identifier.isbn | 9781728164120 | |
dc.identifier.issn | 0743-166X | |
dc.identifier.other | PURE UUID: 2b85c288-46a4-47c4-afce-2a5ac88f3692 | en_US |
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/2b85c288-46a4-47c4-afce-2a5ac88f3692 | en_US |
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85090292658&partnerID=8YFLogxK | en_US |
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/51832171/Mohammed_Distributed.Final_manuscript.pdf | en_US |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/46771 | |
dc.identifier.urn | URN:NBN:fi:aalto-202010025736 | |
dc.language.iso | en | en |
dc.publisher | IEEE | |
dc.relation.ispartof | IEEE Conference on Computer Communications | en |
dc.relation.ispartofseries | INFOCOM 2020 - IEEE Conference on Computer Communications | en |
dc.relation.ispartofseries | Proceedings - IEEE INFOCOM | en |
dc.relation.ispartofseries | Volume 2020-July | en |
dc.rights | openAccess | en |
dc.subject.keyword | distributed algorithm | en_US |
dc.subject.keyword | DNN inference | en_US |
dc.subject.keyword | matching game | en_US |
dc.subject.keyword | task offloading | en_US |
dc.subject.keyword | task partitioning | en_US |
dc.title | Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading | en |
dc.type | A4 Article in conference proceedings | en |
dc.type.version | acceptedVersion |