Networked Exponential Families for Big Data over Networks

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Jung, Alexander
dc.contributor.department: Department of Computer Science [en]
dc.contributor.groupauthor: Helsinki Institute for Information Technology (HIIT) [en]
dc.contributor.groupauthor: Professorship Jung Alexander [en]
dc.date.accessioned: 2020-12-31T08:40:53Z
dc.date.available: 2020-12-31T08:40:53Z
dc.date.issued: 2020
dc.description.abstract: The data generated in many application domains can be modeled as big data over networks, i.e., massive collections of high-dimensional local datasets related via an intrinsic network structure. Machine learning for big data over networks must jointly leverage the information contained in the local datasets and in their network structure. We propose networked exponential families as a novel probabilistic modeling framework for machine learning from big data over networks. We interpret the high-dimensional local datasets as realizations of a random process distributed according to some exponential family. Networked exponential families allow us to jointly leverage the information contained in the local datasets and their network structure in order to learn a tailored model for each local dataset. We formulate the task of learning the parameters of networked exponential families as a convex optimization problem. This optimization problem is an instance of the network Lasso and enforces a data-driven pooling (or clustering) of the local datasets according to their parameters for the exponential family. We derive an upper bound on the estimation error of network Lasso. This upper bound depends on the network structure and on the information geometry of the node-wise exponential families. The bound can be used to determine how much data must be collected or observed to ensure that network Lasso is accurate. We also provide a scalable implementation of network Lasso as message passing between adjacent local datasets. Such message passing is appealing for federated machine learning relying on edge computing. Finally, we note that the proposed method is also privacy-preserving because nodes share only parameter estimates, never raw data. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 13
dc.format.mimetype: application/pdf
dc.identifier.citation: Jung, A 2020, 'Networked Exponential Families for Big Data over Networks', IEEE Access, vol. 8, 9239959, pp. 202897-202909. https://doi.org/10.1109/ACCESS.2020.3033817 [en]
dc.identifier.doi: 10.1109/ACCESS.2020.3033817
dc.identifier.issn: 2169-3536
dc.identifier.other: PURE UUID: 48d2c03f-7fb0-4f5d-ac20-e547c9e4c921
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/48d2c03f-7fb0-4f5d-ac20-e547c9e4c921
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85096301446&partnerID=8YFLogxK
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/53652446/Jung_Networked.09239959.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/101478
dc.identifier.urn: URN:NBN:fi:aalto-2020123160299
dc.language.iso: en [en]
dc.publisher: IEEE
dc.relation.ispartofseries: IEEE Access [en]
dc.relation.ispartofseries: Volume 8, pp. 202897-202909 [en]
dc.rights: openAccess [en]
dc.subject.keyword: Big data
dc.subject.keyword: federated learning
dc.subject.keyword: lasso
dc.subject.keyword: networks
dc.subject.keyword: privacy-preserving machine learning
dc.subject.keyword: statistical machine learning
dc.title: Networked Exponential Families for Big Data over Networks [en]
dc.type: A1 Original article in a scientific journal (Alkuperäisartikkeli tieteellisessä aikakauslehdessä) [fi]
dc.type.version: publishedVersion
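The abstract casts parameter learning for networked exponential families as a network Lasso problem that is solved by message passing between adjacent nodes. The Python (NumPy) sketch below illustrates that idea under assumptions that are ours, not the paper's: every node uses a unit-variance Gaussian family whose natural parameter is the mean (log-partition A(w) = w^2/2, sufficient statistic x), the local loss is the sample-count-weighted negative log-likelihood, and the solver is the generic Chambolle-Pock primal-dual iteration, swapped in here as one standard way to realize edge and node "messages". The paper's exact objective, weighting and algorithm may differ, and the function `network_lasso_gaussian` and its parameters are hypothetical.

```python
import numpy as np


def network_lasso_gaussian(edges, weights, xbar, counts, lam, num_iters=10_000):
    """Hypothetical sketch: network Lasso for a networked unit-variance Gaussian.

    Assumed model: node i holds counts[i] samples with empirical mean xbar[i];
    its natural parameter w[i] is the mean, so A(w) = w**2 / 2. Nodes with
    counts[i] == 0 carry no data. We minimize

        sum_i counts[i] * (w[i]**2 / 2 - w[i] * xbar[i])
          + lam * sum_{e=(i,j)} weights[e] * |w[i] - w[j]|

    using the Chambolle-Pock primal-dual iteration (not necessarily the
    paper's algorithm).
    """
    xbar = np.asarray(xbar, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n, m = len(xbar), len(edges)

    # Incidence matrix D: (D @ w)[e] = w[i] - w[j] for edge e = (i, j).
    D = np.zeros((m, n))
    for e, (i, j) in enumerate(edges):
        D[e, i], D[e, j] = 1.0, -1.0
    bound = lam * np.asarray(weights, dtype=float)  # dual box per edge

    # Step sizes satisfying sigma * tau * ||D||^2 < 1 (needed for convergence).
    tau = sigma = 0.9 / np.linalg.norm(D, 2)

    w = np.zeros(n)      # primal: one parameter per node
    w_bar = w.copy()     # extrapolated primal iterate
    u = np.zeros(m)      # dual: one variable per edge

    for _ in range(num_iters):
        # Edge update: each edge combines its two endpoint values, takes an
        # ascent step and projects onto |u[e]| <= lam * weights[e].
        u = np.clip(u + sigma * (D @ w_bar), -bound, bound)
        # Node update: aggregate incident edge variables, then apply the
        # closed-form prox of the local Gaussian loss (identity if no data).
        v = w - tau * (D.T @ u)
        w_new = (v + tau * counts * xbar) / (1.0 + tau * counts)
        w_bar = 2.0 * w_new - w
        w = w_new
    return w


if __name__ == "__main__":
    # Toy chain 0-1-2-3-4-5 with a weak edge (2, 3) separating two clusters;
    # only the end nodes observe data.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    weights = [1.0, 1.0, 0.01, 1.0, 1.0]
    w = network_lasso_gaussian(edges, weights, [1.0, 0, 0, 0, 0, 5.0],
                               [10, 0, 0, 0, 0, 10], lam=1.0)
    print(np.round(w, 3))
```

Each iteration only touches an edge's two endpoints or a node's incident edges, which is what makes the scheme amenable to a message-passing implementation. For the toy chain, solving the optimality conditions by hand gives the exact minimizer w = (1.001, 1.001, 1.001, 4.999, 4.999, 4.999): the unlabeled nodes are pooled with the labeled node of their cluster, and the weak edge lets the two clusters keep different parameters.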
