Networked Exponential Families for Big Data over Networks
Access rights
openAccess
publishedVersion
URL
Journal Title
Journal ISSN
Volume Title
A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
Authors
Date
2020
Department
Major/Subject
Mcode
Degree programme
Language
en
Pages
13
Series
IEEE Access, Volume 8, pp. 202897-202909
Abstract
The data generated in many application domains can be modeled as big data over networks, i.e., massive collections of high-dimensional local datasets related via an intrinsic network structure. Machine learning for big data over networks must jointly leverage the information contained in the local datasets and their network structure. We propose networked exponential families as a novel probabilistic modeling framework for machine learning from big data over networks. We interpret the high-dimensional local datasets as the realizations of a random process distributed according to some exponential family. Networked exponential families allow us to jointly leverage the information contained in local datasets and their network structure in order to learn a tailored model for each local dataset. We formulate the task of learning the parameters of networked exponential families as a convex optimization problem. This optimization problem is an instance of the network Lasso and enforces a data-driven pooling (or clustering) of the local datasets according to their corresponding parameters for the exponential family. We derive an upper bound on the estimation error of network Lasso. This upper bound depends on the network structure and the information geometry of the node-wise exponential families. These insights provided by this bound can be used for determining how much data needs to be collected or observed to ensure network Lasso to be accurate. We also provide a scalable implementation of the network Lasso as a message-passing between adjacent local datasets. Such message passing is appealing for federated machine learning relying on edge computing. We finally note that the proposed method is also privacy-preserving because no raw data but only parameter (estimates) are shared among different nodes.Description
Keywords
Big data, federated learning, lasso, networks, privacy-preserving machine learning, statistical machine learning
Other note
Citation
Jung, A 2020, ' Networked Exponential Families for Big Data over Networks ', IEEE Access, vol. 8, 9239959, pp. 202897-202909 . https://doi.org/10.1109/ACCESS.2020.3033817