Distributed and scalable parsing solution for telecom network data


dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.advisor Marchal, Samuel
dc.contributor.advisor Vaje, Toivo
dc.contributor.author Khan, Muhammad
dc.date.accessioned 2020-01-26T18:12:16Z
dc.date.available 2020-01-26T18:12:16Z
dc.date.issued 2020-01-20
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/42779
dc.description.abstract The growing usage of mobile devices and the introduction of 5G networks have increased the significance of network data for the telecom business. The success of telecom organizations can depend on employing efficient data engineering techniques for transforming raw network data into useful information through analytics and machine learning (ML). Elisa Oyj, a Finnish telecommunications company, receives massive amounts of network data from network equipment manufactured by various vendors. The effectiveness of data analytics depends on efficient data engineering processes. This thesis presents a scalable data parsing solution that leverages Spark, a distributed programming framework, to parallelize parsing routines from an existing parsing solution. We design and deploy this solution as a component of the organization's data engineering pipeline to enable automation of data-centric operations. Experimental results indicate that the efficiency of the proposed solution depends heavily on the distribution of individual file sizes. The proposed parsing solution demonstrates reliability, scalability, and speed during empirical evaluation and processes 24 hours of network data within 3 hours. The main outcome of the project is an optimized setup with the minimum number of data partitions needed to ensure zero failures and thus minimum execution time. A shorter execution time leads to lower costs for the continuously running infrastructure provisioned in the cloud. en
dc.format.extent 69 + 8
dc.format.mimetype application/pdf en
dc.language.iso en en
dc.title Distributed and scalable parsing solution for telecom network data en
dc.type G2 Pro gradu, diplomityö fi
dc.contributor.school Perustieteiden korkeakoulu fi
dc.subject.keyword distributed computing en
dc.subject.keyword data engineering en
dc.subject.keyword big data parsing en
dc.subject.keyword cloud computing en
dc.identifier.urn URN:NBN:fi:aalto-202001261889
dc.programme.major Analytics and Data Science fi
dc.programme.mcode SCI3073 fi
dc.type.ontasot Master's thesis en
dc.type.ontasot Diplomityö fi
dc.contributor.supervisor Aura, Tuomas
dc.programme Master’s Programme in Computer, Communication and Information Sciences fi
local.aalto.electroniconly yes
local.aalto.openaccess yes

