Analysing Redundancies in World-Wide-Web
Sähkötekniikan korkeakoulu (School of Electrical Engineering)
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Date
2014-06-16
Major/Subject
Networking Technology
Mcode
S3029
Degree programme
TLT - Master’s Programme in Communications Engineering
Language
en
Pages
60 + 9
Abstract
The World Wide Web is one of the most relevant Internet applications and an important tool in our daily lives. Although it is widespread, web access is still limited by two factors: poor infrastructure in developing countries and the increasing bandwidth demand of services such as cloud computing and video streaming. Web caches have become a feasible way to improve web access, since upgrading network infrastructure is very expensive. Multiple studies in past years aimed to characterize web traffic in order to improve web caching. However, the WWW evolves very fast, and previous studies of it are no longer reliable. Moreover, many of these studies are based on passive measurements, collecting traces at the edge of an organization. As a result, we have little knowledge of current web traffic. This thesis attempts to study present-day web traffic and how caching systems can benefit from it. We have developed an active measurement system that downloads popular web pages during a short period of time. We analyse this data set from two points of view: comparing it with previously published web traffic results, and examining dynamic changes in web content. Finally, we investigate the unchanged content of this data set using two caching approaches: traditional web caching and packet caching. Among our findings, we observe similar bandwidth savings for both approaches as well as an increasing number of objects per page.
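As a rough illustration of the two caching approaches named in the abstract, the sketch below estimates how many bytes of a later download could be served from an earlier one. It is not the thesis's measurement system: the function names, the snapshot layout (a dict from URL to response body) and the fixed 64-byte chunk size are all assumptions made for this example, and real packet caches typically use content-defined chunking rather than fixed-size chunks.

```python
import hashlib


def object_cache_savings(old, new):
    """Bytes of `new` that an object-level (traditional) web cache could serve.

    `old` and `new` are hypothetical snapshots: dicts mapping URL -> body bytes.
    An object counts as a hit only if the same URL has a byte-identical body.
    """
    return sum(len(body) for url, body in new.items() if old.get(url) == body)


def chunk_cache_savings(old, new, chunk_size=64):
    """Bytes of `new` that a chunk-level (packet-style) cache could serve.

    Simplifying assumption: bodies are split into fixed-size chunks, and a
    chunk is a hit if an identical chunk appeared anywhere in `old`.
    """
    seen = set()
    for body in old.values():
        for i in range(0, len(body), chunk_size):
            seen.add(hashlib.sha1(body[i:i + chunk_size]).digest())

    saved = 0
    for body in new.values():
        for i in range(0, len(body), chunk_size):
            chunk = body[i:i + chunk_size]
            if hashlib.sha1(chunk).digest() in seen:
                saved += len(chunk)
    return saved
```

Dividing either result by the total number of bytes in `new` gives a bandwidth-saving ratio that can be compared across the two approaches. Chunk-level matching can also credit partially changed objects, which object-level caching treats as complete misses.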
Supervisor
Ott, Jörg
Thesis advisor
Sarolahti, Pasi
Keywords
caching, web site, traffic, bandwidth, web page, WWW, unchanged bytes