On Improving QoE of Remote Rendered Graphics

School of Science | Doctoral thesis (article-based) | Defence date: 2024-06-07
Pages
101 + app. 67
Aalto University publication series DOCTORAL THESES, 122/2024
A new class of interactive multimedia experiences leverages real-time remote rendering with video encoding to provide high-quality visual experiences on low-end devices, the so-called thin clients. The basic architecture entails offloading some or all of the rendering calculations of a complex computer graphics scene to a remote server, often a cloud graphics server, which renders the scene, encodes it, and sends it to a client as video. The thin client then decodes the video and displays it to the user. Cloud gaming and cloud virtual reality (VR) are two example use cases of such experiences. These applications have two principal constraints: downstream bandwidth and motion-to-photon (M2P) latency. The quality of experience (QoE) of such applications can be improved by reducing the downstream bandwidth needed for a given visual quality of the encoded video and by reducing the perceived M2P latency, that is, the perceived latency between a user action and the corresponding frame update at the client. In this thesis, we investigate avenues to improve the QoE of remotely rendered graphics applications by addressing these two constraints. We evaluate the feasibility of leveraging the characteristics of the Human Visual System (HVS) to reduce the downstream bandwidth needed for streaming high-quality graphics videos. Specifically, we investigate the phenomenon of foveation in the context of real-time video encoding and evaluate different parameterizations and schemes of foveated video encoding (FVE). We also investigate whether synergies exist between FVE and foveated rendering (FR). To address the low-latency requirements of interactive remotely rendered graphics applications, we investigate Machine Learning (ML) based approaches to predict the human motion kinematics that a rendering engine uses to render a scene. Specifically, we investigate head pose and gaze prediction using past pose and gaze data.
Accurate head pose and gaze information are critical for field-of-view (FoV) rendering and for foveated encoding or rendering, respectively. The investigated approaches focus on lightweight data ingest and low-latency inference to avoid introducing additional latency in the rendering and media delivery pipeline.
Supervising professor
Ylä-Jääski, Antti, Prof., Aalto University, Department of Computer Science, Finland
Thesis advisor
Siekkinen, Matti, DSc. (Tech.), Aalto University, Department of Computer Science, Finland
Keywords
remote rendering, foveation, gaze prediction, pose prediction
Other note
  • [Publication 1]: Gazi Karam Illahi, Thomas Van Gemert, Matti Siekkinen, Enrico Masala, Antti Oulasvirta and Antti Yla-Jaaski. Cloud Gaming with Foveated Video Encoding. ACM Transactions on Multimedia Computing, Communications and Applications, Volume 16, Issue 1, February 2020.
    DOI: 10.1145/3369110
  • [Publication 2]: Gazi Karam Illahi, Matti Siekkinen, Teemu Kamarainen and Antti Yla-Jaaski. Foveated streaming of real-time graphics. Proceedings of the 12th ACM Multimedia Systems Conference (MMSys ’21), Istanbul, Pages 214-226, June 2021.
    DOI: 10.1145/3458305.3463383
  • [Publication 3]: Gazi Karam Illahi, Ashutosh Vaishnav, Teemu Kamarainen, Matti Siekkinen, Mario de Francesco and Antti Yla-Jaaski. Learning to Predict Head Pose in Remotely Rendered Virtual Reality. Proceedings of the 14th ACM Multimedia Systems Conference (MMSys ’23), Vancouver, June 2023.
    DOI: 10.1145/3587819.3590972
  • [Publication 4]: Gazi Karam Illahi, Matti Siekkinen, Teemu Kamarainen and Antti Yla-Jaaski. Real-time gaze prediction in virtual reality. Proceedings of the 14th International Workshop on Immersive Mixed and Virtual Environment Systems (MMVE ’22), June 2022.