2nd Bandwidth Estimation Challenge at ACM MMSys 2024


Offline Reinforcement Learning for Bandwidth Estimation in Real Time Communications

Video conferencing systems have recently emerged as indispensable tools to sustain global business operations and enable accessible education by revolutionizing the way people connect, collaborate, and communicate despite physical barriers and geographical divides. The quality of experience (QoE) delivered by these systems to the end user depends on bandwidth estimation, which is the problem of estimating the variable capacity of the bottleneck link between the sender and the receiver over time. In real-time communication (RTC) systems, the bandwidth estimate serves as a target bit rate for the audio/video encoder, controlling the send rate from the client. Overestimating the capacity of the bottleneck link causes network congestion as the client sends data at a rate higher than what the network can handle. Network congestion is characterized by increased delays in packet delivery, jitter, and potential packet losses. For the user, congestion typically manifests as frequent resolution switches, video freezes, garbled speech, and audio/video desynchronization, to name a few. Underestimating the available bandwidth, on the other hand, causes the client to encode and transmit the audio/video streams at a lower rate than the network can support, which leads to underutilization and degraded QoE. Estimating the available bandwidth accurately is therefore critical to providing the best possible QoE to users in RTC systems. Nonetheless, bandwidth estimation faces a multitude of challenges: dynamic network paths between senders and receivers with fluctuating traffic loads, diverse wired and wireless access network technologies with distinct characteristics, different transmission protocols competing for bandwidth to carry side and cross traffic, and partial observability of the network, since only local packet statistics are available at the client side to base the estimate on.

To improve QoE for users in RTC systems, the ACM MMSys 2024 grand challenge focuses on the development of a deep learning-based bandwidth estimator using offline reinforcement learning (RL) techniques. A real-world dataset of observed network dynamics with objective metrics that reflect user-perceived audio/video quality in Microsoft Teams is released to train the deep RL policies for bandwidth estimation.  

Please NOTE that the intellectual property (IP) is not transferred to the challenge organizers, i.e., participants remain the owners of their code (when the code is made publicly available, an appropriate license should be added).

Challenge task 

Offline RL is a variant of RL where the agent learns from a fixed dataset of previously collected experiences, without interacting with the environment during training. In offline RL, the goal is to learn a policy that maximizes the expected cumulative reward based on the dataset. Offline RL is different from online RL where the agent can interact with the environment using its updated policy and learn from the feedback it receives online.  

In this challenge, participants are provided with a dataset of real-world trajectories for Microsoft Teams audio/video calls. Each trajectory corresponds to the sequence of high-dimensional observation vectors (o_n) computed from packet information received by the client during one audio/video call, along with the bandwidth estimates (b_n) predicted by different estimators (behavior policies). Objective signals that capture the user-perceived audio/video quality during the call are also provided. These objective signals are predicted by ML models whose predictions correlate highly with subjective audio and video quality scores as determined by ITU-T P.808 and P.910, respectively.
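
For concreteness, the snippet below shows one way a single trajectory could be loaded for training. The file name and field names used here (observations, bandwidth_predictions, audio_quality, video_quality) are illustrative assumptions; please consult the released dataset and its documentation for the exact schema.

    import json

    import numpy as np

    # Hypothetical file and field names; check the released dataset for the exact schema.
    with open("call_trajectory_00001.json") as f:
        call = json.load(f)

    observations = np.asarray(call["observations"], dtype=np.float32)                 # (N, obs_dim), one o_n per row
    behavior_estimates = np.asarray(call["bandwidth_predictions"], dtype=np.float32)  # (N,), b_n in bps
    audio_quality = np.asarray(call["audio_quality"], dtype=np.float32)               # (N,), objective audio score per step
    video_quality = np.asarray(call["video_quality"], dtype=np.float32)               # (N,), objective video score per step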

The goal of the challenge is to improve QoE for RTC system users, as measured by objective audio/video quality scores, by developing a deep learning-based policy model (a receiver-side bandwidth estimator, π) with offline RL techniques such as conservative Q-learning, inverse reinforcement learning, and constrained policy optimization, to name a few. To this end, participants are free to specify an appropriate reward function based on the provided dataset of observed network dynamics and objective metrics, the model architecture, and the training algorithm, provided that the developed model adheres to the requirements below.
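
As an illustration of the freedom in reward design, the sketch below combines the two provided objective quality signals into a simple per-step reward; the weights and functional form are assumptions, not a prescription from the organizers.

    import numpy as np

    def reward(audio_quality, video_quality, w_audio=0.5, w_video=0.5):
        """Hypothetical per-step reward r_n built from the provided objective quality signals."""
        audio_quality = np.asarray(audio_quality, dtype=np.float32)
        video_quality = np.asarray(video_quality, dtype=np.float32)
        return w_audio * audio_quality + w_video * video_quality

    # Example: per-step rewards for the trajectory loaded in the earlier sketch.
    # r = reward(audio_quality, video_quality)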

Challenge requirements

Failing to adhere to challenge rules will lead to disqualification from the challenge.

  1. The policy model (π) can be a stateless or a stateful model that outputs the bandwidth estimate (b_n) in bits per second (bps). The input to a stateless model is the observation vector (o_n), hence π_stateless: o_n → b_n. The inputs to a stateful model, on the other hand, are the observation vector (o_n) as well as the hidden (h_{n-1}) and cell (c_{n-1}) states, which are representations learned by the model to capture the underlying structure and temporal dependencies in the sequence of observation vectors, hence π_stateful: (o_n, h_{n-1}, c_{n-1}) → b_n. Please refer to the TensorFlow model class or PyTorch model class in the repository, which show the required inputs and outputs; a minimal sketch is also provided after this list. Any policy model that does not adhere to this input/output signature will be disqualified from the competition.
  2. Feature transformation and/or feature selection should be performed in a processing block within the model. For instance, the first layer (l_0) of the model can map the observation vector (o_n) to a desired agent state (s_n), l_0: o_n → s_n.
  3. Participants can specify an appropriate action space, e.g., a_n ∈ [0, 1]; however, the transformation from the action space to the bps space should be performed by the last layer (l_N) of the model such that the model predicts the bandwidth estimate in bps, l_N: a_n → b_n.
  4. Participants can specify an appropriate reward function for training the RL agent based on the provided signals: audio quality signal, video quality signal, and network metrics in the observation vector.
  5. To reduce the hardware requirements when the policy model is used for inference at the client side of the video conferencing system, the model size must be smaller than 10 MB and the inference latency must not exceed 5 ms on a single thread of an Intel Core i5 quad-core processor clocked at 2.4 GHz.
  6. In offline RL it is typical to use an actor-critic architecture. As long as the inputs to the actor/policy model adhere to the aforementioned requirements, any set of features can be used as inputs for the critic.
  7. Participants can train the model using PyTorch or TensorFlow, and the model should be exported to ONNX. To ensure that the organizers can run the model correctly, participants are required to share a small subset of their validation data along with their model outputs to be used for verification. We provide sample scripts in the repository to convert PyTorch and TensorFlow models. We have also released a baseline stateless model (MLP) as a reference, with an example script to run this model.
  8. Participants should submit their training code to the Open-source Software and Datasets track of the conference to receive a reproducibility badge.  
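
For illustration only, the PyTorch sketch below follows the required stateful input/output signature (requirement 1), performs feature transformation inside the model (requirement 2), maps a [0, 1] action to bps in the last layer (requirement 3), exports to ONNX (requirement 7), and does a rough size/latency check (requirement 5). The observation dimension, layer sizes, 8 Mbps upper bound, and ONNX input/output names are assumptions; the baseline model and conversion scripts in the repository define the exact tensor shapes and names the evaluator expects.

    # Minimal sketch of a stateful policy; all sizes and names below are assumptions.
    import os
    import time

    import torch
    import torch.nn as nn

    OBS_DIM = 150        # assumed observation size; use the dataset's actual dimension
    HIDDEN_DIM = 128
    MAX_BPS = 8_000_000  # assumed upper bound used to map the action to bps

    class StatefulPolicy(nn.Module):
        def __init__(self):
            super().__init__()
            self.l0 = nn.Linear(OBS_DIM, HIDDEN_DIM)   # requirement 2: l_0 maps o_n to s_n inside the model
            self.lstm = nn.LSTMCell(HIDDEN_DIM, HIDDEN_DIM)
            self.head = nn.Linear(HIDDEN_DIM, 1)

        def forward(self, obs, hidden, cell):
            s = torch.relu(self.l0(obs))
            hidden, cell = self.lstm(s, (hidden, cell))
            action = torch.sigmoid(self.head(hidden))  # action a_n in [0, 1]
            bandwidth = action * MAX_BPS               # requirement 3: l_N maps a_n to b_n in bps
            return bandwidth, hidden, cell

    model = StatefulPolicy().eval()
    obs = torch.zeros(1, OBS_DIM)
    h = torch.zeros(1, HIDDEN_DIM)
    c = torch.zeros(1, HIDDEN_DIM)

    # Requirement 7: export to ONNX (input/output names here are placeholders;
    # follow the official conversion scripts for the names the evaluator expects).
    torch.onnx.export(
        model, (obs, h, c), "policy.onnx",
        input_names=["obs", "hidden_state", "cell_state"],
        output_names=["bandwidth", "hidden_state_out", "cell_state_out"],
    )

    # Requirement 5 (rough check): model file size and single-thread inference latency.
    torch.set_num_threads(1)
    start = time.perf_counter()
    with torch.no_grad():
        model(obs, h, c)
    print(f"size: {os.path.getsize('policy.onnx') / 1e6:.2f} MB, "
          f"latency: {(time.perf_counter() - start) * 1e3:.2f} ms")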

Evaluation criteria and methodology 

Submitted models will be evaluated in a two-stage process:

  1. In the first stage, all the submitted models that adhere to the challenge requirements will be evaluated in our emulation platform on a set of validation network traces with multiple repetitions per network trace. Evaluating the submitted models in our emulation platform, which is a controlled lab environment of connected nodes with software to emulate different network links, enables us to estimate performance statistics and establish statistical significance between models.
  2. The top 3 models from stage 1 will be evaluated in calls conducted between nodes in our geographically distributed testbed. Each model will be used to make several calls, and performance will be reported based on those calls. The final ranking of those three models will be determined based on the results of the second evaluation stage.  

In either evaluation stage, the following scoring function will be used to rank the models: 

E_calls [ E_n [ objective video quality + objective audio quality ] ]

The outer expectation is across all calls and across repetitions per call, while the inner expectation is the temporal average of the objective metrics throughout a call.
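
A minimal sketch of this computation, assuming the per-step objective scores for every call and repetition are available as arrays, is shown below.

    import numpy as np

    def score(calls):
        """calls: list of (video_quality, audio_quality) per-step arrays, one entry per call/repetition."""
        per_call = [np.mean(np.asarray(v) + np.asarray(a)) for v, a in calls]  # inner expectation E_n[...]
        return float(np.mean(per_call))                                        # outer expectation E_calls[...]

    # Example with two hypothetical calls of different lengths:
    # score([(np.array([4.1, 4.0]), np.array([4.3, 4.2])),
    #        (np.array([3.5, 3.7, 3.6]), np.array([3.9, 4.0, 3.8]))])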

Registration procedure

There are two steps in registering for the challenge:

  1. Participants are required to fill out this form with the list of all participants, each participant's affiliation (including country), contact information for the participants, and the name of your team. A confirmation email will be sent once we receive your registration information.
  2. Participants need to register on the Open-source Software and Datasets submission system and follow the submission guidelines of the ODS track when submitting their work.

Organizers will communicate updates regarding the grand challenge and announce the availability of data, evaluation results, etc. via email. The challenge leaderboard for each evaluation stage will be posted on the challenge website.

Awards

If accepted, paper submissions to the Open-source Software and Datasets track will be included in the ACM MMSys 2024 conference proceedings, and code submissions will be given the appropriate reproducibility badge. Authors of accepted papers will also have a chance to present their work during the conference. Moreover, the winner and the runner-up, based on the results of the second evaluation stage, will be awarded cash prizes as described in the rules section.

Contact us

Participants with queries related to this grand challenge can either contact Sami Khairy by email or create an issue on the GitHub repository.