2nd Bandwidth Estimation Challenge at ACM MMSys 2024

Region: Global

Offline Reinforcement Learning for Bandwidth Estimation in Real Time Communications

Dataset Description

We will release a dataset of trajectories from Microsoft Teams audio/video calls, along with two objective signals that measure user-perceived audio and video quality in each call. The data is collected from peer-to-peer Microsoft Teams audio/video calls conducted between testbed nodes that are geographically distributed across many countries and continents. Nodes are connected to the internet through various Internet Service Providers (ISPs) using either wired or wireless connections. Calls have been conducted with different bandwidth estimators, i.e., behavior policies, including traditional methods such as Kalman-filtering-based estimators and WebRTC (Web Real-Time Communication), as well as different ML policies.

Each trajectory corresponds to one audio/video call leg and consists of a sequence of:

  1. Observation vector (o_n): a 150-dimensional vector computed from packet information received by the client during the call.
  2. Bandwidth estimate (b_n): the estimate predicted by the behavior policy.
  3. Objective audio quality (r_n^audio): received audio quality on a scale of [0, 5], with 5 being the highest.
  4. Objective video quality (r_n^video): received video quality on a scale of [0, 5], with 5 being the highest.
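To make the per-call structure concrete, the sketch below lays out one trajectory as NumPy arrays. The field names and the dictionary container are illustrative assumptions, not the released file format.

```python
import numpy as np

# Hypothetical layout of one trajectory (one call leg) with N time steps.
# Field names are placeholders, not the actual dataset schema.
N = 3  # number of time steps in the call
trajectory = {
    "observations": np.zeros((N, 150)),    # o_n: 150-dim observation per step
    "bandwidth_predictions": np.zeros(N),  # b_n: behavior policy estimate (bps)
    "audio_quality": np.full(N, 4.2),      # r_n^audio in [0, 5]
    "video_quality": np.full(N, 3.8),      # r_n^video in [0, 5]
}

assert trajectory["observations"].shape == (N, 150)
assert np.all((trajectory["audio_quality"] >= 0) & (trajectory["audio_quality"] <= 5))
```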

The observation vector at time step n encapsulates observed network statistics that characterize the state of the bottleneck link between the sender and receiver over the 5 most recent short-term monitor intervals (MIs) of 60 ms and the 5 most recent long-term MIs of 600 ms. Specifically, the observation vector tracks 15 different network features over the 5 short-term and 5 long-term MIs (15 features x (5 short-term MIs + 5 long-term MIs) = 150). All features are computed from packets received during the corresponding MI. The 15 features and their descriptions are as follows.

  1. Receiving rate: rate at which the client receives data from the sender during an MI, unit: bps.
  2. Number of received packets: total number of packets received in an MI, unit: packet.
  3. Received bytes: total number of bytes received in an MI, unit: bytes.
  4. Queuing delay: average delay of packets received in an MI minus the minimum packet delay observed so far, unit: ms.
  5. Delay: average delay of packets received in an MI minus a fixed base delay of 200 ms, unit: ms.
  6. Minimum seen delay: minimum packet delay observed so far, unit: ms.
  7. Delay ratio: average delay of packets received in an MI divided by the minimum delay of packets received in the same MI, unit: ms/ms.
  8. Delay average minimum difference: average delay of packets received in an MI minus the minimum delay of packets received in the same MI, unit: ms.
  9. Packet interarrival time: mean interarrival time of packets received in an MI, unit: ms.
  10. Packet jitter: standard deviation of the interarrival times of packets received in an MI, unit: ms.
  11. Packet loss ratio: probability of packet loss in an MI, unit: packet/packet.
  12. Average number of lost packets: average number of lost packets given that a loss occurs, unit: packet.
  13. Video packets probability: proportion of video packets among the packets received in an MI, unit: packet/packet.
  14. Audio packets probability: proportion of audio packets among the packets received in an MI, unit: packet/packet.
  15. Probing packets probability: proportion of probing packets among the packets received in an MI, unit: packet/packet.
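As a worked example of how a few of these features could be derived, the sketch below computes the receiving rate, packet count, received bytes, mean interarrival time, and jitter for a single 60 ms short-term MI. The per-packet records (arrival times and sizes) are made-up values for illustration, not taken from the dataset.

```python
import numpy as np

# Hypothetical per-packet records for one 60 ms short-term MI.
mi_duration_ms = 60.0
arrival_times_ms = np.array([0.0, 12.0, 25.0, 41.0, 55.0])   # arrival times in the MI
packet_sizes_bytes = np.array([1200, 1200, 800, 1200, 300])  # payload sizes

received_bytes = int(packet_sizes_bytes.sum())                       # feature 3
receiving_rate_bps = received_bytes * 8 / (mi_duration_ms / 1000.0)  # feature 1
num_packets = len(arrival_times_ms)                                  # feature 2
interarrival = np.diff(arrival_times_ms)
mean_interarrival_ms = interarrival.mean()                           # feature 9
jitter_ms = interarrival.std()                                       # feature 10
```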

The indices (zero-indexed) of a feature over the 5 short-term MIs are {(feature number − 1) × 10, …, feature number × 10 − 6}.

The indices (zero-indexed) of a feature over the 5 long-term MIs are {feature number × 10 − 5, …, feature number × 10 − 1}.
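The two index formulas can be written as a small helper that maps a 1-based feature number to its zero-indexed positions in the 150-dimensional observation vector:

```python
# Map a 1-based feature number to zero-indexed positions in the
# 150-dim observation vector, following the formulas above.
def short_term_indices(feature_number):
    start = (feature_number - 1) * 10
    return list(range(start, start + 5))

def long_term_indices(feature_number):
    start = feature_number * 10 - 5
    return list(range(start, start + 5))

# Feature 1 (receiving rate): short-term MIs at 0..4, long-term at 5..9.
assert short_term_indices(1) == [0, 1, 2, 3, 4]
assert long_term_indices(1) == [5, 6, 7, 8, 9]
# Feature 15 (probing packets probability) ends at index 149.
assert long_term_indices(15) == [145, 146, 147, 148, 149]
```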

Training and Validation Datasets 

We will release data from 18,859 calls conducted between testbed nodes, to be used as training and validation data by the participants. Participants are free to split the data into training and validation sets as they see fit. 
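One reasonable way to split is at the call level, so that all time steps of a call land in the same set. The sketch below shows such a split with a fixed seed; the call IDs and the 90/10 ratio are illustrative assumptions.

```python
import random

# Hypothetical call identifiers; the actual dataset's naming may differ.
call_ids = [f"call_{i:05d}" for i in range(18859)]

rng = random.Random(0)  # fixed seed for a reproducible split
rng.shuffle(call_ids)

split = int(0.9 * len(call_ids))  # e.g. a 90/10 train/validation split
train_ids, val_ids = call_ids[:split], call_ids[split:]
```

Splitting by call rather than by time step avoids leaking steps from the same call into both sets.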

Emulated Dataset 

We will also release a dataset of 9,405 emulated test calls that contains, in addition to the data described above, ground-truth information about the bottleneck link between the sender and receiver, namely its capacity and loss rate. In this dataset, these bottleneck characteristics are randomly varied throughout the duration of each test call to generate a diverse set of trajectories, with network dynamics that may not occur in the real world but are nevertheless important for enhancing state-action space coverage and learning generalizable policies.

Participants are free to use this dataset in conjunction with the real-world testbed dataset to train their policies, provided that the ground-truth information is not given as input to the model and the true environment remains partially observable (accessible only through the observation vector). Any model that takes ground-truth information as input will be disqualified from the contest.