{"id":1115967,"date":"2025-01-26T17:19:20","date_gmt":"2025-01-27T01:19:20","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-academic-program&#038;p=1115967"},"modified":"2025-07-01T00:55:35","modified_gmt":"2025-07-01T07:55:35","slug":"video-super-resolution-challenge-icme-2025","status":"publish","type":"msr-academic-program","link":"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/video-super-resolution-challenge-icme-2025\/","title":{"rendered":"ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing"},"content":{"rendered":"\n\n<p><\/p>\n\n\n\n\n\n\n<p><strong>Super-Resolution<\/strong> (SR) is a pivotal challenge in the field of computer vision, aiming to reconstructing a high-resolution (HR) image from its low-resolution (LR) counterpart [1]. Over the past decade, numerous single image Super-Resolution challenges have been organized, leading to substantial advancements in the field. These include the Image Super-Resolution [2]\u2013[5] and Efficient Super-Resolution [6]\u2013[8] challenge series.<\/p>\n\n\n\n<p>The Video Super-Resolution (VSR) task extends SR to the temporal domain, aiming to reconstruct a high-resolution video from a low-resolution one. Models for VSR may build upon single image SR techniques, employing various temporal information propagation methods such as local propagation (sliding windows), uni- or bi-directional propagation to enhance quality [9]. Alternatively, traditional upscaling methods like bicubic interpolation can be used, followed by restoration models to improve perceptual quality [10], [11]. <\/p>\n\n\n\n<p>VSR has been a focus in challenges such as NTIRE 2019 [12], NTIRE 2021 [1], and AIM 2024, with the latest exploring efficient VSR [13]. These challenges have addressed various scenarios, including Clean LR [1], [12], LR with motion blur [12], and LR with frame drops [1]. The NTIRE 2021 quality enhancement challenge considered input video encoded with H.265 under a fixed quantization parameter (QP) or fixed bitrate [10] without upscaling. In the AIM 2024 challenge, LR was encoded with AV1 and targeted efficient SR [13]. <\/p>\n\n\n\n<p>In the VSR challenges the performance of the models is evaluated using objective metrics like PSNR [14], SSIM [15], and LPIPS [16]. However, it has been shown that PSNR, SSIM, and MS-SSIM do not correlate well with subjective opinions [17], [18] which can lead to misleading model rankings when human users are the target audience. Moreover, models trained on synthetic data often suffer from error propagation when processing videos with various distortions present in real-world recordings [19]. Some models address this issue by including de-noising as a pre-processing step or limiting the number of frames processed together [19]. 
VSR has been a focus in challenges such as NTIRE 2019 [12], NTIRE 2021 [1], and AIM 2024, with the latest exploring efficient VSR [13]. These challenges have addressed various scenarios, including clean LR [1], [12], LR with motion blur [12], and LR with frame drops [1]. The NTIRE 2021 quality-enhancement challenge considered input video encoded with H.265 under a fixed quantization parameter (QP) or a fixed bitrate [10], without upscaling. The AIM 2024 challenge encoded the LR input with AV1 and targeted efficient SR [13].

In VSR challenges, model performance is evaluated using objective metrics such as PSNR [14], SSIM [15], and LPIPS [16]. However, it has been shown that PSNR, SSIM, and MS-SSIM do not correlate well with subjective opinions [17], [18], which can lead to misleading model rankings when human users are the target audience. Moreover, models trained on synthetic data often suffer from error propagation when processing videos with the various distortions present in real-world recordings [19]. Some models address this issue by including de-noising as a pre-processing step or by limiting the number of frames processed together [19]. However, our experiments indicate that these approaches can lead to other problems, such as unrealistic videos, flickering, or error propagation that appears in longer sequences (>200 frames).
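As a reference point for the objective metrics mentioned above, the sketch below computes per-frame PSNR and SSIM with scikit-image and averages them over a clip; loading frames via imageio is an assumption, and any image reader works.

```python
# Sketch: average PSNR/SSIM between a reference clip and a processed clip,
# computed per frame over aligned PNG sequences.
import numpy as np
import imageio.v3 as iio
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def clip_psnr_ssim(ref_paths, out_paths):
    psnrs, ssims = [], []
    for ref_p, out_p in zip(ref_paths, out_paths):
        ref = iio.imread(ref_p)
        out = iio.imread(out_p)
        psnrs.append(peak_signal_noise_ratio(ref, out, data_range=255))
        ssims.append(structural_similarity(ref, out, channel_axis=-1, data_range=255))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```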
## Citation

Please find the description of the challenge, methods, and results in our publication:
[ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing](https://arxiv.org/abs/2506.12269), ICME 2025.

```
@article{naderi2025icme,
  title={ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing},
  author={Naderi, Babak and Cutler, Ross and Cho, Juhee and Khongbantabam, Nabakumar and Ivkovic, Dejan},
  journal={arXiv preprint arXiv:2506.12269},
  year={2025}
}
```

## Registration procedure

**Registration is open!** To register for the challenge, participants are required to email the VSR Challenge organizers at [vsr_challenge@microsoft.com](mailto:vsr_challenge@microsoft.com) with the names of their team members, their emails and affiliations, the team name, the track(s) they are participating in, the team captain, and a tentative paper title. ~~Participants also need to register on the [Challenge CMT](https://cmt3.research.microsoft.com/VSRChallenge2025) site, where they can submit the enhanced clips.~~ Registration data is captured and stored in the US.

## Submission instructions

~~Please use the [Microsoft Conference Management Toolkit](https://cmt3.research.microsoft.com/VSRChallenge2025) for submitting the results.~~ **The test set will be posted one week before the challenge's end date, and only to the registered teams.** Instructions will be shared with the participating teams directly. ~~This instruction is tentative and may be updated before the release of the test set. After logging in, complete the following steps to submit the results:~~

1. ~~Choose "Create new submission" in the Author Console.~~
2. ~~Enter the title, abstract, and co-authors, and upload a *lastname*.txt file (it can be empty or contain additional information regarding the submission).~~
3. ~~Compress the enhanced result files into a single *lastname*.zip file, retaining the same folder and file names as the blind test set.~~
4. ~~After creating the submission, return to the "Author Console" (by clicking on "Submissions" at the top of the page) and upload the *lastname*.zip file via "Upload Supplementary Material".~~

**Contact us:** For questions, please contact [vsr_challenge@microsoft.com](mailto:vsr_challenge@microsoft.com).

The [Microsoft CMT service](https://cmt3.research.microsoft.com/) was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft, which bore all expenses, including costs for Azure cloud services as well as for software development and support.

## References

[1] Sanghyun Son, Suyoung Lee, Seungjun Nah, Radu Timofte, and Kyoung Mu Lee, "NTIRE 2021 challenge on video super-resolution," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 166–181.
[2] Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, et al., "NTIRE 2024 challenge on image super-resolution (x4): Methods and results," arXiv preprint arXiv:2404.09790, 2024.
[3] Yulun Zhang, Kai Zhang, Zheng Chen, Yawei Li, Radu Timofte, Junpei Zhang, Kexin Zhang, Rui Peng, Yanbiao Ma, Licheng Jia, et al., "NTIRE 2023 challenge on image super-resolution (x4): Methods and results," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1865–1884.
[4] Marcos V. Conde, Florin Vasluianu, and Radu Timofte, "BSRAW: Improving blind RAW image super-resolution," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 8500–8510.
[5] Andreas Lugmayr, Martin Danelljan, and Radu Timofte, "NTIRE 2020 challenge on real-world image super-resolution: Methods and results," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 494–495.
[6] Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, et al., "The ninth NTIRE 2024 efficient super-resolution challenge report," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 6595–6631.
[7] Marcos V. Conde, Eduard Zamfir, Radu Timofte, Daniel Motilla, Cen Liu, Zexin Zhang, Yunbo Peng, Yue Lin, Jiaming Guo, Xueyi Zou, et al., "Efficient deep models for real-time 4K image super-resolution. NTIRE 2023 benchmark and report," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1495–1521.
[8] Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, et al., "AIM 2020 challenge on efficient super-resolution: Methods and results," in Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part III. Springer, 2020, pp. 5–40.
[9] Kelvin C.K. Chan, Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy, "BasicVSR: The search for essential components in video super-resolution and beyond," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4947–4956.
[10] Ren Yang, "NTIRE 2021 challenge on quality enhancement of compressed video: Methods and results," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 647–666.
[11] Jeya Maria Jose Valanarasu, Rahul Garg, Andeep Toor, Xin Tong, Weijuan Xi, Andreas Lugmayr, Vishal M. Patel, and Anne Menini, "ReBotNet: Fast real-time video enhancement," arXiv preprint arXiv:2303.13504, 2023.
[12] Seungjun Nah, Radu Timofte, Shuhang Gu, Sungyong Baik, Seokil Hong, Gyeongsik Moon, Sanghyun Son, and Kyoung Mu Lee, "NTIRE 2019 challenge on video super-resolution: Methods and results," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[13] Marcos V. Conde, Zhijun Lei, Wen Li, Christos Bampis, Ioannis Katsavounidis, and Radu Timofte, "AIM 2024 challenge on efficient video super-resolution for AV1 compressed content," arXiv preprint arXiv:2409.17256, 2024.
[14] R. Gonzalez and R. Woods, Digital Image Processing, Prentice Hall, 3rd edition, 2006.
[15] Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
[16] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang, "The unreasonable effectiveness of deep features as a perceptual metric," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, June 2018, pp. 586–595.
[17] Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, and M. Manohara, "Toward a practical perceptual video quality metric," Tech. Rep., 2016.
[18] Kalpana Seshadrinathan, Rajiv Soundararajan, Alan Bovik, and Lawrence Cormack, "A subjective study to evaluate video quality assessment algorithms," in Human Vision and Electronic Imaging, 2010.
[19] Kelvin C.K. Chan, Shangchen Zhou, Xiangyu Xu, and Chen Change Loy, "Investigating tradeoffs in real-world video super-resolution," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5962–5971.

## Challenge description

This ICME grand challenge focuses on video super-resolution for video conferencing, where the low-resolution video is encoded with the H.265 codec at fixed QPs. The goal is to upscale the input LR videos by a specific factor and produce HR videos with perceptually enhanced quality (including compression-artifact removal). The entire challenge follows the low-delay scenario, in which **no future frames may be used to enhance the current frame** (a minimal sketch of such causal processing follows below).
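The sketch below only illustrates the constraint; `enhance` is a hypothetical recurrent step, not a prescribed architecture.

```python
# Sketch of the low-delay constraint: the output for frame t may depend only on
# frames 0..t. `enhance` is a hypothetical recurrent step that carries hidden
# state forward; there is no look-ahead buffer and no bi-directional pass.
def process_low_delay(lr_frames, enhance, state=None):
    hr_frames = []
    for frame in lr_frames:                 # strictly in display order
        hr, state = enhance(frame, state)   # state summarizes past frames only
        hr_frames.append(hr)
    return hr_frames
```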
Additionally, there are three tracks specific to video content:

- **Track 1:** General-purpose real-world video content, x4 upscaling
- **Track 2:** Talking-head videos, x4 upscaling
- **Track 3:** Screen-sharing videos, x3 upscaling

Separate training, validation, and test sets will be provided for each track by the organizers. The training set for Track 1 is based on the REDS dataset [20]; for Track 2 it is an extension of the VCD dataset [21] and includes real-world recordings; and for Track 3 it is based on publicly available datasets such as [22]–[24]. The validation set for each track includes 5 source video clips of 300 frames each, encoded with H.265 using 4 fixed QPs. The test set will be blind and include 20 source video clips per track, prepared similarly to the validation set.
See below for further information about each track. The training and validation sets will be published on the challenge's [GitHub page](https://github.com/microsoft/VSR-Challenge) on the start date.

### Evaluation criteria

The video clips in the test set are 10 seconds long and have a frame rate of 30 frames per second (FPS). The validation and test set videos for Tracks 1 and 2 are real-world videos. Participants are free to use any additional training data. The VSR method does not need to use additional frames, i.e., it can be a single-image SR model. Participating teams should process the provided test set using their models and submit the resulting video clips. Each team may participate in one or more tracks, using the same or different models for each track.

Models must follow the low-delay setting, i.e., they must not use any future frames or information when enhancing the current frame. Submissions will be ranked based on subjective quality, measured according to the crowdsourcing implementation of ITU-T Rec. P.910 [25]. We will calculate the Mean Opinion Score (MOS) for each processed clip. Submissions in each track will be ranked based on the average MOS and the 95% confidence interval of the MOS scores over all their processed clips (a sketch of this computation is shown below). For Track 3, an OCR-based Character Error Rate will also be included in the ranking score (see below). Additionally, we will provide objective metrics in the final report.
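As an illustration of the ranking statistics, the sketch below computes the average MOS and its 95% confidence interval; `votes` (a mapping from clip name to individual subjective ratings) and the t-distribution interval are assumptions, not necessarily the organizers' exact procedure.

```python
# Sketch: average MOS and a 95% confidence interval over per-clip MOS values.
import numpy as np
from scipy import stats

def mos_with_ci(votes: dict, confidence: float = 0.95):
    clip_mos = np.array([np.mean(v) for v in votes.values()])  # MOS per clip
    mean = clip_mos.mean()
    # t-based half-width of the CI over per-clip MOS values
    half_width = stats.t.ppf((1 + confidence) / 2, len(clip_mos) - 1) * stats.sem(clip_mos)
    return mean, (mean - half_width, mean + half_width)
```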
### Submission guidelines

Participants must process the video sequences in the blind test set using their model and submit the processed video sequences before the competition's end date. Only the latest submission before the deadline will be considered for each team and track. To be included in the competition, a complete submission is required: a processed video sequence, upscaled by the specified factor, for each input video clip. The number of frames must match between the input and processed clips (see the sanity-check sketch below). No external tools are allowed for enhancing the quality of processed clips.
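A sanity-check sketch (using OpenCV, an assumption) for verifying a processed clip against its input before submitting:

```python
# Verify that a processed clip matches the input frame count and is upscaled
# by the track's factor (4 for Tracks 1-2, 3 for Track 3).
import cv2

def check_submission(lr_path: str, out_path: str, scale: int) -> None:
    lr, out = cv2.VideoCapture(lr_path), cv2.VideoCapture(out_path)
    assert lr.get(cv2.CAP_PROP_FRAME_COUNT) == out.get(cv2.CAP_PROP_FRAME_COUNT), \
        "frame count mismatch"
    assert out.get(cv2.CAP_PROP_FRAME_WIDTH) == scale * lr.get(cv2.CAP_PROP_FRAME_WIDTH), \
        "unexpected output width"
    assert out.get(cv2.CAP_PROP_FRAME_HEIGHT) == scale * lr.get(cv2.CAP_PROP_FRAME_HEIGHT), \
        "unexpected output height"
    lr.release()
    out.release()
```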
Although this challenge does not target model efficiency, we require participating teams to report the runtime, number of parameters, and FLOPs of their models. Utility scripts will be provided by the organizers.
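The organizers' utility scripts are authoritative; purely as an illustration, a PyTorch sketch of the kind of numbers to report is shown below (FLOPs additionally require a profiler, e.g., fvcore's `FlopCountAnalysis`).

```python
# Sketch: parameter count and average per-frame runtime for a PyTorch model.
import time
import torch

def complexity_report(model: torch.nn.Module, sample: torch.Tensor, runs: int = 30):
    n_params = sum(p.numel() for p in model.parameters())
    model.eval()
    with torch.no_grad():
        model(sample)                       # warm-up pass
        # For GPU models, wrap the timed loop with torch.cuda.synchronize().
        start = time.perf_counter()
        for _ in range(runs):
            model(sample)
        runtime_ms = (time.perf_counter() - start) / runs * 1e3
    return {"params": n_params, "runtime_ms_per_frame": runtime_ms}
```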
Participants in the grand challenge must submit a paper following the regular format of the ICME 2025 conference. The papers should be single-blind submissions and should describe the models, any additional data used in training, and the performance on the validation set.

If authors are participating in multiple tracks, the differences between the models and/or training should be stated in the paper. The ranking in the final competition can be added to the camera-ready version if the paper is accepted. If no paper is submitted, or if it is shorter than 3 pages, the team will be removed from the competition. Authors of accepted papers will also have a chance to present their work during the ICME 2025 conference's grand-challenge session. Further guidelines will be provided on the challenge website.

## Tracks

### Track 1 – General-purpose real-world video content

This track addresses real-world video super-resolution without specifying a content type. The task involves **x4 upscaling** and removing compression artifacts from the input video. The model inputs are low-resolution (LR) videos encoded with H.265 using constant quantization parameters (QP). Models must not use future frames when upscaling and enhancing the current frame.

We provide training, validation, and test sets. Participants may also use other data for training their models; however, they should provide the details in their paper.
The training and validation sets are based on the REDS dataset ([Publications Datasets CV | Seungjun Nah](https://seungjunnah.github.io/Datasets/reds.html)) [20]. Apart from 5 video clips, the rest of the original REDS validation set is included in the training set. The training and validation sets are available on the [challenge's GitHub page](https://github.com/microsoft/VSR-Challenge/blob/main/docs/track1.md). **The blind test set will only be provided to registered teams, one week before the challenge end date.** We used low-resolution videos (originally downscaled by bicubic interpolation) from REDS and encoded them with H.265. Figure 1 illustrates the process.

*Figure 1 – Data flow diagram for Track 1*
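A sketch mirroring the Figure 1 data flow with FFmpeg (bicubic downscale, then H.265 encoding at a fixed QP); the QP value below is a placeholder, since the actual four QPs used for the sets are not enumerated here.

```python
# Sketch: produce an LR clip by bicubic downscaling and fixed-QP H.265 encoding.
import subprocess

def make_lr_clip(src: str, dst: str, scale: int = 4, qp: int = 32) -> None:
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=iw/{scale}:ih/{scale}:flags=bicubic",  # bicubic downscale
        "-c:v", "libx265", "-x265-params", f"qp={qp}",        # fixed-QP H.265
        dst,
    ], check=True)
```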
**The blind test set will only be provided to registered teams, one week before the challenge end date.** Figure 2 illustrates the data preparation process, and Figure 3 presents thumbnail images from a portion of the training set.

*Figure 2 – Data flow diagram for Track 2*

*Figure 3 – Thumbnail images of the Track 2 training set*

### Track 3 – Screen-sharing videos

This track addresses video super-resolution for screen-sharing content. The task involves **x3 upscaling** and removing compression artifacts from the input video. The videos are recorded from different productivity applications in which a user performs typical tasks. The model inputs are low-resolution (LR) videos encoded with H.265 using constant quantization parameters (QP). Models must not use future frames when upscaling and enhancing the current frame.

We provide training, validation, and test sets. Participants may also use other data for training their models; however, they should provide the details in their paper. The training and validation sets are available on the [challenge's GitHub page](https://github.com/microsoft/VSR-Challenge/blob/main/docs/track3.md). **The blind test set will only be provided to registered teams, one week before the challenge end date.**
For this track, the challenge metric is a combination of the subjective Mean Opinion Score (MOS) and the Character Error Rate (CER), determined by applying OCR to multiple sections of specific frames in the test set (see Equation 1). Sample code will be provided during the challenge for testing on the validation set.

*Equation 1 – Challenge score for Track 3 per clip, where CER is the average Character Error Rate over multiple frames.*
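A rough illustration of how an OCR-based CER can be measured on a cropped text region; pytesseract as the OCR engine and the local Levenshtein helper are assumptions, and the organizers' sample code defines the actual regions, frames, and engine.

```python
# Sketch: CER of an OCR'd text region against its reference transcription.
import pytesseract

def levenshtein(a: str, b: str) -> int:
    """Classic edit-distance DP between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def frame_cer(frame_region, reference_text: str) -> float:
    """frame_region: image crop containing text (numpy array or PIL image)."""
    ocr_text = pytesseract.image_to_string(frame_region).strip()
    return levenshtein(ocr_text, reference_text) / max(1, len(reference_text))
```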
Figure 4 illustrates the data preparation process, and Figure 5 presents thumbnail images from a portion of the training set.

*Figure 4 – Data flow diagram for Track 3*

*Figure 5 – Thumbnail images of a portion of the Track 3 training and validation sets*

## References

[20] Seungjun Nah, Sungyong Baik, Seokil Hong, Gyeongsik Moon, Sanghyun Son, Radu Timofte, and Kyoung Mu Lee, "NTIRE 2019 challenge on video deblurring and super-resolution: Dataset and study," in CVPR Workshops, June 2019.
[21] Babak Naderi, Ross Cutler, Nabakumar Singh Khongbantabam, Yasaman Hosseinkashi, Henrik Turbell, Albert Sadovnikov, and Quan Zou, "VCD: A video conferencing dataset for video compression," in ICASSP 2024 – 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 3970–3974.
[22] Shan Cheng, Huanqiang Zeng, Jing Chen, Junhui Hou, Jianqing Zhu, and Kai-Kuang Ma, "Screen content video quality assessment: Subjective and objective study," IEEE Transactions on Image Processing, vol. 29, pp. 8636–8651, 2020.
[23] Yingbin Wang, Xin Zhao, Xiaozhong Xu, Shan Liu, Zhijun Lei, Mariana Afonso, Andrey Norkin, and Thomas Daede, "An open video dataset for screen content coding," in 2022 Picture Coding Symposium (PCS). IEEE, 2022, pp. 301–305.
[24] Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan, "YouTube-8M: A large-scale video classification benchmark," arXiv preprint arXiv:1609.08675, 2016.
[25] Babak Naderi and Ross Cutler, "A crowdsourcing approach to video quality assessment," in ICASSP, 2024.
Below is the challenge's tentative schedule:

- January 28, 2025: Challenge website launch
- February 3, 2025: Competition start date; release of training and validation sets
- April 16, 2025: Publication of the test set
- ~~April 23, 2025~~ UPDATE: April 25, 2025: Competition end date; submission of the processed test set
- ~~April 30, 2025~~ UPDATE: May 4, 2025, 23:59 AoE: Paper submission deadline
- May 19, 2025: Paper acceptance notification
- **May 30, 2025: Camera-ready and paper registration**
- June 30 – July 4, 2025: ICME conference, VSR workshop, and announcement of challenge results
## Organizers

- [Babak Naderi](https://www.microsoft.com/en-us/research/people/babaknaderi/), Microsoft, babaknaderi@microsoft.com
- [Ross Cutler](https://rosscutler.github.io/), Microsoft, ross.cutler@microsoft.com
- Juhee Cho, Microsoft, juhcho@microsoft.com
- Nabakumar Khongbantabam, Microsoft, naba.kumar@microsoft.com
- Dejan Ivkovic, Microsoft, dejanivkovic@microsoft.com

## Datasets

- [Challenge's training and validation datasets](https://github.com/microsoft/VSR-Challenge)

### Download the test set

- The test set includes 80 video clips, each with 300 frames, for each track.
- A link to the list of clips in the test set will be sent to registered teams (hereafter trackX_testset.txt).
- Use the [utility script](https://github.com/microsoft/VSR-Challenge?tab=readme-ov-file#download) from the challenge's Git repository together with the trackX_testset.txt file to download the test set.
- Example command:
```
python downloader.py --list-of-files trackX_testset.txt --local-path LOCAL_PATH
```

### File naming and format

Each track consists of 80 test clips, and for each clip we require an upscaled and enhanced version in return. The processed clip should retain the same name as the input clip. Ensure the output clips are encoded using H.264 with CRF = 10, a pixel format of YUV420p, and 30 FPS (the same as the input).

Example FFmpeg command:

```
ffmpeg -framerate 30 -i [PNG_DIR]/%d.png -y -c:v libx264 -crf 10 -preset veryslow -pix_fmt yuv420p output.mp4
```
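As an illustrative wrapper around the command above (assuming one PNG folder per clip, named after the input clip), the sketch below encodes every processed clip with the required settings.

```python
# Sketch: batch-encode all processed clips with the required output settings
# (H.264, CRF 10, yuv420p, 30 FPS), keeping the input clip names.
import subprocess
from pathlib import Path

def encode_all(png_root: str, out_root: str) -> None:
    for clip_dir in sorted(Path(png_root).iterdir()):
        if clip_dir.is_dir():
            subprocess.run([
                "ffmpeg", "-framerate", "30", "-i", str(clip_dir / "%d.png"),
                "-y", "-c:v", "libx264", "-crf", "10",
                "-preset", "veryslow", "-pix_fmt", "yuv420p",
                str(Path(out_root) / f"{clip_dir.name}.mp4"),
            ], check=True)
```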
### Paper submission

Participants in the grand challenge must submit a paper following the same format as the main ICME conference (see details [here](https://2025.ieeeicme.org/author-information-and-submission-instructions/)) but **as single-blind submissions.** Papers must be no longer than 6 pages, including all text, figures, and references. The papers should describe the models, any additional data used in training, and the **complexity of the models** (including the number of parameters, FLOPs, and input shape).

If authors are participating in multiple tracks, the differences between the models and/or training should be stated in the paper. The ranking in the final competition can be added to the camera-ready version if the paper is accepted. If no paper is submitted, or if it is shorter than 3 pages, the team will be removed from the competition. Authors of accepted papers will also have a chance to present their work during the ICME 2025 conference's grand-challenge session.

Please use the [ICMEW2025 CMT website](https://cmt3.research.microsoft.com/ICMEW2025) to submit your paper (https://cmt3.research.microsoft.com/ICMEW2025).

## Results

Submissions were assessed using a subjective video quality assessment method, specifically the Comparison Category Rating (CCR) test from the crowdsourcing implementation of ITU-T Rec. P.910. In CCR, subjects view the source (ground truth) and processed clips, rating the quality of the second clip compared to the first. The presentation order is randomized, and the average ratings, reported as CMOS, indicate the processed clip's quality relative to the source. Ratings range from -3 (much worse) to +3 (much better), with 0 indicating no difference.
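As an illustration, reducing raw CCR votes to CMOS amounts to averaging the comparison ratings per clip; the `ccr_votes` mapping below is an assumption about how the votes are stored.

```python
# Sketch: CMOS per clip from CCR votes on the -3..+3 comparison scale.
import numpy as np

def cmos_per_clip(ccr_votes: dict) -> dict:
    # 0 means "no difference from the source"; negative means worse than it.
    return {clip: float(np.mean(votes)) for clip, votes in ccr_votes.items()}
```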
The average subjective scores in terms of CMOS, multiple objective metrics, and the ranking of the models evaluated in the challenge are presented in Table II. Two consecutive models are considered to have tied ranks when there is no significant difference between the distributions of their CMOS values in Tracks 1 and 2.

*Table II – Results overview: average CMOS, objective metrics, and model rankings.*