OmniMon: Re-architecting Network Telemetry with Resource Efficiency and Full Accuracy

  • Qun Huang
  • Haifeng Sun
  • Patrick P. C. Lee
  • Wei Bai
  • Feng Zhu
  • Yungang Bao

2020 ACM Special Interest Group on Data Communication

Published by ACM

Network telemetry is essential for administrators to monitor massive data traffic in a network-wide manner. Existing telemetry solutions often face a dilemma between resource efficiency (i.e., low CPU, memory, and bandwidth overhead) and full accuracy (i.e., error-free and holistic measurement). We break this dilemma with OmniMon, a network-wide architectural design that simultaneously achieves resource efficiency and full accuracy in flow-level telemetry for large-scale data centers. OmniMon carefully coordinates the collaboration among different types of entities across the whole network to execute telemetry operations, such that the resource constraints of each entity are satisfied without compromising full accuracy. It further addresses consistency in network-wide epoch synchronization and accountability in error-free packet loss inference. We prototype OmniMon in DPDK and P4. Testbed experiments on commodity servers and Tofino switches demonstrate the effectiveness of OmniMon over state-of-the-art telemetry designs.