{"id":167061,"date":"2014-10-01T00:00:00","date_gmt":"2014-10-01T00:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/probe-scheduling-for-efficient-detection-of-silent-failures\/"},"modified":"2018-10-16T21:21:28","modified_gmt":"2018-10-17T04:21:28","slug":"probe-scheduling-for-efficient-detection-of-silent-failures","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/probe-scheduling-for-efficient-detection-of-silent-failures\/","title":{"rendered":"Probe Scheduling for Efficient Detection of Silent Failures"},"content":{"rendered":"<div class=\"asset-content\">\n<p>Most discovery systems for silent failures work in two phases: a continuous monitoring phase that detects presence of failures through probe packets and a localization phase that pinpoints the faultyelement(s). We focus on the monitoring phase, where the goal is to balance the probing overhead with the cost associated with longer failure detection times.<\/p>\n<p>We formulate a general model for the underlying fundamental subset-test scheduling problem. We unify the treatment of schedulers and cost objectives and make several contributions: We propose <em>Memoryless schedules<\/em> &#8212; a natural subclass of stochastic schedules which is simple and suitable for distributed deployment. We show that the optimal memoryless schedulers can be efficiently computed by convex programs (for SUM objectives, which minimize average detection time) or linear programs (for MAX objectives, which minimize worst-case detection time), and surprisingly perhaps, are guaranteed to have expected detection times that are not too far off the (NP hard) stochastic optima. We study <em>Deterministic schedules<\/em>, which provide a guaranteed bound on the maximum (rather than expected) cost of undetected faults, but like general stochastic schedules, are NP hard to optimize. We develop novel efficient deterministic schedulers with provable approximation ratios.<\/p>\n<p>Finally, we conduct an experimental study, simulating our schedulers on real networks topologies, demonstrates a significant performance gains of the new memoryless and deterministic schedulers<br \/>\nover previous approaches.<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Most discovery systems for silent failures work in two phases: a continuous monitoring phase that detects presence of failures through probe packets and a localization phase that pinpoints the faultyelement(s). We focus on the monitoring phase, where the goal is to balance the probing overhead with the cost associated with longer failure detection times. We [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"Elsevier","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"MSR-TR-2014-113","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"NOTICE: this is the author's version of a work that was accepted for publication. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published at http:\/\/www.elsevier.com\/.","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2014-10-01","msr_highlight_text":"","msr_notes":"This is a full version with proofs of a Performance 2014 paper","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":2014,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13561,13547],"msr-publication-type":[193718],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-167061","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-algorithms","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_publishername":"Elsevier","msr_edition":"","msr_affiliation":"","msr_published_date":"2014-10-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"MSR-TR-2014-113","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"This is a full version with proofs of a Performance 2014 paper","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"216961","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"blackholes.pdf","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2014\/10\/blackholes.pdf","id":216961,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":216961,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2014\/10\/blackholes.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"editco","user_id":31713,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=editco"},{"type":"text","value":"Avinatan Hassidim","user_id":0,"rest_url":false},{"type":"text","value":"Haim Kaplan","user_id":0,"rest_url":false},{"type":"text","value":"Yishay Mansour","user_id":0,"rest_url":false},{"type":"text","value":"Danny Raz","user_id":0,"rest_url":false},{"type":"text","value":"Yoav Tzur","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"techreport","related_content":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/167061","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/167061\/revisions"}],"predecessor-version":[{"id":535189,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/167061\/revisions\/535189"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=167061"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=167061"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=167061"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=167061"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=167061"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=167061"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=167061"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=167061"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=167061"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=167061"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=167061"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=167061"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=167061"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}