{"id":2233,"date":"2011-11-09T08:40:00","date_gmt":"2011-11-09T08:40:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/msr_er\/2011\/11\/09\/building-a-net-quality-control-tool-for-next-generation-sequencing-technologies\/"},"modified":"2016-07-20T07:33:24","modified_gmt":"2016-07-20T14:33:24","slug":"building-a-net-quality-control-tool-for-next-generation-sequencing-technologies","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/building-a-net-quality-control-tool-for-next-generation-sequencing-technologies\/","title":{"rendered":"Building a .NET Quality Control Tool for Next-Generation Sequencing Technologies"},"content":{"rendered":"<p><span style=\"font-family: verdana,geneva; font-size: medium;\">The challenge of DNA sequencing is central to all genomics research, and while the technology has existed since the 1970s, today&rsquo;s massively parallel sequencing instruments are capable of producing gigabytes of raw genomic data quickly and increasingly cheaply. Reconstruction of a DNA sequence from this data (for example, through de novo assembly) is a compute-intensive task, and experimentation has shown that, for accurate reconstruction, a greater quantity of data is no substitute for higher-quality data. Unfortunately, not all sequencing technologies produce reliable and accurate results, and experimental data will always contain errors at varying rates.
Therefore, a preliminary quality control (QC) step is routinely applied to detect and counteract such sequencing errors.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\"><img decoding=\"async\" style=\"border: 0px currentColor; margin-right: auto; margin-left: auto; display: block;\" title=\"Sequence Quality Control Studio (SeQCoS) user interface\" alt=\"Sequence Quality Control Studio (SeQCoS) user interface\" src=\"https:\/\/msdnshared.blob.core.windows.net\/media\/MSDNBlogsFS\/prod.evol.blogs.msdn.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/01\/32\/81\/6746.SeQCoS_screenshot.png\" original-url=\"http:\/\/blogs.msdn.com\/resized-image.ashx\/__size\/496x390\/__key\/communityserver-blogs-components-weblogfiles\/00-00-01-32-81\/6746.SeQCoS_5F00_screenshot.png\" \/><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">QC of sequencing results can range from simple manual filtering procedures to comprehensive automated solutions. As a contribution to the development of QC tools, we present Sequence Quality Control Studio (SeQCoS), a Microsoft .NET software suite designed to perform a range of QC evaluations and post-QC manipulations of sequencing data. SeQCoS generates a series of standard plots that illustrate the quality of the input data. These plots (saved as JPEG files) report commonly used measurements, such as GC content (the proportion of guanine and cytosine nucleotide bases in a DNA sequence) and the distribution of quality scores at both the position-specific and sequence-specific levels.
To filter out low-quality sequences, SeQCoS also provides basic trimming and discarding functions for sequence files.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">At Microsoft Research, the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/research.microsoft.com\/en-us\/projects\/bio\/default.aspx\" target=\"_blank\">Microsoft Biology Initiative<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> team is collaborating with academic research groups on the sequencing of various organisms. To help ensure that a sequenced sample is not contaminated by other strains or by sequencing vectors, SeQCoS can optionally run <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/books\/NBK52637\/\" target=\"_blank\">NCBI BLAST<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> on PCs running the Windows operating system to search the input sequences against a BLAST-formatted database.
We provide a BLAST-formatted copy of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/VecScreen\/UniVec.html\" target=\"_blank\">NCBI UniVec<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a repository of vector sequences, adapters, linkers, and PCR (polymerase chain reaction) primers used in DNA sequencing; however, researchers can substitute a different database if they prefer.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\"><strong>About the Tools<\/strong><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">SeQCoS was written in C#, using the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/bio.codeplex.com\/\" target=\"_blank\">.NET Bio<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (formerly the Microsoft Biology Foundation [MBF]) bioinformatics toolkit and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/research.microsoft.com\/en-us\/projects\/sho\/\" target=\"_blank\">Sho<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a data analysis and visualization application. SeQCoS is freely available as open-source code under the Apache 2.0 license.
Further details and software downloads are available from <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/seqcos.codeplex.com\/\" target=\"_blank\">Sequence Quality Control Studio<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">.NET Bio is a library of common bioinformatics functions (file parsers, algorithms, and web service connectors) that simplify the creation of bioinformatics applications on the .NET platform. It is an open-source project, freely available for academic and commercial use under the Apache 2.0 license. Although the project was initiated by Microsoft Research, it is owned by the Outercurve Foundation, a non-profit organization, and is governed by a growing community of users and contributors.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">&mdash;<em>Kevin Ha, Microsoft Research Intern<\/em><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\"><strong>Learn More<\/strong><\/span><\/p>\n<ul>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/seqcos.codeplex.com\/\" target=\"_blank\">SeQCoS<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/research.microsoft.com\/en-us\/projects\/bio\/\" target=\"_blank\">Microsoft Biology Initiative<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append
glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/research.microsoft.com\/en-US\/projects\/bio\/mbf.aspx\" target=\"_blank\">.NET Bio<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (formerly Microsoft Biology Foundation)<\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/bio.codeplex.com\/\" target=\"_blank\">.NET Bio on CodePlex<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/research.microsoft.com\/en-us\/projects\/sho\/\" target=\"_blank\">Sho<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/research.microsoft.com\/en-us\/collaboration\/focus\/health\/default.aspx\" target=\"_blank\">Health and Wellbeing, Microsoft Research Connections<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The challenge of DNA sequencing is central to all genomics research, and while the technology has existed since the 1970s, today&rsquo;s massively parallel sequencing instruments are capable of producing gigabytes of raw genomic data quickly and increasingly cheaply.
Reconstruction of a DNA sequence from this data (for example, through de novo assembly) is a compute-intensive task, [&hellip;]<\/p>\n","protected":false},"author":32627,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[187097,194688,194829,195370,195648,195672,196123,196375,196391,196395,193504,196630,196631,187102,196761,197163,197192,187221],"research-area":[],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-2233","post","type-post","status-publish","format-standard","hentry","category-research-blog","tag-net-bio","tag-apache-2-0","tag-bioinformatics-toolkit","tag-dna-sequencing","tag-gc-content","tag-genomics","tag-kevin-ha","tag-microsoft-net","tag-microsoft-biology-foundation-mbf","tag-microsoft-biology-initiative","tag-microsoft-research","tag-ncbi-blast","tag-ncbi-univec","tag-open-source","tag-outercurve-foundation","tag-sequence-quality-control-studio-seqcos","tag-sho","tag-windows","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"November 9, 2011","formattedExcerpt":"The challenge of DNA sequencing is central to all genomics research, and while the technology has existed since the 1970s, today&rsquo;s massively parallel sequencing instruments are capable of producing gigabytes of raw genomic
data quickly and increasingly cheaply. Reconstruction of a DNA sequence from this data&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/2233","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/32627"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=2233"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/2233\/revisions"}],"predecessor-version":[{"id":262191,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/2233\/revisions\/262191"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=2233"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=2233"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=2233"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=2233"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=2233"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=2233"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=2233"},{"taxonomy":"msr-post-option","embeddable":
true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=2233"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=2233"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=2233"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=2233"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}