{"id":183130,"date":"2007-02-26T00:00:00","date_gmt":"2009-10-31T10:20:23","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/technical-computing-microsoft-lecture-series-on-the-history-of-parallel-computing-4\/"},"modified":"2016-09-09T10:02:21","modified_gmt":"2016-09-09T17:02:21","slug":"technical-computing-microsoft-lecture-series-on-the-history-of-parallel-computing-4","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/technical-computing-microsoft-lecture-series-on-the-history-of-parallel-computing-4\/","title":{"rendered":"Technical Computing @ Microsoft: Lecture Series on the History of Parallel Computing"},"content":{"rendered":"<div class=\"asset-content\">\n<p>Scalable Parallel Computing on Many\/Multicore Systems<\/p>\n<p>This set of lectures will review the application and programming model issues that will one must address when one gets chips with 32-1024 cores and \u201cscalable\u201d approaches will be needed to make good use of such systems. We will not discuss bit-level and instruction-level parallelism i.e. what happens on a possibly special purpose core, even though this is clearly important. We will use science and engineering applications to drive the discussion as we have substantial experience in these cases but we are interested in the lessons for commodity client and server applications that will be broadly used 5-10 years from now.<\/p>\n<p>We start with simple applications and algorithms from a variety of fields and identify features that makes it \u201cobvious\u201d that \u201call\u201d science and engineering run well and scalably in parallel. We explain why unfortunately it is equally obvious that there is no straightforward way of expressing this parallelism. 
Parallel hardware architectures are described in enough detail to understand performance and algorithm issues and the need for cross-architecture compatibility; however, in these lectures we will just be users of hardware. We must understand what features of multicore chips we can and should exploit. We can explicitly use the shared memory between cores or just enjoy its implications for very fast inter-core control and communication linkage. We note that parallel algorithm research has been hugely successful, although this success has reduced activity in an area that deserves new attention for the next generation of architectures.<\/p>\n<p>The parallel software environment is discussed at several levels, including programming paradigm, runtime and operating system. The importance of libraries, templates, kernels (dwarfs) and benchmarks is stressed. The programming environment has various tools, including compilers with parallelism hints such as OpenMP; tuners like ATLAS; messaging models; and parallelism and distribution support as in HPF, the HPCS languages, Co-array Fortran, Global Arrays and UPC. We also discuss the relevance of important general ideas like object-oriented paradigms (as, for example, in Charm++), functional languages and Software Transactional Memory. Streaming, pipelining, co-ordination, services and workflow are placed in context. Examples discussed in the last category include CCR\/DSS from Microsoft and the Common Component Architecture (CCA) from DoE. Domain-specific environments like Matlab and Mathematica are important, as there is no universal programming silver bullet; one will need interoperable, focused environments.<\/p>\n<p>We discuss performance analysis, including speedup, efficiency, scaled speedup and Amdahl&#8217;s law. We show how to relate performance to algorithm\/application structure and hardware characteristics. Applications will not get scalable parallelism accidentally, but only if there is an understandable performance model. 
We review some of the many pitfalls for both performance and correctness; these include deadlocks, race conditions, nodes that are busy doing something else, and the difficulty of second-guessing automatic parallelization methods. We describe the formal sources of overhead: load imbalance and communication\/control overhead, and ways to reduce them. We relate the blocking used in caching to that used in parallel computation. We note that load balancing was the one part of parallel computing that was easier than expected.<\/p>\n<p>We will mix simple idealized applications with \u201creal problems\u201d, noting that it is usually the simple problems that are the hardest, as they have poor computation-to-control\/communication ratios. We will explain the parallelism in several application classes, including, for science and engineering: Finite Difference, Finite Elements, FFT\/Spectral, Meshes of all sorts, Particle Dynamics, Particle-Mesh, and Monte Carlo methods. Some applications, like Image Processing, Graphics, Cryptography and Media coding\/decoding, have features similar to well-understood science and engineering problems. We emphasize that nearly all applications are built hierarchically from more \u201cbasic applications\u201d with a variety of different structures and natural programming models. Such application composition (co-ordination, workflow) must be supported with a common run-time. We contrast the decomposition needed in most \u201cbasic parallel applications\u201d with the composition supported in workflow and Web 2.0 mashups. Looking at broader application classes, we should cover: Internet applications and services; artificial intelligence, optimization and machine learning; divide-and-conquer algorithms; tree-structured searches like computer chess; and applications generated by the data deluge, including access, search, and the assimilation of data into simulations. 
There has been growing interest in Complex Systems, whether for critical infrastructure like energy and transportation, the Internet itself, commodity gaming, computational epidemiology or the original war games. We expect Discrete Event Simulations (such as DoD\u2019s High Level Architecture, HLA) to grow in importance, as they naturally describe complex systems and can clearly benefit from multicore architectures. We will discuss the innovative Sequential Dynamical Systems approach used in EpiSims, TranSims and other critical infrastructure simulation environments. In all applications we need to identify the intrinsic parallelism and the degrees of freedom that can be parallelized, and distinguish small parallelism (local to a core) from large parallelism (scalable across many cores).<\/p>\n<p>We will not forget many critical non-technical issues, including \u201cWho programs \u2013 everybody or just the Marine Corps?\u201d; \u201cThe market for science and engineering was small, but it will be large for general multicore\u201d; \u201cThe program exists and can\u2019t be changed\u201d; and \u201cWhat features will the next hardware\/software release support, and how should I protect myself from change?\u201d<\/p>\n<p>We will summarize lessons and relate them to application and programming model categories. In the last lecture, or at the end of all the lectures, we encourage the audience to bring their own multicore applications or programming models so we can discuss examples that interest you.<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Scalable Parallel Computing on Many\/Multicore Systems This set of lectures will review the application and programming model issues that one must address when one gets chips with 32-1024 cores and \u201cscalable\u201d approaches are needed to make good use of such systems. We will not discuss bit-level and instruction-level parallelism, i.e. 
what happens on [&hellip;]<\/p>\n","protected":false},"featured_media":194912,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_hide_image_in_river":0,"footnotes":""},"research-area":[],"msr-video-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-session-type":[],"msr-impact-theme":[],"msr-pillar":[],"msr-episode":[],"msr-research-theme":[],"class_list":["post-183130","msr-video","type-msr-video","status-publish","has-post-thumbnail","hentry","msr-locale-en_us"],"msr_download_urls":"","msr_external_url":"https:\/\/youtu.be\/EEM3bijh_yA","msr_secondary_video_url":"","msr_video_file":"","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/183130","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-video"}],"version-history":[{"count":0,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/183130\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/194912"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=183130"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=183130"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=183130"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=183130"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\
/v2\/msr-post-option?post=183130"},{"taxonomy":"msr-session-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-session-type?post=183130"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=183130"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=183130"},{"taxonomy":"msr-episode","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-episode?post=183130"},{"taxonomy":"msr-research-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-theme?post=183130"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}