EoCoE and EXA2PRO are pleased to invite all of you to participate in a joint workshop that will showcase our respective breakthrough work in the field of computer science.
This workshop will run over three full days, from February 22 to February 24, 2021, and will highlight several complementary technologies and research efforts that our respective projects are developing and carrying out.
All sessions, including the hands-on sessions, will be held remotely. The hands-on sessions will be limited to 20 participants to preserve the quality of the training.
Registration is necessary so that we can communicate further information and the links to the virtual conference rooms before the workshop.
EXA2PRO website: https://exa2pro.eu/
EoCoE website: https://www.eocoe.eu/
Organizing committee:
Overview of the EXA2PRO project and EXA2PRO framework
We briefly present the main concepts of the EXA2PRO high-level programming model: SkePU skeletons (i.e., generic C++ program constructs with multiple backends supporting heterogeneous systems and clusters), multi-variant software components with explicit metadata annotation, smart data-containers for array-based data types, and the XPDL platform modeling framework.
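To give a flavour of the high-level interface, here is a minimal sketch of a SkePU Map skeleton call (assuming the SkePU 3 C++ interface; the example and its names are illustrative, see the SkePU documentation for the authoritative API):

#include <skepu>

// User function in plain C++; the SkePU precompiler generates
// backend variants (sequential, OpenMP, CUDA, OpenCL, cluster).
float add(float a, float b)
{
    return a + b;
}

int main()
{
    // Instantiate a Map skeleton over two input element streams.
    auto vec_add = skepu::Map<2>(add);

    // Smart data-containers: host/device transfers are handled
    // automatically and lazily by the runtime.
    skepu::Vector<float> a(1000, 1.0f), b(1000, 2.0f), res(1000);

    vec_add(res, a, b); // executed on the selected backend
    return 0;
}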
We present the concepts of the EXA2PRO low-level programming model: StarPU task-based programming (https://starpu.gitlabpages.inria.fr/), which provides optimized execution on clusters of heterogeneous platforms. We will start with the basic principles of task-based programming. We will then give an overview of the set of features and optimizations that task-based programming makes possible at little extra cost to the programmer, from optimized scheduling to efficient distributed execution.
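As a first illustration of the task-based model, the sketch below registers a vector with StarPU and submits a single scaling task through the C interface (also usable from C++); the kernel and variable names are illustrative, not part of the tutorial material:

#include <stdint.h>
#include <starpu.h>

// CPU implementation of the kernel; StarPU passes the registered
// data buffers and the packed task argument (the scaling factor).
void scal_cpu(void *buffers[], void *cl_arg)
{
    float factor;
    starpu_codelet_unpack_args(cl_arg, &factor);

    struct starpu_vector_interface *v =
        (struct starpu_vector_interface *)buffers[0];
    float *data = (float *)STARPU_VECTOR_GET_PTR(v);
    unsigned n = STARPU_VECTOR_GET_NX(v);

    for (unsigned i = 0; i < n; i++)
        data[i] *= factor;
}

static struct starpu_codelet scal_cl;

int main(void)
{
    static float vec[1024];
    for (unsigned i = 0; i < 1024; i++)
        vec[i] = 1.0f;
    float factor = 3.14f;

    starpu_init(NULL);

    // A codelet describes the kernel: its implementations, how many
    // data buffers it accesses, and in which access mode.
    starpu_codelet_init(&scal_cl);
    scal_cl.cpu_funcs[0] = scal_cpu;
    scal_cl.nbuffers = 1;
    scal_cl.modes[0] = STARPU_RW;

    // Register application data with the runtime to obtain a handle.
    starpu_data_handle_t handle;
    starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
                                (uintptr_t)vec, 1024, sizeof(float));

    // Submit one task; StarPU schedules it and manages data transfers.
    starpu_task_insert(&scal_cl,
                       STARPU_RW, handle,
                       STARPU_VALUE, &factor, sizeof(factor),
                       0);

    starpu_task_wait_for_all();
    starpu_data_unregister(handle);
    starpu_shutdown();
    return 0;
}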
Overview of the EoCoE project
Links:
- project website - https://pdi.julien-bigot.fr/master/
Large-scale infrastructures for distributed and parallel computing offer thousands of computing nodes to their users to satisfy their computing needs. As the need for massively parallel computing increases in industry and development, cloud infrastructures and computing centers are being forced to grow in size and to transition to new computing technologies. While the advantage for users is clear, this evolution imposes significant challenges, such as energy consumption and fault tolerance. Fault tolerance is even more critical in infrastructures built on commodity hardware. Recent work has shown that large-scale machines built with commodity hardware experience more failures than previously thought.
Leonardo Bautista Gomez, Senior Researcher at the Barcelona Supercomputing Center, will focus on how to guarantee high reliability for high-performance applications running on large infrastructures. In particular, he will cover all the technical content necessary to implement scalable multilevel checkpointing for tightly coupled applications. This will include an overview of the internals of the FTI library and an explanation of how multilevel checkpointing is implemented today, together with examples that the audience can test and analyze on their own laptops, so that they learn how to use FTI in practice and can ultimately transfer that knowledge to their production systems.
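For orientation, here is a minimal sketch of what using FTI looks like in an MPI code (assuming the documented FTI C API; the configuration file name, sizes and iteration count are placeholders):

#include <mpi.h>
#include <fti.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    // FTI reads its multilevel checkpoint configuration (levels L1-L4,
    // checkpoint intervals, ...) from an INI file; the name is a placeholder.
    char config[] = "config.fti";
    FTI_Init(config, MPI_COMM_WORLD);

    // After FTI_Init, application ranks must communicate through
    // FTI_COMM_WORLD rather than MPI_COMM_WORLD.
    int rank;
    MPI_Comm_rank(FTI_COMM_WORLD, &rank);

    // Register ("protect") the data that must survive a failure.
    int it = 0;
    static double field[1000];
    FTI_Protect(0, &it, 1, FTI_INTG);
    FTI_Protect(1, field, 1000, FTI_DBLE);

    for (; it < 10000; it++)
    {
        // Takes a checkpoint at the intervals set in the configuration
        // file, and restores the protected data after a restart.
        FTI_Snapshot();

        // ... one iteration of the solver updating `field` ...
    }

    FTI_Finalize();
    MPI_Finalize();
    return 0;
}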
This session is limited to 20 participants.
Tutorial website: https://skepu.github.io/tutorials/eocoe-exa2pro-2021/
SkePU (https://skepu.github.io) is a C++ based high-level programming framework for heterogeneous parallel systems and clusters. Its programming interface is based on so-called algorithmic skeletons, i.e., predefined generic program constructs based on higher-order functions that express common parallelizable patterns such as map, reduce, stencil, or scan. These skeletons are customized with problem-specific C++ code, and sequential and parallel implementations are available for different execution platforms. From the single, quasi-sequential SkePU source code, platform-specific parallel code is generated automatically. In this presentation, we give a more in-depth overview of the SkePU concepts and programming interface, preparing for the subsequent exercise sessions.
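As a small illustration of how skeletons compose, the following sketch expresses a dot product as a fused MapReduce skeleton (again assuming the SkePU 3 interface; illustrative only):

#include <skepu>

// Per-element multiply and a binary reduction operator, both written
// as ordinary C++ functions and customized for the problem at hand.
float mult(float a, float b) { return a * b; }
float plus(float a, float b) { return a + b; }

int main()
{
    // Fused map+reduce over two input vectors: a dot product.
    auto dot = skepu::MapReduce<2>(mult, plus);

    skepu::Vector<float> x(1000, 2.0f), y(1000, 0.5f);

    float result = dot(x, y); // single skeleton call
    return (result == 1000.0f) ? 0 : 1;
}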
We demonstrate programming in SkePU with a complete example jointly for all participants, and also discuss some performance aspects and advanced issues.
For this session we expect participants to install, or to have installed, SkePU on a Linux system accessible to them (a GPU or cluster architecture is not required). For fast installation we provide a binary x86-64 Linux distribution of SkePU (for Ubuntu 18.04 and possibly other Linux variants) as well as a Docker image; it is also possible to install SkePU from source, see https://skepu.github.io. We encourage participants to bring their own problems or C++ application codes for porting to SkePU. As an alternative, we will provide further example problems for participants to experiment with programming in SkePU at their own pace. We will set up a shared queueing mechanism for providing individual assistance on a first-come, first-served basis.
This hands-on session is limited to 20 participants.
This hands-on session is limited to 20 participants.
https://starpu.gitlabpages.inria.fr/tutorials/2021-02-EoCoE/
This session will present the basics of the low-level task-based programming interface provided by StarPU. It will discuss the C and Fortran interfaces for defining computation kernels and tasks. It will then describe how application data is registered to the runtime, and possibly partitioned dynamically.
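To make the data-management part concrete, the sketch below registers a vector and dynamically partitions it into blocks on which independent tasks are submitted; it reuses the idea of the hypothetical scaling codelet from the earlier sketch and is illustrative only:

#include <stdint.h>
#include <string.h>
#include <starpu.h>

#define NX     2048
#define NPARTS 4

// Kernel working on one block of the partitioned vector.
void scal_cpu(void *buffers[], void *cl_arg)
{
    (void)cl_arg;
    struct starpu_vector_interface *v =
        (struct starpu_vector_interface *)buffers[0];
    float *data = (float *)STARPU_VECTOR_GET_PTR(v);
    unsigned n = STARPU_VECTOR_GET_NX(v);
    for (unsigned i = 0; i < n; i++)
        data[i] *= 2.0f;
}

static struct starpu_codelet scal_cl;

int main(void)
{
    static float vec[NX];
    for (unsigned i = 0; i < NX; i++)
        vec[i] = 1.0f;

    starpu_init(NULL);

    starpu_codelet_init(&scal_cl);
    scal_cl.cpu_funcs[0] = scal_cpu;
    scal_cl.nbuffers = 1;
    scal_cl.modes[0] = STARPU_RW;

    // Register the whole array once...
    starpu_data_handle_t handle;
    starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
                                (uintptr_t)vec, NX, sizeof(float));

    // ...then split it dynamically into NPARTS equal blocks; each block
    // becomes a sub-handle on which tasks can be submitted independently.
    struct starpu_data_filter f;
    memset(&f, 0, sizeof(f));
    f.filter_func = starpu_vector_filter_block;
    f.nchildren = NPARTS;
    starpu_data_partition(handle, &f);

    for (unsigned i = 0; i < NPARTS; i++)
        starpu_task_insert(&scal_cl,
                           STARPU_RW, starpu_data_get_sub_data(handle, 1, i),
                           0);

    starpu_task_wait_for_all();

    // Gather the blocks back into the original handle.
    starpu_data_unpartition(handle, STARPU_MAIN_RAM);
    starpu_data_unregister(handle);
    starpu_shutdown();
    return 0;
}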
This session will let participants try out the basics of StarPU. First they will build and run a few simple examples, whose source code can be studied and later used as a basis for their own applications. Data partitioning examples are then studied, and a simple exercise is proposed to put them into practice.
In this session, more advanced features of the StarPU runtime will be presented. The performance models for tasks will be discussed, leading to advanced task scheduling, and various performance feedback tools will be presented. The distributed execution support will then be discussed.
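As a pointer to what this involves on the programming side, a codelet can be given a performance model so that performance-aware schedulers (e.g. dmda, selected via the STARPU_SCHED environment variable) can use measured execution times; a minimal, illustrative sketch, with the kernel left as a stub:

#include <starpu.h>

// History-based performance model: StarPU records the execution time of
// this kernel per data size and per device, and performance-aware
// schedulers use these measurements to place tasks.
static struct starpu_perfmodel scal_model; // zero-initialized (static)
static struct starpu_codelet scal_cl;

void scal_cpu(void *buffers[], void *cl_arg)
{
    (void)buffers; (void)cl_arg;
    /* ... actual kernel ... */
}

int main(void)
{
    starpu_init(NULL);

    scal_model.type = STARPU_HISTORY_BASED;
    scal_model.symbol = "scal"; // name under which timings are stored on disk

    starpu_codelet_init(&scal_cl);
    scal_cl.cpu_funcs[0] = scal_cpu;
    scal_cl.nbuffers = 1;
    scal_cl.modes[0] = STARPU_RW;
    scal_cl.model = &scal_model; // attach the performance model

    /* ... register data and submit tasks as in the earlier sketches ... */

    starpu_task_wait_for_all();
    starpu_shutdown();
    return 0;
}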
This second practice session will let participants try out more advanced tools around the StarPU runtime, notably the task performance feedback tools. Participants will then study a couple of distributed task-based examples, as well as the advanced ViTE visualisation tool.
The HiePACS Inria team co-develops linear algebra libraries to solve very large numerical systems on supercomputers. To obtain good performance regardless of the computing machine, these libraries are designed as task-based algorithms and make use of runtime systems such as OpenMP (tasks), PaRSEC or StarPU. One main advantage is that, with a single algorithm, we can deploy executions on different architectures (homogeneous, heterogeneous with GPUs, with few or many cores, different kinds of architectures and networks) and achieve relatively high performance without requiring a lot of parameter tuning. Three of these libraries will be highlighted in a thirty-minute presentation, followed by a one-hour demonstration on our PlaFRIM supercomputer: Chameleon (parallel dense linear algebra), PaStiX (parallel sparse direct solver) and Maphys (parallel hybrid solver). We will show how to install each library and how to use it through examples, discuss how to get good performance by tuning some parameters, and finally visualize execution traces. The demonstration will put the emphasis on the reproducibility of experiments and performance; we will do so thanks to the GNU Guix distribution.
This tutorial will address the basic functionalities of the PSBLAS and AMG4PSBLAS libraries for the parallelization of computationally intensive scientific applications. We will discuss the principles behind the parallel implementation of iterative solvers for sparse linear systems in a distributed memory paradigm and look at the routines for multiplying sparse matrices by dense matrices, solving block diagonal systems with triangular diagonal entries, preprocessing sparse matrices, and several additional methods for dense matrix operations. Moreover, we will delve into the idea underlying the construction of effective parallel preconditioners that are capable of extracting performance from supercomputers with hundreds of thousands of computational cores.
The tutorial will highlight how these approaches are related to and inspired by the needs of EoCoE-II applications.
In this demonstration, we will go through the distributed simulation of a general linear PDE by exploiting the tools made available by the PSBLAS/AMG4PSBLAS environment. We will start by discussing the issues of distributing the data among the processes and of constructing the associated linear system. Then, we will show how to set up and build the preconditioner in the AMG4PSBLAS library for solving such a system. Once these two stepping stones have been achieved, we will briefly discuss the portability of the solution step to a hybrid parallel setting involving the use of GPUs, and how the library can also be interfaced with Newton-type algorithms to tackle non-linear problems.