EXA2PRO-EoCoE joint workshop


EoCoE and EXA2PRO are pleased to invite you to a joint workshop showcasing the breakthrough work that our two projects are carrying out in the field of computer science.

The workshop will run for three full days, from February 22 to February 24, 2021, and will highlight several complementary technologies and research efforts that our respective projects are developing and carrying out.

All sessions, including the hands-on sessions, will be held remotely. Each hands-on session is limited to 20 participants to preserve the quality of the training.

Registration is required so that we can send further information, including the links to the virtual conference rooms, before the workshop.

EXA2PRO website: https://exa2pro.eu/

EoCoE website: https://www.eocoe.eu/

Organizing committee:

  • Lazaros Papadopoulos (ICCS/NTUA)
  • Edouard Audit (MdlS/CEA)
  • Matthieu Haefele (CNRS)
  • Christoph Kessler (Linköping University)
  • Samuel Thibault (University of Bordeaux)
  • Julien Thélot (MdlS/CEA)
  • Mathieu Lobet (MdlS/CEA)
Programme:

Day 1: Monday, February 22

    • 09:30 10:00
      EXA2PRO framework overview & success stories 30m

      Overview of the EXA2PRO project and EXA2PRO framework

      Speaker: Lazaros Papadopoulos (ICCS/NTUA)
    • 10:00 10:45
      EXA2PRO High-level programming interface: SkePU and ComPU 45m

      We briefly present the main concepts of the EXA2PRO high-level programming model: SkePU skeletons (i.e., generic C++ program constructs with multiple backends supporting heterogeneous systems and clusters), multi-variant software components with explicit metadata annotations, smart data-containers for array-based data types, and the XPDL platform modeling framework.


      Speaker: Christoph Kessler (Linköping University)
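The skeleton idea at the heart of this interface can be sketched in plain C++: a generic, reusable pattern parameterized by problem-specific user code. The sketch below is illustrative only (`map_skeleton` and `square` are made-up names, not SkePU's actual API); a real skeleton framework additionally selects among sequential, OpenMP, CUDA, and cluster backends behind the same interface.

```cpp
#include <vector>

// Conceptual sketch of an algorithmic skeleton: a generic "map" construct
// (a higher-order function) customized by a user function. A parallel
// backend would split the loop across workers; this plain C++ version only
// shows the programming-model idea.
template <typename T, typename F>
std::vector<T> map_skeleton(const std::vector<T>& in, F user_fn) {
    std::vector<T> out;
    out.reserve(in.size());
    for (const T& x : in)          // the parallelizable pattern
        out.push_back(user_fn(x));
    return out;
}

// The user supplies only the problem-specific code:
int square(int x) { return x * x; }
```

Calling `map_skeleton` on `{1, 2, 3}` with `square` yields `{1, 4, 9}`; the same user function could drive any backend the framework provides.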
    • 10:45 11:15
      Break 30m
    • 11:15 12:00
      EXA2PRO Runtime system: StarPU 45m

      We present the concepts of the EXA2PRO low-level programming model: StarPU task-based programming (https://starpu.gitlabpages.inria.fr/), which provides optimized execution on clusters of heterogeneous platforms. We will start with the basic principles of task-based programming, then give an overview of the features and optimizations that it makes possible at little extra cost to the programmer, from optimized scheduling to efficient distributed execution.

      Speaker: Samuel Thibault (University of Bordeaux)
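The core idea of task-based programming can be sketched without any runtime library: tasks declare which other tasks must complete first, and a scheduler runs each task only once its dependencies are satisfied. The `Task`/`run_tasks` names below are illustrative, not StarPU's API; StarPU infers dependencies automatically from the data each task reads and writes, and schedules across heterogeneous CPU/GPU workers.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Conceptual sketch of a task-based runtime. The scheduler here is
// sequential for clarity: it sweeps the task list, running every task
// whose dependencies have all completed, until no task becomes ready.
struct Task {
    std::vector<int> deps;        // indices of tasks that must finish first
    std::function<void()> work;   // the computation kernel
};

void run_tasks(std::vector<Task>& tasks) {
    std::vector<bool> done(tasks.size(), false);
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (int d : tasks[i].deps) ready = ready && done[d];
            if (ready) { tasks[i].work(); done[i] = true; progress = true; }
        }
    }
}
```

For example, two independent tasks producing values and a third task that multiplies their results form a three-node graph; the first two could run concurrently on any worker, while the third must wait for both.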
    • 12:00 14:00
      Lunch break 2h
    • 14:00 14:30
      EoCoE framework overview & success stories 30m

      Overview of the EoCoE project

      Speaker: Edouard Audit (CEA)
    • 14:30 15:15
      EoCoE - The Parallel Data Interface 45m

      Project website: https://pdi.julien-bigot.fr/master/

      Speaker: Julien Bigot (MdlS/CEA)
    • 15:15 15:45
      Break 30m
    • 15:45 16:30
      EoCoE - FTI - State-of-the-art multi-level checkpointing library 45m

      Large-scale infrastructures for distributed and parallel computing offer thousands of computing nodes to satisfy their users' computing needs. As the demand for massively parallel computing grows in industry and research, cloud infrastructures and computing centers are forced to grow in size and to adopt new computing technologies. While the advantage for users is clear, this evolution imposes significant challenges, such as energy consumption and fault tolerance. Fault tolerance is even more critical in infrastructures built on commodity hardware: recent work has shown that large-scale machines built with commodity hardware experience more failures than previously thought.

      Leonardo Bautista-Gomez, Senior Researcher at the Barcelona Supercomputing Center, will focus on how to guarantee high reliability for high-performance applications running on large infrastructures. In particular, he will cover the technical content necessary to implement scalable multilevel checkpointing for tightly coupled applications, including an overview of the internals of the FTI library and an explanation of how multilevel checkpointing is implemented today, together with examples that the audience can test and analyze on their own laptops, so that they learn how to use FTI in practice and can ultimately transfer that knowledge to their production systems.

      Speaker: Leonardo Bautista-Gomez (Barcelona Supercomputing Center)
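The checkpoint/restart pattern at the heart of this session can be sketched in a few lines. This is a conceptual, single-process illustration with made-up names, not FTI's API; FTI adds multiple storage levels (local storage, partner copies, erasure coding, parallel file system) and coordinates checkpoints across MPI ranks.

```cpp
// Conceptual sketch of application-level checkpoint/restart.
struct Checkpoint {
    int iteration = 0;   // first iteration to execute after a restart
    long state = 0;      // saved application state
    bool valid = false;  // whether a checkpoint has been taken
};

// Runs a toy computation (summing the iteration indices), checkpointing
// every `interval` iterations. A crash at iteration `fail_at` (pass -1
// for no crash) loses only the work done since the last checkpoint; a
// subsequent call resumes from the checkpoint instead of from scratch.
long run(int total, int interval, Checkpoint& ck, int fail_at) {
    int start = ck.valid ? ck.iteration : 0;
    long state = ck.valid ? ck.state : 0;
    for (int i = start; i < total; ++i) {
        if (i == fail_at) return -1;       // simulated node failure
        state += i;                        // the "computation"
        if ((i + 1) % interval == 0)
            ck = {i + 1, state, true};     // take a checkpoint
    }
    return state;
}
```

With checkpoints every 3 iterations, a run that crashes at iteration 7 restarts from the checkpoint taken after iteration 6 rather than from iteration 0; the trade-off between checkpoint frequency and recovery cost is exactly what multilevel schemes optimize.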
Day 2: Tuesday, February 23

    • 09:00 12:30
      SkePU Skeleton Programming Hands-on Session 3h 30m

      This session is limited to 20 participants.

      Tutorial website: https://skepu.github.io/tutorials/eocoe-exa2pro-2021/

      Speakers: August Ernstsson (Linköping University), Dr Christoph Kessler (Linköping University), Johan Ahlqvist (Linköping University)
      • Introduction to Programming in SkePU 45m

        SkePU (https://skepu.github.io) is a C++-based high-level programming framework for heterogeneous parallel systems and clusters. Its programming interface is based on so-called algorithmic skeletons, i.e., predefined generic program constructs based on higher-order functions that express common parallelizable patterns such as map, reduce, stencil, or scan. These can be customized with problem-specific C++ code, and sequential and parallel implementations are available for different execution platforms. From the single, quasi-sequential SkePU source code, platform-specific parallel code is created automatically. In this presentation, we give a more in-depth overview of the SkePU concepts and programming interface, preparing for the subsequent exercise sessions.

        Speakers: August Ernstsson, Dr Christoph Kessler (Linköping University)
      • Programming in SkePU: Guided Exercise and Advanced Issues 45m

        We demonstrate programming in SkePU with a complete example jointly for all participants, and also discuss some performance aspects and advanced issues.

        Speakers: August Ernstsson (Linköping University), Johan Ahlqvist (Linköping University)
      • Break 30m
      • Individual work session 1h 30m

        For this session we expect that participants have installed SkePU on some Linux system accessible to them (a GPU or cluster architecture is not required). For fast installation we provide a binary x86-64 Linux distribution of SkePU (for Ubuntu 18.04 and possibly other Linux variants) as well as a Docker image; it is also possible to install SkePU from source, see https://skepu.github.io. We encourage participants to bring their own problems or C++ application codes for porting to SkePU. As an alternative, we will provide further example problems for participants to experiment with at their own pace. We will set up a shared queueing mechanism for providing individual assistance on a first-come, first-served basis.

        Speakers: August Ernstsson, Dr Christoph Kessler (Linköping University), Johan Ahlqvist (Linköping University)
    • 12:30 14:00
      Lunch break 1h 30m
    • 14:00 17:30
      Performance Engineering and code generation techniques 3h 30m

      This hands-on session is limited to 20 participants.

      Speakers: Markus Holzer (FAU), Sebastian Kuckuk (FAU), Thomas Gruber (FAU)
      • Performance Engineering with LIKWID 1h
        Speaker: Thomas Gruber (FAU)
      • Introduction to code generation techniques 45m
        Speaker: Sebastian Kuckuk (FAU)
      • Break 30m
      • Practical Session: Coupling Performance Engineering and code generation with pystencils/lbmpy 1h 15m
        Speaker: Markus Holzer (FAU)
Day 3: Wednesday, February 24

    • 09:00 12:30
      StarPU task-based programming hands-on session 3h 30m

      This hands-on session is limited to 20 participants.

      Speakers: N. Furmento, Olivier Aumage, Samuel Thibault (University of Bordeaux)
      • Introduction to task-based programming with StarPU 30m

        This session will present the basics of the low-level task-based programming interface provided by StarPU. It will discuss the C and Fortran interfaces for defining computation kernels and tasks. It will then describe how application data is registered to the runtime, and possibly partitioned dynamically.

        Speaker: Samuel Thibault (University of Bordeaux)
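The data-partitioning idea mentioned above can be sketched in plain C++: an array registered with the runtime is split into sub-views, and each task kernel operates on one of them. The `Handle`/`partition` names below are illustrative only, not StarPU's data-handle API; in StarPU the runtime also tracks where each handle's data currently lives and transfers it to the worker (CPU or GPU) that executes the task.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Conceptual sketch of data registration and partitioning.
struct Handle {
    double* ptr;       // view into the registered data
    std::size_t len;   // number of elements in this chunk
};

// Splits registered data into (at most) nparts contiguous sub-handles.
std::vector<Handle> partition(std::vector<double>& data, std::size_t nparts) {
    std::vector<Handle> parts;
    std::size_t chunk = (data.size() + nparts - 1) / nparts;  // ceiling division
    for (std::size_t off = 0; off < data.size(); off += chunk) {
        std::size_t len = std::min(chunk, data.size() - off);
        parts.push_back({data.data() + off, len});
    }
    return parts;
}

// A "task kernel" operating on a single sub-handle.
double sum_chunk(const Handle& h) {
    double s = 0.0;
    for (std::size_t i = 0; i < h.len; ++i) s += h.ptr[i];
    return s;
}
```

Because the chunks are disjoint, a runtime can run one `sum_chunk` task per handle concurrently and combine the partial results afterwards.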
      • Practice session part1: basic principles 1h

        This session will let participants try out the basics of StarPU. First they will build and run a few simple examples; their source code provides working material that participants can study and later use as a basis for their own applications. Data partitioning examples are then studied, and a simple exercise is proposed to put them into practice.

        Speakers: N. Furmento, Olivier Aumage, Samuel Thibault (University of Bordeaux)
      • Break 30m
      • Advanced principles of StarPU 30m

        In this session, more advanced features of the StarPU runtime will be presented. The performance models for tasks will be discussed, leading to advanced task scheduling, and various performance feedback tools will be presented. The distributed execution support will then be discussed.

        Speaker: Samuel Thibault (University of Bordeaux)
      • Practice session part2: advanced principles 1h

        This second practice session will let participants try out more advanced tools around the StarPU runtime, notably the task performance feedback tools. It will then propose to study a couple of distributed task-based examples, as well as the advanced ViTE visualization tool.

        Speakers: N. Furmento, Olivier Aumage, Samuel Thibault (University of Bordeaux)
    • 12:30 14:00
      Lunch break 1h 30m
    • 14:00 15:30
      Solving large linear systems with parallel solvers designed on top of runtime systems 1h 30m

      The HiePACS Inria team co-develops linear algebra libraries for solving very large numerical systems on supercomputers. To achieve good performance regardless of the target machine, these libraries are designed as task-based algorithms and rely on runtime systems such as OpenMP (tasks), PaRSEC, or StarPU. One main advantage is that a single algorithm can be deployed on different architectures (homogeneous, heterogeneous with GPUs, with few or many cores, different kinds of architectures and networks), achieving relatively high performance without requiring much parameter tuning. Three of these libraries will be highlighted in a thirty-minute presentation followed by a one-hour demonstration on our PlaFRIM supercomputer: Chameleon (parallel dense linear algebra), PaStiX (parallel sparse direct solver), and Maphys (parallel hybrid solver). We will show how to install each library and how to use it through examples, discuss how to obtain good performance by tuning some parameters, and finally visualize execution traces. The demonstration will emphasize the reproducibility of experiments and performance, achieved with the GNU Guix distribution.

      Speaker: Florent Pruvost (Inria)
      • Methods to solve linear systems using the task based StarPU runtime (Chameleon, PaStiX, Maphys) 30m
      • Demonstration 1h
    • 15:30 16:00
      Break 30m
    • 16:00 17:30
      Extreme-scale computation with PSBLAS and AMG4PSBLAS 1h 30m
      • Introduction to extreme-scale computation with PSBLAS and AMG4PSBLAS. 45m

        This tutorial will address the basic functionalities of the PSBLAS and AMG4PSBLAS libraries for parallelizing computationally intensive scientific applications. We will discuss the principles behind the parallel implementation of iterative solvers for sparse linear systems in a distributed-memory paradigm, and look at the routines for multiplying sparse matrices by dense matrices, solving block-diagonal systems with triangular diagonal entries, and preprocessing sparse matrices, as well as several additional routines for dense matrix operations. Moreover, we will delve into the ideas underlying the construction of effective parallel preconditioners capable of extracting performance from supercomputers with hundreds of thousands of computational cores.

        The tutorial will highlight how these approaches are related to, and inspired by, the needs of EoCoE-II applications.

        Speaker: Salvatore Filippone (University of Rome Tor Vergata)
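The kind of iteration such libraries parallelize can be sketched sequentially. Below is a minimal unpreconditioned conjugate gradient on a triplet-format sparse matrix; all names are illustrative, not PSBLAS routines. In PSBLAS, each building block here (sparse matrix-vector product, dot product, vector update) becomes a distributed parallel kernel operating on data spread across processes.

```cpp
#include <cmath>
#include <vector>

// Sparse matrix stored as (row, column, value) triplets; assumed square.
struct Triplet { int r, c; double v; };

// Sparse matrix-vector product y = A * x.
std::vector<double> spmv(const std::vector<Triplet>& A, const std::vector<double>& x) {
    std::vector<double> y(x.size(), 0.0);
    for (const auto& t : A) y[t.r] += t.v * x[t.c];
    return y;
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Unpreconditioned conjugate gradient for symmetric positive definite A,
// starting from x = 0. A parallel preconditioner (AMG4PSBLAS's role)
// would be applied to the residual at each iteration to speed convergence.
std::vector<double> cg(const std::vector<Triplet>& A, const std::vector<double>& b,
                       int max_iter = 100, double tol = 1e-10) {
    std::vector<double> x(b.size(), 0.0), r = b, p = b;
    double rs = dot(r, r);
    for (int k = 0; k < max_iter && std::sqrt(rs) > tol; ++k) {
        std::vector<double> Ap = spmv(A, p);
        double alpha = rs / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];       // update the iterate
            r[i] -= alpha * Ap[i];      // update the residual
        }
        double rs_new = dot(r, r);
        for (std::size_t i = 0; i < p.size(); ++i)
            p[i] = r[i] + (rs_new / rs) * p[i];  // new search direction
        rs = rs_new;
    }
    return x;
}
```

On the 2x2 system with rows (4, 1) and (1, 3) and right-hand side (1, 2), the iteration converges to (1/11, 7/11); at scale, the communication pattern of `spmv` and the global reductions in `dot` dominate, which is exactly what distributed-memory libraries optimize.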
      • Demonstration of PSBLAS and AMG4PSBLAS for solving sparse linear systems on parallel hybrid architectures 45m

        In this demonstration, we will walk through the distributed simulation of a general linear PDE using the tools made available by the PSBLAS/AMG4PSBLAS environment. We will start by discussing the issues of distributing data among the processors and of constructing the associated linear system. Then, we will show how to set up and build the preconditioner in the AMG4PSBLAS library for solving such a system. After these two stepping stones have been achieved, we will briefly discuss porting the solution step to a hybrid parallel setting involving GPUs, and how the library can be interfaced to tackle non-linear problems by means of Newton-type algorithms.

        Speaker: Fabio Durastante (CNR)