| 2012 |
| 40 |  | Klaus Iglberger,
Georg Hager,
Jan Treibig,
Ulrich Rüde:
Expression Templates Revisited: A Performance Analysis of Current Methodologies.
SIAM J. Scientific Computing 34(2) (2012) |
| 2011 |
| 39 |  | Gerald Schubert,
Georg Hager,
Holger Fehske,
Gerhard Wellein:
Parallel Sparse Matrix-Vector Multiplication as a Test Case for Hybrid MPI+OpenMP Programming.
IPDPS Workshops 2011: 1751-1758 |
| 38 |  | Jan Treibig,
Georg Hager,
Gerhard Wellein,
Michael Meier:
Poster: LIKWID: lightweight performance tools.
SC Companion 2011: 29-30 |
| 37 |  | Johannes Habich,
Thomas Zeiser,
Georg Hager,
Gerhard Wellein:
Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA.
Advances in Engineering Software 42(5): 266-272 (2011) |
| 36 |  | Gerald Schubert,
Georg Hager,
Holger Fehske,
Gerhard Wellein:
Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming.
CoRR abs/1101.0091 (2011) |
| 35 |  | Markus Wittmann,
Georg Hager:
Optimizing ccNUMA locality for task-parallel execution under OpenMP and TBB on multicore-based systems.
CoRR abs/1101.0093 (2011) |
| 34 |  | Klaus Iglberger,
Georg Hager,
Jan Treibig,
Ulrich Rüde:
Expression Templates Revisited: A Performance Analysis of the Current ET Methodology.
CoRR abs/1104.1729 (2011) |
| 33 |  | Jan Treibig,
Georg Hager,
Gerhard Wellein:
LIKWID: Lightweight Performance Tools.
CoRR abs/1104.4874 (2011) |
| 32 |  | Jan Treibig,
Georg Hager,
Hannes G. Hofmann,
Joachim Hornegger,
Gerhard Wellein:
Pushing the limits for medical image reconstruction on recent standard multicore processors.
CoRR abs/1104.5243 (2011) |
| 31 |  | Gerald Schubert,
Holger Fehske,
Georg Hager,
Gerhard Wellein:
Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems.
CoRR abs/1106.5908 (2011) |
| 30 |  | Markus Wittmann,
Thomas Zeiser,
Georg Hager,
Gerhard Wellein:
Comparison of different Propagation Steps for the Lattice Boltzmann Method.
CoRR abs/1111.0922 (2011) |
| 29 |  | Markus Wittmann,
Thomas Zeiser,
Georg Hager,
Gerhard Wellein:
Domain decomposition and locality optimization for large-scale lattice Boltzmann simulations.
CoRR abs/1111.1129 (2011) |
| 28 |  | Johannes Habich,
Christian Feichtinger,
Harald Köstler,
Georg Hager,
Gerhard Wellein:
Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results.
CoRR abs/1112.0850 (2011) |
| 27 |  | Moritz Kreutzer,
Georg Hager,
Gerhard Wellein,
Holger Fehske,
Achim Basermann,
Alan R. Bishop:
Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation.
CoRR abs/1112.5588 (2011) |
| 26 |  | Jan Treibig,
Gerhard Wellein,
Georg Hager:
Efficient multicore-aware parallelization strategies for iterative stencil computations.
J. Comput. Science 2(2): 130-137 (2011) |
| 25 |  | Christian Feichtinger,
Johannes Habich,
Harald Köstler,
Georg Hager,
Ulrich Rüde,
Gerhard Wellein:
A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters.
Parallel Computing 37(9): 536-549 (2011) |
| 24 |  | Gerald Schubert,
Holger Fehske,
Georg Hager,
Gerhard Wellein:
Hybrid-Parallel Sparse Matrix-Vector Multiplication with Explicit Communication Overlap on Current Multicore-Based Systems.
Parallel Processing Letters 21(3): 339-358 (2011) |
| 2010 |
| 23 |  | Jan Treibig,
Georg Hager,
Gerhard Wellein:
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments.
ICPP Workshops 2010: 207-216 |
| 22 |  | Markus Wittmann,
Georg Hager,
Gerhard Wellein:
Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory.
IPDPS Workshops 2010: 1-7 |
| 21 |  | Jan Treibig,
Gerhard Wellein,
Georg Hager:
Efficient multicore-aware parallelization strategies for iterative stencil computations.
CoRR abs/1004.1741 (2010) |
| 20 |  | Jan Treibig,
Georg Hager,
Gerhard Wellein:
LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments.
CoRR abs/1004.4431 (2010) |
| 19 |  | Markus Wittmann,
Georg Hager,
Jan Treibig,
Gerhard Wellein:
Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters.
CoRR abs/1006.3148 (2010) |
| 18 |  | Christian Feichtinger,
Johannes Habich,
Harald Köstler,
Georg Hager,
Ulrich Rüde,
Gerhard Wellein:
A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters.
CoRR abs/1007.1388 (2010) |
| 17 |  | Markus Wittmann,
Georg Hager,
Jan Treibig,
Gerhard Wellein:
Leveraging Shared Caches for Parallel Temporal Blocking of Stencil Codes on Multicore Processors and Clusters.
Parallel Processing Letters 20(4): 359-376 (2010) |
| 2009 |
| 16 |  | Gerhard Wellein,
Georg Hager,
Thomas Zeiser,
Markus Wittmann,
Holger Fehske:
Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization.
COMPSAC (1) 2009: 579-586 |
| 15 |  | Thomas Zeiser,
Georg Hager,
Gerhard Wellein:
The world's fastest CPU and SMP node: Some performance results from the NEC SX-9.
IPDPS 2009: 1-8 |
| 14 |  | Rolf Rabenseifner,
Georg Hager,
Gabriele Jost:
Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes.
PDP 2009: 427-436 |
| 13 |  | Jan Treibig,
Georg Hager:
Introducing a Performance Model for Bandwidth-Limited Loop Kernels.
PPAM (1) 2009: 615-624 |
| 12 |  | Markus Wittmann,
Georg Hager:
A Proof of Concept for Optimizing Task Parallelism by Locality Queues.
CoRR abs/0902.1884 (2009) |
| 11 |  | Jan Treibig,
Georg Hager:
Introducing a Performance Model for Bandwidth-Limited Loop Kernels.
CoRR abs/0905.0792 (2009) |
| 10 |  | Gerald Schubert,
Georg Hager,
Holger Fehske:
Performance limitations for sparse matrix-vector multiplications on current multicore environments.
CoRR abs/0910.4836 (2009) |
| 9 |  | Jan Treibig,
Georg Hager,
Gerhard Wellein:
Multi-core architectures: Complexities of performance prediction and the impact of cache topology.
CoRR abs/0910.4865 (2009) |
| 8 |  | Markus Wittmann,
Georg Hager,
Gerhard Wellein:
Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory.
CoRR abs/0912.4506 (2009) |
| 7 |  | Thomas Zeiser,
Georg Hager,
Gerhard Wellein:
Benchmark Analysis and Application Results for Lattice Boltzmann Simulations on NEC SX Vector and Intel Nehalem Systems.
Parallel Processing Letters 19(4): 491-511 (2009) |
| 2008 |
| 6 |  | Georg Hager,
Thomas Zeiser,
Gerhard Wellein:
Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers.
IPDPS 2008: 1-7 |
| 5 |  | Georg Hager,
Thomas Zeiser,
Gerhard Wellein:
Data Access Characteristics and Optimizations for Sun UltraSPARC T2 and T2+ Systems.
Parallel Processing Letters 18(4): 471-490 (2008) |
| 2007 |
| 4 |  | Georg Hager,
Thomas Zeiser,
Gerhard Wellein:
Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers.
CoRR abs/0712.2302 (2007) |
| 3 |  | Georg Hager,
Holger Stengel,
Thomas Zeiser,
Gerhard Wellein:
RZBENCH: Performance evaluation of current HPC architectures using low-level and application benchmarks.
CoRR abs/0712.3389 (2007) |
| 2006 |
| 2 |  | Rolf Rabenseifner,
Georg Hager,
Gabriele Jost,
Rainer Keller:
Hybrid MPI and OpenMP Parallel Programming.
PVM/MPI 2006: 11 |
| 2002 |
| 1 |  | Gerhard Wellein,
Georg Hager,
Achim Basermann,
Holger Fehske:
Fast Sparse Matrix-Vector Multiplication for TeraFlop/s Computers.
VECPAR 2002: 287-301 |