Torsten Hoefler
Person information
- affiliation: ETH Zürich
2020 – today
- 2024
- [j67]Maciej Besta, Robert Gerstenberger, Emanuel Peter, Marc Fischer, Michal Podstawski, Claude Barthels, Gustavo Alonso, Torsten Hoefler:
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries. ACM Comput. Surv. 56(2): 31:1-31:40 (2024) - [j66]Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler:
Canary: Congestion-aware in-network allreduce using dynamic trees. Future Gener. Comput. Syst. 152: 70-82 (2024) - [j65]María Engracia Gómez, Julio Sahuquillo, Andrea Biagioni, Nikos Chrysos, Damien Berton, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Michele Martinelli, Pier Stanislao Paolucci, Elena Pastorelli, Francesco Simula, Matteo Turisini, Piero Vicini, Roberto Ammendola, Carlotta Chiarini, Chiara De Luca, Fabrizio Capuani, Adrián Castelló, Jose Duro, Eugenio Stabile, Enrique S. Quintana-Ortí, Pascale Bernier-Bruna, Claire Chen, Pierre-Axel Lagadec, Gregoire Pichon, Etienne Walter, Manolis Katevenis, Sokratis Bartzis, Orestis Mousouros, Pantelis Xirouchakis, Vangelis Mageiropoulos, Michalis Gianioudis, Harisis Loukas, Aggelos Ioannou, Nikos Kallimanis, Miguel Sánchez de la Rosa, Gabriel Gomez-Lopez, Francisco Alfaro-Cortés, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Quiles, José L. Sánchez, Gaetan De Gassowski, Matthieu Hautreaux, Stephane Mathieu, Gilles Moreau, Marc Pérache, Hugo Taboada, Torsten Hoefler, Timo Schneider, Matteo Barnaba, Giuseppe Piero Brandino, Francesco De Giorgi, Matteo Poggi, Iakovos Mavroidis, Yannis Papaefstathiou, Nikolaos Tampouratzis, Benjamin Kalisch, Ulrich Krackhardt, Mondrian Nuessle, Wolfgang Frings, Dominik Gottwald, Felime Guimaraes, Max Holicki, Volker Marx, Yannik Müller, Carsten Clauss, Hugo Falter, Xu Huang, Jennifer Lopez Barillao, Thomas Moschny, Simon Pickartz:
RED-SEA Project: Towards a new-generation European interconnect. Microprocess. Microsystems 110: 105102 (2024) - [j64]Peter Bauer, Torsten Hoefler, Bjorn Stevens, Wilco Hazeleger:
Digital twins of Earth and the computing challenge of human interaction. Nat. Comput. Sci. 4(3): 154-157 (2024) - [j63]Maciej Besta, Torsten Hoefler:
Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 46(5): 2584-2606 (2024) - [j62]Thomas Benz, Michael Rogenmoser, Paul Scheffler, Samuel Riedel, Alessandro Ottaviano, Andreas Kurth, Torsten Hoefler, Luca Benini:
A High-Performance, Energy-Efficient Modular DMA Engine Architecture. IEEE Trans. Computers 73(1): 263-277 (2024) - [j61]Jinfan Chen, Shigang Li, Ran Guo, Jinhui Yuan, Torsten Hoefler:
AutoDDL: Automatic Distributed Deep Learning With Near-Optimal Bandwidth Cost. IEEE Trans. Parallel Distributed Syst. 35(8): 1331-1344 (2024) - [c275]Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler:
Graph of Thoughts: Solving Elaborate Problems with Large Language Models. AAAI 2024: 17682-17690 - [c274]Samuel Riedel, Marc Gantenbein, Alessandro Ottaviano, Torsten Hoefler, Luca Benini:
LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems Through Polling-Free and Retry-Free Operation. DATE 2024: 1-6 - [c273]Marcin Copik, Alexandru Calotoiu, Pengyu Zhou, Konstantin Taranov, Torsten Hoefler:
FaaSKeeper: Learning from Building Serverless Services with ZooKeeper as an Example. HPDC 2024: 94-108 - [c272]Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler:
Near-Optimal Wafer-Scale Reduce. HPDC 2024: 334-347 - [c271]Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari Do Nascimento, Torsten Hoefler, James Hensman:
SliceGPT: Compress Large Language Models by Deleting Rows and Columns. ICLR 2024 - [c270]Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh:
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression. ICLR 2024 - [c269]Langwen Huang, Lukas Gianinazzi, Yuejiang Yu, Peter D. Düben, Torsten Hoefler:
DiffDA: a Diffusion model for weather-scale Data Assimilation. ICML 2024 - [c268]Marcin Copik, Marcin Chrapek, Larissa Schmid, Alexandru Calotoiu, Torsten Hoefler:
Software Resource Disaggregation for HPC with Serverless Computing. IPDPS 2024: 139-156 - [c267]Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski:
Low-Depth Spatial Tree Algorithms. IPDPS 2024: 180-192 - [c266]Nils Blach, Maciej Besta, Daniele De Sensi, Jens Domke, Hussein Harake, Shigang Li, Patrick Iff, Marek Konieczny, Kartik Lakhotia, Ales Kubicek, Marcel Ferrari, Fabrizio Petrini, Torsten Hoefler:
A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network. NSDI 2024 - [c265]Daniele De Sensi, Tommaso Bonato, David Saam, Torsten Hoefler:
Swing: Short-cutting Rings for Higher Bandwidth Allreduce. NSDI 2024 - [c264]Lukas Gianinazzi, Alexandros Nikolaos Ziogas, Langwen Huang, Piotr Luczynski, Saleh Ashkboos, Florian Scheidl, Armon Carigiet, Chio Ge, Nabil Abubaker, Maciej Besta, Tal Ben-Nun, Torsten Hoefler:
Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication. PPoPP 2024: 404-416 - [c263]Kartik Lakhotia, Laura Monroe, Kelly Isham, Maciej Besta, Nils Blach, Torsten Hoefler, Fabrizio Petrini:
PolarStar: Expanding the Horizon of Diameter-3 Networks. SPAA 2024: 345-357 - [c262]Mikhail Khalilov, Marcin Chrapek, Siyuan Shen, Alessandro Vezzu, Thomas Benz, Salvatore Di Girolamo, Timo Schneider, Daniele De Sensi, Luca Benini, Torsten Hoefler:
OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs. USENIX ATC 2024: 247-263 - [i184]Torsten Hoefler, Marcin Copik, Pete Beckman, Andrew Jones, Ian T. Foster, Manish Parashar, Daniel A. Reed, Matthias Troyer, Thomas C. Schulthess, Dan Ernst, Jack J. Dongarra:
XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing. CoRR abs/2401.04552 (2024) - [i183]Langwen Huang, Lukas Gianinazzi, Yuejiang Yu, Peter D. Düben, Torsten Hoefler:
DiffDA: a diffusion model for weather-scale data assimilation. CoRR abs/2401.05932 (2024) - [i182]Daniele De Sensi, Tommaso Bonato, David Saam, Torsten Hoefler:
Swing: Short-cutting Rings for Higher Bandwidth Allreduce. CoRR abs/2401.09356 (2024) - [i181]Samuel Riedel, Marc Gantenbein, Alessandro Ottaviano, Torsten Hoefler, Luca Benini:
LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems through Polling-Free and Retry-Free Operation. CoRR abs/2401.09359 (2024) - [i180]Lukas Möller, Marcin Copik, Alexandru Calotoiu, Torsten Hoefler:
Cppless: Productive and Performant Serverless Programming in C++. CoRR abs/2401.10834 (2024) - [i179]Marcin Copik, Marcin Chrapek, Larissa Schmid, Alexandru Calotoiu, Torsten Hoefler:
Software Resource Disaggregation for HPC with Serverless Computing. CoRR abs/2401.10852 (2024) - [i178]Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwasniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Onur Mutlu, Torsten Hoefler:
Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts. CoRR abs/2401.14295 (2024) - [i177]Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari Do Nascimento, Torsten Hoefler, James Hensman:
SliceGPT: Compress Large Language Models by Deleting Rows and Columns. CoRR abs/2401.15024 (2024) - [i176]Lukas Gianinazzi, Alexandros Nikolaos Ziogas, Langwen Huang, Piotr Luczynski, Saleh Ashkboos, Florian Scheidl, Armon Carigiet, Chio Ge, Nabil Abubaker, Maciej Besta, Tal Ben-Nun, Torsten Hoefler:
Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication. CoRR abs/2402.19364 (2024) - [i175]Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman:
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs. CoRR abs/2404.00456 (2024) - [i174]Tommaso Bonato, Abdul Kabbani, Daniele De Sensi, Rong Pan, Yanfang Le, Costin Raiciu, Mark Handley, Timo Schneider, Nils Blach, Ahmad Ghalayini, Daniel S. F. Alves, Michael Papamichael, Adrian M. Caulfield, Torsten Hoefler:
SMaRTT-REPS: Sender-based Marked Rapidly-adapting Trimmed & Timed Transport with Recycled Entropies. CoRR abs/2404.01630 (2024) - [i173]Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski:
Low-Depth Spatial Tree Algorithms. CoRR abs/2404.12953 (2024) - [i172]Siyuan Shen, Langwen Huang, Marcin Chrapek, Timo Schneider, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler:
LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming. CoRR abs/2404.14193 (2024) - [i171]Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler:
Near-Optimal Wafer-Scale Reduce. CoRR abs/2404.15888 (2024) - [i170]Nabil Abubaker, Torsten Hoefler:
SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels. CoRR abs/2404.19638 (2024) - [i169]Torsten Hoefler, Alexandru Calotoiu, Anurag Dipankar, Thomas C. Schulthess, Xavier Lapillonne, Oliver Fuhrer:
Towards Specialized Supercomputers for Climate Sciences: Computational Requirements of the Icosahedral Nonhydrostatic Weather and Climate Model. CoRR abs/2405.13043 (2024) - [i168]Timo Schneider, Pengcheng Xu, Torsten Hoefler:
FPsPIN: An FPGA-based Open-Hardware Research Platform for Processing in the Network. CoRR abs/2405.16378 (2024) - [i167]Maciej Besta, Lorenzo Paleari, Ales Kubicek, Piotr Nyczyk, Robert Gerstenberger, Patrick Iff, Tomasz Lehmann, Hubert Niewiadomski, Torsten Hoefler:
CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks. CoRR abs/2406.02524 (2024) - [i166]Maciej Besta, Ales Kubicek, Roman Niggli, Robert Gerstenberger, Lucas Weitzendorf, Mingyuan Chi, Patrick Iff, Joanna Gajda, Piotr Nyczyk, Jürgen Müller, Hubert Niewiadomski, Marcin Chrapek, Michal Podstawski, Torsten Hoefler:
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs. CoRR abs/2406.05085 (2024) - [i165]Wenqi Jiang, Hang Hu, Torsten Hoefler, Gustavo Alonso:
Accelerating Graph-based Vector Search via Delayed-Synchronization Traversal. CoRR abs/2406.12385 (2024) - [i164]Maciej Besta, Florian Scheidl, Lukas Gianinazzi, Shachar Klaiman, Jürgen Müller, Torsten Hoefler:
Demystifying Higher-Order Graph Neural Networks. CoRR abs/2406.12841 (2024) - [i163]Tommaso Bonato, Abdul Kabbani, Ahmad Ghalayini, Mohammad Dohadwala, Michael Papamichael, Daniele De Sensi, Torsten Hoefler:
REPS: Recycling Entropies for Packet Spraying to Adaptively Explore Paths and Mitigate Failures. CoRR abs/2407.21625 (2024) - [i162]Patrik Okanovic, Grzegorz Kwasniewski, Paolo Sylos Labini, Maciej Besta, Flavio Vella, Torsten Hoefler:
High Performance Unstructured SpMM Computation Using Tensor Cores. CoRR abs/2408.11551 (2024) - [i161]Luigi Fusco, Mikhail Khalilov, Marcin Chrapek, Giridhar Chukkapalli, Thomas C. Schulthess, Torsten Hoefler:
Understanding Data Movement in Tightly Coupled Heterogeneous Systems: A Case Study with the Grace Hopper Superchip. CoRR abs/2408.11556 (2024) - [i160]Elias Frantar, Roberto L. Castro, Jiale Chen, Torsten Hoefler, Dan Alistarh:
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models. CoRR abs/2408.11743 (2024) - [i159]Maciej Besta, Robert Gerstenberger, Patrick Iff, Pournima Sonawane, Juan Gómez-Luna, Raghavendra Kanakagiri, Rui Min, Onur Mutlu, Torsten Hoefler, Raja Appuswamy, Aidan O'Mahony:
Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments. CoRR abs/2408.12173 (2024) - [i158]Mikhail Khalilov, Salvatore Di Girolamo, Marcin Chrapek, Rami Nudelman, Gil Bloch, Torsten Hoefler:
Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI. CoRR abs/2408.13356 (2024) - [i157]Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler:
Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects. CoRR abs/2408.14090 (2024)
- 2023
- [j60]Torsten Hoefler, Thomas Häner, Matthias Troyer:
Disentangling Hype from Practicality: On Realistically Achieving Quantum Advantage. Commun. ACM 66(5): 82-87 (2023) - [j59]Torsten Hoefler, Duncan Roweth, Keith D. Underwood, Robert Alverson, Mark Griswold, Vahid Tabatabaee, Mohan Kalkunte, Surendra Anubolu, Siyuan Shen, Moray McLaren, Abdul Kabbani, Steve Scott:
Data Center Ethernet and Remote Direct Memory Access: Issues at Hyperscale. Computer 56(7): 67-77 (2023) - [j58]Torsten Hoefler, Bjorn Stevens, Andreas F. Prein, Johanna Baehr, Thomas C. Schulthess, Thomas F. Stocker, John A. Taylor, Daniel Klocke, Pekka Manninen, Piers M. Forster, Tobias Kölling, Nicolas Gruber, Hartwig Anzt, Claudia Frauen, Florian Ziemen, Milan Klöwer, Karthik Kashinath, Christoph M. Schär, Oliver Fuhrer, Bryan N. Lawrence:
Earth Virtualization Engines: A Technical Perspective. Comput. Sci. Eng. 25(3): 50-59 (2023) - [j57]Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Torsten Hoefler:
Myths and legends in high-performance computing. Int. J. High Perform. Comput. Appl. 37(3-4): 245-259 (2023) - [j56]Maciej Besta, Marc Fischer, Vasiliki Kalavri, Michael Kapralov, Torsten Hoefler:
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems. IEEE Trans. Parallel Distributed Syst. 34(6): 1860-1876 (2023) - [j55]Paul Scheffler, Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini:
Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra. IEEE Trans. Parallel Distributed Syst. 34(12): 3147-3161 (2023) - [c261]Wei Qiu, Marcin Copik, Yun Wang, Alexandru Calotoiu, Torsten Hoefler:
User-guided Page Merging for Memory Deduplication in Serverless Systems. IEEE Big Data 2023: 159-169 - [c260]Tal Ben-Nun, Berke Ates, Alexandru Calotoiu, Torsten Hoefler:
Bridging Control-Centric and Data-Centric Optimization. CGO 2023: 173-185 - [c259]Tal Ben-Nun, Lukas Gianinazzi, Torsten Hoefler, Yishai Oltchik:
Maximum Flows in Parametric Graph Templates. CIAC 2023: 97-111 - [c258]Patrick Iff, Maciej Besta, Matheus A. Cavalcante, Tim Fischer, Luca Benini, Torsten Hoefler:
Sparse Hamming Graph: A Customizable Network-on-Chip Topology. DAC 2023: 1-6 - [c257]Patrick Iff, Maciej Besta, Matheus A. Cavalcante, Tim Fischer, Luca Benini, Torsten Hoefler:
HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement. DAC 2023: 1-6 - [c256]Tiziano De Matteis, Lukas Gianinazzi, Johannes de Fine Licht, Torsten Hoefler:
Streaming Task Graph Scheduling for Dataflow Architectures. HPDC 2023: 225-237 - [c255]Yunqiang Li, Jan C. van Gemert, Torsten Hoefler, Bert Moons, Evangelos Eleftheriou, Bram-Ernst Verhoef:
Differentiable Transportation Pruning. ICCV 2023: 16911-16921 - [c254]Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh:
OPTQ: Accurate Quantization for Generative Pre-trained Transformers. ICLR 2023 - [c253]Langwen Huang, Torsten Hoefler:
Compressing multidimensional weather and climate data into neural networks. ICLR 2023 - [c252]Lukas Trümper, Tal Ben-Nun, Philipp Schaad, Alexandru Calotoiu, Torsten Hoefler:
Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization. ICS 2023: 50-62 - [c251]Marcin Copik, Roman Böhringer, Alexandru Calotoiu, Torsten Hoefler:
FMI: Fast and Cheap Message Passing for Serverless Functions. ICS 2023: 373-385 - [c250]Marcin Copik, Konstantin Taranov, Alexandru Calotoiu, Torsten Hoefler:
rFaaS: Enabling High Performance Serverless with RDMA and Leases. IPDPS 2023: 897-907 - [c249]Maciej Besta, Afonso Claudino Catarino, Lukas Gianinazzi, Nils Blach, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler:
HOT: Higher-Order Dynamic Graph Representation Learning With Efficient Transformers. LoG 2023: 15 - [c248]Kazuki Osawa, Shigang Li, Torsten Hoefler:
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices. MLSys 2023 - [c247]Tommy Nguyen, Yue Shi, Samuel Alexander Stein, Tim Stavenger, Marvin Warner, Martin Roetteler, Torsten Hoefler, Ang Li:
A Reference Implementation for a Quantum Message Passing Interface. QCE 2023: 292-293 - [c246]Maciej Besta, Robert Gerstenberger, Marc Fischer, Michal Podstawski, Nils Blach, Berke Egeli, George Mitenkov, Wojciech Chlapek, Marek T. Michalewicz, Hubert Niewiadomski, Jürgen Müller, Torsten Hoefler:
The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores. SC 2023: 22:1-22:18 - [c245]Marcin Chrapek, Mikhail Khalilov, Torsten Hoefler:
HEAR: Homomorphically Encrypted Allreduce. SC 2023: 36:1-36:17 - [c244]Maciej Besta, Pawel Renc, Robert Gerstenberger, Paolo Sylos Labini, Alexandros Nikolaos Ziogas, Tiancheng Chen, Lukas Gianinazzi, Florian Scheidl, Kalman Szenes, Armon Carigiet, Patrick Iff, Grzegorz Kwasniewski, Raghavendra Kanakagiri, Chio Ge, Sammy Jaeger, Jaroslaw Was, Flavio Vella, Torsten Hoefler:
High-Performance and Programmable Attentional Graph Neural Networks with Global Tensor Formulations. SC 2023: 66:1-66:16 - [c243]Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, Basilio B. Fraguela, Torsten Hoefler:
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores. SC 2023: 72:1-72:14 - [c242]Wenqi Jiang, Shigang Li, Yu Zhu, Johannes de Fine Licht, Zhenhao He, Runbin Shi, Cédric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso:
Co-design Hardware and Algorithm for Vector Search. SC 2023: 87:1-87:15 - [c241]Philipp Schaad, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Alexandros Nikolaos Ziogas, Torsten Hoefler:
FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs. SC 2023: 88:1-88:15 - [c240]Yue Shi, Tommy Nguyen, Samuel Alexander Stein, Tim Stavenger, Marvin Warner, Martin Roetteler, Torsten Hoefler, Ang Li:
A Reference Implementation for a Quantum Message Passing Interface. SC Workshops 2023: 1420-1425 - [c239]Daniele De Sensi, Tiziano De Matteis, Konstantin Taranov, Salvatore Di Girolamo, Tobias Rahn, Torsten Hoefler:
Noise in the Clouds: Influence of Network Performance Variability on Application Scalability. SIGMETRICS (Abstracts) 2023: 17-18 - [c238]Kartik Lakhotia, Kelly Isham, Laura Monroe, Maciej Besta, Torsten Hoefler, Fabrizio Petrini:
In-network Allreduce with Multiple Spanning Trees on PolarFly. SPAA 2023: 165-176 - [c237]Andrei Ivanov, Benjamin Rothenberger, Arnaud Dethise, Marco Canini, Torsten Hoefler, Adrian Perrig:
SAGE: Software-based Attestation for GPU Execution. USENIX ATC 2023: 485-499 - [d3]Maciej Besta, Robert Gerstenberger, Marc Fischer, Michal Podstawski, Jürgen Müller, Nils Blach, Berke Egeli, George Mitenkov, Marek T. Michalewicz, Torsten Hoefler:
GDI-RMA 0.1 Software Artifact. Zenodo, 2023 - [d2]Maciej Besta, Pawel Renc, Robert Gerstenberger, Paolo Sylos Labini, Alexandros Nikolaos Ziogas, Tiancheng Chen, Lukas Gianinazzi, Florian Scheidl, Kalman Szenes, Armon Carigiet, Patrick Iff, Grzegorz Kwasniewski, Raghavendra Kanakagiri, Chio Ge, Sammy Jaeger, Jaroslaw Was, Flavio Vella, Torsten Hoefler:
GNN Scaling 0.1 Software Artifact. Zenodo, 2023 - [d1]Lukas Gianinazzi, Alexandros Nikolaos Ziogas, Piotr Luczynski, Saleh Ashkboos, Langwen Huang, Florian Scheidl, Chio Ge, Armon Carigiet, Maciej Besta, Tal Ben-Nun, Torsten Hoefler:
Arrow Matrix Decompositions. Zenodo, 2023 - [i156]Niels Gleinig, Tal Ben-Nun, Torsten Hoefler:
A Theory of I/O-Efficient Sparse Neural Network Inference. CoRR abs/2301.01048 (2023) - [i155]Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Torsten Hoefler:
Myths and Legends in High-Performance Computing. CoRR abs/2301.02432 (2023) - [i154]Jinfan Chen, Shigang Li, Ran Guo, Jinhui Yuan, Torsten Hoefler:
AutoDDL: Automatic Distributed Deep Learning with Asymptotically Optimal Communication. CoRR abs/2301.06813 (2023) - [i153]Niels Gleinig, Tobias Rohner, Torsten Hoefler:
Approximate Reversible Circuits for NISQ-Era Quantum Computers. CoRR abs/2302.01066 (2023) - [i152]Torsten Hoefler, Duncan Roweth, Keith D. Underwood, Bob Alverson, Mark Griswold, Vahid Tabatabaee, Mohan Kalkunte, Surendra Anubolu, Siyuan Shen, Abdul Kabbani, Moray McLaren, Steve Scott:
Datacenter Ethernet and RDMA: Issues at Hyperscale. CoRR abs/2302.03337 (2023) - [i151]Kartik Lakhotia, Laura Monroe, Kelly Isham, Maciej Besta, Nils Blach, Torsten Hoefler, Fabrizio Petrini:
PolarStar: Expanding the Scalability Horizon of Diameter-3 Networks. CoRR abs/2302.07217 (2023) - [i150]Lukas Trümper, Tal Ben-Nun, Philipp Schaad, Alexandru Calotoiu, Torsten Hoefler:
Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization. CoRR abs/2303.08142 (2023) - [i149]Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Saleh Ashkboos, Torsten Hoefler:
STen: Productive and Efficient Sparsity in PyTorch. CoRR abs/2304.07613 (2023) - [i148]Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler:
ASDL: A Unified Interface for Gradient Preconditioning in PyTorch. CoRR abs/2305.04684 (2023) - [i147]Thomas Benz, Michael Rogenmoser, Paul Scheffler, Samuel Riedel, Alessandro Ottaviano, Andreas Kurth, Torsten Hoefler, Luca Benini:
A High-performance, Energy-efficient Modular DMA Engine Architecture. CoRR abs/2305.05240 (2023) - [i146]Paul Scheffler, Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini:
Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra. CoRR abs/2305.05559 (2023) - [i145]Marcin Copik, Roman Böhringer, Alexandru Calotoiu, Torsten Hoefler:
FMI: Fast and Cheap Message Passing for Serverless Functions. CoRR abs/2305.08763 (2023) - [i144]Maciej Besta, Robert Gerstenberger, Marc Fischer, Michal Podstawski, Jürgen Müller, Nils Blach, Berke Egeli, George Mitenkov, Wojciech Chlapek, Marek T. Michalewicz, Torsten Hoefler:
High-Performance Graph Databases That Are Portable, Programmable, and Scale to Hundreds of Thousands of Cores. CoRR abs/2305.11162 (2023) - [i143]Tal Ben-Nun, Berke Ates, Alexandru Calotoiu, Torsten Hoefler:
Bridging Control-Centric and Data-Centric Optimization. CoRR abs/2306.00366 (2023) - [i142]Tiziano De Matteis, Lukas Gianinazzi, Johannes de Fine Licht, Torsten Hoefler:
Streaming Task Graph Scheduling for Dataflow Architectures. CoRR abs/2306.02730 (2023) - [i141]Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh:
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression. CoRR abs/2306.03078 (2023) - [i140]Wenqi Jiang, Shigang Li, Yu Zhu, Johannes de Fine Licht, Zhenhao He, Runbin Shi, Cédric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso:
Co-design Hardware and Algorithm for Vector Search. CoRR abs/2306.11182 (2023) - [i139]Philipp Schaad, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Alexandros Nikolaos Ziogas, Torsten Hoefler:
FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs. CoRR abs/2306.16178 (2023) - [i138]Torsten Hoefler, Thomas Häner, Matthias Troyer:
Disentangling Hype from Practicality: On Realistically Achieving Quantum Advantage. CoRR abs/2307.00523 (2023) - [i137]Tal Ben-Nun, Lukas Gianinazzi, Torsten Hoefler, Yishai Oltchik:
Maximum Flows in Parametric Graph Templates. CoRR abs/2307.08420 (2023) - [i136]Yunqiang Li, Jan C. van Gemert, Torsten Hoefler, Bert Moons, Evangelos Eleftheriou, Bram-Ernst Verhoef:
Differentiable Transportation Pruning. CoRR abs/2307.08483 (2023) - [i135]Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Michal Podstawski, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler:
Graph of Thoughts: Solving Elaborate Problems with Large Language Models. CoRR abs/2308.09687 (2023) - [i134]Julia Bazinska, Andrei Ivanov, Tal Ben-Nun, Nikoli Dryden, Maciej Besta, Siyuan Shen, Torsten Hoefler:
Cached Operator Reordering: A Unified View for Fast GNN Training. CoRR abs/2308.12093 (2023) - [i133]Mikhail Khalilov, Marcin Chrapek, Siyuan Shen, Alessandro Vezzu, Thomas Benz, Salvatore Di Girolamo, Timo Schneider, Daniele De Sensi, Luca Benini, Torsten Hoefler:
OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs. CoRR abs/2309.03628 (2023) - [i132]Torsten Hoefler, Bjorn Stevens, Andreas F. Prein, Johanna Baehr, Thomas C. Schulthess, Thomas F. Stocker, John A. Taylor, Daniel Klocke, Pekka Manninen, Piers M. Forster, Tobias Kölling, Nicolas Gruber, Hartwig Anzt, Claudia Frauen, Florian Ziemen, Milan Klöwer, Karthik Kashinath, Christoph M. Schär, Oliver Fuhrer, Bryan N. Lawrence:
Earth Virtualization Engines - A Technical Perspective. CoRR abs/2309.09002 (2023) - [i131]Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler:
Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees. CoRR abs/2309.16214 (2023) - [i130]Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, Basilio B. Fraguela, Torsten Hoefler:
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores. CoRR abs/2310.02065 (2023) - [i129]Nils Blach, Maciej Besta, Daniele De Sensi, Jens Domke, Hussein Harake, Shigang Li, Patrick Iff, Marek Konieczny, Kartik Lakhotia, Ales Kubicek, Marcel Ferrari, Fabrizio Petrini, Torsten Hoefler:
A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network. CoRR abs/2310.03742 (2023) - [i128]Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh:
Towards End-to-end 4-Bit Inference on Generative Large Language Models. CoRR abs/2310.09259 (2023) - [i127]Wenqi Jiang, Marco Zeller, Roger Waleffe, Torsten Hoefler, Gustavo Alonso:
Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models. CoRR abs/2310.09949 (2023) - [i126]Patrick Iff, Benigna Bruggmann, Maciej Besta, Luca Benini, Torsten Hoefler:
RapidChiplet: A Toolchain for Rapid Design Space Exploration of Chiplet Architectures. CoRR abs/2311.06081 (2023) - [i125]Wei Qiu, Marcin Copik, Yun Wang, Alexandru Calotoiu, Torsten Hoefler:
User-guided Page Merging for Memory Deduplication in Serverless Systems. CoRR abs/2311.13588 (2023) - [i124]Maciej Besta, Afonso Claudino Catarino, Lukas Gianinazzi, Nils Blach, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler:
HOT: Higher-Order Dynamic Graph Representation Learning with Efficient Transformers. CoRR abs/2311.18526 (2023) - [i123]Eldar Kurtic, Torsten Hoefler, Dan Alistarh:
How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark. CoRR abs/2312.13547 (2023)
- 2022
- [j54]Torsten Hoefler, Ariel Hendel, Duncan Roweth:
The Convergence of Hyperscale Data Center and High-Performance Computing Networks. Computer 55(7): 29-37 (2022) - [j53]Torsten Hoefler:
Benchmarking Data Science: 12 Ways to Lie With Statistics and Performance on Parallel Computers. Computer 55(8): 49-56 (2022) - [j52]Marcin Copik, Tobias Grosser, Torsten Hoefler, Paolo Bientinesi, Benjamin Berkels:
Work-Stealing Prefix Scan: Addressing Load Imbalance in Large-Scale Image Registration. IEEE Trans. Parallel Distributed Syst. 33(3): 523-535 (2022) - [c236]Konstantin Taranov, Benjamin Rothenberger, Daniele De Sensi, Adrian Perrig, Torsten Hoefler:
NeVerMore: Exploiting RDMA Mistakes in NVMe-oF Storage Applications. CCS 2022: 2765-2778 - [c235]Andrea Cossettini, Konstantin Taranov, Christian Vogt, Michele Magno, Torsten Hoefler, Luca Benini:
A RDMA Interface for Ultra-Fast Ultrasound Data-Streaming over an Optical Link. DATE 2022: 80-83 - [c234]Niels Gleinig, Torsten Hoefler:
Circuits for Measurement Based Quantum State Preparation. DATE 2022: 328-333 - [c233]Andrea Biagioni, Paolo Cretaro, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Michele Martinelli, Pier Stanislao Paolucci, Elena Pastorelli, Francesco Simula, Matteo Turisini, Piero Vicini, Roberto Ammendola, Pascale Bernier-Bruna, Claire Chen, Said Derradji, Stéphane Guez, Pierre-Axel Lagadec, Gregoire Pichon, Etienne Walter, Gaetan De Gassowski, Matthieu Hautreaux, Stephane Mathieu, Gilles Moreau, Marc Pérache, Hugo Taboada, Torsten Hoefler, Timo Schneider, Matteo Barnaba, Giuseppe Piero Brandino, Francesco De Giorgi, Matteo Poggi, Iakovos Mavroidis, Yannis Papaefstathiou, Nikolaos Tampouratzis, Benjamin Kalisch, Ulrich Krackhardt, Mondrian Nuessle, Pantelis Xirouchakis, Vangelis Mageiropoulos, Michalis Gianioudis, Harisis Loukas, Aggelos Ioannou, Nikos Kallimanis, Nikos Chrysos, Manolis Katevenis, Wolfgang Frings, Dominik Gottwald, Felime Guimaraes, Max Holicki, Volker Marx, Yannik Müller, Carsten Clauss, Hugo Falter, Xu Huang, Jennifer Lopez Barillao, Thomas Moschny, Simon Pickartz, Francisco J. Alfaro, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Quiles, José L. Sánchez, Adrián Castelló, Jose Duro, María Engracia Gómez, Enrique S. Quintana-Ortí, Julio Sahuquillo, Eugenio Stabile:
RED-SEA: Network Solution for Exascale Architectures. DSD 2022: 712-719 - [c232]Shiyi Cao, Salvatore Di Girolamo, Torsten Hoefler:
Accelerating Data Serialization/Deserialization Protocols with In-Network Compute. ExaMPI@SC 2022: 22-30 - [c231]Johannes de Fine Licht, Christopher A. Pattison, Alexandros Nikolaos Ziogas, David Simmons-Duffin, Torsten Hoefler:
Fast Arbitrary Precision Floating Point on FPGA. FCCM 2022: 1-9 - [c230]Carl-Johannes Johnsen, Tiziano De Matteis, Tal Ben-Nun, Johannes de Fine Licht, Torsten Hoefler:
Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping. ICCAD 2022: 85:1-85:9 - [c229]Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko:
Neural Parameter Allocation Search. ICLR 2022 - [c228]Larissa Schmid, Marcin Copik, Alexandru Calotoiu, Dominik Werle, Andreas Reiter, Michael Selzer, Anne Koziolek, Torsten Hoefler:
Performance-detective: automatic deduction of cheap and accurate performance models. ICS 2022: 3:1-3:13 - [c227]Alexandru Calotoiu, Tal Ben-Nun, Grzegorz Kwasniewski, Johannes de Fine Licht, Timo Schneider, Philipp Schaad, Torsten Hoefler:
Lifting C semantics for dataflow optimization. ICS 2022: 17:1-17:13 - [c226]Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler:
A data-centric optimization framework for machine learning. ICS 2022: 36:1-36:13 - [c225]Andrei Lascu, Alastair F. Donaldson, Tobias Grosser, Torsten Hoefler:
Metamorphic Fuzzing of C++ Libraries. ICST 2022: 35-46 - [c224]Niels Gleinig, Maciej Besta, Torsten Hoefler:
I/O-Optimal Cache-Oblivious Sparse Matrix-Sparse Matrix Multiplication. IPDPS 2022: 36-46 - [c223]András Strausz, Flavio Vella, Salvatore Di Girolamo, Maciej Besta, Torsten Hoefler:
Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching. IPDPS 2022: 291-301 - [c222]Maciej Besta, Raphael Grob, Cesare Miglioli, Nicola Bernold, Grzegorz Kwasniewski, Gabriel Gjini, Raghavendra Kanakagiri, Saleh Ashkboos, Lukas Gianinazzi, Nikoli Dryden, Torsten Hoefler:
Motif Prediction with Graph Neural Networks. KDD 2022: 35-45 - [c221]Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, Torsten Hoefler:
Neural Graph Databases. LoG 2022: 31 - [c220]Saleh Ashkboos, Langwen Huang, Nikoli Dryden, Tal Ben-Nun, Peter Dueben, Lukas Gianinazzi, Luca Kummer, Torsten Hoefler:
ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts. NeurIPS 2022 - [c219]Nikoli Dryden, Torsten Hoefler:
Spatial Mixture-of-Experts. NeurIPS 2022 - [c218]Shigang Li, Torsten Hoefler:
Near-optimal sparse allreduce for distributed deep learning. PPoPP 2022: 135-149 - [c217]Salvatore Di Girolamo, Daniele De Sensi, Konstantin Taranov, Milos Malesevic, Maciej Besta, Timo Schneider, Severin Kistler, Torsten Hoefler:
Building Blocks for Network-Accelerated Distributed File Systems. SC 2022: 10:1-10:14 - [c216]Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott:
HammingMesh: A Network Topology for Large-Scale Deep Learning. SC 2022: 11:1-11:18 - [c215]Kartik Lakhotia, Maciej Besta, Laura Monroe, Kelly Isham, Patrick Iff, Torsten Hoefler, Fabrizio Petrini:
PolarFly: A Cost-Effective and Flexible Low-Diameter Topology. SC 2022: 12:1-12:15 - [c214]Alexandros Nikolaos Ziogas, Grzegorz Kwasniewski, Tal Ben-Nun, Timo Schneider, Torsten Hoefler:
Deinsum: Practically I/O Optimal Multi-Linear Algebra. SC 2022: 25:1-25:15 - [c213]Shigang Li, Kazuki Osawa, Torsten Hoefler:
Efficient Quantized Sparse Matrix Operations on Tensor Cores. SC 2022: 37:1-37:15 - [c212]Maciej Besta, Cesare Miglioli, Paolo Sylos Labini, Jakub Tetek, Patrick Iff, Raghavendra Kanakagiri, Saleh Ashkboos, Kacper Janda, Michal Podstawski, Grzegorz Kwasniewski, Niels Gleinig, Flavio Vella, Onur Mutlu, Torsten Hoefler:
ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations. SC 2022: 43:1-43:17 - [c211]Philipp Schaad, Tal Ben-Nun, Torsten Hoefler:
Boosting Performance Optimization with Interactive Data Movement Visualization. SC 2022: 64:1-64:16 - [c210]Tal Ben-Nun, Linus Groner, Florian Deconinck, Tobias Wicky, Eddie Davis, Johann Dahm, Oliver Elbert, Rhea George, Jeremy McGibbon, Lukas Trümper, Elynn Wu, Oliver Fuhrer, Thomas C. Schulthess, Torsten Hoefler:
Productive Performance Engineering for Weather and Climate Modeling with Python. SC 2022: 73:1-73:14 - [c209]Konstantin Taranov, Steve Byan, Virendra J. Marathe, Torsten Hoefler:
KafkaDirect: Zero-copy Data Access for Apache Kafka over RDMA Networks. SIGMOD Conference 2022: 2191-2204 - [c208]Niels Gleinig, Torsten Hoefler:
The Red-Blue Pebble Game on Trees and DAGs with Large Input. SIROCCO 2022: 135-153 - [i122]Shigang Li, Torsten Hoefler:
Near-Optimal Sparse Allreduce for Distributed Deep Learning. CoRR abs/2201.07598 (2022) - [i121]Konstantin Taranov, Benjamin Rothenberger, Daniele De Sensi, Adrian Perrig, Torsten Hoefler:
NeVerMore: Exploiting RDMA Mistakes in NVMe-oF Storage Applications. CoRR abs/2202.08080 (2022) - [i120]András Strausz, Flavio Vella, Salvatore Di Girolamo, Maciej Besta, Torsten Hoefler:
Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching. CoRR abs/2202.13976 (2022) - [i119]Marcin Copik, Alexandru Calotoiu, Konstantin Taranov, Torsten Hoefler:
FaasKeeper: a Blueprint for Serverless Services. CoRR abs/2203.14859 (2022) - [i118]Johannes de Fine Licht, Christopher A. Pattison, Alexandros Nikolaos Ziogas, David Simmons-Duffin, Torsten Hoefler:
Fast Arbitrary Precision Floating Point on FPGA. CoRR abs/2204.06256 (2022) - [i117]Tal Ben-Nun, Linus Groner, Florian Deconinck, Tobias Wicky, Eddie Davis, Johann Dahm, Oliver Elbert, Rhea George, Jeremy McGibbon, Lukas Trümper, Elynn Wu, Oliver Fuhrer, Thomas C. Schulthess, Torsten Hoefler:
Productive Performance Engineering for Weather and Climate Modeling with Python. CoRR abs/2205.04148 (2022) - [i116]Lukas Gianinazzi, Tal Ben-Nun, Saleh Ashkboos, Yves Baumann, Piotr Luczynski, Torsten Hoefler:
The spatial computer: A model for energy-efficient parallel computation. CoRR abs/2205.04934 (2022) - [i115]Maciej Besta, Torsten Hoefler:
Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis. CoRR abs/2205.09702 (2022) - [i114]Alexandros Nikolaos Ziogas, Grzegorz Kwasniewski, Tal Ben-Nun, Timo Schneider, Torsten Hoefler:
Deinsum: Practically I/O Optimal Multilinear Algebra. CoRR abs/2206.08301 (2022) - [i113]Salvatore Di Girolamo, Daniele De Sensi, Konstantin Taranov, Milos Malesevic, Maciej Besta, Timo Schneider, Severin Kistler, Torsten Hoefler:
Building Blocks for Network-Accelerated Distributed File Systems. CoRR abs/2206.10007 (2022) - [i112]Saleh Ashkboos, Langwen Huang, Nikoli Dryden, Tal Ben-Nun, Peter Dueben, Lukas Gianinazzi, Luca Kummer, Torsten Hoefler:
ENS-10: A Dataset For Post-Processing Ensemble Weather Forecast. CoRR abs/2206.14786 (2022) - [i111]Philipp Schaad, Tal Ben-Nun, Torsten Hoefler:
Boosting Performance Optimization with Interactive Data Movement Visualization. CoRR abs/2207.07433 (2022) - [i110]Kartik Lakhotia, Maciej Besta, Laura Monroe, Kelly Isham, Patrick Iff, Torsten Hoefler, Fabrizio Petrini:
PolarFly: A Cost-Effective and Flexible Low-Diameter Topology. CoRR abs/2208.01695 (2022) - [i109]Maciej Besta, Cesare Miglioli, Paolo Sylos Labini, Jakub Tetek, Patrick Iff, Raghavendra Kanakagiri, Saleh Ashkboos, Kacper Janda, Michal Podstawski, Grzegorz Kwasniewski, Niels Gleinig, Flavio Vella, Onur Mutlu, Torsten Hoefler:
ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations. CoRR abs/2208.11469 (2022) - [i108]Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott:
HammingMesh: A Network Topology for Large-Scale Deep Learning. CoRR abs/2209.01346 (2022) - [i107]Andrei Ivanov, Benjamin Rothenberger, Arnaud Dethise, Marco Canini, Torsten Hoefler, Adrian Perrig:
SAGE: Software-based Attestation for GPU Execution. CoRR abs/2209.03125 (2022) - [i106]Shigang Li, Kazuki Osawa, Torsten Hoefler:
Efficient Quantized Sparse Matrix Operations on Tensor Cores. CoRR abs/2209.06979 (2022) - [i105]Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, Torsten Hoefler:
Neural Graph Databases. CoRR abs/2209.09732 (2022) - [i104]Carl-Johannes Johnsen, Tiziano De Matteis, Tal Ben-Nun, Johannes de Fine Licht, Torsten Hoefler:
Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping. CoRR abs/2210.04598 (2022) - [i103]Langwen Huang, Torsten Hoefler:
Compressing multidimensional weather and climate data into neural networks. CoRR abs/2210.12538 (2022) - [i102]Daniele De Sensi, Tiziano De Matteis, Konstantin Taranov, Salvatore Di Girolamo, Tobias Rahn, Torsten Hoefler:
Noise in the Clouds: Influence of Network Performance Variability on Application Scalability. CoRR abs/2210.15315 (2022) - [i101]Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh:
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. CoRR abs/2210.17323 (2022) - [i100]Michael E. Beverland, Prakash Murali, Matthias Troyer, Krysta M. Svore, Torsten Hoefler, Vadym Kliuchnikov, Guang Hao Low, Mathias Soeken, Aarthi Sundaram, Alexander Vaschillo:
Assessing requirements to scale to practical quantum advantage. CoRR abs/2211.07629 (2022) - [i99]Nikoli Dryden, Torsten Hoefler:
Spatial Mixture-of-Experts. CoRR abs/2211.13491 (2022) - [i98]Patrick Iff, Maciej Besta, Matheus A. Cavalcante, Tim Fischer, Luca Benini, Torsten Hoefler:
Sparse Hamming Graph: A Customizable Network-on-Chip Topology. CoRR abs/2211.13980 (2022) - [i97]Patrick Iff, Maciej Besta, Matheus A. Cavalcante, Tim Fischer, Luca Benini, Torsten Hoefler:
HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement. CoRR abs/2211.13989 (2022) - [i96]Kazuki Osawa, Shigang Li, Torsten Hoefler:
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices. CoRR abs/2211.14133 (2022) - [i95]Konstantin Taranov, Fabian Fischer, Torsten Hoefler:
Efficient RDMA Communication Protocols. CoRR abs/2212.09134 (2022) - [i94]Johannes de Fine Licht, Tiziano De Matteis, Tal Ben-Nun, Andreas Kuster, Oliver Rausch, Manuel Burger, Carl-Johannes Johnsen, Torsten Hoefler:
Python FPGA Programming with Data-Centric Multi-Level Design. CoRR abs/2212.13768 (2022)
- 2021
- [j51]Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste:
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. J. Mach. Learn. Res. 22: 241:1-241:124 (2021) - [j50]Peter Bauer, Peter D. Düben, Torsten Hoefler, Tiago Quintino, Thomas C. Schulthess, Nils P. Wedi:
The digital revolution of Earth-system science. Nat. Comput. Sci. 1(2): 104-113 (2021) - [j49]Arjun Pitchanathan, Christian Ulmann, Michel Weber, Torsten Hoefler, Tobias Grosser:
FPL: fast Presburger arithmetic through transprecision. Proc. ACM Program. Lang. 5(OOPSLA): 1-26 (2021) - [j48]Daniele De Sensi, Tiziano De Matteis, Konstantin Taranov, Salvatore Di Girolamo, Tobias Rahn, Torsten Hoefler:
Noise in the Clouds: Influence of Network Performance Variability on Application Scalability. Proc. ACM Meas. Anal. Comput. Syst. 6(3): 49:1-49:27 (2021) - [j47]Maciej Besta, Zur Vonarburg-Shmaria, Yannick Schaffner, Leonardo Schwarz, Grzegorz Kwasniewski, Lukas Gianinazzi, Jakub Beránek, Kacper Janda, Tobias Holenstein, Sebastian Leisinger, Peter Tatkowski, Esref Özdemir, Adrian Balla, Marcin Copik, Philipp Lindenberger, Marek Konieczny, Onur Mutlu, Torsten Hoefler:
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra. Proc. VLDB Endow. 14(11): 1922-1936 (2021) - [j46]Edgar Solomonik, James Demmel, Torsten Hoefler:
Communication Lower Bounds of Bilinear Algorithms for Symmetric Tensor Contractions. SIAM J. Sci. Comput. 43(5): A3328-A3356 (2021) - [j45]Tobias Gysi, Christoph Müller, Oleksandr Zinenko, Stephan Herhut, Eddie Davis, Tobias Wicky, Oliver Fuhrer, Torsten Hoefler, Tobias Grosser:
Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation. ACM Trans. Archit. Code Optim. 18(4): 51:1-51:23 (2021) - [j44]Fabian Schuiki, Florian Zaruba, Torsten Hoefler, Luca Benini:
Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores. IEEE Trans. Computers 70(2): 212-227 (2021) - [j43]Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini:
Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads. IEEE Trans. Computers 70(11): 1845-1860 (2021) - [j42]Maciej Besta, Jens Domke, Marcel Schneider, Marek Konieczny, Salvatore Di Girolamo, Timo Schneider, Ankit Singla, Torsten Hoefler:
High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks. IEEE Trans. Parallel Distributed Syst. 32(4): 943-959 (2021) - [j41]Johannes de Fine Licht, Maciej Besta, Simon Meierhans, Torsten Hoefler:
Transformations of High-Level Synthesis Codes for High-Performance Computing. IEEE Trans. Parallel Distributed Syst. 32(5): 1014-1029 (2021) - [j40]Shigang Li, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan Alistarh, Torsten Hoefler:
Breaking (Global) Barriers in Parallel Stochastic Optimization With Wait-Avoiding Group Averaging. IEEE Trans. Parallel Distributed Syst. 32(7): 1725-1739 (2021) - [c207]Dan Graur, Rodrigo Bruno, Joschka Bischoff, Marcel Rieser, Wolfgang Scherr, Torsten Hoefler, Gustavo Alonso:
Hermes: Enabling efficient large-scale simulation in MATSim. ANT/EDI40 2021: 635-641 - [c206]Johannes de Fine Licht, Andreas Kuster, Tiziano De Matteis, Tal Ben-Nun, Dominic Hofer, Torsten Hoefler:
StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems. CGO 2021: 315-326 - [c205]Niels Gleinig, Torsten Hoefler:
An Efficient Algorithm for Sparse Quantum State Preparation. DAC 2021: 433-438 - [c204]Paul Scheffler, Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini:
Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra. DATE 2021: 1787-1792 - [c203]Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Michael F. P. O'Boyle, Hugh Leather:
ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations. ICML 2021: 2244-2253 - [c202]Alexandros Nikolaos Ziogas, Tal Ben-Nun, Timo Schneider, Torsten Hoefler:
NPBench: a benchmarking suite for high-performance NumPy. ICS 2021: 63-74 - [c201]Marcus Ritter, Alexander Geiß, Johannes Wehrstein, Alexandru Calotoiu, Thorsten Reimann, Torsten Hoefler, Felix Wolf:
Noise-Resilient Empirical Performance Modeling with Deep Neural Networks. IPDPS 2021: 23-34 - [c200]Salvatore Di Girolamo, Andreas Kurth, Alexandru Calotoiu, Thomas Benz, Timo Schneider, Jakub Beránek, Luca Benini, Torsten Hoefler:
A RISC-V in-network accelerator for flexible high-performance low-power packet processing. ISCA 2021: 958-971 - [c199]Maciej Besta, Raghavendra Kanakagiri, Grzegorz Kwasniewski, Rachata Ausavarungnirun, Jakub Beránek, Konstantinos Kanellopoulos, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Ioana Stefan, Juan Gómez-Luna, Jakub Golinowski, Marcin Copik, Lukas Kapp-Schwoerer, Salvatore Di Girolamo, Nils Blach, Marek Konieczny, Onur Mutlu, Torsten Hoefler:
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems. MICRO 2021: 282-297 - [c198]Marcin Copik, Grzegorz Kwasniewski, Maciej Besta, Michal Podstawski, Torsten Hoefler:
SeBS: a serverless benchmark suite for function-as-a-service computing. Middleware 2021: 64-78 - [c197]Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler:
Data Movement Is All You Need: A Case Study on Optimizing Transformers. MLSys 2021 - [c196]Marcin Copik, Alexandru Calotoiu, Tobias Grosser, Nicolas Wicki, Felix Wolf, Torsten Hoefler:
Extracting clean performance models from tainted programs. PPoPP 2021: 403-417 - [c195]Grzegorz Kwasniewski, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Timo Schneider, Maciej Besta, Torsten Hoefler:
On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization. PPoPP 2021: 463-464 - [c194]Thomas Häner, Damian S. Steiger, Torsten Hoefler, Matthias Troyer:
Distributed quantum computing with QMPI. SC 2021: 16 - [c193]Shigang Li, Torsten Hoefler:
Chimera: efficiently training large-scale neural networks with bidirectional pipelines. SC 2021: 27 - [c192]Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler:
Flare: flexible in-network allreduce. SC 2021: 35 - [c191]Grzegorz Kwasniewski, Marko Kabic, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Jens Eirik Saethre, André Gaillard, Timo Schneider, Maciej Besta, Anton Kozhevnikov, Joost VandeVondele, Torsten Hoefler:
On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations. SC 2021: 70 - [c190]Nikoli Dryden, Roman Böhringer, Tal Ben-Nun, Torsten Hoefler:
Clairvoyant prefetching for distributed machine learning I/O. SC 2021: 92 - [c189]Alexandros Nikolaos Ziogas, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Tiziano De Matteis, Johannes de Fine Licht, Luca Lavarini, Torsten Hoefler:
Productivity, portability, performance: data-centric Python. SC 2021: 95 - [c188]Konstantin Taranov, Salvatore Di Girolamo, Torsten Hoefler:
CoRM: Compactable Remote Memory over RDMA. SIGMOD Conference 2021: 1811-1824 - [c187]Lukas Gianinazzi, Maciej Besta, Yannick Schaffner, Torsten Hoefler:
Parallel Algorithms for Finding Large Cliques in Sparse Graphs. SPAA 2021: 243-253 - [c186]Grzegorz Kwasniewski, Tal Ben-Nun, Lukas Gianinazzi, Alexandru Calotoiu, Timo Schneider, Alexandros Nikolaos Ziogas, Maciej Besta, Torsten Hoefler:
Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs. SPAA 2021: 328-339 - [c185]Konstantin Taranov, Rodrigo Bruno, Gustavo Alonso, Torsten Hoefler:
Naos: Serialization-free RDMA networking in Java. USENIX ATC 2021: 1-14 - [c184]Maksym Planeta, Jan Bierbaum, Leo Sahaya Daphne Antony, Torsten Hoefler, Hermann Härtig:
MigrOS: Transparent Live-Migration Support for Containerised RDMA Applications. USENIX ATC 2021: 47-63 - [c183]Benjamin Rothenberger, Konstantin Taranov, Adrian Perrig, Torsten Hoefler:
ReDMArk: Bypassing RDMA Security Mechanisms. USENIX Security Symposium 2021: 4277-4292 - [i93]Roman Böhringer, Nikoli Dryden, Tal Ben-Nun, Torsten Hoefler:
Clairvoyant Prefetching for Distributed Machine Learning I/O. CoRR abs/2101.08734 (2021) - [i92]David Ittah, Thomas Häner, Vadym Kliuchnikov, Torsten Hoefler:
Enabling Dataflow Optimization for Quantum Programs. CoRR abs/2101.11030 (2021) - [i91]Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste:
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. CoRR abs/2102.00554 (2021) - [i90]Maciej Besta, Zur Vonarburg-Shmaria, Yannick Schaffner, Leonardo Schwarz, Grzegorz Kwasniewski, Lukas Gianinazzi, Jakub Beránek, Kacper Janda, Tobias Holenstein, Sebastian Leisinger, Peter Tatkowski, Esref Özdemir, Adrian Balla, Marcin Copik, Philipp Lindenberger, Pavel Kalvoda, Marek Konieczny, Onur Mutlu, Torsten Hoefler:
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra. CoRR abs/2103.03653 (2021) - [i89]Maciej Besta, Raghavendra Kanakagiri, Grzegorz Kwasniewski, Rachata Ausavarungnirun, Jakub Beránek, Konstantinos Kanellopoulos, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Ioana Stefan, Juan Gómez-Luna, Marcin Copik, Lukas Kapp-Schwoerer, Salvatore Di Girolamo, Marek Konieczny, Onur Mutlu, Torsten Hoefler:
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems. CoRR abs/2104.07582 (2021) - [i88]Thomas Häner, Damian S. Steiger, Torsten Hoefler, Matthias Troyer:
Distributed Quantum Computing with QMPI. CoRR abs/2105.01109 (2021) - [i87]Grzegorz Kwasniewski, Tal Ben-Nun, Lukas Gianinazzi, Alexandru Calotoiu, Timo Schneider, Alexandros Nikolaos Ziogas, Maciej Besta, Torsten Hoefler:
Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs. CoRR abs/2105.07203 (2021) - [i86]Maciej Besta, Marcel Schneider, Salvatore Di Girolamo, Ankit Singla, Torsten Hoefler:
Towards Million-Server Network Simulations on Just a Laptop. CoRR abs/2105.12663 (2021) - [i85]Maciej Besta, Raphael Grob, Cesare Miglioli, Nicola Bernold, Grzegorz Kwasniewski, Gabriel Gjini, Raghavendra Kanakagiri, Saleh Ashkboos, Lukas Gianinazzi, Nikoli Dryden, Torsten Hoefler:
Motif Prediction with Graph Neural Networks. CoRR abs/2106.00761 (2021) - [i84]Lukas Gianinazzi, Maximilian Fries, Nikoli Dryden, Tal Ben-Nun, Maciej Besta, Torsten Hoefler:
Learning Combinatorial Node Labeling Algorithms. CoRR abs/2106.03594 (2021) - [i83]Marcin Copik, Konstantin Taranov, Alexandru Calotoiu, Torsten Hoefler:
RFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing. CoRR abs/2106.13859 (2021) - [i82]Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler:
Flare: Flexible In-Network Allreduce. CoRR abs/2106.15565 (2021) - [i81]Alexandros Nikolaos Ziogas, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Tiziano De Matteis, Johannes de Fine Licht, Luca Lavarini, Torsten Hoefler:
Productivity, Portability, Performance: Data-Centric Python. CoRR abs/2107.00555 (2021) - [i80]Shigang Li, Torsten Höfler:
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. CoRR abs/2107.06925 (2021) - [i79]Grzegorz Kwasniewski, Marko Kabic, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Jens Eirik Saethre, André Gaillard, Timo Schneider, Maciej Besta, Anton Kozhevnikov, Joost VandeVondele, Torsten Hoefler:
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations. CoRR abs/2108.09337 (2021) - [i78]Lukas Gianinazzi, Maciej Besta, Yannick Schaffner, Torsten Hoefler:
Parallel Algorithms for Finding Large Cliques in Sparse Graphs. CoRR abs/2109.09663 (2021) - [i77]Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler:
A Data-Centric Optimization Framework for Machine Learning. CoRR abs/2110.10802 (2021) - [i76]Alexandru Calotoiu, Tal Ben-Nun, Grzegorz Kwasniewski, Johannes de Fine Licht, Timo Schneider, Philipp Schaad, Torsten Hoefler:
Lifting C Semantics for Dataflow Optimization. CoRR abs/2112.11879 (2021)
- 2020
- [j39]Thomas Häner, Torsten Hoefler, Matthias Troyer:
Assertion-based optimization of Quantum programs. Proc. ACM Program. Lang. 4(OOPSLA): 133:1-133:20 (2020) - [j38]Tobias Grosser, Theodoros Theodoridis, Maximilian Falkenstein, Arjun Pitchanathan, Michael Kruse, Manuel Rigger, Zhendong Su, Torsten Hoefler:
Fast linear programming through transprecision computing on small and sparse data. Proc. ACM Program. Lang. 4(OOPSLA): 195:1-195:28 (2020) - [j37]Jesper Larsson Träff, Torsten Hoefler:
Special issue: Selected papers from EuroMPI 2019. Parallel Comput. 99: 102695 (2020) - [j36]Carlos Osuna, Tobias Wicky, Fabian Thuering, Torsten Hoefler, Oliver Fuhrer:
Dawn: a High-level Domain-Specific Language Compiler Toolchain for Weather and Climate Applications. Supercomput. Front. Innov. 7(2): 79-97 (2020) - [j35]Asif Ali Khan, Hauke Mewes, Tobias Grosser, Torsten Hoefler, Jerónimo Castrillón:
Polyhedral Compilation for Racetrack Memories. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 39(11): 3968-3980 (2020) - [j34]Maciej Besta, Marc Fischer, Tal Ben-Nun, Dimitri Stanojevic, Johannes de Fine Licht, Torsten Hoefler:
Substream-Centric Maximum Matchings on FPGA. ACM Trans. Reconfigurable Technol. Syst. 13(2): 8:1-8:33 (2020) - [c182]Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry:
Augment Your Batch: Improving Generalization Through Instance Repetition. CVPR 2020: 8126-8135 - [c181]Andreas Kurth, Samuel Riedel, Florian Zaruba, Torsten Hoefler, Luca Benini:
ATUNs: Modular and Scalable Support for Atomic Operations in a Shared Memory Multiprocessor. DAC 2020: 1-6 - [c180]Johannes de Fine Licht, Grzegorz Kwasniewski, Torsten Hoefler:
Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis. FPGA 2020: 244-254 - [c179]Marcus Ritter, Alexandru Calotoiu, Sebastian Rinke, Thorsten Reimann, Torsten Hoefler, Felix Wolf:
Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling. IPDPS 2020: 884-895 - [c178]Maciej Besta, Raghavendra Kanakagiri, Harun Mustafa, Mikhail Karasikov, Gunnar Rätsch, Torsten Hoefler, Edgar Solomonik:
Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons. IPDPS 2020: 1122-1132 - [c177]Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler:
Taming unbalanced training workloads in deep learning with partial collective operations. PPoPP 2020: 45-61 - [c176]Yuyang Jin, Haojie Wang, Xiongchao Tang, Torsten Hoefler, Xu Liu, Jidong Zhai:
Identifying scalability bottlenecks for large-scale parallel programs with graph analysis. PPoPP 2020: 409-410 - [c175]Alexandr Nigay, Lukas Mosimann, Timo Schneider, Torsten Hoefler:
Communication and Timing Issues with MPI Virtualization. EuroMPI 2020: 11-20 - [c174]Maciej Besta, Marcel Schneider, Marek Konieczny, Karolina Cynk, Erik Henriksson, Salvatore Di Girolamo, Ankit Singla, Torsten Hoefler:
FatPaths: routing in supercomputers and data centers when shortest paths fall short. SC 2020: 27 - [c173]Yuyang Jin, Haojie Wang, Teng Yu, Xiongchao Tang, Torsten Hoefler, Xu Liu, Jidong Zhai:
ScalAna: automating scaling loss detection with graph analysis. SC 2020: 28 - [c172]Daniele De Sensi, Salvatore Di Girolamo, Kim H. McMahon, Duncan Roweth, Torsten Hoefler:
An in-depth analysis of the slingshot interconnect. SC 2020: 35 - [c171]Tiziano De Matteis, Johannes de Fine Licht, Torsten Hoefler:
fBLAS: streaming linear algebra on FPGA. SC 2020: 59 - [c170]Alexandru Calotoiu, Markus Geisenhofer, Florian Kummer, Marcus Ritter, Jens Weber, Torsten Hoefler, Martin Oberlack, Felix Wolf:
Empirical Modeling of Spatially Diverging Performance. HUST/ProTools@SC 2020: 71-80 - [c169]Maciej Besta, Armon Carigiet, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Torsten Hoefler:
High-performance parallel graph coloring with strong guarantees on work, depth, and quality. SC 2020: 99 - [c168]Lukas Gianinazzi, Torsten Hoefler:
Parallel Planar Subgraph Isomorphism and Vertex Connectivity. SPAA 2020: 269-280 - [c167]Konstantin Taranov, Benjamin Rothenberger, Adrian Perrig, Torsten Hoefler:
sRDMA - Efficient NIC-based Authentication and Encryption for Remote Direct Memory Access. USENIX ATC 2020: 691-704 - [p2]Alexandru Calotoiu, Marcin Copik, Torsten Hoefler, Marcus Ritter, Sergei Shudler, Felix Wolf:
ExtraPeak: Advanced Automatic Performance Modeling for HPC Applications. Software for Exascale Computing 2020: 453-482 - [i75]Tobias Gysi, Tobias Grosser, Laurin Brandner, Torsten Hoefler:
A Fast Analytical Model of Fully Associative Caches. CoRR abs/2001.01653 (2020) - [i74]Robert Gerstenberger, Maciej Besta, Torsten Hoefler:
Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided. CoRR abs/2001.07747 (2020) - [i73]Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini:
Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads. CoRR abs/2002.10143 (2020) - [i72]Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Hugh Leather:
ProGraML: Graph-based Deep Learning for Program Optimization and Analysis. CoRR abs/2003.10536 (2020) - [i71]Shigang Li, Tal Ben-Nun, Dan Alistarh, Salvatore Di Girolamo, Nikoli Dryden, Torsten Hoefler:
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging. CoRR abs/2005.00124 (2020) - [i70]Peter Grönquist, Chengyuan Yao, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Shigang Li, Torsten Hoefler:
Deep Learning for Post-Processing Ensemble Weather Forecasts. CoRR abs/2005.08748 (2020) - [i69]Tobias Gysi, Christoph Müller, Oleksandr Zinenko, Stephan Herhut, Eddie Davis, Tobias Wicky, Oliver Fuhrer, Torsten Hoefler, Tobias Grosser:
Domain-Specific Multi-Level IR Rewriting for GPU. CoRR abs/2005.13014 (2020) - [i68]Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko:
Shapeshifter Networks: Cross-layer Parameter Sharing for Scalable and Effective Deep Learning. CoRR abs/2006.10598 (2020) - [i67]Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler:
Data Movement Is All You Need: A Case Study on Optimizing Transformers. CoRR abs/2007.00072 (2020) - [i66]Lukas Gianinazzi, Torsten Hoefler:
Parallel Planar Subgraph Isomorphism and Vertex Connectivity. CoRR abs/2007.01199 (2020) - [i65]Maciej Besta, Jens Domke, Marcel Schneider, Marek Konieczny, Salvatore Di Girolamo, Timo Schneider, Ankit Singla, Torsten Hoefler:
High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers. CoRR abs/2007.03776 (2020) - [i64]Daniele De Sensi, Salvatore Di Girolamo, Kim H. McMahon, Duncan Roweth, Torsten Hoefler:
An In-Depth Analysis of the Slingshot Interconnect. CoRR abs/2008.08886 (2020) - [i63]Maciej Besta, Armon Carigiet, Zur Vonarburg-Shmaria, Kacper Janda, Lukas Gianinazzi, Torsten Hoefler:
High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality. CoRR abs/2008.11321 (2020) - [i62]Yuyang Jin, Haojie Wang, Teng Yu, Xiongchao Tang, Torsten Hoefler, Xu Liu, Jidong Zhai:
ScalAna: Automating Scaling Loss Detection with Graph Analysis. CoRR abs/2009.01692 (2020) - [i61]Maksym Planeta, Jan Bierbaum, Leo Sahaya Daphne Antony, Torsten Hoefler, Hermann Härtig:
TardiS: Migrating Containers with RDMA Networks. CoRR abs/2009.06988 (2020) - [i60]Salvatore Di Girolamo, Andreas Kurth, Alexandru Calotoiu, Thomas Benz, Timo Schneider, Jakub Beránek, Luca Benini, Torsten Hoefler:
PsPIN: A high-performance low-power architecture for flexible in-network compute. CoRR abs/2010.03536 (2020) - [i59]Grzegorz Kwasniewski, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Timo Schneider, Maciej Besta, Torsten Hoefler:
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization. CoRR abs/2010.05975 (2020) - [i58]Maciej Besta, Torsten Hoefler:
Fault Tolerance for Remote Memory Access Programming Models. CoRR abs/2010.09025 (2020) - [i57]Maciej Besta, Torsten Hoefler:
Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages. CoRR abs/2010.09135 (2020) - [i56]Hermann Schweizer, Maciej Besta, Torsten Hoefler:
Evaluating the Cost of Atomic Operations on Modern Architectures. CoRR abs/2010.09852 (2020) - [i55]Patrick Schmid, Maciej Besta, Torsten Hoefler:
High-Performance Distributed RMA Locks. CoRR abs/2010.09854 (2020) - [i54]Maciej Besta, Florian Marending, Edgar Solomonik, Torsten Hoefler:
SlimSell: A Vectorizable Graph Representation for Breadth-First Search. CoRR abs/2010.09913 (2020) - [i53]Maciej Besta, Syed Minhaj Hassan, Sudhakar Yalamanchili, Rachata Ausavarungnirun, Onur Mutlu, Torsten Hoefler:
Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability. CoRR abs/2010.10683 (2020) - [i52]Marcin Copik, Tobias Grosser, Torsten Hoefler, Paolo Bientinesi, Benjamin Berkels:
Work-stealing prefix scan: Addressing load imbalance in large-scale image registration. CoRR abs/2010.12478 (2020) - [i51]Maciej Besta, Marc Fischer, Tal Ben-Nun, Dimitri Stanojevic, Johannes de Fine Licht, Torsten Hoefler:
Substream-Centric Maximum Matchings on FPGA. CoRR abs/2010.14684 (2020) - [i50]Johannes de Fine Licht, Andreas Kuster, Tiziano De Matteis, Tal Ben-Nun, Dominic Hofer, Torsten Hoefler:
StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems. CoRR abs/2010.15218 (2020) - [i49]Maciej Besta, Dimitri Stanojevic, Tijana Zivic, Jagpreet Singh, Maurice Hoerold, Torsten Hoefler:
Log(Graph): A Near-Optimal High-Performance Graph Representation. CoRR abs/2010.15879 (2020) - [i48]Maciej Besta, Michal Podstawski, Linus Groner, Edgar Solomonik, Torsten Hoefler:
To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations. CoRR abs/2010.16012 (2020) - [i47]Tal Ben-Nun, Lukas Gianinazzi, Torsten Hoefler, Yishai Oltchik:
Parametric Graph Templates: Properties and Algorithms. CoRR abs/2011.07001 (2020) - [i46]Paul Scheffler, Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini:
Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra. CoRR abs/2011.08070 (2020) - [i45]Chris Cummins, Hugh Leather, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Michael F. P. O'Boyle:
Deep Data Flow Analysis. CoRR abs/2012.01470 (2020) - [i44]Marcin Copik, Grzegorz Kwasniewski, Maciej Besta, Michal Podstawski, Torsten Hoefler:
SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing. CoRR abs/2012.14132 (2020) - [i43]Marcin Copik, Alexandru Calotoiu, Tobias Grosser, Nicolas Wicki, Felix Wolf, Torsten Hoefler:
Extracting Clean Performance Models from Tainted Programs. CoRR abs/2012.15592 (2020)
2010 – 2019
- 2019
- [j33]Pedro Yébenes, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Quiles, Torsten Hoefler:
Head-of-line blocking avoidance in Slim Fly networks using deadlock-free non-minimal and adaptive routing. Concurr. Comput. Pract. Exp. 31(2) (2019) - [j32]Thomas C. Schulthess, Peter Bauer, Nils Wedi, Oliver Fuhrer, Torsten Hoefler, Christoph M. Schär:
Reflecting on the Goal and Baseline for Exascale Computing: A Roadmap Based on Weather and Climate Simulations. Comput. Sci. Eng. 21(1): 30-41 (2019) - [j31]Tal Ben-Nun, Torsten Hoefler:
Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis. ACM Comput. Surv. 52(4): 65:1-65:43 (2019) - [j30]Claude Barthels, Ingo Müller, Konstantin Taranov, Gustavo Alonso, Torsten Hoefler:
Strong consistency is not hard to get: Two-Phase Locking and Two-Phase Commit on Thousands of Cores. Proc. VLDB Endow. 12(13): 2325-2338 (2019) - [j29]Sergei Shudler, Yannick Berens, Alexandru Calotoiu, Torsten Hoefler, Alexandre Strube, Felix Wolf:
Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations. IEEE Trans. Parallel Distributed Syst. 30(8): 1768-1785 (2019) - [c166]Tobias Gysi, Tobias Grosser, Torsten Hoefler:
Absinthe: Learning an Analytical Performance Model to Fuse and Tile Stencil Codes in One Shot. PACT 2019: 370-382 - [c165]Niels Gleinig, Frances Ann Hubis, Torsten Hoefler:
Embedding Functions Into Reversible Circuits: A Probabilistic Approach to the Number of Lines. DAC 2019: 72 - [c164]Maciej Besta, Marc Fischer, Tal Ben-Nun, Johannes de Fine Licht, Torsten Hoefler:
Substream-Centric Maximum Matchings on FPGA. FPGA 2019: 152-161 - [c163]Paul R. Eller, Torsten Hoefler, William Gropp:
Using performance models to understand scalable Krylov solver performance at scale for structured grid problems. ICS 2019: 138-149 - [c162]Tal Ben-Nun, Maciej Besta, Simon Huber, Alexandros Nikolaos Ziogas, Daniel Peter, Torsten Hoefler:
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning. IPDPS 2019: 66-77 - [c161]Torsten Hoefler:
Invited Talk 2. IPDPS Workshops 2019: 392 - [c160]Salvatore Di Girolamo, Pirmin Schmid, Thomas C. Schulthess, Torsten Hoefler:
SimFS: A Simulation Data Virtualizing File System Interface. IPDPS 2019: 621-630 - [c159]Felix Thaler, Stefan Moosbrugger, Carlos Osuna, Mauro Bianco, Hannes Vogt, Anton Afanasyev, Lukas Mosimann, Oliver Fuhrer, Thomas C. Schulthess, Torsten Hoefler:
Porting the COSMO Weather Model to Manycore CPUs. PASC 2019: 13:1-13:11 - [c158]Tobias Gysi, Tobias Grosser, Laurin Brandner, Torsten Hoefler:
A fast analytical model of fully associative caches. PLDI 2019: 816-829 - [c157]Martin Küttler, Maksym Planeta, Jan Bierbaum, Carsten Weinhold, Hermann Härtig, Amnon Barak, Torsten Hoefler:
Corrected trees for reliable group communication. PPoPP 2019: 287-299 - [c156]Jesper Larsson Träff, Torsten Hoefler:
Foreword EuroMPI 2019. EuroMPI 2019: 1:1-1:2 - [c155]Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernández, Timo Schneider, Mathieu Luisier, Torsten Hoefler:
A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations. SC 2019: 1:1-1:13 - [c154]Cédric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler:
SparCML: high-performance sparse communication for machine learning. SC 2019: 11:1-11:15 - [c153]Daniele De Sensi, Salvatore Di Girolamo, Torsten Hoefler:
Mitigating network noise on Dragonfly networks through application-aware routing. SC 2019: 16:1-16:32 - [c152]Grzegorz Kwasniewski, Marko Kabic, Maciej Besta, Joost VandeVondele, Raffaele Solcà, Torsten Hoefler:
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication. SC 2019: 24:1-24:22 - [c151]Maciej Besta, Simon Weber, Lukas Gianinazzi, Robert Gerstenberger, Andrey Ivanov, Yishai Oltchik, Torsten Hoefler:
Slim graph: practical lossy graph compression for approximate graph processing, storage, and analytics. SC 2019: 35:1-35:25 - [c150]Salvatore Di Girolamo, Konstantin Taranov, Andreas Kurth, Michael Schaffner, Timo Schneider, Jakub Beránek, Maciej Besta, Luca Benini, Duncan Roweth, Torsten Hoefler:
Network-accelerated non-contiguous memory transfers. SC 2019: 56:1-56:14 - [c149]Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernández, Timo Schneider, Mathieu Luisier, Torsten Hoefler:
Optimizing the data movement in quantum transport simulations via data-centric parallel programming. SC 2019: 78:1-78:17 - [c148]Tal Ben-Nun, Johannes de Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, Torsten Hoefler:
Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures. SC 2019: 81:1-81:14 - [c147]Tiziano De Matteis, Johannes de Fine Licht, Jakub Beránek, Torsten Hoefler:
Streaming message interface: high-performance distributed memory programming on reconfigurable hardware. SC 2019: 82:1-82:33 - [e8]Torsten Hoefler, Jesper Larsson Träff:
Proceedings of the 26th European MPI Users' Group Meeting, EuroMPI 2019, Zürich, Switzerland, September 11-13, 2019. ACM 2019, ISBN 978-1-4503-7175-9 - [i42]Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry:
Augment your batch: better training with larger batches. CoRR abs/1901.09335 (2019) - [i41]Tal Ben-Nun, Maciej Besta, Simon Huber, Alexandros Nikolaos Ziogas, Daniel Peter, Torsten Hoefler:
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning. CoRR abs/1901.10183 (2019) - [i40]Salvatore Di Girolamo, Pirmin Schmid, Thomas C. Schulthess, Torsten Hoefler:
SimFS: A Simulation Data Virtualizing File System Interface. CoRR abs/1902.03154 (2019) - [i39]Tal Ben-Nun, Johannes de Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, Torsten Hoefler:
Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs. CoRR abs/1902.10345 (2019) - [i38]Maciej Besta, Dimitri Stanojevic, Johannes de Fine Licht, Tal Ben-Nun, Torsten Hoefler:
Graph Processing on FPGAs: Taxonomy, Survey, Challenges. CoRR abs/1903.06697 (2019) - [i37]Maciej Besta, Marcel Schneider, Karolina Cynk, Marek Konieczny, Erik Henriksson, Salvatore Di Girolamo, Ankit Singla, Torsten Hoefler:
FatPaths: Routing in Supercomputers, Data Centers, and Clouds with Low-Diameter Networks when Shortest Paths Fall Short. CoRR abs/1906.10885 (2019) - [i36]Tiziano De Matteis, Johannes de Fine Licht, Torsten Hoefler:
FBLAS: Streaming Linear Algebra on FPGA. CoRR abs/1907.07929 (2019) - [i35]Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler:
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations. CoRR abs/1908.04207 (2019) - [i34]Salvatore Di Girolamo, Konstantin Taranov, Andreas Kurth, Michael Schaffner, Timo Schneider, Jakub Beránek, Maciej Besta, Luca Benini, Duncan Roweth, Torsten Hoefler:
Network-Accelerated Non-Contiguous Memory Transfers. CoRR abs/1908.08590 (2019) - [i33]Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry:
Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency. CoRR abs/1908.08986 (2019) - [i32]Grzegorz Kwasniewski, Marko Kabic, Maciej Besta, Joost VandeVondele, Raffaele Solcà, Torsten Hoefler:
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication. CoRR abs/1908.09606 (2019) - [i31]Tiziano De Matteis, Johannes de Fine Licht, Jakub Beránek, Torsten Hoefler:
Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware. CoRR abs/1909.03231 (2019) - [i30]Daniele De Sensi, Salvatore Di Girolamo, Torsten Hoefler:
Mitigating Network Noise on Dragonfly Networks through Application-Aware Routing. CoRR abs/1909.07865 (2019) - [i29]Johannes de Fine Licht, Torsten Hoefler:
hlslib: Software Engineering for Hardware Design. CoRR abs/1910.04436 (2019) - [i28]Maciej Besta, Emanuel Peter, Robert Gerstenberger, Marc Fischer, Michal Podstawski, Claude Barthels, Gustavo Alonso, Torsten Hoefler:
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries. CoRR abs/1910.09017 (2019) - [i27]Maciej Besta, Torsten Hoefler:
Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations. CoRR abs/1910.12897 (2019) - [i26]Peter Grönquist, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Luca Lavarini, Shigang Li, Torsten Hoefler:
Predicting Weather Uncertainty with Deep Convnets. CoRR abs/1911.00630 (2019) - [i25]Maciej Besta, Raghavendra Kanakagiri, Harun Mustafa, Mikhail Karasikov, Gunnar Rätsch, Torsten Hoefler, Edgar Solomonik:
Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons. CoRR abs/1911.04200 (2019) - [i24]Fabian Schuiki, Florian Zaruba, Torsten Hoefler, Luca Benini:
Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores. CoRR abs/1911.08356 (2019) - [i23]Johannes de Fine Licht, Grzegorz Kwasniewski, Torsten Hoefler:
Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis. CoRR abs/1912.06526 (2019) - [i22]Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernández, Timo Schneider, Mathieu Luisier, Torsten Hoefler:
Optimizing the Data Movement in Quantum Transport Simulations via Data-Centric Parallel Programming. CoRR abs/1912.08810 (2019) - [i21]Maciej Besta, Simon Weber, Lukas Gianinazzi, Robert Gerstenberger, Andrey Ivanov, Yishai Oltchik, Torsten Hoefler:
Slim Graph: Practical Lossy Graph Compression for Approximate Graph Processing, Storage, and Analytics. CoRR abs/1912.08950 (2019) - [i20]Maciej Besta, Torsten Hoefler:
Slim Fly: A Cost Effective Low-Diameter Network Topology. CoRR abs/1912.08968 (2019) - [i19]Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernández, Timo Schneider, Mathieu Luisier, Torsten Hoefler:
A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations. CoRR abs/1912.10024 (2019) - [i18]Maciej Besta, Marc Fischer, Vasiliki Kalavri, Michael Kapralov, Torsten Hoefler:
Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism. CoRR abs/1912.12740 (2019) - 2018
- [j28]Robert Gerstenberger, Maciej Besta, Torsten Hoefler:
Enabling highly scalable remote memory access programming with MPI-3 one sided. Commun. ACM 61(10): 106-113 (2018) - [j27]Shigang Li, Yunquan Zhang, Torsten Hoefler:
Cache-Oblivious MPI All-to-All Communications Based on Morton Order. IEEE Trans. Parallel Distributed Syst. 29(3): 542-555 (2018) - [c146]Maciej Besta, Dimitri Stanojevic, Tijana Zivic, Jagpreet Singh, Maurice Hoerold, Torsten Hoefler:
Log(graph): a near-optimal high-performance graph representation. PACT 2018: 7:1-7:13 - [c145]Maciej Besta, Syed Minhaj Hassan, Sudhakar Yalamanchili, Rachata Ausavarungnirun, Onur Mutlu, Torsten Hoefler:
Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability. ASPLOS 2018: 43-55 - [c144]Alexandru Calotoiu, Alexander Graf, Torsten Hoefler, Daniel Lorenz, Sebastian Rinke, Felix Wolf:
Lightweight Requirements Engineering for Exascale Co-design. CLUSTER 2018: 201-211 - [c143]Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka:
Accelerating Deep Learning Frameworks with Micro-Batches. CLUSTER 2018: 402-412 - [c142]Konstantin Taranov, Gustavo Alonso, Torsten Hoefler:
Fast and strongly-consistent per-item resilience in key-value stores. EuroSys 2018: 39:1-39:14 - [c141]Ingo Müller, Andrea Arteaga, Torsten Hoefler, Gustavo Alonso:
Reproducible Floating-Point Aggregation in RDBMSs. ICDE 2018: 1049-1060 - [c140]Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler:
Neural Code Comprehension: A Learnable Representation of Code Semantics. NeurIPS 2018: 3589-3601 - [c139]Dan Alistarh, Torsten Hoefler, Mikael Johansson, Nikola Konstantinov, Sarit Khirirat, Cédric Renggli:
The Convergence of Sparsified Gradient Methods. NeurIPS 2018: 5977-5987 - [c138]Lukas Gianinazzi, Pavel Kalvoda, Alessandro De Palma, Maciej Besta, Torsten Hoefler:
Communication-avoiding parallel minimum cuts and connected components. PPoPP 2018: 219-232 - [c137]Johannes de Fine Licht, Michaela Blott, Torsten Hoefler:
Designing scalable FPGA architectures using high-level synthesis. PPoPP 2018: 403-404 - [c136]Heng Lin, Xiaowei Zhu, Bowen Yu, Xiongchao Tang, Wei Xue, Wenguang Chen, Lufei Zhang, Torsten Hoefler, Xiaosong Ma, Xin Liu, Weimin Zheng, Jingfang Xu:
ShenTu: processing multi-trillion edge graphs on millions of cores in seconds. SC 2018: 56:1-56:11 - [c135]Cedric Baumann, Andrei Marian Dan, Yuri Meshman, Torsten Hoefler, Martin T. Vechev:
Automatic Verification of RMA Programs via Abstraction Extrapolation. VMCAI 2018: 47-70 - [i17]Cédric Renggli, Dan Alistarh, Torsten Hoefler:
SparCML: High-Performance Sparse Communication for Machine Learning. CoRR abs/1802.08021 (2018) - [i16]Ingo Müller, Andrea Arteaga, Torsten Hoefler, Gustavo Alonso:
Reproducible Floating-Point Aggregation in RDBMSs. CoRR abs/1802.09883 (2018) - [i15]Tal Ben-Nun, Torsten Hoefler:
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis. CoRR abs/1802.09941 (2018) - [i14]Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka:
μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching. CoRR abs/1804.04806 (2018) - [i13]Johannes de Fine Licht, Maciej Besta, Simon Meierhans, Torsten Hoefler:
Transformations of High-Level Synthesis Codes for High-Performance Computing. CoRR abs/1805.08288 (2018) - [i12]Maciej Besta, Torsten Hoefler:
Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations. CoRR abs/1806.01799 (2018) - [i11]Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler:
Neural Code Comprehension: A Learnable Representation of Code Semantics. CoRR abs/1806.07336 (2018) - [i10]Dan Alistarh, Torsten Hoefler, Mikael Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli:
The Convergence of Sparsified Gradient Methods. CoRR abs/1809.10505 (2018) - [i9]Thomas Häner, Torsten Hoefler, Matthias Troyer:
Using Hoare logic for quantum circuit optimization. CoRR abs/1810.00375 (2018) - 2017
- [j26]Claude Barthels, Gustavo Alonso, Torsten Hoefler:
Designing Databases for Future High-Performance Networks. IEEE Data Eng. Bull. 40(1): 15-26 (2017) - [j25]Claude Barthels, Gustavo Alonso, Torsten Hoefler, Timo Schneider, Ingo Müller:
Distributed Join Algorithms on Thousands of Cores. Proc. VLDB Endow. 10(5): 517-528 (2017) - [j24]Didem Unat, Anshu Dubey, Torsten Hoefler, John Shalf, Mark James Abraham, Mauro Bianco, Bradford L. Chamberlain, Romain Cledat, H. Carter Edwards, Hal Finkel, Karl Fuerlinger, Frank Hannig, Emmanuel Jeannot, Amir Kamil, Jeff Keasler, Paul H. J. Kelly, Vitus J. Leung, Hatem Ltaief, Naoya Maruyama, Chris J. Newburn, Miquel Pericàs:
Trends in Data Locality Abstractions for HPC Systems. IEEE Trans. Parallel Distributed Syst. 28(10): 3007-3020 (2017) - [c134]Klaus-Tycho Foerster, Linus Groner, Torsten Hoefler, Michael König, Sascha Schmid, Roger Wattenhofer:
Multi-agent Pathfinding with n Agents on Graphs with n Vertices: Combinatorial Classification and Tight Algorithmic Bounds. CIAC 2017: 247-259 - [c133]Pedro Yébenes, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Quiles, Torsten Hoefler:
Improving Non-minimal and Adaptive Routing Algorithms in Slim Fly Networks. Hot Interconnects 2017: 1-8 - [c132]Timo Schneider, James Dinan, Mario Flajslik, Keith D. Underwood, Torsten Hoefler:
Fast Networks and Slow Memories: A Mechanism for Mitigating Bandwidth Mismatches. Hot Interconnects 2017: 17-24 - [c131]Pedro Yébenes, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Quiles, Torsten Hoefler:
An Effective Queuing Scheme to Provide Slim Fly Topologies with HoL Blocking Reduction and Deadlock Freedom for Minimal-Path Routing. HiPINEB@HPCA 2017: 25-32 - [c130]Maciej Besta, Michal Podstawski, Linus Groner, Edgar Solomonik, Torsten Hoefler:
To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations. HPDC 2017: 93-104 - [c129]Marius Poke, Torsten Hoefler, Colin W. Glass:
AllConcur: Leaderless Concurrent Atomic Broadcast. HPDC 2017: 205-218 - [c128]Andrea Arteaga, Oliver Fuhrer, Torsten Hoefler, Thomas C. Schulthess:
Model-Driven Choice of Numerical Methods for the Solution of the Linear Advection Equation. ICCS 2017: 1542-1551 - [c127]Maciej Besta, Florian Marending, Edgar Solomonik, Torsten Hoefler:
SlimSell: A Vectorizable Graph Representation for Breadth-First Search. IPDPS 2017: 32-41 - [c126]Sabela Ramos, Torsten Hoefler:
Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL. IPDPS 2017: 297-306 - [c125]Torsten Hoefler, Amnon Barak, Amnon Shiloh, Zvi Drezner:
Corrected Gossip Algorithms for Fast Reliable Broadcast on Unreliable Systems. IPDPS 2017: 357-366 - [c124]Tobias Wicky, Edgar Solomonik, Torsten Hoefler:
Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations. IPDPS 2017: 678-687 - [c123]Salvatore Di Girolamo, Flavio Vella, Torsten Hoefler:
Transparent Caching for RMA Systems. IPDPS 2017: 1018-1027 - [c122]Shuaiwen Leon Song, Torsten Hoefler:
IPDRM Workshop Introduction. IPDPS Workshops 2017: 1284 - [c121]Torsten Hoefler:
EMBRACE Keynote. IPDPS Workshops 2017: 1558 - [c120]Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Felix Wolf:
Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications. PPoPP 2017: 131-143 - [c119]Shigang Li, Yunquan Zhang, Torsten Hoefler:
POSTER: Cache-Oblivious MPI All-to-All Communications on Many-Core Architectures. PPoPP 2017: 445-446 - [c118]Edgar Solomonik, Maciej Besta, Flavio Vella, Torsten Hoefler:
Scaling betweenness centrality using communication-efficient sparse matrix multiplication. SC 2017: 47 - [c117]Torsten Hoefler, Salvatore Di Girolamo, Konstantin Taranov, Ryan E. Grant, Ron Brightwell:
sPIN: high-performance streaming processing in the network. SC 2017: 59 - [c116]Edgar Solomonik, Grey Ballard, James Demmel, Torsten Hoefler:
A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem. SPAA 2017: 111-121 - [e7]Torsten Hoefler, Kamil Iskra:
Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS@HPDC 2017, Washington, DC, USA, June 27, 2017. ACM 2017, ISBN 978-1-4503-5086-0 - [i8]Edgar Solomonik, James Demmel, Torsten Hoefler:
Communication Lower Bounds of Bilinear Algorithms for Symmetric Tensor Contractions. CoRR abs/1707.04618 (2017) - [i7]Torsten Hoefler, Salvatore Di Girolamo, Konstantin Taranov, Ryan E. Grant, Ron Brightwell:
sPIN: High-performance streaming Processing in the Network. CoRR abs/1709.05483 (2017) - 2016
- [j23]Patrick M. Widener, Scott Levy, Kurt B. Ferreira, Torsten Hoefler:
On noise and the performance benefit of nonblocking collectives. Int. J. High Perform. Comput. Appl. 30(1): 121-133 (2016) - [j22]Salvatore Di Girolamo, Pierre Jolivet, Keith D. Underwood, Torsten Hoefler:
Exploiting Offload-Enabled Network Interfaces. IEEE Micro 36(4): 6-17 (2016) - [j21]Sabela Ramos, Torsten Hoefler:
Cache Line Aware Algorithm Design for Cache-Coherent Architectures. IEEE Trans. Parallel Distributed Syst. 27(10): 2824-2837 (2016) - [c115]Alexandru Calotoiu, David Beckingsale, Christopher W. Earl, Torsten Hoefler, Ian Karlin, Martin Schulz, Felix Wolf:
Fast Multi-parameter Performance Modeling. CLUSTER 2016: 172-181 - [c114]Timo Schneider, Otto Bibartiu, Torsten Hoefler:
Ensuring Deadlock-Freedom in Low-Diameter InfiniBand Networks. Hot Interconnects 2016: 1-8 - [c113]Jens Domke, Torsten Hoefler, Satoshi Matsuoka:
Routing on the Dependency Graph: A New Approach to Deadlock-Free High-Performance Routing. HPDC 2016: 3-14 - [c112]Patrick Schmid, Maciej Besta, Torsten Hoefler:
High-Performance Distributed RMA Locks. HPDC 2016: 19-30 - [c111]Takayuki Sasaki, Christos Pappas, Taeho Lee, Torsten Hoefler, Adrian Perrig:
SDNsec: Forwarding Accountability for the SDN Data Plane. ICCCN 2016: 1-10 - [c110]Tobias Grosser, Torsten Hoefler:
Polly-ACC: Transparent compilation to heterogeneous hardware. ICS 2016: 1:1-1:13 - [c109]Andrei Marian Dan, Patrick Lam, Torsten Hoefler, Martin T. Vechev:
Modeling and analysis of remote memory access programming. OOPSLA 2016: 129-144 - [c108]Torsten Hoefler:
Selecting Technical Papers for an Interdisciplinary Conference: The PASC Review Process. PASC 2016: 13 - [c107]Jens Domke, Torsten Hoefler:
Scheduling-aware routing for supercomputers. SC 2016: 142-153 - [c106]William M. Tang, Bei Wang, Stéphane Ethier, Grzegorz Kwasniewski, Torsten Hoefler, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, Carlos Rosales-Fernandez, Timothy J. Williams:
Extreme scale plasma turbulence simulations on top supercomputers worldwide. SC 2016: 502-513 - [c105]Tobias Gysi, Jeremia Bär, Torsten Hoefler:
dCUDA: hardware supported overlap of computation and communication. SC 2016: 609-620 - [c104]Maxime Martinasso, Grzegorz Kwasniewski, Sadaf R. Alam, Thomas C. Schulthess, Torsten Hoefler:
A PCIe congestion-aware performance model for densely populated accelerator servers. SC 2016: 739-749 - [p1]Felix Wolf, Christian H. Bischof, Alexandru Calotoiu, Torsten Hoefler, Christian Iwainsky, Grzegorz Kwasniewski, Bernd Mohr, Sergei Shudler, Alexandre Strube, Andreas Vogel, Gabriel Wittum:
Automatic Performance Modeling of HPC Applications. Software for Exascale Computing 2016: 445-465 - [e6]Kamil Iskra, Torsten Hoefler:
Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, Kyoto, Japan, June 1, 2016. ACM 2016, ISBN 978-1-4503-4387-9 - [e5]Torsten Hoefler, David E. Keyes, Timothy Robinson:
Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2016, Lausanne, Switzerland, June 8-10, 2016. ACM 2016, ISBN 978-1-4503-4126-4 - [i6]Edgar Solomonik, Grey Ballard, James Demmel, Torsten Hoefler:
A communication-avoiding parallel algorithm for the symmetric eigenvalue problem. CoRR abs/1604.03703 (2016) - [i5]Takayuki Sasaki, Christos Pappas, Taeho Lee, Torsten Hoefler, Adrian Perrig:
SDNsec: Forwarding Accountability for the SDN Data Plane. CoRR abs/1605.01944 (2016) - [i4]Marius Poke, Torsten Hoefler, Colin W. Glass:
AllConcur: Leaderless Concurrent Atomic Broadcast (Extended Version). CoRR abs/1608.05866 (2016) - [i3]Edgar Solomonik, Maciej Besta, Flavio Vella, Torsten Hoefler:
Betweenness Centrality is more Parallelizable than Dense Matrix Multiplication. CoRR abs/1609.07008 (2016) - [i2]Tobias Wicky, Edgar Solomonik, Torsten Hoefler:
Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations. CoRR abs/1612.01855 (2016) - 2015
- [j20]Kamil Iskra, Torsten Hoefler:
Operating systems and runtime environments on supercomputers. Int. J. High Perform. Comput. Appl. 29(1): 3-4 (2015) - [j19]Torsten Hoefler, James Dinan, Rajeev Thakur, Brian Barrett, Pavan Balaji, William Gropp, Keith D. Underwood:
Remote Memory Access Programming in MPI-3. ACM Trans. Parallel Comput. 2(2): 9:1-9:26 (2015) - [j18]Michael Dinitz, Torsten Hoefler:
Introduction to the Special Issue on SPAA 2013. ACM Trans. Parallel Comput. 2(3): 14e:1-14e:2 (2015) - [c103]Hermann Schweizer, Maciej Besta, Torsten Hoefler:
Evaluating the Cost of Atomic Operations on Modern Architectures. PACT 2015: 445-456 - [c102]Arnamoy Bhattacharyya, Grzegorz Kwasniewski, Torsten Hoefler:
Using Compiler Techniques to Improve Automatic Performance Modeling. PACT 2015: 468-479 - [c101]Taeho Lee, Christos Pappas, Cristina Basescu, Jun Han, Torsten Hoefler, Adrian Perrig:
Source-Based Path Selection: The Data Plane Perspective. CFI 2015: 41-45 - [c100]Salvatore Di Girolamo, Pierre Jolivet, Keith D. Underwood, Torsten Hoefler:
Exploiting Offload Enabled Network Interfaces. Hot Interconnects 2015: 26-33 - [c99]Torsten Hoefler, Robert B. Ross, Timothy Roscoe:
Distributing the Data Plane for Remote Storage Access. HotOS 2015 - [c98]Sabela Ramos, Torsten Hoefler:
Cache Line Aware Optimizations for ccNUMA Systems. HPDC 2015: 85-88 - [c97]Marius Poke, Torsten Hoefler:
DARE: High-Performance State Machine Replication on RDMA Networks. HPDC 2015: 107-118 - [c96]Maciej Besta, Torsten Hoefler:
Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages. HPDC 2015: 161-172 - [c95]Maciej Besta, Torsten Hoefler:
Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations. ICS 2015: 155-164 - [c94]Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Alexandre Strube, Felix Wolf:
Exascaling Your Library: Will Your Implementation Meet Your Expectations? ICS 2015: 165-175 - [c93]Tobias Gysi, Tobias Grosser, Torsten Hoefler:
MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures. ICS 2015: 177-186 - [c92]Torsten Hoefler, Laxmikant V. Kalé:
HIPS-LSPP Keynotes. IPDPS Workshops 2015: 204 - [c91]Roberto Belli, Torsten Hoefler:
Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization. IPDPS 2015: 871-881 - [c90]Georgios Kathareios, Cyriel Minkenberg, Bogdan Prisacari, Germán Rodríguez, Torsten Hoefler:
Cost-effective diameter-two topologies: analysis and evaluation. SC 2015: 36:1-36:11 - [c89]Torsten Hoefler, Roberto Belli:
Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results. SC 2015: 73:1-73:12 - [e4]Torsten Hoefler, Kamil Iskra:
Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2015, Portland, OR, USA, June 16, 2015. ACM 2015, ISBN 978-1-4503-3606-2 - [i1]Edgar Solomonik, Torsten Hoefler:
Sparse Tensor Algebra as a Parallel Programming Model. CoRR abs/1512.00066 (2015) - 2014
- [j17]Shigang Li, Torsten Hoefler, Chungjin Hu, Marc Snir:
Improved MPI collectives for MPI processes in shared address spaces. Clust. Comput. 17(4): 1139-1155 (2014) - [j16]Timo Schneider, Robert Gerstenberger, Torsten Hoefler:
Application-oriented ping-pong benchmarking: how to assess the real communication overheads. Computing 96(4): 279-292 (2014) - [j15]Robert Gerstenberger, Maciej Besta, Torsten Hoefler:
Enabling highly-scalable remote memory access programming with MPI-3 One Sided. Sci. Program. 22(2): 75-91 (2014) - [j14]Torsten Hoefler, Dmitry Moor:
Energy, Memory, and Runtime Tradeoffs for Implementing Collective Communication Operations. Supercomput. Front. Innov. 1(2): 58-75 (2014) - [c88]Arnamoy Bhattacharyya, Torsten Hoefler:
PEMOGEN: automatic adaptive performance modeling during program runtime. PACT 2014: 393-404 - [c87]Felix Wolf, Christian H. Bischof, Torsten Hoefler, Bernd Mohr, Gabriel Wittum, Alexandru Calotoiu, Christian Iwainsky, Alexandre Strube, Andreas Vogel:
Catwalk: A Quick Development Path for Performance Models. Euro-Par Workshops (2) 2014: 589-600 - [c86]Maciej Besta, Torsten Hoefler:
Fault tolerance for remote memory access programming models. HPDC 2014: 37-48 - [c85]Bogdan Prisacari, Germán Rodríguez, Philip Heidelberger, Dong Chen, Cyriel Minkenberg, Torsten Hoefler:
Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks. HPDC 2014: 129-140 - [c84]Andrea Arteaga, Oliver Fuhrer, Torsten Hoefler:
Designing Bit-Reproducible Portable High-Performance Applications. IPDPS 2014: 1235-1244 - [c83]Patrick M. Widener, Kurt B. Ferreira, Scott Levy, Torsten Hoefler:
Exploring the effect of noise on the performance benefit of nonblocking allreduce. EuroMPI/ASIA 2014: 77 - [c82]Maciej Besta, Torsten Hoefler:
Slim Fly: A Cost Effective Low-Diameter Network Topology. SC 2014: 348-359 - [c81]Jens Domke, Torsten Hoefler, Satoshi Matsuoka:
Fail-in-Place Network Design: Interaction Between Topology, Routing Algorithm and Failures. SC 2014: 597-608 - [c80]Kurt B. Ferreira, Patrick M. Widener, Scott Levy, Dorian C. Arnold, Torsten Hoefler:
Understanding the Effects of Communication and Coordination on Checkpointing at Scale. SC 2014: 883-894 - [c79]Torsten Hoefler, Grzegorz Kwasniewski:
Automatic complexity analysis of explicitly parallel programs. SPAA 2014: 226-235 - [e3]Kamil Iskra, Torsten Hoefler:
Proceedings of the 4th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2014, Munich, Germany, June 10, 2014. ACM 2014, ISBN 978-1-4503-2950-7 - 2013
- [j13]Torsten Hoefler, James Dinan, Darius Buntinas, Pavan Balaji, Brian Barrett, Ron Brightwell, William Gropp, Vivek Kale, Rajeev Thakur:
MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory. Computing 95(12): 1121-1136 (2013) - [j12]Torsten Hoefler, Kamil Iskra:
Operating systems and runtime environments on supercomputers. Int. J. High Perform. Comput. Appl. 27(2): 123 (2013) - [j11]Bogdan Prisacari, Germán Rodríguez, Cyriel Minkenberg, Torsten Hoefler:
Fast pattern-specific routing for fat tree networks. ACM Trans. Archit. Code Optim. 10(4): 36:1-36:25 (2013) - [c78]Olav Lysne, Torsten Hoefler, Pedro López, Davide Bertozzi:
Topic 13: High-Performance Networks and Communication - (Introduction). Euro-Par 2013: 684 - [c77]Shigang Li, Torsten Hoefler, Marc Snir:
NUMA-aware shared-memory collective communication for MPI. HPDC 2013: 85-96 - [c76]Sabela Ramos, Torsten Hoefler:
Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi. HPDC 2013: 97-108 - [c75]Timo Schneider, Torsten Hoefler, Ryan E. Grant, Brian W. Barrett, Ron Brightwell:
Protocols for Fully Offloaded Collective Operations on Accelerated Network Adapters. ICPP 2013: 593-602 - [c74]Bogdan Prisacari, Germán Rodríguez, Cyriel Minkenberg, Torsten Hoefler:
Bandwidth-optimal all-to-all exchanges in fat tree networks. ICS 2013: 139-148 - [c73]Timo Schneider, Robert Gerstenberger, Torsten Hoefler:
Compiler Optimizations for Non-contiguous Remote Data Movement. LCPC 2013: 307-321 - [c72]Andrew Friedley, Torsten Hoefler, Greg Bronevetsky, Andrew Lumsdaine, Ching-Chen Ma:
Ownership passing: efficient distributed memory programming on multi-core systems. PPoPP 2013: 177-186 - [c71]Timo Schneider, Fredrik Kjolstad, Torsten Hoefler:
MPI datatype processing using runtime compilation. EuroMPI 2013: 19-24 - [c70]Andrew Friedley, Greg Bronevetsky, Torsten Hoefler, Andrew Lumsdaine:
Hybrid MPI: efficient message passing for multi-core systems. SC 2013: 18:1-18:11 - [c69]Alexandru Calotoiu, Torsten Hoefler, Marius Poke, Felix Wolf:
Using automated performance modeling to find scalability bugs in complex codes. SC 2013: 45:1-45:12 - [c68]Robert Gerstenberger, Maciej Besta, Torsten Hoefler:
Enabling highly-scalable remote memory access programming with MPI-3 one sided. SC 2013: 53:1-53:12 - [c67]Scott Levy, Bryan Topp, Kurt B. Ferreira, Dorian C. Arnold, Torsten Hoefler, Patrick M. Widener:
Using Simulation to Evaluate the Performance of Resilience Strategies at Scale. PMBS@SC 2013: 91-114 - [e2]Torsten Hoefler, Kamil Iskra:
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2013, Eugene, Oregon, USA, June 10, 2013. ACM 2013, ISBN 978-1-4503-2146-4 - 2012
- [j10]Torsten Hoefler, Kamil Iskra:
Operating systems and runtime environments on supercomputers. Int. J. High Perform. Comput. Appl. 26(2): 93-94 (2012) - [j9]Torsten Hoefler, Patrick Geoffray, Fabrizio Petrini, Jesper Larsson Träff:
Top Picks from Hot Interconnects 2011: Petascale Network Architectures. IEEE Micro 32(1): 4-7 (2012) - [j8]Torsten Hoefler:
Extensions for next-generation parallel programming models. Parallel Comput. 38(1-2): 1 (2012) - [c66]Torsten Hoefler, Timo Schneider:
Runtime detection and optimization of collective communication patterns. PACT 2012: 263-272 - [c65]Peter Gottschling, Torsten Hoefler:
Productive Parallel Linear Algebra Programming with Unstructured Topology Adaption. CCGRID 2012: 9-16 - [c64]Greg Bauer, Steven Gottlieb, Torsten Hoefler:
Performance Modeling and Comparative Analysis of the MILC Lattice QCD Application su3_rmd. CCGRID 2012: 652-659 - [c63]Simone Pellegrini, Torsten Hoefler, Thomas Fahringer:
On the Effects of CPU Caches on MPI Point-to-Point Communications. CLUSTER 2012: 495-503 - [c62]Kishor Kharbas, Donghoon Kim, Torsten Hoefler, Frank Mueller:
Assessing HPC Failure Detectors for MPI Jobs. PDP 2012: 81-88 - [c61]Torsten Hoefler, Timo Schneider:
Communication-centric optimizations by dynamically detecting collective operations. PPoPP 2012: 305-306 - [c60]Fredrik Kjolstad, Torsten Hoefler, Marc Snir:
Automatic datatype generation and optimization. PPoPP 2012: 327-328 - [c59]Simone Pellegrini, Torsten Hoefler, Thomas Fahringer:
Exact Dependence Analysis for Increased Communication Overlap. EuroMPI 2012: 89-99 - [c58]Timo Schneider, Robert Gerstenberger, Torsten Hoefler:
Micro-applications for Communication Data Access Patterns and MPI Datatypes. EuroMPI 2012: 121-131 - [c57]Torsten Hoefler, James Dinan, Darius Buntinas, Pavan Balaji, Brian W. Barrett, Ron Brightwell, William Gropp, Vivek Kale, Rajeev Thakur:
Leveraging MPI's One-Sided Communication Interface for Shared-Memory Programming. EuroMPI 2012: 132-141 - [c56]Torsten Hoefler, Timo Schneider:
Optimization principles for collective neighborhood communications. SC 2012: 98 - [c55]Vivek Kale, Todd Gamblin, Torsten Hoefler, Bronis R. de Supinski, William D. Gropp:
Abstract: Slack-Conscious Lightweight Loop Scheduling for Improving Scalability of Bulk-synchronous MPI Applications. SC Companion 2012: 1392 - [e1]Torsten Hoefler, Kamil Iskra:
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS '12, Venice, Italy, June 29, 2012. ACM 2012, ISBN 978-1-4503-1460-2 [contents] - 2011
- [j7]Torsten Hoefler, Rolf Rabenseifner, Hubert Ritzdorf, Bronis R. de Supinski, Rajeev Thakur, Jesper Larsson Träff:
The scalable process topology interface of MPI 2.2. Concurr. Comput. Pract. Exp. 23(4): 293-310 (2011) - [j6]Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, Torsten Hoefler, Sameer Kumar, Ewing L. Lusk, Rajeev Thakur, Jesper Larsson Träff:
MPI on Millions of Cores. Parallel Process. Lett. 21(1): 45-60 (2011) - [c54]Timo Schneider, Sven Eckelmann, Torsten Hoefler, Wolfgang Rehm:
Kernel-Based Offload of Collective Operations - Implementation, Evaluation and Lessons Learned. Euro-Par (2) 2011: 264-275 - [c53]Torsten Hoefler, Marc Snir:
Generic topology mapping strategies for large-scale parallel architectures. ICS 2011: 75-84 - [c52]Jeremiah Willcock, Torsten Hoefler, Nicholas Gerard Edmonds, Andrew Lumsdaine:
Active pebbles: parallel programming for data-driven applications. ICS 2011: 235-244 - [c51]Jens Domke, Torsten Hoefler, Wolfgang E. Nagel:
Deadlock-Free Oblivious Routing for Arbitrary Topologies. IPDPS 2011: 616-627 - [c50]Torsten Hoefler:
HIPS Introduction. IPDPS Workshops 2011: 1139-1140 - [c49]Eric Holk, William E. Byrd, Jeremiah Willcock, Torsten Hoefler, Arun Chauhan, Andrew Lumsdaine:
Kanor - A Declarative Language for Explicit Communication. PADL 2011: 190-204 - [c48]Jeremiah Willcock, Torsten Hoefler, Nicholas Gerard Edmonds, Andrew Lumsdaine:
Active pebbles: a programming model for highly parallel fine-grained data-driven computations. PPoPP 2011: 305-306 - [c47]Vishwanath Venkatesan, Mohamad Chaarawi, Edgar Gabriel, Torsten Hoefler:
Design and Evaluation of Nonblocking Collective I/O Operations. EuroMPI 2011: 90-98 - [c46]William Gropp, Torsten Hoefler, Rajeev Thakur, Jesper Larsson Träff:
Performance Expectations and Guidelines for MPI Derived Datatypes. EuroMPI 2011: 150-159 - [c45]Torsten Hoefler, Marc Snir:
Writing Parallel Libraries with MPI - Common Practice, Issues, and Extensions. EuroMPI 2011: 345-355 - [c44]Torsten Hoefler, William Gropp, William Kramer, Marc Snir:
Performance modeling for systematic performance tuning. SC State of the Practice Reports 2011: 6:1-6:12 - [c43]Stephen Lien Harrell, Preston M. Smith, Doug Smith, Torsten Hoefler, Anna A. Labutina, Trinity Overmyer:
Methods of creating student cluster competition teams. TG 2011: 50:1-50:6 - 2010
- [j5]Torsten Hoefler:
Software and Hardware Techniques for Power-Efficient HPC Networking. Comput. Sci. Eng. 12(6): 30-37 (2010) - [j4]Torsten Hoefler, Timo Schneider, Andrew Lumsdaine:
Accurately measuring overhead, communication time and progression of blocking and nonblocking collective operations at massive scale. Int. J. Parallel Emergent Distributed Syst. 25(4): 241-258 (2010) - [c42]Jeremiah Willcock, Torsten Hoefler, Nicholas Gerard Edmonds, Andrew Lumsdaine:
AM++: a generalized active message framework. PACT 2010: 401-410 - [c41]Torsten Hoefler:
Bridging Performance Analysis Tools and Analytic Performance Modeling for HPC. Euro-Par Workshops 2010: 483-491 - [c40]Nick Edmonds, Torsten Hoefler, Andrew Lumsdaine:
A space-efficient parallel algorithm for computing betweenness centrality in distributed memory. HiPC 2010: 1-10 - [c39]L. Baba Arimilli, Ravi Arimilli, Vicente Chung, Scott Clark, Wolfgang E. Denzel, Ben C. Drerup, Torsten Hoefler, Jody B. Joyner, Jerry Lewis, Jian Li, Nan Ni, Ramakrishnan Rajamony:
The PERCS High-Performance Interconnect. Hot Interconnects 2010: 75-82 - [c38]Torsten Hoefler, Timo Schneider, Andrew Lumsdaine:
LogGOPSim: simulating large-scale applications in the LogGOPS model. HPDC 2010: 597-604 - [c37]Torsten Hoefler, Christian Siebert, Andrew Lumsdaine:
Scalable communication protocols for dynamic sparse data exchange. PPoPP 2010: 159-168 - [c36]Torsten Hoefler, William Gropp, Rajeev Thakur, Jesper Larsson Träff:
Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues. EuroMPI 2010: 21-30 - [c35]Torsten Hoefler, Greg Bronevetsky, Brian Barrett, Bronis R. de Supinski, Andrew Lumsdaine:
Efficient MPI Support for Advanced Hybrid Programming Models. EuroMPI 2010: 50-61 - [c34]Torsten Hoefler, Steven Gottlieb:
Parallel Zero-Copy Algorithms for Fast Fourier Transform and Conjugate Gradient Using MPI Datatypes. EuroMPI 2010: 132-141 - [c33]Torsten Hoefler, Timo Schneider, Andrew Lumsdaine:
Characterizing the Influence of System Noise on Large-Scale Applications by Simulation. SC 2010: 1-11
2000 – 2009
- 2009
- [j3]Torsten Hoefler, Timo Schneider, Andrew Lumsdaine:
The Effect of Network Noise on Large-Scale Collective Communications. Parallel Process. Lett. 19(4): 573-593 (2009) - [j2]Torsten Hoefler, Timo Schneider, Andrew Lumsdaine:
LogGP in theory and practice - An in-depth analysis of modern interconnection networks and benchmarking methods for collective operations. Simul. Model. Pract. Theory 17(9): 1511-1521 (2009) - [c32]Prabhanjan Kambadur, Anshul Gupta, Torsten Hoefler, Andrew Lumsdaine:
Demand-driven execution of static directed acyclic graphs using task parallelism. HiPC 2009: 284-293 - [c31]Torsten Hoefler, Timo Schneider, Andrew Lumsdaine:
Optimized Routing for Large-Scale InfiniBand Networks. Hot Interconnects 2009: 103-111 - [c30]Torsten Hoefler, Christian Siebert, Andrew Lumsdaine:
Group Operation Assembly Language - A Flexible Way to Express Collective Communication. ICPP 2009: 574-581 - [c29]Torsten Hoefler, Timo Schneider, Andrew Lumsdaine:
A power-aware, application-based performance study of modern commodity cluster interconnection networks. IPDPS 2009: 1-7 - [c28]Torsten Hoefler, Timo Schneider, Andrew Lumsdaine:
The impact of network noise at large-scale communication performance. IPDPS 2009: 1-8 - [c27]Torsten Hoefler, Jesper Larsson Träff:
Sparse collective operations for MPI. IPDPS 2009: 1-8 - [c26]Christian Kaiser, Torsten Hoefler, Boris Bierbaum, Thomas Bemmerl:
Implementation and analysis of nonblocking collective operations on SCI networks. IPDPS 2009: 1-7 - [c25]Torsten Hoefler, Andrew Lumsdaine, Jack J. Dongarra:
Towards Efficient MapReduce Using MPI. PVM/MPI 2009: 240-249 - 2008
- [c24]Timo Schneider, Torsten Hoefler, Simon Wunderlich, Torsten Mehlan, Wolfgang Rehm:
An Optimized ZGEMM Implementation for the Cell BE. PASA 2008: 113-122 - [c23]Torsten Hoefler, Andrew Lumsdaine:
Overlapping Communication and Computation with High Level Communication Routines. CCGRID 2008: 572-577 - [c22]Torsten Hoefler, Timo Schneider, Andrew Lumsdaine:
Multistage switches are not crossbars: Effects of static routing in high-performance networks. CLUSTER 2008: 116-125 - [c21]Torsten Hoefler, Andrew Lumsdaine:
Message progression in parallel computing - to thread or not to thread? CLUSTER 2008: 213-222 - [c20]Patrick Geoffray, Torsten Hoefler:
Adaptive Routing Strategies for Modern High Performance Networks. Hot Interconnects 2008: 165-172 - [c19]Torsten Hoefler, Andrew Lumsdaine:
Optimizing non-blocking collective operations for infiniband. IPDPS 2008: 1-8 - [c18]Torsten Hoefler, Timo Schneider, Andrew Lumsdaine:
Accurately measuring collective operations at massive scale. IPDPS 2008: 1-8 - [c17]Torsten Hoefler, Florian Lorenzen, Andrew Lumsdaine:
Sparse Non-blocking Collectives in Quantum Mechanical Calculations. PVM/MPI 2008: 55-63 - [c16]Torsten Hoefler, Maraike Schellmann, Sergei Gorlatch, Andrew Lumsdaine:
Communication Optimization for Medical Image Reconstruction Algorithms. PVM/MPI 2008: 75-83 - [c15]Torsten Hoefler, Peter Gottschling, Andrew Lumsdaine:
Leveraging non-blocking collective communication in high-performance applications. SPAA 2008: 113-115 - 2007
- [j1]Torsten Hoefler, Peter Gottschling, Andrew Lumsdaine, Wolfgang Rehm:
Optimizing a conjugate gradient solver with non-blocking collective operations. Parallel Comput. 33(9): 624-633 (2007) - [c14]Torsten Hoefler, Torsten Mehlan, Andrew Lumsdaine, Wolfgang Rehm:
Netgauge: A Network Performance Measurement Framework. HPCC 2007: 659-671 - [c13]Torsten Hoefler, Andre Lichei, Wolfgang Rehm:
Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks. IPDPS 2007: 1-8 - [c12]Torsten Hoefler, Christian Siebert, Wolfgang Rehm:
A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast. IPDPS 2007: 1-8 - [c11]Torsten Hoefler, Prabhanjan Kambadur, Richard L. Graham, Galen M. Shipman, Andrew Lumsdaine:
A Case for Standard Non-blocking Collective Operations. PVM/MPI 2007: 125-134 - [c10]Torsten Hoefler, Andrew Lumsdaine, Wolfgang Rehm:
Implementation and performance analysis of non-blocking collective operations for MPI. SC 2007: 52 - 2006
- [c9]Torsten Hoefler, Torsten Mehlan, Frank Mietke, Wolfgang Rehm:
Adding Low-Cost Hardware Barrier Support to Small Commodity Clusters. ARCS Workshops 2006: 343-350 - [c8]Frank Mietke, Robert Rex, Robert Baumgartl, Torsten Mehlan, Torsten Hoefler, Wolfgang Rehm:
Analysis of the Memory Registration Process in the Mellanox InfiniBand Software Stack. Euro-Par 2006: 124-133 - [c7]Torsten Hoefler, Torsten Mehlan, Frank Mietke, Wolfgang Rehm:
Fast barrier synchronization for InfiniBand™. IPDPS 2006 - [c6]Torsten Hoefler, Torsten Mehlan, Frank Mietke, Wolfgang Rehm:
LogfP - a model for small messages in InfiniBand. IPDPS 2006 - [c5]Torsten Hoefler, Jeffrey M. Squyres, Wolfgang Rehm, Andrew Lumsdaine:
A Case for Non-blocking Collective Operations. ISPA Workshops 2006: 155-164 - [c4]Torsten Mehlan, Jochen Strunk, Torsten Hoefler, Frank Mietke, Wolfgang Rehm:
IRS - A Portable Interface for Reconfigurable Systems. PARELEC 2006: 187-191 - [c3]Torsten Hoefler, Carsten Viertel, Torsten Mehlan, Frank Mietke, Wolfgang Rehm:
Assessing Single-Message and Multi-Node Communication Performance of InfiniBand. PARELEC 2006: 227-232 - [c2]Torsten Hoefler, Peter Gottschling, Wolfgang Rehm, Andrew Lumsdaine:
Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations. PVM/MPI 2006: 374-382 - 2005
- [c1]Torsten Hoefler, Lavinio Cerquetti, Torsten Mehlan, Frank Mietke, Wolfgang Rehm:
A Practical Approach to the Rating of Barrier Algorithms Using the LogP Model and Open MPI. ICPP Workshops 2005: 562-569
last updated on 2024-10-23 21:27 CEST by the dblp team
all metadata released as open data under CC0 1.0 license