32nd IPDPS 2018: Vancouver, BC, Canada
- 2018 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018, Vancouver, BC, Canada, May 21-25, 2018. IEEE Computer Society 2018, ISBN 978-1-5386-4368-6
Keynote 1
- Michael A. Bender:
The Algorithmics of Write Optimization. 1
Session 1: Graph Algorithms 1
- Saliya Ekanayake, Jose Cadena, Udayanga Wickramasinghe, Anil Vullikanti:
MIDAS: Multilinear Detection at Scale. 2-11
- Michael Sutton, Tal Ben-Nun, Amnon Barak:
Optimizing Parallel Graph Connectivity Computation via Subgraph Sampling. 12-21
- S. M. Ferdous, Arif M. Khan, Alex Pothen:
Parallel Algorithms Through Approximation: B-Edge Cover. 22-33
- Md. Vasimuddin, Sriram P. Chockalingam, Srinivas Aluru:
A Parallel Algorithm for Bayesian Network Inference Using Arithmetic Circuits. 34-43
Session 2: Large-Scale Applications 1
- Jeffrey Regier, Kiran Pamnany, Keno Fischer, Andreas Noack, Maximilian Lam, Jarrett Revels, Steve Howard, Ryan Giordano, David Schlegel, Jon McAuliffe, Rollin C. Thomas, Prabhat:
Cataloging the Visible Universe Through Bayesian Inference at Petascale. 44-53
- Hao Lu, Sudip K. Seal, Gregory Muzyn, Wei Guo, Jonathan D. Poplawsky:
Efficient, Parallel At-scale Correlation Analysis for Atom Probe Tomography on Hybrid Architectures. 54-63
- Mert Hidayetoglu, Carl Pearson, Izzat El Hajj, Levent Gürel, Weng Cho Chew, Wen-Mei W. Hwu:
A Fast and Massively-Parallel Inverse Solver for Multiple-Scattering Tomographic Image Reconstruction. 64-74
- Hatem Ltaief, Ali Charara, Damien Gratadour, Nicolas Doucet, Bilel Hadri, Eric Gendron, Saber Feki, David E. Keyes:
Real-Time Massively Distributed Multi-object Adaptive Optics Simulations for the European Extremely Large Telescope. 75-84
Session 3: Performance / QoS / Resilience
- Palden Lama, Shaoqi Wang, Xiaobo Zhou, Dazhao Cheng:
Performance Isolation of Data-Intensive Scale-out Applications in a Multi-tenant Cloud. 85-94
- Suman Karki, Bao Nguyen, Xuechen Zhang:
QoS Support for Scientific Workflows Using Software-Defined Storage Resource Enclaves. 95-104
- Shaohua Duan, Pradeep Subedi, Keita Teranishi, Philip E. Davis, Hemanth Kolla, Marc Gamell, Manish Parashar:
Scalable Data Resilience for In-memory Data Staging. 105-115
- Balazs Gerofi, Rolf Riesen, Masamichi Takagi, Taisuke Boku, Kengo Nakajima, Yutaka Ishikawa, Robert W. Wisniewski:
Performance and Scalability of Lightweight Multi-kernel Based Operating Systems. 116-125
Session 4: Memory Designs and Optimizations
- Eran Gilad, Tehila Mayzels, Elazar Raab, Mark Oskin, Yoav Etsion:
Architectural Support for Unlimited Memory Versioning and Renaming. 126-136
- Gunjae Koo, Hyeran Jeon, Zhenhong Liu, Nam Sung Kim, Murali Annavaram:
CTA-Aware Prefetching and Scheduling for GPU. 137-148
- Jie Zhang, Shuwen Gao, Nam Sung Kim, Myoungsoo Jung:
CIAO: Cache Interference-Aware Throughput-Oriented Architecture and Scheduling for GPUs. 149-159
- Nitin, Mithuna Thottethodi, T. N. Vijaykumar:
Millipede: Die-Stacked Memory Optimizations for Big Data Machine Learning Analytics. 160-171
Session 5: Scheduling
- Klaus Jansen, Felix Land:
Scheduling Monotone Moldable Jobs in Linear Time. 172-181
- Kunal Agrawal, Seth Gilbert:
The Power to Schedule a Parallel Program. 182-193
- Hongyang Sun, Redouane Elghazi, Ana Gainaru, Guillaume Aupy, Padma Raghavan:
Scheduling Parallel Tasks under Multiple Resources: List Scheduling vs. Pack Scheduling. 194-203
- Loris Marchal, Hanna Nagy, Bertrand Simon, Frédéric Vivien:
Parallel Scheduling of DAGs under Memory Constraints. 204-213
Session 6: Learning
- Dmitry Duplyakin, Jed Brown, Donna Calhoun:
Evaluating Active Learning with Cost and Memory Awareness. 214-223
- Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz:
Semantics-Preserving Parallelization of Stochastic Gradient Descent. 224-233
- Zeyi Wen, Bingsheng He, Ramamohanarao Kotagiri, Shengliang Lu, Jiashuai Shi:
Efficient Gradient Boosted Decision Tree Training on GPUs. 234-243
- Yuwei Hu, Jidong Zhai, Dinghua Li, Yifan Gong, Yuhao Zhu, Wei Liu, Lei Su, Jiangming Jin:
BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU. 244-253
Session 7: Compilers and Libraries
- Michael Axtmann, Armin Wiebigke, Peter Sanders:
Lightweight MPI Communicators with Applications to Perfectly Balanced Quicksort. 254-265
- Wim Lavrijsen, Costin Iancu, Xing Pan:
Improving Network Throughput with Global Communication Reordering. 266-275
- Kaixi Hou, Hao Wang, Wu-chun Feng, Jeffrey S. Vetter, Seyong Lee:
Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs. 276-285
- Gaurav Mitra, Jonathan Bohmann, Ian Lintault, Alistair P. Rendell:
Development and Application of a Hybrid Programming Environment on an ARM/DSP System for High Performance Computing. 286-295
Session 8: Optimizations for Emerging Storage Systems
- Suzhen Wu, Weidong Zhu, Guixin Liu, Hong Jiang, Bo Mao:
GC-Aware Request Steering with Improved Performance and Reliability for SSD-Based RAIDs. 296-305
- Ting Yao, Zhi-hu Tan, Jiguang Wan, Ping Huang, Yiwen Zhang, Changsheng Xie, Xubin He:
A Set-Aware Key-Value Store on Shingled Magnetic Recording Drives with Dynamic Band. 306-315
- Masab Ahmad, Halit Dogan, Fabio Checconi, Xinyu Que, Daniele Buono, Omer Khan:
Software-Hardware Managed Last-level Cache Allocation Scheme for Large-Scale NVRAM-Based Multicores Executing Parallel Data Analytics Applications. 316-325
- Aditya Narayan, Tiansheng Zhang, Shaizeen Aga, Satish Narayanasamy, Ayse K. Coskun:
MOCA: Memory Object Classification and Allocation in Heterogeneous Memory Systems. 326-335
Best Paper Nominees - Plenary
- Daniel Funke, Sebastian Lamm, Peter Sanders, Christian Schulz, Darren Strash, Moritz von Looz:
Communication-Free Massively Distributed Graph Generation. 336-347
- Tao Lu, Qing Liu, Xubin He, Huizhang Luo, Eric Suchyta, Jong Choi, Norbert Podhorszki, Scott Klasky, Matthew Wolf, Tong Liu, Zhenbo Qiao:
Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data. 348-357
- Karthik Kambatla, Vamsee Yarlagadda, Iñigo Goiri, Ananth Grama:
UBIS: Utilization-Aware Cluster Scheduling. 358-367
- Daniel Castro, Paolo Romano, João Pedro Barreto:
Hardware Transactional Memory Meets Memory Persistency. 368-377
Keynote 2
- Keren Bergman:
Empowering Flexible and Scalable High Performance Architectures with Embedded Photonics. 378
Session 9: Numerical Algorithms
- Doru-Thom Popovici, Tze Meng Low, Franz Franchetti:
Large Bandwidth-Efficient FFTs on Multicore and Multi-socket Systems. 379-388
- Akihiro Ida:
Lattice H-Matrices on Distributed-Memory Systems. 389-398
- Tiago Lobato Gimenes, Flavia Pisani, Edson Borin:
Evaluating the Performance and Cost of Accelerating Seismic Processing with CUDA, OpenCL, OpenACC, and OpenMP. 399-408
- Aditya Devarakonda, Kimon Fountoulakis, James Demmel, Michael W. Mahoney:
Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization. 409-418
Session 10: GPU Hashing and Searching
- Saman Ashkiani, Martin Farach-Colton, John D. Owens:
A Dynamic Hash Table for the GPU. 419-429
- Saman Ashkiani, Shengren Li, Martin Farach-Colton, Nina Amenta, John D. Owens:
GPU LSM: A Dynamic Dictionary Data Structure for the GPU. 430-440
- Daniel Jünger, Christian Hundt, Bertil Schmidt:
WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes. 441-450
- Afton Geil, Martin Farach-Colton, John D. Owens:
Quotient Filters: Approximate Membership Queries on the GPU. 451-462
Session 11: Domain-Specific, Runtime and Autotuning
- Steve Petruzza, Sean Treichler, Valerio Pascucci, Peer-Timo Bremer:
BabelFlow: An Embedded Domain Specific Language for Parallel Analysis and Visualization. 463-473
- Jingna Zeng, Paolo Romano, João Pedro Barreto, Luís E. T. Rodrigues, Seif Haridi:
Online Tuning of Parallelism Degree in Parallel Nesting Transactional Memory. 474-483
- Saman Barghi, Martin Karsten:
Work-Stealing, Locality-Aware Actor Scheduling. 484-494
- Michael B. Driscoll, Benjamin Brock, Frank Ong, Jonathan I. Tamir, Hsiou-Yuan Liu, Michael Lustig, Armando Fox, Katherine A. Yelick:
Indigo: A Domain-Specific Language for Fast, Portable Image Reconstruction. 495-504
Session 12: Resource Management
- Qihua Zhou, Peng Li, Kun Wang, Deze Zeng, Song Guo, Minyi Guo:
Swallow: Joint Online Scheduling and Coflow Compression in Datacenter Networks. 505-514
- Peng Zhang, Jianbin Fang, Tao Tang, Canqun Yang, Zheng Wang:
Auto-tuning Streamed Applications on Intel Xeon Phi. 515-525
- Ryuichi Sakamoto, Tapasya Patki, Thang Cao, Masaaki Kondo, Koji Inoue, Masatsugu Ueda, Daniel A. Ellsworth, Barry Rountree, Martin Schulz:
Analyzing Resource Trade-offs in Hardware Overprovisioned Supercomputers. 526-535
- Vivek Balasubramanian, Matteo Turilli, Weiming Hu, Matthieu Lefebvre, Wenjie Lei, Ryan T. Modrak, Guido Cervone, Jeroen Tromp, Shantenu Jha:
Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications. 536-545
Session 13: Tensors
- Willow Ahrens, Helen Xu, Nicholas Schiefer:
A Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats. 546-556
- Grey Ballard, Nicholas Knight, Kathryn Rouse:
Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product. 557-567
- Jee W. Choi, Xing Liu, Shaden Smith, Tyler A. Simon:
Blocking Optimization Techniques for Sparse Tensor Computation. 568-577
- Jyothi Vedurada, Arjun Suresh, Aravind Sukumaran-Rajam, Jinsung Kim, Changwan Hong, Ajay Panyala, Sriram Krishnamoorthy, V. Krishna Nandivada, Rohit Kumar Srivastava, P. Sadayappan:
TTLG - An Efficient Tensor Transposition Library for GPUs. 578-588
Session 14: Large Scale Applications 2
- Peter A. Dinda, Conor Hetland:
Do Developers Understand IEEE Floating Point? 589-598
- Jiankuo Dong, Fangyu Zheng, Niall Emmart, Jingqiang Lin, Charles C. Weems:
sDPF-RSA: Utilizing Floating-point Computing Power of GPUs for Massive Digital Signature Computations. 599-609
- Simon Scheidegger, Dmitry Mikushin, Felix Kubler, Olaf Schenk:
Rethinking large-scale Economic Modeling for Efficiency: Optimizations for GPU and Xeon Phi Clusters. 610-619
- Tsuyoshi Ichimura, Kohei Fujita, Masashi Horikoshi, Larry Meadows, Kengo Nakajima, Takuma Yamaguchi, Kentaro Koyama, Hikaru Inoue, Akira Naruse, Keisuke Katsushima, Muneo Hori, Lalith Maddegedara:
A Fast Scalable Implicit Solver with Concentrated Computation for Nonlinear Time-Evolution Problems on Low-Order Unstructured Finite Elements. 620-629
Session 15: Data Operations
- Wei Chen, Aidi Pi, Shaoqi Wang, Xiaobo Zhou:
Characterizing Scheduling Delay for Low-Latency Data Analytics Workloads. 630-639
- Jesun Sahariar Firoz, Marcin Zalewski, Andrew Lumsdaine, Martina Barnas:
Runtime Scheduling Policies for Distributed Graph Algorithms. 640-649
- Lorenz Hübschle-Schneider, Peter Sanders:
Communication Efficient Checking of Big Data Operations. 650-659
- Guillaume Aupy, Olivier Beaumont, Lionel Eyraud-Dubois:
What Size Should Your Buffers to Disks be? 660-669
Session 16: Power and Temperature
- Majed Valad Beigi, Gokhan Memik:
THOR: THermal-aware Optimizations for extending ReRAM Lifetime. 670-679
- Lifeng Nai, Ramyad Hadidi, He Xiao, Hyojong Kim, Jaewoong Sim, Hyesoon Kim:
CoolPIM: Thermal-Aware Source Throttling for Efficient PIM Instruction Offloading. 680-689
- Haoran Cai, Xu Zhou, Qiang Cao, Hong Jiang, Feng Sheng, Xiandong Qi, Jie Yao, Changsheng Xie, Liang Xiao, Liang Gu:
GreenSprint: Effective Computational Sprinting in Green Data Centers. 690-699
- Liang Zhou, Chih-Hsun Chou, Laxmi N. Bhuyan, K. K. Ramakrishnan, Daniel Wong:
Joint Server and Network Energy Saving in Data Centers for Latency-Sensitive Applications. 700-709
Keynote 3
- Bruce Hendrickson:
The Day After Tomorrow: The Looming Post-Exascale Crisis. 710
Session 17: Graph Algorithms 2
- Naama Ben-David, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, Charles McGuffey, Julian Shun:
Implicit Decomposition for Write-Efficient Connectivity Algorithms. 711-722
- Leonid Barenboim, Tzalik Maimon:
Distributed Symmetry Breaking in Graphs with Bounded Diversity. 723-732
- Aisha Aljohani, Pavan Poudel, Gokarna Sharma:
Complete Visitability for Autonomous Robots on Graphs. 733-742
- Anisur Rahaman Molla, Gopal Pandurangan:
Local Mixing Time: Distributed Computation and Applications. 743-752
Session 18: Performance Modeling and Analysis
- Bahareh Mostafazadeh, Ferran Marti, Feng Liu, Aparna Chandramowlishwaran:
Roofline Guided Design and Analysis of a Multi-stencil CFD Solver for Multicore Performance. 753-762
- Shizhen Xu, Yuanchao Xu, Wei Xue, Xipeng Shen, Fang Zheng, Xiaomeng Huang, Guangwen Yang:
Taming the "Monster": Overcoming Program Optimization Challenges on SW26010 Through Precise Performance Modeling. 763-773
- Zhou Tong, Xin Yuan, Scott Pakin, Michael Lang:
Performance and Accuracy Trade-offs of HPC Application Modeling and Simulation. 774-783
- Jayaraman J. Thiagarajan, Rushil Anirudh, Bhavya Kailkhura, Nikhil Jain, Tanzima Z. Islam, Abhinav Bhatele, Jae-Seung Yeom, Todd Gamblin:
PADDLE: Performance Analysis Using a Data-Driven Learning Environment. 784-793
Session 19: Memory and Data Access
- Adrián Pérez Diéguez, Margarita Amor, Ramon Doallo, Akira Nukada, Satoshi Matsuoka:
Efficient Solving of Scan Primitive on Multi-GPU Systems. 794-803
- Jinsu Park, Woongki Baek:
Quantifying the Performance and Energy-Efficiency Impact of Hardware Transactional Memory on Scientific Applications on Large-Scale NUMA Systems. 804-813
- Sayan Goswami, Kisung Lee, Shayan Shams, Seung-Jong Park:
GPU-Accelerated Large-Scale Genome Assembly. 814-824
- Gregory Herschlag, Seyong Lee, Jeffrey S. Vetter, Amanda Randles:
GPU Data Access on Complex Geometries for D3Q19 Lattice Boltzmann Method. 825-834
Session 20: Exception Handling & Error Detection
- Yuanfeng Peng, Christian DeLozier, Ariel Eizenberg, William Mansky, Joseph Devietti:
SLIMFAST: Reducing Metadata Redundancy in Sound and Complete Dynamic Data Race Detection. 835-844
- Simone Atzeni, Ganesh Gopalakrishnan, Zvonimir Rakamaric, Ignacio Laguna, Gregory L. Lee, Dong H. Ahn:
SWORD: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs. 845-854
- Mostafa Mehrabi, Nasser Giacaman, Oliver Sinnen:
Unobtrusive Asynchronous Exception Handling with Standard Java Try/Catch Blocks. 855-864
- Hongbo Li, Sihuan Li, Zachary Benavides, Zizhong Chen, Rajiv Gupta:
COMPI: Concolic Testing for MPI Applications. 865-874
Session 21: Graph Algorithms 3
- George M. Slota, Sivasankaran Rajamanickam:
Experimental Design of Work Chunking for Graph Algorithms on High Bandwidth Memory Architectures. 875-884
- Sayan Ghosh, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman, Hao Lu, Daniel G. Chavarría-Miranda, Arif Khan, Assefaw Hadish Gebremedhin:
Distributed Louvain Algorithm for Graph Community Detection. 885-895
- Vincent T. Lee, Amrita Mazumdar, Carlo C. del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin:
Application Codesign of Near-Data Processing for Similarity Search. 896-907
Session 22: Linear Solvers
- Piyush Sao, Xiaoye Sherry Li, Richard W. Vuduc:
A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices. 908-919
- Ernesto Dufrechou, Pablo Ezzatti:
A New GPU Algorithm to Compute a Level Set-Based Analysis for the Parallel Solution of Sparse Triangular Systems. 920-929
- Ichitaro Yamazaki, Ahmad Abdelfattah, Akihiro Ida, Satoshi Ohshima, Stanimire Tomov, Rio Yokota, Jack J. Dongarra:
Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters. 930-939
- Jordi Wolfson-Pou, Edmond Chow:
Convergence Models and Surprising Results for the Asynchronous Jacobi Method. 940-949
Session 23: Runtime Systems and Libraries
- Yue Zhao, Weijie Zhou, Xipeng Shen, Graham Yiu:
Overhead-Conscious Format Selection for SpMV-Based Applications. 950-959
- Michael A. Sevilla, Ivo Jimenez, Noah Watkins, Jeff LeFevre, Peter Alvaro, Shel Finkelstein, Patrick Donnelly, Carlos Maltzahn:
Cudele: An API and Framework for Programmable Consistency and Durability in a Global Namespace. 960-969
- Nuno Apolónia, Stefanos Antaris, Sarunas Girdzijauskas, George Pallis, Marios D. Dikaiakos:
SELECT: A Distributed Publish/Subscribe Notification System for Online Social Networks. 970-979
- Hoang-Vu Dang, Roshan Dathathri, Gurbinder Gill, Alex Brooks, Nikoli Dryden, Andrew Lenharth, Loc Hoang, Keshav Pingali, Marc Snir:
A Lightweight Communication Runtime for Distributed Graph Analytics. 980-989
Session 24: Networks and Communication
- Lu Wang, Xia Zhao, David R. Kaeli, Zhiying Wang, Lieven Eeckhout:
Intra-Cluster Coalescing to Reduce GPU NoC Pressure. 990-999
- Bo Peng, Jianguo Yao, Zhengwei Qi, Haibing Guan:
HybridPass: Hybrid Scheduling for Mixed Flows in Datacenter Networks. 1000-1009
- Avinash Kodi, Kyle Shifflet, Savas Kaya, Soumyasanta Laha, Ahmed Louri:
Scalable Power-Efficient Kilo-Core Photonic-Wireless NoC Architectures. 1010-1019
- Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores. 1020-1029
Session 25: Distributed Computing
- Mercy O. Jaiyeola, Kyle Patron, Jared Saia, Maxwell Young, Qian M. Zhou:
Tiny Groups Tackle Byzantine Adversaries. 1030-1039
- Michael Feldmann, Christian Scheideler, Alexander Setzer:
Skueue: A Scalable and Sequentially Consistent Distributed Queue. 1040-1049
- Michael Feldmann, Christina Kolb, Christian Scheideler, Thim Strothmann:
Self-Stabilizing Supervised Publish-Subscribe Systems. 1050-1059
- John Augustine, Sumathi Sivasubramaniam:
Spartan: A Framework For Sparse Robust Addressable Networks. 1060-1069
Session 26: Graph Algorithms 4
- Kyle Berney, Henri Casanova, Alyssa Higuchi, Ben Karsin, Nodari Sitchinava:
Beyond Binary Search: Parallel In-Place Construction of Implicit Search Tree Layouts. 1070-1079
- Sara Karamati, Jeffrey S. Young, Richard W. Vuduc:
An Energy-Efficient Single-Source Shortest Path Algorithm. 1080-1089
- Yuechao Pan, Roger Pearce, John D. Owens:
Scalable Breadth-First Search on a GPU Cluster. 1090-1101
Session 27: Communication Performance
- Amir Bahmani, Frank Mueller:
Chameleon: Online Clustering of MPI Program Traces. 1102-1112
- Xin Wang, Misbah Mubarak, Xu Yang, Robert B. Ross, Zhiling Lan:
Trade-Off Study of Localizing Communication and Balancing Network Traffic on a Dragonfly System. 1113-1122