Satoshi Matsuoka
Person information
- affiliation: Tokyo Institute of Technology, Japan
2020 – today
- 2024
- [c259]Du Wu, Peng Chen, Xiao Wang, Isaac Lyngaas, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib:
Real-time High-resolution X-Ray Computed Tomography. ICS 2024: 110-123
- 2023
- [j49]Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Torsten Hoefler:
Myths and legends in high-performance computing. Int. J. High Perform. Comput. Appl. 37(3-4): 245-259 (2023) - [j48]Akira Nukada, Taichiro Suzuki, Satoshi Matsuoka:
Efficient checkpoint/Restart of CUDA applications. Parallel Comput. 116: 103018 (2023) - [j47]Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka:
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads. ACM Trans. Archit. Code Optim. 20(4): 57:1-57:26 (2023) - [j46]Huaipeng Zhang, Nhut-Minh Ho, Dogukan Yigit Polat, Peng Chen, Mohamed Wahib, Truong Thao Nguyen, Jintao Meng, Rick Siow Mong Goh, Satoshi Matsuoka, Tao Luo, Weng-Fai Wong:
Simeuro: A Hybrid CPU-GPU Parallel Simulator for Neuromorphic Computing Chips. IEEE Trans. Parallel Distributed Syst. 34(10): 2767-2782 (2023) - [c258]Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt). GPGPU@PPoPP 2023: 34-35 - [c257]Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications. ICS 2023: 167-179 - [c256]Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
Revisiting Temporal Blocking Stencil Optimizations. ICS 2023: 251-263 - [i35]Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Torsten Hoefler:
Myths and Legends in High-Performance Computing. CoRR abs/2301.02432 (2023) - [i34]Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
Revisiting Temporal Blocking Stencil Optimizations. CoRR abs/2305.07390 (2023) - [i33]Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt). CoRR abs/2306.03336 (2023)
- 2022
- [j45]Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Andrew A. Chien, Raymond Bair, Jeffrey S. Vetter, John Shalf:
Preparing for the Future - Rethinking Proxy Applications. Comput. Sci. Eng. 24(2): 85-90 (2022) - [j44]Adrián Pérez Diéguez, Margarita Amor, Ramón Doallo, Akira Nukada, Satoshi Matsuoka:
Efficient high-precision integer multiplication on the GPU. Int. J. High Perform. Comput. Appl. 36(3): 356-369 (2022) - [j43]Kazuto Ando, Rahul Bale, Chung-Gang Li, Satoshi Matsuoka, Keiji Onishi, Makoto Tsubokura:
Digital transformation of droplet/aerosol infection risk assessment realized on "Fugaku" for the fight against COVID-19. Int. J. High Perform. Comput. Appl. 36(5-6): 568-586 (2022) - [i32]Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Satoshi Matsuoka:
Persistent Kernels for Iterative Memory-bound GPU Applications. CoRR abs/2204.02064 (2022) - [i31]Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka:
At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache. CoRR abs/2204.02235 (2022) - [i30]Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Ray Bair, Andrew A. Chien, Jeffrey S. Vetter, John Shalf:
Preparing for the Future - Rethinking Proxy Apps. CoRR abs/2204.07336 (2022)
- 2021
- [j42]Francis J. Alexander, James A. Ang, Jenna A. Bilbrey, Jan Balewski, Tiernan Casey, Ryan Chard, Jong Choi, Sutanay Choudhury, Bert J. Debusschere, Anthony M. DeGennaro, Nikoli Dryden, J. Austin Ellis, Ian T. Foster, Cristina Garcia-Cardona, Sayan Ghosh, Peter Harrington, Yunzhi Huang, Shantenu Jha, Travis Johnston, Ai Kagawa, Ramakrishnan Kannan, Neeraj Kumar, Zhengchun Liu, Naoya Maruyama, Satoshi Matsuoka, Erin McCarthy, Jamaludin Mohd-Yusof, Peter Nugent, Yosuke Oyama, Thomas Proffen, David Pugmire, Sivasankaran Rajamanickam, Vinay Ramakrishnaiah, Malachi Schram, Sudip K. Seal, Ganesh Sivaraman, Christine Sweeney, Li Tan, Rajeev Thakur, Brian Van Essen, Logan T. Ward, Paul M. Welch, Michael Wolf, Sotiris S. Xantheas, Kevin G. Yager, Shinjae Yoo, Byung-Jun Yoon:
Co-design Center for Exascale Machine Learning Technologies (ExaLearn). Int. J. High Perform. Comput. Appl. 35(6): 598-616 (2021) - [j41]Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Erin McCarthy, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Peter Nugent, Brian Van Essen:
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism. IEEE Trans. Parallel Distributed Syst. 32(7): 1641-1652 (2021) - [c255]Peng Chen, Mohamed Wahib, Xiao Wang, Shin'ichiro Takizawa, Takahiro Hirofuchi, Hirotaka Ogawa, Satoshi Matsuoka:
Performance portable back-projection algorithms on CPUs: agnostic data locality and vectorization optimizations. ICS 2021: 316-328 - [c254]Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka:
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? IPDPS 2021: 1056-1065 - [c253]Steven Farrell, Murali Emani, Jacob Balma, Lukas Drescher, Aleksandr Drozd, Andreas Fink, Geoffrey C. Fox, David Kanter, Thorsten Kurth, Peter Mattson, Dawei Mu, Amit Ruhela, Kento Sato, Koichi Shirahata, Tsuguchika Tabaru, Aristeidis Tsaris, Jan Balewski, Ben Cumming, Takumi Danjo, Jens Domke, Takaaki Fukai, Naoto Fukumoto, Tatsuya Fukushi, Balazs Gerofi, Takumi Honda, Toshiyuki Imamura, Akihiko Kasagi, Kentaro Kawakami, Shuhei Kudo, Akiyoshi Kuroda, Maxime Martinasso, Satoshi Matsuoka, Henrique Mendonça, Kazuki Minami, Prabhat Ram, Takashi Sawada, Mallikarjun Shankar, Tom St. John, Akihiro Tabuchi, Venkatram Vishwanath, Mohamed Wahib, Masafumi Yamazaki, Junqi Yin:
MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems. MLHPC@SC 2021: 33-45 - [c252]Peng Chen, Mohamed Wahib, Xiao Wang, Takahiro Hirofuchi, Hirotaka Ogawa, Ander Biguri, Richard P. Boardman, Thomas Blumensath, Satoshi Matsuoka:
Scalable FBP decomposition for cone-beam CT reconstruction. SC 2021: 9 - [c251]Satoshi Matsuoka:
Fugaku and A64FX: the First Exascale Supercomputer and its Innovative Arm CPU. VLSI Circuits 2021: 1-3 - [i29]Peng Chen, Mohamed Wahib, Xiao Wang, Shin'ichiro Takizawa, Takahiro Hirofuchi, Hirotaka Ogawa, Satoshi Matsuoka:
Performance Portable Back-projection Algorithms on CPUs: Agnostic Data Locality and Vectorization Optimizations. CoRR abs/2104.13248 (2021) - [i28]Kazuto Ando, Rahul Bale, Chung-Gang Li, Satoshi Matsuoka, Keiji Onishi, Makoto Tsubokura:
Digital transformation of droplet/aerosol infection risk assessment realized on "Fugaku" for the fight against COVID-19. CoRR abs/2110.09769 (2021) - [i27]Steven Farrell, Murali Emani, Jacob Balma, Lukas Drescher, Aleksandr Drozd, Andreas Fink, Geoffrey C. Fox, David Kanter, Thorsten Kurth, Peter Mattson, Dawei Mu, Amit Ruhela, Kento Sato, Koichi Shirahata, Tsuguchika Tabaru, Aristeidis Tsaris, Jan Balewski, Ben Cumming, Takumi Danjo, Jens Domke, Takaaki Fukai, Naoto Fukumoto, Tatsuya Fukushi, Balazs Gerofi, Takumi Honda, Toshiyuki Imamura, Akihiko Kasagi, Kentaro Kawakami, Shuhei Kudo, Akiyoshi Kuroda, Maxime Martinasso, Satoshi Matsuoka, Henrique Mendonça, Kazuki Minami, Prabhat Ram, Takashi Sawada, Mallikarjun Shankar, Tom St. John, Akihiro Tabuchi, Venkatram Vishwanath, Mohamed Wahib, Masafumi Yamazaki, Junqi Yin:
MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems. CoRR abs/2110.11466 (2021)
- 2020
- [j40]Artur Podobas, Kentaro Sano, Satoshi Matsuoka:
A Survey on Coarse-Grained Reconfigurable Architectures From a Performance Perspective. IEEE Access 8: 146719-146743 (2020) - [c250]Artur Podobas, Kentaro Sano, Satoshi Matsuoka:
A Template-based Framework for Exploring Coarse-Grained Reconfigurable Architectures. ASAP 2020: 1-8 - [c249]Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka:
AN5D: automated stencil framework for high-degree temporal blocking on GPUs. CGO 2020: 199-211 - [c248]Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, Satoshi Matsuoka:
A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs. IPDPS 2020: 483-493 - [c247]Satoshi Matsuoka:
A Formal Model for a Linear Time Correctness Condition of Proof Nets of Multiplicative Linear Logic. LOPSTR 2020: 311-328 - [c246]Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka:
Scaling distributed deep learning workloads beyond the memory capacity with KARMA. SC 2020: 19 - [i26]Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka:
AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs. CoRR abs/2001.01473 (2020) - [i25]Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka:
High-Performance High-Order Stencil Computation on FPGAs Using OpenCL. CoRR abs/2002.05983 (2020) - [i24]Artur Podobas, Kentaro Sano, Satoshi Matsuoka:
A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective. CoRR abs/2004.04509 (2020) - [i23]Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, Satoshi Matsuoka:
A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs. CoRR abs/2004.05371 (2020) - [i22]Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Erin McCarthy, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Peter Nugent, Brian Van Essen:
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism. CoRR abs/2007.12856 (2020) - [i21]Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka:
Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA. CoRR abs/2008.11421 (2020) - [i20]Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka:
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? CoRR abs/2010.14373 (2020)
2010 – 2019
- 2019
- [j39]Bofang Li, Aleksandr Drozd, Yuhe Guo, Tao Liu, Satoshi Matsuoka, Xiaoyong Du:
Scaling Word2Vec on Big Corpus. Data Sci. Eng. 4(2): 157-175 (2019) - [j38]Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydin Buluç:
Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors. Parallel Comput. 90 (2019) - [j37]Aamer Shah, Chih-Song Kuo, Akihiro Nomura, Satoshi Matsuoka, Felix Wolf:
How File-access Patterns Influence the Degree of I/O Interference between Cluster Applications. Supercomput. Front. Innov. 6(2): 29-55 (2019) - [c245]Yusuke Nagasaka, Akira Nukada, Ryosuke Kojima, Satoshi Matsuoka:
Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks. CCGRID 2019: 231-240 - [c244]Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka:
Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks. CVPR 2019: 12359-12367 - [c243]Jens Domke, Satoshi Matsuoka, Ivan Radanov, Yuki Tsushima, Tomoya Yuki, Akihiro Nomura, Shin'ichi Miura, Nic McDonald, Dennis Lee Floyd, Nicolas Dubé:
The First Supercomputer with HyperX Topology: A Viable Alternative to Fat-Trees? Hot Interconnects 2019: 1-4 - [c242]Yohei Tsuji, Kazuki Osawa, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka:
Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method. ICPP Workshops 2019: 21:1-21:8 - [c241]Jens Domke, Kazuaki Matsumura, Mohamed Wahib, Haoyu Zhang, Keita Yashima, Toshiki Tsuchikawa, Yohei Tsuji, Artur Podobas, Satoshi Matsuoka:
Double-Precision FPUs in High-Performance Computing: An Embarrassment of Riches? IPDPS 2019: 78-88 - [c240]Hamid Reza Zohouri, Satoshi Matsuoka:
The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface. H2RC@SC 2019: 11-18 - [c239]Jens Domke, Satoshi Matsuoka, Ivan R. Ivanov, Yuki Tsushima, Tomoya Yuki, Akihiro Nomura, Shin'ichi Miura, Nic McDonald, Dennis Lee Floyd, Nicolas Dubé:
HyperX topology: first at-scale implementation and comparison to the fat-tree. SC 2019: 40:1-40:23 - [c238]Peng Chen, Mohamed Wahib, Shin'ichiro Takizawa, Ryousei Takano, Satoshi Matsuoka:
A versatile software systolic execution model for GPU memory-bound kernels. SC 2019: 53:1-53:81 - [c237]Peng Chen, Mohamed Wahib, Shin'ichiro Takizawa, Ryousei Takano, Satoshi Matsuoka:
iFDK: a scalable framework for instant high-resolution image reconstruction. SC 2019: 84:1-84:24 - [c236]Hideyuki Jitsumoto, Yuya Kobayashi, Akihiro Nomura, Satoshi Matsuoka:
MH-QEMU: Memory-State-Aware Fault Injection Platform. SCFA 2019: 71-85 - [c235]Shweta Salaria, Aleksandr Drozd, Artur Podobas, Satoshi Matsuoka:
Learning Neural Representations for Predicting GPU Performance. ISC 2019: 40-58 - [i19]Satoshi Matsuoka:
A New Linear Time Correctness Condition for Multiplicative Linear Logic. CoRR abs/1902.09693 (2019) - [i18]Yusuke Nagasaka, Akira Nukada, Ryosuke Kojima, Satoshi Matsuoka:
Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks. CoRR abs/1903.11409 (2019) - [i17]Peng Chen, Mohamed Wahib, Shin'ichiro Takizawa, Ryousei Takano, Satoshi Matsuoka:
A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels. CoRR abs/1907.06154 (2019) - [i16]Peng Chen, Mohamed Wahib, Shin'ichiro Takizawa, Ryousei Takano, Satoshi Matsuoka:
iFDK: A Scalable Framework for Instant High-resolution Image Reconstruction. CoRR abs/1909.02724 (2019) - [i15]Hamid Reza Zohouri, Satoshi Matsuoka:
The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface. CoRR abs/1910.06726 (2019)
- 2018
- [j36]Mark Asch, Terry Moore, Rosa M. Badia, Micah Beck, Peter H. Beckman, T. Bidot, François Bodin, Franck Cappello, Alok N. Choudhary, Bronis R. de Supinski, Ewa Deelman, Jack J. Dongarra, Anshu Dubey, Geoffrey C. Fox, H. Fu, Sergi Girona, William Gropp, Michael A. Heroux, Yutaka Ishikawa, Katarzyna Keahey, David E. Keyes, Bill Kramer, J.-F. Lavignon, Y. Lu, Satoshi Matsuoka, Bernd Mohr, Daniel A. Reed, S. Requena, Joel H. Saltz, Thomas C. Schulthess, Rick L. Stevens, D. Martin Swany, Alexander S. Szalay, William M. Tang, G. Varoquaux, Jean-Pierre Vilotte, Robert W. Wisniewski, Z. Xu, Igor Zacharov:
Big data and extreme-scale computing. Int. J. High Perform. Comput. Appl. 32(4): 435-479 (2018) - [j35]James Lin, Zhigeng Xu, Linjin Cai, Akira Nukada, Satoshi Matsuoka:
Evaluating the SW26010 many-core processor with a micro-benchmark suite for performance optimizations. Parallel Comput. 77: 128-143 (2018) - [j34]Abdelhalim Amer, Huiwei Lu, Pavan Balaji, Milind Chabbi, Yanjie Wei, Jeff R. Hammond, Satoshi Matsuoka:
Lock Contention Management in Multithreaded MPI. ACM Trans. Parallel Comput. 5(3): 12:1-12:21 (2018) - [c234]James Lin, Minhua Wen, Delong Meng, Xin Liu, Akira Nukada, Satoshi Matsuoka:
Optimizing Preconditioned Conjugate Gradient on TaihuLight for OpenFOAM. CCGrid 2018: 273-282 - [c233]Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka:
Accelerating Deep Learning Frameworks with Micro-Batches. CLUSTER 2018: 402-412 - [c232]Peng Chen, Mohamed Wahib, Shin'ichiro Takizawa, Ryousei Takano, Satoshi Matsuoka:
Efficient Algorithms for the Summed Area Tables Primitive on GPUs. CLUSTER 2018: 482-493 - [c231]Shweta Salaria, Aleksandr Drozd, Artur Podobas, Satoshi Matsuoka:
Predicting Performance Using Collaborative Filtering. CLUSTER 2018: 504-514 - [c230]Satoshi Matsuoka:
Direct Encodings of NP-Complete Problems into Horn Sequents of Multiplicative Linear Logic. FLOPS 2018: 17-32 - [c229]Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka:
Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL. FPGA 2018: 153-162 - [c228]Hiroki Kanezashi, Toyotaro Suzumura, Dario Garcia-Gasulla, Min-hwan Oh, Satoshi Matsuoka:
Adaptive Pattern Matching with Reinforcement Learning for Dynamic Graphs. HiPC 2018: 92-101 - [c227]Satoshi Matsuoka:
Cambrian explosion of computing and big data in the post-moore era. HPDC 2018: 105 - [c226]Tianqi Xu, Kento Sato, Satoshi Matsuoka:
Explorations of Data Swapping on Burst Buffer. ICPADS 2018: 517-526 - [c225]Kevin A. Brown, Nikhil Jain, Satoshi Matsuoka, Martin Schulz, Abhinav Bhatele:
Interference between I/O and MPI Traffic on Fat-tree Networks. ICPP 2018: 7:1-7:10 - [c224]Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydin Buluç:
High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures. ICPP Workshops 2018: 34:1-34:10 - [c223]Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka:
High-Performance High-Order Stencil Computation on FPGAs Using OpenCL. IPDPS Workshops 2018: 123-130 - [c222]Artur Podobas, Satoshi Matsuoka:
Hardware Implementation of POSITs and Their Application in FPGAs. IPDPS Workshops 2018: 138-145 - [c221]Adrián Pérez Diéguez, Margarita Amor, Ramon Doallo, Akira Nukada, Satoshi Matsuoka:
Efficient Solving of Scan Primitive on Multi-GPU Systems. IPDPS 2018: 794-803 - [c220]Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka, Kenichi Miura, John Shalf:
MRG8: Random Number Generation for the Exascale Era. PASC 2018: 6:1-6:11 - [c219]Pak Markthub, Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, Satoshi Matsuoka:
DRAGON: breaking GPU memory capacity limits with direct NVM access. SC 2018: 32:1-32:13 - [c218]Kazuaki Matsumura, Mitsuhisa Sato, Taisuke Boku, Artur Podobas, Satoshi Matsuoka:
MACC: An OpenACC Transpiler for Automatic Multi-GPU Use. SCFA 2018: 109-127 - [c217]Jian Guo, Akihiro Nomura, Ryan Barton, Haoyu Zhang, Satoshi Matsuoka:
Machine Learning Predictions for Underestimation of Job Runtime on HPC System. SCFA 2018: 179-198 - [i14]Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka:
Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL. CoRR abs/1802.00438 (2018) - [i13]Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydin Buluç:
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures. CoRR abs/1804.01698 (2018) - [i12]Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka:
μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching. CoRR abs/1804.04806 (2018) - [i11]Jens Domke, Kazuaki Matsumura, Mohamed Wahib, Haoyu Zhang, Keita Yashima, Toshiki Tsuchikawa, Yohei Tsuji, Artur Podobas, Satoshi Matsuoka:
Double-precision FPUs in High-Performance Computing: an Embarrassment of Riches? CoRR abs/1810.09330 (2018) - [i10]Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka:
Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs. CoRR abs/1811.12019 (2018) - [i9]Hiroki Kanezashi, Toyotaro Suzumura, Dario Garcia-Gasulla, Min-hwan Oh, Satoshi Matsuoka:
Adaptive Pattern Matching with Reinforcement Learning for Dynamic Graphs. CoRR abs/1812.10321 (2018)
- 2017
- [j33]Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, Satoshi Matsuoka:
Efficient Breadth-First Search on Massively Parallel and Distributed-Memory Machines. Data Sci. Eng. 2(1): 22-35 (2017) - [j32]Xinhua Lin, Qiang Qin, Shuo Li, Minhua Wen, Satoshi Matsuoka:
使用Stencil评估Intel AVX2 Vgather指令 (Evaluating Intel AVX2 Vgather Instructions with Stencils). 计算机科学 44(1): 20-24 (2017) - [c216]Satoshi Matsuoka:
Being "BYTES-oriented" in HPC leads to an open big data/AI ecosystem and further advances into the post-moore era. IEEE BigData 2017: 5 - [c215]Shweta Salaria, Kevin A. Brown, Hideyuki Jitsumoto, Satoshi Matsuoka:
Evaluation of HPC-Big Data Applications Using Cloud Platforms. CCGrid 2017: 1053-1061 - [c214]Kevin A. Brown, Satoshi Matsuoka:
Co-locating Graph Analytics and HPC Applications. CLUSTER 2017: 659-660 - [c213]Artur Podobas, Hamid Reza Zohouri, Naoya Maruyama, Satoshi Matsuoka:
Evaluating high-level design strategies on FPGAs for high-performance computing. FPL 2017: 1-4 - [c212]Artur Podobas, Hamid Reza Zohouri, Naoya Maruyama, Satoshi Matsuoka:
Evaluating high-level design strategies on FPGAs for high-performance computing. FPL 2017: 1-4 - [c211]Artur Podobas, Satoshi Matsuoka:
Designing and accelerating spiking neural networks using OpenCL for FPGAs. FPT 2017: 255-258 - [c210]Kevin A. Brown, Tianqi Xu, Keita Iwabuchi, Kento Sato, Adam Moody, Kathryn M. Mohror, Nikhil Jain, Abhinav Bhatele, Martin Schulz, Roger A. Pearce, Maya B. Gokhale, Satoshi Matsuoka:
Accelerating Big Data Infrastructure and Applications (Ongoing Collaboration). ICDCS Workshops 2017: 343-347 - [c209]Ikuro Sato, Ryo Fujisaki, Yosuke Oyama, Akihiro Nomura, Satoshi Matsuoka:
Asynchronous, Data-Parallel Deep Convolutional Neural Network Training with Linear Prediction Model for Parameter Transition. ICONIP (2) 2017: 305-314 - [c208]Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka:
High-Performance and Memory-Saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU. ICPP 2017: 101-110 - [c207]James Lin, Zhigeng Xu, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka:
Optimizations of Two Compute-Bound Scientific Kernels on the SW26010 Many-Core Processor. ICPP 2017: 432-441 - [c206]Zhigeng Xu, James Lin, Satoshi Matsuoka:
Benchmarking SW26010 Many-Core Processor. IPDPS Workshops 2017: 743-752 - [c205]Shota Kuroda, Toshio Endo, Satoshi Matsuoka:
Applying Temporal Blocking with a Directive-based Approach. LLVM-HPC@SC 2017: 8:1-8:11
- 2016
- [j31]Aleksandr Drozd, Olaf Witkowski, Satoshi Matsuoka, Takashi Ikegami:
Critical mass in the emergence of collective intelligence: a parallelized simulation of swarms in noisy environments. Artif. Life Robotics 21(3): 317-323 (2016) - [j30]Michela Taufer, Pavan Balaji, Satoshi Matsuoka:
Special Issue on Cluster Computing. Parallel Comput. 58: 25-26 (2016) - [j29]Hideyuki Shamoto, Koichi Shirahata, Aleksandr Drozd, Hitoshi Sato, Satoshi Matsuoka:
GPU-Accelerated Large-Scale Distributed Sorting Coping with Device Memory Capacity. IEEE Trans. Big Data 2(1): 57-69 (2016) - [c204]Yosuke Oyama, Akihiro Nomura, Ikuro Sato, Hiroki Nishimura, Yukimasa Tamatsu, Satoshi Matsuoka:
Predicting statistics of asynchronous SGD parameters for a large-scale distributed deep learning system on GPU supercomputers. IEEE BigData 2016: 66-75 - [c203]Hitoshi Sato, Ryo Mizote, Satoshi Matsuoka, Hirotaka Ogawa:
I/O chunking and latency hiding approach for out-of-core sorting acceleration using GPU and flash NVM. IEEE BigData 2016: 398-403 - [c202]Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, Satoshi Matsuoka:
Extreme scale breadth-first search on supercomputers. IEEE BigData 2016: 1040-1047 - [c201]