"Performance portable GPU code generation for matrix multiplication."

Toomas Remmelg et al. (2016)