Matrix Multiplication (Ⅱ) - Accelerating Computation with BLAS Libraries
BLAS (Basic Linear Algebra Subprograms) originated as a set of Fortran routines for linear algebra and was later given C/C++ bindings (CBLAS). As a core component of modern high-performance computing, it has become a de facto standard interface. Open-source implementations include the Netlib reference BLAS, GotoBLAS, and its successor OpenBLAS. On the commercial side, each vendor ships an implementation tuned for its own platform, such as Intel's MKL, NVIDIA's cuBLAS (part of the CUDA toolkit), and AMD's AOCL and rocBLAS (part of the ROCm stack). Some are optimized for CPUs, while others use GPUs for parallel acceleration. This article implements matrix multiplication with several BLAS libraries and analyzes the performance differences between the implementations.
Andrew Moa