Compile rocblas-rocm-6.2.4 under Windows
While demonstrating matrix operation acceleration earlier, I wanted to try AMD’s own ROCm. The program compiled, but running it produced this error:
rocBLAS error: Cannot read D:\example\efficiency_v3\rocm\build\Release\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1150
 List of available TensileLibrary Files : 
According to the official website, ROCm does not support the Radeon 880M integrated GPU (AI H 365w processor) [1]. The stock rocBLAS therefore ships no Tensile library for gfx1150, and it will not work on this GPU unless you compile rocBLAS yourself.
Matrix multiplication operation (Ⅲ) - using MPI parallel acceleration
MPI (Message Passing Interface) is a message-passing standard for parallel computing and is currently the most commonly used programming interface on high-performance computing clusters. MPI processes communicate by exchanging messages, so a single program can use cores across multiple nodes. OpenMP alone cannot do this, because it is limited to shared memory within one node. MPI has implementations for different platforms, such as MS-MPI and Intel MPI on Windows, and Open MPI and MPICH on Linux.
1. Parallelizing loop calculations with MPI
1.1 C Implementation
With MPI, the main program must initialize the MPI environment and set up the message-passing mechanism; the array to be computed must also be divided and distributed to the different processes. With OpenMP or other parallel libraries, these steps are handled internally, and programmers do not need to care about the underlying implementation. With MPI, programmers must manually allocate the global and local buffers of each process and control how each message is broadcast, which adds to the learning cost.
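The manual division of the array can be sketched in plain C. This is only an illustration, not the implementation used in this post: the helper name block_partition is hypothetical, and it contains no MPI calls so that it compiles without an MPI installation. It computes how many items each process receives and where each share starts, assuming the work is split as evenly as possible.

```c
#include <assert.h>

/* Hypothetical helper (not part of MPI): split n items across nprocs
 * processes as evenly as possible.
 *   counts[r] = number of items assigned to rank r
 *   displs[r] = index of the first item assigned to rank r
 * Both output arrays must have room for nprocs ints. */
void block_partition(int n, int nprocs, int *counts, int *displs)
{
    assert(n >= 0 && nprocs > 0);

    int base = n / nprocs;   /* every rank gets at least this many items */
    int extra = n % nprocs;  /* the first `extra` ranks get one more */
    int offset = 0;

    for (int r = 0; r < nprocs; ++r) {
        counts[r] = base + (r < extra ? 1 : 0);
        displs[r] = offset;
        offset += counts[r];
    }
}
```

For 10 rows on 4 processes this gives counts 3, 3, 2, 2 and offsets 0, 3, 6, 8. Arrays of this shape are what MPI_Scatterv takes as its sendcounts and displs arguments; both are measured in elements, so when scattering the rows of a row-major matrix each value would need to be multiplied by the row length.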
