Andrew Moa Blog Site

Use PyInstaller to package Windows executables

An executable built against the Qt dynamic link libraries depends on a large number of DLL files. To distribute a program written on your own computer to other people's machines, these dependent libraries must be packaged and published together with the executable. There are many packaging tools, but the principle is the same: the executable and its dependencies are compressed into a single self-extracting executable, which unpacks itself and runs when the user launches it.
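For a concrete sense of how this works with PyInstaller, a minimal invocation looks like the following (main.py is a hypothetical entry script; --onefile bundles everything into a single self-extracting executable, and --windowed suppresses the console window):

    pyinstaller --onefile --windowed main.py

PyInstaller scans the script's imports, collects the required Qt DLLs and plugins along with the Python interpreter itself, and writes the packaged result to the dist directory.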
7 minutes to read
Andrew Moa

Compile rocblas-rocm-6.2.4 under Windows

While demonstrating matrix operation acceleration earlier, I wanted to try AMD's own ROCm. After compiling and running the program, I hit the following error:

rocBLAS error: Cannot read D:\example\efficiency_v3\rocm\build\Release\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1150
 List of available TensileLibrary Files : 

According to the official website, ROCm does not support the Radeon 880M integrated GPU (in the Ryzen AI 9 365 processor) [1]; it will not work unless you compile rocBLAS for that architecture yourself.
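As a rough sketch of what self-compiling involves, the GPU architecture has to be passed explicitly when configuring the rocBLAS build, along these lines (option names vary between ROCm versions, so treat this as an assumption rather than a verified recipe):

    cmake .. -G Ninja -DCMAKE_CXX_COMPILER=clang++ -DAMDGPU_TARGETS=gfx1150
    cmake --build .

AMDGPU_TARGETS tells Tensile which architectures to generate kernel libraries for, which is what produces the TensileLibrary.dat file missing in the error above.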

12 minutes to read
Andrew Moa

Matrix multiplication (Ⅲ) - using MPI for parallel acceleration

MPI (Message Passing Interface) is a standard for parallel computing and currently the most widely used programming interface on high-performance computing clusters. MPI communicates through messages passed between processes and can use cores across multiple nodes in parallel, which OpenMP cannot do. There are several MPI implementations on different platforms, such as MS-MPI and Intel MPI on Windows, and OpenMPI and MPICH on Linux.

1. MPI parallel acceleration of loop calculations

1.1 C Implementation

An MPI program must initialize the runtime in the main program and set up the message-passing machinery; at the same time, the array to be computed must be partitioned and distributed to the different processes. In OpenMP and other parallel libraries these operations happen internally, and programmers need not care how the underlying mechanism is implemented. With MPI, however, the programmer has to manually allocate the global and per-process local storage and control every message exchange, which undoubtedly adds extra learning cost.
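The sketch below illustrates this pattern in C under simple assumptions (a hypothetical array of N doubles, with N divisible by the process count): the root process allocates and fills the global array, MPI_Scatter splits it across processes, each process works on its local chunk, and MPI_Gather collects the results back on the root.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 8    /* hypothetical total element count, assumed divisible by the process count */

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);                    /* initialize the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int local_n = N / size;                    /* size of each process's chunk */
        double *global = NULL;
        double *local = malloc(local_n * sizeof(double));

        if (rank == 0) {                           /* only the root owns the global array */
            global = malloc(N * sizeof(double));
            for (int i = 0; i < N; i++)
                global[i] = (double)i;
        }

        /* divide the global array and send one chunk to every process */
        MPI_Scatter(global, local_n, MPI_DOUBLE,
                    local, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        for (int i = 0; i < local_n; i++)          /* each process computes on its own chunk */
            local[i] *= 2.0;

        /* collect the processed chunks back onto the root process */
        MPI_Gather(local, local_n, MPI_DOUBLE,
                   global, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            for (int i = 0; i < N; i++)
                printf("%.1f ", global[i]);
            printf("\n");
            free(global);
        }

        free(local);
        MPI_Finalize();                            /* shut down the MPI runtime */
        return 0;
    }

Compile and run with, for example, mpicc scatter.c -o scatter followed by mpiexec -n 4 ./scatter.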

34 minutes to read
Andrew Moa