Parallel optimization of a DCT image compression algorithm for resource-constrained platforms
Bibliographic entry
Li, B. Parallel optimization of a DCT image compression algorithm for resource-constrained platforms / B. Li, J. Ma // New Horizons – 2025 (Новые горизонты – 2025): proceedings of the XII Belarusian-Chinese Youth Innovation Forum, 27–28 November 2025 / Belarusian National Technical University. – Minsk: BNTU, 2025. – Vol. 1. – pp. 84–85.
Abstract
For resource-constrained platforms, this paper presents a parallel, 8 × 8 block-based implementation of the DCT/IDCT. The method decomposes each 2D transform into two 1D transforms. It stores the blocks in a preallocated 3D buffer that can be safely sliced across workers and processes them with block-level parfor parallelism. This avoids contention between workers and the copying overhead of temporary arrays. On a batch of 10 images, the average latency without parallelism is 10.422 s per image. With the optimizations proposed here, the average parallel DCT compression time over the ten images is 1.234 s per image. That is a speed-up of roughly 8.4× (10.422 / 1.234 ≈ 8.45). The paper also analyzes representative outliers and suggests directions for further optimization. Recent work emphasizes GPU implementations. This study instead focuses on reproducible engineering practice on general-purpose CPUs and embedded platforms, and it provides a baseline for subsequent SIMD/fixed-point work and end-to-end JPEG optimization.
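The block-parallel scheme described above can be illustrated with a small sketch. It is a Python stand-in for the MATLAB-style parfor loop, not the authors' code, and it rests on several assumptions:

- The transform is an orthonormal DCT-II, so each 8 × 8 block X maps to Y = C X C^T. The abstract does not state the normalization.
- The image is grayscale, with dimensions that are multiples of 8.
- All helper names (`dct2d`, `image_to_blocks`, `blockwise_dct`, and the rest) are hypothetical.
- No quantization or entropy coding is included.

The sketch shows three things: the separable row/column decomposition, a preallocated (blocks × 8 × 8) buffer in which each block is an independent slice, and distribution of those slices through any `concurrent.futures` executor. Because it is pure Python, its timings are not comparable with the figures reported above.

```python
# Hypothetical sketch of a block-parallel 8x8 DCT; names are illustrative,
# not taken from the paper.
import math
from concurrent.futures import ProcessPoolExecutor

N = 8  # block size used in the paper

# Orthonormal DCT-II basis (an assumption; the abstract does not state the
# normalization): C[k][n] = a(k) * cos(pi * (2n + 1) * k / (2N)).
C = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)] for k in range(N)]


def dct1d(v):
    """1D DCT-II of a length-N vector: y[k] = sum_n C[k][n] * v[n]."""
    return [sum(C[k][n] * v[n] for n in range(N)) for k in range(N)]


def idct1d(y):
    """Inverse 1D transform (DCT-III): x[n] = sum_k C[k][n] * y[k]."""
    return [sum(C[k][n] * y[k] for k in range(N)) for n in range(N)]


def transpose(block):
    return [list(col) for col in zip(*block)]


def separable(f, block):
    """2D transform as two 1D passes: f on every row, then on every column."""
    rows = [f(r) for r in block]
    return transpose([f(c) for c in transpose(rows)])


def dct2d(block):
    return separable(dct1d, block)   # Y = C X C^T


def idct2d(block):
    return separable(idct1d, block)  # X = C^T Y C


def image_to_blocks(img):
    """Copy an H x W image into a preallocated (num_blocks x 8 x 8) buffer.
    Each block is its own slice, so workers never write to shared data."""
    bh, bw = len(img) // N, len(img[0]) // N
    buf = [[[0.0] * N for _ in range(N)] for _ in range(bh * bw)]
    for b in range(bh * bw):
        r0, c0 = (b // bw) * N, (b % bw) * N
        for i in range(N):
            buf[b][i][:] = img[r0 + i][c0:c0 + N]
    return buf, bh, bw


def blocks_to_image(buf, bh, bw):
    """Reassemble blocks (in row-major block order) into an image."""
    img = [[0.0] * (bw * N) for _ in range(bh * N)]
    for b, block in enumerate(buf):
        r0, c0 = (b // bw) * N, (b % bw) * N
        for i in range(N):
            img[r0 + i][c0:c0 + N] = block[i]
    return img


def blockwise_dct(img, pool=None):
    """Transform all blocks; with an Executor, blocks run in parallel.
    Executor.map preserves input order, so results align with block indices."""
    buf, bh, bw = image_to_blocks(img)
    mapper = pool.map if pool is not None else map
    return list(mapper(dct2d, buf)), bh, bw


if __name__ == "__main__":
    # Hypothetical 16 x 16 test image (four 8 x 8 blocks).
    img = [[(r * 7 + c * 3) % 256 for c in range(16)] for r in range(16)]
    with ProcessPoolExecutor(max_workers=2) as pool:
        coeffs, bh, bw = blockwise_dct(img, pool)
    restored = blocks_to_image([idct2d(b) for b in coeffs], bh, bw)
    err = max(abs(restored[r][c] - img[r][c])
              for r in range(16) for c in range(16))
    print(f"blocks: {len(coeffs)}, max round-trip error: {err:.2e}")
```

Each worker reads and writes only its own block, so no locks are needed, which mirrors the contention-free design the abstract describes. Calling `blockwise_dct(img)` without a pool gives the serial baseline. For larger images, passing a `chunksize` to `ProcessPoolExecutor.map` may reduce inter-process overhead.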
