Усовершенствованный планировщик кооперативного выполнения потоков на многоядерной системе

Карасик, О. Н.; Прихожий, А. А.

Authors

Карасик, О. Н.

Прихожий, А. А.

Date

2017

Publisher

БНТУ

Another Title

Advanced scheduler for cooperative execution of threads on multi-core system

Bibliographic entry

Карасик, О. Н., Прихожий, А. А. Усовершенствованный планировщик кооперативного выполнения потоков на многоядерной системе = Advanced scheduler for cooperative execution of threads on multi-core system / О. Н. Карасик, А. А. Прихожий // Системный анализ и прикладная информатика. – 2017. – № 1. – С. 4–11.

Abstract

Рассматриваются три архитектуры планировщика кооперативного выполнения потоков в многопоточном приложении, исполняемом на многоядерной системе. Архитектура А0 использует средства взаимодействия и синхронизации потоков, предоставляемые операционной системой. Архитектура А1 вводит новый примитив синхронизации потоков и единую для планировщика очередь заблокированных потоков, благодаря которым уменьшает активность взаимодействия потоков с операционной системой и значительно ускоряет процессы блокировки и разблокировки потоков. Архитектура А2 заменяет единую очередь заблокированных потоков на отдельные очереди для каждого примитива синхронизации и расширяет набор внутренних состояний примитива, уменьшая взаимозависимость потоков планирования и значительно ускоряя процессы блокировки и разблокировки рабочих потоков. Архитектуры планировщика реализованы в операционных системах Windows на базе технологии User Mode Scheduling. Важные экспериментальные результаты получены для многопоточных приложений, реализующих два блочно-параллельных алгоритма решения систем линейных алгебраических уравнений методом Гаусса. Алгоритмы различаются способами распределения данных между потоками и моделями синхронизации потоков. Число потоков варьировалось от 32 до 7936. Архитектура А1 показала ускорение до 8.65 %, а архитектура А2 показала ускорение до 11.98 % по сравнению с архитектурой А0 на блочно-параллельных алгоритмах с учетом их прямого и обратного хода. На обратном ходе алгоритмов архитектура А1 дала ускорение до 125 %, а архитектура А2 дала ускорение до 413 % по сравнению с архитектурой А0. Эксперименты убедительно доказывают, что предлагаемые в статье архитектуры А1 и А2 выигрывают у А0 тем значительнее, чем большее количество блокировок и разблокировок потоков происходит во время выполнения многопоточного приложения.

Abstract in another language

Three architectures of a cooperative thread scheduler for a multithreaded application executed on a multi-core system are considered. Architecture A0 is based on the synchronization and scheduling facilities provided by the operating system. Architecture A1 introduces a new synchronization primitive and a single scheduler-wide queue of blocked threads, which reduce the interaction between the threads and the operating system and significantly speed up the blocking and unblocking of threads. Architecture A2 replaces the single queue of blocked threads with dedicated queues, one for each synchronization primitive, and extends the set of internal states of the primitive, which reduces the interdependence of the scheduling threads and further speeds up the blocking and unblocking of worker threads. All scheduler architectures are implemented on Windows operating systems on the basis of User Mode Scheduling. Important experimental results are obtained for multithreaded applications that implement two block-parallel algorithms for solving systems of linear algebraic equations by Gaussian elimination. The algorithms differ in how data are distributed among threads and in their thread synchronization models. The number of threads varied from 32 to 7936. Over both stages of the block-parallel algorithms, reduction to triangular form and back substitution, architecture A1 shows a speedup of up to 8.65% and architecture A2 a speedup of up to 11.98% compared to architecture A0. On the back substitution stage alone, architecture A1 gives a speedup of up to 125% and architecture A2 a speedup of up to 413% compared to architecture A0. The experiments clearly show that the more thread blocking and unblocking operations occur during the execution of a multithreaded application, the more the proposed architectures A1 and A2 outperform A0. The computational experiments demonstrate an improvement in the parameters of multithreaded applications on a heterogeneous multi-core system due to the proposed advanced versions of the thread scheduler.
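
The difference between a single scheduler-wide queue of blocked threads (A1) and a dedicated queue per synchronization primitive (A2) can be illustrated with a toy model. The following is a minimal sketch and not the authors' implementation: it does not use User Mode Scheduling, locking, or the primitive's internal states, and the class names, the `inspected` counter, and the linear scan in the single-queue variant are all assumptions made purely for illustration.

```python
from collections import deque


class SingleQueueScheduler:
    """A1-style sketch (hypothetical): one scheduler-wide queue of blocked threads."""

    def __init__(self):
        self.blocked = deque()   # (thread_id, primitive_id) pairs
        self.inspected = 0       # queue entries examined while unblocking

    def block(self, thread_id, primitive_id):
        self.blocked.append((thread_id, primitive_id))

    def signal(self, primitive_id):
        """Release every thread waiting on primitive_id by scanning the shared queue."""
        ready, still_blocked = [], deque()
        for thread_id, waits_on in self.blocked:
            self.inspected += 1
            if waits_on == primitive_id:
                ready.append(thread_id)
            else:
                still_blocked.append((thread_id, waits_on))
        self.blocked = still_blocked
        return ready


class PerPrimitiveScheduler:
    """A2-style sketch (hypothetical): each primitive owns its own queue."""

    def __init__(self):
        self.waiters = {}        # primitive_id -> deque of thread ids
        self.inspected = 0

    def block(self, thread_id, primitive_id):
        self.waiters.setdefault(primitive_id, deque()).append(thread_id)

    def signal(self, primitive_id):
        """Release the waiters of primitive_id without touching other queues."""
        queue = self.waiters.pop(primitive_id, deque())
        self.inspected += len(queue)
        return list(queue)


def run(scheduler, threads=32, primitives=8):
    """Block `threads` threads round-robin on `primitives`, then signal each primitive."""
    for t in range(threads):
        scheduler.block(t, t % primitives)
    released = []
    for p in range(primitives):
        released.extend(scheduler.signal(p))
    return sorted(released), scheduler.inspected


if __name__ == "__main__":
    print(run(SingleQueueScheduler()))
    print(run(PerPrimitiveScheduler()))
```

In this model both variants release the same threads, but the per-primitive layout touches only the entries it actually releases, whereas the shared queue is traversed on every signal. The real schedulers may gain their advantage differently, for example through reduced contention between scheduling threads, which this sketch does not model.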