Gör så här: Använd parallel_invoke för att skriva en parallell sorteringsrutin

2025-06-19

Det här dokumentet beskriver hur du använder parallel_invoke-algoritmen för att förbättra prestanda för bitonisk sorteringsalgoritm. Den bitoniska sorteringsalgoritmen delar rekursivt indatasekvensen i mindre sorterade partitioner. Den bitoniska sorteringsalgoritmen kan köras parallellt eftersom varje partitionsåtgärd är oberoende av alla andra åtgärder.

Även om den bitoniska sorteringen är ett exempel på ett sorteringsnätverk som sorterar alla kombinationer av indatasekvenser, sorterar det här exemplet sekvenser vars längd är två.

Anmärkning

I det här exemplet används en parallell sorteringsrutin för illustration. Du kan också använda de inbyggda sorteringsalgoritmerna som PPL erbjuder: concurrency::parallel_sort, concurrency::parallel_buffered_sort och concurrency::parallel_radixsort. Mer information finns i parallella algoritmer.

Sektioner

I det här dokumentet beskrivs följande uppgifter:

Utföra bitonisk sortering seriellt
Använda parallel_invoke för att utföra bitonisk sortering parallellt

Utföra bitonisk sortering sekventiellt

I följande exempel visas serieversionen av bitonisk sorteringsalgoritm. Funktionen bitonic_sort delar upp sekvensen i två partitioner, sorterar dessa partitioner i motsatta riktningar och sammanfogar sedan resultatet. Den här funktionen anropar sig själv två gånger rekursivt för att sortera varje partition.

const bool INCREASING = true;
const bool DECREASING = false;

// Comparator function for the bitonic sort algorithm.
template <class T>
void compare(T* items, int i, int j, bool dir)
{
   if (dir == (items[i] > items[j]))
   {
      swap(items[i], items[j]);
   }
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void bitonic_merge(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }
      bitonic_merge(items, lo, m, dir);
      bitonic_merge(items, lo + m, m, dir);
   }
}

// Sorts the given sequence in the specified order.
template <class T>
void bitonic_sort(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;
      bitonic_sort(items, lo, m, INCREASING);
      bitonic_sort(items, lo + m, m, DECREASING);

      // Merge the results.
      bitonic_merge(items,lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void bitonic_sort(T* items, int size)
{
    bitonic_sort(items, 0, size, INCREASING);
}

[Topp]

Använda parallel_invoke för att utföra bitonisk sortering parallellt

I det här avsnittet beskrivs hur du använder algoritmen parallel_invoke för att utföra den bitoniska sorteringsalgoritmen parallellt.

Så här utför du den bitoniska sorteringsalgoritmen parallellt

Lägg till ett #include direktiv för sidhuvudfilen ppl.h.
```
#include <ppl.h>
```
Lägg till ett using direktiv för concurrency namnområdet.
```
using namespace concurrency;
```

Skapa en ny funktion med namnet parallel_bitonic_mege, som använder algoritmen parallel_invoke för att sammanfoga sekvenserna parallellt om det finns tillräckligt med arbete att göra. Annars anropar du bitonic_merge för att sammanfoga sekvenserna seriellt.

// Sorts a bitonic sequence in the specified order.
template <class T>
void parallel_bitonic_merge(T* items, int lo, int n, bool dir)
{   
   // Merge the sequences concurrently if there is sufficient work to do.
   if (n > 500)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }

      // Use the parallel_invoke algorithm to merge the sequences in parallel.
      parallel_invoke(
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo, m, dir); },
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo + m, m, dir); }
      );
   }
   // Otherwise, perform the work serially.
   else if (n > 1)
   {
      bitonic_merge(items, lo, n, dir);
   }   
}

Utför en process som liknar den i föregående steg, men för bitonic_sort funktionen.

// Sorts the given sequence in the specified order.
template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{   
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );
      
      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

Skapa en överlagrad version av parallel_bitonic_sort funktionen som sorterar matrisen i ökande ordning.

// Sorts the given sequence in increasing order.
template <class T>
void parallel_bitonic_sort(T* items, int size)
{
   parallel_bitonic_sort(items, 0, size, INCREASING);
}

Algoritmen parallel_invoke minskar kostnaderna genom att utföra den sista av serien med uppgifter i anropskontexten. I funktionen körs till exempel parallel_bitonic_sort den första aktiviteten i en separat kontext och den andra aktiviteten körs i den anropande kontexten.

// Sort the partitions in parallel.
parallel_invoke(
   [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
   [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
);

I följande fullständiga exempel utförs både serieversionen och de parallella versionerna av den bitoniska sorteringsalgoritmen. Exemplet skriver också ut till konsolen den tid som krävs för att utföra varje beräkning.

// parallel-bitonic-sort.cpp
// compile with: /EHsc
#include <windows.h>
#include <algorithm>
#include <iostream>
#include <random>
#include <ppl.h>

using namespace concurrency;
using namespace std;

// Calls the provided work function and returns the number of milliseconds 
// that it takes to call that function.
template <class Function>
__int64 time_call(Function&& f)
{
   __int64 begin = GetTickCount();
   f();
   return GetTickCount() - begin;
}

const bool INCREASING = true;
const bool DECREASING = false;

// Comparator function for the bitonic sort algorithm.
template <class T>
void compare(T* items, int i, int j, bool dir)
{
   if (dir == (items[i] > items[j]))
   {
      swap(items[i], items[j]);
   }
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void bitonic_merge(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }
      bitonic_merge(items, lo, m, dir);
      bitonic_merge(items, lo + m, m, dir);
   }
}

// Sorts the given sequence in the specified order.
template <class T>
void bitonic_sort(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;
      bitonic_sort(items, lo, m, INCREASING);
      bitonic_sort(items, lo + m, m, DECREASING);

      // Merge the results.
      bitonic_merge(items,lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void bitonic_sort(T* items, int size)
{
    bitonic_sort(items, 0, size, INCREASING);
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void parallel_bitonic_merge(T* items, int lo, int n, bool dir)
{   
   // Merge the sequences concurrently if there is sufficient work to do.
   if (n > 500)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }

      // Use the parallel_invoke algorithm to merge the sequences in parallel.
      parallel_invoke(
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo, m, dir); },
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo + m, m, dir); }
      );
   }
   // Otherwise, perform the work serially.
   else if (n > 1)
   {
      bitonic_merge(items, lo, n, dir);
   }   
}

// Sorts the given sequence in the specified order.
template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{   
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );
      
      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void parallel_bitonic_sort(T* items, int size)
{
   parallel_bitonic_sort(items, 0, size, INCREASING);
}

int wmain()
{  
   // For this example, the size must be a power of two.
   const int size = 0x200000;

   // Create two large arrays and fill them with random values.
   int* a1 = new int[size];
   int* a2 = new int[size];

   mt19937 gen(42);
   for(int i = 0; i < size; ++i)
   {
      a1[i] = a2[i] = gen();
   }

   __int64 elapsed;
   
   // Perform the serial version of the sort.
   elapsed = time_call([&] { bitonic_sort(a1, size); });
   wcout << L"serial time: " << elapsed << endl;

   // Now perform the parallel version of the sort.
   elapsed = time_call([&] { parallel_bitonic_sort(a2, size); });
   wcout << L"parallel time: " << elapsed << endl;
 
   delete[] a1;
   delete[] a2;
}

Följande exempelutdata är för en dator som har fyra processorer.

serial time: 4353
parallel time: 1248

[Topp]

Kompilera koden

Om du vill kompilera koden kopierar du den och klistrar sedan in den i ett Visual Studio-projekt eller klistrar in den i en fil med namnet parallel-bitonic-sort.cpp och kör sedan följande kommando i ett Visual Studio-kommandotolkfönster.

cl.exe /EHsc parallel-bitonic-sort.cpp

Robust Programmering

I det här exemplet används algoritmen parallel_invoke i stället för samtidighet::task_group-klassen eftersom livslängden för varje aktivitetsgrupp inte sträcker sig bortom en funktion. Vi rekommenderar att du använder parallel_invoke när du kan eftersom det har mindre exekveringskostnad än task group objekt, och därför kan du skriva mer välpresterande kod.

Parallella versioner av vissa algoritmer presterar bara bättre när det finns tillräckligt med arbete att göra. Funktionen anropar till exempel parallel_bitonic_merge serieversionen, bitonic_merge, om det finns 500 eller färre element i sekvensen. Du kan också planera din övergripande sorteringsstrategi baserat på mängden arbete. Det kan till exempel vara mer effektivt att använda serieversionen av snabbsorteringsalgoritmen om matrisen innehåller färre än 500 objekt, vilket visas i följande exempel:

template <class T>
void quick_sort(T* items, int lo, int n)
{
   // TODO: The function body is omitted for brevity.
}

template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{
   // Use the serial quick sort algorithm if there are relatively few
   // items to sort. The associated overhead for running few tasks in 
   // parallel may not overcome the benefits of parallel processing.
   if (n - lo + 1 <= 500)
   {
      quick_sort(items, lo, n);
   }
   else if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );
      
      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

Precis som med alla parallella algoritmer rekommenderar vi att du profilerar och justerar koden efter behov.

Se även

Uppgiftsparallellitet
parallel_invoke funktion

Feedback

Var den här sidan till hjälp?