Code Efficiency - Parallel For-Loops I

In this semester (summer term 2022), we did introduce a code efficiency lecture to "Introduction to Programming" (IPD). I like to share how easy it is to exploit the number of cores in a computer by using parallel for-loops instead of traditional for-loops when possible. Besides MATLAB, which is used in IPD, other programming languages like C++ and Rust also provide functionalities to implement parallel for-loops with low effort.

In this post, we look into an example problem from chemistry. How can we calculate the temperature of several mixtures that haven been mixed together. Based upon this problem we investigate the parfor-loop in MATLAB. Implementation in other languages, e.g. C++ and/Rust, may be content for another blog post.

Example Problem

The mixed temperature of several materials can be calculated by the richmannschen Mischungsregel that is shown in the following formula:

$$T_m = \frac{\sum m_i \cdot c_i \cdot T_i}{\sum m_i \cdot c_i}$$

As inputs we need the heat capacity $~c_i~$ and the masses $~m_i~$ and the temperatures $~T_i~$. For this, we use two $~KxN~$ matrices, whereby N is the number of mixtures that are calculated and K is the number of mixed materials. The first matrix T consists of the temperatures of the K mixed materials and the second matrix M consists of the masses. The heat capacity of the materials are stored in a vector with K elements. Based on this, we calculate the mixed temperature $~T_m~$ N times. We start by defining our matrices and the heat capacity in the following script:

N = 1000;
%N = 1000000;
K = 4;
c = rand(1, K) * 0.5 + 2;
M = rand(N, K) * 30 + 1;
T = rand(N, K) * 50 + 5;

To summarize the script, we will calculate the mixed temperature 1000 times for mixtures with four materials. For this, we create four random heat capacities in the range of 2.0 and 2.5 and also 1000x4 masses in the range of 1 to 31 and 1000x4 temperatures in the range of 5 to 55.

Implementation

We will use loops for calculation although MATLAB matrice-based operators are well suited to solve the problem without loops. In productive code the application of the operators is recommended. Nevertheless, a loop-based implementation to calculate the richmannsche mixed temperature is defined in the following script:

results = zeros(N,1);
tic
for i=1:N 
%parfor i=1:N
    denominator = M(i,:) .* c(:);
    nominator = denominator(:) .* T(i, :);
    results (i) = sum(nominator) / sum(denominator);
end
fprintf("Duration with n=%i: %g seconds.\n", N, toc);

In the first line we reserve memory to store the results. Then we start a time measurement with tic. In line 3/4 for or parfor loops are applied to calculate the N mixed temperatures. In the loop body, the richmannsche rules is applied, first the denominator and than the denominator is calculated with elementwise matrix operators. In line 7 the results is calculated. Line 9 prints the needed time on the screen.

The keyword parfor commands MATLAB to use data-decomposition. Internally the n calculations are divided in X sections. X is often the number of cores, such that each section calculated approximately n/X mixed temperatures. Sometimes MATLAB is responsible to setup a threadpool and for the needed communication between workers. Sometimes the switch between for and parfor is without effort. For this, requirements of the parfor loop must not be violated. The parfor loops needs integer count variables and neither data dependencies, e.g. access of previous indicies i-1, nor nested parfor loops are allowed. More information are here.

Time Measurements

Now we do time measurements, we start with n=1000.

Duration with n=1000: 0.0061538 seconds. (for-Schleife)
Duration with n=1000: 0.105125 seconds. (parfor-Schleife)

This is a suprise. The parallel variant needs more time than the sequential. The reason is the initial time that MATLAB needs to transfer the data to different threads. We repeat the experiment with n=1000000:

Duration with n=1000000: 4.46868 seconds. (for-Schleife)
Duration with n=1000000: 1.16883 seconds. (parfor-Schleife)

With a huge database the parallelization of the easy to solve problem is meaningful. If the operations in the loop body would be more complex than a pair of additions and multiplications one would see the positive effect of the parallelization with smaller n.

Summary

By the keyword parfor MATLAB supports a fast implementation of parallel for loops based upon data-decomposition. We learned that the initial effort may be big in respect to the problem and that then the application of parallel for loops is not recommended. But for scaled-up problems the parallelization comes with a measurable benefit. Therefore it is great that the application of parallel for loops is possible without much work of the programmer.

Leave a Reply