Wednesday, November 2, 2011

MKL parallel execution benchmark (part 2)

Same code as before, now for the 32x32 system, i.e. diagonalizing 1024x1024 real symmetric matrices.

As expected, the parallel efficiency improves as the matrices get larger, but it is still not very good at this size.
Adding threads does shorten the run time, though far less than proportionally: 8 threads are only about 2.4 times faster than 1.


**** smp 1 ****

Execution time: 6811 s
Command being timed: "./a.out"
User time (seconds): 6809.87
System time (seconds): 0.02
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:53:31

**** smp 2 ****

Execution time: 4559 s
Command being timed: "./a.out"
User time (seconds): 8988.00
System time (seconds): 52.54
Percent of CPU this job got: 198%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:15:59

**** smp 4 ****

Execution time: 3358 s
Command being timed: "./a.out"
User time (seconds): 13043.14
System time (seconds): 153.09
Percent of CPU this job got: 392%
Elapsed (wall clock) time (h:mm:ss or m:ss): 55:57.95

**** smp 8 ****

Execution time: 2897 s
Command being timed: "./a.out"
User time (seconds): 22270.76
System time (seconds): 360.07
Percent of CPU this job got: 781%
Elapsed (wall clock) time (h:mm:ss or m:ss): 48:17.03
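
To put numbers on "not very good", here is a small Python sketch that turns the timings above into speedup and parallel efficiency. It is not part of the benchmark itself; the names `runs` and `scaling_table` are made up for this illustration, and the 1-thread run is assumed to be the right baseline. Speedup is the 1-thread wall time divided by the N-thread wall time, efficiency is speedup divided by N, and the last column is total CPU time (user + system) relative to the 1-thread run.

```python
# Timings copied from the runs above: threads -> (wall s, user s, system s).
runs = {
    1: (6811, 6809.87, 0.02),
    2: (4559, 8988.00, 52.54),
    4: (3358, 13043.14, 153.09),
    8: (2897, 22270.76, 360.07),
}


def scaling_table(runs):
    """Return (threads, speedup, efficiency, cpu_ratio) rows, baseline = 1 thread."""
    base_wall, base_user, base_sys = runs[1]
    base_cpu = base_user + base_sys
    rows = []
    for threads in sorted(runs):
        wall, user, sys_t = runs[threads]
        speedup = base_wall / wall                 # how much faster than 1 thread
        efficiency = speedup / threads             # 1.0 would be perfect scaling
        cpu_ratio = (user + sys_t) / base_cpu      # extra CPU time burned in total
        rows.append((threads, speedup, efficiency, cpu_ratio))
    return rows


for threads, speedup, efficiency, cpu_ratio in scaling_table(runs):
    print(f"{threads} threads: speedup {speedup:.2f}, "
          f"efficiency {efficiency:.0%}, CPU time x{cpu_ratio:.2f}")
```

With these timings the speedup comes out at roughly 1.49, 2.03 and 2.35 for 2, 4 and 8 threads, so the efficiency drops from about 75% to about 51% and then about 29%. Meanwhile the total CPU time grows to more than three times the 1-thread value at 8 threads, so most of the extra cores' work does not translate into shorter wall time.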
