I added multithreading capability to the Siipi speedtest tool, and the results were surprising: looping got faster by almost a factor of the number of cores, but allocating memory got slower by roughly the same factor!
Here's the test with the first part using 4 cores and the second part using 1 core (as before):
Speedtest 1.1 (c) 2009 Siipi
Counting 10 billion floating points using 4 cores...
thread 1 begin=25000.000000
thread 2 begin=50000.000000
Main loop begin=0.000000
thread 3 begin=75000.000000
thread 3 loop ended at 100000.000003.
thread 2 loop ended at 75000.000003.
thread 1 loop ended at 50000.000007.
Main loop Done. i=2500000050, n=25000.000009, time=4.390000s.
Creating and deleting 100 million class objects using 1 cores...
Main loop Done. i=100000000, time=13.313000s.
Total time=17.703000s.
Now the same test with both parts using 4 cores:
Speedtest 1.1 (c) 2009 Siipi
Counting 10 billion floating points using 4 cores...
thread 1 begin=25000.000000
thread 2 begin=50000.000000
Main loop begin=0.000000
thread 3 begin=75000.000000
thread 3 loop ended at 100000.000003.
thread 1 loop ended at 50000.000007.
thread 2 loop ended at 75000.000003.
Main loop Done. i=2500000050, n=25000.000009, time=6.312000s.
Creating and deleting 100 million class objects using 4 cores...
thread 1 begin=25000000
thread 2 begin=50000000
thread 3 begin=75000000
Main loop Done. i=25000000, time=66.188000s.
Total time=72.500000s.
For reference, here is the original test, without multithreading:
Speedtest 1.0 © 2008 Siipi
Counting 10 billion floating points...
Done. i=1410063201, n=100000.000003, time=15.078000s.
Creating and deleting 100 million class objects...
Done. i=100000000, time=12.890000s.
Total time=27.968000s.
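The tool's source isn't shown here, so the following is only a rough sketch of how the "creating and deleting class objects" part could be split across threads, assuming C++11 `std::thread`. The names `Dummy`, `alloc_loop`, `Result` and `run_alloc_test` are made up for illustration and are not taken from Siipi speedtest. As in the logs above, the main thread takes one share of the work and the extra threads take the rest.

```cpp
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the class the tool creates and deletes.
struct Dummy {
    double value;
    explicit Dummy(double v) : value(v) {}
};

// Create and delete `count` heap objects; returns how many were created.
std::size_t alloc_loop(std::size_t count) {
    std::size_t made = 0;
    for (std::size_t i = 0; i < count; ++i) {
        Dummy* d = new Dummy(static_cast<double>(i));
        // Read the object so the new/delete pair is less likely to be optimized away.
        made += (d->value >= 0.0) ? 1 : 0;
        delete d;
    }
    return made;
}

struct Result {
    std::size_t objects;  // total objects created across all threads
    double seconds;       // wall-clock time for the whole test
};

// Split `total` allocations evenly across `threads` threads.
// Any remainder of total / threads is dropped to keep the sketch short.
Result run_alloc_test(unsigned threads, std::size_t total) {
    if (threads == 0) threads = 1;
    const std::size_t per_thread = total / threads;
    std::vector<std::size_t> made(threads, 0);  // one slot per thread, no sharing
    std::vector<std::thread> workers;

    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 1; t < threads; ++t) {
        workers.emplace_back([&made, t, per_thread] {
            made[t] = alloc_loop(per_thread);
        });
    }
    made[0] = alloc_loop(per_thread);  // the main thread does its share too
    for (auto& w : workers) w.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    std::size_t sum = 0;
    for (std::size_t m : made) sum += m;
    return {sum, elapsed.count()};
}
```

Comparing `run_alloc_test(1, n).seconds` with `run_alloc_test(4, n).seconds` for the same `n` gives the kind of comparison shown in the logs. A common explanation for a slowdown like the one above is that all threads contend for the same heap allocator, but the logs alone don't confirm that, and the outcome depends heavily on the compiler, runtime library and operating system.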
I also found an interesting article whose author claims that multithreading does not speed things up (which is not always true) but rather keeps the system from getting blocked (which is always true, as I've experienced with the Lucid engine as well):
http://www.anomaly.org/wade/blog/2005/08/unintuitive_multithreading_spe.html