Programming soup: Real Boost.multi

poniedziałek, 6 kwietnia 2015

Real Boost.multi_index performance

Update 09.05.2015: thanks to Joaquín it turned out that I had slight bug in my plotting script - x tick labels were wrong. All pictures have been fixed.

Several months ago I prepared and conducted a presentation about multi_index library from Boost at Nokia TechMeetUp. Essentially I showed that it is pretty useful library that should be used when there's a need to represent the same data with multiple "views". For instance, if we have to access the same list of something sorted by different attributes, Boost.MultiIndex seems to be very good choice as the documentation claims strong consistency and, what's also important, better performance. What's also interesting is that Facebook's library - folly - utilizes that in its TimeoutQueue class.

In the middle of the presentation I put some performance graphs which I mindlessly copied them from the documentation. Had I known that they are old, I'd never do that. Fortunately it was pointed out by friend of mine. These graphs in fact illustrate performance gains, but measurements were made with ancient compiler versions like GCC 3.4.5.

After the presentation I promised myself that I'll investigate this a bit, but as usual time didn't permit. Till now.

I made measurements with newer versions of GCC and Clang. Unfortunately I have no access to Visual Studio. If you can provide me VS stats I'd be pleased, though.

Here are my results for:

CPU: i5-3320M

RAM: 8GB

OS: Debian 7

Compilation flags: -O3

Boost version: 1.57

Images from documentation are placed at the right side.

In the case of ordered index it's visible that either standard containers were improved or compilers learned how to optimize code using them better. Nevertheless, there is visible trend showing that with more than 10⁶ entries, multi_index container could perform better than naive approach.

In case of two indexes - one indexed and the second sequenced situation changes. In the original chart from the documentation we can observe huge improvement in favor to multi_index starting from 10⁵ elements. My results are quite different - there is no performance gain with multi_index. For 10⁶ elements the difference is dramatic comparing the two diagrams: ~130% and ~30%.

Interestingly, on my machine multi_index performed better in case of only 1 sequenced index. This is surprise for me, because in the documentation it is opposite. Well, in this scenario multi_index outperformed naive approach.
Rest of diagrams are presented below. As you can see they illustrate that multi_index documentation is outdated. Sometimes there are some performance gains, but not that huge as presented in diagrams from documentation.

Undoubtedly this deserves deeper investigation. Especially it should be discovered why something that had to perform better stopped to do so.
Unfortunately currently I have no time to dig in into this. (Maybe in next days I'll have some time to check how cache-friendly multi_index library is). Nevertheless, everything that I showed pushes me even harder into opinion that everything should be measured - even 3rd-party things with a lot of promises in the documentation.

image source

7 komentarzy:

Joaquín M López Muñoz6 maja 2015 23:35
Hi Sławomir,

I'd like to know more about your results, as they don't seem to be on line with what I'd expect. I've run the test on the same machine (Windows 7, i5-2520M @2.5GHz, 4GB RAM) using different g++ and Boost versions:

A: g++ (GCC) 3.4.4 (cygming special), Boost 1.34 (May 2007)

1 ordered index
10^3 elmts: 110.12% ( 0.59 ms / 0.53 ms)
10^4 elmts: 115.78% ( 6.21 ms / 5.36 ms)
10^5 elmts: 112.63% ( 69.64 ms / 61.83 ms)
space gain: 80.00%
1 sequenced index
10^3 elmts: 100.21% ( 0.46 ms / 0.46 ms)
10^4 elmts: 96.37% ( 4.64 ms / 4.81 ms)
10^5 elmts: 100.58% ( 46.53 ms / 46.26 ms)
space gain: 100.00%
2 ordered indices
10^3 elmts: 60.17% ( 0.69 ms / 1.15 ms)
10^4 elmts: 72.24% ( 8.33 ms / 11.53 ms)
10^5 elmts: 75.94% (101.96 ms / 134.27 ms)
space gain: 70.00%
1 ordered index + 1 sequenced index
10^3 elmts: 57.40% ( 0.59 ms / 1.03 ms)
10^4 elmts: 62.12% ( 6.49 ms / 10.45 ms)
10^5 elmts: 66.82% ( 75.62 ms / 113.17 ms)
space gain: 75.00%
3 ordered indices
10^3 elmts: 46.93% ( 0.79 ms / 1.68 ms)
10^4 elmts: 52.33% ( 8.91 ms / 17.02 ms)
10^5 elmts: 60.13% (117.43 ms / 195.28 ms)
space gain: 66.67%
2 ordered indices + 1 sequenced index
10^3 elmts: 43.53% ( 0.70 ms / 1.60 ms)
10^4 elmts: 43.83% ( 7.45 ms / 16.99 ms)
10^5 elmts: 43.25% ( 92.01 ms / 212.77 ms)
space gain: 69.23%

B: g++ (GCC) 3.4.4 (cygming special), Boost 1.58 (April 2015)

1 ordered index
10^3 elmts: 108.30% ( 0.58 ms / 0.54 ms)
10^4 elmts: 112.31% ( 6.12 ms / 5.45 ms)
10^5 elmts: 112.74% ( 70.07 ms / 62.15 ms)
space gain: 80.00%
1 sequenced index
10^3 elmts: 101.40% ( 0.47 ms / 0.46 ms)
10^4 elmts: 101.07% ( 4.65 ms / 4.60 ms)
10^5 elmts: 102.15% ( 46.94 ms / 45.95 ms)
space gain: 100.00%
2 ordered indices
10^3 elmts: 63.36% ( 0.70 ms / 1.10 ms)
10^4 elmts: 68.48% ( 8.13 ms / 11.87 ms)
10^5 elmts: 76.46% (104.27 ms / 136.38 ms)
space gain: 70.00%
1 ordered index + 1 sequenced index
10^3 elmts: 53.41% ( 0.60 ms / 1.12 ms)
10^4 elmts: 61.96% ( 6.56 ms / 10.59 ms)
10^5 elmts: 68.55% ( 76.93 ms / 112.22 ms)
space gain: 75.00%
3 ordered indices
10^3 elmts: 46.71% ( 0.79 ms / 1.68 ms)
10^4 elmts: 53.25% ( 9.10 ms / 17.08 ms)
10^5 elmts: 60.52% (119.69 ms / 197.76 ms)
space gain: 66.67%
2 ordered indices + 1 sequenced index
10^3 elmts: 44.04% ( 0.70 ms / 1.58 ms)
10^4 elmts: 43.08% ( 7.36 ms / 17.09 ms)
10^5 elmts: 46.10% ( 91.17 ms / 197.78 ms)
space gain: 69.23%

C: g++ (GCC) 4.8.1, Boost 1.58 (April 2015)

1 ordered index
10^3 elmts: 100.81% ( 0.17 ms / 0.17 ms)
10^4 elmts: 98.71% ( 1.96 ms / 1.99 ms)
10^5 elmts: 88.64% ( 22.18 ms / 25.02 ms)
space gain: 80.00%
1 sequenced index
10^3 elmts: 90.37% ( 0.08 ms / 0.09 ms)
10^4 elmts: 90.31% ( 0.78 ms / 0.87 ms)
10^5 elmts: 90.84% ( 8.34 ms / 9.18 ms)
space gain: 100.00%
2 ordered indices
10^3 elmts: 71.05% ( 0.26 ms / 0.37 ms)
10^4 elmts: 75.69% ( 3.01 ms / 3.98 ms)
10^5 elmts: 68.74% ( 35.52 ms / 51.67 ms)
space gain: 70.00%
1 ordered index + 1 sequenced index
10^3 elmts: 55.04% ( 0.16 ms / 0.29 ms)
10^4 elmts: 62.44% ( 1.93 ms / 3.09 ms)
10^5 elmts: 69.11% ( 24.11 ms / 34.89 ms)
space gain: 75.00%
3 ordered indices
10^3 elmts: 64.47% ( 0.35 ms / 0.54 ms)
10^4 elmts: 68.54% ( 4.04 ms / 5.89 ms)
10^5 elmts: 69.46% ( 48.26 ms / 69.49 ms)
space gain: 66.67%
2 ordered indices + 1 sequenced index
10^3 elmts: 56.58% ( 0.27 ms / 0.48 ms)
10^4 elmts: 59.80% ( 3.14 ms / 5.25 ms)
10^5 elmts: 60.27% ( 38.38 ms / 63.69 ms)
space gain: 69.23%

(continued in next post)
OdpowiedzUsuń
Odpowiedzi
Joaquín M López Muñoz6 maja 2015 23:35
(continues from previous post)

So: A and B were run with a fairly old compiler (3.4.4, 2005) using Boost 1.34 (2007) and Boost 1.58 (current, 2015), respectively. Results are basically the same, which shows that there's been no degradation in Boost.MultiIndex code per se. C, on the other hand, uses GCC 4.8.1 (2013) with Boost 1.58: *relative* performance of Boost.MultiIndex is somewhat poorer (but nowhere as poor as you got), but please note that overall absolute timings, when compared with B, are 2.5-5x times faster! So, my interpretation is that compilers have gotten much better (improved overall performance) which also results, incidentally, in a reduction in Boost.MultiIndex relative advantage.

Please contact me back if you'd like to further analyze your results.

Best regards,
OdpowiedzUsuń
Odpowiedzi
Joaquín M López Muñoz9 maja 2015 02:53
It'll take me some days to grab a copy of Clang and do that test. In the meantime, I understand you've changed the test code from the original in Boost.MultiIndex docs as n ranges between 10^4 and 10^6. Can you upload the source code you used? Thank you.
OdpowiedzUsuń
Odpowiedzi
Joaquín M López Muñoz9 maja 2015 05:48
Can you post the program output? A look at absolute timings might shed some light, I think. Thank you
OdpowiedzUsuń
Odpowiedzi

Dodaj komentarz