{"id":439,"date":"2025-03-25T18:29:00","date_gmt":"2025-03-25T18:29:00","guid":{"rendered":"https:\/\/permutationcity.co.uk\/bp\/?p=439"},"modified":"2025-03-27T17:40:30","modified_gmt":"2025-03-27T17:40:30","slug":"bits-and-bools","status":"publish","type":"post","link":"https:\/\/permutationcity.co.uk\/bp\/2025\/03\/25\/bits-and-bools\/","title":{"rendered":"Bits and bools"},"content":{"rendered":"\n<p>One of my friends was wondering about the performance difference between large representations of bools,\nsay bytes, and packed representations of bools, individual bits.\nAccessing a byte is easier, requiring fewer instructions, but is going to fill up your processor&#8217;s cache.\nAccessing a bit is harder, requiring more instructions, but you can fit more in.\nIf you&#8217;re going down a\n<a href=\"https:\/\/permutationcity.co.uk\/bp\/2024\/04\/09\/data-oriented-design\/\">Data-oriented Design<\/a>\nroute which should you choose?\nI&#8217;ve looked at bit manipulation before when talking about the\n<a href=\"https:\/\/permutationcity.co.uk\/bp\/2024\/08\/06\/journey-destination-and-story\/\">journey, destination and story<\/a>\nof code&#8217;s development.\nI thought this would be an easy investigation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"accessing-bits\">Accessing bits<\/h2>\n\n\n\n<p>If we want to compress bools into a smaller space that probably means a collection of unsigned integers.\nTaking an index for which bool we want that has to be used to determine which integer and then which bit.\nThere are options to consider:\nwhat unsigned integer to use,\nwhat collection to use.\nI can use some templates in my testing although a final implementation might pick specific types.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;clike&quot;,&quot;mime&quot;:&quot;text\/x-c++src&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;C++&quot;,&quot;language&quot;:&quot;C++&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;cpp&quot;}\">template &lt;typename Collection&gt;\nclass BitCollection\n{\npublic:\n    using Value = Collection::value_type;\n\n    BitCollection(Collection&amp; values) :\n        m_Values(values) {\n    };\n\n    bool GetBit(size_t index) const {\n        auto offsets = IndexToOffsets(index);\n        auto&amp; value = m_Values[offsets.Value];\n        auto mask = GetBitMask&lt;Value&gt;(offsets.Bit);\n        return value &amp; mask;\n    }\n\n    void SetBit(size_t index, bool bit) {\n        auto offsets = IndexToOffsets(index);\n        auto&amp; value = m_Values[offsets.Value];\n        auto mask = GetBitMask&lt;Value&gt;(offsets.Bit);\n        if (bit) {\n            value |= mask;\n        }\n        else {\n            value &amp;= ~mask;\n        }\n    }\n\nprivate:\n    Collection&amp; m_Values;\n\n    struct Offsets {\n        size_t Value;\n        uint8_t Bit;\n    };\n\n    static Offsets IndexToOffsets(size_t index) {\n        auto bitCount = GetBitCount&lt;Value&gt;();\n        return { index \/ bitCount, static_cast&lt;uint8_t&gt;(index % bitCount) };\n    }\n};<\/pre><\/div>\n\n\n\n<p>There are three basic operations here: checking, setting and clearing a bool.\nThey all have the same setup so we are doing the same thing a lot.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-fast\">How fast?<\/h2>\n\n\n\n<p>Let&#8217;s start by considering\n<a href=\"https:\/\/quick-bench.com\/q\/ovnxfbVD_mscwNsVL1nqr60klHg\">which underlying integer<\/a>\nto use.\nThey&#8217;re all of a similar speed but it does make a difference.\n<code>uint64_t<\/code> and particularly <code>uint32_t<\/code> are better.\nQuick Bench shows the assembly as well.\nTo me it looks like the 32-bit code is a bit more compact.\nMaybe the instruction set or the optimiser is better suited for that.<\/p>\n\n\n\n<p>Now we can look at how the different algorithms do.\nI&#8217;m going to compare my code to\n<a href=\"https:\/\/en.cppreference.com\/w\/cpp\/container\/array\"><code>std::array<\/code><\/a> and\n<a href=\"https:\/\/en.cppreference.com\/w\/cpp\/container\/vector\"><code>std::vector<\/code><\/a>\nusing <code>bool<\/code> but also\n<a href=\"https:\/\/en.cppreference.com\/w\/cpp\/utility\/bitset\"><code>std::bitset<\/code><\/a>\nwhich has a compact representation.\n<a href=\"https:\/\/quick-bench.com\/q\/21Z3jNhe7mrdCkyuoOcO1fsOyu8\">Which algorithm<\/a>\nis best?<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Everything is a similar order of magnitude.<\/li>\n\n\n\n<li>A array of bools is fast.<\/li>\n\n\n\n<li>A vector of bools is slower. I would have thought it should have the same access patterns. What&#8217;s slowing it down?<\/li>\n\n\n\n<li>The standard bitset is the same as the vector. The extra bit manipulation takes some time but not too much.<\/li>\n\n\n\n<li>My <code>BitCollection&lt;std::array&lt;uint32_t>><\/code> is the same as the standard bitset. We are probably doing similar things under the hood.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-about-the-cache\">What about the cache?<\/h2>\n\n\n\n<p>At first glance <code>std::array&lt;bool><\/code> is the winner. This gives you the easiest access to each of the bools <em>and<\/em> is faster. The main limitation is that this type has a fixed size determined at compile time. If you&#8217;re sure of your maximum size and don&#8217;t mind wasting spaces it&#8217;s great. If those are problems then the <code>std::vector&lt;bool><\/code> is a good runner up. Similarly easy access and matches the speed of the other options.<\/p>\n\n\n\n<p>For these test I even chose a large collection size.\nThat&#8217;s more likely to blow out the cache and give an advantage to a compact representation.\nDoes the\n<a href=\"https:\/\/quick-bench.com\/q\/ejGYtPRDDfT9oFYnY1B3vPOnqOI\">cache make a difference<\/a>?\nI had to hunt for it but it does.\nWhen we get up to 8M bools things start slowing down.\nI&#8217;d go higher but Quick Bench starts timing out.\nSo how does that\n<a href=\"https:\/\/quick-bench.com\/q\/KVgF1HGdw2REqriZOS99rCjMHLM\">affect our algorithm<\/a>?\nBasic bools are still the winner.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"on-balance\">On balance<\/h2>\n\n\n\n<p>The idea of getting some savings by packing things in tightly is nice but\ndidn&#8217;t pan out in this situation.\nWhile bit manipulation isn&#8217;t really slow it does take a little more time <em>on these tests<\/em>.\nDifferent setups have different properties and that might be enough to make a difference.\nMaybe you&#8217;re on a machine with a small cache or\nmaybe you need to transfer data between systems.\nIf speed is critically important then it could be worth giving it a go.<\/p>\n\n\n\n<p>I&#8217;d say a more significant issue is the extra complexity.\nWhile we can package all of this up in a handy class it makes for more layers.\nTrying to investigate or debug code where everything has been packed makes it harder to inspect anything.\nIn this case\n<a href=\"https:\/\/permutationcity.co.uk\/bp\/2024\/02\/06\/what-is-simple-anyway\/\">keep it simple<\/a>\nmakes sense.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of my friends was wondering about the performance difference between large representations of bools, say bytes, and packed representations of bools, individual bits. Accessing a byte is easier, requiring fewer instructions, but is going to fill up your processor&#8217;s cache. Accessing a bit is harder, requiring more instructions, but you can fit more in. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_import_markdown_pro_load_document_selector":0,"_import_markdown_pro_submit_text_textarea":"","footnotes":""},"categories":[1],"tags":[22,7],"class_list":["post-439","post","type-post","status-publish","format-standard","hentry","category-general","tag-algorithms","tag-performance"],"_links":{"self":[{"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/posts\/439","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/comments?post=439"}],"version-history":[{"count":3,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/posts\/439\/revisions"}],"predecessor-version":[{"id":442,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/posts\/439\/revisions\/442"}],"wp:attachment":[{"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/media?parent=439"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/categories?post=439"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/tags?post=439"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}