{"id":402,"date":"2024-10-22T17:49:39","date_gmt":"2024-10-22T17:49:39","guid":{"rendered":"https:\/\/permutationcity.co.uk\/bp\/?p=402"},"modified":"2024-10-23T15:02:58","modified_gmt":"2024-10-23T15:02:58","slug":"reorganising-data","status":"publish","type":"post","link":"https:\/\/permutationcity.co.uk\/bp\/2024\/10\/22\/reorganising-data\/","title":{"rendered":"Reorganising data"},"content":{"rendered":"\n<p>I&#8217;ve talked about\n<a href=\"https:\/\/permutationcity.co.uk\/bp\/2024\/04\/09\/data-oriented-design\/\">data-oriented design<\/a>\nbefore.\nIt looks at structuring things to remove unnecessary data,\nkeep cache refreshes down and thereby get faster code.\nCompilers are pretty good nowadays at optimising code.\nThat is they can re-order code, inline functions and unroll loops but\nthey&#8217;re not going to restructure your data for you.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"packing-booleans\">Packing booleans<\/h2>\n\n\n\n<p>Here&#8217;s a contrived structure:<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;clike&quot;,&quot;mime&quot;:&quot;text\/x-c++src&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;C++&quot;,&quot;language&quot;:&quot;C++&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;cpp&quot;}\">struct Test {\n    bool B0;\n    bool B1;\n    ...\n};<\/pre><\/div>\n\n\n\n<p>I had a look and my compiler laid the data out at one byte per boolean.\nGiven that simple layout I&#8217;m going to use\n<a href=\"https:\/\/en.cppreference.com\/w\/cpp\/container\/array\"><code>std::array<\/code><\/a>\nwhich will do the same thing but is easier to test.\nI then compared it to\n<a href=\"https:\/\/en.cppreference.com\/w\/cpp\/utility\/bitset\"><code>std::bitset<\/code><\/a>\nwhich uses one bit per boolean,\nplus a bit of wasted space at the end.<\/p>\n\n\n\n<p>I made a little test to repeatedly scan through data:<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;clike&quot;,&quot;mime&quot;:&quot;text\/x-c++src&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;C++&quot;,&quot;language&quot;:&quot;C++&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;cpp&quot;}\">    bool accumulator = false;\n    for (uint32_t i = 0; i &lt; s_Count; i++) {\n      for (uint32_t j = 0; j &lt; s_Size; j++) {\n        accumulator ^= values[j];\n      }\n    }<\/pre><\/div>\n\n\n\n<p>With the array it will have to access a byte which is easy to do but takes up more space.\nWith the bitset it will have to access a bit which is harder to do but takes up less space.\nWhich is faster?\nNeither, they <a href=\"https:\/\/quick-bench.com\/q\/_O_sl7hAeCwGrNX6AS_Xyu9W8f4\">neck and neck<\/a>.<\/p>\n\n\n\n<p>However if you look at the assembly for the array code it&#8217;s huge.\nIt&#8217;s done a lot of unrolling.\nThe bitset version takes less data, less code and is just as fast.\nGiven that should we always be using bitsets?\nNot really.\nIt&#8217;s the same speed not faster.\nMore importantly it makes the code more complicated.\nYou wouldn&#8217;t be able to refer to a simple boolean any more.\nYou&#8217;d have to access it via an index into the bitset.\nWe also tend to put associated variables together rather than cluster them according to data requirements.\nDoing so is going to be harder to read, write and understand.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"optimising-compiler\">Optimising compiler<\/h2>\n\n\n\n<p>Could we keep the code looking the same and\nget the compiler to do the work?\nIn theory it should be possible for the compiler to order the data in a structure however it likes.\nHowever, there&#8217;s certainly some C++ code out there that makes assumptions about how data is laid out.\nFiles being read in and pointers to bytes being cast directly to structs.\nOther languages with stricter rules are less likely to have that problem.<\/p>\n\n\n\n<p>That could involve packing booleans together into a single byte but\nit could also place a couple of shorts next to each other give a cumulative 4 byte alignment.<\/p>\n\n\n\n<p>You could tell the compiler:<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;clike&quot;,&quot;mime&quot;:&quot;text\/x-c++src&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;C-ish&quot;,&quot;language&quot;:&quot;C++&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;cpp&quot;}\">unordered struct Test {\n    bool Alpha;\n    char Beta;\n    bool Gamma;\n    short Epsilon;\n    bool Zeta;\n    int Eta;\n    bool Theta;\n};<\/pre><\/div>\n\n\n\n<p>And it would lay it out thus:<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;clike&quot;,&quot;mime&quot;:&quot;text\/x-c++src&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;text&lt;br&gt;&quot;,&quot;language&quot;:&quot;C++&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;cpp&quot;}\">    Padding : 63..60\n      Alpha : 59\n      Gamma : 58\n       Zeta : 57\n      Theta : 56\n       Beta : 55..48\n    Epsilon : 47..32\n        Eta : 31..0<\/pre><\/div>\n\n\n\n<p>Or you could leave it in the hands of the programmer but\nseparate the declaration of the data from the layout of the data.\nPerhaps you could choose to structure it differently in different parts of the program.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"in-the-end\">In the end<\/h2>\n\n\n\n<p>I didn&#8217;t know what to expect from the speed tests. That they <em>so<\/em> similar in speed is surprising. As before I think it&#8217;s best to focus on the code in general first. Make it easy to read, write and change. Think about algorithms and data structures because they are likely to have the biggest effects. If you <em>need<\/em> to get performance don&#8217;t make assumptions. Use a profiler, do tests, and find out what&#8217;s fastest.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">P.S.<\/h2>\n\n\n\n<p>In my initial timing tests I used a really big array to fill up the processor cache and<br>see if the bitset would be faster. What about a smaller test? With 128 booleans, <a href=\"https:\/\/quick-bench.com\/q\/_KJnmXOy6YzjK3FWw0Qu-cNDrgE\">they are still neck and neck<\/a>. With 64 booleans, <a href=\"https:\/\/quick-bench.com\/q\/kucy07DJ970CwI0ueTr4Eis3sAo\">the bitset pulls ahead<\/a>. With 32 booleans, <a href=\"https:\/\/quick-bench.com\/q\/N4RRSy-awoeWaCRi27Jaf20JuxU\">the bitset is 2000 times faster<\/a>! With 16 booleans, <a href=\"https:\/\/quick-bench.com\/q\/8Ub2fwDr1x57qNz20KV2zcwaj7g\">they are neck and neck again<\/a>. So, again, use a profiler and make <em>meaningful<\/em> tests because there are all sorts of surprises out there.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve talked about data-oriented design before. It looks at structuring things to remove unnecessary data, keep cache refreshes down and thereby get faster code. Compilers are pretty good nowadays at optimising code. That is they can re-order code, inline functions and unroll loops but they&#8217;re not going to restructure your data for you. Packing booleans [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_import_markdown_pro_load_document_selector":0,"_import_markdown_pro_submit_text_textarea":"","footnotes":""},"categories":[33],"tags":[7],"class_list":["post-402","post","type-post","status-publish","format-standard","hentry","category-overview","tag-performance"],"_links":{"self":[{"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/posts\/402","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/comments?post=402"}],"version-history":[{"count":4,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/posts\/402\/revisions"}],"predecessor-version":[{"id":406,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/posts\/402\/revisions\/406"}],"wp:attachment":[{"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/media?parent=402"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/categories?post=402"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/permutationcity.co.uk\/bp\/wp-json\/wp\/v2\/tags?post=402"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}