Apples vs oranges

In some of my recent reading I’ve been frustrated at authors making poor comparisons. The typical behaviour is to showcase an example of good coding with their preferred language / methodology against an example of bad coding with the competitor. It’s not, say, the language which is making the difference. It’s whether the code has been well written or not.

Rust’s enums

I’m going to pick on Programming Rust by O’Reilly as it’s been the most recent. Rust allows an enum to hold a primary value and additional data relating to that value. The book shows the start of a JSON processor implementation:

enum Json {
    Null,
    Boolean(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Box<HashMap<String, Json>>),
}

So if you have a Number then there is an associated floating-point value, and if you have an Array there is an associated vector of Json values. There is simple control flow available to use this to access the right data at the right time. Coming from a C++ background, this seems surprising but I can see the benefits.

Next the book shows some C++ code to do the same thing:

class JSON {
private:
    enum Tag {
        Null, Boolean, Number, String, Array, Object
    };
    union Data {
        bool boolean;
        double number;
        shared_ptr<string> str;
        shared_ptr<vector<JSON>> array;
        shared_ptr<unordered_map<string, JSON>> object;
        
        Data() {}
        ~Data() {}
        ...
    };
    
    Tag tag;
    Data data;

public:
    bool is_null() const { return tag == Null; }
    bool is_boolean() const { return tag == Boolean; }
    bool get_boolean() const {
        assert(is_boolean());
        return data.boolean;
    }
    void set_boolean(bool value) {
        this->~JSON(); // clean up string/array/object value
        tag = Boolean;
        data.boolean = value;
    }
    ...
};

That’s bigger and is taking more code to do the same sort of thing.

Is that really what someone writing good C++ would do? I would have been tempted to expose the enum Tag type and a getter for it. That would allow a switch statement to choose the right control path without having to bother with individual methods to check each tag type. The std::shared_ptr wrappers are probably unnecessary. I’m not sure how much space std::string, std::vector and std::unordered_map take up, but most of their data will be stored elsewhere. The wrappers might have been added to give uniform destructor behaviour.
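As a rough sketch of the exposed-tag idea, here is a cut-down class with a public Tag and a tag() getter. Everything in it is an assumption made for illustration: it only covers null, boolean, number and string, it stores the payloads as plain members rather than a union to stay short, and describe() is a hypothetical caller rather than anything from the book.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified JSON value. Payloads are separate members here
// (not a union) purely to keep the sketch small.
class JSON {
public:
    enum class Tag { Null, Boolean, Number, String };

    JSON() = default;
    explicit JSON(bool value) : tag_(Tag::Boolean), boolean_(value) {}
    explicit JSON(double value) : tag_(Tag::Number), number_(value) {}
    explicit JSON(std::string value) : tag_(Tag::String), string_(std::move(value)) {}
    // Without this overload a string literal would convert to bool.
    explicit JSON(const char* value) : JSON(std::string(value)) {}

    // The single getter that replaces is_null(), is_boolean(), ...
    Tag tag() const { return tag_; }

    bool get_boolean() const { assert(tag_ == Tag::Boolean); return boolean_; }
    double get_number() const { assert(tag_ == Tag::Number); return number_; }
    const std::string& get_string() const { assert(tag_ == Tag::String); return string_; }

private:
    Tag tag_ = Tag::Null;
    bool boolean_ = false;
    double number_ = 0.0;
    std::string string_;
};

// One switch picks the right path for every tag.
std::string describe(const JSON& json) {
    switch (json.tag()) {
        case JSON::Tag::Null:    return "null";
        case JSON::Tag::Boolean: return json.get_boolean() ? "true" : "false";
        case JSON::Tag::Number:  return std::to_string(json.get_number());
        case JSON::Tag::String:  return '"' + json.get_string() + '"';
    }
    return "unknown";  // not reached while the switch covers every tag
}
```

Most compilers can warn when a switch over an enum misses a case, which gets some way towards the exhaustiveness checking that Rust's match provides.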

All that seems insignificant when you consider that C++17 introduced std::variant. The code could have looked something like this:

struct JSON {
    using Data = std::variant<
        bool,
        double,
        std::string,
        std::vector<JSON>,
        std::unordered_map<std::string, JSON>
    >;

    Data data;
};

You can access the variant by type, e.g. std::get<std::string>(json.data), or by index, e.g. std::get<2>(json.data). There is no enum by default, but you could add one and some helper methods to make things easier. Or, much like Rust, other flow control options are available:

    std::visit(overloaded{
        [](bool boolean) { std::cout << boolean << std::endl; },
        [](double number) { std::cout << number << std::endl; },
        [](const std::string& string) { std::cout << string << std::endl; },
        [](const std::vector<JSON>& array) {
            for (const auto& json : array) {
                output(json.data);
            }
        },
        [](const std::unordered_map<std::string, JSON>& object) {
            for (const auto& [key, json] : object) {
                output(json.data);
            }
        },
    }, data);

This isn’t as elegant as Rust can manage but it is a much closer match than the book presented.

The first edition of this book was written in 2018 and the second edition in 2021. std::variant was introduced before either of these was published. Maybe it hadn’t been noticed when the first edition was written and the sample was simply reproduced for the second. The writers may well not be as up to date with C++ as they are with Rust. Maybe they did know and just wanted to present a stark contrast between the two languages. Was that necessary? Rust would still manage to do the same thing with less code.

If I spot something like this it just makes me mistrust the rest of the book. I’m going to take everything else that’s said with a pinch of salt, even things where the author does have a good point.

On balance

This could be taken as a lesson to avoid putting too much trust in what you read. Just because someone says one thing is good and one thing is bad doesn’t make it so.

However, I think this is about anyone, ourselves included, comparing apples with oranges:

If you’ve got an old and a new rendering algorithm for a game, should they both be tested on the same scene? Is it an old or a new test scene? Is there going to be an advantage for one algorithm or the other? Should they each get a scene custom built according to their advantages and disadvantages?
If you switch to a new language and your bug count goes up what does that mean? Maybe the new language has an inherent problem. Maybe you just aren’t as familiar with the language. Maybe it’s easier to do testing so the bugs are being found faster but still fixed at the same rate.

If you have to compare apples and oranges because that’s all there is, then try to be aware of it.
