Mem value sanitizing #633

ricab · 2019-02-22T18:45:02Z

Improves the handling of memory and disk size inputs. Means to address three issues, one of which is still pending.

gerboland · 2019-02-27T10:10:06Z

src/daemon/daemon.cpp

+    auto mem_size = std::get<0>(mem_disk_name_and_errors);      // use structured bindings instead in C++17
+    auto disk_space = std::get<1>(mem_disk_name_and_errors);    // use structured bindings instead in C++17
+    auto instance_name = std::get<2>(mem_disk_name_and_errors); // use structured bindings instead in C++17
+    auto option_errors = std::get<3>(mem_disk_name_and_errors); // use structured bindings instead in C++17


I dislike this because there's no clear connection between 0 and mem_size, 1 and disk_space, etc. Sure it works, but when we add an entry in the tuple, or swap entries with same type, it's easy to refer to the wrong index and create a bug, yet the code will still compile.

A quick struct will give names to these entries, establishing a stronger key/value connection. e.g. VirtualMachineDescription

Note, you could correctly point out that we do something very similar in the return statements of validate_create_arguments, to_machine_desc, etc - where we use brace initializers to populate a struct - as again there's no clear key/value connection. I'm not a giant fan of that style either, but at least it is localised to the creation of the struct, and not the consumer, so the person who changes it will (I think) be more aware of the consequences.

That is a good point and, when talking about public functions or functions with multiple uses, I generally agree. But in localized private functions that only exist to offload code from the caller, I am not so sure. There are pros and cons and I am not sure how to proceed in this occasion.

Currently (with tuple), the code will compile only if the types are compatible with how they are used by the consumer. This would be the same with struct.

I do agree it is more readable to have the vars named rather than indexed in large tuples. But it is less clear if structs proliferate for every useful return combination. (I wish we had named returns [and params for that matter]).

Indexes would go away with C++17 (i.e. auto [mem, disk, name, errors] = validate_create_arguments(...)).

It is still the ordering that determines what is what, whether the consumer of the returned expression is outside (current case) or within validate_create_arguments (if we returned a struct instead). In both cases, code outside of the function is affected by changes in the return.

But, as you say, in one case those changes are properly localized to the producer, while in the other it leaks to its consumer. This is the main argument for the struct approach and what makes it the better choice in public interfaces.

The cons of the tuple approach are minimized in this particular case, as the function is local to the compilation unit (anonymous namespace) and only separate from the caller in the interest of readability. This is still no reason to preclude a struct, but...

The goal of all of this code (most of launch) is already to initialize a (complete) VirtualMachineDescription. So I am not sure creating another intermediate description struct is appropriate.

How would I name this struct? VirtualMahineMemDiskNameAndErrors? VirtualMachineSomeOptions? VirtualMachinePartialDescription?

I suppose the right thing would be for VirtualMachineDescription's constructor to receive the request's parameters and derive itself from them. But that sounds like excessive refactoring for the current purposes...

Alternatives:

input parameters - I tend to dislike those because I like objects to be meaningful from construction (the other [the literal] meaning of RAII).

partially filling VirtualMachineDescription in validate_create_arguments - I wouldn't favor this either for the same reason

validate arguments separately (one function for memory, one for disk, etc.) - they would still need to return pairs (because of possible errors). Additionally, to maintain the same behavior as today, option_errors would have to be combined after all functions were called.

put all the code directly in launch...

So, @gerboland what do you think?

Taking your points

as I said, swapping entries of same type will still compile, but create bug. Public/private is irrelevant to that argument, and the greatest strength structs have.

I'm only taking this case, not every return. Named returns would be ideal, closest is a struct IMO.

same problem, just replace indexes with key names.

I'm not saying it's ideal, but least worst option IMO

not just public IMO

readability isn't my priority here, avoiding bugs is - especially those possibly introduced later by a different author.

it's a struct just to pass a list of values with labels. My objection is purely down to the lack of labels with your approach here.

I dunno, ValidatedVMOptions?

Sorry, my opinion hasn't changed. Note I'm not making a big deal here, it's just code after all :)

I'll let @townsend2010 have his say.

I do feel like the proposed approach is more prone to bugs accidentally introduced if some other maintainer or contributor down the road added another check here, like for example, validating the number of cores/CPU's as well. What I'm not a fan of, is that the consumer has to know the exact order of the elements that are returned from validate_create_arguments. With a struct, it only needs to know the struct's member name and the rest still are valid, even if a contributor puts a new member variable anywhere in the struct.

But as @gerboland said, it's just code and it works now, so I wouldn't block on this either:)

Thanks for the comments. No big deal here either! Just debating an interesting issue with implications in future coding practices :) I actually agree with you to a large extent, but I don't quite follow what you say in 1 and 6.

Suppose I used a struct Foo, with ctor Foo(mem, disk, name, errors). The function would say return Foo{a,b,c,d}; right? Aren't argument swapping bugs as likely? The way I see it, a change in the producer requires a change in the consumer in both cases. The compiler will do the same. What changes is where those responsibilities lie, which is why I mentioned public/private.

A struct-based approach encapsulates the consumer, preventing impact on the function's client(s). But if the function has a single client and is only there to offload code into the anonymous namespace, the benefit is small IMO. When considering also drawbacks (struct proliferation, a la carte types representing no clear concept), I place the ideal compromise a bit differently (barring named returns).

Anyway... I just saw @townsend2010's comment and will modify the code to use the struct.

I just committed changes with the ordering now encapsulated in a struct. Does that address the concerns?

src/utils/utils.cpp

include/multipass/utils.h

src/daemon/daemon.cpp

Replace variable initializations that were previously marked as candidates for structured bindings in C++17.

Use a corrected regular expression to validate memory size values. Accept lowercase suffixes (k, m, g). Improve and extend corresponding tests. Fixes #470.

Test a new memory normalization interface, to simultaneously validate and convert memory values to the smallest unit (bytes).

Implement verification and normalization of memory values in a single step. Use that in the daemon, replacing mere validation. Update hostname validation for consistency. Update tests accordingly. Fixes #616.

Provide a first step to obtain the number of bytes (to compare with other numbers) and a second step to express the same thing as a string that is suitable for backend memory/disk arguments.

Make tests expect error messages in error stream instead.

Makes failing tests pass. Fixes #588.

This attempts to address a review concern.

Use Qt's API rather than the standard library to convert a QString to long long, thus avoiding the intermediate std::string step. This addresses a review concern.

ricab · 2019-02-28T11:11:44Z

This currently uses C++17 features (fallthrough attribute and structured bindings), so rebased on that branch.

gerboland · 2019-02-28T13:43:41Z

src/daemon/daemon.cpp

+        std::string disk_space;
+        std::string instance_name;
+        mp::LaunchError option_errors;
+    } ret{mem_size, disk_space, instance_name, option_errors};


This doesn't address my concern. There is still no connection between this ordering of params and the order in the consumer. E.g. this will compile, but is wrong:

auto [disk_space, mem_size, instance_name, option_errors] = validate_create_arguments(request);

This is the root of my dislike of this syntactic sugar.

If the params were all different types, the type system would catch errors. So I'm ok with this syntax being used with care. But here, I think using a struct like this would give more safety. Here's my thinking:
https://pastebin.ubuntu.com/p/RsQ3sf6dZy/

My motivation is that this moves the responsibility for parameter order checking to solely the location the param list is created (i.e. in validate_create_arguments). Not where it is consumed.

With this struct approach, if someone swaps mem_size and disk_space, they only need to care about reordering those params in the creation of the struct. No other code needs to change, and no bugs created.

Sure, we could go all the way and insist on this at creation:

CheckedArguments args; args.mem_size = mem_size; args.disk_space = .... return args;

but I think that's unnecessary. I trust that if an author wants to change the CheckedArguments list order, they'll be sure to do the right thing. But I don't trust they'll update all the consumers correctly too.

Is it pretty? No. But IMO it's worth it for safety.

Just my $0.02, but that pastebin is what I envisioned too when I made my comments.

OK, I will do it that way then, but...

My motivation is that this moves the responsibility for parameter order checking to solely the location the param list is created (i.e. in validate_create_arguments). Not where it is consumed.

Well, the ordering of public attributes is part of a struct's interface, so it is not being contained. IOW, anyone changing the order would still have to check all clients (think also construction, assignment...). But I understand your concern is that they may forget so it is perhaps safer not to rely on that here.

Drop structured-bindings and access struct attributes by name. This addresses a review request.

ricab · 2019-02-28T18:02:20Z

OK take 3 :)

@gerboland did as you requested, keeping the struct inside the function as discussed.

Saviq · 2019-02-28T20:22:01Z

Moved back onto move-to-cpp17 as we had to revert it temporarily.

gerboland

Thank you for taking my comments on board. I'm happy with the approach.

I am getting a compile error on Bionic (with Qt5.9)

src/utils/utils.cpp:70:26: error: ‘const class QString’ has no member named ‘front’
             switch (unit.front().toLower().toLatin1())

Unfortunately the QString::front() method was added in Qt5.10. Alternative is at(0)

ricab · 2019-03-01T11:02:48Z

@gerboland thanks for noticing that!

I will have to move this because changing the base of this PR closed it and github does not allow reopening. New PR comming...

654: Mem value sanitizing - take 3 r=gerboland a=ricab Moved from #633 (after rebase and force-push). Co-authored-by: Ricardo Abreu <[email protected]>

[utils] Use a consistent namespace alias for utils

ricab force-pushed the mem-value-sanitizing branch from 5c50093 to d8026cf Compare February 26, 2019 20:59

ricab changed the title ~~WIP: Mem value sanitizing~~ Mem value sanitizing Feb 26, 2019

gerboland reviewed Feb 27, 2019

View reviewed changes

src/utils/utils.cpp Outdated Show resolved Hide resolved

Move to C++17

746f1dd

gerboland reviewed Feb 27, 2019

View reviewed changes

include/multipass/utils.h Show resolved Hide resolved

gerboland reviewed Feb 27, 2019

View reviewed changes

src/daemon/daemon.cpp Show resolved Hide resolved

Use structured bindings where pending

0857366

Replace variable initializations that were previously marked as candidates for structured bindings in C++17.

ricab added the no-merge label Feb 27, 2019

ricab changed the title ~~Mem value sanitizing~~ WIP: Mem value sanitizing Feb 27, 2019

ricab added 11 commits February 27, 2019 18:13

Fix memory value validation

a6497fc

Use a corrected regular expression to validate memory size values. Accept lowercase suffixes (k, m, g). Improve and extend corresponding tests. Fixes #470.

Implement tests for memory normalization.

db12d27

Test a new memory normalization interface, to simultaneously validate and convert memory values to the smallest unit (bytes).

Implement memory value normalization.

b03ef72

Implement verification and normalization of memory values in a single step. Use that in the daemon, replacing mere validation. Update hostname validation for consistency. Update tests accordingly. Fixes #616.

Test enforcement of lower memory and disk bounds

c689de7

Split memory value normalization into two steps

425e83e

Provide a first step to obtain the number of bytes (to compare with other numbers) and a second step to express the same thing as a string that is suitable for backend memory/disk arguments.

Factorize memory/disk size tests.

37a8194

Fix tests looking for errors in stdout

11d0bcb

Make tests expect error messages in error stream instead.

Require minimum sizes for memory and disk

16d8b5e

Makes failing tests pass. Fixes #588.

Format branch changes

0da0a04

Replace tuple with struct in helper return

96a15e1

This attempts to address a review concern.

Avoid superfluous std::string conversion

984d4c3

Use Qt's API rather than the standard library to convert a QString to long long, thus avoiding the intermediate std::string step. This addresses a review concern.

ricab force-pushed the mem-value-sanitizing branch from ccf86ef to 984d4c3 Compare February 28, 2019 11:06

ricab changed the base branch from master to move-to-cpp17 February 28, 2019 11:08

ricab changed the title ~~WIP: Mem value sanitizing~~ Mem value sanitizing Feb 28, 2019

ricab removed the no-merge label Feb 28, 2019

bors bot closed this Feb 28, 2019

Saviq changed the base branch from move-to-cpp17 to master February 28, 2019 13:31

Saviq reopened this Feb 28, 2019

gerboland reviewed Feb 28, 2019

View reviewed changes

Avoid depending on struct attribute ordering.

498b7b8

Drop structured-bindings and access struct attributes by name. This addresses a review request.

Saviq changed the base branch from master to move-to-cpp17 February 28, 2019 20:21

gerboland suggested changes Feb 28, 2019

View reviewed changes

Saviq mentioned this pull request Mar 1, 2019

Introduce Process, ProcessSpec and ProcessFactory #650

Merged

ricab force-pushed the move-to-cpp17 branch from 0857366 to 779d108 Compare March 1, 2019 10:49

ricab closed this Mar 1, 2019

ricab changed the base branch from move-to-cpp17 to move-to-cpp17-take2 March 1, 2019 10:58

ricab mentioned this pull request Mar 1, 2019

Mem value sanitizing - take 3 #654

Merged

bors bot added a commit that referenced this pull request Mar 8, 2019

Merge #654

46a7c97

654: Mem value sanitizing - take 3 r=gerboland a=ricab Moved from #633 (after rebase and force-push). Co-authored-by: Ricardo Abreu <[email protected]>

ricab pushed a commit that referenced this pull request May 8, 2025

Merge pull request #633 from canonical/use-consistent-ns-alias

60a2bd5

[utils] Use a consistent namespace alias for utils

Mem value sanitizing #633

Mem value sanitizing #633

Uh oh!

Conversation

ricab commented Feb 22, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ricab Feb 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ricab commented Feb 28, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ricab commented Feb 28, 2019

Uh oh!

Saviq commented Feb 28, 2019

Uh oh!

gerboland left a comment

Choose a reason for hiding this comment

Uh oh!

ricab commented Mar 1, 2019

Uh oh!

Uh oh!

ricab Feb 27, 2019 •

edited

Loading

ricab commented Feb 28, 2019 •

edited

Loading