-
Notifications
You must be signed in to change notification settings - Fork 277
Vsd value set implementation #5793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vsd value set implementation #5793
Conversation
ba445cf
to
1d6640f
Compare
Codecov Report
@@ Coverage Diff @@
## develop #5793 +/- ##
===========================================
+ Coverage 69.08% 72.68% +3.59%
===========================================
Files 1242 1475 +233
Lines 100842 156359 +55517
===========================================
+ Hits 69667 113647 +43980
- Misses 31175 42712 +11537
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
681e835
to
f500b97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the amazing test cases and many fixes. I would say this is 95% of the way there. Let's have a chat about the remaining 5%, I don't think it will be hard to fix up.
@@ -0,0 +1,12 @@ | |||
CORE | |||
main.c | |||
--variable-sensitivity --vsd-values intervals --show |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something that works but is not widely documented is having multiple .desc
files in the same directory. Thus rather than having a bunch of different directories for the ternary code, you can have intervals.desc
, constants.desc
etc.
main::1::b1 \(\) -> \[1, 1\] @ \[2\] | ||
main::1::b2 \(\) -> \[0, 0\] @ \[3\] | ||
main::1::b3 \(\) -> \[0, 1\] @ \[5\] | ||
main::1::i \(\) -> \[A, A\] @ \[7\] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realise this is probably not your doing but if the output is in hex, it should have a leading 0x
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It did take me a little while to realise that the interval output was hex. I would be delighted to change everything to decimal or everything to hex :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All decimal or all hex (or even, all both) are valid design choices and arguable. The current mix and lack of 0x
is definitely wrong. I would be very happy to have PRs that fix this. Again, out of scope for this PR.
^SIGNAL=0$ | ||
main::1::p .* value-set-begin: 2, 3 :value-set-end | ||
main::1::q .* value-set-begin: 5, 10 :value-set-end | ||
main::1::t .* value-set-begin: 7, 8, 12, 13 :value-set-end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is especially interesting as it shows that the abstraction is not relational; the link between p
and q
is not maintained after the merge at the end of the if
, so you wind up with false alarms like 12
.
regression/goto-analyzer/value-set-function-pointers-incremented-02/main.c
Show resolved
Hide resolved
src/analyses/variable-sensitivity/variable_sensitivity_configuration.cpp
Outdated
Show resolved
Hide resolved
@@ -41,9 +41,7 @@ vsd_configt vsd_configt::from_options(const optionst &options) | |||
|
|||
// This should always be on (for efficeny with 3-way merge) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ... not 100% sure this is actually true, perhaps we should re-visit this bit of the config.
const abstract_object_sett &new_values, | ||
const abstract_environmentt &environment) const | ||
{ | ||
auto result = resolve_values(new_values); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably need to have a chat about this function.
@jezhiggins Thanks for the changes. Let's have a call at some point to sort out the last bits. |
13b7e9f
to
73af97d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a partial review. I had a number of comments I wanted to feed back, before continuing to review this PR in its in entirety. It is good to see more functionality added and problems fixed though. So thank-you for your contribution.
I have one general comment - I have found this PR relatively hard going to review. Part of that is that I am currently learning about this area of the code base. But some of this has to do with the way this PR has been organised. So here are some suggestions which could make this easier to review.
- Firstly even if a lot of the lines changed are in the tests rather than the code, those tests still need to be eyeballed to check them over. So I think this could have been more digestible for reviewers if this was 2-4 smaller PRs rather than a single large PR. This can also benefit you as a PR author, because it gives you the opportunity to learn from the first PR and anticipate/fix issues with the follow up PRs before you submit them.
- All commits should have a title which states what is being changed and a commit message body which explains and justifies why that change is being made. If it is fairly straight forward then the why can be a single sentence. The why of a commit may be obvious to you as an author, but it can take time as a reviewer or as anyone looking back at the commit history to look at the changes to infer the why; if it hasn't been included. The PR template links to https://chris.beams.io/posts/git-commit/ which is a great guide to good commit messages if you haven't read it.
- Every commit should both compile and pass all tests when it is checked out. When regressions find their way into the code base then we would want to be able to use
git-bisect
to go back through the commit history to find the commit which introduced the regression, so that we can fix it more easily. Commits which don't pass the tests can add noise to this process and make it very difficult to track down the cause of a regression. This is the case even if the regression is entirely unrelated to the work in your PR. Commits which contain problems which are fixed later on in a PR also add cognitive load to the reviewer. When reviewing a PR in commit order a reviewer will likely comment on a problem when they spot it. Then they will need to make the connection to the later commit where the problem is fixed in order to better understand the PR as a whole. They may find that there initial comment was somewhat redundant. If instead, the commit which fixes a problem is squashed into the commit which adds it then the reviewer doesn't need to comment on that and they don't need to mentally connect the two commits together.
I have labelled some of my review comments with emojis. This is the key to what I mean by them -
- ℹ️ An FYI. Might be useful if you didn't already know about that piece of information.
- 🚫 A comment for where I do not intend to approve the PR, until this has been addressed.
- ⛏️ A nit pick. It would be good to get these addressed, if possible. However if you don't have time to address these or you disagree with them, then I am OK with getting this PR merged without any corresponding changes.
- 💡 Just an idea. Implement if you like the idea and you have the time.
@@ -29,12 +29,10 @@ abstract_object_pointert create_abstract_object( | |||
const namespacet &ns) | |||
{ | |||
if(top || bottom) | |||
return abstract_object_pointert( | |||
new abstract_object_classt { type, top, bottom }); | |||
return std::make_shared<abstract_object_classt>(type, top, bottom); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Nice clean up.
⛏️ Could you add a commit message body justifying the rational for the change? E.G. - "Because it is best practise to create shared pointers using make_shared
. See C++ Core Guidelines points C.151 and R.22".
@@ -1,4 +1,4 @@ | |||
int main(int argc, char argv[]) | |||
int main(int argc, char argv[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting fixes to code which has been added as part of this PR should ideally be merged into the commit where it is added. Only fixes to code which isn't changed/added by your PR really needs to be in separate commits. This fixing of formatting in these commits can be automated using - git filter-branch --tree-filter 'git-clang-format develop' -- develop..HEAD
. Assuming that your PR is already based on develop
.
@@ -321,23 +320,43 @@ abstract_object_pointert interval_abstract_valuet::expression_transform( | |||
if(num_operands == 0) | |||
return environment.abstract_object_factory(type, ns, true); | |||
|
|||
if(expr.id() == ID_plus) | |||
if(expr.id() == ID_if) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ℹ️ There is a set of function templates to provide an abstraction over checking/casting to different types of expression. For example this check could be written as if(can_cast_expr<if_exprt>(expr))
. You can also cast complete with validity checks by using expr_try_dynamic_cast<if_exprt>(expr)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - these aren't something I've encountered in the code I've worked in. I'll bear them in mind.
@@ -0,0 +1,14 @@ | |||
int unknown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ Could you make the commit title more descriptive? Something like Add more value-set behaviour tests
. It wasn't clear to me from reading the commit title only that this only added more tests rather than more functionality.
src/analyses/variable-sensitivity/value_set_abstract_object.cpp
Outdated
Show resolved
Hide resolved
@@ -148,8 +148,12 @@ abstract_object_pointert value_set_abstract_objectt::expression_transform( | |||
collective_operands, | |||
[&resulting_objects, this, &expr, &environment, &ns]( | |||
const std::vector<abstract_object_pointert> &ops) { | |||
auto operands_expr = exprt::operandst { }; | |||
for (auto v : ops) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ Can we use for(const auto &v : ops)
in order to make it clear that v
is not mutated and to remove the copying?
auto operands_expr = exprt::operandst { }; | ||
for (auto v : ops) | ||
operands_expr.push_back(v->to_constant()); | ||
auto rewritten_expr = exprt(expr.id(), expr.type(), std::move(operands_expr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ I suggest using const exprt rewritten_expr{expr.id(), expr.type(), std::move(operands_expr)};
, in order to avoid the unnecessary use of auto
.
src/analyses/variable-sensitivity/value_set_abstract_object.cpp
Outdated
Show resolved
Hide resolved
0732f1e
to
724bb13
Compare
@thomasspriggs Thanks for reviewing. Please forgive me if I am telling you things you already know but some of the social context for this PR:
|
@martin-cs To my knowledge, the only other user of VSD that we might have in common is no longer using it. We are no longer actively involved in working with VSD at the moment, other than as part of a general role in looking after the health of the CBMC project. |
@chrisr-diffblue Thanks for the update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jezhiggins and I have had a long discussion about this PR. I am happy for it to be merged as he says he will address the following issues in future PRs:
- Making the limit on the number of elements configurable via the factory.
- Consolidating the "what to do when we reach the maximum number of elements" code from merge and eval and making sure that this works for all contained types (this includes not automatically converting to intervals).
- Try to avoiding wrapping and unwrapping context and last-modified-information.
- Looking at how value sets of compound objects (arrays, structs, etc.) and intervals work.
I'll raises these internally on one of our issues.
@thomasspriggs are you happy for this to be merged on this basis (along with the fixes that @jezhiggins has already made)?
I'd really like the issues with commits that don't pass tests to be resolved first. Some of the non-compiling commits in the previous work which was merged in that huge PR have already caused us problems when bisecting the history. I'm OK with the non-compiling commits in that PR of old work, because its better than leaving to rot forever. However I think we should be trying to do better for new work. Especially when all that is required is squashing a fix commit in to the correct preceding commit. I'm perfectly fine with the rest being at the discretion of those who know this part of the code base best. |
Thank you for the wider context. I am interested in broadening my knowledge of the code base, as there are a limited number of parts that I know well at the moment. I was aware of the large PR of historical work being merged, even if I don't really know its complete history. It is good to see more work being done to bring it up to a higher standard. I heard that the value set work could potentially be useful for improving the |
Reduce number of special cases, and generalise expression evaluation. Special case for ID_plus no longer needed, as a+b+c is still evaluated correctly. Previously incorrectly evaluated expressions such as a*b*c now also evaluated properly.
Without this all operations return top, which is unhelpful. With this change, simple arithmetic starts to work, unary operations, etc. Still more to for multi-value sets. Note - change in behaviour for function pointers. I believe previous behaviour was an error.
Covering unary operations and addition with multi-valued sets
expression_transform correctly creates all combinations of operands, but passed the unmodified expression down for further evaluation. If the expression was something like 2 + p then at the next stage down, we'd look up symbol 'p' and be handed back a value set. The evaluated would then try to convert that value set to a constant which would, naturally, fail. Consequently, the expression would evaluate to top. Now, we rewrite the expression in terms of the evaluated operands. If expression was 2 + p, and p was the set { 2, 3 } the expression is rewritten first as 2 + 2, then as 2 + 3, which we can gather up into our result of { 4, 5 }. The various regression tests in this commit exercise various combinations of constants and value-sets, value-sets of different size, and so on. All the tests use addition, but the operator is immaterial.
Splendid side-effect of the expression_transform work :)
Do this by ensuring new object we return are created though the factory, rather than instantiated directly. Several regression tests needed updating as a consequence of this change, generally tweaking the regex to allow the context info in place of a hard line end.
Reduce duplication, separated object create from the context object that may wrap it.
I'm not super thrilled about how I've done this, because ideally all object creation should go through the object factory. However, in this particular case a large value set could be converted to an interval. If we hand off to the factory it'll just create another value-set, so we need to explicitly create the interval. However, when we do that we don't have the context wrapper (write location or data dependency). Consequently, I've added variable_sensitivity_object_factoryt::wrap_with_context and exposed it through abstract_environment so we can add the appropriate context wrapper should we need it.
Tidy up evaluate_conditional and pull out evaluate_unary_expr
…factory Default arguments on a virtual function aren't safe. Even through this function isn't overriden anywhere now, it's a latent bug waiting for the future.
Avoid shadowing member variable, make single-arg constructor explicit
value-set ordering is unstable in the face of code changes, which made tests fragile. Sorting output fixes that and is also easier on the human eye too. We're sorting string representations, so the output isn't a true numerical sort but it's good enough.
value-sets and the data-dependency configuration can be safely combined
@thomasspriggs At the moment the only in-tree use of VSD is in Function pointers are a separate but related question. At the moment function pointer removal is a very syntactic process. For a number of years I have wanted (and tried) to make it more semantic. Doing so would mean we could use VSD for function pointer removal and then, yes, it would reduce the number of alternatives which could have a significant performance improvement on some code. |
724bb13
to
f363e38
Compare
I understand 👍 I've gone through the PR, fixing up those problematic commits. It should be good to merge now. |
The focus of this PR is filling out value-set behaviour. Up to now you could assign to a value-set, but any kind of operation would result in the value-set going to top. This PR fills out value-set behaviour for unary & n-ary operations, and the ternary operator too because that's a bit of an edge case.
This PR also adds array indexing operations with value-sets, and makes value-sets work with context tracking. Large value-sets are correctly converted to intervals, previously they just went top.
Finally, the value-set output is now sorted. value-set intervan ordering is unstable in the face of code changes, which made tests fragile. Sorting output fixes that and is also easier on the human eye.
In addition, there's are some refactoring work in interval_value_objectt, mainly to clarify the implementation so I could draw parallels with value-set.
This PR looks larger that it really is. Of the 70 files listed, 55 of them are part of the regression tests and all but a handful are new tests covering value-set behaviour. The rest are renamed to better express what they test, or have minor fixes now value-set behaviour has been filled out. The substantial changes are all in value_set_abstract_object.cpp