Vsd value set implementation #5793


Merged
30 commits merged into diffblue:develop from vsd-value-set-implementation on Feb 13, 2021

Conversation

@jezhiggins (Contributor) commented Jan 30, 2021

The focus of this PR is filling out value-set behaviour. Up to now you could assign to a value-set, but any other operation would send the value-set to top. This PR fills out value-set behaviour for unary and n-ary operations, and for the ternary operator too, because that's a bit of an edge case.

This PR also adds array indexing operations with value-sets, and makes value-sets work with context tracking. Large value-sets are now correctly converted to intervals; previously they just went to top.

Finally, the value-set output is now sorted. Value-set internal ordering is unstable in the face of code changes, which made tests fragile. Sorting the output fixes that and is easier on the human eye too.

In addition, there is some refactoring work in interval_value_objectt, mainly to clarify the implementation so I could draw parallels with the value-set code.

This PR looks larger than it really is. Of the 70 files listed, 55 of them are part of the regression tests, and all but a handful are new tests covering value-set behaviour. The rest are renamed to better express what they test, or have minor fixes now that value-set behaviour has been filled out. The substantial changes are all in value_set_abstract_object.cpp.
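To make the new behaviour concrete, here is a minimal sketch (not one of the PR's actual regression tests, and the concrete values are illustrative) of the kind of program that now gets precise value-sets instead of top; the unknown() declaration mirrors the style used in the tests further down.

int unknown(); // uninterpreted input, as in the regression tests

int main(void)
{
  int p = unknown() ? 2 : 3;    // ternary operator: p is the value-set {2, 3}
  int q = p + 5;                // n-ary operation: q becomes {7, 8} rather than top
  int r = -p;                   // unary operation: r becomes {-3, -2}
  int a[] = {10, 20};
  int x = a[unknown() ? 0 : 1]; // array indexing via a value-set index: x becomes {10, 20}
  return 0;
}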

@jezhiggins force-pushed the vsd-value-set-implementation branch from ba445cf to 1d6640f on January 30, 2021 10:24
@codecov bot commented Jan 30, 2021

Codecov Report

Merging #5793 (f363e38) into develop (23c11e8) will increase coverage by 3.59%.
The diff coverage is 95.04%.


@@             Coverage Diff             @@
##           develop    #5793      +/-   ##
===========================================
+ Coverage    69.08%   72.68%   +3.59%     
===========================================
  Files         1242     1475     +233     
  Lines       100842   156359   +55517     
===========================================
+ Hits         69667   113647   +43980     
- Misses       31175    42712   +11537     
Flag Coverage Δ
cproversmt2 43.61% <ø> (?)
regression 66.82% <92.77%> (+0.02%) ⬆️
unit 32.27% <36.14%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...ses/variable-sensitivity/interval_abstract_value.h 100.00% <ø> (ø)
...s/variable-sensitivity/value_set_abstract_object.h 100.00% <ø> (ø)
...-sensitivity/variable_sensitivity_object_factory.h 100.00% <ø> (ø)
...riable-sensitivity/full_struct_abstract_object.cpp 81.81% <33.33%> (-0.09%) ⬇️
.../analyses/variable-sensitivity/abstract_object.cpp 69.15% <50.00%> (-4.80%) ⬇️
...yses/variable-sensitivity/abstract_environment.cpp 87.42% <87.50%> (+2.02%) ⬆️
...s/variable-sensitivity/interval_abstract_value.cpp 78.86% <94.44%> (-0.79%) ⬇️
...variable-sensitivity/value_set_abstract_object.cpp 87.06% <97.14%> (+15.64%) ⬆️
...alyses/variable-sensitivity/abstract_environment.h 100.00% <100.00%> (ø)
...s/variable-sensitivity/constant_abstract_value.cpp 94.38% <100.00%> (+2.38%) ⬆️
... and 1282 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@martin-cs (Collaborator) left a comment

Thanks for the amazing test cases and many fixes. I would say this is 95% of the way there. Let's have a chat about the remaining 5%; I don't think it will be hard to fix up.

@@ -0,0 +1,12 @@
CORE
main.c
--variable-sensitivity --vsd-values intervals --show
Collaborator:

Something that works but is not widely documented is having multiple .desc files in the same directory. Thus rather than having a bunch of different directories for the ternary code, you can have intervals.desc, constants.desc etc.

main::1::b1 \(\) -> \[1, 1\] @ \[2\]
main::1::b2 \(\) -> \[0, 0\] @ \[3\]
main::1::b3 \(\) -> \[0, 1\] @ \[5\]
main::1::i \(\) -> \[A, A\] @ \[7\]
Collaborator:

I realise this is probably not your doing but if the output is in hex, it should have a leading 0x.

Contributor Author (@jezhiggins):

It did take me a little while to realise that the interval output was hex. I would be delighted to change everything to decimal or everything to hex :)

Collaborator:

All decimal or all hex (or even, all both) are valid design choices and arguable. The current mix and lack of 0x is definitely wrong. I would be very happy to have PRs that fix this. Again, out of scope for this PR.

^SIGNAL=0$
main::1::p .* value-set-begin: 2, 3 :value-set-end
main::1::q .* value-set-begin: 5, 10 :value-set-end
main::1::t .* value-set-begin: 7, 8, 12, 13 :value-set-end
Collaborator:

This line is especially interesting as it shows that the abstraction is not relational; the link between p and q is not maintained after the merge at the end of the if, so you wind up with false alarms like 12.
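For reference, here is a sketch of the shape of test that produces those expectations (the actual main.c isn't shown in this thread, and unknown() is a hypothetical stand-in for a nondeterministic input). After the merge at the end of the if, the domain only remembers p in {2, 3} and q in {5, 10} independently, so p + q yields all four combinations.

int unknown(); // stands in for a nondeterministic input

int main(void)
{
  int p, q;
  if(unknown())
  {
    p = 2;
    q = 5;
  }
  else
  {
    p = 3;
    q = 10;
  }
  // Feasible sums are 7 (2+5) and 13 (3+10); the non-relational merge also
  // produces the spurious values 8 (3+5) and 12 (2+10).
  int t = p + q;
  return t;
}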

@@ -41,9 +41,7 @@ vsd_configt vsd_configt::from_options(const optionst &options)

// This should always be on (for efficeny with 3-way merge)
Collaborator:

I'm ... not 100% sure this is actually true, perhaps we should re-visit this bit of the config.

const abstract_object_sett &new_values,
const abstract_environmentt &environment) const
{
auto result = resolve_values(new_values);
Collaborator:

I think we probably need to have a chat about this function.

@martin-cs (Collaborator) commented:
@jezhiggins Thanks for the changes. Let's have a call at some point to sort out the last bits.

jezhiggins added a commit to jezhiggins/cbmc that referenced this pull request Feb 3, 2021
jezhiggins added a commit to jezhiggins/cbmc that referenced this pull request Feb 6, 2021
@jezhiggins force-pushed the vsd-value-set-implementation branch from 13b7e9f to 73af97d on February 6, 2021 15:30
@thomasspriggs (Contributor) left a comment

This is a partial review. I had a number of comments I wanted to feed back before continuing to review this PR in its entirety. It is good to see more functionality added and problems fixed, though. So thank you for your contribution.

I have one general comment: I have found this PR relatively hard going to review. Part of that is that I am currently learning about this area of the code base, but some of it has to do with the way this PR has been organised. So here are some suggestions which could make this easier to review.

  1. Firstly, even if a lot of the lines changed are in the tests rather than the code, those tests still need to be eyeballed to check them over. So I think this could have been more digestible for reviewers as 2-4 smaller PRs rather than a single large PR. That can also benefit you as a PR author, because it gives you the opportunity to learn from the first PR and anticipate or fix issues with the follow-up PRs before you submit them.
  2. All commits should have a title which states what is being changed and a commit message body which explains and justifies why that change is being made. If it is fairly straightforward then the why can be a single sentence. The why of a commit may be obvious to you as an author, but it can take time for a reviewer, or anyone looking back at the commit history, to infer the why from the changes if it hasn't been included. The PR template links to https://chris.beams.io/posts/git-commit/ which is a great guide to good commit messages if you haven't read it.
  3. Every commit should both compile and pass all tests when it is checked out. When regressions find their way into the code base, we want to be able to use git-bisect to go back through the commit history and find the commit which introduced the regression, so that we can fix it more easily. Commits which don't pass the tests add noise to this process and make it very difficult to track down the cause of a regression, even if the regression is entirely unrelated to the work in your PR. Commits which contain problems that are fixed later in the PR also add cognitive load for the reviewer. When reviewing a PR in commit order, a reviewer will likely comment on a problem when they spot it; they then need to connect it to the later commit where the problem is fixed in order to understand the PR as a whole, and may find their initial comment was somewhat redundant. If instead the commit which fixes a problem is squashed into the commit which adds it, the reviewer doesn't need to comment on it and doesn't need to mentally connect the two commits.

I have labelled some of my review comments with emojis. This is the key to what I mean by them -

  • ℹ️ An FYI. Might be useful if you didn't already know about that piece of information.
  • 🚫 A comment for where I do not intend to approve the PR, until this has been addressed.
  • ⛏️ A nit pick. It would be good to get these addressed, if possible. However if you don't have time to address these or you disagree with them, then I am OK with getting this PR merged without any corresponding changes.
  • 💡 Just an idea. Implement if you like the idea and you have the time.

@@ -29,12 +29,10 @@ abstract_object_pointert create_abstract_object(
const namespacet &ns)
{
  if(top || bottom)
-    return abstract_object_pointert(
-      new abstract_object_classt { type, top, bottom });
+    return std::make_shared<abstract_object_classt>(type, top, bottom);
Contributor:

👍 Nice clean-up.
⛏️ Could you add a commit message body justifying the rationale for the change? E.g. "Because it is best practice to create shared pointers using make_shared. See C++ Core Guidelines points C.151 and R.22".

@@ -1,4 +1,4 @@
int main(int argc, char argv[])
int main(int argc, char argv[])
Contributor:

Formatting fixes to code which has been added as part of this PR should ideally be merged into the commit where it is added. Only fixes to code which isn't changed/added by your PR really need to be in separate commits. Fixing the formatting in these commits can be automated using git filter-branch --tree-filter 'git-clang-format develop' -- develop..HEAD, assuming that your PR is already based on develop.

@@ -321,23 +320,43 @@ abstract_object_pointert interval_abstract_valuet::expression_transform(
if(num_operands == 0)
return environment.abstract_object_factory(type, ns, true);

-if(expr.id() == ID_plus)
+if(expr.id() == ID_if)
Contributor:

ℹ️ There is a set of function templates to provide an abstraction over checking/casting to different types of expression. For example this check could be written as if(can_cast_expr<if_exprt>(expr)). You can also cast complete with validity checks by using expr_try_dynamic_cast<if_exprt>(expr).
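For illustration only (this is not the PR's code; is_conditional and condition_of are hypothetical helpers, and the exact signatures live in util/expr_cast.h), the suggested helpers read roughly like this:

#include <util/expr_cast.h>
#include <util/std_expr.h>

// Check only, as suggested above:
static bool is_conditional(const exprt &expr)
{
  return can_cast_expr<if_exprt>(expr);
}

// Check and cast in one step; expr_try_dynamic_cast yields a typed pointer,
// or a null value when expr is not an if_exprt.
static const exprt *condition_of(const exprt &expr)
{
  if(const auto if_expr = expr_try_dynamic_cast<if_exprt>(expr))
    return &if_expr->cond();
  return nullptr;
}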

Contributor Author (@jezhiggins):

Thanks - these aren't something I've encountered in the code I've worked in. I'll bear them in mind.

@@ -0,0 +1,14 @@
int unknown();
Contributor:

⛏️ Could you make the commit title more descriptive? Something like Add more value-set behaviour tests. It wasn't clear to me from reading the commit title alone that this only added more tests rather than more functionality.

@@ -148,8 +148,12 @@ abstract_object_pointert value_set_abstract_objectt::expression_transform(
collective_operands,
[&resulting_objects, this, &expr, &environment, &ns](
const std::vector<abstract_object_pointert> &ops) {
auto operands_expr = exprt::operandst { };
for (auto v : ops)
Contributor:

⛏️ Can we use for(const auto &v : ops) in order to make it clear that v is not mutated and to remove the copying?

auto operands_expr = exprt::operandst { };
for (auto v : ops)
operands_expr.push_back(v->to_constant());
auto rewritten_expr = exprt(expr.id(), expr.type(), std::move(operands_expr));
Contributor:

⛏️ I suggest using const exprt rewritten_expr{expr.id(), expr.type(), std::move(operands_expr)};, in order to avoid the unnecessary use of auto.
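Putting the two suggestions together, the reviewed lines would read roughly as follows (a sketch of the reviewed snippet, not the final committed code):

exprt::operandst operands_expr;
for(const auto &v : ops) // const reference: no copies, and v is clearly not mutated
  operands_expr.push_back(v->to_constant());
const exprt rewritten_expr{expr.id(), expr.type(), std::move(operands_expr)};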

@jezhiggins force-pushed the vsd-value-set-implementation branch from 0732f1e to 724bb13 on February 12, 2021 11:26
@martin-cs (Collaborator) commented:
@thomasspriggs Thanks for reviewing. Please forgive me if I am telling you things you already know, but here is some of the social context for this PR:

  • The variable sensitivity domain (VSD) has quite a long and complex history with many different developers involved. Along that path the design and design intent were not always communicated as well as they could have been. As a consequence there are some parts that need some re-architecting. Jez and I are working on doing that, and this is one step towards it.
  • After several years being developed out of tree, the bulk of VSD was merged in Variable sensitivity domain #5472 following a substantial team effort. One thing we discussed informally, which I don't know made it to the ticket, was that it was better to merge it partially sorted and fix up on tree rather than continue to track develop until it was perfect.
  • As a consequence of these two, when I have been reviewing I have not been as strict about following all of the rules for PRs; as long as it passes all of CI and is a net improvement then it could be mergeable. There is still a lot of tidy-up to do and some of it is overlapping, so I am accepting that not everything will get done on the first pass.
  • As far as I know, my team / project are the only users of this code. I know at one point Diffblue had another user of this but despite my best attempts I haven't been able to find out who is currently responsible for that. If you know I would love to be able to co-ordinate.
  • If you have time / energy / use cases for this or any of the rest of goto-analyzer then it is most appreciated and I'm happy to include you / anyone else in The Plan.

@chrisr-diffblue (Contributor) commented:
@martin-cs To my knowledge, the only other user of VSD that we might have in common is no longer using it. We are no longer actively involved in working with VSD at the moment, other than as part of a general role in looking after the health of the CBMC project.

@martin-cs (Collaborator) commented:
@chrisr-diffblue Thanks for the update.

@martin-cs (Collaborator) left a comment

@jezhiggins and I have had a long discussion about this PR. I am happy for it to be merged as he says he will address the following issues in future PRs:

  1. Making the limit on the number of elements configurable via the factory.
  2. Consolidating the "what to do when we reach the maximum number of elements" code from merge and eval and making sure that this works for all contained types (this includes not automatically converting to intervals).
  3. Try to avoid wrapping and unwrapping context and last-modified information.
  4. Looking at how value sets of compound objects (arrays, structs, etc.) and intervals work.

I'll raise these internally on one of our issues.

@thomasspriggs are you happy for this to be merged on this basis (along with the fixes that @jezhiggins has already made)?

@thomasspriggs (Contributor) commented:
> @jezhiggins and I have had a long discussion about this PR. I am happy for it to be merged as he says he will address the following issues in future PRs:
>
>   1. Making the limit on the number of elements configurable via the factory.
>   2. Consolidating the "what to do when we reach the maximum number of elements" code from merge and eval and making sure that this works for all contained types (this includes not automatically converting to intervals).
>   3. Try to avoid wrapping and unwrapping context and last-modified information.
>   4. Looking at how value sets of compound objects (arrays, structs, etc.) and intervals work.
>
> I'll raise these internally on one of our issues.
>
> @thomasspriggs are you happy for this to be merged on this basis (along with the fixes that @jezhiggins has already made)?

I'd really like the issues with commits that don't pass tests to be resolved first. Some of the non-compiling commits in the previous work which was merged in that huge PR have already caused us problems when bisecting the history. I'm OK with the non-compiling commits in that PR of old work, because it's better than leaving it to rot forever. However I think we should be trying to do better for new work, especially when all that is required is squashing a fix commit into the correct preceding commit. I'm perfectly fine with the rest being at the discretion of those who know this part of the code base best.

@thomasspriggs (Contributor) commented:
> @thomasspriggs Thanks for reviewing. Please forgive me if I am telling you things you already know, but here is some of the social context for this PR:
>
>   • The variable sensitivity domain (VSD) has quite a long and complex history with many different developers involved. Along that path the design and design intent were not always communicated as well as they could have been. As a consequence there are some parts that need some re-architecting. Jez and I are working on doing that, and this is one step towards it.
>   • After several years being developed out of tree, the bulk of VSD was merged in Variable sensitivity domain #5472 following a substantial team effort. One thing we discussed informally, which I don't know made it to the ticket, was that it was better to merge it partially sorted and fix up on tree rather than continue to track develop until it was perfect.
>   • As a consequence of these two, when I have been reviewing I have not been as strict about following all of the rules for PRs; as long as it passes all of CI and is a net improvement then it could be mergeable. There is still a lot of tidy-up to do and some of it is overlapping, so I am accepting that not everything will get done on the first pass.
>   • As far as I know, my team / project are the only users of this code. I know at one point Diffblue had another user of this but despite my best attempts I haven't been able to find out who is currently responsible for that. If you know I would love to be able to co-ordinate.
>   • If you have time / energy / use cases for this or any of the rest of goto-analyzer then it is most appreciated and I'm happy to include you / anyone else in The Plan.

Thank you for the wider context. I am interested in broadening my knowledge of the code base, as there are a limited number of parts that I know well at the moment. I was aware of the large PR of historical work being merged, even if I don't really know its complete history. It is good to see more work being done to bring it up to a higher standard.

I heard that the value set work could potentially be useful for improving the remove_function_pointers pass in cbmc, because it could be used to reduce the number of different cases added to the switch blocks. However to my knowledge this isn't a piece of work which we are currently planning to undertake.

Reduce number of special cases, and generalise expression evaluation.
The special case for ID_plus is no longer needed, as a+b+c is still evaluated
correctly. Previously incorrectly evaluated expressions such as a*b*c are now
also evaluated properly.
Without this, all operations return top, which is unhelpful. With this
change, simple arithmetic, unary operations, etc. start to work.
Still more to do for multi-valued sets.

Note: change in behaviour for function pointers. I believe the previous
behaviour was an error.
Covering unary operations and addition with multi-valued sets
expression_transform correctly creates all combinations of operands, but
passed the unmodified expression down for further evaluation. If the
expression was something like
  2 + p
then at the next stage down, we'd look up symbol 'p' and be handed back
a value set. The evaluator would then try to convert that value set to a
constant, which would, naturally, fail. Consequently, the expression
would evaluate to top.
Now, we rewrite the expression in terms of the evaluated operands. If the
expression was 2 + p, and p was the set { 2, 3 }, the expression is
rewritten first as 2 + 2, then as 2 + 3, which we can gather up into our
result of { 4, 5 }.

The various regression tests in this commit exercise various
combinations of constants and value-sets, value-sets of different size,
and so on. All the tests use addition, but the operator is immaterial.
Splendid side-effect of the expression_transform work :)
Do this by ensuring new objects we return are created through the factory,
rather than instantiated directly.
Several regression tests needed updating as a consequence of this change,
generally tweaking the regex to allow the context info in place of a
hard line end.
Reduce duplication; separate object creation from the context object that
may wrap it.
I'm not super thrilled about how I've done this, because ideally all
object creation should go through the object factory. However, in this
particular case a large value set could be converted to an interval. If
we hand off to the factory it'll just create another value-set, so we
need to explicitly create the interval. However, when we do that we
don't have the context wrapper (write location or data dependency).

Consequently, I've added variable_sensitivity_object_factoryt::wrap_with_context
and exposed it through abstract_environment so we can add the
appropriate context wrapper should we need it.
Tidy up evaluate_conditional and pull out evaluate_unary_expr
…factory

Default arguments on a virtual function aren't safe. Even though this
function isn't overridden anywhere now, it's a latent bug waiting for the
future.
Avoid shadowing member variable, make single-arg constructor explicit
value-set ordering is unstable in the face of code changes, which made
tests fragile. Sorting the output fixes that and is easier on the human
eye too.

We're sorting string representations, so the output isn't a true
numerical sort (for example, 10 sorts before 2), but it's good enough.
value-sets and the data-dependency configuration can be safely combined
@martin-cs (Collaborator) commented:
@thomasspriggs At the moment the only in-tree use of VSD is in goto-analyzer. However, CBMC does pointer analysis, and I have had a long-term goal of unifying this code with the ait abstract interpreter. Doing that would mean we could use VSD for pointer analysis, which would improve a number of performance issues with pointers.

Function pointers are a separate but related question. At the moment function pointer removal is a very syntactic process. For a number of years I have wanted (and tried) to make it more semantic. Doing so would mean we could use VSD for function pointer removal and then, yes, it would reduce the number of alternatives, which could give a significant performance improvement on some code.

@jezhiggins force-pushed the vsd-value-set-implementation branch from 724bb13 to f363e38 on February 13, 2021 13:18
@jezhiggins (Contributor, Author) commented:
> I'd really like the issues with commits that don't pass tests to be resolved first. Some of the non-compiling commits in the previous work which was merged in that huge PR have already caused us problems when bisecting the history. I'm OK with the non-compiling commits in that PR of old work, because it's better than leaving it to rot forever. However I think we should be trying to do better for new work, especially when all that is required is squashing a fix commit into the correct preceding commit. I'm perfectly fine with the rest being at the discretion of those who know this part of the code base best.

I understand 👍 I've gone through the PR, fixing up those problematic commits. It should be good to merge now.

@martin-cs martin-cs merged commit 268f886 into diffblue:develop Feb 13, 2021