Feat/read sfcf multi #210
Conversation
Hi,
Thanks for the quick answer. Yes, you are right; however, given the complexity of the loops, I do not have data with which I could test all loops at once, except for the one already checked in the test. I could come up with tests that use the benchmarks. These iterate over correlators and quarks, but not wavefunctions and offsets.
Something I forgot to mention: the benchmarks are done in the compact format of sfcf, where I expect the largest benefit. The other outputs are in some way already sorted by correlator, so that there only the number of wavefunctions, quarks and offsets plays a role.
Thanks for this PR! I just glanced over the code and saw sets of nested for loops in many places. Is there maybe a way to refactor these?
Hi, mmmm... for now I have spent my time organising the loops, as this is basically all that is done here.
Hi, the basic idea to make things in the loops easier: I am now using python-benedict (https://github.com/fabiocaccamo/python-benedict) to map the dict keys.
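For context, benedict's main feature here is that a nested dict can be addressed with a single keypath string. A minimal stdlib sketch of that idea (the helper names are hypothetical, not benedict or pyerrors API):

```python
def keypath_set(d, keypath, value, sep="/"):
    # Walk/create nested dicts along a "f1/u/1"-style keypath (illustrative only)
    *parents, leaf = keypath.split(sep)
    for part in parents:
        d = d.setdefault(part, {})
    d[leaf] = value

def keypath_get(d, keypath, sep="/"):
    # Follow the same keypath back down to the stored value
    for part in keypath.split(sep):
        d = d[part]
    return d

data = {}
keypath_set(data, "f1/u/1", [0.1, 0.2])
```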
Hi Justus, sorry for getting back to you so late. I had a look at your changes and I would prefer not to introduce additional dependencies unless necessary. If I understand it correctly, you only use benedict for handling the nested dict keys; the same could be achieved with itertools:

```python
import itertools

names = ["f1", "fA"]
quarks = ["u", "d", "s"]
offs = ["1", "2", "8"]
sep = "/"  # Separator char

# Create dict with identifiers separated by sep (for example 'f1/d/2')
corr_dict = {}
for tup in itertools.product(names, quarks, offs):
    corr_dict[sep.join(tup)] = []

# Retrieve entries
for key, value in corr_dict.items():
    name, quark, off = key.split(sep)
    # Process value here
```

Let me know what you think.
Hi Fabian, in the meantime I have tested the new method a bit more and checked the deltas read by the old and the new method for an
One thing that we would have to do by hand in the solution you suggest is the "undoing" of the concatenation of keys, so that we have the original dict structure in the end. I think this would be essential and less confusing for the user.
What do you mean by original dict structure? Can you explain what the dict that is returned should look like?
Ah sorry, of course I can. I structured the returned dictionary in the following way:
Okay, I guess if you want to stick with the nested dict then there is no benefit in using a different data structure for the computation.
Hi, I reverted the changes that introduced benedict. I also found a bug in one of the tests and excluded the "nice output" feature that I built earlier, as it was not stable enough.
Hi, I think that as soon as you fix the linting errors you are good to merge the pull request, provided you are confident that the checks catch (almost) all possible mistakes.
I fully agree with Simon. I'm also concerned about maintainability and readability and would prefer a more concise version (for example using itertools). But then again, this is very specialised code which we might never touch again and which is not performance critical, so I would also be fine with merging the changes when flake8 is happy.
Hi,
Maybe addressing these two things could benefit the code?
Sounds good, your call if it's worth investing the time to rewrite stuff!
Hi, so finally I refactored everything as Fabian suggested in an earlier comment. As the result is only put into its final form in the last few lines of the method, I use the "flat" structure all the way through until there. I then added the option to return either nested dicts or a flat dict as the result.
pyerrors/input/sfcf.py (outdated)

```python
def _lists2key(*lists):
    sep = "/"
```

I noticed that you define the separator char in two places.
Oh yes, you are right... That comes from trying somewhere else, then copying...
Hey there,
so as I mentioned in a previous pull request, I wanted to rewrite the read_sfcf method again.
So... I did. There is now a method "read_sfcf_multi", which is a superset of the "read_sfcf" method.
The read_sfcf method is now simply routed through the new method.
All of this was done for the following reason: I have tried to accelerate the reading of sfcf measurements in my projects for some time. Finally, I found that the main time is spent in IO, as each file is opened once per read. As it is usual to read not one correlator at a time but e.g. 8 correlators at a time, this results in 8 $\times$ #cnfg `open(file)` calls, each of which has to retrieve the file from disk and load it into the buffer.

Instead, I am now doing the following: I directly read all relevant correlators from a file, then move on to the next file. Therefore I only need #cnfg `open(file)` calls. The drawback is that afterwards I have to manage unpacking a Python dict, which can be done much faster.

Here are some benchmarks taken with `timeit` (#cnfg = 2146):
Old implementation:

| #corrs | read_sfcf |
| --- | --- |
| 72 | 70.942 s |

New implementation:

| #corrs | read_sfcf | read_sfcf_multi |
| --- | --- | --- |
| 72 | 71.462 s | 4.777 s |
| 36 | 34.308 s | 2.280 s |
| 20 | 18.421 s | 1.653 s |
This new functionality is not yet tested extensively (mainly I have run the pytest tests that are also part of this PR and the benchmarks on a set of measurements from a project of mine).
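To illustrate the loop inversion described above: the sketch below opens each configuration file exactly once and extracts every requested correlator from it, instead of re-opening the file once per correlator. The file format and parsing here are simplified placeholders, not the actual sfcf format:

```python
import os
import tempfile

def read_multi(paths, corr_names):
    """Open every file exactly once (#cnfg open() calls in total) and
    collect all requested correlators from it in a single pass."""
    data = {name: [] for name in corr_names}
    for path in paths:                 # one pass per configuration file
        with open(path) as f:
            lines = f.read().splitlines()
        for line in lines:             # unpack all wanted entries at once
            name, value = line.split()
            if name in data:
                data[name].append(float(value))
    return data

# Tiny demonstration with throwaway files standing in for sfcf output
paths = []
for cnfg in range(2):
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w") as f:
        f.write("f1 1.5\nfA 2.5\n")
    paths.append(path)

result = read_multi(paths, ["f1", "fA"])

for path in paths:
    os.remove(path)
```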