Skip to content

Conversation

@AndersTrier
Copy link
Contributor

When reworking my Rust implementation of leopard16, I realized that the last loop in fwht is unreachable.

On the last iteration of the outermost loop, dist4 == order, after which we update the value of dist to dist4, making us unable to enter the body of the last loop.

order is always either 256 or 16384, and

In [2]: 4<<2<<2<<2
Out[2]: 256
In [5]: 4<<2<<2<<2<<2<<2<<2
Out[5]: 16384

On a related note: I tried to optimize fwht further by using 8 values at a time (fwht_8). It was slightly faster on x86, but slower on ARM.
fwht_16 was slower on both architectures.

@AndersTrier AndersTrier marked this pull request as draft March 11, 2024 11:00
as we're always working on an array of size `order`
@AndersTrier AndersTrier force-pushed the AndersTrier/leopard_fwht_del_dead_code branch from 4b07afa to b32e610 Compare March 11, 2024 11:10
@AndersTrier AndersTrier marked this pull request as ready for review March 11, 2024 11:14
@klauspost
Copy link
Owner

Looks like a nice cleanup! btw, added your repo to the README since I assume they are compatible.

@klauspost
Copy link
Owner

wrt unrolling, yeah I settled on 4x since 8x didn't give a significant speedup and didn't want to choke old platforms with limited registers more than needed.

@AndersTrier
Copy link
Contributor Author

Looks like a nice cleanup! btw, added your repo to the README since I assume they are compatible.

Thanks!

I would have to test wrt. compatibility, as I don't think high rate and low rate are compatible, and sometimes you can choose either.

@AndersTrier
Copy link
Contributor Author

wrt unrolling, yeah I settled on 4x since 8x didn't give a significant speedup and didn't want to choke old platforms with limited registers more than needed.

Did you consider implementing it with SIMD?

I don't know if it's worth it, but I found some code to get you started if you want to give it a try ;)
https://github.com/catid/leopard/blob/master/docs/vector_fwht_4.txt
https://github.com/paritytech/reed-solomon-novelpoly/blob/df906e1ca27dc6c0c1b663f1653f57d4620f03dd/reed-solomon-novelpoly/src/field/inc_log_mul.rs#L118
https://stackoverflow.com/a/54133143

@klauspost
Copy link
Owner

Yeah, it could be done... It could actually be rather effective AFAICT. But it's been too long since I looked at this to judge if/when it would provide a speedup.

@klauspost klauspost merged commit 85a5e93 into klauspost:master Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants