Skip to content

Bugfix in running TapedFunction. #135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 26, 2022
Merged

Bugfix in running TapedFunction. #135

merged 4 commits into from
Mar 26, 2022

Conversation

yebai
Copy link
Member

@yebai yebai commented Mar 25, 2022

I noticed a subtle bug while investigating the performance benchmarks in #133. It turns out that we never run the taped function in @btime (more precisely, tf is only run once) because the tf.counter variable is not reset after each run.

Benchmark results for this PR:

benchmarking rosenbrock...
  Run Original Function:  390.527 μs (16 allocations: 6.10 MiB)
  Run TapedFunction:  479.145 μs (282 allocations: 6.11 MiB)
  Run TapedFunction (compiled):  497.135 μs (309 allocations: 6.11 MiB)
benchmarking ackley...
  Run Original Function:  1.396 ms (0 allocations: 0 bytes)
  Run TapedFunction:  770.484 ms (16698219 allocations: 445.53 MiB)
  Run TapedFunction (compiled):  743.913 ms (18498246 allocations: 592.02 MiB)
benchmarking matrix_test...
  Run Original Function:  95.125 μs (16 allocations: 469.22 KiB)
  Run TapedFunction:  121.585 μs (322 allocations: 477.91 KiB)
  Run TapedFunction (compiled):  122.064 μs (352 allocations: 480.67 KiB)
benchmarking neural_net...
  Run Original Function:  552.198 ns (4 allocations: 576 bytes)
  Run TapedFunction:  4.707 μs (75 allocations: 3.06 KiB)
  Run TapedFunction (compiled):  4.596 μs (83 allocations: 3.73 KiB)

Before this PR (copied from #133 (comment))

benchmarking rosenbrock...
  Run Original Function:  998.907 μs (16 allocations: 6.10 MiB)
  Run TapedFunction:  23.594 ns (0 allocations: 0 bytes)
  Run TapedFunction (compiled):  38.710 ns (1 allocation: 48 bytes)
benchmarking ackley...
  Run Original Function:  2.027 ms (0 allocations: 0 bytes)
  Run TapedFunction:  23.494 ns (0 allocations: 0 bytes)
  Run TapedFunction (compiled):  40.625 ns (1 allocation: 48 bytes)
benchmarking matrix_test...
  Run Original Function:  154.501 μs (16 allocations: [469]
  Run TapedFunction:  22.066 ns (0 allocations: 0 bytes)
  Run TapedFunction (compiled):  38.005 ns (1 allocation: 48 bytes)
benchmarking neural_net...
  Run Original Function:  708.276 ns (4 allocations: 576 bytes)
  Run TapedFunction:  28.743 ns (0 allocations: 0 bytes)
  Run TapedFunction (compiled):  48.381 ns (1 allocation: 48 bytes)

@yebai yebai requested a review from KDr2 March 25, 2022 22:18
@KDr2
Copy link
Member

KDr2 commented Mar 25, 2022

Hmmm... this result makes much more sense now...

@yebai yebai merged commit 6448ba7 into master Mar 26, 2022
@delete-merged-branch delete-merged-branch bot deleted the bugfix branch March 26, 2022 09:31
@yebai
Copy link
Member Author

yebai commented Mar 26, 2022

Hmmm... this result makes much more sense now...

@KDr2 It might be worth analysing the new benchmark results to improve the implementation (e.g. type stability, allocation). We can talk a bit more when we meet. I think TapedFunction should target a similar performance of ReverseDiff.Tape's forward pass.

This was referenced Mar 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants