New page added for Research Area "Automatic Differentiation" #163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged: vgvassilev merged 6 commits into compiler-research:master from QuillPusher:new_page_for_ad on May 3, 2024
Commits:
- bf7d11e New page added for Research Area "AD" (QuillPusher)
- 564b74c Changes suggested in David's Review Comments (QuillPusher)
- 050fcdb Improve content for automatic differentiation page (vaithak)
- 3794da9 Added images and ran through grammar tool (QuillPusher)
- f394f80 Changes after Vassil's review (QuillPusher)
- f218a8a Update _pages/automatic_differentiation.md (vgvassilev)
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File: _pages/automatic_differentiation.md (new file, +173 lines)
---
title: "Automatic Differentiation"
layout: gridlay
excerpt: "Automatic Differentiation is a general and powerful technique
for computing partial derivatives (or the complete gradient) of a function
given as a computer program."
sitemap: true
permalink: /automatic_differentiation
---

## Automatic Differentiation

Automatic Differentiation (AD) is a general and powerful technique for
computing partial derivatives (or the complete gradient) of a function
given as a computer program.

It takes advantage of the fact that any computation can be represented as a
composition of simple operations / functions. This is generally represented
in a graphical format and referred to as the [computation
graph](https://colah.github.io/posts/2015-08-Backprop/). AD works by
repeatedly applying the chain rule over this graph.
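
One way to see the chain rule "executing" is the classic dual-number
formulation of forward-mode AD. The sketch below is a minimal, self-contained
illustration; the `Dual` type and the function `f` are invented for this
example, and this is not how Clad (discussed later) is implemented:

```cpp
#include <cmath>
#include <cstdio>

// A dual number carries a value and its derivative; every arithmetic
// operation propagates both, which is exactly the chain rule in action.
struct Dual {
  double val;  // f(x)
  double dot;  // f'(x)
};

Dual operator*(Dual a, Dual b) {  // product rule
  return {a.val * b.val, a.dot * b.val + a.val * b.dot};
}
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.dot}; }

// f(x) = x*x + sin(x); written once, differentiated automatically.
template <typename T> T f(T x) { return x * x + sin(x); }

int main() {
  Dual x{2.0, 1.0};  // seed dx/dx = 1
  Dual y = f(x);
  std::printf("f(2) = %f, f'(2) = %f\n", y.val, y.dot);  // f'(x) = 2x + cos(x)
}
```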

### Understanding Differentiation in Computing

Efficient computation of gradients is a crucial requirement in the fields of
scientific computing and machine learning, where approaches like [Gradient
Descent](https://en.wikipedia.org/wiki/Gradient_descent) are used to
iteratively converge on the optimal parameters of a mathematical model.

Within the context of computing, there are various methods for
differentiation:

- **Manual Differentiation**: This consists of manually applying the rules of
  differentiation to a given function. While straightforward, it can be
  tedious and error-prone, especially for complex functions.

- **Numerical Differentiation**: This method approximates the derivatives
  using finite differences. It is relatively simple to implement, but it can
  suffer from numerical instability and inaccuracy (see the sketch after this
  list). It also doesn't scale well with the number of inputs to the
  function.

- **Symbolic Differentiation**: This approach uses symbolic manipulation to
  compute derivatives analytically. It provides accurate results but can lead
  to lengthy expressions for large computations. It requires the computer
  program to be representable as a closed-form mathematical expression, and
  thus doesn't handle control flow (if conditions and loops) in the program
  well.

- **Automatic Differentiation (AD)**: AD is a general and efficient technique
  that works by repeated application of the chain rule over the computation
  graph of the program. Given its composable nature, it scales easily to
  computing gradients over a very large number of inputs.
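
The instability of finite differences is easy to demonstrate. The following
sketch (a hypothetical example, not taken from the benchmarks cited below)
estimates the derivative of `exp` at `x = 1` with central differences: the
error first shrinks with the step size `h`, then grows again as
floating-point rounding dominates:

```cpp
#include <cmath>
#include <cstdio>

// f(x) = exp(x), whose exact derivative is also exp(x).
double f(double x) { return std::exp(x); }

int main() {
  double x = 1.0, exact = std::exp(x);
  // Central difference: error ~ O(h^2) truncation + O(eps/h) rounding,
  // so shrinking h eventually makes the estimate worse, not better.
  for (double h = 1e-1; h >= 1e-13; h /= 100) {
    double approx = (f(x + h) - f(x - h)) / (2 * h);
    std::printf("h=%.0e  error=%.3e\n", h, std::fabs(approx - exact));
  }
}
```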
### Forward and Reverse Mode AD

Automatic Differentiation works by applying the chain rule and merging the
derivatives at each node of the computation graph. The direction of this
graph traversal and derivative accumulation results in two approaches:

- **Forward Mode (Tangent Mode)**: starts the accumulation at the input
  parameters of the graph and moves towards the output parameters. This means
  that the chain rule is applied to the inner functions first. This approach
  calculates the derivatives of the output(s) with respect to a single input
  variable.

![Forward Mode](/images/blog/Forward-Mode.png)

- **Reverse Mode (Adjoint Mode)**: starts at the output node of the graph and
  moves backward towards all the input nodes. For every node, it merges all
  paths that originated at that node. It tracks how every node affects one
  output. Hence, it calculates the derivative of a single output with respect
  to all inputs simultaneously: the gradient.

![Reverse Mode](/images/blog/Reverse-Mode.png)
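
To see concretely what the backward traversal does, here is a hand-written
reverse pass for the toy function `f(x, y) = x*y + sin(x)` (an invented
example that mirrors what reverse-mode AD derives mechanically): one forward
sweep records intermediates, one backward sweep accumulates the adjoints of
*all* inputs.

```cpp
#include <cmath>
#include <cstdio>

int main() {
  double x = 2.0, y = 3.0;

  // Forward sweep: evaluate and remember intermediates.
  double a = x * y;        // a = x*y
  double b = std::sin(x);  // b = sin(x)
  double f = a + b;

  // Backward sweep: seed df/df = 1, push adjoints toward the inputs.
  double df = 1.0;
  double da = df;           // f = a + b
  double db = df;
  double dx = da * y;       // a = x*y     -> da/dx = y
  double dy = da * x;       //                da/dy = x
  dx += db * std::cos(x);   // b = sin(x)  -> db/dx = cos(x)

  std::printf("f=%f  df/dx=%f  df/dy=%f\n", f, dx, dy);
}
```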
### Automatic Differentiation in C++

Automatic Differentiation implementations are based on [two major
techniques]: Operator Overloading and Source Code Transformation. The
Compiler Research Group's focus has been on exploring the [Source Code
Transformation] technique, which involves constructing the computation graph
and producing the derivative at compile time.

[The source code transformation approach] enables optimization by retaining
all the complex knowledge of the original source code. The computation graph
is constructed during compilation and then transformed to generate the
derivative code. The drawback is that many implementations rely on a custom
parser to build the code representation and produce the transformed code.
This approach is difficult to implement (especially for C++), but it is very
efficient, since many computations and optimizations can be done ahead of
time.
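
As a sketch of what "producing a derivative at compile time" means, consider
a hypothetical before/after pair. The input function `square` and the emitted
`square_dx` are invented for illustration; Clad's real output differs in
naming and structure:

```cpp
#include <cstdio>

// Input the tool sees:
double square(double x) { return x * x; }

// Derivative code a source-transformation tool could emit at compile time,
// by applying the product rule to the AST of `x * x`:
double square_dx(double x) {
  double _d_x = 1.0;           // seed: dx/dx = 1
  return _d_x * x + x * _d_x;  // d(x*x)/dx = 2x
}

int main() { std::printf("%f\n", square_dx(3.0)); }  // prints 6
```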

### Advantages of Using Automatic Differentiation

- Automatic Differentiation can calculate derivatives without any [additional
  precision loss].

- It is not confined to closed-form expressions.

- It can take derivatives of algorithms involving conditionals, loops, and
  recursion (see the sketch after this list).

- It can easily be scaled to functions with a very large number of inputs.
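
The control-flow point deserves a demonstration. Reusing the dual-number idea
from the earlier sketch (again, an invented example), the derivative of a
power function written as a loop falls out of the same control flow, with no
closed-form expression ever written down:

```cpp
#include <cstdio>

struct Dual { double val, dot; };
Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.dot * b.val + a.val * b.dot};
}

// pow-by-loop: symbolic differentiation would need a closed form,
// but AD simply differentiates through the iterations.
template <typename T> T power(T x, int n) {
  T r = x;
  for (int i = 1; i < n; ++i) r = r * x;
  return r;
}

int main() {
  Dual x{2.0, 1.0};
  Dual y = power(x, 5);  // d/dx x^5 = 5x^4 = 80 at x = 2
  std::printf("x^5 = %f, d/dx = %f\n", y.val, y.dot);
}
```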
### Automatic Differentiation Implementation with Clad - a Clang Plugin

Implementing Automatic Differentiation from the ground up can be challenging.
However, several C++ libraries and tools are available to simplify the
process. The Compiler Research Group has been working on [Clad], a C++
library that enables Automatic Differentiation using the LLVM compiler
infrastructure. It is implemented as a plugin for the Clang compiler.

[Clad] operates on the Clang AST (Abstract Syntax Tree) and is capable of
performing C++ Source Code Transformation. When Clad is given the C++ source
code of a mathematical function, it can algorithmically generate C++ code for
computing the derivatives of that function. Clad has comprehensive coverage
of the latest C++ features and a well-rounded fallback and recovery system in
place.
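
A minimal usage sketch based on Clad's documented API is shown below; the
plugin path and compile flags in the comment are installation-dependent
assumptions:

```cpp
// Compile with the Clad plugin loaded, e.g.:
//   clang++ -fplugin=/path/to/clad.so example.cpp -I<clad>/include
#include "clad/Differentiator/Differentiator.h"
#include <cstdio>

double f(double x, double y) { return x * x + y * y; }

int main() {
  // Forward mode: d f / d x, generated at compile time.
  auto df_dx = clad::differentiate(f, "x");
  std::printf("df/dx(3, 4) = %f\n", df_dx.execute(3, 4));  // 6

  // Reverse mode: the whole gradient from one generated function.
  auto grad = clad::gradient(f);
  double dx = 0, dy = 0;
  grad.execute(3, 4, &dx, &dy);
  std::printf("grad f(3, 4) = (%f, %f)\n", dx, dy);        // (6, 8)
}
```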

**Clad's Key Features**:

- Support for both Forward Mode and Reverse Mode Automatic Differentiation.

- Support for differentiating built-in C input arrays, built-in C/C++ scalar
  types, functions with an arbitrary number of inputs, and functions that
  return a single value.

- Support for loops and conditionals.

- Support for generating single derivatives, gradients, Hessians, and
  Jacobians.

- Integration with CUDA for GPU programming.

- Integration with Cling and ROOT for high-energy physics data analysis.

### Clad Benchmarks (Using Automatic Differentiation)

[Benchmarks] show that Clad is measurably faster than conventional numerical
differentiation methods, producing Hessians roughly dim/25 times faster
(450x in the reported case). [General benchmarks] demonstrate a 3378x speed
improvement with Clad over numerical differentiation based on central
differences.

For more information on Clad, please see:

- [Clad - GitHub Repository](https://github.com/vgvassilev/clad)

- [Clad - ReadTheDocs](https://clad.readthedocs.io/en/latest/)

- [Clad - Video Demo](https://www.youtube.com/watch?v=SDKLsMs5i8s)

- [Clad - PDF Demo](https://indico.cern.ch/event/808843/contributions/3368929/attachments/1817666/2971512/clad_demo.pdf)

- [Clad - Automatic Differentiation for C++ Using Clang - Slides](https://indico.cern.ch/event/1005849/contributions/4227031/attachments/2221814/3762784/Clad%20--%20Automatic%20Differentiation%20in%20C%2B%2B%20and%20Clang%20.pdf)

- [Automatic Differentiation in C++ - Slides](https://compiler-research.org/assets/presentations/CladInROOT_15_02_2020.pdf)

[Clad]: https://compiler-research.org/clad/

[Benchmarks]: https://compiler-research.org/assets/presentations/CladInROOT_15_02_2020.pdf

[General benchmarks]: https://indico.cern.ch/event/1005849/contributions/4227031/attachments/2221814/3762784/Clad%20--%20Automatic%20Differentiation%20in%20C%2B%2B%20and%20Clang%20.pdf

[additional precision loss]: https://compiler-research.org/assets/presentations/CladInROOT_15_02_2020.pdf

[Source Code Transformation]: https://compiler-research.org/assets/presentations/V_Vassilev-SNL_Accelerating_Large_Workflows_Clad.pdf

[two major techniques]: https://compiler-research.org/assets/presentations/G_Singh-MODE3_Fast_Likelyhood_Calculations_RooFit.pdf

[The source code transformation approach]: https://compiler-research.org/assets/presentations/I_Ifrim-EuroAD21_GPU_AD.pdf
Review comment: Maybe we can add some images for forward and reverse mode AD; even the one on the Wikipedia page should be good enough.