https://github.com/augustunderground/edelwace
Reinforcement Learning Agents for Analog Circuit Sizing in Haskell.
analog-circuit deep-reinforcement-learning haskell hasktorch hindsight-experience-replay ppo prioritized-experience-replay sac td3 torch
- Host: GitHub
- URL: https://github.com/augustunderground/edelwace
- Owner: AugustUnderground
- License: bsd-3-clause
- Created: 2022-02-18T17:28:42.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-12T20:03:52.000Z (about 3 years ago)
- Last Synced: 2025-03-06T16:36:56.876Z (about 1 year ago)
- Topics: analog-circuit, deep-reinforcement-learning, haskell, hasktorch, hindsight-experience-replay, ppo, prioritized-experience-replay, sac, td3, torch
- Language: Haskell
- Homepage: https://augustunderground.github.io/edelwace/
- Size: 979 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: ChangeLog.md
- License: LICENSE
# EDELWAC²E
Reinforcement Learning Agents for
[GAC²E](https://github.com/AugustUnderground/gace) through
[Hym](https://github.com/AugustUnderground/hym) with
[HaskTorch](https://github.com/hasktorch/hasktorch).
## Setup
LibTorch is required, as per HaskTorch Documentation, and must be symlinked
into this directory. Then source `setenv` in your shell.
For training, [Hym](https://github.com/AugustUnderground/hym) must be up
and running.
For tracking, [mlflow](https://www.mlflow.org) and
[mlflow-hs](https://github.com/AugustUnderground/mlflow-hs) must be installed.
```bash
$ source setenv
$ stack build
```
## Usage
With default options:
```bash
$ stack run
```
Otherwise, pass options explicitly:
```bash
$ stack exec -- edelwace-exe [options]
```
```
Usage: edelwace-exe [-l|--algorithm ALGORITHM] [-H|--host HOST] [-P|--port PORT]
                    [-i|--ace ID] [-p|--pdk PDK] [-v|--var VARIANT]
                    [-a|--act ACTIONS] [-o|--obs OBSERVATIONS] [-f|--path FILE]
                    [-T|--tracking-host HOST] [-R|--tracking-port PORT]
  GACE RL Trainer

Available options:
  -l,--algorithm ALGORITHM DRL Algorithm, one of sac, td3, ppo (default: "sac")
  -H,--host HOST           Hym server host address (default: "localhost")
  -P,--port PORT           Hym server port (default: "7009")
  -i,--ace ID              ACE OP ID (default: "op2")
  -p,--pdk PDK             ACE Backend (default: "xh035")
  -v,--var VARIANT         GACE Environment Variant (default: "0")
  -a,--act ACTIONS         Dimensions of Action Space (default: 10)
  -o,--obs OBSERVATIONS    Dimensions of Observation Space (default: 39)
  -f,--path FILE           Checkpoint File Path (default: "./models")
  -T,--tracking-host HOST  MLFlow tracking server host address
                           (default: "localhost")
  -R,--tracking-port PORT  MLFlow tracking server port (default: "5000")
  -h,--help                Show this help text
```
### Dependencies
- hasktorch
- libtorch-ffi
- mtl
- wreq
- aeson
- optparse-applicative
- mlflow-hs
## Algorithms
[Haddock](https://augustunderground.github.io/edelwace/) documentation is available.
**Caution:** Excessive use of Unicode and Strictness.
### Soft Actor Critic (SAC)
[Arxiv](https://arxiv.org/abs/1812.05905v2)
Soft Actor Critic (SAC) Agent for continuous action space. Start with `-l sac`
and `-v 0` for continuous electrical design space.
State scaling / standardization appears to make things worse for SAC: the
loss increases steadily and no learning occurs.
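
To make the term concrete, here is a minimal sketch of per-feature state standardization (zero mean, unit variance) over a batch of observations. It is an illustration only: the names `standardizeColumn` and `standardizeBatch` are hypothetical, the batch is assumed to be non-empty, and the scaling actually tried in this repository may differ.

```haskell
import Data.List (transpose)

-- NOTE: hypothetical helpers for illustration, not taken from edelwace.

-- Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Population standard deviation of a non-empty list.
stdDev :: [Double] -> Double
stdDev xs = sqrt (mean [(x - m) ^ (2 :: Int) | x <- xs])
  where m = mean xs

-- Standardize one feature column; eps guards against zero variance.
standardizeColumn :: Double -> [Double] -> [Double]
standardizeColumn eps xs = [(x - m) / (s + eps) | x <- xs]
  where m = mean xs
        s = stdDev xs

-- Standardize a batch: rows are observations, columns are features.
standardizeBatch :: Double -> [[Double]] -> [[Double]]
standardizeBatch eps = transpose . map (standardizeColumn eps) . transpose
```

In practice the statistics would usually come from a running estimate rather than a single batch; that choice is left open here.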
### Proximal Policy Optimization (PPO)
[Arxiv](https://arxiv.org/abs/1707.06347)
Proximal Policy Optimization (PPO) Agent for discrete and continuous action
spaces. Start with `-l ppo` and `-v 2` for discrete electrical design space.
Discrete PPO needs roughly 4k steps before plateauing at an average reward of
about 0.4. At that point the area is well below its target, while the offset
target is not quite reached.
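
For reference, the clipped surrogate objective from the PPO paper can be sketched in plain Haskell as below. This is a minimal illustration on scalar values, not the HaskTorch implementation used here; all function names are hypothetical, and the batch is assumed to be non-empty.

```haskell
-- NOTE: hypothetical helpers for illustration, not taken from edelwace.

-- Clamp a value to the interval [lo, hi].
clamp :: Double -> Double -> Double -> Double
clamp lo hi = max lo . min hi

-- Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
probabilityRatio :: Double -> Double -> Double
probabilityRatio newLogP oldLogP = exp (newLogP - oldLogP)

-- Per-sample clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).
clippedSurrogate :: Double -> Double -> Double -> Double
clippedSurrogate eps ratio adv =
  min (ratio * adv) (clamp (1 - eps) (1 + eps) ratio * adv)

-- Policy loss over a batch of (ratio, advantage) pairs:
-- the negative mean of the clipped surrogate.
ppoPolicyLoss :: Double -> [(Double, Double)] -> Double
ppoPolicyLoss eps batch =
  negate (sum [clippedSurrogate eps r a | (r, a) <- batch]
            / fromIntegral (length batch))
```

The same objective applies to discrete and continuous action spaces; only the way the log-probabilities are obtained from the policy differs.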

### Twin Delayed Deep Deterministic Policy Gradient (TD3)
[Arxiv](https://arxiv.org/abs/1802.09477)
Twin Delayed Deep Deterministic Policy Gradient (TD3) Agent for continuous
action space. Start with `-l td3` and `-v 0` for continuous electrical design
space.
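
The three ingredients that give TD3 its name (twin critics, delayed actor updates, target policy smoothing) can be sketched on scalar values as follows. This is a simplified illustration based on the TD3 paper, not this repository's code; the function names, the one-dimensional action, and the caller-supplied noise sample are assumptions.

```haskell
-- NOTE: hypothetical helpers for illustration, not taken from edelwace.

-- Clamp a value to the interval [lo, hi].
clamp :: Double -> Double -> Double -> Double
clamp lo hi = max lo . min hi

-- Target policy smoothing: add clipped noise to the target action,
-- then clip the result to the action bounds.
smoothAction :: Double -> (Double, Double) -> Double -> Double -> Double
smoothAction noiseClip (aLo, aHi) noise action =
  clamp aLo aHi (action + clamp (-noiseClip) noiseClip noise)

-- Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2').
td3Target :: Double -> Double -> Bool -> Double -> Double -> Double
td3Target gamma reward done q1' q2' =
  reward + gamma * notDone * min q1' q2'
  where notDone = if done then 0 else 1

-- Delayed updates: the actor is updated only every `delay` critic steps.
shouldUpdateActor :: Int -> Int -> Bool
shouldUpdateActor delay step = step `mod` delay == 0
```

Taking the minimum of the two target critics counteracts value overestimation, which is the main difference from plain DDPG.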
### Prioritized Experience Replay (PER)
[Arxiv](https://arxiv.org/abs/1511.05952)
Only implemented for SAC and currently deactivated. To quote the ERE paper:
> We show that SAC+PER can marginally improve the sample efficiency performance
> of SAC, but much less so than SAC+ERE.
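
As a reminder of what PER changes in the replay buffer, the sketch below computes the sampling probabilities P(i) = p_i^α / Σ_k p_k^α and the max-normalized importance-sampling weights w_i = (N · P(i))^(-β) from the PER paper. It works on plain lists with hypothetical function names and is not the buffer implementation in this repository; priorities are assumed to be positive and the list non-empty.

```haskell
-- NOTE: hypothetical helpers for illustration, not taken from edelwace.

-- Sampling probabilities from priorities.
-- alpha = 0 gives uniform sampling, alpha = 1 is fully proportional.
samplingProbs :: Double -> [Double] -> [Double]
samplingProbs alpha priorities = map (/ total) scaled
  where scaled = map (** alpha) priorities
        total  = sum scaled

-- Importance-sampling weights that correct the bias introduced by
-- non-uniform sampling, normalized so the largest weight is 1.
importanceWeights :: Double -> [Double] -> [Double]
importanceWeights beta probs = map (/ maximum raw) raw
  where n   = fromIntegral (length probs)
        raw = [(n * p) ** negate beta | p <- probs]
```

Each sampled transition's loss is multiplied by its weight, so frequently sampled high-priority transitions do not dominate the gradient.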
### Emphasizing Recent Experience (ERE)
[Arxiv](https://arxiv.org/abs/1906.04009)
...
### Hindsight Experience Replay (HER)
[Arxiv](https://arxiv.org/abs/1707.01495)
...
## Results
...
## TODO
- [X] Implement SAC
- [X] Implement TD3
- [X] Implement PPO
- [X] Implement PER
- [X] Implement ERE
- [X] Implement SAC+PER
- [X] Implement SAC+ERE
- [ ] Implement SAC+ERE+PER
- [X] Implement HER
- [ ] Implement TD3+HER
- [ ] Wait for Normal Distribution in HaskTorch
- [ ] Remove strictness where unnecessary
- [X] Add agent loading ability
- [X] Command Line Options
- [X] MLFlow tracking
- [X] Visualization (MLFlow?)