https://github.com/sshine/evm-opcodes
Opcode types for Ethereum Virtual Machine (EVM)
https://github.com/sshine/evm-opcodes
hacktoberfest haskell
Last synced: 1 day ago
JSON representation
Opcode types for Ethereum Virtual Machine (EVM)
- Host: GitHub
- URL: https://github.com/sshine/evm-opcodes
- Owner: sshine
- License: mit
- Created: 2020-04-03T09:30:58.000Z (almost 6 years ago)
- Default Branch: main
- Last Pushed: 2026-01-17T08:40:39.000Z (9 days ago)
- Last Synced: 2026-01-17T19:20:22.077Z (9 days ago)
- Topics: hacktoberfest, haskell
- Language: Haskell
- Homepage:
- Size: 155 KB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# evm-opcodes
[](http://opensource.org/licenses/MIT)
[](https://hackage.haskell.org/package/evm-opcodes)
[](http://stackage.org/nightly/package/evm-opcodes)
[](http://stackage.org/lts/package/evm-opcodes)
[](https://github.com/sshine/evm-opcodes/actions/workflows/haskell.yml)
This Haskell library provides opcode types for the Ethereum Virtual Machine (EVM).
The library has one parameterised type, `Opcode' j` where `j` is the annotation
for the jump-related instructions `JUMP`, `JUMPI` and `JUMPDEST`, and it has
three concrete variants:
- `type Opcode = Opcode' ()`
- `type PositionalOpcode = Opcode' Word`
- `type LabelledOpcode = Opcode' Text`
The library has a fixpoint algorithm that translates labelled jumps into
positional jumps, and it has another function that translates those positional
jumps into plain EVM opcodes where a constant is pushed before a jump is made.
The constant that is pushed depends on the position of `JUMPDEST` which depends
on the constant that is pushed, hence the need for fixpoint iteration.
When pushing a constant to the stack, EVM uses `push0`, `push1`, `push2`, ...,
`push32` where the number 0-32 refers to how many bytes the constant occupies.
Instead of having 32 unique push commands, this library has a single `PUSH
!Word256` constructor that serializes to the right `push1`, `push2`, etc.
## Example
Imagine translating the following C program to EVM opcodes:
```
int x = 1;
while (x != 0) { x *= 2 };
```
Since EVM is stack-based, let's put `x` on the stack.
```haskell
λ> import EVM.Opcode
λ> import EVM.Opcode.Labelled as L
λ> import EVM.Opcode.Positional as P
λ> let opcodes = [PUSH 1,JUMPDEST "loop",DUP1,ISZERO,JUMPI "end",PUSH 2,MUL,JUMP "loop",JUMPDEST "end"]
λ> L.translate opcodes
Right [PUSH 1,JUMPDEST 2,DUP1,ISZERO,JUMPI 14,PUSH 2,MUL,JUMP 2,JUMPDEST 14]
λ> P.translate <$> L.translate opcodes
Right [PUSH 1,JUMPDEST,DUP1,ISZERO,PUSH 14,JUMPI,PUSH 2,MUL,PUSH 2,JUMP,JUMPDEST]
λ> fmap opcodeText . P.translate <$> L.translate opcodes
Right ["push1 1","jumpdest","dup1","iszero","push1 14","jumpi","push1 2","mul","push1 2","jump","jumpdest"]
```
## Accounts for size of `PUSH`es when doing absolute jumps
EVM's `jump` and `jumpi` instructions are parameterless. Instead they pop and
jump to the address on the top of the stack. In order to perform absolute jumps
in the code, it is necessary to `PUSH` an address on the stack first. This is
inconvenient, and so `PositionalOpcode` and `LabelledOpcode` are easier to use.
But what's more inconvenient is what happens to the offset of an absolute jump
when the address being jumped to crosses a boundary where its byte index can no
longer be represented by the same amount of bytes.
Take for example this EVM code:
```
0x00: push1 255
0x02: jump
0x03: stop
0x04: stop
0x05: stop
...
0xfe: stop
0xff: jumpdest
```
which can be represented with the following `LabelledOpcode`:
```haskell
λ> import EVM.Opcode
λ> import EVM.Opcode.Labelled as L
λ> import EVM.Opcode.Positional as P
λ> let opcodes = [JUMP "skip"] <> replicate 252 STOP <> [JUMPDEST "skip"]
λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes)
Right ["push1 255","jump","stop","stop","stop",...,"jumpdest"]
```
Note especially the byte size of a `PUSH 255` vs. a `PUSH 256`:
```haskell
λ> opcodeSize (PUSH 255)
2
λ> opcodeSize (PUSH 256)
3
```
Then add another one-byte opcode between the `jump` and the `jumpdest`:
```haskell
λ> let opcodes = [JUMP "skip"] <> replicate 253 STOP <> [JUMPDEST "skip"]
λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes)
Right ["push2 257","jump","stop","stop","stop",...,"jumpdest"]
```
Even though one byte was added, because the address of `jumpdest` is now
greater than 255, all references to it now take more than 2 bytes. Concretely,
one reference went from 2 bytes to 3 bytes, or rather, one `JUMP "skip"` became
a `push2 257` instead of a `push1 255`. And if there were many such `jump`s,
this amounts to a bit of book-keeping.
This happens at subsequent boundaries as well. While this library handles each
boundary the same way, it is unlikely to have EVM bytecode of more than a few
kilobytes at present time.
```haskell
λ> let opcodes = [JUMP "skip"] <> replicate 65532 STOP <> [JUMPDEST "skip"]
λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes)
Right ["push3 65537","jump","stop","stop","stop",...,"jumpdest"]
```