Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/laugharne/optimal_function_names_en
Optimizing gas costs is a key challenge in the development of smart contracts on the Ethereum blockchain, as every operation carried out on Ethereum incurs a gas cost.
https://github.com/laugharne/optimal_function_names_en
abi assembly bytecode function-dispatcher gas gas-optimization keccak-256 keccak256 optimization solidity yul yul-assembly
Last synced: about 1 month ago
JSON representation
Optimizing gas costs is a key challenge in the development of smart contracts on the Ethereum blockchain, as every operation carried out on Ethereum incurs a gas cost.
- Host: GitHub
- URL: https://github.com/laugharne/optimal_function_names_en
- Owner: Laugharne
- Created: 2023-12-22T15:40:34.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-24T15:18:13.000Z (12 months ago)
- Last Synced: 2024-01-24T16:34:50.680Z (12 months ago)
- Topics: abi, assembly, bytecode, function-dispatcher, gas, gas-optimization, keccak-256, keccak256, optimization, solidity, yul, yul-assembly
- Homepage:
- Size: 483 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Optimization on Ethereum: Make a Difference with Function Names
- [Optimization on Ethereum: Make a Difference with Function Names](#optimization-on-ethereum-make-a-difference-with-function-names)
- [TL;DR](#tldr)
- [Introduction](#introduction)
- [Selectors and signatures](#selectors-and-signatures)
- [Introduction to the "Function Dispatcher"](#introduction-to-the-function-dispatcher)
- [Operation](#operation)
- [In Solidity](#in-solidity)
- [Reminder on Solidity Function Visibilities](#reminder-on-solidity-function-visibilities)
- [During Compilation](#during-compilation)
- [Generated Code](#generated-code)
- [Diagram](#diagram)
- [Evaluation Order](#evaluation-order)
- [Automatic Getter](#automatic-getter)
- [In Yul](#in-yul)
- [An increasing Complexity!](#an-increasing-complexity)
- [Influence of the Runs Level](#influence-of-the-runs-level)
- [Eleven Functions and a Thousand Runs](#eleven-functions-and-a-thousand-runs)
- [Pseudo-code](#pseudo-code)
- [Gas Cost Calculation](#gas-cost-calculation)
- [Consumption Statistics](#consumption-statistics)
- [Algorithms and Processing Order](#algorithms-and-processing-order)
- [Linear Search runs = 200](#linear-search-runs--200)
- [Fractional Search runs = 1000](#fractional-search-runs--1000)
- [Optimizations](#optimizations)
- [Execution Cost Optimization](#execution-cost-optimization)
- [Intrinsic Cost Optimization](#intrinsic-cost-optimization)
- [Examples of Gains on Intrinsic Costs:](#examples-of-gains-on-intrinsic-costs)
- [Select0r](#select0r)
- [Conclusions](#conclusions)
- [Additional resources](#additional-resources)## TL;DR
1. *Cost optimization in gas is crucial for smart contracts on Ethereum.*
2. *The "function dispatcher" manages the execution of functions in smart contracts for EVMs.*
3. *Solidity compiler generates the "function dispatcher" for publicly exposed functions, whereas in Yul, it needs to be coded.*
4. *Signatures, hashes, and footprints of functions are determined by their names and parameter types.*
5. *The compiler's optimization setting and the number of functions impact the function selection algorithm.*
6. *Strategic renaming of functions optimizes gas costs and the selection order, influenced by footprint values.*## Introduction
Cost optimization in gas is a key challenge in the development of smart contracts on the Ethereum blockchain, as each operation on Ethereum incurs a gas cost. This article is the translation of [Optimisation sur Ethereum : Faites la différence avec les noms de fonctions](https://medium.com/@franck.maussand/optimisation-sur-ethereum-faites-la-diff%C3%A9rence-avec-les-noms-de-fonctions-ba4692c9e39f) (🇫🇷).
**Reminder :**
- The **bytecode** represents a smart contract on the blockchain as a sequence of hexadecimal values.
- The Ethereum Virtual Machine (**EVM**) executes instructions by reading this bytecode during interactions with the contract.
- Each elementary instruction, encoded in one byte, is called an **opcode** and has a gas cost reflecting the resources required for its execution.
- A compiler translates this source code into bytecode executable by the EVM and provides elements such as the Application Binary Interface (ABI).
- An **ABI** defines how a contract's functions should be called and data exchanged, specifying the data types of arguments and the functions' signatures.In this article, we will explore how simply naming your functions can influence the gas costs associated with your contract.
We will also discuss various optimization strategies, from the order of signature hashes to function renaming tricks, to reduce costs associated with interactions with your contracts.
**Details :**
This article is based on:
1. **Solidity** code (0.8.13, 0.8.17, 0.8.20, 0.8.22)
2. Compiled using the `solc` compiler
3. For **EVMs** on **Ethereum**The following concepts will be covered:
- The signature: the numerical identifier of a function within the EVM.
- The "*function dispatcher*": the mechanism for selecting a function within a contract.
- And the function name as an argument (on the caller side).## Selectors and signatures
The **signature** of a function as used with the Ethereum Virtual Machines (**EVMs**) (Solidity) consists of the concatenation of its name and parameter types (excluding return type and spaces).
The **function selector** is the unique identifier for the function. In Solidity, this involves the 4 most significant bytes (32 bits) of the result of hashing the function's signature with the [**Keccak-256 algorithm**](https://www.geeksforgeeks.org/difference-between-sha-256-and-keccak-256/).
This is based on the [**Solidity ABI specifications**](https://docs.soliditylang.org/en/develop/abi-spec.html#function-selector).
I would like to emphasize again that I am referring to the function selector for the **solc compiler for Solidity**, and this might not be the case for other languages like **Rust**, which operates on a completely different paradigm.
Considering parameter types is essential to differentiate functions with the same name but different parameters, as seen in the `safeTransferFrom` method of [**ERC721 tokens**](https://eips.ethereum.org/EIPS/eip-721).
However, the fact that only **four bytes** are retained for the function selector implies potential **hash collision risks** between two functions—a rare but existing risk despite over 4 billion possibilities (2^32).
As evidenced by the [**Ethereum Signature Database**](https://www.4byte.directory/signatures/?bytes4_signature=0xcae9ca51) with the following example:
| Function selectors | Signatures |
| ------------------ | ------------------------------------------------------------ |
| `0xcae9ca51` | `onHintFinanceFlashloan(address,address,uint256,bool,bytes)` |
| `0xcae9ca51` | `approveAndCall(address,uint256,bytes)` |Fortunately, a simple Solidity contract with these two functions does not compile.
```
TypeError: Function signature hash collision for approveAndCall(address,uint256,bytes)
--> contracts/HashCollision.sol:10:1:
|
10 | contract HashCollision {
| ^ (Relevant source part starts here and spans across multiple lines).
```However, this remains problematic: Check out the challenge [**Hint-finance**](https://github.com/paradigmxyz/paradigm-ctf-2022/tree/main/hint-finance), in the [**Web3 Hacking: Paradigm CTF 2022**](https://medium.com/amber-group/web3-hacking-paradigm-ctf-2022-writeup-3102944fd6f5).
## Introduction to the "Function Dispatcher"
The "Function Dispatcher" (or function manager) in smart contracts written for the **EVMs** is a component of the contract that determines which function should be executed when someone interacts with the contract through an ABI.
In essence, the "Function Dispatcher" is like a conductor during calls to the functions of a smart contract. It ensures that the right functions are called when you perform specific actions on the contract.
### Operation
When interacting with a smart contract through a transaction, you specify which function you want to execute. The "*function dispatcher*" thus links the command to the specific function that will be called.
The function's signature is retrieved from the `calldata` during contract execution, and a `revert` occurs if the call cannot be matched with a function of the contract.
The selection mechanism is similar to that of a `switch/case` structure or a set of `if/else` statements.
### In Solidity
Applying what has been discussed above, we obtain, for the following function:
```solidity
function square(uint32 num) public pure returns (uint32) {
return num * num;
}
```The following signatures, hash, and selectors :
| Fonction | square(uint32 num) public pure returns (uint32) |
| --------- | ------------------------------------------------------------------ |
| Signature | `square(uint32)` (*1*) |
| Hash | `d27b38416d4826614087db58e4ea90ac7199f7f89cb752950d00e21eb615e049` |
| Selector | `d27b3841` |(*1*) : *Keccak-256 online calculator : [`square(uint32)`](https://emn178.github.io/online-tools/keccak_256.html?input_type=utf-8&input=square(uint32))*
In Solidity, the "*function dispatcher*" is generated by the compiler, so there's no need to handle the coding of this complex task.
It only applies to functions in a contract that are accessible from outside the contract, thus having an access attribute of external and public.
#### Reminder on Solidity Function Visibilities
1. **External**: External functions are designed to be called from **outside the contract**, typically by other contracts or external accounts. It is the visibility to expose a public interface to your contract.
2. **Public**: Public functions are accessible from **both outside and inside the contract**.
3. **Internal** and **Private**: Internal and private functions can only be called from **inside the contract** (and contracts inheriting from it in the case of internal).
**Example #1**:
```solidity
pragma solidity 0.8.13;contract MyContract {
uint256 public value;
uint256 internalValue;function setValue(uint256 _newValue) external {
value = _newValue;
}function getValue() public view returns (uint256) {
return value;
}function setInternalValue(uint256 _newValue) internal {
internalValue = _newValue;
}function getInternalValue() public view returns (uint256) {
return internalValue;
}
}
```#### During Compilation
If we revisit the previous code used as an example, we obtain the following signatures and footprints:
| Fonctions | Signatures | Keccak | Selectors |
| ------------------------------------------------------ | --------------------------- | ----------------- | -------------- |
| **`setValue(uint256 _newValue) external`** | `setValue(uint256)` | `55241077...ecbd` | **`55241077`** |
| **`getValue() public view returns (uint256)`** | `getValue()` | `20965255...ad96` | **`20965255`** |
| **`setInternalValue(uint256 _newValue) internal`** | `setInternalValue(uint256)` | `6115694f...7ce1` | **`6115694f`** |
| **`getInternalValue() public view returns (uint256)`** | `getInternalValue()` | `e778ddc1...c094` | **`e778ddc1`** |(*The hashes from Keccak have been intentionally truncated*)
If we examine the ABI generated during compilation, the function `setInternalValue()` does not appear, which is expected as its visibility is `internal` (see above).
It is noteworthy in the ABI data, the reference to the `value` storage data, which is `public` (we will come back to this later).
##### Generated Code
Here is an excerpt of the "*function dispatcher*" code generated by the `solc` compiler (Solidity version: 0.8.13). It can be observed that the numerical value of the fingerprint is retrieved from the `calldata`, and this value is then compared to the different function signatures, allowing for a "jump" to the code of the desired function.
```assembly
tag 1
JUMPDEST
POP
PUSH 4
CALLDATASIZE
LT
PUSH [tag] 2
JUMPI
PUSH 0
CALLDATALOAD
PUSH E0
SHR
DUP1
PUSH 20965255 // ◄ signature : getValue()
EQ
PUSH [tag] getValue_0
JUMPI
DUP1
PUSH 3FA4F245 // ◄ signature : value (automatic storage getters)
EQ
PUSH [tag] 4
JUMPI
DUP1
PUSH 55241077 // ◄ signature : setValue(uint256)
EQ
PUSH [tag] setValue_uint256_0
JUMPI
DUP1
PUSH E778DDC1 // ◄ signature : getInternalValue()
EQ
PUSH [tag] getInternalValue_0
JUMPI
tag 2
JUMPDEST
PUSH 0
DUP1
REVERT
```##### Diagram
In diagram form, one can better understand the selection mechanism, similar to that of a `switch/case` structure or a set of `if/else` statements.
![](assets/functions_dispatcher_diagram.png)
##### Evaluation Order
**Important**: The evaluation order of functions is not the same as their declaration order in the code!
| Evaluation Order | Order in the code | Selectors | Signatures |
| ---------------- | ------------------ | ----------- | ---------------------------- |
| 1 | **3** | `20965255` | `getValue()` |
| 2 | **1** | `3FA4F245` | `value` (*automatic getter*) |
| 3 | **2** | `55241077` | `setValue(uint256)` |
| 4 | **4** | `E778DDC1` | `getInternalValue()` |Indeed, function signature evaluations are ordered by an ascending sort on their values.
`20965255` < `3FA4F245` < `55241077` < `E778DDC1`
##### Automatic Getter
The function with the selector `3FA4F245` is actually an automatic **getter** for the public data `value`, and it is generated by the compiler. In Solidity, the compiler automatically provides a public getter for any public storage variable.
```solidity
uint256 public value;
```We can find the selection footprint (`3FA4F245`) and the function (at `tag 4` address) of the automatic getter for this variable in the opcodes.
**Selector** :
```assembly
DUP1
PUSH 3FA4F245
EQ
PUSH [tag] 4
JUMPI
```**Fonction** :
```assembly
tag 4
JUMPDEST
PUSH [tag] 11
PUSH [tag] 12
JUMP [in]
tag 11
JUMPDEST
PUSH 40
MLOAD
PUSH [tag] 13
SWAP2
SWAP1
PUSH [tag] abi_encode_tuple_t_uint256__to_t_uint256__fromStack_reversed_0
JUMP [in]
tag 13
JUMPDEST
PUSH 40
MLOAD
DUP1
SWAP2
SUB
SWAP1
RETURN
```"`getter` actually has the same code as the `getValue()` function."
```assembly
tag getValue_0
JUMPDEST
PUSH [tag] getValue_1
PUSH [tag] getValue_3
JUMP [in]
tag getValue_1
JUMPDEST
PUSH 40
MLOAD
PUSH [tag] getValue_2
SWAP2
SWAP1
PUSH [tag] abi_encode_tuple_t_uint256__to_t_uint256__fromStack_reversed_0
JUMP [in]
tag getValue_2
JUMPDEST
PUSH 40
MLOAD
DUP1
SWAP2
SUB
SWAP1
RETURN
```Demonstrating the futility of having the variable `value` with the `public` attribute in conjunction with the `getValue()` function, and also highlighting a weakness in the Solidity compiler `solc` that cannot merge the code of the two functions.
For those interested in delving deeper, here is a link to [**a detailed article**](https://medium.com/coinmonks/soliditys-cheap-public-face-b4e972e3924d) on `automatic storage getters` in Solidity.
### In Yul
Here is an excerpt from an example of a [**ERC20 contract**](https://docs.soliditylang.org/en/develop/yul.html#complete-erc20-example) entirely written in **Yul**.
While **Solidity** provides abstraction and readability, **Yul**, a lower-level language close to assembly, allows for much finer control over execution.
```yul
object "runtime" {
code {
// Protection against sending Ether
require(iszero(callvalue()))// Dispatcher
switch selector()
case 0x70a08231 /* "balanceOf(address)" */ {
returnUint(balanceOf(decodeAsAddress(0)))
}
case 0x18160ddd /* "totalSupply()" */ {
returnUint(totalSupply())
}
case 0xa9059cbb /* "transfer(address,uint256)" */ {
transfer(decodeAsAddress(0), decodeAsUint(1))
returnTrue()
}
case 0x23b872dd /* "transferFrom(address,address,uint256)" */ {
transferFrom(decodeAsAddress(0), decodeAsAddress(1), decodeAsUint(2))
returnTrue()
}
case 0x095ea7b3 /* "approve(address,uint256)" */ {
approve(decodeAsAddress(0), decodeAsUint(1))
returnTrue()
}
case 0xdd62ed3e /* "allowance(address,address)" */ {
returnUint(allowance(decodeAsAddress(0), decodeAsAddress(1)))
}
case 0x40c10f19 /* "mint(address,uint256)" */ {
mint(decodeAsAddress(0), decodeAsUint(1))
returnTrue()
}
default {
revert(0, 0)
}/* ---------- calldata decoding functions ----------- */
function selector() -> s {
s := div(calldataload(0), 0x100000000000000000000000000000000000000000000000000000000)
}...
```
It features the same cascading `if/else` structure as in the previous diagram.
Creating a contract **entirely in Yul** requires coding the "*function dispatcher*" manually, allowing one to choose the order of processing imprints and utilize algorithms beyond a simple cascading test suite.
## An increasing Complexity!
Now, here's a completely different example to illustrate that things are actually more complex!
Because depending on the **number of functions** and the **optimization level** (see: `--optimize-runs`), the Solidity compiler behaves differently!
**Example #2:**
```solidity
// SPDX-License-Identifier: GPL-3.0pragma solidity 0.8.17;
contract Storage {
uint256 numberA;
uint256 numberB;
uint256 numberC;
uint256 numberD;
uint256 numberE;// selector : C534BE7A
function storeA(uint256 num) public {
numberA = num;
}// selector : 9AE4B7D0
function storeB(uint256 num) public {
numberB = num;
}// selector : 4CF56E0C
function storeC(uint256 num) public {
numberC = num;
}// selector : B87C712B
function storeD(uint256 num) public {
numberD = num;
}// selector : E45F4CF5
function storeE(uint256 num) public {
numberE = num;
}// selector : 2E64CEC1
function retrieve() public view returns (uint256) {
return Multiply( numberA, numberB, numberC, numberD);
}}
```Here, the `storage` variables are `internal` (default attribute in Solidity), so no automatic getter will be added by the compiler.
And we indeed have 6 functions listed in the ABI JSON. The **6 following `public` functions** with their dedicated signatures:
| Fonctions | Signatures | Selectors |
| ---------------------------------------------- | ----------------- | -------------- |
| **`storeA(uint256 num) public`** | `storeA(uint256)` | **`C534BE7A`** |
| **`storeB(uint256 num) public`** | `storeB(uint256)` | **`9AE4B7D0`** |
| **`storeC(uint256 num) public`** | `storeC(uint256)` | **`4CF56E0C`** |
| **`storeD(uint256 num) public`** | `storeD(uint256)` | **`B87C712B`** |
| **`storeE(uint256 num) public`** | `storeE(uint256)` | **`E45F4CF5`** |
| **`retrieve() public view returns (uint256)`** | `retrieve()` | **`2E64CEC1`** |Based on the [**optimization level**](https://docs.soliditylang.org/en/develop/internals/optimizer.html) of the compiler, we get a different code for the "*function dispatcher*".
With a level of **200** (`--optimize-runs 200`), we obtain the type of code generated previously, with its cascading `if/else` statements.
```assembly
tag 1
JUMPDEST
POP
PUSH 4
CALLDATASIZE
LT
PUSH [tag] 2
JUMPI
PUSH 0
CALLDATALOAD
PUSH E0
SHR
DUP1
PUSH 2E64CEC1
EQ
PUSH [tag] retrieve_0
JUMPI
DUP1
PUSH 4CF56E0C
EQ
PUSH [tag] storeC_uint256_0
JUMPI
DUP1
PUSH 9AE4B7D0
EQ
PUSH [tag] storeB_uint256_0
JUMPI
DUP1
PUSH B87C712B
EQ
PUSH [tag] storeD_uint256_0
JUMPI
DUP1
PUSH C534BE7A
EQ
PUSH [tag] storeA_uint256_0
JUMPI
DUP1
PUSH E45F4CF5
EQ
PUSH [tag] storeE_uint256_0
JUMPI
PUSH 0
DUP1
REVERT
```However, with a higher level of `runs` (`--optimize-runs 300`)
```assembly
tag 1
JUMPDEST
POP
PUSH 4
CALLDATASIZE
LT
PUSH [tag] 2
JUMPI
PUSH 0
CALLDATALOAD
PUSH E0
SHR
DUP1
PUSH B87C712B
GT
PUSH [tag] 9
JUMPI
DUP1
PUSH B87C712B
EQ
PUSH [tag] storeD_uint256_0
JUMPI
DUP1
PUSH C534BE7A
EQ
PUSH [tag] storeA_uint256_0
JUMPI
DUP1
PUSH E45F4CF5
EQ
PUSH [tag] storeE_uint256_0
JUMPI
PUSH 0
DUP1
REVERT
tag 9
JUMPDEST
DUP1
PUSH 2E64CEC1
EQ
PUSH [tag] retrieve_0
JUMPI
DUP1
PUSH 4CF56E0C
EQ
PUSH [tag] storeC_uint256_0
JUMPI
DUP1
PUSH 9AE4B7D0
EQ
PUSH [tag] storeB_uint256_0
JUMPI
tag 2
JUMPDEST
PUSH 0
DUP1
REVERT
```The opcodes and the execution flow with `--optimize-runs 300` are no longer the same, as shown in the following diagram.
![](assets/functions_split_dispatcher_diagram.png)
It can be observed that the tests are "split" into two linear searches around a pivot value `B87C712B`, thereby reducing consumption for the less favorable cases of `storeB(uint256)` and `storeE(uint256)`.
### Influence of the Runs Level
Only **4 tests** for the functions `storeB(uint256)` and `storeE(uint256)`, instead of, respectively, **3 tests** and **6 tests** with the previous algorithm.
Determining the trigger for this type of optimization is more delicate; for example, the threshold for the number of functions happens to be 6 to trigger it with `--optimize-runs 284`, providing **two sets** of 3 linear test series.
When the number of functions is less than 4, the selection process is done through linear search. However, with five or more functions, the compiler splits the processing based on its optimization parameter.
[Tests on basic contracts](https://github.com/Laugharne/solc_runs_dispatcher) with 4 to 15 functions, using optimizations from 200 to 1000 executions, have demonstrated these thresholds.
The following table (resulting from these tests) shows the number of splits, indicating the number of linear searches.
**Record of the number of linear sequences based on runs level and the number of functions**
![](assets/func_runs.png)
( *F : Number of functions / R : Runs level* )
Are these thresholds (associated with `runs` values) likely to evolve with subsequent versions of the `solc` compiler?
### Eleven Functions and a Thousand Runs
Let's delve into an example for a contract with 11 functions to visualize the impact on gas consumption.
With **11 eligible functions** and a higher `runs` level of `--optimize-runs 1000`, we transition from **two ranges** (one of 6 + one of 5) to **four ranges** (three of 3 + one of 2).
### Pseudo-code
This time, I won't reproduce the opcodes and the associated diagram. To clarify the explanation, here is the execution flow in the form of *pseudo-code*, resembling code in the **C** language.
```c
// [tag 1]
// 1 gas (JUMPDEST)
if( selector >= 0x799EBD70) { // 22 = (3+3+3+3+10) gas
if( selector >= 0xB9E9C35C) { // 22 = (3+3+3+3+10) gas
if( selector == 0xB9E9C35C) { goto storeF } // 22 = (3+3+3+3+10) gas
if( selector == 0xC534BE7A) { goto storeA } // 22 = (3+3+3+3+10) gas
if( selector == 0xE45F4CF5) { goto storeE } // 22 = (3+3+3+3+10) gas
revert()
}
// [tag 15]
// 1 gas (JUMPDEST)
if( selector == 0x799EBD70) { goto storeG } // 22 = (3+3+3+3+10) gas
if( selector == 0x9AE4B7D0) { goto storeB } // 22 = (3+3+3+3+10) gas
if( selector == 0xB87C712B) { goto storeD } // 22 = (3+3+3+3+10) gas
revert()
} else {
// [tag 14]
// 1 gas (JUMPDEST)
if( selector >= 0x4CF56E0C) { // 22 = (3+3+3+3+10) gas
if( selector == 0x4CF56E0C) { goto storeC } // 22 = (3+3+3+3+10) gas
if( selector == 0x6EC51CF6) { goto storeJ } // 22 = (3+3+3+3+10) gas
if( selector == 0x75A64B6D) { goto storeH } // 22 = (3+3+3+3+10) gas
revert()
}
// [tag 16]
// 1 gas (JUMPDEST)
if( selector == 0x183301E7) { goto storeI } // 22 = (3+3+3+3+10) gas
if( selector == 0x2E64CEC1) { goto retrieve } // 22 = (3+3+3+3+10) gas
revert()
}
```The joints around the different "pivot" values are more clearly distinguished:
- With `799EBD70` as the **first pivot value**.
- Then `0x4CF56E0C` & `0xB9E9C35C` as **secondary pivot values**.### Gas Cost Calculation
I used the code of a Solidity contract with **11 eligible functions** for the "*function dispatcher*" as a reference to estimate the gas cost of the selection, depending on whether it's a linear or fractioned search.
It's only the **cost of selection** in the "*function dispatcher*" and not the execution of functions that is estimated. We don't concern ourselves with what the function does or how much gas it consumes, nor with the code that extracts the function's signature by fetching data from the `calldata` area.
The estimation of gas costs for the used opcodes was done with the assistance of the following sites:
- [**Ethereum Yellow Paper**](https://ethereum.github.io/yellowpaper/paper.pdf) (Berlin version)
- [**EVM Codes - An Ethereum Virtual Machine Opcodes Interactive Reference**](https://www.evm.codes/?fork=shanghai)The relevant **opcodes** in play for our purposes are as follows:
| Mnemonic | Gas | Description |
| ------------------ | --- | --------------------------------------- |
| `JUMPDEST` | 1 | Mark valid jump destination. |
| `DUP1` | 3 | Clone 1st value on stack |
| `PUSH4 0xXXXXXXXX` | 3 | Push 4-byte value onto stack. |
| `GT` | 3 | Greater-than comparison. |
| `EQ` | 3 | Equality comparison. |
| `PUSH [tag]` | 3 | Push 2-byte value onto stack. |
| `JUMPI` | 10 | Conditionally alter the program counter |This allowed me to estimate the gas search costs for each function, for the [threshold values](#influence-of-the-runs-level) of `200` and `1000` runs, thus leading to different processing, sequential for `200 runs` and "fractionated" for `1000 runs`.
| Signatures | Selectors | Gas (linear) | Gas (splited) |
| ----------------- | ---------------- | --------------- | --------------- |
| `storeI(uint256)` | `183301E7` | **22 (*min*)** | 69 |
| `retrieve()` | `2E64CEC1` | 44 | 91 |
| `storeC(uint256)` | `4CF56E0C` (*2*) | 66 | 69 |
| `storeJ(uint256)` | `6EC51CF6` | 88 | 90 |
| `storeH(uint256)` | `75A64B6D` | 110 | **112 (*max*)** |
| `storeG(uint256)` | `799EBD70` (*1*) | 132 | 68 |
| `storeB(uint256)` | `9AE4B7D0` | 154 | 90 |
| `storeD(uint256)` | `B87C712B` | 176 | **112 (*max*)** |
| `storeF(uint256)` | `B9E9C35C` (*2*) | 198 | **67 (*min*)** |
| `storeA(uint256)` | `C534BE7A` | 220 | 89 |
| `storeE(uint256)` | `E45F4CF5` | **242 (*max*)** | 111 |- (*1*): *First pivot value for 1000 runs*
- (*2*): *Secondary pivot values for 1000 runs*### Consumption Statistics
If we take a closer look at the results of certain **statistics** on both types of search.
| \ | Linear | Fractional |
| ---------- | ------ | ---------- |
| Min | **22** | 67 |
| Max | 242 | **112** |
| Average | 132 | **88** |
| Deviation | 72,97 | **18,06** |We observe significant differences. Specifically, a lower **average** (-33%) with a considerably lower [**standard deviation**](https://en.wikipedia.org/wiki/Standard_deviation) of consumptions (4 times less) in favor of the fractional processing.
## Algorithms and Processing Order
Depending on the algorithm used by the Solidity compiler to generate the "*function dispatcher*," the processing order of functions will differ, both from the order of declaration in the source code and from the alphabetical order.
### Linear Search (runs = 200)
| # | Signatures | Selectors |
| ------ | ----------------- | ---------- |
| **1** | `storeI(uint256)` | `183301E7` |
| **2** | `retrieve()` | `2E64CEC1` |
| **3** | `storeC(uint256)` | `4CF56E0C` |
| **4** | `storeJ(uint256)` | `6EC51CF6` |
| **5** | `storeH(uint256)` | `75A64B6D` |
| **6** | `storeG(uint256)` | `799EBD70` |
| **7** | `storeB(uint256)` | `9AE4B7D0` |
| **8** | `storeD(uint256)` | `B87C712B` |
| **9** | `storeF(uint256)` | `B9E9C35C` |
| **10** | `storeA(uint256)` | `C534BE7A` |
| **11** | `storeE(uint256)` | `E45F4CF5` |The number of tests and the complexity of the process are proportional to the number of functions, in [**O(n)**](https://en.wikipedia.org/wiki/Time_complexity#List_of_time_complexities).
### Fractional Search (runs = 1000)
| # | Signatures | Selectors |
| ------ | ----------------- | ---------- |
| **1** | `storeF(uint256)` | `B9E9C35C` |
| **2** | `storeG(uint256)` | `799EBD70` |
| **3** | `storeI(uint256)` | `183301E7` |
| **4** | `storeC(uint256)` | `4CF56E0C` |
| **5** | `storeA(uint256)` | `C534BE7A` |
| **6** | `storeJ(uint256)` | `6EC51CF6` |
| **7** | `storeB(uint256)` | `9AE4B7D0` |
| **8** | `retrieve()` | `2E64CEC1` |
| **9** | `storeE(uint256)` | `E45F4CF5` |
| **10** | `storeH(uint256)` | `75A64B6D` |
| **11** | `storeD(uint256)` | `B87C712B` |This is not a [binary search](https://en.wikipedia.org/wiki/Binary_search_algorithm) in the strict sense of the term but rather a segmentation into groups of sequential tests around pivot values. However, in the end, the complexity is the same, in [O(log n)](https://en.wikipedia.org/wiki/Time_complexity#Table_of_common_time_complexities).
## Optimizations
If we assume that functions are called fairly (at the same frequency of use), their calls will not cost the same based on their signatures (and therefore their names). It's clear that the cost of selecting a call to these functions, regardless of the algorithm, is highly heterogeneous, and while it can be estimated, it cannot be imposed.
However, by strategically renaming functions, adding suffixes, for example, you can influence the results of function signatures and, consequently, the gas costs associated with these functions. This practice can optimize gas consumption in your smart contract, not only during function selection in the EVM but also, as we will see later, during transactions.
The cost of a transaction consists of two parts: the **intrinsic cost** (including those related to the useful data of transactions) and the **execution cost**. Our optimizations focus on these two costs.
You can find more information on the breakdown of transaction costs on [this page](https://www.lucassaldanha.com/transaction-execution-ethereum-yellow-paper-walkthrough-4-7/).
The combination of these two optimization approaches makes a **significant difference** by reducing gas consumption in smart contracts. This is particularly crucial in certain areas such as MEV (arbitrage) where optimization is vital.
### Execution Cost Optimization
To illustrate, modifying the function signature `square(uint32)` to `square_low(uint32)` changes the fingerprint to `bde6cad1` instead of `d27b3841`.
The lower value of the new fingerprint will prioritize the processing of calls to this function. This optimization can be crucial for highly complex smart contracts, reducing the time needed to search and select the correct function to call, resulting in gas savings and improved performance on the Ethereum blockchain.
The fact that the search is fractionated rather than linear complicates matters a bit. Depending on the number of functions and the compiler's optimization level, threshold values are more challenging to determine to choose new signatures based on the desired order.
### Intrinsic Cost Optimization
When you send a transaction on the Ethereum blockchain, you include data specifying which function of the smart contract you want to call and what the arguments of that function are. The gas cost of a transaction partly depends on the number of zero bytes in the transaction data.
As specified in the [**Ethereum Yellow Paper**](https://ethereum.github.io/yellowpaper/paper.pdf) (Berlin version),
![](assets/g_tx_data.png)
- `Gtxdatazero` costs **4 gas** for each zero byte in the transaction.
- `Gtxdatanonzero` costs **16 gas** for each non-zero byte, which is **4 times more expensive**.Thus, whenever a zero byte (`00`) is used in `msg.data` instead of a non-zero byte, it saves **12 gas**.
This EVM characteristic also impacts the consumption of other opcodes like `Gsset` and `Gsreset`. To illustrate, modifying the function signature `square(uint32)` to `square_Y7i(uint32)` changes the fingerprint to `00001878` instead of `d27b3841`.
The two most significant bytes of the fingerprint (`0000`) not only prioritize the **processing of the call** to this function, as seen earlier, but also consume **less gas** during data retrieval (**40** instead of **64**).
Here are some additional examples:
| Signatures (optimal) | Selectors (optimal) | Signatures | Selectors |
| ---------------------- | ------------------- | ------------------ | --------- |
| `deposit_ps2(uint256)` | 0000fee6 | `deposit(uint256)` | b6b55f25 |
| `mint_540(uint256)` | 00009d1c | `mint(uint256)` | a0712d68 |
| `b_1Y()` | 00008e0c | `b()` | 4df7e3d0 |Similarly, being able to use signatures with **three zero-weight bytes** allows for consuming only **28 gas**.
For instance, [**`deposit278591A(uint)`**](https://emn178.github.io/online-tools/keccak_256.html?input_type=utf-8&input=deposit278591A(uint)) and [**`deposit_3VXa0(uint256)`**](https://emn178.github.io/online-tools/keccak_256.html?input_type=utf-8&input=deposit_3VXa0(uint256)), with respective signatures **`00000070`** and **`0000007e`**, achieve this optimization.
However, given that there can be only a unique selection value (signature), there can be only **one function in a contract** with a signature that has four zero bytes, even though multiple signatures may lead to this optimized signature **`00000000`**, allowing for consuming only **16 gas** (example with the following signature: [**`execute_44g58pv()`**](https://emn178.github.io/online-tools/keccak_256.html?input_type=utf-8&input=execute_44g58pv())).
#### Examples of Gains on Intrinsic Costs:
| Signatures | Selectors | # of zeros | Gas | Gain (gas) |
| ------------------- | ---------- | ---------- | --- | ---------- |
| `execute()` | `61461954` | 0 | 64 | **0** |
| `execute_5Hw()` | `00af0043` | 1 | 52 | **8** |
| `execute_mAX()` | `0000eb63` | 2 | 40 | **24** |
| `execute_6d4S()` | `000000ae` | 3 | 28 | **36** |
| `execute_44g58pv()` | `00000000` | 4 | 16 | **48** |## Select0r
I have developed **Select0r**, a tool written in **Rust** that allows you to rename your functions to optimize their calls. The program, given a function signature, will provide a list of alternative signatures with lower gas costs, enabling better ordering for the "*function dispatcher*."
[**GitHub - Laugharne/select0r**](https://github.com/Laugharne/select0r/tree/main)
## Conclusions
- Optimizing gas costs is a crucial aspect of designing efficient smart contracts on Ethereum.
- By paying attention to details such as the order of function signatures, the number of leading zeros in the hash, the order of function processing, and function renaming, you can significantly reduce the costs associated with your contract.- **However,** be aware that this may reduce the user-friendliness and readability of your code.
- Optimization for execution may not be necessary for so-called administrative functions or those infrequently called.
- On the other hand, it should be prioritized for functions assumed to be the most frequently called (to be determined manually or statistically during practical tests).
- A single optimization may seem insignificant, especially compared to the overall cost of a transaction. However, a set of optimizations performed on a series of transactions makes all the difference, and it's not limited to optimizations on the "*function dispatcher*."
In the end, these optimizations can make the difference between a cost-effective contract and one that is gas-expensive.
--------
Credits: **[Franck Maussand](mailto:[email protected])**
*Special thanks to [**Igor Bournazel**](https://github.com/ibourn) for his suggestions and proofreading of this article.*
--------
## Additional resources
- Hash function :
- [Hash function - Wikipedia](https://en.wikipedia.org/wiki/Hash_function)- Keccak :
- [SHA-3 - Wikipedia](https://en.wikipedia.org/wiki/SHA-3)
- [Difference Between SHA-256 and Keccak-256 - GeeksforGeeks](https://www.geeksforgeeks.org/difference-between-sha-256-and-keccak-256/)- Binary search :
- [Binary search algorithm - Wikipedia](https://en.wikipedia.org/wiki/Binary_search_algorithm)
- [Big O notation - Wikipedia](https://en.wikipedia.org/wiki/Big_O_notation)- References :
- [Ethereum Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf)
- [Opcodes for the EVM](https://ethereum.org/en/developers/docs/evm/opcodes/)
- [EVM Codes - An Ethereum Virtual Machine Opcodes Interactive Reference](https://www.evm.codes/?fork=shanghai)
- [Operations with dynamic Gas costs](https://github.com/wolflo/evm-opcodes/blob/main/gas.md)
- [Contract ABI Specification — Solidity 0.8.22 documentation](https://docs.soliditylang.org/en/develop/abi-spec.html#function-selector)
- [Yul — Solidity 0.8.22 documentation](https://docs.soliditylang.org/en/latest/yul.html)
- [Yul — Complete ERC20 Example](https://docs.soliditylang.org/en/develop/yul.html#complete-erc20-example)
- [Using the Compiler — Solidity 0.8.22 documentation](https://docs.soliditylang.org/en/latest/using-the-compiler.html)
- [The Optimizer — Solidity 0.8.22 documentation](https://docs.soliditylang.org/en/develop/internals/optimizer.html)- Tools :
- [GitHub - Laugharne/select0r](https://github.com/Laugharne/select0r/tree/main) ✨
- [Keccak-256 Online](http://emn178.github.io/online-tools/keccak_256.html)
- [Compiler Explorer](https://godbolt.org/)
- [Solidity Optimize Name](https://emn178.github.io/solidity-optimize-name/)
- [Ethereum Signature Database](https://www.4byte.directory/)
- [GitHub - shazow/whatsabi: Extract the ABI (and other metadata) from Ethereum bytecode, even without source code.](https://github.com/shazow/whatsabi)- Misc :
- [Function Dispatching | Huff Language](https://docs.huff.sh/tutorial/function-dispatching/#linear-dispatching)
- [Solidity’s Cheap Public Face](https://medium.com/coinmonks/soliditys-cheap-public-face-b4e972e3924d)
- [Web3 Hacking: Paradigm CTF 2022 Writeup](https://medium.com/amber-group/web3-hacking-paradigm-ctf-2022-writeup-3102944fd6f5)
- [paradigm-ctf-2022/hint-finance at main · paradigmxyz/paradigm-ctf-2022 · GitHub](https://github.com/paradigmxyz/paradigm-ctf-2022/tree/main/hint-finance)
- [GitHub - Laugharne/solc_runs_dispatcher](https://github.com/Laugharne/solc_runs_dispatcher)
- [WhatsABI? with Shazow - YouTube](https://www.youtube.com/watch?v=sfgassm8SKw)
- [Ethereum Yellow Paper Walkthrough (4/7) - Transaction Execution](https://www.lucassaldanha.com/transaction-execution-ethereum-yellow-paper-walkthrough-4-7/)