# ❄️🧊 cryo 🧊❄️

[![Rust](https://github.com/paradigmxyz/cryo/actions/workflows/build_and_test.yml/badge.svg)](https://github.com/paradigmxyz/cryo/actions/workflows/build_and_test.yml) [![Telegram Chat](https://img.shields.io/badge/Telegram-join_chat-blue.svg)](https://t.me/paradigm_data)

`cryo` is the easiest way to extract blockchain data to parquet, csv, json, or a python dataframe.

`cryo` is also extremely flexible, with [many different options](#cryo-help) to control how data is extracted, filtered, and formatted.

*`cryo` is an early WIP; please report bugs and feedback to the issue tracker.*

*Note that `cryo`'s default settings will slam a node too hard for use with 3rd-party RPC providers. Instead, use `--requests-per-second` and `--max-concurrent-requests` to impose rate limits.*
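
For a rough sense of what a rate limit costs, the sketch below computes a lower bound on extraction time. It assumes one JSON-RPC request per batch of `blocks_per_request` blocks and ignores latency, retries, and concurrency; `min_extraction_seconds` is a hypothetical helper, not part of `cryo`.

```rust
/// Lower bound (in seconds) on how long an extraction takes when requests are
/// capped at `requests_per_second`, assuming one request per `blocks_per_request`
/// blocks. Returns `None` instead of panicking on a zero rate or batch size.
fn min_extraction_seconds(n_blocks: u64, blocks_per_request: u64, requests_per_second: u64) -> Option<f64> {
    if blocks_per_request == 0 || requests_per_second == 0 {
        return None;
    }
    // Ceiling division: a partial batch still costs a full request.
    let n_requests = n_blocks / blocks_per_request + u64::from(n_blocks % blocks_per_request != 0);
    Some(n_requests as f64 / requests_per_second as f64)
}
```

For example, a dataset needing one request per block over 1M blocks at 25 requests per second takes at least 40,000 seconds (about 11 hours), however fast the node is.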

*Rate limiting will be handled automatically in a future release.*

To discuss `cryo`, check out [the Telegram group](https://t.me/paradigm_data).

## Contents

1. [Example Usage](#example-usage)
2. [Installation](#installation)
3. [Data Schemas](#data-schemas)
4. [Code Guide](#code-guide)
5. [Documentation](#documentation)
    1. [Basics](#cryo-help)
    2. [Syntax](#cryo-syntax)
    3. [Datasets](#cryo-datasets)

## Example Usage

Use as `cryo <dataset> [OPTIONS]`:

| Example | Command |
| :- | :- |
| Extract all logs from block 16,000,000 to block 17,000,000 | `cryo logs -b 16M:17M` |
| Extract blocks, transactions, and traces missing from the current directory | `cryo blocks txs traces` |
| Extract to csv instead of parquet | `cryo blocks txs traces --csv` |
| Extract only certain columns | `cryo blocks --columns number timestamp` |
| Dry run to view output schemas or expected work | `cryo storage_diffs --dry` |
| Extract all USDC events | `cryo logs --contract 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48` |

For a more complex example, see the [Uniswap Example](./examples/uniswap.sh).

`cryo` uses the `ETH_RPC_URL` env var as the data source unless `--rpc <url>` is given.

## Installation

The simplest way to use `cryo` is as a cli tool:

#### Method 1: install from source

```bash
git clone https://github.com/paradigmxyz/cryo
cd cryo
cargo install --path ./crates/cli
```

This method requires having rust installed. See [rustup](https://rustup.rs/) for instructions.

#### Method 2: install from crates.io

```bash
cargo install cryo_cli
```

This method requires having rust installed. See [rustup](https://rustup.rs/) for instructions.

Make sure that `~/.cargo/bin` is on your `PATH`.
One way to do this is by adding the line `export PATH="$HOME/.cargo/bin:$PATH"` to your `~/.bashrc` or `~/.profile`.

### Python Installation

`cryo` can also be installed as a python package:

#### Installing `cryo` python from pypi

(make sure rust is installed first, see [rustup](https://www.rust-lang.org/tools/install))

```bash
pip install maturin
pip install cryo
```

#### Installing `cryo` python from source

```bash
pip install maturin
git clone https://github.com/paradigmxyz/cryo
cd cryo/crates/python
maturin build --release
pip install --force-reinstall <OUTPUT_OF_MATURIN_BUILD>.whl
```

## Data Schemas

Many `cryo` cli options will affect output schemas by adding/removing columns or changing column datatypes.

`cryo` will always print out data schemas before collecting any data. To view these schemas without collecting data, use `--dry` to perform a dry run.

#### Schema Design Guide

An attempt is made to ensure that the dataset schemas conform to a common set of design guidelines:
- By default, rows should contain enough information in their columns to be orderable (unless the rows do not have an intrinsic order).
- Columns should usually be named by their JSON-RPC or ethers.rs defaults, except in cases where a much more explicit name is available.
- To make joins across tables easier, a given piece of information should use the same datatype and column name across tables when possible.
- Large ints such as `u256` should allow multiple conversions. A `value` column of type `u256` should allow: `value_binary`, `value_string`, `value_f32`, `value_f64`, `value_u32`, `value_u64`, and `value_d128`. These types can be specified at runtime using the `--u256-types` argument.
- Columns related to non-identifying cryptographic signatures are omitted by default.
  For example, `state_root` of a block or `v`/`r`/`s` of a transaction.
- Integer values that can never be negative should be stored as unsigned integers.
- Every table should allow a `chain_id` column so that data from multiple chains can be easily stored in the same table.

Standard types across tables:
- `block_number`: `u32`
- `transaction_index`: `u32`
- `nonce`: `u32`
- `gas_used`: `u64`
- `gas_limit`: `u64`
- `chain_id`: `u64`
- `timestamp`: `u32`

#### JSON-RPC

`cryo` currently obtains all of its data using the [JSON-RPC](https://ethereum.org/en/developers/docs/apis/json-rpc/) protocol standard.

| Dataset | Blocks per request | Results per block | Method |
|-|-|-|-|
| Blocks | 1 | 1 | `eth_getBlockByNumber` |
| Transactions | 1 | multiple | `eth_getBlockByNumber`, `eth_getBlockReceipts`, `eth_getTransactionReceipt` |
| Logs | multiple | multiple | `eth_getLogs` |
| Contracts | 1 | multiple | `trace_block` |
| Traces | 1 | multiple | `trace_block` |
| State Diffs | 1 | multiple | `trace_replayBlockTransactions` |
| Vm Traces | 1 | multiple | `trace_replayBlockTransactions` |

`cryo` uses [ethers.rs](https://github.com/gakonst/ethers-rs) to perform JSON-RPC requests, so it can be used with any chain that ethers-rs is compatible with. This includes Ethereum, Optimism, Arbitrum, Polygon, BNB, and Avalanche.

A future version of `cryo` will be able to bypass JSON-RPC and query node data directly.

## Code Guide
- Code is arranged into the following crates:
    - `cryo_cli`: convert textual data into cryo function calls
    - `cryo_freeze`: core cryo code
    - `cryo_python`: cryo python adapter
    - `cryo_to_df`: procedural macro for generating dataset definitions
- Do not use panics (including `panic!`, `todo!`, `unwrap()`, and `expect()`) except in the following circumstances: tests, build scripts, lazy static blocks, and procedural macros.

## Documentation

1. [cryo help](#cryo-help)
2. [cryo syntax](#cryo-syntax)
3. [cryo datasets](#cryo-datasets)

#### cryo help

(output of `cryo help`)

```
cryo extracts blockchain data to parquet, csv, or json

Usage: cryo [OPTIONS] [DATATYPE]...

Arguments:
  [DATATYPE]...  datatype(s) to collect, use cryo datasets to see all available

Options:
      --remember    Remember current command for future use
  -v, --verbose     Extra verbosity
      --no-verbose  Run quietly without printing information to stdout
  -h, --help        Print help
  -V, --version     Print version

Content Options:
  -b, --blocks <BLOCKS>...           Block numbers, see syntax below
      --timestamps <TIMESTAMPS>...   Timestamp numbers in unix, overridden by blocks
  -t, --txs <TXS>...                 Transaction hashes, see syntax below
  -a, --align                        Align chunk boundaries to regular intervals,
                                     e.g. (1000 2000 3000), not (1106 2106 3106)
      --reorg-buffer <N_BLOCKS>      Reorg buffer, save blocks only when this old,
                                     can be a number of blocks [default: 0]
  -i, --include-columns [<COLS>...]  Columns to include alongside the defaults,
                                     use `all` to include all available columns
  -e, --exclude-columns [<COLS>...]  Columns to exclude from the defaults
      --columns [<COLS>...]          Columns to use instead of the defaults,
                                     use `all` to use all available columns
      --u256-types <U256_TYPES>...   Set output datatype(s) of U256 integers
                                     [default: binary, string, f64]
      --hex                          Use hex string encoding for binary columns
  -s, --sort [<SORT>...]             Columns(s) to sort by, `none` for unordered
      --exclude-failed               Exclude items from failed transactions

Source Options:
  -r, --rpc <RPC>                    RPC url [default: ETH_RPC_URL env var]
      --network-name <NETWORK_NAME>  Network name [default: name of eth_getChainId]

Acquisition Options:
  -l, --requests-per-second <limit>  Ratelimit on requests per second
      --max-retries <R>              Max retries for provider errors [default: 5]
      --initial-backoff <B>          Initial retry backoff time (ms) [default: 500]
      --max-concurrent-requests <M>  Global number of concurrent requests
      --max-concurrent-chunks <M>    Number of chunks processed concurrently
      --chunk-order <CHUNK_ORDER>    Chunk collection order (normal, reverse, or random)
  -d, --dry                          Dry run, collect no data

Output Options:
  -c, --chunk-size <CHUNK_SIZE>      Number of blocks per file [default: 1000]
      --n-chunks <N_CHUNKS>          Number of files (alternative to --chunk-size)
      --partition-by <PARTITION_BY>  Dimensions to partition by
  -o, --output-dir <OUTPUT_DIR>      Directory for output files [default: .]
      --subdirs <SUBDIRS>...         Subdirectories for output files
                                     can be `datatype`, `network`, or custom string
      --label <LABEL>                Label to add to each filename
      --overwrite                    Overwrite existing files instead of skipping
      --csv                          Save as csv instead of parquet
      --json                         Save as json instead of parquet
      --row-group-size <GROUP_SIZE>  Number of rows per row group in parquet file
      --n-row-groups <N_ROW_GROUPS>  Number of rows groups in parquet file
      --no-stats                     Do not write statistics to parquet files
      --compression <NAME [#]>...    Compression algorithm and level [default: lz4]
      --report-dir <REPORT_DIR>      Directory to save summary report
                                     [default: {output_dir}/.cryo/reports]
      --no-report                    Avoid saving a summary report

Dataset-specific Options:
      --address <ADDRESS>...         Address(es)
      --to-address <address>...      To Address(es)
      --from-address <address>...    From Address(es)
      --call-data <CALL_DATA>...     Call data(s) to use for eth_calls
      --function <FUNCTION>...       Function(s) to use for eth_calls
      --inputs <INPUTS>...           Input(s) to use for eth_calls
      --slot <SLOT>...               Slot(s)
      --contract <CONTRACT>...       Contract address(es)
      --topic0 <TOPIC0>...           Topic0(s) [aliases: event]
      --topic1 <TOPIC1>...           Topic1(s)
      --topic2 <TOPIC2>...           Topic2(s)
      --topic3 <TOPIC3>...           Topic3(s)
      --event-signature <SIG>...     Event signature for log decoding
      --inner-request-size <BLOCKS>  Blocks per request (eth_getLogs) [default: 1]
      --js-tracer <tracer>           Event signature for log decoding

Optional Subcommands:
      cryo help                      display help message
      cryo help syntax               display block + tx specification syntax
      cryo help datasets             display list of all datasets
      cryo help <DATASET(S)>         display info about a dataset
```

#### cryo syntax

(output of `cryo help syntax`)

```
Block specification syntax
- can use numbers                    --blocks 5000 6000 7000
- can use ranges                     --blocks 12M:13M 15M:16M
- can use a parquet file             --blocks ./path/to/file.parquet[:COLUMN_NAME]
- can use multiple parquet files     --blocks ./path/to/files/*.parquet[:COLUMN_NAME]
- numbers can contain { _ . K M B }  5_000 5K 15M 15.5M
- omitting range end means latest    15.5M: == 15.5M:latest
- omitting range start means 0       :700 == 0:700
- minus on start means minus end     -1000:7000 == 6001:7001
- plus sign on end means plus start  15M:+1000 == 15M:15.001M
- can use every nth value            2000:5000:1000 == 2000 3000 4000
- can use n values total             100:200/5 == 100 124 149 174 199

Timestamp specification syntax
- can use numbers                    --timestamp 5000 6000 7000
- can use ranges                     --timestamp 12M:13M 15M:16M
- can use a parquet file             --timestamp ./path/to/file.parquet[:COLUMN_NAME]
- can use multiple parquet files     --timestamp ./path/to/files/*.parquet[:COLUMN_NAME]
- can contain { _ . m h d w M y }    31_536_000 525600m 8760h 365d 52.143w 12.17M 1y
- omitting range end means latest    15.5M: == 15.5M:latest
- omitting range start means 0       :700 == 0:700
- minus on start means minus end     -1000:7000 == 6001:7001
- plus sign on end means plus start  15M:+1000 == 15M:15.001M
- can use n values total             100:200/5 == 100 124 149 174 199

Transaction specification syntax
- can use transaction hashes         --txs TX_HASH1 TX_HASH2 TX_HASH3
- can use a parquet file             --txs ./path/to/file.parquet[:COLUMN_NAME]
                                     (default column name is transaction_hash)
- can use multiple parquet files     --txs ./path/to/ethereum__logs*.parquet
```

#### cryo datasets

(output of `cryo help datasets`)

```
cryo datasets
─────────────
- address_appearances
- balance_diffs
- balance_reads
- balances
- blocks
- code_diffs
- code_reads
- codes
- contracts
- erc20_balances
- erc20_metadata
- erc20_supplies
- erc20_transfers
- erc20_approvals
- erc721_metadata
- erc721_transfers
- eth_calls
- four_byte_counts (alias = 4byte_counts)
- geth_calls
- geth_code_diffs
- geth_balance_diffs
- geth_storage_diffs
- geth_nonce_diffs
- geth_opcodes
- javascript_traces (alias = js_traces)
- logs (alias = events)
- native_transfers
- nonce_diffs
- nonce_reads
- nonces
- slots (alias = storages)
- storage_diffs (alias = slot_diffs)
- storage_reads (alias = slot_reads)
- traces
- trace_calls
- transactions (alias = txs)
- vm_traces (alias = opcode_traces)

dataset group names
───────────────────
- blocks_and_transactions: blocks, transactions
- call_trace_derivatives: contracts, native_transfers, traces
- geth_state_diffs: geth_balance_diffs, geth_code_diffs, geth_nonce_diffs, geth_storage_diffs
- state_diffs: balance_diffs, code_diffs, nonce_diffs, storage_diffs
- state_reads: balance_reads, code_reads, nonce_reads, storage_reads

use cryo help <DATASET> to print info about a specific dataset
```
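
The numeric shorthand and the `start:end:step` and `start:end/n` forms in the block specification syntax can be sketched as follows. This is a minimal illustration, not `cryo`'s parser: the function names are hypothetical, ranges are assumed to be end-exclusive (which is what the `2000:5000:1000` and `100:200/5` examples imply), and bad input yields `None` or an empty result rather than a panic, in keeping with the Code Guide.

```rust
/// Parse a block number written with shorthand: `_` separators are ignored and a
/// trailing `K`, `M`, or `B` multiplies by 1e3, 1e6, or 1e9. Decimal digits are
/// handled with integer math, so "15.001M" is exactly 15_001_000.
fn parse_block_number(s: &str) -> Option<u64> {
    let cleaned: String = s.chars().filter(|&c| c != '_').collect();
    let (body, scale) = match cleaned.chars().last()? {
        'K' => (&cleaned[..cleaned.len() - 1], 1_000u64),
        'M' => (&cleaned[..cleaned.len() - 1], 1_000_000),
        'B' => (&cleaned[..cleaned.len() - 1], 1_000_000_000),
        _ => (cleaned.as_str(), 1),
    };
    // Split "15.5" into whole digits "15" and fractional digits "5".
    let (whole, frac) = body.split_once('.').unwrap_or((body, ""));
    let mut value = whole.parse::<u64>().ok()?.checked_mul(scale)?;
    if !frac.is_empty() {
        let digits = frac.parse::<u64>().ok()?;
        let denom = 10u64.checked_pow(u32::try_from(frac.len()).ok()?)?;
        let scaled = digits.checked_mul(scale)?;
        // Reject inputs that do not land on a whole block, e.g. "1.5" with no suffix.
        if scaled % denom != 0 {
            return None;
        }
        value = value.checked_add(scaled / denom)?;
    }
    Some(value)
}

/// `start:end:step`: every `step`-th block from `start` up to, but not including, `end`.
fn every_nth(start: u64, end: u64, step: u64) -> Vec<u64> {
    match usize::try_from(step) {
        // step_by(0) would panic, so a zero step yields nothing.
        Ok(step) if step > 0 => (start..end).step_by(step).collect(),
        _ => Vec::new(),
    }
}

/// `start:end/n`: `n` values spread evenly from `start` to `end - 1`, rounded down.
fn n_values(start: u64, end: u64, n: u64) -> Vec<u64> {
    if n == 0 || end <= start {
        return Vec::new();
    }
    if n == 1 {
        return vec![start];
    }
    // u128 keeps the intermediate product from overflowing.
    let span = u128::from(end - 1 - start);
    (0..n)
        .map(|i| start + (span * u128::from(i) / u128::from(n - 1)) as u64)
        .collect()
}
```

With these definitions, `parse_block_number("15.5M")` gives `Some(15_500_000)`, `every_nth(2000, 5000, 1000)` gives `[2000, 3000, 4000]`, and `n_values(100, 200, 5)` reproduces `100 124 149 174 199` from the help text.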
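
The `--chunk-size` and `--align` options decide how a block range is split into output files. Below is a minimal sketch of that splitting, assuming half-open `[start, end)` ranges and assuming that, with alignment on, the partial chunks at either edge are kept (the help text does not say whether `cryo` keeps or skips them). `chunk_ranges` is a hypothetical name, not a `cryo` function.

```rust
/// Split the half-open block range `[start, end)` into chunks of at most `chunk_size`
/// blocks. With `align`, chunk boundaries fall on multiples of `chunk_size`, so only
/// the first and last chunks may be shorter; without it, chunks are counted from `start`.
fn chunk_ranges(start: u64, end: u64, chunk_size: u64, align: bool) -> Vec<(u64, u64)> {
    let mut chunks = Vec::new();
    if chunk_size == 0 {
        return chunks; // avoids division by zero and an endless loop
    }
    let mut lo = start;
    while lo < end {
        let next = if align {
            // Next multiple of chunk_size strictly above lo.
            (lo / chunk_size + 1).saturating_mul(chunk_size)
        } else {
            lo.saturating_add(chunk_size)
        };
        let hi = next.min(end);
        chunks.push((lo, hi));
        lo = hi;
    }
    chunks
}
```

This reproduces the contrast in the `--align` help line: over blocks 1106 to 4106 with 1000-block chunks, the unaligned split starts files at 1106, 2106, and 3106, while the aligned split puts its interior boundaries at 2000, 3000, and 4000.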
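
The Schema Design Guide asks that a `u256` column be exportable in several types. Standard Rust has no 256-bit integer, so this sketch uses `u128` as a stand-in to show why more than one representation is useful: the binary and string forms are exact, `f64` silently rounds above 2^53, and a narrow type like `u64` cannot hold every value. The function names mirror the column suffixes but are hypothetical, and returning `None` on overflow is this sketch's choice, not documented `cryo` behavior.

```rust
/// Big-endian bytes, left-padded to the 32 bytes a u256 occupies.
fn value_binary(v: u128) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[16..].copy_from_slice(&v.to_be_bytes());
    out
}

/// Exact decimal string.
fn value_string(v: u128) -> String {
    v.to_string()
}

/// Nearest f64; loses precision above 2^53.
fn value_f64(v: u128) -> f64 {
    v as f64
}

/// Exact only when the value fits in 64 bits; `None` otherwise.
fn value_u64(v: u128) -> Option<u64> {
    u64::try_from(v).ok()
}
```

For instance, `value_f64((1u128 << 53) + 1)` comes back as exactly 2^53, one less than the input, which is why the exact `binary` and `string` forms are part of the default `--u256-types`.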