{"id":30988461,"url":"https://github.com/paradigmxyz/absorb","last_synced_at":"2025-09-12T17:44:00.613Z","repository":{"id":314393569,"uuid":"956859112","full_name":"paradigmxyz/absorb","owner":"paradigmxyz","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-08T06:12:56.000Z","size":831,"stargazers_count":137,"open_issues_count":5,"forks_count":12,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-09-12T07:52:45.564Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/paradigmxyz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-03-29T01:54:31.000Z","updated_at":"2025-09-02T16:53:29.000Z","dependencies_parsed_at":"2025-09-12T07:52:48.534Z","dependency_job_id":"21ded9c9-dceb-422d-b799-f2bd5e83096d","html_url":"https://github.com/paradigmxyz/absorb","commit_stats":null,"previous_names":["paradigmxyz/absorb"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/paradigmxyz/absorb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradigmxyz%2Fabsorb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradigmxyz%2Fabsorb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradigmxyz%2Fabsorb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradigmxyz%2Fabsorb/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/paradigmxyz","download_url":"https://codeload.github.com/paradigmxyz/absorb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradigmxyz%2Fabsorb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274848309,"owners_count":25360981,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-12T02:00:09.324Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-12T17:43:55.569Z","updated_at":"2025-09-12T17:44:00.602Z","avatar_url":"https://github.com/paradigmxyz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![image](https://github.com/user-attachments/assets/7323b83e-fc5b-496c-b67b-bad6a188873b)\n\n# absorb 🧽🫧🫧\n\n`absorb` makes it easy to 1) collect, 2) manage, 3) query, and 4) customize datasets from nearly any data source\n\n🚧 ***this is a preview release of beta software, and it is still under active development*** 🚧\n\n## Features\n- **limitless dataset library**: access to millions of datasets across 20+ diverse data sources\n- **intuitive cli+python interfaces**: collect or query any dataset in a single line of code\n- **maximal modularity**: built on open standards for frictionless integration with other tools\n- **easy extensibility**: add new 
datasets or data sources with just a few lines of code\n\n## Contents\n1. [Installation](#installation)\n2. [Example Usage](#example-usage)\n    1. [Command Line](#example-command-line-usage)\n    2. [Python](#example-python-usage)\n3. [Supported Data Sources](#supported-data-sources)\n4. [Output Format](#output-format)\n5. [Configuration](#configuration)\n\n\n## Installation\n\nbasic installation\n```bash\nuv tool install paradigm_absorb\n```\n\ninstall with all extras\n```bash\nuv tool install paradigm_absorb[test,datasources,interactive]\n```\n\ninstall from source\n```bash\ngit clone git@github.com:paradigmxyz/absorb.git\ncd absorb\nuv tool install --editable .[test,datasources,interactive]\n```\n\n\n## Example Usage\n\n#### Example Command Line Usage\n\n```bash\n# collect dataset and save as local files\nabsorb collect kalshi\n\n# list datasets that are collected or available\nabsorb ls\n\n# show schemas of dataset\nabsorb schema kalshi\n\n# create new custom dataset\nabsorb new custom_dataset\n\n# upload custom dataset\nabsorb upload custom_dataset\n```\n\n#### Example Python Usage\n\n```python\nimport absorb\n\n# collect dataset and save as local files\nabsorb.collect('kalshi.metrics')\n\n# get schemas of dataset\nschema = absorb.get_schema('kalshi.metrics')\n\n# query dataset eagerly, as polars DataFrame\ndf = absorb.query('kalshi.metrics')\n\n# query dataset lazily, as polars LazyFrame\nlf = absorb.query('kalshi.metrics', lazy=True)\n\n# upload custom dataset\nabsorb.upload('source.table')\n```\n\n\n## Supported Data Sources\n\n🚧 under construction 🚧\n\n`absorb` collects data from each of these sources:\n\n- [4byte](https://www.4byte.directory) function and event signatures\n- [allium](https://www.allium.so) crypto data platform\n- [bigquery](https://cloud.google.com/blockchain-analytics/docs/supported-datasets) crypto ETL datasets\n- [binance](https://data.binance.vision) trades and OHLC candles on the Binance CEX\n- 
[blocknative](https://docs.blocknative.com/data-archive/mempool-archive) Ethereum mempool archive\n- [chain_ids](https://github.com/ethereum-lists/chains) chain IDs\n- [coingecko](https://www.coingecko.com/) token prices\n- [cryo](https://github.com/paradigmxyz/cryo) EVM datasets\n- [defillama](https://defillama.com) DeFi data\n- [dune](https://dune.com) tables and queries\n- [fred](https://fred.stlouisfed.org) federal macroeconomic data\n- [git](https://git-scm.com) commits, authors, and file diffs of a repo\n- [growthepie](https://www.growthepie.xyz) L2 metrics\n- [kalshi](https://kalshi.com) prediction market metrics\n- [l2beat](https://l2beat.com) L2 metrics\n- [mempool dumpster](https://mempool-dumpster.flashbots.net) Ethereum mempool archive\n- [snowflake](https://www.snowflake.com/) generalized data platform\n- [sourcify](https://sourcify.dev) verified contracts\n- [tic](https://ticdata.treasury.gov) US Treasury Department data\n- [tix](https://github.com/paradigmxyz/tix) price feeds\n- [vera](https://verifieralliance.org) verified contract archives\n- [xatu](https://github.com/ethpandaops/xatu-data) many Ethereum datasets\n\nTo list all available datasets and data sources, type `absorb ls` on the command line.\n\nTo display information about the schema and other metadata of a dataset, type `absorb help \u003cDATASET\u003e` on the command line.\n\n\n## Output Format\n\n`absorb` uses the filesystem as its database. 
Each dataset is stored as a collection of parquet files, either on local disk or in the cloud.\n\nDatasets can be stored in any location on your disks, and absorb will use symlinks to organize those files in the `ABSORB_ROOT` tree.\n\nthe `ABSORB_ROOT` filesystem directory is organized as:\n\n```\n{ABSORB_ROOT}/\n    datasets/\n        \u003csource\u003e/\n            tables/\n                \u003cdatatype\u003e/\n                    {filename}.parquet\n                table_metadata.json\n            repos/\n                {repo_name}/\n    absorb_config.json\n```\n\n## Configuration\n\n`absorb` uses a config file to specify which datasets to track.\n\nSchema of `absorb_config.json`:\n\n```python\n{\n    'version': str,\n    'tracked_tables': list[TableDict],\n    'use_git': bool,\n    'default_bucket': {\n        'rclone_remote': str | None,\n        'bucket_name': str | None,\n        'path_prefix': str | None,\n        'provider': str | None,\n    },\n}\n```\n\nschema of `dataset_config.json`:\n\n```python\n{\n    'source_name': str,\n    'table_name': str,\n    'table_class': str,\n    'parameters': dict[str, JSONValue],\n    'table_version': str,\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fparadigmxyz%2Fabsorb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fparadigmxyz%2Fabsorb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fparadigmxyz%2Fabsorb/lists"}