{"id":50608249,"url":"https://github.com/datavil/framex","last_synced_at":"2026-06-06T01:00:29.207Z","repository":{"id":254047892,"uuid":"845330154","full_name":"datavil/framex","owner":"datavil","description":"A light-weight, dataset obtaining library for fast prototyping, tutorial creation, and experimenting.","archived":false,"fork":false,"pushed_at":"2026-03-22T19:17:26.000Z","size":5421,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-03-23T10:56:22.236Z","etag":null,"topics":["data-analysis","data-fetching","data-science","dataframe","datasets","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datavil.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-08-21T03:27:44.000Z","updated_at":"2026-03-22T19:17:29.000Z","dependencies_parsed_at":"2024-08-21T04:41:57.857Z","dependency_job_id":"447690e4-7b79-44ec-92ce-4bab0f5db6fa","html_url":"https://github.com/datavil/framex","commit_stats":null,"previous_names":["zaf4/frames","zaf4/framex"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/datavil/framex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datavil%2Fframex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datavil%2Fframex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/datavil%2Fframex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datavil%2Fframex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datavil","download_url":"https://codeload.github.com/datavil/framex/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datavil%2Fframex/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33965591,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-05T02:00:06.157Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-fetching","data-science","dataframe","datasets","visualization"],"created_at":"2026-06-06T01:00:18.877Z","updated_at":"2026-06-06T01:00:29.141Z","avatar_url":"https://github.com/datavil.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Banner](https://github.com/datavil/framex/blob/master/.github/framex_banner_narrower.png?raw=true)](https://framex.datavil.org)\nA [Datavil](https://datavil.org) project.\n\n# FrameX\n\n[![GitHub](https://img.shields.io/badge/GitHub-100000?style=flat\u0026logo=github\u0026logoColor=white)](https://github.com/DataVil/framex) [![PyPI](https://img.shields.io/pypi/v/framex?color=blue)](https://pypi.org/project/framex/)\n\n**FrameX** is a 
light-weight, dataset fetching library for fast **prototyping**, **tutorial creation**, and **experimenting**. FrameX currently has over **80** datasets available.\n\n\nBuilt on top of [Polars](https://pola.rs/).\n\n\n\n\n## Installation\n\nTo get started, install the library with:\n\n``` shell\npip install framex\n```\n\n## Usage\n\n### Python\n\n``` python\nimport framex as fx\n```\n\n#### Loading datasets\n\n``` python\niris = fx.load(\"iris\")\n```\n\nis equivalent to\n\n``` python\nfrom framex import iris\n```\n\nwhich returns a [**polars DataFrame**](https://docs.pola.rs/api/python/stable/reference/dataframe/index.html).\\\nTherefore, you can use all the **polars** functions and methods on the returned **DataFrame**.\n\n``` python\niris.head()\n```\n\n``` text\nshape: (5, 5)\n┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐\n│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species │\n│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---     │\n│ f32          ┆ f32         ┆ f32          ┆ f32         ┆ str     │\n╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡\n│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa  │\n│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa  │\n│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa  │\n│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa  │\n│ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ setosa  │\n└──────────────┴─────────────┴──────────────┴─────────────┴─────────┘\n```\n\n``` python\niris = fx.load(\"iris\", lazy=True)\n```\n\nwhich returns a [**polars LazyFrame**](https://docs.pola.rs/api/python/stable/reference/lazyframe/index.html).\n\nBy default (`cache=True`), both operations create local copies of the datasets.\n\n#### Available datasets\n\nTo see the list of available datasets, run:\n\n``` python\nfx.available()\n```\n\n\n``` python\n{'remote': ['iris', 'mpg', 'netflix', 
'starbucks', 'titanic'], 'local': ['titanic']}\n```\nPS: shortened for clarity.\n\nwhich returns a dictionary of both **locally** and **remotely** available datasets.\n\nTo see only **local** or **remote** datasets, run:\n\n``` python\nfx.available(\"local\")\nfx.available(\"remote\")\n```\n\n``` python\n{'local': ['titanic']}\n{'remote': ['iris', 'mpg', 'netflix', 'starbucks', 'titanic']}\n```\n\n#### Getting information on Datasets\n\nTo get information on a dataset, run:\n\n``` python\nfx.about(\"mpg\") # basically the same as `fx.about(\"mpg\", mode=\"print\")`\n```\n\nwhich will print the information on the dataset as follows:\n\n``` text\nNAME    : mpg\nSOURCE  : https://www.kaggle.com/datasets/uciml/autompg-dataset\nLICENSE : CC0: Public Domain\nORIGIN  : Kaggle\nOG NAME : autompg-dataset\n```\n\nOr you can get the information as a single-row polars DataFrame by running:\n\n``` python\nrow = fx.about(\"mpg\", mode=\"row\")\nprint(row)\n```\n\nwhich will print the information on the dataset as an **ASCII art** table:\n\n``` text\nshape: (1, 4)\n┌──────┬─────────────────────────────────┬────────────────────┬────────┐\n│ name ┆ source                          ┆ license            ┆ origin │\n│ ---  ┆ ---                             ┆ ---                ┆ ---    │\n│ str  ┆ str                             ┆ str                ┆ str    │\n╞══════╪═════════════════════════════════╪════════════════════╪════════╡\n│ mpg  ┆ https://www.kaggle.com/dataset… ┆ CC0: Public Domain ┆ Kaggle │\n└──────┴─────────────────────────────────┴────────────────────┴────────┘\n```\n\nor you can simply treat `row` as a polars DataFrame in your code.\n\n#### Getting Dataset URLs\n\nIn case you need the file links:\n\n``` python\nurl_pokemon = fx.get_url(\"pokemon\")\n```\n\nBy default, the format is \"feather\".\n\nOptionally, you can specify the format of the dataset.\n\n``` python\nurl_pokemon_csv = fx.get_url(\"pokemon\", 
format=\"csv\")\n```\n\n### CLI\n\nThe framex CLI has a slight overhead of around 400 milliseconds due to imports. However, operations still take less than a second, unless bottlenecked by the download speed.\n\nTo see all the available commands, run:\n``` shell\nfx -h\n```\n\n![Banner](https://github.com/datavil/framex/blob/master/.github/mainCLI.png?raw=true)\n\n\n#### get\n\nGet a single dataset (to the current directory):\n\n``` shell\nfx get iris\n```\n\nor get multiple datasets:\n\n``` shell\nfx get iris mpg titanic\n```\n\nwhich will download dataset(s) to the current directory.\n\nTo get the datasets into the cache directory:\n\n``` shell\nfx get iris mpg titanic --cache\n```\n\nor to a specific directory:\n\n``` shell\nfx get iris mpg titanic --dir data\n```\n\n#### list\n\nTo get the names of the available datasets on the **remote server**:\n\n``` shell\nfx list\n```\n\nThis will list all available datasets on the remote server.\n\n\nTo get the names of the available datasets that include \"dia\":\n``` shell\nfx list dia\n```\n\n``` shell\nLocally available datasets: (feather, parquet, csv, other)\n\nRemote datasets:\ndiamonds\n```\n\n#### about\n\nTo get information on a dataset or datasets, run:\n\n``` shell\nfx about mpg iris\n```\n\n#### show\n\nTo show a preview of a single dataset:\n\n``` shell\nfx show iris\n```\n\n#### describe\n\nTo describe (or summarize) a dataset:\n\n``` shell\nfx describe iris\n```\n\nFor more parameters:\n\n``` shell\nfx get --help\n```\n\n#### bring\n\nBring a dataset to the current directory from cache:\n\n``` shell\nfx bring iris\n```\n\nor bring multiple datasets:\n\n``` shell\nfx bring iris mpg titanic\n```\n\nwhich will bring dataset(s) to the current directory from the cache 
directory.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatavil%2Fframex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatavil%2Fframex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatavil%2Fframex/lists"}