{"id":13737398,"url":"https://github.com/dylan-profiler/visions","last_synced_at":"2025-05-15T01:09:56.803Z","repository":{"id":40330911,"uuid":"227633867","full_name":"dylan-profiler/visions","owner":"dylan-profiler","description":"Type System for Data Analysis in Python","archived":false,"fork":false,"pushed_at":"2025-02-01T23:40:28.000Z","size":39720,"stargazers_count":212,"open_issues_count":18,"forks_count":19,"subscribers_count":6,"default_branch":"develop","last_synced_at":"2025-05-03T05:02:27.074Z","etag":null,"topics":["data-analysis","data-science","hacktoberfest","numpy","pandas","python","spark","type-inference","type-system"],"latest_commit_sha":null,"homepage":"https://dylan-profiler.github.io/visions/visions/getting_started/usage/types.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dylan-profiler.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-12-12T15:09:01.000Z","updated_at":"2025-04-30T04:05:57.000Z","dependencies_parsed_at":"2024-02-05T22:47:26.405Z","dependency_job_id":"9a463c31-53f3-42b6-a891-6de92931c37e","html_url":"https://github.com/dylan-profiler/visions","commit_stats":{"total_commits":878,"total_committers":12,"mean_commits":73.16666666666667,"dds":0.520501138952164,"last_synced_commit":"a0b55bbf95e6efe001195e4b497358d6283966b5"},"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylan-profiler%2Fvisions","tags_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/dylan-profiler%2Fvisions/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylan-profiler%2Fvisions/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylan-profiler%2Fvisions/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dylan-profiler","download_url":"https://codeload.github.com/dylan-profiler/visions/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254059510,"owners_count":22007769,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-science","hacktoberfest","numpy","pandas","python","spark","type-inference","type-system"],"created_at":"2024-08-03T03:01:46.389Z","updated_at":"2025-05-15T01:09:51.793Z","avatar_url":"https://github.com/dylan-profiler.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"images/visions.png\" width=\"600px\"\u003e\u003cbr\u003e\n  \u003ci\u003eAnd these visions of data types, they kept us up past the dawn.\u003c/i\u003e \n\u003c/div\u003e\n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pypi.org/project/visions/\"\u003e\n    \u003cimg src=\"https://pepy.tech/badge/visions\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/visions/\"\u003e\n    \u003cimg src=\"https://pepy.tech/badge/visions/month\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/visions/\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/pypi/pyversions/visions\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/visions/\"\u003e\n    \u003cimg src=\"https://badge.fury.io/py/visions.svg\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://doi.org/10.21105/joss.02145\"\u003e\n    \u003cimg src=\"https://joss.theoj.org/papers/10.21105/joss.02145/status.svg\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://mybinder.org/v2/gh/dylan-profiler/visions/master\"\u003e\n    \u003cimg src=\"https://mybinder.org/badge_logo.svg\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n# The Semantic Data Library\n\n``Visions`` provides a set of tools for defining and using *semantic* data types.\n\n- [x] [Semantic type](https://dylan-profiler.github.io/visions/visions/getting_started/concepts.html#types) detection \u0026\n  inference on sequence data.\n\n- [x] Automated data processing\n\n- [x] Completely customizable. `Visions` makes it easy to build and modify semantic data types for domain specific\n  purposes\n\n- [x] Out of the box support for\n  multiple [backend implementations](https://github.com/dylan-profiler/visions#supported-frameworks) including pandas,\n  spark, numpy, and python\n\n- [x] A robust set\n  of [default types and typesets](https://dylan-profiler.github.io/visions/visions/getting_started/usage/defaults.html)\n  covering the most common use cases.\n\nCheck out the complete\ndocumentation [here](https://dylan-profiler.github.io/visions/visions/getting_started/introduction.html).\n\n## Installation\n\nSource code is available on [github](https://github.com/dylan-profiler/visions) and binary installers via pip.\n\n```\n# Pip\npip install visions\n```\n\nComplete installation instructions (including extras) are available in\nthe [docs](https://dylan-profiler.github.io/visions/visions/getting_started/installation.html).\n\n## Quick Start Guide\n\nIf you want to play immediately check out the examples folder\non 
[![](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dylan-profiler/visions/master). Otherwise,\nlet's get some data\n\n```python\nimport pandas as pd\n\ndf = pd.read_csv(\"https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv\")\ndf.head(2)\n```\n\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003ePassengerId\u003c/th\u003e\n      \u003cth\u003eSurvived\u003c/th\u003e\n      \u003cth\u003ePclass\u003c/th\u003e\n      \u003cth\u003eName\u003c/th\u003e\n      \u003cth\u003eSex\u003c/th\u003e\n      \u003cth\u003eAge\u003c/th\u003e\n      \u003cth\u003eSibSp\u003c/th\u003e\n      \u003cth\u003eParch\u003c/th\u003e\n      \u003cth\u003eTicket\u003c/th\u003e\n      \u003cth\u003eFare\u003c/th\u003e\n      \u003cth\u003eCabin\u003c/th\u003e\n      \u003cth\u003eEmbarked\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e1\u003c/td\u003e\n      \u003ctd\u003e0\u003c/td\u003e\n      \u003ctd\u003e3\u003c/td\u003e\n      \u003ctd\u003eBraund, Mr. Owen Harris\u003c/td\u003e\n      \u003ctd\u003emale\u003c/td\u003e\n      \u003ctd\u003e22.0\u003c/td\u003e\n      \u003ctd\u003e1\u003c/td\u003e\n      \u003ctd\u003e0\u003c/td\u003e\n      \u003ctd\u003eA/5 21171\u003c/td\u003e\n      \u003ctd\u003e7.2500\u003c/td\u003e\n      \u003ctd\u003eNaN\u003c/td\u003e\n      \u003ctd\u003eS\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e2\u003c/td\u003e\n      \u003ctd\u003e1\u003c/td\u003e\n      \u003ctd\u003e1\u003c/td\u003e\n      \u003ctd\u003eCumings, Mrs. 
John Bradley (Florence Briggs Thayer)\u003c/td\u003e\n      \u003ctd\u003efemale\u003c/td\u003e\n      \u003ctd\u003e38.0\u003c/td\u003e\n      \u003ctd\u003e1\u003c/td\u003e\n      \u003ctd\u003e0\u003c/td\u003e\n      \u003ctd\u003ePC 17599\u003c/td\u003e\n      \u003ctd\u003e71.2833\u003c/td\u003e\n      \u003ctd\u003eC85\u003c/td\u003e\n      \u003ctd\u003eC\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n\nThe most important abstractions in `visions` are Types, which represent semantic notions about data. You have access to\na range of well-tested types like `Integer`, `Float`, and `Files` covering the most common software development use\ncases.\nTypes can be bundled together into typesets. Behind the scenes, `visions` builds a traversable graph for any collection\nof types.\n\n```python\nfrom visions import types, typesets\n\n# CompleteSet is one of the builtin typesets; StandardSet is the basic one\ntypeset = typesets.CompleteSet()\ntypeset.plot_graph()\n```\n\n![](https://dylan-profiler.github.io/visions/_images/typeset_complete_base.svg)\nNote: Plots require pygraphviz to be [installed](https://pygraphviz.github.io/documentation/stable/install.html).\n\nBecause of the special relationship between types, these graphs can be used to detect the type of your data or _infer_ a\nmore appropriate one.\n\n```python\n# Detection looks like this\ntypeset.detect_type(df)\n\n# While inference looks like this\ntypeset.infer_type(df)\n\n# Inference works well even if we monkey with the data, say by converting everything to strings\ntypeset.infer_type(df.astype(str))\n\u003e\u003e {\n    'PassengerId': Integer,\n    'Survived': Integer,\n    'Pclass': Integer,\n    'Name': String,\n    'Sex': String,\n    'Age': Float,\n    'SibSp': Integer,\n    'Parch': Integer,\n    'Ticket': String,\n    'Fare': Float,\n    'Cabin': String,\n    'Embarked': String\n}\n```\n\n`Visions` solves many of the most common problems working with tabular data. For example, sequences of 
Integers are still\nrecognized as integers whether they have trailing decimal 0's from being cast to float, missing values, or something\nelse altogether. Much of this cleaning is performed automatically, providing cleaned and processed data as well.\n\n```python\ncleaned_df = typeset.cast_to_inferred(df)\n```\n\nThis is only a small taste of everything `visions` can do,\nincluding [building your own](https://dylan-profiler.github.io/visions/visions/getting_started/extending.html) domain\nspecific types and typesets, so please check out the [API](https://dylan-profiler.github.io/visions/visions/api.html)\ndocumentation or the [examples/](https://github.com/dylan-profiler/visions/tree/develop/examples) directory for more\ninfo!\n\n## Supported frameworks\n\nThanks to its dispatch-based implementation, `Visions` is able to exploit framework-specific capabilities offered by\nlibraries like pandas and spark. Currently it works with the following backends by default.\n\n- [Pandas](https://github.com/pandas-dev/pandas) (feature complete)\n- [Numpy](https://github.com/numpy/numpy) (boolean, complex, date time, float, integer, string, time deltas,\n  objects)\n- [Spark](https://github.com/apache/spark) (boolean, categorical, date, date time, float, integer, numeric, object,\n  string)\n- [Python](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range) (string, float, integer,\n  date time, time delta, boolean, categorical, object, complex - other datatypes are untested)\n\nIf you're using pandas, it will also take advantage of parallelization tools like\n[swifter](https://github.com/jmcarpenter2/swifter) if available.\n\nIt also offers a simple annotation-based API for registering new implementations as needed. 
For example, if you wished\nto extend the categorical data type to include a Dask-specific implementation, you might do something like:\n\n```python\nfrom visions.types.categorical import Categorical\nfrom pandas.api import types as pdt\nimport dask.dataframe  # load the submodule so dask.dataframe.Series resolves\n\n\n@Categorical.contains_op.register\ndef categorical_contains(series: dask.dataframe.Series, state: dict) -\u003e bool:\n    return pdt.is_categorical_dtype(series.dtype)\n```\n\n## Contributing and support\n\nContributions to `visions` are welcome. For more information, please visit the community\ncontributions [page](https://dylan-profiler.github.io/visions/visions/contributing/contributing.html) and join us\non [slack](https://join.slack.com/t/dylan-profiling/shared_invite/zt-11c9blvpt-AqxXD5AMS9Q6CO7UUm~cRw). The\ngithub [issues tracker](https://github.com/dylan-profiler/visions/issues/new/choose) is used for reporting bugs, feature\nrequests and support questions.\n\nAlso, please check out some of the other companies and packages using `visions`, including:\n\n* [pandas profiling](https://github.com/pandas-profiling/pandas-profiling)\n* [Compress*io*](https://github.com/dylan-profiler/compressio)\n* [Bitrook](https://www.bitrook.com/)\n\nIf you're currently using `visions` or would like to be featured here, please let us know.\n\n## Acknowledgements\n\nThis package is part of the [dylan-profiler](https://github.com/dylan-profiler) project. The package is a core component\nof [pandas-profiling](https://github.com/pandas-profiling/pandas-profiling). More information can be\nfound [here](https://dylan-profiler.github.io/visions/visions/background/about.html). 
This work was partially supported\nby [SIDN Fonds](https://www.sidnfonds.nl/projecten/dylan-data-analysis-leveraging-automatisation).\n\n![](https://github.com/dylan-profiler/visions/raw/master/images/SIDNfonds.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdylan-profiler%2Fvisions","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdylan-profiler%2Fvisions","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdylan-profiler%2Fvisions/lists"}