# datason

[![CI](https://github.com/danielendler/datason/actions/workflows/ci.yml/badge.svg)](https://github.com/danielendler/datason/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/danielendler/datason/graph/badge.svg?token=UYL9LvVb8O)](https://codecov.io/gh/danielendler/datason)
[![PyPI version](https://img.shields.io/pypi/v/datason.svg)](https://pypi.org/project/datason/)
[![Python versions](https://img.shields.io/pypi/pyversions/datason.svg)](https://pypi.org/project/datason/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Docs](https://img.shields.io/badge/docs-GitHub%20Pages-blue)](https://danielendler.github.io/datason/)

**Drop-in replacement for `json.dumps`/`json.loads` that handles datetime, NumPy, Pandas, PyTorch, and 50+ Python types. Zero dependencies.**

```python
import datason
import datetime as dt
import numpy as np

# Just replace json.dumps with datason.dumps — everything else works
datason.dumps({"ts": dt.datetime.now(), "scores": np.array([0.9, 0.1])})
```

No more `TypeError: Object of type datetime is not JSON serializable`.

## Install

```bash
pip install datason                    # Core (zero dependencies)
pip install datason[numpy]             # + NumPy support
pip install datason[pandas]            # + Pandas support
pip install datason[ml]                # + PyTorch, TensorFlow, scikit-learn, SciPy
pip install datason[all]               # Everything
```

Requires Python 3.10+.

## Quick Start

```python
import datason
import datetime as dt
import uuid
from decimal import Decimal
from pathlib import Path

# Works exactly like json for simple data
datason.dumps({"name": "Alice", "age": 30})
# '{"name": "Alice", "age": 30}'

# But also handles complex types that json.dumps cannot
data = {
    "timestamp": dt.datetime(2024, 6, 15, 10, 30),
    "id": uuid.uuid4(),
    "price": Decimal("19.99"),
    "config_path": Path("/data/models"),
}
json_str = datason.dumps(data)

# And brings them back on deserialization
restored = datason.loads(json_str)
assert isinstance(restored["timestamp"], dt.datetime)
assert isinstance(restored["id"], uuid.UUID)
```

### NumPy + Pandas

```python
import numpy as np
import pandas as pd
import datason

# NumPy arrays serialize with shape and dtype preserved
arr = np.array([[1.0, 2.0], [3.0, 4.0]])
json_str = datason.dumps(arr)
restored = datason.loads(json_str)
assert isinstance(restored, np.ndarray)
assert restored.shape == (2, 2)

# Pandas DataFrames serialize as records by default
df = pd.DataFrame({"name": ["Alice", "Bob"], "score": [95.5, 87.3]})
json_str = datason.dumps(df)
restored = datason.loads(json_str)
assert isinstance(restored, pd.DataFrame)
```

### ML Frameworks

```python
import torch
import datason

# PyTorch tensors
tensor = torch.randn(3, 3)
json_str = datason.dumps({"weights": tensor})
restored = datason.loads(json_str)
assert isinstance(restored["weights"], torch.Tensor)

# Also supports: TensorFlow tensors, scikit-learn models, SciPy sparse matrices
```

## API — 5 Functions

```python
import datason

datason.dumps(obj, **config)    # Serialize to JSON string
datason.loads(s, **config)      # Deserialize from JSON string
datason.dump(obj, fp, **config) # Write to file
datason.load(fp, **config)      # Read from file
datason.config(**config)        # Context manager for temp config
```

That's the entire public API.

## Supported Types

| Category | Types |
|----------|-------|
| **JSON primitives** | `str`, `int`, `float`, `bool`, `None`, `dict`, `list` |
| **Stdlib** | `datetime`, `date`, `time`, `timedelta`, `UUID`, `Decimal`, `complex`, `Path`, `set`, `tuple`, `frozenset` |
| **NumPy** | `ndarray`, `integer`, `floating`, `bool_`, `complexfloating` |
| **Pandas** | `DataFrame`, `Series`, `Timestamp`, `Timedelta` |
| **PyTorch** | `Tensor` |
| **TensorFlow** | `Tensor`, `EagerTensor` |
| **scikit-learn** | All estimators (`LinearRegression`, `RandomForestClassifier`, etc.) |
| **SciPy** | Sparse matrices (`csr`, `csc`, `coo`, etc.) |
| **Polars** | `DataFrame`, `Series` |
| **JAX** | `Array` |
| **Plotly** | `Figure` |

All non-core types are optional — install the relevant extra (`numpy`, `pandas`, `ml`).

## Configuration

```python
import datason
from datason import DateFormat, NanHandling, DataFrameOrient

# Inline overrides
datason.dumps(data, sort_keys=True)
datason.dumps(data, date_format=DateFormat.UNIX)
datason.dumps(data, nan_handling=NanHandling.STRING)
datason.dumps(data, include_type_hints=False)  # Smaller output, no round-trip

# Context manager for scoped config
with datason.config(sort_keys=True, nan_handling=NanHandling.STRING):
    datason.dumps(data)

# Presets for common use cases
from datason import ml_config, api_config, strict_config, performance_config

with datason.config(**ml_config().__dict__):
    datason.dumps(model_output)   # UNIX_MS dates, fallback to string

with datason.config(**api_config().__dict__):
    datason.dumps(response)       # ISO dates, sorted keys, no type hints
```

### Config Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `date_format` | `DateFormat` | `ISO` | How to serialize datetimes: `ISO`, `UNIX`, `UNIX_MS`, `STRING` |
| `dataframe_orient` | `DataFrameOrient` | `RECORDS` | DataFrame format: `RECORDS`, `SPLIT`, `DICT`, `LIST`, `VALUES` |
| `nan_handling` | `NanHandling` | `NULL` | Float NaN/Inf: `NULL`, `STRING`, `KEEP`, `DROP` |
| `include_type_hints` | `bool` | `True` | Emit type metadata for round-trip fidelity |
| `sort_keys` | `bool` | `False` | Sort dict keys in output |
| `max_depth` | `int` | `50` | Max nesting depth (security) |
| `max_size` | `int` | `100_000` | Max dict/list size (security) |
| `fallback_to_string` | `bool` | `False` | `str()` unknown types instead of raising |
| `strict` | `bool` | `True` | Raise on unrecognized type metadata |
| `redact_fields` | `tuple[str, ...]` | `()` | Field names to redact |
| `redact_patterns` | `tuple[str, ...]` | `()` | Regex patterns to redact from strings |

## Security Features

### PII Redaction

```python
# Redact by field name (case-insensitive substring match)
datason.dumps(user_data, redact_fields=("password", "key", "secret", "token"))
# {"username": "alice", "password": "[REDACTED]", "api_key": "[REDACTED]"}

# Redact patterns in string values (built-in: email, ssn, credit_card, phone_us, ipv4)
datason.dumps(data, redact_patterns=("email", "ssn"))
```

### Integrity Verification

```python
from datason.security.integrity import wrap_with_integrity, verify_integrity

# Wrap with hash-based integrity envelope
wrapped = wrap_with_integrity(datason.dumps(data))
is_valid, payload = verify_integrity(wrapped)

# HMAC with secret key
wrapped = wrap_with_integrity(datason.dumps(data), key="secret")
is_valid, payload = verify_integrity(wrapped, key="secret")
```

### Built-in Limits
- **Max depth**: 50 (prevents stack overflow from nested data)
- **Max size**: 100,000 items per dict/list (prevents memory exhaustion)
- **Circular reference detection** (prevents infinite loops)

All limits raise `SecurityError` and are configurable.

## How It Works

datason uses a plugin-based architecture. Every type beyond JSON primitives is handled by a `TypePlugin` registered in a priority-sorted registry:

```
Your object --> dumps() --> Plugin registry --> Type-specific serializer --> JSON
JSON string --> loads() --> Plugin registry --> Type-specific deserializer --> Your object
```

Type metadata is embedded as `{"__datason_type__": "datetime", "__datason_value__": "2024-01-15T10:30:00"}`, enabling lossless round-trips.

### Writing a Custom Plugin

```python
from decimal import Decimal

from datason._protocols import TypePlugin, SerializeContext, DeserializeContext
from datason._registry import default_registry
from datason._types import TYPE_METADATA_KEY, VALUE_METADATA_KEY

class MoneyPlugin:
    name = "money"
    priority = 400  # 400+ for user plugins

    def can_handle(self, obj):
        return isinstance(obj, Money)

    def serialize(self, obj, ctx):
        return {TYPE_METADATA_KEY: "Money", VALUE_METADATA_KEY: {"amount": str(obj.amount), "currency": obj.currency}}

    def can_deserialize(self, data):
        return data.get(TYPE_METADATA_KEY) == "Money"

    def deserialize(self, data, ctx):
        v = data[VALUE_METADATA_KEY]
        return Money(Decimal(v["amount"]), v["currency"])

default_registry.register(MoneyPlugin())
```

## For AI Agents

datason includes [`llms.txt`](llms.txt) and [`llms-full.txt`](llms-full.txt) for AI agent discoverability. The full reference contains complete API signatures, all config options, and ready-to-use code examples.

## Documentation

Full documentation at **[danielendler.github.io/datason](https://danielendler.github.io/datason/)**.

## License

MIT
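## Appendix: Round-Trip Envelope Sketch

The `__datason_type__`/`__datason_value__` envelope described under How It Works can be imitated with the stdlib `json` module alone. This is a minimal sketch of the round-trip idea using `json.dumps(default=...)` and `json.loads(object_hook=...)` — not datason's actual implementation; only the two metadata key names come from the README.

```python
import datetime as dt
import json

# Metadata key names as documented in the README's How It Works section.
TYPE_KEY = "__datason_type__"
VALUE_KEY = "__datason_value__"

def encode(obj):
    """json.dumps `default` hook: wrap a non-JSON type in a metadata envelope."""
    if isinstance(obj, dt.datetime):
        return {TYPE_KEY: "datetime", VALUE_KEY: obj.isoformat()}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode(d):
    """json.loads `object_hook`: unwrap a recognized envelope into a rich type."""
    if d.get(TYPE_KEY) == "datetime":
        return dt.datetime.fromisoformat(d[VALUE_KEY])
    return d

payload = {"ts": dt.datetime(2024, 1, 15, 10, 30)}
wire = json.dumps(payload, default=encode)
restored = json.loads(wire, object_hook=decode)
assert restored["ts"] == payload["ts"]  # lossless round-trip
```

Because the envelope is plain JSON, consumers without datason (or this hook) still get a readable dict rather than an opaque blob.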
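## Appendix: Priority-Sorted Registry Sketch

The priority-sorted plugin dispatch from How It Works can likewise be sketched in a few lines of plain Python. The names here (`Registry`, `find`, the toy plugins) are illustrative, not datason's internal API; only `register`, `can_handle`, and the "higher priority wins" rule mirror the README.

```python
class Registry:
    """Toy plugin registry: dispatch scans plugins from highest priority down."""

    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        self._plugins.append(plugin)
        # Keep the list sorted so find() can take the first match.
        self._plugins.sort(key=lambda p: p.priority, reverse=True)

    def find(self, obj):
        # First plugin that claims the object wins.
        return next((p for p in self._plugins if p.can_handle(obj)), None)

class SetPlugin:
    priority = 100  # specific handler

    def can_handle(self, obj):
        return isinstance(obj, set)

class FallbackPlugin:
    priority = 0  # catch-all, consulted last

    def can_handle(self, obj):
        return True

registry = Registry()
registry.register(FallbackPlugin())
registry.register(SetPlugin())
assert isinstance(registry.find({1, 2}), SetPlugin)       # specific plugin wins
assert isinstance(registry.find("text"), FallbackPlugin)  # fallback catches the rest
```

Sorting once at registration keeps `find` a single linear scan, which is why user plugins can override built-ins simply by declaring a higher priority (400+ per the custom-plugin example).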
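## Appendix: Field-Name Redaction Sketch

Field-name redaction as documented under Security Features (case-insensitive substring match on key names) can be approximated with a small recursive helper. This illustrates the documented matching rule only; the function name `redact` and the traversal are assumptions, not datason's code.

```python
def redact(obj, fields, placeholder="[REDACTED]"):
    """Replace values whose dict key contains any field name, case-insensitively."""
    if isinstance(obj, dict):
        return {
            k: placeholder
            if any(f.lower() in k.lower() for f in fields)
            else redact(v, fields, placeholder)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [redact(v, fields, placeholder) for v in obj]
    return obj

user = {"username": "alice", "password": "hunter2", "api_key": "abc123"}
print(redact(user, ("password", "key")))
# {'username': 'alice', 'password': '[REDACTED]', 'api_key': '[REDACTED]'}
```

Note how substring matching catches `api_key` via `"key"` while leaving `username` intact — the same behavior shown in the PII Redaction example above.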