{"id":15674762,"url":"https://github.com/bobronium/duper","last_synced_at":"2025-05-06T22:13:47.661Z","repository":{"id":65409361,"uuid":"591831150","full_name":"Bobronium/duper","owner":"Bobronium","description":"20-50x faster deepcopy() replacement","archived":false,"fork":false,"pushed_at":"2023-08-17T11:19:56.000Z","size":43,"stargazers_count":15,"open_issues_count":2,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-06T22:13:38.903Z","etag":null,"topics":["copy","deepcopy","duplicator","fast","performance","python","slow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Bobronium.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-22T02:11:10.000Z","updated_at":"2024-11-15T10:34:02.000Z","dependencies_parsed_at":"2024-10-23T12:09:20.118Z","dependency_job_id":null,"html_url":"https://github.com/Bobronium/duper","commit_stats":{"total_commits":14,"total_committers":2,"mean_commits":7.0,"dds":0.0714285714285714,"last_synced_commit":"f2b4c385521808f5a4475e127b7ed87b2c385d85"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bobronium%2Fduper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bobronium%2Fduper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bobronium%2Fduper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bobronium%2Fdu
per/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Bobronium","download_url":"https://codeload.github.com/Bobronium/duper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252776600,"owners_count":21802469,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["copy","deepcopy","duplicator","fast","performance","python","slow"],"created_at":"2024-10-03T15:50:14.736Z","updated_at":"2025-05-06T22:13:47.212Z","avatar_url":"https://github.com/Bobronium.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# duper\n\n20-50x faster than `copy.deepcopy()` on mutable objects.\n\nAims to fill the gaps in performance and obscurity between copy, pickle, json and other serialization libraries, becoming the go-to library for copying objects within the same Python process.\n\n```shell\npip install duper\n```\n[Skip to FAQ](#faq)... \n\n\nNote: In its current implementation, `duper.deepdups(x)` might be 2-5 times slower than `copy.deepcopy()` for a single operation. Its design pays off when you need to create many identical copies of the same object.\n\nIf you have any feedback or ideas, please [open an issue on GitHub](https://github.com/Bobronium/duper/issues) or reach out via [bobronium@gmail.com](mailto:bobronium@gmail.com) or [Telegram](https://t.me/Bobronium).\n\n---\n\n### Showcase\n##### Using unreleased [timesup](https://github.com/Bobronium/timesup) library \\o/. 
I had planned to release it soon after this one, but had to spend my *time* elsewhere and put Open Source on pause. Hopefully, I'll make a first release later this year.\n\n```py\nimport duper\nimport copy\nfrom timesup import timesup\n\n\n@timesup(number=100000, repeats=3)\ndef reconstruction():\n    x = {\"a\": 1, \"b\": [(1, 2, 3), (4, 5, 6)], \"c\": [object(), object(), object()]}  # i\n\n    copy.deepcopy(x)         # ~0.00576 ms (deepcopy)\n    dup = duper.deepdups(x)  # ~0.03131 ms (duper_build)\n    dup()                    # ~0.00013 ms (duper_dup): 45.18 times faster than deepcopy\n```\n\n### Real use case\n#### Pydantic\n\u003cdetails\u003e\n\u003csummary\u003eModels definition\u003c/summary\u003e\n\n```py\nfrom datetime import datetime\nfrom functools import wraps\n\nimport duper\nfrom pydantic import BaseModel, Field\nfrom pydantic.fields import FieldInfo\n\n\nclass User(BaseModel):\n    id: int\n    name: str = \"John Doe\"\n    signup_ts: datetime | None = None\n    friends: list[int] = []\n    skills: dict[str, int] = {\n        \"foo\": {\"count\": 4, \"size\": None},\n        \"bars\": [\n            {\"apple\": \"x1\", \"banana\": \"y\"},\n            {\"apple\": \"x2\", \"banana\": \"y\"},\n        ],\n    }\n\n\n\n@wraps(Field)\ndef FastField(default, *args, **kwargs):\n    \"\"\"\n    Overrides the fields that need to be copied to have default_factories\n    \"\"\"    \n    default_factory = duper.deepdups(default)\n    field_info: FieldInfo = Field(*args, default_factory=default_factory, **kwargs)\n    return field_info\n\n\nclass FastUser(BaseModel):\n    id: int\n    name: str = FastField(\"John Doe\")\n    signup_ts: datetime | None = FastField(None)\n    friends: list[int] = FastField([])\n    skills: dict[str, int] = FastField(\n        {\n            \"foo\": {\"count\": 4, \"size\": None},\n            \"bars\": [\n                {\"apple\": \"x1\", \"banana\": \"y\"},\n                {\"apple\": \"x2\", \"banana\": \"y\"},\n           
 ],\n        }\n    )\n```\n\n\u003c/details\u003e\n\n```py\n@timesup(number=100000, repeats=3)\ndef pydantic_defaults():\n    User(id=42)        # ~0.00935 ms (with_deepcopy)\n    FastUser(id=1337)  # ~0.00292 ms (with_duper): 3.20 times faster than with_deepcopy\n\n```\n\n### FAQ\n#### What's wrong with `copy.deepcopy()`?\nWell, it's slow. [Extremely slow](https://stackoverflow.com/questions/24756712/deepcopy-is-extremely-slow), in fact. This has been noted by many, but [no equally powerful alternatives](https://stackoverflow.com/questions/1410615/copy-deepcopy-vs-pickle) were suggested.\n\n#### Why not just rewrite it in C or Rust?\n`deepcopy()` needs to examine an arbitrary Python object each time the copy is needed. I figured that this must be quite wasteful, regardless of whether the code that executes this algorithm is compiled or not, since interacting with Python objects inevitably invokes the slow Python interpreter.\n\nWhen I had a proof of concept, I discovered [gh-72793: C implementation of parts of copy.deepcopy](https://github.com/python/cpython/pull/91610), which further confirmed my assumptions.\n\n#### How is `duper` so fast without even being compiled?\nInstead of interacting with slow Python objects for each copy, it compiles concrete instructions to reproduce the object. There is still interpreter overhead when reconstructing the object, but now it already knows the exact actions that are needed and just executes them.\nInterestingly, I learned that this approach has a lot in common with how `pickle` and `marshal` work.\n\n#### How is it different from `pickle` or `marshal`?\nBoth are designed for `serialization`, so they need to dump objects to `bytes` that can be stored on disk and then used to reconstruct the object, even in a different Python process.\nThis creates many constraints on the data they can serialize, as well as the speed of reconstruction.\n\n`duper`, however, is not constrained by these problems. 
It only needs to guarantee that the object can be recreated within the same Python process, and it can use that to its advantage.\n\n#### Are there any drawbacks to this approach?\nPerhaps the only drawback is that it's non-trivial to implement.\nWhen it comes to using it, I can't see any fundamental drawbacks, only advantages.\n\nHowever, there are drawbacks to the current implementation. The approach itself boils down to getting a set of minimal instructions that will produce the needed object. But there are different ways to obtain this set of instructions. The fastest way would be to compile the instructions on the fly while deconstructing the object. However, for the sake of simplicity, I used a slower approach of building an AST that compiles to the desired bytecode. Removing this intermediate step should increase the performance of the initial construction by 20-50 times.\n\n#### Is this a drop-in replacement for `deepcopy`?\nNot quite yet, but it aims to be. \n\n#### How should I use it?\n`duper` shines when you need to make multiple copies of the same object.\n\nHere's an example where duper can help the most:\n```python\nimport copy\ndata = {\"a\": 1, \"b\": [[1, 2, 3], [4, 5, 6]]}\ncopies = [copy.deepcopy(data) for _ in range(10000)]\n```\nBy pre-compiling instructions in a separate one-time step, we eliminate all of the overhead from the copying phase: \n```python\nimport duper\ndata = {\"a\": 1, \"b\": [[1, 2, 3], [4, 5, 6]]}\nreconstruct_data = duper.deepdups(data)\ncopies = [reconstruct_data() for _ in range(10000)]\n```\n\n#### Is it production ready?\n[Hell no!](#-project-is-in-a-poc-state)\n\n### 🚧 Project is in a PoC state\nCurrent priorities\n- [x] Support for immutable types\n- [x] Support for builtin types\n- [x] Support for arbitrary types\n- [x] Partial support for `__deepcopy__` and `__copy__` overrides (memo is not respected)\n- [ ] Support for recursive structures\n- [ ] Find quirky corner cases\n- [ ] Make initial construction faster 
(potentially 30-50 times faster than current implementation)\n- [ ] Support memo in `__deepcopy__` and `__copy__` overrides\n\nThe project will be ready for release when `duper.deepdups(x)()` behaves the same as `copy.deepcopy()` and is at least as fast, if not faster. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbobronium%2Fduper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbobronium%2Fduper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbobronium%2Fduper/lists"}