{"id":16137396,"url":"https://github.com/purarue/git_doc_history","last_synced_at":"2026-05-17T04:45:30.047Z","repository":{"id":57434463,"uuid":"471846255","full_name":"purarue/git_doc_history","owner":"purarue","description":"copy/track file history in git, with python bindings to traverse and extract history/files/lines at some date","archived":false,"fork":false,"pushed_at":"2024-10-24T23:19:02.000Z","size":43,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-12T22:51:15.569Z","etag":null,"topics":["data","git"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/git-doc-history/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/purarue.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-20T00:49:20.000Z","updated_at":"2024-10-24T23:19:06.000Z","dependencies_parsed_at":"2024-10-26T12:10:49.663Z","dependency_job_id":null,"html_url":"https://github.com/purarue/git_doc_history","commit_stats":{"total_commits":15,"total_committers":1,"mean_commits":15.0,"dds":0.0,"last_synced_commit":"0f6da4013d6f6da1a255d612a062a456c8694ead"},"previous_names":["purarue/git_doc_history","seanbreckenridge/git_doc_history"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/purarue%2Fgit_doc_history","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/purarue%2Fgit_doc_history/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/purarue%2Fgit_doc_history/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/purarue%2Fgit_doc_history/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/purarue","download_url":"https://codeload.github.com/purarue/git_doc_history/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247517919,"owners_count":20951719,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","git"],"created_at":"2024-10-09T23:26:48.816Z","updated_at":"2025-10-24T00:06:01.697Z","avatar_url":"https://github.com/purarue.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# git_doc_history\n\nCopy and track files in `git`, with a library to traverse the history\n\nI use this to track my [`todo.txt`](https://github.com/todotxt/todo.txt-cli) files, changes to configuration files, any shell histories which don't support timestamps (see all of my config files [here](https://github.com/purarue/dotfiles/tree/master/.config/git_doc_history))\n\nThis copies the files to a different directory, so it doesn't interfere with the application/configuration\n\nBy copying those files to a separate directory, I can always roll back to a previous file, or see what the file was like a couple days/months ago.\n\nFor shell histories/files which consist of unique lines of text (e.g., my `todo.txt` file) this also lets me estimate timestamps for when new lines were added to the history/text files, using the `iter_commit_snapshots` and 
`parse_snapshot_diffs` below, which emits added/removed events for individual lines with estimated times\n\nThis was mostly created for [HPI](https://github.com/purarue/HPI), so I don't have to rewrite the code to extract lines from git history over and over\n\nThis is a general purpose solution for tracking file history in `git` -- so it's not extremely opinionated. In some cases it can be seen as a stop-gap solution, to have some file versioning in case you ever want to roll back. It may work particularly well for basic files with a couple dozen lines (e.g. I use it for RSS feeds, `todo.txt`, bookmarks, and a couple history files)\n\n## Installation\n\nRequires `python3.10+`\n\nTo install with pip, run:\n\n```\npip install git_doc_history\n```\n\n## Usage\n\nThe main script to back up data is the bash script [`bin/git_doc_history`](bin/git_doc_history), which gets installed into your `~/.local/bin/` directory.\n\nIt uses a config file (parsed with [`python-dotenv`](https://github.com/theskumar/python-dotenv) -- so you can use bash-like syntax to grab environment variables) like:\n\n```\nSOURCE_DIR=~/.todo  # copy from\nBACKUP_DIR=~/data/todo_git_history # copy to\n# multiple lines means multiple files\nCOPY_FILES=\"todo.txt\ndone.txt\"\n```\n\nYou can either provide the full path to that config file, or place the file in `~/.config/git_doc_history`\n\nFor example, after placing it at `~/.config/git_doc_history/todo` -- to copy/commit any changes, run:\n\n```bash\n$ git_doc_history todo\n```\n\n```\nGenerated configuration:\nSOURCE_DIR: /home/username/data/todo\nBACKUP_DIR: /home/username/data/todo_git_history\nCOPY_FILES: todo.txt\ndone.txt\n'/home/username/data/todo/todo.txt' -\u003e '/home/username/data/todo_git_history/todo.txt'\n'/home/username/data/todo/done.txt' -\u003e '/home/username/data/todo_git_history/done.txt'\n'/home/username/data/todo/.gitignore' -\u003e '/home/username/data/todo_git_history/.gitignore'\n[master f927490] update\n 1 file changed, 1 
insertion(+)\n create mode 100644 .gitignore\n```\n\nThat uses `python3 -m git_doc_history shell todo` to parse the configuration file, like:\n\n```bash\neval \"$(python3 -m git_doc_history shell todo)\"\n```\n\nThe python library comes with a small CLI interface to extract a file from some time ago:\n\n```\n$ python3 -m git_doc_history extract-file-at --at 2020-09-20 -c todo todo.txt -\nsetup command of completion\n```\n\nThe `BACKUP_DIR` is of course just a regular git directory -- you can `reset --hard` to some point in the past to get rid of recent commits, `rebase`/`squash` to merge commits or do whatever you please\n\n### Library Usage\n\nMost things will be done with `git_doc_history.DocHistory`\n\nThis doesn't assume the filetype is readable text (you may be storing images/binary doc files in the git repository), so the default is to return the data as `bytes` -- you can `.decode(\"utf-8\")` to convert that to readable text\n\nTo traverse the entire history:\n\n```python\nfrom git_doc_history import DocHistory\nfrom git_doc_history.config import parse_config, resolve_config\n\n# parse the config from the env file\ndoc = DocHistory.from_dict(parse_config(resolve_config(\"todo\")))\n\n# iterate through the history for the todo.txt file\nfor snapshot in doc.iter_commit_snapshots(\"todo.txt\"):\n    print(str(snapshot.commit_sha))\n    print(str(snapshot.dt))\n    print(snapshot.data.decode(\"utf-8\"))\n```\n\n#### Parsing Diffs\n\nIterates through the git history in chronological order, keeping track\nof when data was added or removed. By default, this parses the `file`\ngiven by splitting it into lines. 
If lines are added/removed, this returns an\nevent which specifies when in the history, and what was added/removed\n\nAlternatively, you can pass a `parse_func`, a function which\naccepts the `DocHistorySnapshot` and returns a list of hashable items\nto store as state\n\nFor an example of parsing diffs, see [`examples/todotxt_diff.py`](examples/todotxt_diff.py).\n\nExample output looks something like:\n\n```\nadded 2022-03-08 12:14:45 (C) 2022-03-08 create shebang script +programming\nremoved 2022-03-08 13:14:58 (C) 2022-03-08 create shebang script +programming\nadded 2022-03-08 22:23:39 save formhistory.sqlite in browserexport\nremoved 2022-03-08 23:23:45 save formhistory.sqlite in browserexport\nadded 2022-03-09 02:49:58 (C) create a python fzf wrapper because apparently I can't find a good one\nadded 2022-03-10 16:24:24 (B) 2022-03-10 create plaintext playlist parser module +music\nremoved 2022-03-11 01:30:49 (B) 2022-03-10 create plaintext playlist parser module +music\nadded 2022-03-11 10:37:06 (C) 2022-03-11 sync tmux from home directory +programming\nadded 2022-03-12 03:44:24 install undotree +vim +programming\nremoved 2022-03-12 04:44:51 (C) 2022-03-11 sync tmux from home directory +programming\nremoved 2022-03-12 10:51:20 install undotree +vim +programming\n```\n\nIn this case, 'removed' would mean I either changed the text on the line, or (more likely) I completed it\n\n### Tests\n\n```bash\ngit clone 'https://github.com/purarue/git_doc_history'\ncd ./git_doc_history\npip install '.[testing]'\npytest\nflake8 ./git_doc_history\nmypy ./git_doc_history\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpurarue%2Fgit_doc_history","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpurarue%2Fgit_doc_history","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpurarue%2Fgit_doc_history/lists"}