{"id":16672865,"url":"https://github.com/bgutter/personal-data-mining","last_synced_at":"2026-05-01T08:32:51.116Z","repository":{"id":83119671,"uuid":"189898378","full_name":"bgutter/personal-data-mining","owner":"bgutter","description":"A collection of Python libraries for mining personal data. Intended to be used from an Emacs org-babel notebook, or, in an IPython REPL.","archived":false,"fork":false,"pushed_at":"2021-01-10T18:35:13.000Z","size":33,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-12-26T20:14:23.140Z","etag":null,"topics":["ipython","pandas","personal-data","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bgutter.png","metadata":{"files":{"readme":"README.org","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-06-02T21:45:20.000Z","updated_at":"2025-03-21T20:57:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"9f94d554-9b1f-4ad0-82b9-37d67694ed87","html_url":"https://github.com/bgutter/personal-data-mining","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bgutter/personal-data-mining","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgutter%2Fpersonal-data-mining","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgutter%2Fpersonal-data-mining/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgutter%2Fpersonal-data-mining/releases","manifests_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgutter%2Fpersonal-data-mining/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bgutter","download_url":"https://codeload.github.com/bgutter/personal-data-mining/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgutter%2Fpersonal-data-mining/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32490810,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"online","status_checked_at":"2026-05-01T02:00:05.856Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ipython","pandas","personal-data","python"],"created_at":"2024-10-12T12:07:32.828Z","updated_at":"2026-05-01T08:32:51.097Z","avatar_url":"https://github.com/bgutter.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"#+TITLE: Personal Data Mining\n\nThis repo consists of a collection of Python packages that I used to\nanalyze my personal data. Generally speaking, they're intended to be\nused either in an Emacs org-file using org-babel, or, in an IPython\nREPL.\n\n* Cash Account Exports\n\n=cash_reader= is a small convenience library for analyzing CSV exports from Mint.com and Tiller Money. 
Under the hood, it's mostly just Pandas.\n\n** Quick Start\n\n#+begin_src python\nfrom cash_ledger import *\n\n# Read in the transactions\n# The numbers in all examples below have been replaced with 1's and 2's at random\ntransactions = CashLedger.from_mint( \"./transactions-from-mint.csv\" )\n\n# Remove any zero-sum transfer transaction pairs (moving cash between accounts, credit card payments, etc)\n# I always do this immediately after loading data from file, because I never care about these transactions, and\n# it's a little too slow to run on each query.\ntransactions = transactions.transfers( invert=True )\n\n# Where did most of my discretionary spending occur in 2020?\ntransactions.in_accounts([\"Mulligan Bank\", \"Discover\"]).expenses().in_year( 2020 ).in_categories( \"Shopping\" ).by_description().totals().sort_values()\n# Out[305]:\n# description\n# Amazon                                  -1212.22\n# Target                                  -1212.11\n# Ab Abebooks                              -121.12\n# Name: amount, dtype: float64\n\n# Aside from the augmented APIs we'll show below, the transactions object can be treated as a pandas DataFrame\n# We can use the pandas DataFrame.sample() method to get 5 random transactions\ntransactions.sample( 5 )\n# Out[232]:\n#                                          description                               original_description             category             account  Labels  Notes   amount       date\n# 8078                                    Examplething  EXAMPLE THING numbers 34234 HERE 32323023=23=2...             Shopping       Mulligan Bank     NaN    NaN  -221.00 2015-11-01\n# 2209                                          Amazon               AMAZON GO AMZN.COM/BILLWA12121212212             Shopping            Discover     NaN    NaN   -12.12 2019-09-24\n# 167   Internet transfer to Interest Checking account  Internet transfer to Interest Checking account...             
Transfer  Some Checking Acct     NaN    NaN -1212.00 2020-12-06\n# 1847                                           Geico     GEICO *AUTO 1212121212121212121212121212121212       Auto Insurance            Discover     NaN    NaN   -12.12 2019-12-05\n# 1583                                     - THANK YOU                         ONLINE PAYMENT - THANK YOU  Credit Card Payment       Mulligan Bank     NaN    NaN   121.12 2020-01-31\n\n# Get yearly total income\ntransactions.income().yearly().totals()\n# Out[215]:\n# date\n# 2013     12121.12\n# 2014     21211.12\n# 2015    212121.12\n# 2016    222111.11\n# 2017    212121.12\n# 2018    121221.21\n# 2019    212221.22\n# 2020    212212.22\n# Name: amount, dtype: float64\n\n# Get all money spent at Walmart (and wal-mart, WAL-MART, etc)\ntransactions.search( \"wal.*mart\" ).expenses().total()\n# Out[212]: -1212.12\n\n# Get all spending per category in 2018\ntransactions.when( after=\"1/1/2018\", before=\"1/1/2019\" ).expenses().by_category().totals()\n# Out[213]:\n# category\n# Advertising           -12.12\n# Air Travel          -1212.12\n# Alcohol \u0026 Bars      -1212.12\n# Amusement             -12.12\n# Auto \u0026 Transport     -121.12\n#                       ...\n# Travel                -12.12\n# Tuition              -121.12\n# Utilities           -1212.12\n# Vacation              -12.12\n# Name: amount, Length: 21, dtype: float64\n\n# Get all accounts which were expensed in summer 2013\ntransactions.when( after=\"5/1/2013\", before=\"8/1/2013\" ).expenses().accounts()\n# Out[221]:\n# array(['FREE CHECKING x121212', 'Discover', 'STATEMENT SAVINGS x121212'],\n#       dtype=object)\n\n# Get all veterinary expenses in two accounts by year\ntransactions.expenses().in_accounts( [ \"Mulligan Bank\", \"Discover\" ] ).in_categories( \"Veterinary\" ).yearly().totals()\n# Out[234]:\n# date\n# 2017   -121.12\n# 2019   -121.22\n# 2020   -222.12\n# Name: amount, dtype: float64\n\n# Set the category for all income from the 
\"Discover\" account to \"Cash Back\"\n# Then recategorize transactions in any account named like \"EXX...\" as a Student Loan payment.\n# Then set the category for all transactions matching \"ford\" before 2016 as \"Auto Maintenance\"\ntransactions.recategorize( transactions.income().in_accounts( \"Discover\" ), \"Cash Back\", inplace=True )\ntransactions.recategorize( transactions.account_like( \"EXXX.*\" ), \"Student Loan\", inplace=True )\ntransactions.recategorize( transactions.expenses().when( before=\"1/1/2017\" ).search( \"ford\" ), \"Auto Maintenance\", inplace=True )\n\n# Get total spending per vendor (roughly) since the start of 2020\ntransactions.when( after=\"1/1/2020\" ).expenses().by_description().totals().sort_values()\n\n# Remembering that dataframe APIs are available, note that you can use .to_csv() to inspect any transaction subsets\ntransactions.search( \"conoco\" ).when( before=\"4/23/2015\" ).to_csv( \"./old-car-gas-purchases.csv\" )\n#+end_src\n\n** API Overview\n\nSee source comments for full documentation.\n\nAside from what is documented here, any valid =pandas.DataFrame= operation can be applied to a =TransactionsExport= object.\n\nSimilarly, any valid =pandas.GroupBy= operation may be applied to a =TransactionsExportCollection= object (which is returned by all of the grouping APIs).\n\n*** Filtering APIs\n\n| API                                                     | Description                                                                              |\n|---------------------------------------------------------+------------------------------------------------------------------------------------------|\n| =search( regex, invert=False )=                         | Keep only transactions whose description (original or final) contains regex. Invertible. |\n| =account_like( regex, invert=False )=                   | Keep only transactions whose account contains regex. Invertible.                         
|\n| =income()=                                              | Keep only transactions whose amount is more than zero.                                   |\n| =expenses()=                                            | Keep only transactions whose amount is less than or equal to zero.                       |\n| =transfers( invert=False, time_window=None )=           | Keep only transactions which are part of a transfer pair. Invertible.                    |\n| =when( after=None, before=None, invert=False )=         | Keep only transactions which occur in a time range. Invertible.                          |\n| =in_year( year )=                                       | Keep only transactions which occur in a particular year.                                 |\n| =with_amount( above=None, below=None, invert=False )=   | Keep only transactions which occur in an amount range. Invertible.                       |\n| =in_accounts( account_or_accounts, invert=False )=      | Keep only transactions occurring in a set of accounts. Invertible.                       |\n| =in_categories( category_or_categories, invert=False )= | Keep only transactions occurring in a set of categories. Invertible.                     |\n\n*** Editing APIs\n\n| API                                                | Description                                          |\n|----------------------------------------------------+------------------------------------------------------|\n| =recategorize( transaction_subset, new_category )= | Change the category for a selection of transactions. |\n\n*** Descriptive APIs\n\n| API                       | Description                                                                     |\n|---------------------------+---------------------------------------------------------------------------------|\n| =accounts()=              | Get all unique accounts referenced in the current transaction set.              
|\n| =categories()=            | Get all unique categories referenced in the current transaction set.            |\n| =descriptions()=          | Get all unique descriptions referenced in the current transaction set.          |\n| =original_descriptions()= | Get all unique original descriptions referenced in the current transaction set. |\n| =total()=                 | Get the sum of all amounts of all transactions in the current set.              |\n\n*** Grouping APIs\n\nAll grouping APIs return a =TransactionsExportCollection=.\n\n| API                         | Description                                 |\n|-----------------------------+---------------------------------------------|\n| =by_category()=             | Group transactions by category.             |\n| =by_account()=              | Group transactions by account.              |\n| =by_description()=          | Group transactions by description.          |\n| =by_original_description()= | Group transactions by original description. |\n| =yearly()=                  | Group transactions by calendar year.        |\n| =monthly()=                 | Group transactions by month.                |\n| =weekly()=                  | Group transactions by week.                 |\n| =daily()=                   | Group transactions by day.                  |\n\n*** Grouped (TransactionsExportCollection) APIs\n\n| API                    | Description                                   |   |\n|------------------------+-----------------------------------------------+---|\n| =totals()=             | Applies \".total()\" to each group.             |   |\n| =transaction_counts()= | Get the number of transactions in each group. 
|   |\n\n* Stock Account Tracking\n\nSimilar to cash account API, documentation is TODO\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbgutter%2Fpersonal-data-mining","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbgutter%2Fpersonal-data-mining","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbgutter%2Fpersonal-data-mining/lists"}