{"id":50897755,"url":"https://github.com/cclgroupltd/ccl_chromium_reader","last_synced_at":"2026-07-03T16:01:23.757Z","repository":{"id":42374915,"uuid":"293560534","full_name":"cclgroupltd/ccl_chromium_reader","owner":"cclgroupltd","description":"(Sometimes partial) Python re-implementations of the technologies involved in reading various data sources in Chrome-esque applications.","archived":false,"fork":false,"pushed_at":"2026-06-08T09:49:06.000Z","size":243,"stargazers_count":232,"open_issues_count":13,"forks_count":47,"subscribers_count":9,"default_branch":"master","last_synced_at":"2026-06-08T11:25:16.374Z","etag":null,"topics":["cache","chrome","dfir","digitalforensics","indexeddb","leveldb","localstorage","python","sessionstorage","snappy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cclgroupltd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-09-07T15:08:37.000Z","updated_at":"2026-06-08T09:49:36.000Z","dependencies_parsed_at":"2024-09-09T14:44:31.648Z","dependency_job_id":"8645ef07-b58d-49f1-b8a2-00a9f41a500d","html_url":"https://github.com/cclgroupltd/ccl_chromium_reader","commit_stats":null,"previous_names":["cclgroupltd/ccl_chromium_reader","cclgroupltd/ccl_chrome_indexeddb"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cclgroupltd/ccl_chromium_reader","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/cclgroupltd%2Fccl_chromium_reader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cclgroupltd%2Fccl_chromium_reader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cclgroupltd%2Fccl_chromium_reader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cclgroupltd%2Fccl_chromium_reader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cclgroupltd","download_url":"https://codeload.github.com/cclgroupltd/ccl_chromium_reader/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cclgroupltd%2Fccl_chromium_reader/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35092185,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-03T02:00:05.635Z","response_time":110,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cache","chrome","dfir","digitalforensics","indexeddb","leveldb","localstorage","python","sessionstorage","snappy"],"created_at":"2026-06-16T01:31:30.078Z","updated_at":"2026-07-03T16:01:23.747Z","avatar_url":"https://github.com/cclgroupltd.png","language":"Python","funding_links":[],"categories":["chrome"],"sub_categories":[],"readme":"# ccl_chromium_reader\nThis repository contains a package of (sometimes partial)\nre-implementations of the technologies 
used by Chrome/Chromium/Chrome-esque\napplications to store data in a range of data-stores in Python. These libraries \nprovide programmatic access to these data-stores with a digital forensics slant\n(e.g. for most artefacts, offsets or IDs for the data are provided so that they \ncan be located and manually checked).\n\nThe technologies supported are:\n* Snappy decompression\n* LevelDB\n* Protobuf\n* Pickles\n* V8 object deserialization\n* Blink object deserialization\n* IndexedDB\n* Web Storage (Local Storage and Session Storage)\n* Cache (both Block File and Simple formats)\n* SNSS Session files (partial support)\n* FileSystem API\n* Notifications API (Platform Notifications)\n* Downloads (from shared_proto_db)\n* History\n\nAdditionally, there are a number of utility scripts included such as:\n* `ccl_chromium_cache.py` - when run as a command line tool, dumps\n  the cache and all HTTP header information.\n* `ccl_chrome_audit.py` - a tool which can be used to scan the data-stores supported\n  by the included libraries, plus a couple more, for records related to a host -\n  designed as a research tool into data stored by web apps.\n\n\n## Python Versions\nThe code in this library was written and tested using Python 3.10. It *should* work\nwith 3.9, but uses language features which were not present in earlier versions.\nSome parts of the library will probably work OK going back a few versions, but if\nyou report bugs related to any version before 3.10, the first question will be: can\nyou upgrade to 3.10?\n\n## A Note On Requirements\nThis repository contains a `requirements.txt` in the pip format. 
Other than `Brotli`,\nthe dependencies listed are only required for the `ccl_chrome_audit.py` script or \nwhen using the `ccl_chromium_cache` module as a script for dumping the cache; the \nlibraries work using only the other scripts in this repository and the Python \nstandard library.\n\n## Documentation\nThe documentation in the libraries is currently sparser than ideal, but some \nrecent work has been undertaken to add more usage strings and fill in some gaps\nin the type-hints. We welcome pull requests to fill in gaps in the documentation.\n\n## ccl_chrome_audit\nThis script audits multiple data stores in a Chrom(e|ium) profile folder based on\na fragment (regex) of a host name. It is designed to aid in research into web apps\nby quickly highlighting what data related to that domain is stored where (also of\nuse with Electron apps etc.)\n\n### Caveats\nAt the moment, the script is designed primarily for use on Windows and on the \nhost where the data was populated (this is because Cookie decryption is\nachieved using DPAPI). \n\n### Usage\n```\nccl_chrome_audit.py \u003cchrome profile folder\u003e [cache folder (for mobile)]\n```\n\n### Current Supported Data Sources\n* Bookmarks\n* History\n* Downloads (from History)\n* Downloads (from shared_proto_db)\n* Favicons\n* Cache\n* Cookies\n* Local Storage\n* Session Storage\n* IndexedDB\n* File System API\n* Platform Notifications \n* Logins\n* Sessions (SNSS)\n\n\n## ChromiumProfileFolder\nThe `ChromiumProfileFolder` class is intended to act as a convenient entry-point to\nmuch of the useful functionality in the package. 
It performs on-demand loading of \ndata, so the \"start-up cost\" of using this object over the individual modules \nis near-zero, but with the advantage of better searching and filtering \nfunctionality built in and an easier interface to bring together data from these\ndifferent sources.\n\nIn this version `ChromiumProfileFolder` supports the following data-stores:\n* History\n* Cache\n* IndexedDB\n* Local Storage\n* Session Storage\n\nTo use the object, simply pass the path of the profile folder into the constructor\n(the object supports the context manager interface):\n\n```python\nimport pathlib\nfrom ccl_chromium_reader import ChromiumProfileFolder\n\nprofile_path = pathlib.Path(\"profile path goes here\")\n\nwith ChromiumProfileFolder(profile_path) as profile:\n    ...  # do things with the profile\n```\n\nMost of the methods of the `ChromiumProfileFolder` object which retrieve data can \nsearch/filter through a `KeySearch` interface which in essence is one of: \n* a `str`, in which case the search will try to exactly match the value\n* a collection of `str` (e.g., `list` or `tuple`), in which case the search will\n  try to exactly match one of the values contained therein\n* a `re.Pattern`, in which case the search attempts to match the pattern anywhere\n  in the value (same as `re.search`)\n* a function which takes a `str` and returns a `bool` indicating whether it's a\n  match.\n\n```python\nimport re\nimport pathlib\nfrom ccl_chromium_reader import ChromiumProfileFolder\n\nprofile_path = pathlib.Path(\"profile path goes here\")\n\nwith ChromiumProfileFolder(profile_path) as profile:\n    # Match one of two possible hosts exactly, then a regular expression for the key\n    for ls_rec in profile.iter_local_storage(\n            storage_key=[\"http://not-a-real-url1.com\", \"http://not-a-real-url2.com\"], \n            script_key=re.compile(r\"message\\d{1,3}?-text\")):\n        print(ls_rec.value)\n        \n    # Match all urls which end with 
\"\u0026read=1\"\n    for hist_rec in profile.iterate_history_records(url=lambda x: x.endswith(\"\u0026read=1\")):\n        print(hist_rec.title, hist_rec.url)\n\n```\n\n## IndexedDB\nThe `ccl_chromium_indexeddb.py` library processes IndexedDB data found in Chrome et al. \n\n### Blog\nRead a blog on the subject here: https://www.cclsolutionsgroup.com/post/indexeddb-on-chromium\n\n### Caveats\nThere is a fair amount of work yet to be done in terms of documentation, but \nthe modules should be fine for pulling data out of IndexedDB, with the following\ncaveats:\n\n#### LevelDB deleted data\nThe LevelDB module will spit out live and deleted/old versions of records\nindiscriminately; it's possible to differentiate between them with some\nwork, but that hasn't really been baked into the modules as they currently\nstand. So you are getting deleted data \"for free\" currently...whether you\nwant it or not.\n\n#### Blink data types\nI am fairly satisfied that all the possible V8 object types are accounted for\n(but I'm happy to be shown otherwise and get that fixed of course!), but it\nis likely that the hosted Blink objects aren't all there yet; so if you hit\nupon an error coming from inside ccl_blink_value_deserializer and can point\nme towards test data, I'd be very thankful!\n\n#### Cyclic references\nIt is noted in the V8 source that recursive referencing is possible in the\nserialization, we're not yet accounting for that so if Python throws a\n`RecursionError` that's likely what you're seeing. The plan is to use a \nsimilar approach to ccl_bplist where the collection types are subclassed and\ndo Just In Time resolution of the items, but that isn't done yet.\n\n## Using the modules\nThere are two methods for accessing records - a more pythonic API using a set of \nwrapper objects and a raw API which doesn't mask the underlying workings. 
There is\nunlikely to be much benefit to using the raw API in most cases, so the wrapper objects\nare recommended unless you have a compelling reason otherwise.\n\n### Wrapper API\n```python\nimport sys\nfrom ccl_chromium_reader import ccl_chromium_indexeddb\n\n# assuming command line arguments are paths to the .leveldb and .blob folders\nleveldb_folder_path = sys.argv[1]\nblob_folder_path = sys.argv[2]\n\n# open the indexedDB:\nwrapper = ccl_chromium_indexeddb.WrappedIndexDB(leveldb_folder_path, blob_folder_path)\n\n# You can check the databases present using `wrapper.database_ids`\n\n# Databases can be accessed from the wrapper in a number of ways:\ndb = wrapper[2]  # accessing database using id number\ndb = wrapper[\"MyTestDatabase\"]  # accessing database using name (only valid for single origin indexedDB instances)\ndb = wrapper[\"MyTestDatabase\", \"file__0@1\"]  # accessing the database using name and origin\n# NB using name and origin is likely the preferred option in most cases\n\n# The wrapper object also supports checking for databases using `in`\n\n# You can check for object store names using `db.object_store_names`\n\n# Object stores can be accessed from the database in a number of ways:\nobj_store = db[1]  # accessing object store using id number\nobj_store = db[\"store\"]  # accessing object store using name\n\n# Records can then be accessed by iterating the object store in a for-loop\nfor record in obj_store.iterate_records():\n    print(record.user_key)\n    print(record.value)\n\n    # if this record contained a FileInfo object somewhere linking\n    # to data stored in the blob dir, we could access that data like\n    # so (assume the \"file\" key in the record value is our FileInfo):\n    with record.get_blob_stream(record.value[\"file\"]) as f:\n        file_data = f.read()\n\n# By default, any errors in decoding records will bubble an exception \n# which might be painful when iterating records in a for-loop, so either\n# passing True into the 
errors_to_stdout argument and/or by passing in an \n# error handler function to bad_deserializer_data_handler, you can \n# perform logging rather than crashing:\n\nfor record in obj_store.iterate_records(\n        errors_to_stdout=True, \n        bad_deserializer_data_handler=lambda k, v: print(f\"error: {k}, {v}\")):\n    print(record.user_key)\n    print(record.value)\n```\n\n### Raw access API\n```python\nimport sys\nfrom ccl_chromium_reader import ccl_chromium_indexeddb\n\n# assuming command line arguments are paths to the .leveldb and .blob folders\nleveldb_folder_path = sys.argv[1]\nblob_folder_path = sys.argv[2]\n\n# open the database:\ndb = ccl_chromium_indexeddb.IndexedDb(leveldb_folder_path, blob_folder_path)\n\n# there can be multiple databases, so we need to iterate through them (NB \n# DatabaseID objects contain additional metadata, they aren't just ints):\nfor db_id_meta in db.global_metadata.db_ids:\n    # and within each database, there will be multiple object stores so we\n    # will need to know the maximum object store number (this process will be\n    # cleaned up in future releases):\n    max_objstore_id = db.get_database_metadata(\n            db_id_meta.dbid_no, \n            ccl_chromium_indexeddb.DatabaseMetadataType.MaximumObjectStoreId)\n    \n    # if the above returns None, then there are no stores in this db\n    if max_objstore_id is None:\n        continue\n\n    # there may be multiple object stores, so again, we iterate through them\n    # this time based on the id number. 
Object stores start at id 1 and the\n    # max_objstore_id is inclusive:\n    for obj_store_id in range(1, max_objstore_id + 1):\n        # now we can ask the indexeddb wrapper for all records for this db\n        # and object store:\n        for record in db.iterate_records(db_id_meta.dbid_no, obj_store_id):\n            print(f\"key: {record.user_key}\")\n            print(f\"value: {record.value}\")\n\n            # if this record contained a FileInfo object somewhere linking\n            # to data stored in the blob dir, we could access that data like\n            # so (assume the \"file\" key in the record value is our FileInfo):\n            with record.get_blob_stream(record.value[\"file\"]) as f:\n                file_data = f.read()\n```\n\n## Local Storage\n`ccl_chromium_localstorage` contains functionality to read the Local Storage data from\na Chromium/Chrome profile folder.\n\n### Blog\nRead a blog on the subject here: https://www.cclsolutionsgroup.com/post/chromium-session-storage-and-local-storage\n\n### Using the module\n\nAn example showing how to iterate all records, grouped by host, is shown below:\n```python\nimport sys\nimport pathlib\nfrom ccl_chromium_reader import ccl_chromium_localstorage\n\nlevel_db_in_dir = pathlib.Path(sys.argv[1])\n\n# Create the LocalStoreDb object which is used to access the data\nwith ccl_chromium_localstorage.LocalStoreDb(level_db_in_dir) as local_storage:\n    for storage_key in local_storage.iter_storage_keys():\n        print(f\"Getting records for {storage_key}\")\n      \n        for record in local_storage.iter_records_for_storage_key(storage_key):\n            # we can attempt to associate this record with a batch, which may\n            # provide an approximate timestamp (within 5-60 seconds) for this\n            # record.\n            batch = local_storage.find_batch(record.leveldb_seq_number)\n            timestamp = batch.timestamp if batch else None\n            print(record.leveldb_seq_number, 
record.script_key, record.value, sep=\"\\t\")\n\n```\n\n## Session Storage\n`ccl_chromium_sessionstorage` contains functionality to read the Session Storage data from\na Chromium/Chrome profile folder.\n\n### Blog\nRead a blog on the subject here: https://www.cclsolutionsgroup.com/post/chromium-session-storage-and-local-storage\n\n### Using the module\nAn example showing how to iterate all records, grouped by host, is shown below:\n\n```python\nimport sys\nimport pathlib\nfrom ccl_chromium_reader import ccl_chromium_sessionstorage\n\nlevel_db_in_dir = pathlib.Path(sys.argv[1])\n\n# Create the SessionStoreDb object which is used to access the data\nwith ccl_chromium_sessionstorage.SessionStoreDb(level_db_in_dir) as session_storage: \n    for host in session_storage.iter_hosts():\n        print(f\"Getting records for {host}\")\n        for record in session_storage.iter_records_for_host(host):\n            print(record.leveldb_sequence_number, record.key, record.value)\n\n```\n\n## Cache\n`ccl_chromium_cache` contains functionality for reading Chromium cache data (both \nblock file and simple cache formats). It can be used to programmatically access \ncache data and metadata (including HTTP headers).\n\n### CLI\nExecuting the module as a script allows you to dump a cache (either format) and \ncollate all metadata into a CSV file.\n\n```\nUSAGE: ccl_chromium_cache.py \u003ccache input dir\u003e \u003cout dir\u003e\n\n```\n\n### Using the module\nThe `main()` function (which provides the CLI) in the module shows the full \nprocess of detecting the cache type and reading data and metadata from the cache.\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcclgroupltd%2Fccl_chromium_reader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcclgroupltd%2Fccl_chromium_reader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcclgroupltd%2Fccl_chromium_reader/lists"}