https://github.com/astrowonk/mastodon_archive_reader
Read a mastodon archive and create a sqlite3 database of archived post content
archive full-text-search mastodon plotly-dash python sqlite3 text-search
- Host: GitHub
- URL: https://github.com/astrowonk/mastodon_archive_reader
- Owner: astrowonk
- License: mit
- Created: 2023-02-25T02:01:28.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-02-25T14:08:02.000Z (over 2 years ago)
- Last Synced: 2025-06-09T23:43:55.287Z (4 months ago)
- Topics: archive, full-text-search, mastodon, plotly-dash, python, sqlite3, text-search
- Language: Python
- Homepage:
- Size: 18.6 KB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
The `archive_reader.py` script (or the `ArchiveReader` class within it) reads the `outbox.json` file from your Mastodon archive (specifically, the posts you made) and creates a `main.db` sqlite3 database.
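As a rough illustration of what reading `outbox.json` involves, the sketch below pulls original posts out of an outbox dictionary. It is not the code from `archive_reader.py`. The function name `extract_posts`, the sample data, and the idea of deriving `int_id` from the trailing number of the post URL are assumptions made for this example. It relies on the outbox being an ActivityPub collection whose `orderedItems` list holds `Create` activities for your own posts.

```python
import json


def extract_posts(outbox: dict) -> list[dict]:
    """Return one record per original post in an outbox dictionary.

    Hypothetical helper for illustration; the real script may differ.
    """
    posts = []
    for activity in outbox.get("orderedItems", []):
        obj = activity.get("object")
        # Boosts ("Announce") carry only a URL string as their object,
        # so keep "Create" activities that embed a full post object.
        if activity.get("type") != "Create" or not isinstance(obj, dict):
            continue
        object_id = obj.get("id", "")
        # Assumption: the post URL ends in a numeric status id.
        tail = object_id.rstrip("/").rsplit("/", 1)[-1]
        posts.append(
            {
                "object_id": object_id,
                "int_id": int(tail) if tail.isdigit() else None,
                "published": obj.get("published"),
                "content": obj.get("content", ""),  # still HTML at this point
            }
        )
    return posts


# A tiny hand-written sample standing in for a real outbox.json.
sample_outbox = json.loads(
    """
    {
      "type": "OrderedCollection",
      "orderedItems": [
        {
          "type": "Create",
          "object": {
            "id": "https://example.social/users/me/statuses/109900000000000001",
            "published": "2023-02-24T12:00:00Z",
            "content": "<p>Hello from my archive</p>"
          }
        },
        {
          "type": "Announce",
          "object": "https://example.social/users/other/statuses/42"
        }
      ]
    }
    """
)

posts = extract_posts(sample_outbox)
```

With a real archive you would load the file with `json.load(open("archive_folder_name/outbox.json"))` instead of the inline sample. The `content` field is HTML, which is why the script depends on html2text to produce plain text.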
The database holds two tables and one view:
* `search_data`. This is a virtual table created with FTS5 that allows for full text search of your posts.
* `full_data`. This is every column from the archive that contains an `object_id`.
* `combined`. This is a view that combines the two tables above on the extracted `int_id` column.

Creating the sqlite database requires pandas and [html2text](https://pypi.org/project/html2text/).
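To make the layout above more concrete, here is a minimal sketch of an FTS5 table, a regular table, and a view joined on `int_id`, built in an in-memory database. Only the names `search_data`, `full_data`, `combined`, and `int_id` come from this README. The other columns (`content`, `object_id`, `published`), the sample rows, and the `search` helper are assumptions for illustration, so check the real schema of `main.db` before reusing the queries. It also assumes your Python's SQLite build includes FTS5.

```python
import sqlite3

# In-memory stand-in; the real script writes main.db instead.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Full-text index over post text; int_id is stored but not indexed.
    CREATE VIRTUAL TABLE search_data USING fts5(int_id UNINDEXED, content);

    -- Assumed subset of the archive columns.
    CREATE TABLE full_data (
        int_id INTEGER PRIMARY KEY,
        object_id TEXT,
        published TEXT
    );

    -- View joining the two tables on int_id.
    CREATE VIEW combined AS
        SELECT full_data.int_id, full_data.object_id,
               full_data.published, search_data.content
        FROM full_data
        JOIN search_data ON search_data.int_id = full_data.int_id;
    """
)

sample_rows = [
    (1, "https://example.social/users/me/statuses/1",
     "2023-02-01T10:00:00Z", "Trying out sqlite full text search"),
    (2, "https://example.social/users/me/statuses/2",
     "2023-02-02T11:00:00Z", "Exported my Mastodon archive today"),
    (3, "https://example.social/users/me/statuses/3",
     "2023-02-03T12:00:00Z", "Plotly Dash makes a quick search GUI"),
]
for int_id, object_id, published, content in sample_rows:
    conn.execute(
        "INSERT INTO full_data VALUES (?, ?, ?)",
        (int_id, object_id, published),
    )
    conn.execute(
        "INSERT INTO search_data (int_id, content) VALUES (?, ?)",
        (int_id, content),
    )


def search(connection: sqlite3.Connection, query: str) -> list[tuple]:
    """Run an FTS5 MATCH query and return (int_id, published, content) rows."""
    sql = """
        SELECT full_data.int_id, full_data.published, search_data.content
        FROM search_data
        JOIN full_data ON full_data.int_id = search_data.int_id
        WHERE search_data MATCH ?
        ORDER BY rank
    """
    return connection.execute(sql, (query,)).fetchall()


archive_hits = search(conn, "archive")
search_hits = search(conn, "search")
```

The `MATCH` operator uses the FTS5 query syntax, so phrases in double quotes, `AND`/`OR`/`NOT`, and prefix queries such as `mast*` also work. Ordering by the built-in `rank` column puts the best matches first.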
I also include a [Plotly Dash](https://dash.plotly.com) `app.py` to allow for GUI searching of the archive, using sqlite full text search ([FTS5](https://www.sqlite.org/fts5.html)) on the contents of the archived posts. You will need Plotly Dash installed to run this. It's not intended for deployment, but to run locally as a way to explore the database you created.
### Usage
```bash
$ python archive_reader.py archive_folder_name
```
That will create the sqlite database `main.db`.
Running `app.py`

```
python app.py
```

will launch a simple Plotly Dash app to search your archive.
### TODO
* Figure out the list of dictionaries in the `attachments` portion of the JSON file and embed media attachments in the Dash app.
* Add some advanced search to the Dash app, such as support for date ranges.
* Add pagination of results (maybe, that's some work...)
* Redo the UI to separate advanced SQL searches from full text search (and whatever date and other parameters I add)