https://github.com/astrowonk/mastodon_archive_reader

Read a mastodon archive and create a sqlite3 database of archived post content
archive full-text-search mastodon plotly-dash python sqlite3 text-search

Host: GitHub
URL: https://github.com/astrowonk/mastodon_archive_reader
Owner: astrowonk
License: mit
Created: 2023-02-25T02:01:28.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-02-25T14:08:02.000Z (over 2 years ago)
Last Synced: 2025-06-09T23:43:55.287Z (4 months ago)
Topics: archive, full-text-search, mastodon, plotly-dash, python, sqlite3, text-search
Language: Python
Homepage:
Size: 18.6 KB
Stars: 3
Watchers: 2
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

The `archive_reader.py` script (or the `ArchiveReader` class within it) reads the `outbox.json` file from your Mastodon archive (specifically, the posts you made) and creates a `main.db` sqlite3 database.
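To give a feel for the input, here is a minimal sketch of pulling your own posts out of `outbox.json` with only the standard library. It is not the code in `archive_reader.py`. The function name `load_own_posts` is hypothetical, and the sketch assumes the usual ActivityPub layout of a Mastodon export: a top-level `orderedItems` list whose `Create` activities carry the post as a dictionary under `object`.

```python
import json
from pathlib import Path


def load_own_posts(archive_dir):
    """Return the post objects found in an archive's outbox.json.

    Assumption: "Create" activities hold the post as a dict under "object".
    Boosts ("Announce") only reference someone else's post, so they are skipped.
    """
    outbox_path = Path(archive_dir, "outbox.json")
    outbox = json.loads(outbox_path.read_text(encoding="utf-8"))
    posts = []
    for activity in outbox.get("orderedItems", []):
        obj = activity.get("object")
        if activity.get("type") == "Create" and isinstance(obj, dict):
            posts.append(obj)
    return posts
```

Each returned dictionary should then contain fields such as `id` and the HTML `content` of the post, which is presumably where html2text comes in.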

The database holds two tables and one view:

* `search_data`. This is a virtual table created with FTS5 that allows full-text search of your posts.
* `full_data`. This is every column from the archive that contains an `object_id`.
* `combined`. This is a view that joins the two tables above on the extracted `int_id` column.
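The layout above can be sketched with an in-memory database. This is an illustration, not the schema that `archive_reader.py` actually writes: only the table, view, and `int_id` names come from the description above, while the `content` and `published` columns and the helper names `build_demo_db` and `search` are assumptions. It also assumes your Python's SQLite build includes FTS5, which is common but not guaranteed.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database shaped like the description above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- int_id is stored but not indexed for text search.
        CREATE VIRTUAL TABLE search_data USING fts5(int_id UNINDEXED, content);
        CREATE TABLE full_data (int_id INTEGER PRIMARY KEY, published TEXT);
        CREATE VIEW combined AS
            SELECT f.int_id, f.published, s.content
            FROM full_data AS f
            JOIN search_data AS s ON s.int_id = f.int_id;
        """
    )
    posts = [
        (1, "2023-02-24T10:00:00Z", "Trying out sqlite full text search"),
        (2, "2023-02-25T09:30:00Z", "Coffee and a bike ride"),
    ]
    conn.executemany(
        "INSERT INTO full_data (int_id, published) VALUES (?, ?)",
        [(int_id, published) for int_id, published, _ in posts],
    )
    conn.executemany(
        "INSERT INTO search_data (int_id, content) VALUES (?, ?)",
        [(int_id, content) for int_id, _, content in posts],
    )
    return conn


def search(conn, query):
    """Run an FTS5 MATCH query and return matching rows from the view."""
    sql = """
        SELECT int_id, published, content
        FROM combined
        WHERE int_id IN (
            SELECT int_id FROM search_data WHERE search_data MATCH ?
        )
        ORDER BY published
    """
    return conn.execute(sql, (query,)).fetchall()
```

Calling `search(build_demo_db(), "sqlite")` returns the rows whose text contains that word. The query string is passed as a bound parameter, but it is still interpreted as FTS5 query syntax, so input containing punctuation may need quoting.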

Creating the sqlite database requires pandas and [html2text](https://pypi.org/project/html2text/).

I also include a [Plotly Dash](https://dash.plotly.com) app, `app.py`, that provides a GUI for searching the archive. It uses sqlite full-text search ([FTS5](https://www.sqlite.org/fts5.html)) on the contents of the archived posts. You will need Plotly Dash installed to run it. It is not intended for deployment; run it locally to explore the database you created.

### Usage

```bash
$ python archive_reader.py archive_folder_name
```

That will create the sqlite database `main.db`.
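To check what ended up in the file, a short sketch like the following lists the tables and views it contains. The helper name `list_tables_and_views` is hypothetical. Expect some extra entries alongside `search_data`, because FTS5 keeps its index in additional shadow tables.

```python
import sqlite3


def list_tables_and_views(db_path="main.db"):
    """Return (type, name) pairs for every table and view in the database.

    Note: sqlite3.connect creates an empty file if db_path does not exist.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT type, name FROM sqlite_master "
            "WHERE type IN ('table', 'view') ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return rows
```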

### Running app.py

```bash
$ python app.py
```

This launches a simple Plotly Dash app for searching your archive.

[Screenshot: the Plotly Dash app in use, returning some search results.]

### TODO

* Figure out the list of dictionaries in the `attachments` portion of the JSON file and embed media attachments in the Dash app.
* Add some advanced search options to the Dash app, such as date ranges.
* Add pagination of results (maybe, that's some work...)
* Redo the UI to separate advanced SQL searches from full-text search (and whatever date or other parameters I add).
