https://github.com/acearchive/yahoo-groups-reader
Browse archives of the now-defunct service Yahoo Groups
- Host: GitHub
- URL: https://github.com/acearchive/yahoo-groups-reader
- Owner: acearchive
- Created: 2022-04-09T19:09:01.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2023-10-15T20:53:22.000Z (over 2 years ago)
- Last Synced: 2025-04-03T15:44:55.379Z (about 1 year ago)
- Topics: archive, bootstrap, gulp, static-site, yahoo-groups
- Language: Go
- Homepage:
- Size: 1.13 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# yahoo-groups-reader
This is a CLI tool for rendering Yahoo Groups archives exported using
[yahoo-group-archiver](https://github.com/IgnoredAmbience/yahoo-group-archiver).
This tool accepts a directory of RFC 822 `.eml` files as exported by
yahoo-group-archiver and builds a static site for browsing the archive.
[You can see an example of a generated site here.](https://hha.acearchive.lgbt/)
## Features
- Designed for accessibility.
- Responsive and mobile-friendly.
- Parses the plain-text email markup used by Yahoo Groups into beautiful and
semantic HTML.
- Client-side full-text search of the archive. This can be disabled at build
time.
- Supports both a light and dark theme based on your browser preferences.
- Fast. Bundles and minifies assets, strips out unused CSS, prefetches links
using [instant.page](https://instant.page/), serves efficient cache headers,
uses ETags and cache-busting filenames, and, when the browser asks sites to
[reduce data usage](https://wicg.github.io/savedata/), defers loading the
search index until the first search.
- Follows best security practices. [Scores 120/100 on Mozilla
Observatory](https://observatory.mozilla.org/analyze/hha.acearchive.lgbt)
when your hosting provider uses the `_headers` file.
- Uses [Open Graph metadata](https://ogp.me/) for SEO and social media
previews. Generates a screenshot of the site at build time to show in media
previews.
- Scores 100/100 in Performance, Accessibility, Best Practices, and SEO on
[Google Lighthouse](https://developers.google.com/web/tools/lighthouse).
- You can customize the generated site with external links at build time via
CLI flags. Supports [Feather icons](https://feathericons.com/).
## Usage
This tool has two components:
1. A Go program in `parser/` which parses the archive and builds the HTML.
2. A [gulp](https://gulpjs.com/) pipeline in `pipeline/` which builds,
minifies, and optimizes all the necessary CSS, JavaScript, and fonts.
### Run the parser
To run the parser, you must first install [Go](https://go.dev/).
To run the parser:
```
cd ./parser
go run . ~/your-yahoo-group/email --title "Your Yahoo Group" --base "https://your-yahoo-group.example.com/"
```
To see additional options for the parser:
```
go run . --help
```
This will produce a directory `../output` containing the generated HTML, but
you still need to run the asset pipeline to build the full site.
### Run the asset pipeline
To run the asset pipeline, you must first install
[npm](https://www.npmjs.com/).
To run the asset pipeline:
```
cd ./pipeline
npm install
npx gulp
```
This will produce a directory `../public` containing the generated static site.
This directory will include a `_headers` file which tells hosting providers
like [Netlify](https://docs.netlify.com/routing/headers/) and [Cloudflare
Pages](https://developers.cloudflare.com/pages/platform/headers/) which HTTP
response headers to serve for different paths.
The asset pipeline accepts the following environment variables:
- `OUTPUT_DIR`: The path of the directory generated by the parser (default
`../output`).
- `PUBLIC_DIR`: The path to build the static site at (default `../public`).
- `DISALLOW_ROBOTS`: When set, a `robots.txt` will be generated that disallows
crawling. Otherwise, crawling is allowed for the whole site.
## Gotchas
- This tool was written specifically for the Yahoo Group [Haven for the Human
Amoeba](https://acearchive.lgbt/artifact/haven-for-the-human-amoeba/). While
the tool was designed to be generalizable to other Yahoo Groups, it hasn't
been tested with other data sets.
- The way this tool parses plain-text email markup is best-effort and often
breaks. The markup used by Yahoo Groups is inconsistent and appears to have
changed many times over the course of its history. This tool is designed to
prefer false negatives (ignoring syntax constructs and leaving them as
literal text) over false positives (potentially mangling text by treating it
as markup).
- Some messages in Yahoo Groups archives are multipart messages containing both
plain-text markup and HTML. However, given the long lifespan of Yahoo Groups,
messages in older groups may use long-deprecated HTML features. For this
reason, along with the security implications of rendering untrusted HTML and
accessibility concerns, this tool ignores HTML messages and always attempts
to parse the plain-text markup instead. Embedded HTML in plain-text markup is
printed as literal text.
- This tool doesn't attempt to handle attachments in messages.
- If a timestamp in a message is missing a time zone offset, it is assumed to
be UTC.
- The current full-text search implementation may not scale well to large
archives. If performance is a problem, you can disable the search
functionality at build time.