https://github.com/probcomp/gen-website-private
- Host: GitHub
- URL: https://github.com/probcomp/gen-website-private
- Owner: probcomp
- Created: 2024-05-23T12:03:22.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-27T12:15:09.000Z (about 1 year ago)
- Last Synced: 2025-05-08T22:57:24.170Z (11 months ago)
- Language: JavaScript
- Homepage: https://gen-website-private.gen.dev
- Size: 728 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Gen Website (Private)
This repo contains code and workflows that enable probcomp repositories to publish private websites from GitHub Actions.
### I manage a probcomp repo. How do I use this?
Create a GitHub Actions workflow that builds your repo's website, then follow the example in `.github/workflows/publish_private_website_example.yml`. At the end of the job that builds the site, create an artifact containing your website files, then pass that artifact's name to the reusable workflow `probcomp/gen-website-private/.github/workflows/publish_private_website.yml@main`. The calling job requires the `id-token: write` permission.
Your repo will be served from its own subdomain: `YOUR_REPO.gen.dev`.
### Who can access these websites?
Members of the Google Groups `genjax-users@chi-fro.org` and `all@chi-fro.org` have access. To grant new users access, add them to one of these groups.
### How can I make my website public?
To make a website public, publish it to GitHub Pages (or another public environment) and ask tech-admin@chi-fro.org to point your subdomain (e.g. `YOUR_REPO.gen.dev`) at the new site. There is a GitHub Action called "Set CNAME Record for gen.dev" for this purpose.
(Due to how [IAP](https://cloud.google.com/security/products/iap) works, it's not possible to manage visibility at a granular level in the App Engine instance that serves private websites.)
## Admin / Implementation Notes
Private websites are served by a single App Engine (Standard Environment) instance. Access to the website is controlled via [Identity-Aware Proxy](https://console.cloud.google.com/security/iap?referrer=search&project=probcomp-caliban).
To grant access to new users, add them to one of the Google Groups that has access. To grant access to a new group, grant the `IAP-secured Web App User` role to that group as an IAM principal.
### Authentication Notes
Access to Google Cloud services is managed via
[Workload Identity Federation through a Service Account](https://github.com/google-github-actions/auth?tab=readme-ov-file#workload-identity-federation-through-a-service-account)
using the [google-github-actions/auth](https://github.com/google-github-actions/auth) action. This avoids managing secrets.
The GCP account now contains:
1. A [Workload Identity Pool](https://cloud.google.com/iam/docs/manage-workload-identity-pools-providers) called `app-engine-publishers`, containing...
2. A GitHub OIDC provider, configured with:
   - Issuer (URL): `https://token.actions.githubusercontent.com`
   - Attribute mappings:
     - `google.subject` = `assertion.sub`
     - `attribute.repository` = `assertion.repository`
     - `attribute.repository_owner` = `assertion.repository_owner`
   - Attribute condition:
     `assertion.repository_owner == 'probcomp' && assertion.repository == 'probcomp/gen-website-private'`
3. A `github-appengine-deploy` service account, with the roles:
   - App Engine Deployer
   - App Engine Service Admin
   - Cloud Build Service Account
   - Workload Identity User
4. When the service account is added to the identity pool, it also has an attribute mapping specified to restrict usage:
   - `attribute.repository` = `probcomp/gen-website-private`
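To make the provider configuration above concrete, here is a minimal sketch of what the attribute condition and mappings amount to when applied to a decoded GitHub OIDC token. Google Cloud evaluates the real condition itself during token exchange; the function names `satisfiesAttributeCondition` and `mapAttributes` and the sample claims are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: restates the provider's attribute condition in plain JS.
// The real check is performed by Google Cloud, not by code in this repo.
function satisfiesAttributeCondition(assertion) {
  return (
    assertion.repository_owner === "probcomp" &&
    assertion.repository === "probcomp/gen-website-private"
  );
}

// Hypothetical helper: applies the attribute mappings listed above
// to a decoded OIDC token payload.
function mapAttributes(assertion) {
  return {
    "google.subject": assertion.sub,
    "attribute.repository": assertion.repository,
    "attribute.repository_owner": assertion.repository_owner,
  };
}

// Illustrative claims, not taken from a real token.
const claims = {
  sub: "repo:probcomp/gen-website-private:ref:refs/heads/main",
  repository: "probcomp/gen-website-private",
  repository_owner: "probcomp",
};

// A token from this repo passes; one from another probcomp repo does not.
const otherRepoClaims = { ...claims, repository: "probcomp/other-repo" };
console.log(satisfiesAttributeCondition(claims));
console.log(satisfiesAttributeCondition(otherRepoClaims));
console.log(mapAttributes(claims));
```

Under this condition, only workflows running in `probcomp/gen-website-private` can exchange their token through this provider.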
There is also a second identity pool, `gen-website-private-publishers`, which grants all probcomp repositories access to the private bucket within GitHub Actions.
Using this identity pool, a GitHub Actions workflow in any probcomp repository can modify the `gen-website-private` bucket without restriction.
To enable App Engine to create signed blobs (time-limited links to files in the private bucket), I added the required permission via the following command. Using the console UI didn't work; [this answer](https://stackoverflow.com/a/76493825) helped.
```sh
gcloud projects add-iam-policy-binding probcomp-caliban \
  --member=serviceAccount:probcomp-caliban@appspot.gserviceaccount.com \
  --role='roles/iam.serviceAccountTokenCreator'
```
### SSL / Custom Domains
To publish to `www.gen.dev`, set `SUBDOMAIN` to `www`. To publish to a `PARENT_DOMAIN` other than `gen.dev`, an additional custom domain must be added via App Engine in Google Cloud.
[These instructions](https://gist.github.com/patmigliaccio/d559035e1aa7808705f689b20d7b3fd3) were essential to enabling SSL for a wildcard
subdomain on App Engine. I created an origin certificate in Cloudflare, appended the [Cloudflare Origin CA root certificate (ECC PEM)](https://developers.cloudflare.com/ssl/origin-configuration/origin-ca#cloudflare-origin-ca-root-certificate) to the PEM file, and converted the private key to RSA using the following command ([note](https://gist.github.com/patmigliaccio/d559035e1aa7808705f689b20d7b3fd3?permalink_comment_id=4421351#gistcomment-4421351) the `-traditional` flag):
```sh
openssl rsa -in domain.com-YYYY-MM-dd.key -out domain.com-RSA-YYYY-MM-dd.key -traditional
```
The certificate was free and expires in 15 years; it is only valid for traffic between Cloudflare and App Engine. (If we switched DNS providers, we would need another wildcard subdomain SSL solution.)
### Accessing Files from Other Buckets
This server now supports accessing files from any Google Cloud Storage bucket that grants read access to the `gen-website-private-admin@probcomp-caliban.iam.gserviceaccount.com` service account. You can access these files using the following URL pattern:
```
https://probcomp-caliban.uc.r.appspot.com/bucket//
```
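As a rough illustration of how such a URL could be split into its parts, the sketch below assumes the two path segments after `/bucket/` are a bucket name followed by an object path. That reading is an assumption, as are the function name `parseBucketPath` and the example bucket and file names; the server's actual routing may differ.

```javascript
// Hypothetical parser for paths shaped like /bucket/BUCKET_NAME/OBJECT_PATH.
// BUCKET_NAME and OBJECT_PATH are assumed placeholders, not confirmed by the source.
function parseBucketPath(pathname) {
  const match = /^\/bucket\/([^/]+)\/(.+)$/.exec(pathname);
  if (!match) return null; // not a bucket URL, or a segment is missing
  return {
    bucket: decodeURIComponent(match[1]),
    object: decodeURIComponent(match[2]),
  };
}

// Example URL with made-up bucket and object names.
const exampleUrl = new URL(
  "https://probcomp-caliban.uc.r.appspot.com/bucket/example-bucket/data/report.html"
);
console.log(parseBucketPath(exampleUrl.pathname));
```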
Note that we currently use signed URLs for bucket redirects, which do not respect the CORS policy of the bucket.
### CORS
As buckets are private, we redirect using time-limited signed URLs, which do not follow the CORS policy of the bucket. ~CORS support is handled by `cors-config.json` which was added to the bucket via `gsutil cors set cors-config.json gs://gen-website-private` ([details](https://stackoverflow.com/questions/45273514/google-cloud-storage-gcs-cors-wildcard))~
## Developer notes
### Cloudflare Worker
A Cloudflare Worker (`./private-website-cache/src/worker.js`) sits in front of `*.gen.dev` to apply caching policies, which are otherwise ignored or overwritten by IAP (Identity-Aware Proxy). To modify these policies:
- edit `worker.js`
- make sure you have access to our Cloudflare group
- run `npx wrangler login` and `npx wrangler deploy` from within the `private-website-cache` directory
### Caching Policies
The Cloudflare worker implements three tiers of caching:
1. HTML files:
- Private cache (per-user)
- 60 second max age
- 30 second stale-while-revalidate window
2. Static assets (default):
- Public cache
- 24 hour max age
- 1 hour stale-while-revalidate window
3. Large binary files (.wasm, .data):
- Public cache
- 1 year max age
- Immutable (no revalidation)
- Uses Cloudflare Cache API for improved performance
The worker strips standard caching headers from the origin and applies these policies consistently. It also preserves `ETag` and `Last-Modified` headers when available, and sets `Vary: Accept-Encoding` for proper handling of compressed content.
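The three tiers can be sketched as a function from request path to `Cache-Control` value, plus a step that rewrites the origin's headers. This is not the code in `worker.js`: the names `cacheControlFor` and `applyCachePolicy`, the rule for detecting HTML, and the exact set of stripped headers are assumptions, and the Cache API step for large binaries is omitted.

```javascript
// The source names only .wasm and .data for the large-binary tier.
const LARGE_BINARY_EXTENSIONS = [".wasm", ".data"];

// Returns the Cache-Control value for a request path.
function cacheControlFor(pathname) {
  const lower = pathname.toLowerCase();
  // Assumption: paths ending in "/" or ".html" count as HTML.
  if (lower.endsWith("/") || lower.endsWith(".html")) {
    // Private, 60 s max age, 30 s stale-while-revalidate.
    return "private, max-age=60, stale-while-revalidate=30";
  }
  if (LARGE_BINARY_EXTENSIONS.some((ext) => lower.endsWith(ext))) {
    // Public, 1 year max age, immutable.
    return "public, max-age=31536000, immutable";
  }
  // Default static assets: public, 24 h max age, 1 h stale-while-revalidate.
  return "public, max-age=86400, stale-while-revalidate=3600";
}

// Drops the origin's caching headers and applies the policy.
// ETag and Last-Modified are left untouched, so they survive when present.
function applyCachePolicy(originHeaders, pathname) {
  const headers = new Headers(originHeaders);
  // Assumption: these are the "standard caching headers" being stripped.
  for (const name of ["cache-control", "expires", "pragma"]) {
    headers.delete(name);
  }
  headers.set("Cache-Control", cacheControlFor(pathname));
  headers.set("Vary", "Accept-Encoding");
  return headers;
}

const rewritten = applyCachePolicy(
  { "Cache-Control": "no-store", ETag: '"abc"' },
  "/index.html"
);
console.log(Object.fromEntries(rewritten));
```

Keeping the tier decision in a pure function makes it straightforward to check the policies without deploying the worker.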