https://github.com/internetarchive/keystone
ARCH Web Client
- Host: GitHub
- URL: https://github.com/internetarchive/keystone
- Owner: internetarchive
- Created: 2024-08-28T19:48:56.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-17T21:00:53.000Z (10 months ago)
- Last Synced: 2025-03-31T11:51:12.485Z (10 months ago)
- Language: TypeScript
- Size: 4.06 MB
- Stars: 3
- Watchers: 18
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Keystone
## About
Keystone is a web client for the [ARCH (Archives Research Compute Hub)](https://github.com/internetarchive/arch) job server.
### Run Keystone & ARCH using Docker
The following features are available only in the hosted version at https://arch.archive-it.org:
- Google Colab integration
- Dataset publication to archive.org
#### Prerequisites
- [GNU Make](https://www.gnu.org/software/make/manual/make.html)
- [Docker](https://www.docker.com/)
#### Build and Run the Docker Image
##### 1. Build the images
```
make build-images
```
##### 2. Run the services
```
docker compose up
```
##### 3. Surf on over to [http://localhost:12342](http://localhost:12342)
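The services can take a little while to come up after `docker compose up`. The following is a minimal sketch, not part of Keystone, of polling the URL above until it answers. The helper name `wait_for_http` and its timeout values are hypothetical, and treating any HTTP response (including an error status) as "up" is an assumption.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass.

    Hypothetical helper: returns True as soon as the server answers,
    False if nothing answered before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, just with an error status (assumed "up").
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, reset, or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_http("http://localhost:12342")` returns `True` once the page responds, or `False` if nothing answers within the default 60 seconds.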
##### 4. Log in
Log in as one of the three user types that `dev/entrypoint.py` created for you:
- **Superuser**: username: `system` password: `password`
- **Admin**: username: `admin` password: `password`
- **Normal**: username: `test` password: `password`
#### The "arch-shared" Directory
The `build-images` Make target will create a local `arch-shared` subdirectory that will be mounted
within both the running Keystone and ARCH containers to serve as the storage destination for ARCH outputs,
and as a place to add your own custom collections of WARCs for analysis.
The `arch-shared` directory has the structure:
```
arch-shared/
├── in
│   └── collections
├── log
└── out
    ├── custom-collections
    └── datasets
```
These subdirectories are used as follows:
- `log`
  - ARCH job logs
- `out/custom-collections`
  - ARCH Custom Collection output files
- `out/datasets`
  - ARCH Dataset output files
- `in/collections`
  - A place to make your own WARCs available to ARCH as inputs (see "Analyze Your WARCs" below)
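If you want to confirm that the layout is in place before starting the containers, a small check along these lines may help. This is a sketch only: `missing_subdirs` and `EXPECTED_SUBDIRS` are hypothetical names, and the list of paths is taken from the directory structure described above.

```python
from pathlib import Path

# Subdirectories listed in this README, relative to arch-shared/.
EXPECTED_SUBDIRS = (
    "in/collections",
    "log",
    "out/custom-collections",
    "out/datasets",
)


def missing_subdirs(root) -> list[str]:
    """Return the expected subdirectories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_SUBDIRS if not (root / rel).is_dir()]
```

Calling `missing_subdirs("arch-shared")` from the repository root returns an empty list when every expected subdirectory exists.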
##### Analyze Your WARCs
For each group of WARCs that you'd like to analyze as a collection:
1. Create a new subdirectory within `arch-shared/in/collections` with a descriptive kebab-case name like `my-test-collection` and copy your `*.warc.gz` files into it, e.g.
```
arch-shared/
└── in
    └── collections
        └── my-test-collection
            └── ARCHIVEIT-22994-CRAWL_SELECTED_SEEDS-JOB1965703-SEED3267421-h3.warc.gz
```
2. Restart both the Keystone and ARCH containers
```
docker compose restart keystone arch
```
3. Your new collection will now be visible in Keystone (e.g. as `My Test Collection`)
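Step 1 can also be scripted. The sketch below is not part of Keystone or ARCH: `stage_collection` and `display_name` are hypothetical helpers, and the kebab-case to title-case mapping in `display_name` is only inferred from the single `my-test-collection` / `My Test Collection` example above, so the name Keystone actually shows may differ for other inputs.

```python
import re
import shutil
from pathlib import Path

# Lowercase words or digits separated by single hyphens, e.g. "my-test-collection".
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def display_name(collection_id: str) -> str:
    """Guess the displayed name (assumption inferred from the README example)."""
    return " ".join(word.capitalize() for word in collection_id.split("-"))


def stage_collection(shared_root, collection_id: str, warc_paths) -> Path:
    """Copy *.warc.gz files into <shared_root>/in/collections/<collection_id>."""
    if not KEBAB_CASE.fullmatch(collection_id):
        raise ValueError(f"not a kebab-case name: {collection_id!r}")
    sources = [Path(p) for p in warc_paths]
    for src in sources:
        if not src.name.endswith(".warc.gz"):
            raise ValueError(f"not a .warc.gz file: {src}")
    dest = Path(shared_root) / "in" / "collections" / collection_id
    dest.mkdir(parents=True, exist_ok=True)
    for src in sources:
        shutil.copy2(src, dest / src.name)
    return dest
```

After staging, the containers still need the restart from step 2 before the collection appears.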