https://github.com/nvsl/cse141pp-root

Last synced: 2 months ago
JSON representation

Host: GitHub
URL: https://github.com/nvsl/cse141pp-root
Owner: NVSL
Created: 2021-06-23T06:00:13.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2024-03-13T07:10:39.000Z (about 1 year ago)
Last Synced: 2025-01-27T08:16:13.269Z (4 months ago)
Language: Shell
Size: 232 KB
Stars: 0
Watchers: 8
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Setting up CSE142L Development

## Procedure for upgrading between labs

1. Spin down cluster
2. Deploy new version of server code google.
3. Push new docker images to :latest
4. Check that lab docker image is set correctly.
5. Go to sleep
6. Check that datahub is starting the right image.
6. Spin up cluster.

## FA21 notes

Make the DSMP cluster fetch new images:
```
for node in 03 04 05 06 07 08 09 10 11 12; do for tags in v60 22fa-lab-1 latest; do for image in stevenjswanson/cse142l-service-dsmlp stevenjswanson/cse142l-runner; do ssh sjswanson@its-ieng6maas-$node.ucsd.edu sudo docker pull $image:$tags;done;done&done
```

docker swarm join --token SWMTKN-1-0fqztdaagojadjzw3wcqfqksyp81duidygog9h7xblnj44ejdn-aozma6gikjgwa3ski5gjxbrys 172.17.77.30:2377

sudo docker swarm join --token SWMTKN-1-0fqztdaagojadjzw3wcqfqksyp81duidygog9h7xblnj44ejdn-aozma6gikjgwa3ski5gjxbrys 172.17.77.30:2377;done

sudo docker node promote its-ieng6maas-11.ucsd.edu its-ieng6maas-10.ucsd.edu

headnode: ssh [email protected]

To start a service on the cluster, do
. d i
cse142 -n dev --replicas 10 --service --name runner-service --image stevenjswanson/cse142l-swanson-service:latest bash -c "cse142 --no-http cluster runner"

and the put the command at the end of the output in quotes:

docker service create --replicas 10 --replicas-max-per-node 1 --env GOOGLE_CLOUD_PROJECT --env DJR_CLUSTER --env DJR_DOCKER_SCRATCH --env CLOUD_MODE --env DJR_SERVER --env DJR_JOB_TYPE --env CSE142L_RUNNER_DOCKER_IMAGE --env EMULATION_DIR --env CLOUD_NAMESPACE --env SECRETS_DIRECTORY --env PACKET_PROJECT_ID --env DOCKER_USERNAME --env DOCKER_ACCESS_TOKEN --env GOOGLE_CREDENTIALS_FILE --env GITHUB_OAUTH_TOKEN --env THIS_DOCKER_CONTAINER=runner-service --env HOME=/root --mount type=bind,source=/var/run/docker.sock,dst=/var/run/docker.sock --mount type=bind,source=/tmp,dst=/tmp --env GOOGLE_APPLICATION_CREDENTIALS=/cse142L/CSE141pp-Config/secrets/cse142l-dev-c775b40fa9bf.json --workdir /cse142L --name runner-service stevenjswanson/cse142l-swanson-service:latest bash -c 'cse142 --no-http cluster runner'

You can do this with a simple ssh command.

sudo docker run -it --env GOOGLE_CLOUD_PROJECT --env DJR_CLUSTER --env DJR_DOCKER_SCRATCH --env CLOUD_MODE --env DJR_SERVER --env DJR_JOB_TYPE --env CSE142L_RUNNER_DOCKER_IMAGE --env EMULATION_DIR --env CLOUD_NAMESPACE --env SECRETS_DIRECTORY --env PACKET_PROJECT_ID --env DOCKER_USERNAME --env DOCKER_ACCESS_TOKEN --env GOOGLE_CREDENTIALS_FILE --env GITHUB_OAUTH_TOKEN --env THIS_DOCKER_CONTAINER=runner-service --env HOME=/root --mount type=bind,source=/var/run/docker.sock,dst=/var/run/docker.sock --mount type=bind,source=/tmp,dst=/tmp --env GOOGLE_APPLICATION_CREDENTIALS=/cse142L/CSE141pp-Config/secrets/cse142l-dev-c775b40fa9bf.json --workdir /cse142L --name runner-service -w /cse142L/ stevenjswanson/cse142l-swanson-service:latest bash --rcfile /cse142L/env.sh -c '. env.sh; cse142 --no-http -v cluster runner'

Start the runner service on the dsmlp cluster:

```
sudo docker service create --replicas 10 --env GOOGLE_CLOUD_PROJECT --env DJR_CLUSTER --env DJR_DOCKER_SCRATCH --env CLOUD_MODE --env DJR_SERVER --env DJR_JOB_TYPE --env CSE142L_RUNNER_DOCKER_IMAGE --env EMULATION_DIR --env CLOUD_NAMESPACE --env SECRETS_DIRECTORY --env PACKET_PROJECT_ID --env DOCKER_USERNAME --env DOCKER_ACCESS_TOKEN --env GOOGLE_CREDENTIALS_FILE --env GITHUB_OAUTH_TOKEN --env THIS_DOCKER_CONTAINER=runner-service --env HOME=/root --mount type=bind,source=/var/run/docker.sock,dst=/var/run/docker.sock --mount type=bind,source=/tmp,dst=/tmp --env GOOGLE_APPLICATION_CREDENTIALS=/cse142L/CSE141pp-Config/secrets/cse142l-dev-c775b40fa9bf.json --workdir /cse142L --name runner-service -w /cse142L/ stevenjswanson/cse142l-service-dsmlp:latest bash -c '. env.sh; cse142 --no-http -v cluster runner'
```

## Lab Job Runner Overview

The course's lab runner provide allow students to run short-running jobs on
bare-metal servers in the cloud. The basic flow is this:

1. Students edit their source code.
2. They submit a command to run in the cloud.
3. The tools zip up their local files, ship them to the cloud, run the command, collect the results, and ship them back.

This underlying system is built on four cloud-based services:

1. A data base (Google Cloud Datastore)
2. A blob store (Google Cloud Storage)
3. A publish/subscribe system (Google Cloud Pub/Sub)
4. A web-based REST API for manipulating the above (Running in Google App Engine).

The main interface to the system is the `cse142` command line tool. It takes
numerous subcommands to manipulate the above resources and submit jobs.

User authenticate to the system using their UCSD Google credentials (i.e,
either their @eng or @ucsd.edu email addresses.)

## Prerequisites

You'll need docker installed. We have a document about doing that here:

https://github.com/NVSL/CSE141pp-Lab-Introduction-to-the-Development-Environment/blob/master/RunningDocker.md

For linux,
Followed this guide to install: https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
And this guide to run docker as a non-root user: https://docs.docker.com/engine/install/linux-postinstall/

## Logging into the Development Server

**If you are running on your own machine, this is not necessary**

Get the IP address of the devel server from Steve, we'll call it `IP`.

You have access to the root account on the server, but you shouln't use it.
Create and use your own account below, it prevents us from getting in eachother's way.

```
ssh root@IP
```

to log in as root.

Create your account and allow it to `sudo` and use docker:

```
adduser
adduser sudo
adduser docker
```

Then, install your ssh public key (It seems to require and RSA key: `id_rsa.pub` rather than `id_dsa.pub`):

```
su
mkdir ~/.ssh
cat >> ~/.ssh/authorized_keys
```

And then paste in your public key.

Then set perms on your `.ssh` directory:

```
chmod go-rwx -R ~/.ssh
```

Logout and then back in:

```
ssh @IP
```

## Getting into Docker

To set up your development environment initially, first do

```
git clone --recurse-submodules [email protected]:NVSL/CSE141pp-Root.git
```

This clones the root repo and several sub-repos.

You should get something like:

```
Submodule path 'CSE141pp-Config': checked out 'fa0e4ebb3959df2a30c8a49570b0fed346bd2604'
Submodule path 'CSE141pp-DJR': checked out 'b297949fd62dbb6c8e3e897c928a8acdf449df2d'
Submodule path 'CSE141pp-LabPython': checked out '991e709477ea0a187c01c39a8c2fd7b8e3a0afd8'
Submodule path 'CSE141pp-SimpleCNN': checked out '3656b5c5213434d437106f61eb0aaeb92dc7cbe8'
Submodule path 'CSE141pp-Tool-Moneta': checked out '73213d9872f80f057d20e0527ebfd11381e3088a'
Submodule path 'CSE141pp-Tool-Moneta-Pin': checked out 'ccad791816593cbff21d8599c3925fd1c0677c57'
Submodule path 'cse141pp-archlab': checked out 'c5e2e711fbc59740bb6e14d19a3f9a4f2785524e'
```

You may need to do install python 3.9's `venv` package. On linux you do it like this:

```
sudo apt install python3.9-venv python3.8-venv
```

Next build install the tools we use to build the docker images:

```
cd CSE141pp-Root
. env.sh # this initializes some environment variables and sets your PATH
make bootstrap # this creates a python virtual environment to hold the tools we use to invoke docker properly.
```

Next, you'll need to build the docker images locally with:

```
make
```
This will run for a while. You can then do

```
docker images
```

to see them (It'll have your user name in the image name):

```
REPOSITORY TAG IMAGE ID CREATED SIZE
stevenjswanson/cse142l-swanson-runner latest e3cb52c116d7 2 hours ago 12.2GB
stevenjswanson/cse142l-swanson-runner s21-dev e3cb52c116d7 2 hours ago 12.2GB
stevenjswanson/cse142l-swanson-dev latest c90dcd27a70e 2 hours ago 7.84GB
stevenjswanson/cse142l-swanson-dev s21-dev c90dcd27a70e 2 hours ago 7.84GB
stevenjswanson/cse142l-swanson-service latest d28b327b0ada 2 hours ago 7.84GB
stevenjswanson/cse142l-swanson-service s21-dev d28b327b0ada 2 hours ago 7.84GB
```

Then, to get yourself into a development docker container:

```
bin/cse142dev
```

This will drop you into `/cse142L` in the docker container which is mapped to
the root of your `CSE141pp-Root` directory. You can edit your files from
outside the container using the editor of your choice and the changes will be
reflected here.

If `cse142dev` find a `.ssh` directory in your home directory, it will create
an `ssh-agent` to store your ssh key and invokes `ssh-add` to add it. If
`cse142dev` asks for a password, this is why. It'll save you from having to
type your password over and over again.

The first time you do this, you'll need to run this command to set yourself up
to do development (after getting inside docker with `cse142dev`):

```
./setup.sh
```

to build everything from source. It will take a little while. After that,
invoking `CSE141pp-Root/bin/cse142dev` is all you'll need to do to get into
docker and start working.

From inside the docker container you can see information about the docker container with

```
whereami
```

## Accessing Moneta

Moneta is our memory trace visualizer. It uses a command line tool called
`mtrace` to generate traces and `jupyter-notebook` to view them. The process
below includes the installation of Moneta as well.

From inside docker, you can run `jupyter-notebook --allow-root .` to start up
jupyter notebook server. The output will look something like this

```
[I 22:38:48.814 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[I 22:38:49.494 NotebookApp] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab
[I 22:38:49.494 NotebookApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 22:38:49.497 NotebookApp] Serving notebooks from local directory: /cse142L
[I 22:38:49.497 NotebookApp] The Jupyter Notebook is running at:
[I 22:38:49.497 NotebookApp] http://swanson-dev-192.168.4.21:8888/?token=9480933a7aae975a670f6293f8fb4f94582be04f07a123a0
[I 22:38:49.497 NotebookApp] or http://127.0.0.1:8888/?token=9480933a7aae975a670f6293f8fb4f94582be04f07a123a0
[I 22:38:49.498 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 22:38:49.506 NotebookApp]

To access the notebook, open this file in a browser:
file:///root/.local/share/jupyter/runtime/nbserver-49-open.html
Or copy and paste one of these URLs:
http://swanson-dev-192.168.4.21:8888/?token=9480933a7aae975a670f6293f8fb4f94582be04f07a123a0
or http://127.0.0.1:8888/?token=9480933a7aae975a670f6293f8fb4f94582be04f07a123a0
```

If you are running on your own machine, you should be able to acces it via the `127.0.0.1:8888` url at the end.

If you are on the devolpment server, replace `127.0.0.1` with the IP address you used to connect to the devel server.

## Creating Your Job Runner Account

`cse142dev` drops you into docker container that has priviliged, direct access
to the job runner above, so you don't have to authenticate to use it. You will
use this capability to create an account for yourself. From then on, you'll
authenticate and use it as a student would.

The command lines to create and edit users are similar to those you'll use to
manipulate other data (e.g., jobs and labs).

First, check if everything is working (it may take a little bit. Direct access
is slow.):

```
cse142 --no-http user list
```

The `--no-http` bypasses authentication. `user` means we are going to do
something to users. `list` means list them.

This should give you list of current users. It will at least include
`[email protected]` and probably others.

To create your user with admin priviliges:

```
cse142 --no-http user create --email --name "" --role "ADMIN"
```

Check to see that it worked:

```
cse142 --no-http user list
```

You should see yourself:

```
3341f94e-254d-4f58-9f5c-a825e067f9a7 [email protected] [email protected] ['USER']
cb531cf4-2aa3-4e48-8629-3c36ffadee4c [email protected] [email protected] ['ADMIN']
```

The fields are a unique ID, email, full name, and roles (for permissions).

Now you can authenticate:

```
cse142 login
```

and follow the instructions to login via a web browser.

You should then be able to

```
cse142 user list
```

You should see yourself again. If you made a mistake you can edit your account like so:

```
cse142 user update [email protected] --set name="Steven Swanson"
```

You can pass either your email or the id.

To see more detail about yourself do:

```
cse142 user list -l [email protected]
```

and get something like

```
{ 'created_time': datetime.datetime(2021, 7, 10, 5, 59, 55, 399152, tzinfo=tzutc()),
'email': '[email protected]',
'id': '3341f94e-254d-4f58-9f5c-a825e067f9a7',
'last_authenticated_time': datetime.datetime(2021, 7, 10, 5, 59, 55, 399208, tzinfo=tzutc()),
'last_login_time': None,
'mood': '',
'name': 'Steven Swanson',
'namespace': 'default',
'project_id': 'cse142l-dev',
'roles': ['USER'],
'secret': ''}
```

## Running A Job

The job runner is supposed to approximate the experience of running command
line commands on a machine in the cloud. Each job runs in a fresh docker
container that is nearly identical to the one you are working in but has fewer
permissions.

Each job is run in the context of a "lab" that specifies which exact docker
image to use. You can see the list of available labs with

```
cse142 lab list
```

You should get something like this:

```
76c847c9-c3fa-4d3b-b56e-87fcfab46edf test 2021-07-06 10:35:00-07:00 stevenjswanson/cse142l-runner:latest test2
```

The field are id, short-name, due date, docker image, and full name.

To run your first job:

```
mkdir t
cd t
cse142 job run --lab test "echo hello world"
```

You'll get something like this:

```
You are submitting a job for lab test2 (test).
Creating job e4e80402-9217-4835-9e6b-56c7ef3b4977 0.00 0.00
Ready for submission. 1.27 1.27
Job e4e80402-9217-4835-9e6b-56c7ef3b4977 is in state 'PUBLISHED'. 0.56 1.83
Job e4e80402-9217-4835-9e6b-56c7ef3b4977 is in state 'PUBLISHED'. 1.08 2.90
Job e4e80402-9217-4835-9e6b-56c7ef3b4977 is in state 'RUNNING'. 1.07 3.97
Job e4e80402-9217-4835-9e6b-56c7ef3b4977 is in state 'RUNNING'. 1.06 5.03
Job e4e80402-9217-4835-9e6b-56c7ef3b4977 is in state 'DONE_RUNNING'. 1.06 6.09
Job e4e80402-9217-4835-9e6b-56c7ef3b4977 is in state 'DONE_RUNNING'. 1.14 7.24
Job e4e80402-9217-4835-9e6b-56c7ef3b4977 is in state 'DONE_RUNNING'. 1.07 8.31
Job e4e80402-9217-4835-9e6b-56c7ef3b4977 is in state 'DONE_RUNNING'. 1.06 9.36
Job e4e80402-9217-4835-9e6b-56c7ef3b4977 succeeded. 1.06 10.42
Writing results 1.00 11.42
hello world
Job Complete 0.55 11.97
```

It tracks the lifetime of the job. The numbers at the end are the latency
since last update and total execution time. You can see the output of your job
near the bottom.

You can list the jobs you've run with

```
cse142 job list
```

As an admin, you'll see all the jobs run on the system. It'll be a lot.

You can list the last job you ran with

```
cse142 job list LAST
```

```
cse142 job list -l LAST
```

## Life Cycle of a Job

The steps in a jobs execution are as follows:

1. Create the job object.
2. Recursively zip up most of the files in the current directory. ("Ready for submission")
3. Upload them to the cloud.
4. Submit the job for execution (state "PUBLISHED")
5. One of the job runners pulls it (state "SCHEDULED", not shown here)
6. Runs it (state "RUNNING")
7. Zips up modified files (state "DONE_RUNNING")
8. Upload the modified files to the cloud.
9. Finish up ("succeeded")

All the while, your `cse142` command is polling the job for updates. When it
sees it's succeeded, it downloads the zip file of updated files, and unzips it.
It also prints the `stdout`/`stderr` for the command ("hello world").

We created the `t` directory above to prevent the job for sucking in all the
source code in your directory. It takes a long time (and might fail).

## More Complicated Jobs

Here are some things to try:

Create `main.c`:

```
#include

void main() {
printf("Hello world\n");
}
```

Then:

```
cse142 job run --lab test 'gcc main.c -o main; ./main'
```

and then

```
gcc main.c -o main
cse142 job run --lab test './main'
```

or even

```
cse142 job run --lab test 'gcc main.c -o main'
./main
```

## Dogfooding and Testing

We would like the course tools to be as user-friendly as possible. So, please
use them for developing the labs. IF you have feedback about how we can make
the tools more pleasant to use, please let Professor Swanson know.

In particular, `cse142` should meeting the following requirements

1. The user should never get at python stack dump. If you get one, it's bug
please send me the command line and the output.
2. You shouldn't see "Command failed due to uncaught exception" errors either. They are a catchall to avoid stack dumps. If you can
2. The error messages make sense. If you find one misleading or confusing, let me know.
3. The use of color should be consistent.

## Cloning Labs

You should clone labs in the `labs` directory.

You can get a list of all the labs (including old and usued labs) here: https://github.com/NVSL?q=CSE141pp-Lab&type=&language

For instance to clone the final project:

```
cd labs
git clone [email protected]:NVSL/CSE141pp-Lab-FinalProject.git
```

## Running a Lab

To get started working on lab, let's run the final project. It's here:

```
cd labs/CSE141pp-Lab-FinalProject/
```

You should be able to run the lab locally with

```
make
```

It'll take a little while. Ignore the `make: runlab: Command not found`

Then you can run it in the cloud with

```
cse142 job run --lab test 'make clean; make
```

## Starting the Runner Service

**Unless you are steve, You probably don't need to do this.**

First, start a swarm:

```
docker swarm init --advertise-addr
```

Then start the runner service:

```
cse142 dev --service --name runner-service --image stevenjswanson/cse142l-service:latest bash -c "cse142 cluster runner"
```

## Debugging The STudent experience

The `runner` docker image is self-contained so it'll work as a student image running on campus servers via dsmlp.

Here's a command line to mimic what they do:

```
docker run -it -h $HOME -w $HOME --mount type=bind,source=$HOME,dst=/root --publish published=8889,target=8888 stevenjswanson/cse142l-swanson-runner:latest cse142-notebook
```

## Building Docker Images

OUr images are based on the Jupyter notebook scipy image, but the published version use an old version of python, so we built it ourselves:

```
make docker-stacks
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/nvsl/cse141pp-root

Awesome Lists containing this project

README