Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/christinalk/pearc24-pelican-jobs


https://github.com/christinalk/pearc24-pelican-jobs

Last synced: 16 days ago
JSON representation

Awesome Lists containing this project

README

        

{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "84a3c724-a84a-487f-8c1c-0039bc7026b1",
"metadata": {},
"outputs": [],
"source": [
"WORKDIR=$HOME/pearc24-pelican-jobs\n",
"echo $WORKDIR"
]
},
{
"cell_type": "markdown",
"id": "86b272d4-9fda-461a-b0b0-bd947f716da8",
"metadata": {},
"source": [
"## Recap: Fetching Data with Pelican\n",
"\n",
"This set of commands downloads a test data file (in a sequence data format) from the Open Science Data Federation. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a3e505c-b7d1-481b-af10-098516367182",
"metadata": {},
"outputs": [],
"source": [
"cd $WORKDIR/data\n",
"\n",
"OSDF=pelican://osg-htc.org\n",
"OBJ_PATH=ospool/uc-shared/public/osg-training/tutorial-fastqc/test.fastq"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a420e3c-0c89-4d32-9336-768562900ac3",
"metadata": {},
"outputs": [],
"source": [
"pelican object get $OSDF/$OBJ_PATH test.fastq"
]
},
{
"cell_type": "markdown",
"id": "628b9aa5-392f-443a-8d42-d3347f706eba",
"metadata": {},
"source": [
"The following command should display the beginning of a genomic sequence file: "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c720d9dc-98d6-487b-a534-052c0d317ee7",
"metadata": {},
"outputs": [],
"source": [
"head test.fastq"
]
},
{
"cell_type": "markdown",
"id": "cf0bb64b-4788-4268-b286-5c695f66418e",
"metadata": {},
"source": [
"## Sample Job Submission"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d990c4ff-f7cc-4bb0-b3aa-bfa52dc20d2f",
"metadata": {},
"outputs": [],
"source": [
"cd $WORKDIR/sample"
]
},
{
"cell_type": "markdown",
"id": "62a9cc00-154c-4fcd-91d5-87c3f9acfccf",
"metadata": {},
"source": [
"Look at the contents of the HTCondor job submit file below. There should be some familiar elements (resource requests, where to save stdout/stderr/log files, what commands to run) and some potentially new elements (transferring files). "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abb08e5d-39ac-4651-b333-8f95599a151d",
"metadata": {},
"outputs": [],
"source": [
"cat sample.submit"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "724adaf8-93d6-4350-9061-30e77c1fc98c",
"metadata": {},
"outputs": [],
"source": [
"condor_submit sample.submit"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e2c75d3-33e4-40f7-9fa1-51c7ce251c5d",
"metadata": {},
"outputs": [],
"source": [
"condor_q"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f262c58f-8242-4c4b-967c-571b1e441d33",
"metadata": {},
"outputs": [],
"source": [
"cat job*.output"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35c7d78c-64ba-43a1-83cb-ed88db7fee53",
"metadata": {},
"outputs": [],
"source": [
"cat output*.txt"
]
},
{
"cell_type": "markdown",
"id": "feff9e9a-1707-4084-b2ac-8d623ebccc8c",
"metadata": {},
"source": [
"## Job Submission with Pelican"
]
},
{
"cell_type": "markdown",
"id": "49d05095-d70b-4a8f-9b55-51853d6338d6",
"metadata": {},
"source": [
"### One Job Fetching a Container and Data File"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe4091f5-b3d0-4b07-875e-4a8a3cba6b33",
"metadata": {},
"outputs": [],
"source": [
"cd $WORKDIR/fastqc"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84b959bd-86a9-452c-8257-1e3bb4735b26",
"metadata": {},
"outputs": [],
"source": [
"ls -lh"
]
},
{
"cell_type": "markdown",
"id": "965f9840-3089-4252-a6de-f0201b05e145",
"metadata": {},
"source": [
"We are now going to submit a slightly more complex job example. This job will fetch both the `test.fastq` file from the OSDF that we used a minute ago, as well as a container with the `fastQC` bioinformatics program. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f94880bb-7bbe-418c-8ced-e9c07b2eab46",
"metadata": {},
"outputs": [],
"source": [
"grep \"pelican\" single-fastqc.submit"
]
},
{
"cell_type": "markdown",
"id": "d781dbc9-8f28-4d23-9d1a-615c63bc1c64",
"metadata": {},
"source": [
"The job itself will run the FastQC program on the fetched data file and produce a visualization, which will get written back to the `results` folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ce2632a-5923-4717-9c74-4333debbef3e",
"metadata": {},
"outputs": [],
"source": [
"cat single-fastqc.submit"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93d3017c-850f-4b39-9002-8262b9259217",
"metadata": {},
"outputs": [],
"source": [
"condor_submit single-fastqc.submit"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6bc8c01d-564c-43fb-9746-8d33c91946df",
"metadata": {},
"outputs": [],
"source": [
"condor_q"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6db8c33d-178a-4d08-9b69-4df4a0ce52eb",
"metadata": {},
"outputs": [],
"source": [
"ls results/"
]
},
{
"cell_type": "markdown",
"id": "db3db941-5efd-4311-a085-213ab44fe274",
"metadata": {},
"source": [
"One of the script commands was an `ls` so we can see that the `test.fastq` was downloaded by looking at the standard output file. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb1e8f32-f349-4114-8470-df6af04e381b",
"metadata": {},
"outputs": [],
"source": [
"cat logs/*.out"
]
},
{
"cell_type": "markdown",
"id": "331797c7-fa97-41c5-9408-e458b5b31882",
"metadata": {},
"source": [
"### Multiple Jobs Fetching a Single Container and Unique Data Files"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03314b5b-7393-4eb8-8059-9a5e91e8d4ef",
"metadata": {},
"outputs": [],
"source": [
"cd $WORKDIR/fastqc"
]
},
{
"cell_type": "markdown",
"id": "5a708023-20ec-494e-949c-5ba90b4c70f7",
"metadata": {},
"source": [
"Because the Pelican object links can be quite long, it's helpful to use intermediate variables in the submit file. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73bd6c16-5100-4abb-b88c-6c3b9774cc0f",
"metadata": {},
"outputs": [],
"source": [
"grep \"OBJ_LOC\" many-fastqc.submit"
]
},
{
"cell_type": "markdown",
"id": "e93ac776-bf74-4746-ae58-34b30fb0f813",
"metadata": {},
"source": [
"Finally, we'll run the same FastQC analysis, but with multiple data files (again, being fetched from the OSDF). "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bebb5f33-1517-4f11-b85f-162bb5f8f2ed",
"metadata": {},
"outputs": [],
"source": [
"cat many-fastqc.submit"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1e93df78-4ff2-487c-b13e-4e4269d8b8b1",
"metadata": {},
"outputs": [],
"source": [
"condor_submit many-fastqc.submit"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "99c27834-1f87-40e2-a5f2-c48830306da0",
"metadata": {},
"outputs": [],
"source": [
"condor_q"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "07b8a924-ea31-4755-ad0e-6094e907b3b1",
"metadata": {},
"outputs": [],
"source": [
"ls results/"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e30df20a-f2da-4f0d-9eed-deedb3c81823",
"metadata": {},
"outputs": [],
"source": [
"cat logs/*.out"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Bash",
"language": "bash",
"name": "bash"
},
"language_info": {
"codemirror_mode": "shell",
"file_extension": ".sh",
"mimetype": "text/x-sh",
"name": "bash"
}
},
"nbformat": 4,
"nbformat_minor": 5
}