{"id":18666775,"url":"https://github.com/bionode/bionode-watermill-tutorial","last_synced_at":"2025-10-26T08:41:10.456Z","repository":{"id":70808683,"uuid":"93843434","full_name":"bionode/bionode-watermill-tutorial","owner":"bionode","description":"This is a tutorial for bionode-watermill","archived":false,"fork":false,"pushed_at":"2018-04-26T18:17:44.000Z","size":26,"stargazers_count":1,"open_issues_count":1,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-27T18:11:50.652Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bionode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-09T09:38:20.000Z","updated_at":"2018-04-26T18:17:46.000Z","dependencies_parsed_at":"2023-02-26T10:00:39.280Z","dependency_job_id":null,"html_url":"https://github.com/bionode/bionode-watermill-tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bionode%2Fbionode-watermill-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bionode%2Fbionode-watermill-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bionode%2Fbionode-watermill-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bionode%2Fbionode-watermill-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/
bionode","download_url":"https://codeload.github.com/bionode/bionode-watermill-tutorial/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239493676,"owners_count":19647995,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T08:34:01.015Z","updated_at":"2025-10-26T08:41:10.371Z","avatar_url":"https://github.com/bionode.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bionode-watermill for dummies!\n\n* [Objective](#objective)\n* [First things first](#first-things-first)\n* [Defining a task](#defining-a-task)\n    * [Input/output](#inputoutput)\n* [Using orchestrators](#using-orchestrators)\n    * [join](#join)\n    * [junction](#junction)\n    * [fork](#fork)\n* [Useful links](#useful-links)\n\n## Objective\n\nThis tutorial is intended for those assembling a bioinformatics \npipeline with bionode-watermill for the first time.\n\n## First things first\n\nThis tutorial assumes that you have installed `npm`, `git` and `node`. The full tutorial requires Node.js version 7 or higher.\n\nTo set up and test the scripts in this tutorial, follow these simple steps:\n\n* `git clone https://github.com/bionode/bionode-watermill-tutorial.git`\n* `cd bionode-watermill-tutorial`\n* `npm install bionode-watermill`\n\n## Defining a task\n\nWatermill is a tool that lets you orchestrate tasks. So, let's first \nunderstand how to define a **task**. 
\n \nTo define a **task** we first need to require bionode-watermill:\n\n```javascript\nconst watermill = require('bionode-watermill') \nconst task = watermill.task  /* pick out task explicitly, because the watermill\n object exposes other members too */\n```\n\nNext, we can use the `task` variable to define a given task:\n\n* Using standard JavaScript style:\n\n```javascript\n// this is a kiss example of how tasks work with shell\nconst simpleTask = task({\n  output: '*.txt', // checks if output file matches the specified pattern\n  params: 'test_file.txt',  //defines parameters to be passed to the\n    // task function\n  name: 'This is the task name' //defines the name of the task\n}, function(resolvedProps) {\n    const params = resolvedProps.params\n    return 'touch ' + params\n  }\n)\n```\n\n* Or you can do the same in ES6 syntax, using arrow \nfunctions:\n\n```javascript\n// this is a kiss example of how tasks work with shell\nconst simpleTask = task({\n  output: '*.txt', // checks if output file matches the specified pattern\n  params: 'test_file.txt',  /*defines parameters to be passed to the\n     task function*/\n  name: 'This is the task name' //defines the name of the task\n}, ({ params }) =\u003e `touch ${params}`\n)\n```\n\nNote: [Template literals](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Template_literals)\nare very useful since they let you include placeholders (${ }) within \nstrings. 
Template literals are enclosed by back-ticks (\\` \\`) as exemplified \nabove.\n\n\nAfter defining the task, you can execute it like this:\n```javascript\n// runs the task and returns a promise, and can also return a callback\nsimpleTask()\n```\nThis task will create a new (empty) file inside a directory named \n\"data/\\\u003cuid\u003e/\".\nYou may also notice that a 'bunch' of text was printed to the terminal; it \ncan be useful for debugging your pipelines.\n\nThe above example is available [here](https://github.com/bionode/bionode-watermill-tutorial/blob/master/simple_task.js).\nYou can test it by running: `node simple_task.js`\n\n### Input/output\n\nAlthough already discussed [elsewhere](https://github.com/bionode/bionode-watermill/blob/master/docs/Task.md#input-and-output)\nwithin the bionode-watermill documentation, in this tutorial I intend to explain \nhow input/output are managed by bionode-watermill. \nFirst, you can either hardcode input to something like:\n\n```javascript\n{ input: 'ERR1229296.sra' }\n```\n\nor instead you can specify glob patterns, which are in fact better explained \n[here](https://github.com/bionode/bionode-watermill/blob/master/docs/Task.md#input-and-output).\nBut, basically, what you need to know is that you can specify input as \nsomething like:\n\n```javascript\n{ input: '*.sra' }\n```\n\nThis tells bionode-watermill to crawl within the `data` directory in search \nof the first hit that matches this pattern. So, pay attention when specifying\n these glob patterns if you have multiple `.sra` files within this folder or \n generated by other tasks that are not your target task (the last one that \n generated a `.sra` file in this example). To circumvent this you can provide\n  file names that you can easily manage. 
For instance, if you have one file \n  named `ERR1229296.sra` and another one `ERR1229297.sra` and you want just \n  the first one, you can easily pass the input as follows:\n  \n```javascript\n{ input: '*6.sra' }\n```\n\nor of course hardcode it. \n\nOutput works in a very similar way; however, there are a few specifics \nthat the user must be aware of: \n\n- The output object is not the output filename; it is used only to match the file\n extension to the expected result of the task. Even so, it is necessary for \n the task to resolve properly.\n```javascript\n// this won't work!!!\n{ output: 'myfile.txt' }\n\n// rather you should provide this as follows:\n{ \n  output: '*.txt',\n  params: { output: 'myfile.txt' }\n}\n```\n\nRemember, task.output is used to match the output file pattern. If you \nwant to give the output a specific filename, you need to use the \ntask.params.output object instead, where you can freely specify the output file name.\n\n## Using orchestrators\n\n[What are orchestrators?](https://github.com/bionode/bionode-watermill#what-are-orchestrators)\n\n* ### Join\n\n**Join** is an operator that lets you run a number of tasks in a given order. \nFor instance, we may be interested in creating a file and writing to it \nin two separate steps. But let's first define a new task so we can \nperform it after the task that we called `simpleTask`:\n\n```javascript\nconst writeToFile = task({\n  input: '*.txt', // specifies the pattern of the expected input\n  output: '*.txt', // checks if output file matches the specified pattern\n  name: 'Write to file' //defines the name of the task\n}, ({ input }) =\u003e `echo \"some string\" \u003e\u003e ${input}`\n)\n```\n\nSo, task `writeToFile` writes \"some string\" to the file that we have just \ncreated in task `simpleTask`. 
However, to do so, we need the file to be \ncreated first and only then write something to it.\nIn order to achieve this we use `join`:\n\nBefore building the pipeline, we first need to require **join**: \n\n```javascript\n// === WATERMILL ===\nconst {\n  task,\n  join\n} = require('bionode-watermill')\n```\n\nAnd then,\n\n```javascript\n// this is a kiss example of how join works\nconst pipeline = join(simpleTask, writeToFile)\n\n//executes the join itself\npipeline()\n```\n\nThis operation will generate two directories inside the `data` folder: one \nfor the first task (`simpleTask`), which creates a new\n file called `test_file.txt`, and one for the second task (`writeToFile`), which makes \n a symlink to `test_file.txt` and writes to it, since we have indicated that \n we want to write to the same file as the input. Note that once again \n files will be inside a directory named \"data/\\\u003cuid\u003e/\" (but in this case you \n will have two directories with distinct uids).\n\nThe above example is available [here](https://github.com/bionode/bionode-watermill-tutorial/blob/master/simple_join.js).\nYou can test the above example by running: `node simple_join.js`\n\n* ### Junction\n\nUnlike **join**, **junction** lets you run multiple tasks in parallel. \n\nHowever, we will have to create a new task. If we simply replaced **join** \nwith **junction** in the previous pipeline, we would end up with a file \nnamed `test_file.txt` with nothing written inside: when the file is created \nand written to at the same time, the write fails, although the file is still\n created. 
\n \n But first, don't forget to require **junction**:\n ```javascript\n // === WATERMILL ===\n const {\n   task,\n   join,\n   junction\n } = require('bionode-watermill')\n ```\n And only then:\n ```javascript\n // this will not produce the file with text in it!\nconst pipeline = junction(simpleTask, writeToFile)\n```\n\nSo, we will define a new simple task:\n\n```javascript\nconst writeAnotherFile = task({\n  output:'*.file', // checks if output file matches the specified pattern\n  params: 'another_test_file.file', /* defines parameters to be passed to the\n  task function (here, the file name)*/\n  name: 'Yet another task'\n}, ({ params }) =\u003e `touch ${params} | echo \"some new string\" \u003e\u003e ${params}`\n)\n```\n\nAnd then execute the new pipeline:\n\n```javascript\n// this is a kiss example of how junction works\nconst pipeline = junction(\n  join(simpleTask, writeToFile),  /* these joined tasks will be executed at the\n  same time as the task below */\n  writeAnotherFile\n)\n\n//executes the pipeline itself\npipeline()\n```\n\nThis new pipeline consists of creating two files and writing text to them. Note \nthat the `writeAnotherFile` task uses a shell pipe \n (\"|\") along with the shell commands `touch` and `echo`. That is a \n feature that bionode-watermill also supports. Of course, these are simple \n tasks that can be performed with shell commands alone (they are merely \n illustrative). Instead, as mentioned above, you can use JavaScript **callback** \n functions or **promises** as the final return of a **task**.\n \nNevertheless, if you browse to the `data` folder, you should have three folders \n(because you have three tasks): one with the text file generated in the first\n task, another one with a symlink to that file (which was used to write \n to it) and finally a third one containing the file \n generated and written in the third task (named `another_test_file.file`). 
\n\nThe above example is available [here](https://github.com/bionode/bionode-watermill-tutorial/blob/master/simple_junction.js).\nYou can test the above example by running: `node simple_junction.js`\n\n* ### Fork\n\nWhile **junction** handles two or more tasks at the same time, **fork** \nlets you pass the output of two or more different tasks to the next task. \nImagine you have two different files being generated in two different tasks \n and want to process them using the same task in the next step. In this case \n bionode-watermill uses **fork** to split the pipeline into two distinct \n branches that are then processed independently. \n \n If you have something like:\n ```javascript \n join(\n   taskA,\n   fork(taskB, taskC),\n   taskD\n )\n ```\n This will result in something like this:  ```taskA -\u003e taskB -\u003e taskD'``` and \n ```taskA -\u003e taskC -\u003e taskD''```, with two distinct final outputs for the \n pipeline. This is quite a useful feature for benchmarking programs, or if you are\n  interested in running multiple programs that do the same type of analysis \n  and comparing their results.\n  \n  Importantly, the same type of pipeline with **junction** instead of **fork**,\n   ```javascript \n   join(\n     taskA,\n     junction(taskB, taskC),\n     taskD\n   )\n   ```\n   would result in the following workflow: ```taskA -\u003e taskB, taskC -\u003e taskD```,\n    where taskD has only one final result.\n    \n But enough talk, let's get to work!\n \n  First:\n  \n  ```javascript\n  // === WATERMILL ===\n  const {\n    task,\n    join,\n    fork\n  } = require('bionode-watermill')\n  ```\n \n For the fork tutorial, two functions will be defined. 
These functions \n each create a file and write to it:\n \n ```javascript\nconst simpleTask1 = task({\n    output: '*.txt', // checks if output file matches the specified pattern\n    params: 'test_file.txt',  //defines parameters to be passed to the\n    // task function\n    name: 'task1: creating file 1' //defines the name of the task\n  }, ({ params }) =\u003e `touch ${params} | echo \"this is a string from first file\" \u003e\u003e ${params}`\n)\n\nconst simpleTask2 = task({\n    output:'*.txt', // checks if output file matches the specified pattern\n    params: 'another_test_file.txt', /* defines parameters to be passed to the\n     task function (here, the file name)*/\n    name: 'task 2: creating file 2'\n  }, ({ params }) =\u003e `touch ${params} | echo \"this is a string from second file\" \u003e\u003e ${params}`\n)\n```\n\nThen, a task to be performed after the fork, which will add the same text to \nthese files:\n\n```javascript\nconst appendFiles = task({\n    input: '*.txt', // specifies the pattern of the expected input\n    output: '*.txt', // checks if output file matches the specified pattern\n    name: 'Write to files' //defines the name of the task\n  }, ({ input }) =\u003e `echo \"after fork string\" \u003e\u003e ${input}`\n)\n```\n\nAnd finally our pipeline execution:\n\n```javascript\n// this is a kiss example of how fork works\nconst pipeline = join(\n  fork(simpleTask1, simpleTask2),\n  appendFiles\n)\n\n//executes the pipeline itself\npipeline()\n```\n\nThis should result in four output directories in our `data` folder. 
Notice \nthat unlike **junction**, where three tasks would render three output \ndirectories, with **fork** the result of our pipeline is four output \ndirectories, because the outputs from `simpleTask1` and `simpleTask2` were \nboth processed by task `appendFiles`.\n\nThe above example is available [here](https://github.com/bionode/bionode-watermill-tutorial/blob/master/simple_fork.js).\nYou can test the above example by running: `node simple_fork.js`\n \n\n## Useful links\n\n* [How to require bionode-watermill inside my project?](https://github.com/bionode/GSoC17/blob/master/notes/running_watermill.md)\n\n* [Prefer javascript standard syntax? Then use the following URL](https://github.com/bionode/bionode-watermill-tutorial/tree/master/js_standard_tutorial)\n\n* [Is this not challenging enough? Then try our other example pipelines](https://github.com/bionode/bionode-watermill/tree/master/examples/pipelines)\n    * [A pipeline to perform mapping with bowtie and bwa in parallel](https://github.com/bionode/bionode-watermill/tree/master/examples/pipelines/two-mappers)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbionode%2Fbionode-watermill-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbionode%2Fbionode-watermill-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbionode%2Fbionode-watermill-tutorial/lists"}