{"id":26824340,"url":"https://github.com/charles-plessy/tutorial","last_synced_at":"2025-08-16T12:16:02.837Z","repository":{"id":11063687,"uuid":"13405579","full_name":"charles-plessy/tutorial","owner":"charles-plessy","description":"Various tutorials on how to analyse transcriptomic data.","archived":false,"fork":false,"pushed_at":"2023-06-28T05:47:13.000Z","size":1973,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-30T09:45:18.395Z","etag":null,"topics":["cage","edger","fantom","transcriptome","tutorial"],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/charles-plessy.png","metadata":{"files":{"readme":"README.html","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2013-10-08T06:45:12.000Z","updated_at":"2024-04-17T11:32:42.000Z","dependencies_parsed_at":"2022-09-02T20:00:58.236Z","dependency_job_id":"34c98635-376f-42bc-bc40-d828b64a6f4a","html_url":"https://github.com/charles-plessy/tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charles-plessy%2Ftutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charles-plessy%2Ftutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charles-plessy%2Ftutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charles-plessy%2Ftutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/charles-plessy","download_url":"https://
codeload.github.com/charles-plessy/tutorial/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251319598,"owners_count":21570426,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cage","edger","fantom","transcriptome","tutorial"],"created_at":"2025-03-30T09:45:24.129Z","updated_at":"2025-04-28T13:00:17.584Z","avatar_url":"https://github.com/charles-plessy.png","language":"HTML","readme":"\u003ch1 id=\"tutorials-for-analysing-cage-and-deep-race-data.\"\u003eTutorials for analysing CAGE and Deep-RACE data.\u003c/h1\u003e\n\u003cp\u003eVarious tutorials on how to analyse \u003ca href=\"https://en.wikipedia.org/wiki/Cap_analysis_gene_expression\"\u003eCAGE\u003c/a\u003e data.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca href=\"./Deep-RACE1/Deep-RACE1.html\"\u003eDeep-RACE\u003c/a\u003e (work in progress)\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"./CAGE_differential_analysis1/analysis.html\"\u003eCAGE differential analysis 1\u003c/a\u003e (work in progress)\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"./CAGE_differential_analysis2/analysis.html\"\u003eCAGE differential analysis 2\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"./FANTOM5_SDRF_files/sdrf.html\"\u003eSimple use of FANTOM5 SDRF files\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"./CAGE_normalisation_by_subsampling/subsampling.html\"\u003eNormalisation of CAGE libraries by sub-sampling\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThese tutorials are designed to be executed on a Linux system's command line 
interface (also called \u003cem\u003eTerminal\u003c/em\u003e or \u003cem\u003eshell\u003c/em\u003e). For readers not yet familiar with typing commands on the keyboard, I recommend the book \u003cem\u003e\u003ca href=\"http://linuxcommand.org/tlcl.php\" title=\"A Complete Introduction\"\u003eThe Linux Command Line\u003c/a\u003e\u003c/em\u003e by William E. Shotts, Jr. (January 2012, \u003ca href=\"http://nostarch.com/tlcl.htm\" title=\"the finest in geek entertainment\"\u003eNo Starch Press\u003c/a\u003e).\u003c/p\u003e\n\u003cp\u003eThe tutorials assume that the programs they use are already installed. On the \u003ca href=\"http://www.debian.org\"\u003eDebian\u003c/a\u003e operating system, many of them (BWA, SAMtools, BEDTools, ...) are available as pre-built packages and can all be installed, along with many other programs, with the command \u003ccode\u003eapt-get install med-bio\u003c/code\u003e (this requires administrator privileges).\u003c/p\u003e\n\u003cp\u003eOther software has to be downloaded and installed by hand. Place these programs in the \u003ccode\u003ebin\u003c/code\u003e directory of your home directory and make them executable so that you can use them. If you had to create the \u003ccode\u003ebin\u003c/code\u003e directory yourself, it will only be taken into account the next time you log in (see \u003ca href=\"http://stackoverflow.com/questions/16366986/adding-bin-directory-in-your-path\"\u003estackoverflow\u003c/a\u003e for alternatives).\u003c/p\u003e\n\u003cp\u003eHere is, for example, how to download, compile and install the \u003ca href=\"http://genome.gsc.riken.jp/osc/english/software/src/tagdust.tgz\"\u003eTagDust\u003c/a\u003e software. By convention, we will download the software into a directory called \u003ccode\u003esrc\u003c/code\u003e. \u003cem\u003eCompiling\u003c/em\u003e means producing, from the downloaded \u003ca href=\"https://en.wikipedia.org/wiki/Source_code\"\u003esource code\u003c/a\u003e, an executable program suitable for your computer. 
On Debian systems, the tools needed to compile programs written in the C programming language can be installed through the \u003ccode\u003ebuild-essential\u003c/code\u003e package.\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003ecd                    # move back to the home directory\nmkdir -p src          # create the src directory if it does not exist yet\ncd src                # enter the src directory\nwget http://genome.gsc.riken.jp/osc/english/software/src/tagdust.tgz   # download TagDust\ntar xvf tagdust.tgz   # unpack TagDust\ncd tagdust            # enter the tagdust directory created by unpacking the archive\nmake                  # compile the program\nmkdir -p ~/bin        # create the \u0026#39;bin\u0026#39; directory in your home directory if needed\ncp tagdust ~/bin/     # copy the compiled tagdust program into ~/bin\u003c/code\u003e\u003c/pre\u003e\n\u003ch2 id=\"frequent-problems\"\u003eFrequent problems\u003c/h2\u003e\n\u003ch3 id=\"command-not-found.\"\u003eCommand not found.\u003c/h3\u003e\n\u003cp\u003eCompiling a program is not enough: the command-line interface also needs to find it, and by default it does not search the current working directory.\u003c/p\u003e\n\u003cp\u003eA very good explanation is given in chapter 24 of \u003cem\u003e\u003ca href=\"http://linuxcommand.org/tlcl.php\" title=\"A Complete Introduction\"\u003eThe Linux Command Line\u003c/a\u003e\u003c/em\u003e, section \u003cem\u003eScript File Location\u003c/em\u003e. Here is a brief summary.\u003c/p\u003e\n\u003cp\u003eThe standard way to make programs accessible is to place them in one of the directories listed in the \u003cem\u003ePATH\u003c/em\u003e environment variable, which the shell searches, in order, when looking for a command. For system-wide installations, this directory is usually \u003ccode\u003e/usr/bin\u003c/code\u003e, or \u003ccode\u003e/usr/local/bin\u003c/code\u003e for programs installed by hand rather than by the package manager. For local installations by a single user, it is usually the \u003ccode\u003ebin\u003c/code\u003e directory in the \u003cem\u003ehome\u003c/em\u003e directory, which can be abbreviated as \u003ccode\u003e~/bin\u003c/code\u003e. 
If it does not exist, it can be created like any other directory, but it may be necessary to log out and log in again for the system to add it to the \u003cem\u003ePATH\u003c/em\u003e.\u003c/p\u003e\n\u003cp\u003eIn addition, the program needs executable permission. This can be given with the \u003ccode\u003echmod\u003c/code\u003e command, for example \u003ccode\u003echmod +x myscript\u003c/code\u003e (see chapter 24 of \u003cem\u003e\u003ca href=\"http://linuxcommand.org/tlcl.php\" title=\"A Complete Introduction\"\u003eThe Linux Command Line\u003c/a\u003e\u003c/em\u003e, section \u003cem\u003eExecutable Permissions\u003c/em\u003e), or via the file manager of the graphical desktop.\u003c/p\u003e\n\u003cp\u003eLastly, it is possible to run a program that is not in the \u003cem\u003ePATH\u003c/em\u003e by indicating the directory that contains it. The current directory is always aliased to \u003ccode\u003e.\u003c/code\u003e, so to run a program called \u003ccode\u003emyscript\u003c/code\u003e that is in the current directory, type \u003ccode\u003e./myscript\u003c/code\u003e. (The program still needs executable permission, as explained above.)\u003c/p\u003e\n\u003ch3 id=\"what-is-that-sponge\"\u003eWhat is that sponge?\u003c/h3\u003e\n\u003cp\u003e\u003ccode\u003esponge\u003c/code\u003e is a command from the \u003ca href=\"http://joeyh.name/code/moreutils/\"\u003emoreutils\u003c/a\u003e collection, which I use frequently. On Debian systems, it is easy to install via the \u003ca href=\"https://packages.debian.org/moreutils\"\u003emoreutils\u003c/a\u003e package.\u003c/p\u003e\n\u003cp\u003eThe goal of \u003ccode\u003esponge\u003c/code\u003e is to solve the following problem: when a file is read, piped to a command, and the result is redirected to the same file, the file is not updated as expected but ends up empty. This is because the shell truncates the file receiving the redirection to zero length when it sets up the command, typically before its contents have even been read. 
For example, with a file called \u003ccode\u003eexample.fq\u003c/code\u003e:\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003ecat example.fq | fastx_trimmer -f 11 \u0026gt; example.fq          # Empties the file.\ncat example.fq | fastx_trimmer -f 11 | sponge example.fq   # Trims the first 10 nucleotides.\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eWithout \u003ccode\u003esponge\u003c/code\u003e, one would need to create a temporary file, which is essentially what \u003ccode\u003esponge\u003c/code\u003e does, more cleanly, behind the scenes. Here, \u003ccode\u003e\u0026amp;\u0026amp;\u003c/code\u003e ensures that the original file is only replaced if the trimming command succeeded:\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003ecat example.fq | fastx_trimmer -f 11 \u0026gt; example.tmp.fq \u0026amp;\u0026amp;\nmv example.tmp.fq example.fq\u003c/code\u003e\u003c/pre\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcharles-plessy%2Ftutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcharles-plessy%2Ftutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcharles-plessy%2Ftutorial/lists"}