{"id":28679254,"url":"https://github.com/antoniojbt/project_quickstart","last_synced_at":"2026-02-11T08:33:01.239Z","repository":{"id":62578134,"uuid":"79537885","full_name":"antoniojbt/project_quickstart","owner":"antoniojbt","description":"Boilerplate tools and templates for setting up a data analysis project.","archived":false,"fork":false,"pushed_at":"2025-01-03T16:33:11.000Z","size":1032,"stargazers_count":1,"open_issues_count":4,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-03T17:31:21.747Z","etag":null,"topics":["data-science","data-science-toolbox","template-project"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antoniojbt.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-01-20T07:47:53.000Z","updated_at":"2023-03-20T11:25:57.000Z","dependencies_parsed_at":"2022-11-03T19:38:20.310Z","dependency_job_id":null,"html_url":"https://github.com/antoniojbt/project_quickstart","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/antoniojbt/project_quickstart","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antoniojbt%2Fproject_quickstart","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antoniojbt%2Fproject_quickstart/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antoniojbt%2Fproject_quickstart/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antoniojbt%2Fproject_quickstart/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/antoniojbt","download_url":"https://codeload.github.com/antoniojbt/project_quickstart/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antoniojbt%2Fproject_quickstart/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259742913,"owners_count":22904629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","data-science-toolbox","template-project"],"created_at":"2025-06-14T01:04:12.458Z","updated_at":"2026-02-11T08:33:01.220Z","avatar_url":"https://github.com/antoniojbt.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":".. image:: https://www.repostatus.org/badges/latest/inactive.svg\n   :alt: Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.\n   :target: https://www.repostatus.org/#inactive\n\n.. image:: https://github.com/AntonioJBT/project_quickstart/actions/workflows/tests.yml/badge.svg\n   :target: https://github.com/AntonioJBT/project_quickstart/actions/workflows/tests.yml\n\n.. image:: https://readthedocs.org/projects/project-quickstart/badge/?version=latest\n   :target: http://project-quickstart.readthedocs.io/en/latest/?badge=latest\n   :alt: Documentation Status\n\n.. 
image:: https://zenodo.org/badge/79537885.svg\n   :target: https://zenodo.org/badge/latestdoi/79537885\n\n\n##################\nproject_quickstart\n##################\n\nBoilerplate tools and templates for setting up a data analysis project.\n\nCreates a new directory, subfolders and files to help quickstart your data science project with packaging, testing, scripts, reporting and other templates.\n\nQuickstart:\n\n.. code-block:: bash\n\n   pip install project_quickstart\n   project_quickstart --help\n   project_quickstart -n my_super_project\n   \n\nThis tool was produced with the following in mind:\n\n- Reproducibility concepts and best practice implementation\n- Use of Ruffus_ as a pipeline tool and `CGAT tools`_ for support \n- Python_ programming and packaging_\n- reStructuredText_ and Sphinx_ for reporting\n- `GitHub Actions`_ and tox_ for testing\n- Conda_ and Docker_ for management and development\n- GitHub_ for version control\n\nI've additionally put some basic instructions/reminders to link GitHub with:\n\n- ReadtheDocs_ (to easily render your documentation online)\n- Zenodo_ (for archiving your code and generating a DOI)\n- GitHub Actions (to integrate code testing)\n\n.. _Ruffus: http://www.ruffus.org.uk/\n\n.. _`CGAT tools`: http://www.cgat.org/cgat/Tools/the-cgat-code-collection\n\n.. _Python: https://www.python.org/\n\n.. _packaging: https://packaging.python.org/\n\n.. _restructuredText: http://docutils.sourceforge.net/rst.html\n\n.. _Sphinx: http://www.sphinx-doc.org/en/stable/\n\n.. _GitHub Actions: https://github.com/features/actions\n\n.. _tox: https://tox.readthedocs.io/en/latest/\n\n.. _Conda: http://conda.pydata.org/docs/#\n\n.. _Docker: https://www.docker.com/\n.. _GitHub: https://github.com/\n\n.. _ReadtheDocs: https://readthedocs.org/\n\n.. _Zenodo: https://guides.github.com/activities/citable-code/\n\n\nSome of the reasoning\n#####################\n\n    - Analyses are rarely run only once, even within the same project. 
Automating as much as possible saves time and reduces errors. The setup can be costly initially but over time this pays off, particularly when you need to track errors, confirm results, hand over the project or reconstruct the history and reasoning (even to yourself months later).\n    - Usually a project is built around one data set/experiment/question but even in this case it's easy to see potential gains from automating and packaging.\n    - Packaging your project can help with reproducibility, freezing code, version control, collaboration and general mental sanity (while managing a project at least).\n    - Later on the code or parts of it could be extracted to work on the general case as a separate entity.\n    - This package is based on Python but the same applies to other languages. See discussions on writing your projects as packages in R (R. Flight_, H. Parker_ (also here__), H. Wickham_ and others_ for example). See Hadley Wickham's R_ ecosystem_ and book_.\n    \n.. _Flight: http://rmflight.github.io/posts/2014/07/analyses_as_packages.html\n    \n.. _Parker: https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/\n\n__ https://hilaryparker.com/2013/04/03/personal-r-packages/\n\n.. _Wickham: http://r-pkgs.had.co.nz/intro.html\n\n.. _others: https://github.com/kbroman/broman\n\n.. _book: http://r-pkgs.had.co.nz/\n\n.. _ecosystem: http://hadley.nz/\n\n.. _R: https://www.r-project.org/\n\n\nInstallation\n############\n\nTested on macOS, Windows and Linux.\n\nPlease raise an issue if you have problems.\n\nDependencies\n============\n\n- Python \u003e=3.8\n- If you run the examples option you will need many more tools. See\n  the Dockerfiles included for specific instructions.\n\n\nFrom GitHub\n===========\n\nTo download and install from GitHub (you need git installed), at the command line do:\n\n.. code-block:: bash\n\n   pip install git+https://github.com/AntonioJBT/project_quickstart.git\n\nor clone from GitHub (https example, you may need ssh):\n\n.. 
code-block:: bash\n\n   git clone https://github.com/AntonioJBT/project_quickstart.git\n   cd project_quickstart\n   pip install -e .\n\nSee this stackoverflow_ example and the pip docs_ for further help and explanations of pip and git installs.\n\n.. _stackoverflow: http://stackoverflow.com/questions/8247605/configuring-so-that-pip-install-can-work-from-github\n.. _docs: https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support/pip_install.html#vcs-support\n\n\npip\n===\n\n.. code-block:: bash\n\n   pip install project_quickstart\n\n\nUsage\n#####\n\nCreate a project directory skeleton. From the command line do:\n\n.. code-block:: bash\n\n   project_quickstart --help\n   project_quickstart -n my_super_project\n   project_quickstart --script-R my_super_script # which will create an R script template called my_super_script.R\n   project_quickstart --script-python my_super_script # which will create a Python script template called my_super_script.py\n\nThis will create data, code, manuscript and results directories along with Python and R template scripts and the necessary skeleton files for Python packaging, Docker, GitHub Actions CI, Sphinx, etc.\n\nThe --script options will create additional copies of script templates in the current working directory.\n\n\nA pipeline example\n==================\n\nTo run an example of a project with scripts, pipeline and report, you'll need to install several additional tools.\nSee the Dockerfiles on how to do this for Linux.\n\nTo create and run within a conda environment you can try the following bash script.\nIf it fails you may need to run the commands manually, and some other dependencies need manual installation. You'll need the Cairo graphics libraries, Inkscape and LaTeX for the plotting scripts and report.\n\nNote that if you intend to run the pipeline example below, you may want to install cgat-core_ first and then install the additional tools required within that environment.\n\n.. 
_cgat-core: https://github.com/cgat-developers/cgat-core\n\n.. code-block:: bash\n\n   wget https://raw.githubusercontent.com/AntonioJBT/project_quickstart/master/requirements_pq_example.sh\n   bash requirements_pq_example.sh # provided as an example, you probably want to inspect it first and run commands manually\n   # If you're on Mac OS X you'll also need:\n   conda install python.app\n\nOnce you have everything installed, run:\n\n.. code-block:: bash\n\n   conda activate pq_test\n   project_quickstart --example # will create a project with runnable scripts and pipeline\n   cd pq_example/results\n   python ../code/pq_example/pq_example.py --createDF -O ../data/raw/my_dataframe # Generate some start-up data in the raw data folder\n   ln -s ../data/raw/my_dataframe.tsv . # Create a symbolic link to your results directory\n   Rscript ../code/pq_example/pq_example.R -I my_dataframe.tsv\n   Rscript ../code/pq_example/plot_pq_example_pandas.R -I my_dataframe.tsv\n   # You'll need pythonw for matplotlib if on a Mac:\n   python ../code/pq_example/svgutils_pq_example.py \\\n                        --plotA=my_dataframe_gender_glucose_boxplot.svg \\\n                        --plotB=my_dataframe_age_histogram.svg \\\n                        -O F1_my_dataframe\n\nYou can also try:\n\n.. 
code-block:: bash\n                        \n   Rscript ../code/pq_example/pq_example_mtcars.R\n   Rscript ../code/pq_example/plot_pq_example_mtcars.R\n   python ../code/pq_example/svgutils_pq_example.py --plotA=mtcars_cyl_wt_boxplot_2.svg \\\n                                                    --plotB=mtcars_hp_qsec_scatterplot.svg \\\n                                                    -O F1_mtcars\n   python ../code/pq_example/svgutils_pq_example.py --plotA=mtcars_wt_histogram.svg  \\\n                                                    --plotB=mtcars_boxplot_lm.svg \\\n                                                    -O F2_mtcars\n\nsvgutils_pq_example.py is a simple wrapper for the python package svgutils,\ndon't expect too much. You can modify the script, play around with scale(),\nmove(), Grid(), etc.\n\n\nYou can get a simple example of a report, based on sphinx-quickstart_, by doing:\n\n.. code-block:: bash\n\n   cp -r ../code/pq_example/pipeline_pq_example/pipeline_report .\n   cd pipeline_report\n   ln -s ../../code/pq_example/pipeline_pq_example/configuration/pipeline.yml .\n   make html \n   ln -s _build/html/report_pipeline_pq_example.html .\n   make latexpdf\n   ln -s _build/latex/pq_example.pdf .\n\nYou can run most of this with a bash script:\n\n.. code-block:: bash\n\n   project_quickstart --example # will create a project with runnable scripts and pipeline\n   cd pq_example/results\n   # Use pythonw if on a Mac, otherwise python:\n   bash ../code/pq_example/examples.sh python \u003e examples.log\n   open pipeline_report/pq_example.pdf pipeline_report/report_pipeline_pq_example.html # in a Mac\n\nIf you have cgat-core_ installed you can try the following:\n\n.. 
code-block:: bash\n\n   project_quickstart --example # will create a project with runnable scripts and pipeline\n   cd pq_example/results\n   python ../code/pq_example/pipeline_pq_example/pipeline_pq_example.py --help\n   # Get a copy of the configuration files, you need to modify the yml file manually:\n   python ../code/pq_example/pipeline_pq_example/pipeline_pq_example.py config \n   python ../code/pq_example/pipeline_pq_example/pipeline_pq_example.py show full\n   python ../code/pq_example/pipeline_pq_example/pipeline_pq_example.py printconfig\n   python ../code/pq_example/pipeline_pq_example/pipeline_pq_example.py plot full\n   python ../code/pq_example/pipeline_pq_example/pipeline_pq_example.py make full --local\n   python ../code/pq_example/pipeline_pq_example/pipeline_pq_example.py make make_report --local\n   open pipeline_report/_build/latex/pq_example.pdf pipeline_report/_build/html/report_pipeline_pq_example.html\n\nIf submitting to a cluster consider using a ~/.cgat.yml file (`see an example`_) for configuration and submitting like:\n\n.. _`see an example`: https://github.com/AntonioJBT/pipeline_example/blob/master/Docker_and_config_file_examples/cgat.yml\n\n.. code-block::\n\n    # Run on the cluster, scripts are short but you may still want to nohup it:\n    nohup python ../code/pq_example/pipeline_pq_example/pipeline_pq_example.py make full \u0026\n    tail -f nohup.out\n\nYou can also see this `tutorial on pipelines`_ with Ruffus and CGAT tools from Ian Sudbery's lab. I've also created a repository to keep some tests and minimal setup for Ruffus, DRMAA and cgat-core which may be helpful (`pipeline_example`_).\n\n.. _`tutorial on pipelines`: https://github.com/sudlab/pipeline_tutorial\n.. _Ruffus: http://www.ruffus.org.uk/\n.. _sphinx-quickstart: http://www.sphinx-doc.org/en/stable/index.html\n.. 
_`pipeline_example`: https://github.com/AntonioJBT/pipeline_example\n\n\nCitation\n########\n\nThis is a simple utility tool but if you find a way to cite it please do so (!):\n\n.. image:: https://zenodo.org/badge/79537885.svg\n   :target: https://zenodo.org/badge/latestdoi/79537885\n   \n\nContribute\n##########\n\n`Issue Tracker`_\n\n.. _`Issue Tracker`: https://github.com/AntonioJBT/project_quickstart/issues\n\nYou are more than welcome to fork or submit pull requests (!).\n\n\nChange log\n##########\n\nv0.4\n\n- refactored and updated, see release tag for details\n\n\nv0.3\n\n- updated to cgat-core 0.5.6\n- switched from ini to yml\n- minor bugs in bash example\n- included function to find path to R script being executed\n- minor bug in the example report conf.py\n- added ggthem template\n- added scripts option in setup.py template to run package scripts from CLI\n- added rsync example command and instructions for remote copies\n- added Ruffus/CGAT simplified pipeline template script\n- added example scripts and pipeline, option '--example'\n\nv0.2\n\n- Initial release\n\n\nLicense\n#######\n\nGPL-3\n\n\nMore details and suggestions\n############################\n\nProject workflow \n=================\n\n#. Run this package to set up folders, GitHub repo structure, code testing, py package files, etc.\n#. Download packages, tools, etc. Set up Docker, conda kapsel or another way of tracking the environment, packages and their versions.\n#. Manually connect GitHub with integrated services (GitHub Actions CI, Zenodo, RTD).\n#. Code and test code with tox, GitHub Actions and py.test\n#. Analyse\n#. Create new scripts, new pipelines, test them\n#. Document code as you go, update with Sphinx autodoc\n#. Generate internal report with plots, text, etc.\n#. Freeze with release tag + Zenodo archiving and/or tarball with py sdist\n#. 
Repeat cycle\n\nEven if the code is project specific it can still be versioned, frozen and archived for reproducibility purposes later on.\n\nYou can later on build computational pipelines using for example a pipeline quickstart tool based on a `Ruffus and CGAT framework`_.\n\n.. _`Ruffus and CGAT framework`: https://github.com/cgat-developers/cgat-flow/blob/master/CGATPipelines/pipeline_quickstart.py\n\nYou will need to install other software (e.g. R, Ruffus_, Sphinx_, etc.) to make full use depending on your preferences.\n\n\nproject_quickstart usage notes\n==============================\n\nproject_quickstart.py creates a folder structure with file templates for:\n\n- data\n- code\n- results\n- manuscript (reports, general documents, references, etc.)\n\nSee this layout_ for one explanation on organising Python projects\n\n.. _layout: https://www.cgat.org/downloads/public/cgatpipelines/documentation/Reference.html#term-pipeline-scripts\n\nproject_quickstart.py copies the contents of project_quickstart/templates/project_template/ so as to have all the skeleton files needed for:\n\n- Github repository files (but not .git) like: .gitignore, README, THANKS, TODO, LICENCE, etc.\n- GitHub Actions workflow files, tests dir with skeleton files\n- Tox python testing\n- Python packaging files\n- Dockerfile\n- etc\n- Zenodo, see `Zenodo GitHub guide`_. Allow permissions and then with each tag release Zenodo archives the repo and gives it a DOI. See also SSI blog_ on Zenodo.\n\nThese go into the code directory.\n\n.. _`Zenodo GitHub guide`: https://guides.github.com/activities/citable-code/\n\t\n.. 
_blog: https://www.software.ac.uk/blog/2016-09-26-making-code-citable-zenodo-and-github\n\nMake additional script template copies with project_quickstart.py (located in project_quickstart/templates/project_template/).\n\n\nTesting\n=======\n\n- See tox, GitHub Actions and py.test for a proper setup of py virtualenv, CI and unit testing respectively.\n- Check GitHub Actions setup, add pep8 and flake8 to improve your code.\n- See CGAT docs for an explanation `on testing`_.\n\t\n.. _`on testing`: https://www.cgat.org/downloads/public/cgat/documentation/testing.html#testing\n\n\nUpload code to GitHub\n=====================\n\nTo create a repository after having already created files do the following:\n\n\tManually create a blank (no files at all) repository online in your GitHub account\n\nIn your local machine, under my_project_xxx/code/ do:\n\n.. code-block:: bash\n\n\tgit init\n\tgit add *\n\tgit commit -am 'great message'\n\tgit remote add origin https://github.com/user_xxx/my_project_xxx.git\n\tgit push -u origin master\n\n\t# To copy on any other machine simply run:\n\tgit clone https://github.com/user_xxx/my_project_xxx.git\n\n\nDocumentation\n=============\n\nAfter setting up a project, edit the INI and rst files so that variables that get repeated (such as project name, author, date, etc.) are automatically passed to the various files that need them (setup.py, Dockerfile, manuscript_template, etc.). These will get substituted when running python setup.py or rendering rst documents for instance.\n\nDifferent renderers can give slightly different results (e.g. GitHub, RTD, Sphinx_, rst2pdf, etc.)\n\nrst2pdf can substitute rst variables but pandoc doesn't seem to do it properly.\n\nSee some notes in CGAT reports_.\n\n.. 
_reports: https://www.cgat.org/downloads/public/cgatpipelines/documentation/PipelineReports.html#writingreports\n\n- Add Python docs with rst, Sphinx_, quickstart_\n- Check doctests_\n- See this tutorial_ for Sphinx_ and general Python packaging/workflow\n- See also `Jeff Knupp's tutorial`_ and other `similar blogs`_ on Python packaging.\n\n.. _tutorial: https://jeffknupp.com/blog/2013/08/16/open-sourcing-a-python-project-the-right-way/\n\n.. _quickstart: http://thomas-cokelaer.info/tutorials/sphinx/quickstart.html\n\n.. _doctests: http://thomas-cokelaer.info/tutorials/sphinx/doctest.html\n\n.. _`Jeff Knupp's tutorial`: https://www.jeffknupp.com/blog/2013/08/16/open-sourcing-a-python-project-the-right-way/\n\n.. _`similar blogs`: https://www.pydanny.com/cookie-project-templates-made-easy.html\n\n\nTry to follow Python style guides. See projects where these have been slightly adapted as an example (CGAT style_).\n\n.. _style: https://www.cgat.org/downloads/public/cgat/documentation/styleguide.html#styleguide\n\n\nDependencies\n============\n\nThese can become a nightmare as many programs are needed when running pipelines in biomedical research. Try to stick to one package manager, such as conda. pip and conda usually play well together and complement each other. \n\nDocker images and testing can also make things easier for reproducible environments.\n\nTo run the example pipeline above see the Dockerfiles in this repository for installation instructions and images you can try.\n\n\n\nArchiving and computing environment\n===================================\n\nYou can use releases as code freezes. These can be public, remote, local, private, etc.\n\nFor example, you can create tags for commits on GitHub; these create compressed files with versioning. See `git tagging`_ on how to do this.\n\n.. _`git tagging`: https://git-scm.com/book/en/v2/Git-Basics-Tagging\n\nFor example, if you want to tag and version a previous commit, do the following:\n\n.. 
code-block:: bash\n\n   # Update version.py if needed\n   # Check the tag history:\n   git tag\n   \n   # Check the commit log and copy the commit reference:\n   git log --pretty=oneline\n\n   # Create a tag, give it a version, internal message and point it to the commit you want to tag:\n   git tag -a v0.1 -m \"code freeze for draft 1, 23 June 2017\" 7c3c7b76e4e3b47016b4f899c3aa093a44c5e053\n\n   # Push the tag \n   # By default, the git push command does not transfer tags to remote servers, so run:\n   git push origin v0.1\n \n   # You'll then need to click around in the GitHub repository to formally publish the release.\n\n-----\n\nSee bioconda_, contributing a recipe_ and guidelines_ to help manage the project's dependencies and computational environment.\n\n.. _bioconda: https://bioconda.github.io/index.html\n\t\n.. _recipe: https://bioconda.github.io/contribute-a-recipe.html\n\t\n.. _guidelines: https://bioconda.github.io/guidelines.html\n\nIf your code is useful to others, you can make it available with PyPI, create a Dockerfile and/or Conda recipe.\n\n-----\n\n.. note::\n\t\n\tMany links are tutorials I've come across, if you know of other good ones please share them.\n\t\n\tMake sure to check the official sites and follow their tutorials for each of the tools as a primary source however.\n\t\n\tFeel free to fork, raise issues and send pull requests.\n\n\nSimilar packages\n================\n\nI discovered CookieCutter_ while working on this. It probably does what I have setup here better, with useful features, flexibility and many templates for different types of projects.\n\n.. _CookieCutter: https://github.com/audreyr/cookiecutter-pypackage\n\nSee its data-science_ and reproducibility_ templates for example.\n\n.. _reproducibility: https://github.com/mkrapp/cookiecutter-reproducible-science\n\n.. 
_data-science: https://github.com/drivendata/cookiecutter-data-science\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantoniojbt%2Fproject_quickstart","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantoniojbt%2Fproject_quickstart","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantoniojbt%2Fproject_quickstart/lists"}