{"id":17961207,"url":"https://github.com/erictleung/data-science-resources","last_synced_at":"2026-02-27T04:32:04.631Z","repository":{"id":72464212,"uuid":"56088418","full_name":"erictleung/data-science-resources","owner":"erictleung","description":":earth_americas: Readings and resource materials for data science","archived":false,"fork":false,"pushed_at":"2025-12-18T19:41:01.000Z","size":378,"stargazers_count":31,"open_issues_count":8,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-21T21:46:53.131Z","etag":null,"topics":["bioinformatics","computational-biology","linear-algebra","machine-learning","network-science","statistical-learning","statistical-methods","statistics"],"latest_commit_sha":null,"homepage":"http://erictleung.com/data-science-resources/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/erictleung.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"erictleung","ko_fi":"erictleung","custom":["paypal.me/erictleung"]}},"created_at":"2016-04-12T18:37:05.000Z","updated_at":"2025-12-18T19:41:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"a47ddd10-ff4b-42d9-870f-ee7d083502e6","html_url":"https://github.com/erictleung/data-science-resources","commit_stats":{"total_commits":226,"total_committers":2,"mean_commits":113.0,"dds":0.03982300884955747,"last_synced_commit":"b67618d0249fee41ca832e37026a7401b28726c6"},"previous_names":[],"tags_c
ount":0,"template":false,"template_full_name":null,"purl":"pkg:github/erictleung/data-science-resources","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erictleung%2Fdata-science-resources","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erictleung%2Fdata-science-resources/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erictleung%2Fdata-science-resources/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erictleung%2Fdata-science-resources/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/erictleung","download_url":"https://codeload.github.com/erictleung/data-science-resources/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erictleung%2Fdata-science-resources/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29884710,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-26T23:51:21.483Z","status":"online","status_checked_at":"2026-02-27T02:00:06.759Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","computational-biology","linear-algebra","machine-learning","network-science","statistical-learning","statistical-methods","statistics"],"created_at":"2024-10-29T11:08:40.167Z","updated_at":"2026-02-27T04:32:04.617Z","avatar_url":"https://github.com/erictleung.
png","language":null,"funding_links":["https://github.com/sponsors/erictleung","https://ko-fi.com/erictleung","paypal.me/erictleung"],"categories":[],"sub_categories":[],"readme":"# Data Science Readings and Resources\n\n[![Check Resources](https://github.com/erictleung/review/actions/workflows/test.yml/badge.svg)](https://github.com/erictleung/review/actions/workflows/test.yml)\n\nCurated resource materials from around the internet for data science, with applications in bioinformatics and computational biology and other domains, that I've found useful.\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n**Table of Contents**\n\n- [Learning to Learn](#learning-to-learn)\n- [Statistics and Probability](#statistics-and-probability)\n  - [General Resources](#general-resources)\n  - [R Friendly Resources](#r-friendly-resources)\n  - [Specific Topics](#specific-topics)\n  - [Interactive Articles](#interactive-articles)\n  - [Data-related](#data-related)\n- [Datasets](#datasets)\n- [General mathematics](#general-mathematics)\n- [Linear Algebra](#linear-algebra)\n- [Network Science](#network-science)\n- [Algorithms and Data Structures](#algorithms-and-data-structures)\n- [Programming](#programming)\n- [Structural Query Language (SQL)](#structural-query-language-sql)\n- [Statistical Methods and Machine Learning](#statistical-methods-and-machine-learning)\n- [Computational Biology](#computational-biology)\n- [Domain Knowledge](#domain-knowledge)\n- [Data Visualization and Making Figures](#data-visualization-and-making-figures)\n- [Should-Read Data Science Papers](#should-read-data-science-papers)\n- [Software Engineering](#software-engineering)\n- [Reproducible Science](#reproducible-science)\n- [People Skills and Communication](#people-skills-and-communication)\n- [Other Lists](#other-lists)\n- [License](#license)\n\n\u003c!-- END doctoc generated TOC 
please keep comment here to allow auto update --\u003e\n\n## Learning to learn\n\n\u003e Resources and tips on how to self-learn and learn with others\n\n- [The Thinker's Guide to The Art of Socratic Questioning (PDF)](https://www.criticalthinking.org/TGS_files/SocraticQuestioning2006.pdf) - A checklist of questions to help facilitate directed discussions on topics.\n- [Questions for a Socratic Dialogue (PDF)](https://courses.cs.vt.edu/cs2104/Spring14McQuain/Notes/SocraticQ.pdf) - Nine types of questions that can be used to facilitate understanding.\n\n## Statistics and Probability\n\n\u003e Statistics is the study of the collection, analysis, interpretation,\n\u003e presentation, and organization of data.\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/Statistics)\n\n### General Resources\n\n- [Handbook of Biological Statistics](http://www.biostathandbook.com/index.html) and [R Supplement](http://rcompanion.org/rcompanion/index.html) - Online set of notes from \"Biological Data Analysis\" course from University of Delaware.\n- [Engineering Statistics Handbook](https://web.archive.org/web/20240924073012fw_/https://www.itl.nist.gov/div898/handbook/toolaids/pff/index.htm) - Handbook to help scientists and engineers incorporate statistical methods.\n- [Stat Trek](http://stattrek.com/) - Teach yourself statistics.\n- [Online Statistics Education](http://onlinestatbook.com/2/index.html) - Developed by Rice University, University of Houston Clear Lake, and Tufts University.\n- [BS704 Probability](http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Probability/index.html) - Boston University course on probability.\n- [StatQuest](https://www.youtube.com/playlist?list=PLblh5JKOoLUIcdlgu78MnlATeyx4cEVeR) - Series of videos on miscellaneous complex topics such as p-values, principal component analysis (PCA), and R-squared.\n- [STAT 505 Applied Multivariate Statistical Analysis](https://onlinecourses.science.psu.edu/stat505/) - Penn State Eberly College of 
Science course.\n- [StatSoft Electronic Statistics Textbook](https://web.archive.org/web/20200221222547/http://www.statsoft.com/Textbook)\n- [UW Summer Institutes Archive Material](https://si.biostat.washington.edu/about/archives) - Various learning\n  material in statistics, data analysis, machine learning, genetics, and\n  clinical research.\n- [Practical Data Science for Stats][peerjstats] - Collection of curated articles\n  on practical data science.\n- [Statistics for Biologists](https://www.nature.com/collections/qghhqm) - Nature collection of articles on statistical analysis.\n- [Top Upvoted Questions on CrossValidated][topcrossvotes] - Great questions with\n  great answers about topics in statistics and machine learning.\n- [Ordination Methods for Ecologists][ordokstate] - Resource of ordination\n  methodology\n- [Probability Cheatsheet][probcheat] - Compiled by William Chen and Joe\n  Blitzstein\n- [Statistics Done Wrong][wrongstats] - Reviews popular statistical errors and\n  slip-ups committed by scientists every day.\n- [Statistics for Hackers][statshacks] - By Jake VanderPlas (PyCon 2016)\n- [Modern Statistics for Modern Biologists][msmb]\n- [An Introduction to Statistical Learning](https://www.statlearning.com/) - With editions in R and Python.\n- [Library of Statistical Techniques (LOST)](https://lost-stats.github.io/) - Publicly-editable website with the goal of making it easy to execute statistical techniques in statistical software.\n- [List of A/B Testing Calculators](https://www.evanmiller.org/ab-testing/)\n\n[peerjstats]: https://peerj.com/collections/50-practicaldatascistats/\n[topcrossvotes]: https://stats.stackexchange.com/questions?sort=votes\n[ordokstate]: http://ordination.okstate.edu/\n[probcheat]: https://github.com/wzchen/probability_cheatsheet\n[wrongstats]: https://www.statisticsdonewrong.com/\n[statshacks]: https://speakerdeck.com/jakevdp/statistics-for-hackers\n[moderndive]: https://moderndive.com/\n[msmb]: 
https://web.stanford.edu/class/bios221/book/\n\n\n### R Friendly Resources\n\n- [Quick-R](http://www.statmethods.net/) - Quick reference to statistical methods using R.\n- [Statistical Inference via Data Science: A ModernDive into R and the Tidyverse (Second Edition)](https://moderndive.com/v2/) - Statistical analyses using modern tools in R and tidyverse.\n- [UCLA IDRE Statistics](http://www.ats.ucla.edu/stat/) - Examples of statistical analyses using R, SAS, SPSS, and Stata.\n- [r-statistics.co](http://r-statistics.co/) - Educational resource for machine learning and statistical computing in R.\n- [W2024 Applied Linear Regression Analysis](https://web.archive.org/web/20181222115225/http://www.stat.columbia.edu/~martin/W2024/W2024.html)\n\n\n### Python Friendly Resources\n\n- [Python for Data Analysis, 3E](https://wesmckinney.com/book/) by Wes McKinney\n- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/) by Al Sweigart\n- [Think Python, 2nd Edition](https://greenteapress.com/wp/think-python-2e/) by Allen B. Downey\n- [Think Stats 2E](https://greenteapress.com/wp/think-stats-2e/) by Allen B. Downey\n\n\n### Specific Topics\n\n- [P-values, False Discovery Rate (FDR) and q-values](http://www.nonlinear.com/support/progenesis/comet/faq/v2.0/pq-values.aspx)\n- [FAQ: How do I interpret odds ratio in logistic regression?](http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm)\n- [Standard error of the mean of a sample binomial distribution][stderrbinom]\n- [Common Probability Distributions: The Data Scientist's Crib\n  Sheet][cribsheet] - Data scientists have hundreds of probability\n  distributions from which to choose. 
Where to start?\n- [Choosing the correct statistical test in SAS, Stata, SPSS, and\n  R][chooseTest] - Table giving general guidelines on choosing statistical\n  tests.\n- [Warning Signs in Experimental Design and Interpretation][warnexp] - Nine\n  common warning signs in experimental design and nine common warning signs in\n  interpretation of experiments by Peter Norvig.\n- [Univariate Distribution Relationships](http://archive.today/2020.05.19-101733/http://www.math.wm.edu/~leemis/chart/UDR/UDR.html) - An interactive flow chart diagram showing the relationships between various univariate distributions.\n- [First Internet Gallery of Statistics Jokes](https://about.illinoisstate.edu/gcramsey/first-internet-gallery-of-statistics-jokes/)\n- [PLoS's Ten Simple Rules for Effective Statistical Practice][ploseffective]\n- [Common Statistical Pitfalls in Basic Science Research][commonpitfalls]\n- [Review of Probability Theory - Maleki and Do][cs229prob] (PDF)\n- [Effect Size FAQs][effectsize]\n- [Common Mistakes in Using Statistics - Spotting Them and Avoiding\n  Them][commonstats]\n- [Common Statistical Tests are Linear Models (Or: How to Teach Stats)][statslinmod]\n- [The Permutation Test - A Visual Explanation of Statistical Testing][permtest]\n- [Visualising Residuals][vizres] - Using R and ggplot2.\n- [Forecasting: Principles and Practice, 3rd Ed](https://otexts.com/fpp3/) by Rob J Hyndman and George Athanasopoulos\n\n[stderrbinom]: http://stats.stackexchange.com/a/221102/132399\n[cribsheet]: http://blog.cloudera.com/blog/2015/12/common-probability-distributions-the-data-scientists-crib-sheet\n[chooseTest]: http://stats.idre.ucla.edu/other/mult-pkg/whatstat/\n[warnexp]: http://norvig.com/experiment-design.html\n[ploseffective]: https://doi.org/10.1371/journal.pcbi.1004961\n[commonpitfalls]: https://doi.org/10.1161/JAHA.116.004142\n[cs229prob]: http://cs229.stanford.edu/section/cs229-prob.pdf\n[effectsize]: https://effectsizefaq.com/\n[commonstats]: 
https://web.ma.utexas.edu/users/mks/CommonMistakes2016/commonmistakeshome2016.html\n[statslinmod]: https://lindeloev.github.io/tests-as-linear/\n[permtest]: https://www.jwilber.me/permutationtest/\n[vizres]: https://web.archive.org/web/20190521215531/https://drsimonj.svbtle.com/visualising-residuals\n\n\n### Interactive Articles\n\n- [Interpreting Cohen's d effect size][cohensd]\n- [Interpreting Correlations][correlation]\n- [Interactive Machine Learning, Deep Learning and Statistics websites][interactivemldlstats]\n\n[cohensd]: http://rpsychologist.com/d3/cohend/\n[correlation]: http://rpsychologist.com/d3/correlation/\n[interactivemldlstats]: https://p.migdal.pl/interactive-machine-learning-list/\n\n\n### Data-related\n\n- [Tidyverse](https://www.tidyverse.org/) - Opinionated collection of R packages designed for\n  data science\n- [Tidymodels](https://www.tidymodels.org/) - Framework that is a collection of packages for\n  modeling and machine learning using tidyverse principles.\n- [Text Mining with R](https://www.tidytextmining.com/) - A tidy approach to performing text analysis\n  in R.\n- [Cross-industry standard process for data mining - Wikipedia](https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining) - Open standard for common processes used in data mining, which can be applied to data science analyses.\n- [The Limits of Data By C. Thi Nguyen](https://issues.org/limits-of-data-nguyen/) - Emphasizes the importance of understanding the context of your data and that it inherently has biases.\n- [Build a Career in Data Science](https://bestbook.cool/) by Emily Robinson and Jacqueline Nolis - A guide on landing your first data science job and being a valued senior employee, rather than on just the technical details of how regression works. 
The authors also have [an accompanying podcast](https://podcast.bestbook.cool/).\n- [ExcelDemy](https://www.exceldemy.com/) - Excel courses, tutorials, and templates.\n- [Excel Easy](https://www.excel-easy.com/) - Excel tutorials and tips on functions and more.\n\n\n## Datasets\n\n\u003e Biased toward real-world datasets that can be used to practice data science skills.\n\n- [Our World in Data](https://ourworldindata.org/) - Research and data to make progress against the world’s largest problems.\n- [#tidytuesday](https://github.com/rfordatascience/tidytuesday) - Weekly real-world datasets for the Data Science Learning Community to practice their data skills and data visualizations for collective learning.\n- [Federal Reserve Economic Data (FRED)](https://fred.stlouisfed.org/) - Download, graph, and track 825,000 US and international time series from 114 sources for economic data since 1991.\n- [Databases, Tables \u0026 Calculators from the U.S. Bureau of Labor Statistics](https://www.bls.gov/data/)\n\n\n## General mathematics\n\n\u003e Resources generally related to learning and understanding mathematical foundations\n\n- [A Gentle Introduction to the Art of Mathematics](https://giam.southernct.edu/GIAM/GIAM.pdf) - Gentle introduction to basic mathematical notation, set theory, writing mathematical proofs, and mathematical thinking.\n\n\n## Linear Algebra\n\n\u003e Linear algebra is the branch of mathematics concerning vector spaces and\n\u003e linear mappings between such spaces.\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/Linear_algebra)\n\n- [Essence of Linear Algebra][essence] - Excellent, short overview of linear\n  algebra concepts that help develop intuition on the matter.\n- [MIT OCW 18.06SC Linear Algebra][linalgmit] - Taught by Gilbert Strang.\n- [Linear algebra explained in four pages][linbull] - Excerpt from the No\n  Bullshit Guide to Linear Algebra by Ivan Savov.\n- [S.O.S. 
Mathematics Matrix Algebra][sosmath]\n- [PCA, Eigenvectors, and Eigenvalues (Cross Validated)][eigenstats]\n- [The Matrix Reference Manual](http://archive.today/2012.12.23-020729/http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html) - Reference information about\n  linear algebra and the properties of real and complex matrices.\n- [Linear Algebra Review and Reference - Kolter and Do (PDF)][cs229lin]\n- [Immersive Linear Algebra][immersivelinalg]\n\n[essence]: https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab\n[linalgmit]: http://bit.ly/2cvRwMe\n[linbull]: https://minireference.com/static/tutorials/linear_algebra_in_4_pages.pdf\n[sosmath]: http://www.sosmath.com/matrix/matrix.html\n[eigenstats]: https://stats.stackexchange.com/a/140579/\n[cs229lin]: http://cs229.stanford.edu/section/cs229-linalg.pdf\n[immersivelinalg]: http://immersivemath.com/ila/index.html\n\n\n## Network Science\n\n\u003e Network science is an academic field which studies complex networks such as\n\u003e telecommunication networks, computer networks, biological networks, cognitive\n\u003e and semantic networks, and social networks, considering distinct elements or\n\u003e actors represented by *nodes* (or *vertices*) and the connections between the\n\u003e elements or actors as *links* (or *edges*).\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/Network_science)\n\n- [Network Science Book][netbook] - Online book with visualizations and\n  interactive tools about network science by Albert-László Barabási.\n- [Graph Theory by Sarada Herke][graphherke] - YouTube series on graph theory.\n- [Network Science][netsci] - Aggregate of all things network science research,\n  introductions, people, journals, conferences, datasets, etc.\n- [Handbook of Graphs and Networks in People Analytics][ona] - The second\n  volume in a series of technical textbooks for professionals working in\n  analytics\n- [Awesome Network Analysis][ana] - Curated awesome list of network 
analysis\n  resources\n\n[netbook]: http://barabasi.com/networksciencebook/\n[graphherke]: https://www.youtube.com/user/DrSaradaHerke/playlists?shelf_id=5\u0026view=50\u0026sort=dd\n[netsci]: http://www.network-science.org/\n[ona]: https://ona-book.org/\n[ana]: https://github.com/briatte/awesome-network-analysis\n\n\n## Algorithms and Data Structures\n\n\u003e In mathematics and computer science, an algorithm is a self-contained\n\u003e step-by-step set of operations to be performed.\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/Algorithm)\n\u003e\n\u003e In computer science, a data structure is a particular way of organizing\n\u003e and storing data in a computer so that it can be accessed and modified\n\u003e efficiently.\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/Data_structure)\n\n- [Bioinformatic Algorithms][bioalg] - Algorithm lectures by Phillip Compeau\n  and Pavel Pevzner.\n- [Algorithms for DNA Sequencing][benalg] - Ben Langmead's lectures on algorithms\n  used in DNA sequencing.\n- [Rosalind][rosa] - Learn bioinformatics and programming through problem\n  solving.\n- [VisuAlgo][visualgo] - Visualizing data structures and algorithms through\n  animation.\n- [Discrete Mathematics: An Open Introduction](https://discrete.openmathbooks.org/dmoi4.html)\n\n[bioalg]: http://bioinformaticsalgorithms.com/videos.htm\n[benalg]: https://www.youtube.com/playlist?list=PL2mpR0RYFQsBiCWVJSvVAO3OJ2t7DzoHA\n[rosa]: http://rosalind.info/\n[visualgo]: https://visualgo.net/en\n\n\n## Programming\n\n\u003e Computer programming (often shortened to programming) is a process that leads\n\u003e from an original formulation of a computing problem to executable computer\n\u003e programs.\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/Computer_programming)\n\n- [DevDocs][devdocs] - API documentation browser.\n- [Hyperpolyglot][polyglot] - Commonly used features in programming languages\n  in side-by-side format.\n- [Learn X in Y Minutes][xiny] 
- Quick start to many programming languages, data\n  structures, and common tools.\n- [How to Report Bugs Effectively][reportbugs]\n- [Rosetta Code][rosetta] - Programming chrestomathy site.\n- [Cookbook for R][rcookbook] - Provides solutions to common tasks and problems\n  in analyzing data.\n- [OverAPI.com][openapi] - Collecting All Cheat Sheets.\n- [The Art of Comments][csscomment] - Essay on how to comment well.\n- [devhints.io][devhints] - Modest collection of cheatsheets.\n- [Teach Yourself Programming in Ten Years][norvigprog]\n- [Code Complete Book Review][codecomplete] - Detailed review and notes of book.\n- [The Pragmatic Programmer Quick Reference][pragmaticcode]\n- [Bash Pitfalls][bashpits] - Common errors that Bash programmers make, along\n  with [Bash FAQs][bashfaqs] and general [Bash Programming][bashprogramming].\n- [Tech Dev Guide][techguide] - By Google.\n- [How to C in 2016][c2016]\n- [explainshell][explainshell] - See help text that matches each argument.\n- [Teach Yourself Computer Science][teachyourselfcs]\n- [Competitive Programming Books][competecode]\n- [Comprehensive Python Cheatsheet][comppy]\n- [Practical Business Python][bizpy]\n- [Full Stack Python](https://www.fullstackpython.com/) - Build, deploy and operate Python apps.\n\n[devdocs]: https://devdocs.io/\n[polyglot]: http://hyperpolyglot.org/\n[xiny]: https://learnxinyminutes.com/\n[reportbugs]: http://www.chiark.greenend.org.uk/~sgtatham/bugs.html\n[rosetta]: https://rosettacode.org/wiki/Rosetta_Code\n[rcookbook]: http://www.cookbook-r.com/\n[openapi]: http://overapi.com/\n[csscomment]: https://css-tricks.com/the-art-of-comments/\n[devhints]: https://devhints.io/\n[norvigprog]: http://norvig.com/21-days.html\n[codecomplete]: http://codecourse.sourceforge.net/materials/Code-Complete-A-Detailed-Book-Review.html\n[pragmaticcode]: https://blog.codinghorror.com/a-pragmatic-quick-reference/\n[bashpits]: http://mywiki.wooledge.org/BashPitfalls\n[bashprogramming]: 
https://mywiki.wooledge.org/BashProgramming\n[bashfaqs]: https://mywiki.wooledge.org/BashFAQ\n[techguide]: https://techdevguide.withgoogle.com/\n[c2016]: https://matt.sh/howto-c\n[explainshell]: https://explainshell.com/\n[teachyourselfcs]: https://teachyourselfcs.com/\n[competecode]: https://cses.fi/book/index.html\n[comppy]: https://github.com/gto76/python-cheatsheet/\n[bizpy]: https://pbpython.com/\n\n\n## Structural Query Language (SQL)\n\n\u003e SQL is a domain-specific language used to manage data, especially in a relational database management system (RDBMS).\n\u003e \n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/SQL)\n\n- [Select Star SQL](https://selectstarsql.com/) - Interactive online book with a non-toy dataset to learn SQL.\n- [SQLZoo](https://sqlzoo.net/wiki/SQL_Tutorial) - Tutorials learning SQL step-by-step by function.\n- [SQL Tutorial](https://www.sqltutorial.org/) - Quick access tutorials on SQL.\n\n\n## Statistical Methods and Machine Learning\n\n\u003e Machine learning is the subfield of computer science that \"gives computers\n\u003e the ability to learn without being explicitly programmed\".\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/Machine_learning)\n\n- [Naive Bayes Part 1][nb1] and [Naive Bayes Part 2][nb2]\n- [How to choose a predictive model after k-fold cross-validation?][cvFold]\n- [Parametric versus nonparametric bootstrap resampling][parNonparBootstrap]\n- [Feature engineering using R][featureR]\n- [How to Use t-SNE Effectively][tsne] - Interactive visualization to explore\n  how tSNE behaves in order to use it more effectively.\n- [Accurately Measuring Model Prediction Error][modelerr]\n- [Understanding the Bias-Variance Tradeoff][biasvariance]\n- [Random Forests][rfs] - Creator Leo Breiman's site on random forests.\n- [Google's Machine Learning Crash Course][googleml] - Learn TensorFlow APIs.\n- [Learning Math for Machine Learning by Vincent Chen][learnmath]\n- [Calculus Made Easy][easycalc] (PDF)\n- 
[Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/)\n\n[nb1]: https://youtu.be/XcwH9JGfZOU\n[nb2]: https://youtu.be/k2diLn5Nqbs\n[cvFold]: http://stats.stackexchange.com/a/52277/132399\n[parNonparBootstrap]: http://stats.stackexchange.com/a/54855\n[featureR]: https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2017/03/23/feature-engineering-using-r/\n[tsne]: http://distill.pub/2016/misread-tsne/\n[modelerr]: http://scott.fortmann-roe.com/docs/MeasuringError.html\n[biasvariance]: http://scott.fortmann-roe.com/docs/BiasVariance.html\n[rfs]: https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm\n[googleml]: https://developers.google.com/machine-learning/crash-course/\n[learnmath]: https://blog.ycombinator.com/learning-math-for-machine-learning/\n[easycalc]: http://djm.cc/library/Calculus_Made_Easy_Thompson.pdf\n\n\n## Computational Biology\n\n\u003e Computational biology involves the development and application of\n\u003e data-analytical and theoretical methods, mathematical modeling and\n\u003e computational simulation techniques to the study of biological, behavior, and\n\u003e social systems.\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/Computational_biology)\n\n- [RPKM measure is inconsistent among samples][rpkm]\n- [RPKM-TPM.r][rpkm-tpm.r] - R script to show RPKM vs TPM\n- [StatQuest: RPKM, FPKM and TPM][statquest]\n- [Why do we use the negative binomial distribution for analysing RNAseq\n  data?][negbionom]\n- [QCFail.com][qcfail] - Articles about common next-generation sequencing\n  problems\n- [Differences between DESeq/edgeR and CuffDiff in RNA-seq][deseq-edger-cuff]\n- [HarvardX Biomedical Data Science Open Online Training][harvardxbd2k]\n- [Question: Can someone please explain in simple terms how DESeq2\n  works?][biostardeseq2]\n- [RNA-seqlopedia][rnauo] - Overview of RNA-seq and choices for a successful\n  experiment.\n- [Theory Behind DESeq2][deseqtheory]\n\n[rpkm]: 
http://blog.nextgenetics.net/?e=51\n[rpkm-tpm.r]: https://gist.github.com/johnstantongeddes/6925426\n[statquest]: https://youtu.be/TTUrtCY2k-w\n[negbionom]: http://bridgeslab.sph.umich.edu/posts/why-do-we-use-the-negative-binomial-distribution-for-rnaseq\n[qcfail]: https://sequencing.qcfail.com/\n[deseq-edger-cuff]: http://seqanswers.com/forums/archive/index.php/t-10797.html\n[harvardxbd2k]: http://rafalab.github.io/pages/harvardx.html\n[biostardeseq2]: https://www.biostars.org/p/127756/#127941\n[rnauo]: https://rnaseq.uoregon.edu/\n[deseqtheory]: http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#theory-behind-deseq2\n\n\n## Data Visualization and Making Figures\n\n\u003e Data visualization or data visualisation is viewed by many disciplines as a\n\u003e modern equivalent of visual communication. It involves the creation and study\n\u003e of the visual representation of data, meaning \"information that has been\n\u003e abstracted in some schematic form, including attributes or variables for the\n\u003e units of information\".\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/Data_visualization)\n\n- [A Compendium of Clean Graphs in R][cleangraphs]\n- [How to Create Publication-Quality Figures][qualityfigs]\n- [Make Better Figures Faster Using Illustrator][bitesizeillustrator]\n- [A Tour Through the Visualization Zoo][vizzoo]\n- [Adobe Illustrator for Scientists][ytillustrate] (YouTube playlist)\n- [WebGraphviz is Graphviz in the Browser][webgraphviz]\n- [Same Stats, Different Graphs: Generating Datasets with Varied Appearance\n  and Identical Statistics through Simulated Annealing][samestats]\n- [from Data to Viz][dattoviz] - Leads you to most appropriate graph for your\n  data.\n- [Beautiful plotting in R: A ggplot2 cheatsheet][zevrossggplot2]\n- [Effectively Using Matplotlib][effectivemat]\n- [Fundamentals of Data Visualization](https://clauswilke.com/dataviz/) by Claus O. 
Wilke\n- [Practical Typography][practicaltypo] by Matthew Butterick\n- [ditaa](https://github.com/stathissideris/ditaa) - Small command-line utility to convert diagrams using ASCII\n  art\n- [Asciiflow](http://asciiflow.com/) - GUI to easily create ASCII plain text diagrams\n- [Hand drawn feel to diagrams](https://sankhs.com/shakydraw/)\n- [10+ Guidelines for Better Tables in R (2020)](https://themockup.blog/posts/2020-09-04-10-table-rules-in-r/) - Notes on making better tables with accompanying R code\n- [The Design Philosophy of Great Tables (2024)](https://posit-dev.github.io/great-tables/blog/design-philosophy/) - Design philosophy behind the [great-tables](https://github.com/posit-dev/great-tables) Python package to generate effect tables of data\n- [FriendsDontLetFriends](https://github.com/cxli233/FriendsDontLetFriends) - Opinionated essay about good and bad practices in data visualization with examples.\n- [Visual Analysis Best Practices: A Guidebook - Tableau](https://www.tableau.com/learn/whitepapers/tableau-visual-guidebook?signin=c6cf87638b3864d1c393ffafb79ae10c) - List of techniques to turn data visualizations from good to great\n\n[cleangraphs]: http://shinyapps.org/apps/RGraphCompendium/index.php\n[qualityfigs]: http://b.nanes.org/figures/\n[bitesizeillustrator]: https://bitesizebio.com/8113/make-better-figures-faster-using-illustrator/\n[vizzoo]: https://homes.cs.washington.edu/~jheer/files/zoo/\n[ytillustrate]: https://www.youtube.com/playlist?list=PLhKpKEPEAauYIsyjnIN2YXztNo7BrZVxQ\n[webgraphviz]: http://www.webgraphviz.com/\n[samestats]: https://www.autodeskresearch.com/publications/samestats\n[dattoviz]: https://www.data-to-viz.com/\n[zevrossggplot2]: https://web.archive.org/web/20220501085848/http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/\n[effectivemat]: http://pbpython.com/effective-matplotlib.html\n[practicaltypo]: https://practicaltypography.com\n\n## Should-Read Data Science Papers\n\n\u003e Data 
science, computational biology, and bioinformatics papers to cover the breadth of their fields.\n\n- [applied-ml](https://github.com/eugeneyan/applied-ml) by Eugene Yan - Curated papers, articles, and blogs on data science and machine learning in production.\n- [List of important publications in data science - Wikipedia](https://en.wikipedia.org/wiki/List_of_important_publications_in_data_science)\n- [How to read a research paper](http://www.theexclusive.org/2017/11/read-a-paper.html) - One question to ask when reading papers.\n- [Zhang Lab Recommendations](http://zhanglab.ccmb.med.umich.edu/literature/)\n- [The Leek group guide to genomics papers](https://github.com/jtleek/genomicspapers)\n- [\"Foundations of Computational and Systems Biology\" Readings](https://ocw.mit.edu/courses/biology/7-91j-foundations-of-computational-and-systems-biology-spring-2014/readings/) - MIT OCW course readings.\n- [Question: What Are The Classic Papers In Bioinformatics?](https://www.biostars.org/p/3204/)\n- [Best Academic Papers About the Microbiome](http://www.richardsprague.com/note/2017/10/16/best-academic-papers-about-the-microbiome/)\n- [Staying Current in Bioinformatics \u0026 Genomics: 2017 Edition](https://gettinggeneticsdone.blogspot.com/2017/02/staying-current-in-bioinformatics-genomics-2017.html) by Stephen Turner\n- [RNA-Seq Analysis, Differential Gene Expression, and Functional Enrichment Analysis](http://diytranscriptomics.com/) (Recent removal of readings page, but course overall is valuable)\n\n\u003e General knowledge mapping and exploration tools\n\n- [Inciteful](https://inciteful.xyz/) - Tools to help you accelerate your research\n\n\n## Software Engineering\n\n\u003e Software engineering is the application of engineering to the development of\n\u003e software in a systematic method.\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/Software_engineering)\n\n- [\"The Guide to the Software Engineering Body of Knowledge\"][swebook]\n- [Software Engineering 
- Ian Sommerville][sommerville]\n- [Unix as IDE Series][unixide]\n- [Software Engineering Resources][spiresources] - Aggregation of over 1800\n  software engineering resources on various topics.\n- [Flowchart Symbols Explained][smartdraw]\n- [Write the Docs][writethedocs] - A global community of people who care\n  about documentation.\n- [Amazon Web Services - A Practical Guide][awspractical]\n- [Amazon Web Services in Plain English][awsenglish]\n- [Command-line Tools can be 235x Faster than your Hadoop Cluster](https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html) - Simple but effective demonstration of using the right tool for the right amount of data.\n\n[swebook]: https://www.computer.org/web/swebok\n[sommerville]: http://iansommerville.com/software-engineering-book/\n[unixide]: https://sanctum.geek.nz/arabesque/series/unix-as-ide/\n[spiresources]: http://rspa.com/spi/\n[smartdraw]: https://www.smartdraw.com/flowchart/flowchart-symbols.htm\n[writethedocs]: http://www.writethedocs.org/\n[awspractical]: https://github.com/open-guides/og-aws\n[awsenglish]: https://www.expeditedssl.com/aws-in-plain-english\n\n\n## Reproducible Science\n\n\u003e Reproducibility is the ability to get the same research results using the\n\u003e raw data and computer programs provided by the researchers.\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/Reproducibility)\n\n- [A statistical definition for reproducibility and replicability](https://doi.org/10.1101/066803)\n- [`scifigure`: Visualize Reproducibility and Replicability in a Comparison of Scientific Studies](https://cran.r-project.org/package=scifigure) (R package)\n- [What should Researchers Expect When They Replicate Studies? 
A Statistical View of Replicability in Psychological Science](https://doi.org/10.1177/1745691616646366)\n- [A Guide to Reproducible Code in Ecology and Evolution](https://archive.org/details/bes-guide-reproducible-code/mode/2up) (PDF)\n- [Docker for Beginners](https://docker-curriculum.com/) - By Prakhar Srivastav.\n- [Riffomonas: Reproducible Research](https://www.youtube.com/playlist?list=PLmNrK_nkqBpL0d2E26TqPkmTAfelYKbQX) - By Patrick D. Schloss.\n\n\n## People Skills and Communication\n\n\u003e People skills are patterns of behavior and behavioral interactions.\n\u003e Among people, it is an umbrella term for skills under three related\n\u003e set of abilities: personal effectiveness, interaction skills, and\n\u003e intercession skills.\n\u003e\n\u003e — [Wikipedia](https://en.wikipedia.org/wiki/People_skills)\n\n- [How to ask good questions][questionevans] - By Julia Evans.\n- [How To Ask Questions The Smart Way][raymondquestions] - By Eric Raymond.\n- [Teaching Tech Together][teachingtech] - By Greg Wilson.\n- [(An Opinionated Talk) On Preparing Good Talks][jhalatalk] (PDF) - By Ranjit Jhala.\n- [CommKit][commkit] - By MIT's Department of Biological Engineering Communication\n  Fellows on successful scientific communication.\n- [General Principles of Mathematical Communication][maacomm] - By Mathematical\n  Association of America.\n- [Community Tool Box][commtoolbox] - By University of Kansas.\n- [Speech-Words to Minutes][speechtime] - Estimate how many words are needed for a given\n  timed speech.\n- [Novelist Cormac McCarthy's tips on how to write a great science paper][scicommmccarthy] -\n  The Pulitzer prizewinner shares his advice for pleasing readers, editors and yourself.\n- [Science Writing: Guidelines and Guidance][scicommzimmer] - Notes from Carl Zimmer\n  on writing about science, medicine, and the environment.\n- [Write the Paper First][writepaper] - Argues that \"writing now is a favor to yourself\"\n  and covers the benefits of clear writing for 
organizing thoughts early.\n\n[questionevans]: https://jvns.ca/blog/good-questions/\n[raymondquestions]: http://www.catb.org/~esr/faqs/smart-questions.html\n[teachingtech]: http://teachtogether.tech/\n[jhalatalk]: https://ranjitjhala.github.io/static/PLMW-talk-opinionated.pdf\n[commkit]: http://mitcommlab.mit.edu/be/use-the-commkit/\n[maacomm]: http://mathcomm.org/general-principles-of-communicating-math/\n[commtoolbox]: https://ctb.ku.edu/en\n[speechtime]: http://www.speechinminutes.com/\n[scicommmccarthy]: https://www.nature.com/articles/d41586-019-02918-5\n[scicommzimmer]: https://carlzimmer.com/science-writing-guidelines-and-guidance/\n[writepaper]: https://www.cs.jhu.edu/~jason/advice/write-the-paper-first.html\n\n\n## Other Lists\n\n\u003e Useful lists on their own that may intersect other topics above.\n\n- [Awesome Design Tools][adt]\n- [Fred's ImageMagick Scripts][imagemagickscripts]\n\n[adt]: https://github.com/LisaDziuba/Awesome-Design-Tools\n[imagemagickscripts]: http://www.fmwconcepts.com/imagemagick/index.php\n\n\n## License\n\n[![CC0](http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg)](https://creativecommons.org/publicdomain/zero/1.0/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferictleung%2Fdata-science-resources","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ferictleung%2Fdata-science-resources","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferictleung%2Fdata-science-resources/lists"}