{"id":15764648,"url":"https://github.com/soodoku/data-science","last_synced_at":"2026-01-11T09:40:00.819Z","repository":{"id":31613034,"uuid":"35178041","full_name":"soodoku/data-science","owner":"soodoku","description":"Lecture Slides for Introduction to Data Science","archived":false,"fork":false,"pushed_at":"2023-02-03T20:49:31.000Z","size":5169,"stargazers_count":25,"open_issues_count":0,"forks_count":7,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-11T12:18:15.791Z","etag":null,"topics":["data-science","statistical-learning"],"latest_commit_sha":null,"homepage":"","language":"TeX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/soodoku.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"License.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-05-06T19:16:02.000Z","updated_at":"2024-08-11T19:07:18.000Z","dependencies_parsed_at":"2024-10-25T11:38:59.427Z","dependency_job_id":"a4d2e68f-5f5b-48c0-96e7-c6b4a64c45ea","html_url":"https://github.com/soodoku/data-science","commit_stats":{"total_commits":70,"total_committers":2,"mean_commits":35.0,"dds":"0.12857142857142856","last_synced_commit":"0af39968f028b6c507945ebdb38bea8771f49303"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soodoku%2Fdata-science","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soodoku%2Fdata-science/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soodoku%2Fdata-science/releases","manifests_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/soodoku%2Fdata-science/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/soodoku","download_url":"https://codeload.github.com/soodoku/data-science/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246451192,"owners_count":20779576,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","statistical-learning"],"created_at":"2024-10-04T12:04:19.921Z","updated_at":"2026-01-11T09:40:00.772Z","avatar_url":"https://github.com/soodoku.png","language":"TeX","funding_links":[],"categories":[],"sub_categories":[],"readme":"Data Science: Some Basics\n==========================\n\n 1. Introduction to Data Science ([presentation](ds1/ds1_present_web.pdf), [tex](ds1/ds1_web.tex))\n    * What can Big Data do for you? \n    * What is Big Data? \n    * Implications for Statistics and Computation \n    * What is Data Science? \n    * Prerequisites\n \n 2. Get your own (Big) Data ([presentation](ds2/ds2_present_web.pdf), [tex](ds2/ds2_web.tex))\n    * Scrape web pages and pdfs. 
([Scripts](https://github.com/soodoku/python-workshop)) \n    * Image to Text ([Python Script using Tesseract](https://github.com/soodoku/image-to-text))\n    * Image to Text in R using the [Abbyy FineReader Cloud OCR](https://github.com/soodoku/abbyyR)\n    * Image to Text in R using the [Captricity API](https://github.com/soodoku/captr)\n    * Web Scraping/API Applications:\n      - [Get Data on Journalists](https://github.com/soodoku/get-journalist-data)\n      - [Get Weather Data](https://github.com/soodoku/get-weather-data)\n      - [Get Cricket Data](https://github.com/soodoku/get-cricket-data)\n      - [Get Congressional Speech Data](https://gist.github.com/soodoku/85d79275c5880f67b4cf)\n      - [Track FB Likes, Twitter Followers, YouTube Views](https://github.com/soodoku/likes-followers-views)\n      - [Track Civil Rights Coverage in NY Times using NYT API](https://github.com/soodoku/nyt-civil-rights)\n    * [Get Social Networking Data](https://github.com/pablobarbera/social-media-workshop)\n    * Regular Expressions\n    * Pre-process text data\n    * [Assignment](ds2/scraping_assignment_web.txt)\n   \n 3. Databases and SQL ([presentation](ds3/ds3_present_web.pdf), [tex](ds3/ds3_web.tex))\n    * What are databases? \n    * Relational Model\n    * Relational Algebra\n    * Basic SQL\n    * Views\n \n 4a. [Introduction to Introduction to Statistical Learning](https://github.com/soodoku/ds)\n \n 4b. Introduction to Statistical Learning ([presentation](ds4/ds4_present_web.pdf), [tex](ds4/ds4_web.tex))\n    * How to learn from data? \n    * Nearest Neighbors\n    * When you don't have good neighbors\n    * Assessing model fit\n    * Clarification about Big Data\n\n 5. Supervised Methods\n\n 6. Unsupervised Methods\n    * PCA, CA\n    * k-means ([presentation](ds6/kmeans.pdf), [tex](ds6/kmeans.tex))\n\n 7. 
Presenting Analyses\n    * [ggplot2 in brief](graphs/ggplot2.md)\n    * Examples of ggplot in action: \n      - NYT Civil Rights Coverage ([R code](https://github.com/soodoku/nyt-civil-rights/blob/master/plot.R), [Graph](https://github.com/soodoku/nyt-civil-rights/blob/master/nyt_aa.pdf))\n      - Military Experience of UK Prime Ministers ([R code](https://github.com/soodoku/military-experience/blob/master/mil_plots.R), [Graph](https://github.com/soodoku/military-experience/blob/master/ukmil.pdf))\n    * [Suggestions for writing](http://gbytes.gsood.com/on-writing/)\n\n 8. Some Applications\n    * From paper to digital ([presentation](app/PaperToDigital.pdf), [tex](app/PaperToDigital.tex))\n    * Text as Data\n      - [Sentiment Analysis](https://gist.github.com/soodoku/22e4cff2eb6a05be3c0d)\n      - [Model Relationship Between Words and Ideology](https://github.com/soodoku/speech-learn)\n      - [Basic Text Classifier](https://gist.github.com/soodoku/e34dbe0219b0f00a74d5)\n      \nSuggested Books\n--------------------\n\n[The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition](http://www.amazon.com/The-Elements-Statistical-Learning-Prediction/dp/0387848576)    \nBy Trevor Hastie, Robert Tibshirani, Jerome Friedman  \nISBN: 0387848576\n\n[Python Programming: An Introduction to Computer Science](http://www.amazon.com/Python-Programming-Introduction-Computer-Science/dp/1887902996)    \nBy John Zelle  \nISBN: 1590282418\n\n[ggplot2: Elegant Graphics for Data Analysis (Use R!)](http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis/dp/0387981403)    \nBy Hadley Wickham  \nISBN: 0387981403\n\nLicense\n--------------------\nReleased under the [Creative Commons 
License](License.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoodoku%2Fdata-science","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsoodoku%2Fdata-science","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoodoku%2Fdata-science/lists"}