https://github.com/soodoku/data-science

Lecture Slides for Introduction to Data Science
data-science statistical-learning

Data Science: Some Basics
==========================

1. Introduction to Data Science ([presentation](ds1/ds1_present_web.pdf), [tex](ds1/ds1_web.tex))
* What can Big Data do for you?
* What is Big Data?
* Implications for Statistics and Computation
* What is Data Science?
* Prerequisites

2. Get your own (Big) Data ([presentation](ds2/ds2_present_web.pdf), [tex](ds2/ds2_web.tex))
* Scrape web pages and PDFs ([Scripts](https://github.com/soodoku/python-workshop))
* Image to Text ([Python Script using Tesseract](https://github.com/soodoku/image-to-text))
* Image to Text in R using the [ABBYY FineReader Cloud OCR](https://github.com/soodoku/abbyyR)
* Image to Text in R using the [Captricity API](https://github.com/soodoku/captr)
* Web Scraping/API Applications:
    - [Get Data on Journalists](https://github.com/soodoku/get-journalist-data)
    - [Get Weather Data](https://github.com/soodoku/get-weather-data)
    - [Get Cricket Data](https://github.com/soodoku/get-cricket-data)
    - [Get Congressional Speech Data](https://gist.github.com/soodoku/85d79275c5880f67b4cf)
    - [Track FB Likes, Twitter Followers, YouTube Views](https://github.com/soodoku/likes-followers-views)
    - [Track Civil Rights Coverage in NY Times using NYT API](https://github.com/soodoku/nyt-civil-rights)
* [Get Social Networking Data](https://github.com/pablobarbera/social-media-workshop)
* Regular Expressions
* Pre-process text data
* [Assignment](ds2/scraping_assignment_web.txt)
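
As a rough illustration of the regular-expression and text pre-processing topics above, here is a minimal Python sketch. It is not taken from the course scripts; the `preprocess` function, its rules (strip tag-like markup, lowercase, drop punctuation), and the sample string are all assumptions made for this example.

```python
import re

def preprocess(text):
    """Strip tag-like markup, lowercase, drop punctuation, and tokenize."""
    text = re.sub(r"<[^>]+>", " ", text)      # remove anything that looks like an HTML tag
    text = text.lower()                        # normalize case
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # replace punctuation with spaces
    return text.split()                        # split on runs of whitespace

# Hypothetical input, standing in for a scraped page fragment
tokens = preprocess("<p>Big Data, big <b>promises</b>!</p>")
print(tokens)
```

Real scraped text usually needs more care (Unicode, entities, stop words); this only shows the basic pattern of chaining a few substitutions before tokenizing.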

3. Databases and SQL ([presentation](ds3/ds3_present_web.pdf), [tex](ds3/ds3_web.tex))
* What are databases?
* Relational Model
* Relational Algebra
* Basic SQL
* Views
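
To make the Basic SQL and Views topics above concrete, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The `speeches` table, its columns, the `party_totals` view, and the rows are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# A single relation (table) with made-up rows
cur.execute(
    "CREATE TABLE speeches (id INTEGER PRIMARY KEY, speaker TEXT, party TEXT, words INTEGER)"
)
cur.executemany(
    "INSERT INTO speeches (speaker, party, words) VALUES (?, ?, ?)",
    [("A", "X", 120), ("B", "Y", 300), ("C", "X", 80)],
)

# Selection (WHERE) and projection (column list), as in relational algebra
cur.execute("SELECT speaker, words FROM speeches WHERE words > 100 ORDER BY words")
long_speeches = cur.fetchall()

# A view is a stored query that can be read like a table
cur.execute(
    "CREATE VIEW party_totals AS "
    "SELECT party, SUM(words) AS total_words FROM speeches GROUP BY party"
)
cur.execute("SELECT party, total_words FROM party_totals ORDER BY party")
totals = cur.fetchall()

conn.close()
print(long_speeches, totals)
```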

4a. [Introduction to Introduction to Statistical Learning](https://github.com/soodoku/ds)

4b. Introduction to Statistical Learning ([presentation](ds4/ds4_present_web.pdf), [tex](ds4/ds4_web.tex))
* How to learn from data?
* Nearest Neighbors
* When you don't have good neighbors
* Assessing model fit
* Clarification about Big Data
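
The nearest-neighbors and model-fit topics above can be sketched in a few lines of plain Python. This is an illustrative toy, not the method used in the slides: the `knn_predict` helper, the choice of Euclidean distance and majority vote, and the tiny hand-made training and test sets are all assumptions. It requires Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query.

    train is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up data: two well-separated groups
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((4.0, 4.2), "b"), ((4.1, 3.9), "b"), ((3.8, 4.0), "b")]
test = [((1.1, 1.0), "a"), ((4.0, 4.0), "b")]

# Assess fit on held-out points rather than on the training data
correct = sum(knn_predict(train, x, k=3) == y for x, y in test)
accuracy = correct / len(test)
print(accuracy)
```

Scoring on points that were not used for fitting is the key design choice here: with k = 1, accuracy on the training set itself would be trivially perfect and say little about how the model generalizes.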

5. Supervised Methods

6. Unsupervised Methods
* PCA, CA
* k-means ([presentation](ds6/kmeans.pdf), [tex](ds6/kmeans.tex))
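
For the k-means topic above, a minimal sketch of Lloyd's algorithm (alternating assignment and mean-update steps) is shown below. It is an assumption-laden toy rather than the presentation's code: the `kmeans` function, the fixed initial centers, and the sample points are hypothetical. It requires Python 3.8+ for `math.dist`.

```python
import math

def kmeans(points, centers, n_iter=10):
    """Lloyd's algorithm: assign points to the nearest center, then recompute means.

    Returns the final centers and the clusters from the last assignment step.
    """
    clusters = [[] for _ in centers]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its members
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(c)  # leave the center of an empty cluster in place
        if new_centers == centers:     # stop once the centers no longer move
            break
        centers = new_centers
    return centers, clusters

# Made-up points forming two obvious groups; initial centers picked by hand
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centers)
```

In practice the result depends on the initial centers, so implementations typically run several random restarts and keep the best solution.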

7. Presenting Analyses
* [ggplot2 in brief](graphs/ggplot2.md)
* Examples of ggplot in action:
    - NYT Civil Rights Coverage ([R code](https://github.com/soodoku/nyt-civil-rights/blob/master/plot.R), [Graph](https://github.com/soodoku/nyt-civil-rights/blob/master/nyt_aa.pdf))
    - Military Experience of UK Prime Ministers ([R code](https://github.com/soodoku/military-experience/blob/master/mil_plots.R), [Graph](https://github.com/soodoku/military-experience/blob/master/ukmil.pdf))
* [Suggestions for writing](http://gbytes.gsood.com/on-writing/)

8. Some Applications
* From paper to digital ([presentation](app/PaperToDigital.pdf), [tex](app/PaperToDigital.tex))
* Text as Data
    - [Sentiment Analysis](https://gist.github.com/soodoku/22e4cff2eb6a05be3c0d)
    - [Model Relationship Between Words and Ideology](https://github.com/soodoku/speech-learn)
    - [Basic Text Classifier](https://gist.github.com/soodoku/e34dbe0219b0f00a74d5)

Suggested Books
--------------------

[The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition](http://www.amazon.com/The-Elements-Statistical-Learning-Prediction/dp/0387848576)
By Trevor Hastie, Robert Tibshirani, Jerome Friedman
ISBN: 0387848576

[Python Programming: An Introduction to Computer Science](http://www.amazon.com/Python-Programming-Introduction-Computer-Science/dp/1887902996)
By John Zelle
ISBN: 1590282418

[ggplot2: Elegant Graphics for Data Analysis (Use R!)](http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis/dp/0387981403)
By Hadley Wickham
ISBN: 0387981403

License
--------------------
Released under the [Creative Commons License](License.md).