Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mengyaohuang/data-manipulation-and-analysis
Data processing implementation with tools in Python
- Host: GitHub
- URL: https://github.com/mengyaohuang/data-manipulation-and-analysis
- Owner: MengyaoHuang
- Created: 2019-03-21T17:33:51.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-03-24T21:25:18.000Z (almost 6 years ago)
- Last Synced: 2024-09-14T17:35:29.629Z (4 months ago)
- Topics: data-analysis, nlp-machine-learning, pandas-dataframe, python
- Language: Jupyter Notebook
- Homepage:
- Size: 19.5 MB
- Stars: 1
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Manipulation and Analysis
This repository covers data harvesting, processing, aggregation, and analysis in Python, using Jupyter notebooks.
## Introduction
- Data analysis is crucial to evaluating and designing solutions and applications, as well as to understanding users' information needs and usage. In many cases the data we need is distributed across many web pages, stored in a database, or held in a large text file. Often these data (e.g. web server logs) are too large to obtain or process manually.
- We need an automated way of gathering data, parsing it, and summarizing it before more advanced analysis.
- Topics include techniques of exploratory data analysis: scripting, text parsing, Structured Query Language (SQL), regular expressions, graphing, and clustering methods.

## Guideline
1. [Get Started](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Getting_Started.ipynb)
2. [Basic Data Manipulation](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Basic%20Data%20Manipulation.ipynb)
3. [Univariate Statistics](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Univariate%20Statistics.ipynb)
4. [pandas operations](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/pandas%20operations.ipynb)
5. [Visualization, Correlation, and Linear Models1](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Visualization%2C%20Correlation%2C%20and%20Linear%20Models1.ipynb)
6. [Visualization, Correlation, and Linear Models2-case based](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Visualization%2C%20Correlation%2C%20and%20Linear%20Models2-case%20based.ipynb)
7. [Pivoting, contingency tables, crosstabs, mosaic plots and chi-squared](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Pivoting%2C%20contingency%20tables%2C%20crosstabs%2C%20mosaic%20plots%20and%20chi-squared.ipynb)
8. [Natural Language Processing Introduction](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Natural%20Language%20Processing%20Introduction.ipynb)
9. [Natural Language Processing for Project Gutenberg](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Natural%20Language%20Processing%20for%20Project%20Gutenberg.ipynb)
10. [Clustering for handwriting and document](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Clustering%20for%20handwriting%20and%20document.ipynb)
11. [Clustering for music preference and Vector Quantization](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Clustering%20for%20music%20preference%20and%20Vector%20Quantization.ipynb)
12. [Classification](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Classification.ipynb)
13. [Dimensionality Reduction Notes](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Dimensionality%20Reduction%20Notes.pdf)
14. [Dimension_Reduction Implementation](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/edit/master/README.md)
15. [Dimension Reduction for gene expression dataset](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/blob/master/Dimension%20Reduction%20for%20gene%20expression%20dataset.ipynb)

Appendix: [Some data ready to use](https://github.com/MengyaoHuang/Data-Manipulation-and-Analysis/tree/master/Some%20data%20ready%20to%20use)
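
The introduction describes gathering data, parsing it, and summarizing it before more advanced analysis. A minimal sketch of the parse-and-summarize step, assuming web server logs in Common Log Format, might look like the following. The sample lines, the `LOG_PATTERN` expression, and the `parse_log` and `summarize` helpers are illustrative assumptions and are not taken from the notebooks in this repository.

```python
import re
from collections import Counter

# Hypothetical sample lines in Common Log Format; real logs would be read from a file.
SAMPLE_LOG = """\
127.0.0.1 - - [21/Mar/2019:17:33:51 +0000] "GET /index.html HTTP/1.1" 200 1043
127.0.0.1 - - [21/Mar/2019:17:34:02 +0000] "GET /missing.html HTTP/1.1" 404 209
10.0.0.7 - - [21/Mar/2019:17:35:10 +0000] "POST /api/items HTTP/1.1" 201 512
this line is malformed and will be skipped
10.0.0.7 - - [21/Mar/2019:17:36:45 +0000] "GET /index.html HTTP/1.1" 200 1043
"""

# Named groups make each parsed field easy to reference later.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log(text):
    """Return one dict per well-formed log line; malformed lines are skipped."""
    records = []
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        record = match.groupdict()
        record["status"] = int(record["status"])
        # A size of "-" means no body was sent; treat it as zero bytes.
        record["size"] = 0 if record["size"] == "-" else int(record["size"])
        records.append(record)
    return records


def summarize(records):
    """Aggregate request counts by status and path, plus total bytes served."""
    return {
        "requests": len(records),
        "by_status": Counter(r["status"] for r in records),
        "by_path": Counter(r["path"] for r in records),
        "total_bytes": sum(r["size"] for r in records),
    }


if __name__ == "__main__":
    summary = summarize(parse_log(SAMPLE_LOG))
    print(summary["requests"], "requests,", summary["total_bytes"], "bytes")
    print(summary["by_path"].most_common(1))
```

The sketch uses only the standard library so that it runs without extra dependencies. The list of dicts returned by `parse_log` could also be passed to `pandas.DataFrame` for the kind of aggregation the pandas notebooks cover.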