https://github.com/ritikesh/getting-and-cleaning-data-project
- Host: GitHub
- URL: https://github.com/ritikesh/getting-and-cleaning-data-project
- Owner: ritikesh
- Created: 2014-06-22T14:07:33.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2014-06-22T15:42:08.000Z (over 10 years ago)
- Last Synced: 2024-10-21T21:18:41.142Z (27 days ago)
- Language: R
- Size: 212 KB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Read Me
=======

description
-----------
This project is part of the **Getting and Cleaning Data** course in the **Data Science specialization** offered by **Johns Hopkins University** on **Coursera**. The goal of this project is to collect a dataset from the internet and clean it so that it becomes more readable, i.e. in data science terms, tidy. The data was collected from the accelerometers of the Samsung Galaxy S smartphone for the purpose of activity recognition. A full description is available at the site where the data was obtained:
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
Included in this repo are:
* README.md - description of this repository (this file)
* run_analysis.R - the main script for producing the tidy data set
* codebook.md - describes the original dataset, the variables in the tidy data set, and the transformations used to obtain them
* tidy_data.txt - the final required clean data set

run_analysis.R
--------------
This is the R script that gets the data from the internet and cleans it. The script:
- Checks whether the dataset has already been downloaded and downloads it if not. It then reads in the files required for cleaning and producing a tidy dataset.
- Extracts the columns that are required for the project, i.e. only the means and standard deviations. The selection is done using R's pattern matching function (grepl). This makes the dataset considerably smaller.
- Renames the activities from the numbers assigned to them ('1' to '6') to something more descriptive, for example '1' to 'WALKING'. This makes the data easier to understand.
- Checks the variable names and renames them to follow common naming conventions. This is done using R's substitution function (gsub).
- Calculates the average (mean) of every variable and writes the result, with descriptive names, to a new file called tidy_data.txt.
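
The steps above can be sketched on a small made-up data frame. This is only an illustration and not the contents of run_analysis.R: the download and file-reading step is left out, the values in `raw` are invented, the renaming rules are one possible convention, and grouping the averages by subject and activity is an assumption.

```r
# Toy stand-in for the data set; the values are invented for illustration.
raw <- data.frame(
  subject  = c(1, 1, 1, 1),
  activity = c(1, 1, 2, 2),
  `tBodyAcc-mean()-X` = c(0.28, 0.30, 0.26, 0.27),
  `tBodyAcc-std()-X`  = c(-0.99, -0.97, -0.98, -0.96),
  `tBodyAcc-max()-X`  = c(0.50, 0.60, 0.40, 0.70),
  check.names = FALSE  # keep the original feature names untouched
)

# Keep the id columns plus the mean and standard deviation measurements.
keep <- grepl("subject|activity|mean\\(\\)|std\\(\\)", names(raw))
selected <- raw[, keep]

# Replace the numeric activity codes with descriptive labels.
activity_labels <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                     "SITTING", "STANDING", "LAYING")
selected$activity <- factor(selected$activity, levels = 1:6,
                            labels = activity_labels)

# Clean up the variable names (assumed convention: no parentheses,
# underscores instead of dashes).
names(selected) <- gsub("\\(\\)", "", names(selected))
names(selected) <- gsub("-", "_", names(selected))

# Average every variable within each subject/activity group (assumed grouping).
tidy <- aggregate(. ~ subject + activity, data = selected, FUN = mean)

# Write the result; a temporary directory is used here to avoid
# overwriting the real tidy_data.txt.
out_file <- file.path(tempdir(), "tidy_data.txt")
write.table(tidy, out_file, row.names = FALSE)
```

The resulting `tidy` data frame has one row per subject and activity combination, and the `tBodyAcc-max()-X` column is dropped because its name matches neither `mean()` nor `std()`.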