Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shanshanzhu/Data-Scrappers
This repo contains Node.js and Python code used to scrape data from a collection of data sources.
Last synced: 9 days ago
- Host: GitHub
- URL: https://github.com/shanshanzhu/Data-Scrappers
- Owner: shanshanzhu
- License: mit
- Created: 2013-12-11T22:36:49.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2013-12-12T00:59:14.000Z (almost 11 years ago)
- Last Synced: 2024-08-02T14:09:15.672Z (3 months ago)
- Language: JavaScript
- Homepage:
- Size: 145 KB
- Stars: 20
- Watchers: 3
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Data-Scrapper
=============

This repo contains Node.js and Python code used to scrape data from various data sources.

### USstock
The source stock files (.csv) can be downloaded from this link:
http://cs.brown.edu/~pavlo/stocks/history.tar.gz

The code cleans these files so that they are ready to be loaded into a Postgres database.

###### stockCleanerMultipleFiles.py
Converts multiple files. Please set your file location accordingly.
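
A minimal sketch of what a multi-file cleaning pass could look like, written in Python. It is not the code from stockCleanerMultipleFiles.py: the function names `clean_row` and `clean_csv_files`, the directory arguments, and the cleaning rule itself (trimming whitespace and skipping blank lines) are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def clean_row(row):
    """Hypothetical cleaning rule: trim surrounding whitespace in every cell."""
    return [cell.strip() for cell in row]


def clean_csv_files(source_dir, target_dir):
    """Clean every .csv file in source_dir and write the result to target_dir.

    Returns the list of output paths, sorted by file name.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    written = []
    for src_path in sorted(source.glob("*.csv")):
        dst_path = target / src_path.name
        with src_path.open(newline="") as src, dst_path.open("w", newline="") as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                if not row:  # skip blank lines
                    continue
                writer.writerow(clean_row(row))
        written.append(dst_path)
    return written
```

Passing the source and target directories as arguments keeps the file location in one place, so it can be adjusted without editing the loop.
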

###### stockcleanerOneFile.py
Cleans the data for a single file.

### sfGov
The data source is:
https://data.sfgov.org/

###### removeLastCol_cleanTimestamp.py
This Python file removes the last column from the csv file Map__Crime_Incidents_-_Previous_Three_Months.csv, downloaded from
https://data.sfgov.org/api/views/gxxq-x39z/rows.csv?accessType=DOWNLOAD
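
The column-removal step can be sketched with Python's standard `csv` module, as below. This is an illustration under assumptions, not the contents of removeLastCol_cleanTimestamp.py: the function name `remove_last_column` and its two path arguments are hypothetical, and timestamp handling is left out because the target format is not stated here.

```python
import csv


def remove_last_column(src_path, dst_path):
    """Copy a CSV file, dropping the final field of every row (header included)."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(row[:-1])
```

Reading with `csv.reader` rather than splitting on commas keeps quoted fields that contain commas intact, so only the true last column is dropped.
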
removeLastCol_cleanTimestamp.py also formats the timestamp column so that it matches the timestamp format of the stock data.

###### urlScrapper.js
This script downloads all 150 csv files from sfgov automatically. Please read this blog post for a detailed explanation:
http://shanshanzhu.com/2013/12/08/datsy3-how-do-i-scrape-data-from-data-sfgov-org/

### helper
This folder contains several helper functions that can be used to transfer multiple csv files to a Postgres database set up on a Microsoft Azure virtual machine.

###### cloudstorage.js
Setup file for using Azure Blob storage.

###### csvtopostgres.js
Imports one csv file into a PSQL table.

###### csvtopostgresMultipleCSVMultiTables.js
Successfully imports >8000 stock csv files into >8000 PSQL tables.

###### dataDownloader.js
A helper function used in urlScrapper.js to download a single csv file from one URL.

###### phantomJSToGetPageImages.js
PhantomJS helper function to take a screenshot of a webpage.

###### psgrDataTypes.js
A helper function that automatically determines the PSQL data type from input data.

### factualGeopulse
http://www.factual.com/products/geopulse-context
Node.js code to download Geopulse data from Factual.

###### factualNode_centerOfUS_SouthWest.js
Query parameters start from the center of the US and move southwest, with a 0.05 gap between steps of longitude or latitude.

###### factualNode_centerOfUStoNW.js
Query parameters start from the center of the US and move northwest, with a 0.05 gap between steps of longitude or latitude.

###### factualNodeCal_SF.js
Query parameters cover San Francisco, with a 0.05 gap between steps of longitude or latitude.
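
The stepping used by the three scripts above can be illustrated with a small grid generator. This is a sketch in Python, not a port of the Node.js code, and it makes no calls to the Factual API. The function name `grid_points`, its parameters, and the choice to compute each coordinate from an integer step count are assumptions made for illustration.

```python
def grid_points(start_lat, start_lon, lat_steps, lon_steps, step=0.05,
                lat_dir=-1, lon_dir=-1):
    """Yield (latitude, longitude) pairs on a regular grid.

    lat_dir and lon_dir are +1 or -1. A latitude direction of -1 moves south
    and a longitude direction of -1 moves west, so the defaults walk
    southwest from the start point. Use lat_dir=+1 to walk northwest.
    """
    for i in range(lat_steps):
        # Multiply by the step index instead of adding repeatedly,
        # which avoids accumulating floating-point error.
        lat = round(start_lat + lat_dir * i * step, 6)
        for j in range(lon_steps):
            lon = round(start_lon + lon_dir * j * step, 6)
            yield lat, lon
```

For example, `list(grid_points(39.80, -98.60, 2, 2))` produces four points: the start point, one step west, one step south, and one step southwest. The start coordinates here are only a rough stand-in for the center of the US.
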