Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mattrichmo/csv-url-crawl
- Host: GitHub
- URL: https://github.com/mattrichmo/csv-url-crawl
- Owner: mattrichmo
- Created: 2024-01-04T21:15:17.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-04T23:36:09.000Z (about 1 year ago)
- Last Synced: 2024-01-06T01:08:48.676Z (about 1 year ago)
- Language: JavaScript
- Size: 903 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
CSV-URL-CRAWL
A Node.js app that parses URLs from a CSV, scrapes each page (creating a new page object for each page found under each link), then recompiles the results into a flattened CSV data export for each link.
This code was built to a client's specifications for an Upwork project.
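As a rough illustration of the flattening step described above, each scraped page object (with its nested child pages) can be walked into one CSV row per page. The `{ url, title, children }` shape and the `flattenPages`/`toCsv` names below are assumptions for this sketch, not the repo's actual code.

```javascript
// Illustrative sketch only: assumes the scraper produces { url, title, children }
// nodes. Each page becomes one CSV row, tagged with its parent's URL.
function flattenPages(page, parentUrl = '', rows = []) {
  rows.push({ url: page.url, title: page.title, parentUrl });
  for (const child of page.children ?? []) {
    flattenPages(child, page.url, rows); // children carry their parent's URL
  }
  return rows;
}

// Minimal CSV serialisation with quoting for embedded commas and quotes.
function toCsv(rows) {
  const esc = (v) => `"${String(v).replace(/"/g, '""')}"`;
  const header = 'url,title,parentUrl';
  const lines = rows.map((r) => [r.url, r.title, r.parentUrl].map(esc).join(','));
  return [header, ...lines].join('\n');
}
```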
Important Variables
```
fileName = `links.csv` // set this to your CSV file name
Max_Depth = 1 // how deep to scrape each link; 1 means only the main page
```
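For context, here is a hedged sketch of how a `Max_Depth`-style limit might gate the recursion. The `crawl` and `fetchPage` names are stand-ins invented for this example (in the real app, fetching would be an HTTP request plus HTML parsing), not functions from the repo.

```javascript
// Sketch only: fetchPage(url) is assumed to resolve to { title, links }.
// depth starts at 1, so maxDepth = 1 scrapes only the main page itself.
async function crawl(url, fetchPage, maxDepth = 1, depth = 1, seen = new Set()) {
  if (seen.has(url)) return null; // skip already-visited URLs to avoid cycles
  seen.add(url);
  const { title, links } = await fetchPage(url);
  const page = { url, title, children: [] };
  if (depth < maxDepth) {
    for (const link of links) {
      const child = await crawl(link, fetchPage, maxDepth, depth + 1, seen);
      if (child) page.children.push(child);
    }
  }
  return page;
}
```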
To Run
```
npm init
```
THEN
```
node index.mjs
```
Saves 3 files for now:
allLinksData.csv: a concatenation of all scraped page data into one link object
parentLinkData.csv: just the parent link data
chiildLinksData.csv: just the child link data

© 2024 Matt Richmond. All rights reserved.