https://github.com/nielsdejong/southpark-wiki-scraper
- Host: GitHub
- URL: https://github.com/nielsdejong/southpark-wiki-scraper
- Owner: nielsdejong
- License: mit
- Created: 2019-10-16T10:54:13.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-10-25T14:31:11.000Z (over 5 years ago)
- Last Synced: 2024-12-18T01:23:38.586Z (6 months ago)
- Language: Python
- Size: 326 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## South Park Wiki Scraper
Scrapes the South Park Wiki and converts the scraped data into CSV files that can be imported into Neo4j 3.5.
An accompanying blog post for this code can be found on my website.
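As a minimal sketch of the target format, the snippet below writes nodes and relationships with Python's standard `csv` module, using the header conventions that `neo4j-admin import` understands (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). This is not the scraper's actual code: the sample records, the `id`/`name` columns, the labels and relationship types, and the helper names `to_csv` and `write_neo4j_csv` are all hypothetical.

```python
import csv
import io
from pathlib import Path
from typing import Iterable, Sequence

# Hypothetical scraped records; the real scraper's fields and labels may differ.
NODES = [
    ("stan_marsh", "Stan Marsh", "Character"),
    ("randy_marsh", "Randy Marsh", "Character"),
    ("s01e01", "Cartman Gets an Anal Probe", "Episode"),
]
EDGES = [
    ("randy_marsh", "stan_marsh", "PARENT_OF"),
    ("stan_marsh", "s01e01", "APPEARS_IN"),
]

# Header fields for neo4j-admin import: ":ID" marks the node key (stored here
# as property "id"), ":LABEL" the node label, ":START_ID"/":END_ID" reference
# node keys, and ":TYPE" is the relationship type.
NODE_HEADER = ["id:ID", "name", ":LABEL"]
EDGE_HEADER = [":START_ID", ":END_ID", ":TYPE"]


def to_csv(header: Sequence[str], rows: Iterable[Sequence[str]]) -> str:
    """Render a header line plus rows as CSV text; csv handles any quoting."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def write_neo4j_csv(out_dir: Path, nodes, edges) -> None:
    """Write nodes.csv and edges.csv into out_dir, creating it if needed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "nodes.csv").write_text(to_csv(NODE_HEADER, nodes), encoding="utf-8")
    (out_dir / "edges.csv").write_text(to_csv(EDGE_HEADER, edges), encoding="utf-8")
```

Calling `write_neo4j_csv(Path("output"), NODES, EDGES)` would produce two files whose first lines are `id:ID,name,:LABEL` and `:START_ID,:END_ID,:TYPE`. Letting `csv.writer` do the escaping matters for wiki data, since episode and character names can contain commas or quotes.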
### Dependencies
- Python 3.7 or later
- Inflect Engine (`pip install inflect`)
- BeautifulSoup4 HTML Parser (`pip install beautifulsoup4`)
### How to run
1. Run `scraper.py` with Python 3.7 or later.
2. The scraper writes its output to the `/output/` folder. A full run may take about 15 minutes.
3. From the Neo4j installation directory, import the nodes and relationships into Neo4j: \
`./bin/neo4j-admin import --nodes nodes.csv --relationships edges.csv`
4. You're done!
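Before running the import in step 3, it can be worth confirming that every relationship points at an existing node, so that problems surface before `neo4j-admin` runs. The sketch below is a hypothetical helper (`find_dangling_edges` is not part of this repository) that assumes standard `:ID`, `:START_ID` and `:END_ID` header columns and a single global ID space; it does not account for named ID groups such as `:ID(Character)`.

```python
import csv
from typing import Iterable, List, Sequence


def _column(header: Sequence[str], marker: str) -> int:
    """Return the index of the first header field containing marker."""
    return next(i for i, name in enumerate(header) if marker in name)


def find_dangling_edges(node_lines: Iterable[str], edge_lines: Iterable[str]) -> List[List[str]]:
    """Return relationship rows whose start or end ID has no matching node."""
    nodes = csv.reader(node_lines)
    id_col = _column(next(nodes), ":ID")  # e.g. "id:ID"
    node_ids = {row[id_col] for row in nodes if row}

    edges = csv.reader(edge_lines)
    header = next(edges)
    start_col = _column(header, ":START_ID")
    end_col = _column(header, ":END_ID")
    # Keep only non-empty rows where either endpoint is unknown.
    return [
        row for row in edges
        if row and (row[start_col] not in node_ids or row[end_col] not in node_ids)
    ]
```

To check real output, open `nodes.csv` and `edges.csv` with `newline=""` and pass the file objects in; an empty list means every relationship endpoint has a matching node.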