https://github.com/dtaivpp/python-crawler
This is a Python bot for crawling a website and doing link analysis.
- Host: GitHub
- URL: https://github.com/dtaivpp/python-crawler
- Owner: dtaivpp
- License: gpl-3.0
- Created: 2019-04-12T13:49:11.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2019-04-20T15:00:11.000Z (about 7 years ago)
- Last Synced: 2025-02-26T06:31:59.710Z (over 1 year ago)
- Language: Python
- Size: 30.3 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.MD
- License: LICENSE
README
# Python Crawler
The intent of this project is to experiment with a web crawler that visits a page, collects all the links, and then creates some metadata about each URL.
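As a rough illustration of the "collect all the links and create some metadata" idea, here is a minimal sketch using only the Python standard library. It is not taken from this repository: the names `LinkCollector` and `collect_links`, and the choice of an `internal` flag as the metadata, are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_links(base_url, html):
    """Return one small metadata dict per link found in `html`.

    The metadata fields are an assumption for this sketch; the
    project itself may store something different.
    """
    parser = LinkCollector()
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page they were found on.
        absolute = urljoin(base_url, href)
        links.append({
            "url": absolute,
            # True when the link points at the same host as the base URL.
            "internal": urlparse(absolute).netloc == base_host,
        })
    return links
```

Calling `collect_links("https://example.com/", page_html)` with the HTML of a fetched page would return a list of dictionaries, one per link, which could then be stored or analysed further.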
# Process
The crawler starts in crawler.py, where it is fed a base URL. From the base URL it collects all the URLs on the page and creates an entry for each of them in a links table. It then iterates over the links (only links with the same base URL are visited, to limit the scope of this project). Th