https://github.com/dtaivpp/python-crawler

This is a python bot for crawling a website and doing link analysis

Host: GitHub
URL: https://github.com/dtaivpp/python-crawler
Owner: dtaivpp
License: gpl-3.0
Created: 2019-04-12T13:49:11.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2019-04-20T15:00:11.000Z (about 7 years ago)
Last Synced: 2025-02-26T06:31:59.710Z (over 1 year ago)
Language: Python
Size: 30.3 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.MD
- License: LICENSE

README

# Python Crawler

The intent of this project is to experiment with a web crawler that visits a page, collects all of its links, and then creates some metadata about each URL.
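A minimal sketch of the "collect all the links, then create metadata" step, assuming links come from `<a href>` tags and using only Python's standard library (`html.parser`, `urllib.parse`). The names `LinkCollector`, `collect_links`, and `link_metadata`, and the metadata fields themselves (scheme, host, path depth, internal/external), are hypothetical illustrations. The README does not say which metadata the project actually records.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page (hypothetical helper)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they appear on.
                self.links.append(urljoin(self.page_url, value))


def link_metadata(url, base_url):
    """Return a small, illustrative metadata record for one URL."""
    parts = urlparse(url)
    return {
        "url": url,
        "scheme": parts.scheme,
        "host": parts.netloc,
        # Number of non-empty path segments, e.g. "/docs/intro" -> 2.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        # "Internal" here means: same host as the page being crawled.
        "internal": parts.netloc == urlparse(base_url).netloc,
    }


def collect_links(html, page_url):
    """Parse one page's HTML and return a metadata record per link."""
    parser = LinkCollector(page_url)
    parser.feed(html)
    parser.close()
    return [link_metadata(link, page_url) for link in parser.links]


if __name__ == "__main__":
    html = '<a href="/about">About</a> <a href="https://other.org/x">Out</a>'
    for record in collect_links(html, "https://example.com/"):
        print(record)
```

Parsing with `html.parser` keeps the sketch dependency-free; a real crawler might prefer a more forgiving parser for malformed pages.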

# Process
The crawler starts in `crawler.py`, where it is fed a base URL. From the base URL it collects all the URLs on the page and creates an entry for each of them in a links table. It then iterates over the links, visiting only those with the same base URL to limit the scope of this project. Th
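The process above can be sketched as a breadth-first loop over an in-memory SQLite `links` table. This is a hypothetical sketch, not the code in `crawler.py`: the table schema (`source`, `target`, `internal`), the `crawl` and `extract_links` names, the `max_pages` cap, and the reading of "same base URL" as "same host" are all assumptions. The page fetcher is passed in as a `fetch(url) -> str` callable so the loop can run without network access.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class HrefParser(HTMLParser):
    """Collects raw href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def extract_links(html, page_url):
    parser = HrefParser()
    parser.feed(html)
    parser.close()
    # Resolve relative hrefs and strip #fragments so each page has one key.
    return [urldefrag(urljoin(page_url, href)).url for href in parser.hrefs]


def crawl(base_url, fetch, max_pages=50):
    """Breadth-first crawl that only follows links on base_url's host.

    Every discovered link is recorded in a `links` table; `fetch(url) -> str`
    is injected so the loop can run offline. Returns (connection, visited).
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE links (source TEXT, target TEXT, internal INTEGER)")
    base_host = urlparse(base_url).netloc
    queue, seen, visited = deque([base_url]), {base_url}, []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in extract_links(fetch(url), url):
            internal = urlparse(link).netloc == base_host
            db.execute(
                "INSERT INTO links (source, target, internal) VALUES (?, ?, ?)",
                (url, link, int(internal)),
            )
            # Same-host links are queued once; external links are only recorded.
            if internal and link not in seen:
                seen.add(link)
                queue.append(link)
    db.commit()
    return db, visited


if __name__ == "__main__":
    # Fake three-page site standing in for real HTTP responses.
    site = {
        "https://example.com/": '<a href="/a">A</a><a href="https://other.org/">X</a>',
        "https://example.com/a": '<a href="/">Home</a><a href="/b#top">B</a>',
        "https://example.com/b": "",
    }
    db, visited = crawl("https://example.com/", site.__getitem__)
    print(visited)
    print(db.execute("SELECT source, target, internal FROM links").fetchall())
```

For a live run, `fetch` could wrap `urllib.request.urlopen(url).read().decode()`. The sketch deliberately omits error handling, robots.txt checks, and rate limiting; the `seen` set is what keeps each same-host page from being visited twice.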
