https://github.com/twknab/django_ajax_web_crawler
Web crawler which retrieves all links on any page. Python & Django-powered.
- Host: GitHub
- URL: https://github.com/twknab/django_ajax_web_crawler
- Owner: twknab
- Created: 2017-10-10T02:14:09.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-10-10T02:15:00.000Z (over 7 years ago)
- Last Synced: 2024-11-06T10:12:32.634Z (2 months ago)
- Topics: beautifulsoup4, crawler, django-application
- Language: Python
- Size: 362 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# AJAX Web Crawler
This program crawls any user-entered URL and returns the page's raw HTML along with a list of every `href` value found on that page. The results are written to the DOM via AJAX.
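The README does not include the crawler code itself. As a minimal sketch of the extraction step, the example below collects every `<a href>` value from an HTML string. It uses the standard library's `html.parser.HTMLParser` as a dependency-free stand-in for BeautifulSoup4's `soup.find_all("a", href=True)`. The names `HrefCollector`, `extract_hrefs`, and `fetch_html` are hypothetical illustrations, not functions from this project.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.request import urlopen


class HrefCollector(HTMLParser):
    """Hypothetical helper: collect the href value of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # HTMLParser lowercases tag and attribute names, so "<A HREF=...>" matches too.
        if tag != "a":
            return
        for name, value in attrs:
            # A bare "<a href>" yields value None; skip it and empty strings.
            if name == "href" and value:
                self.hrefs.append(value)


def extract_hrefs(html: str) -> list[str]:
    """Return every <a href="..."> value found in an HTML string (hypothetical name)."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page, decoding with the server-declared charset (UTF-8 fallback)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, `extract_hrefs('<a href="/about">About</a>')` returns `['/about']`. In a Django view, the raw HTML from `fetch_html(url)` and the list from `extract_hrefs(html)` could be returned together with `django.http.JsonResponse` for the jQuery AJAX call to insert into the page.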
## Technologies:
- Django (for MTV app)
- BeautifulSoup4 (for crawling)
- jQuery

## Bugs:
- Certain domains (e.g., `sohumhealing.com`) fail to crawl, and there may also be an issue with the secondary filtering of `href` values. Further testing with different URLs is required to pinpoint the cause.
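The README does not describe what its secondary `href` filtering does, so the following is only a hypothetical sketch of one common approach, useful as a reference point when testing: resolve relative links against the page URL, drop `#fragment` suffixes, keep only `http`/`https` links, and de-duplicate while preserving order. The function name `normalize_hrefs` is an assumption, not part of the project.

```python
from __future__ import annotations

from urllib.parse import urldefrag, urljoin, urlparse


def normalize_hrefs(base_url: str, hrefs: list[str]) -> list[str]:
    """Hypothetical filter: absolute http(s) URLs, no fragments, no duplicates."""
    seen: set[str] = set()
    result: list[str] = []
    for href in hrefs:
        href = href.strip()
        if not href:
            continue
        # Turn relative links like "/about" or "post.html" into absolute URLs.
        absolute = urljoin(base_url, href)
        # Strip "#section" fragments so in-page anchors don't count as new links.
        absolute, _fragment = urldefrag(absolute)
        # Discard mailto:, javascript:, tel:, and other non-web schemes.
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        # Keep only the first occurrence of each URL, in original order.
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

With `base_url="https://example.com/blog/"`, the inputs `"post.html"` and `"post.html#comments"` both resolve to `https://example.com/blog/post.html`, so only one copy is kept, while `"mailto:..."` and `"javascript:..."` links are dropped.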