https://github.com/twknab/django_ajax_web_crawler

Web crawler which retrieves all links on any page. Python & Django-powered.

beautifulsoup4 crawler django-application



# AJAX Web Crawler

This program crawls any user-entered URL and returns the raw HTML along with a list of all `href` values found on the page at that URL. The results are written to the DOM via AJAX.
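The crawl step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `fetch_html`, `HrefCollector`, and `extract_hrefs` are hypothetical, and the sketch uses Python's standard-library `html.parser` as a stand-in so it runs without extra dependencies. The project itself uses BeautifulSoup4, where `soup.find_all("a", href=True)` plays the same role.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_hrefs(html):
    """Return every anchor href in an HTML string, in document order."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

A view could then call `extract_hrefs(fetch_html(url))` and return both the raw HTML and the list as JSON for the AJAX handler to render.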

## Technologies:

- Django (for MTV app)
- BeautifulSoup4 (for crawling)
- jQuery

### Bugs:
+ Certain domains do not work (for example, `sohumhealing.com`), and there may be an issue with the secondary filtering of `hrefs`. Further testing with different URLs is required to pinpoint the cause.
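
The README does not say what the secondary filtering of `hrefs` does, so the following is only one plausible shape for such a step, written under assumptions: it resolves relative links against the page URL, drops fragment-only and non-HTTP links, and removes duplicates. The function name `normalize_hrefs` and its rules are hypothetical, and nothing here is claimed to be the cause of the bug above.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_hrefs(base_url, hrefs):
    """Resolve, filter, and de-duplicate raw href values (illustrative only)."""
    seen = set()
    result = []
    for href in hrefs:
        href = href.strip()
        # Skip empty values and in-page anchors such as "#top".
        if not href or href.startswith("#"):
            continue
        # Make the link absolute, then drop any trailing "#fragment".
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        # Keep only web links; this drops mailto:, javascript:, and similar.
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute in seen:
            continue
        seen.add(absolute)
        result.append(absolute)
    return result
```

Running a filter like this against the failing domains, and comparing its output with the unfiltered list, may help show whether the filtering or the fetch is at fault.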