https://github.com/srijanshetty/crawler
A BFS/DFS crawler, written in Python, that crawls websites for conference paper submission deadlines
- Host: GitHub
- URL: https://github.com/srijanshetty/crawler
- Owner: srijanshetty
- Created: 2013-06-02T06:43:04.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2013-06-03T05:01:00.000Z (almost 12 years ago)
- Last Synced: 2025-01-17T19:51:21.675Z (3 months ago)
- Language: Python
- Size: 125 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Intro
Final submission for the crawler project for CS252.

# Contents
- conf_crawl.py: the original DFS traversal version. (In hindsight, DFS is a poor fit for crawling.)
- bfs_crawl.py: the latest version; performs a BFS traversal after seeding the queue with Google search results.
- blacklist.txt: stores a list of URLs to be ignored while crawling.

# Usage
```
python bfs_crawl.py "Search Query" "Depth of crawl"
```
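The BFS traversal with a depth limit and blacklist can be sketched roughly as follows. This is an illustrative outline, not the repo's actual code: the function name `bfs_crawl`, the injected `get_links` fetcher, and the `blacklist` parameter are all assumptions made so the sketch stays self-contained and offline-testable.

```python
from collections import deque

def bfs_crawl(seed_urls, get_links, max_depth, blacklist=()):
    """Breadth-first crawl from seed_urls, up to max_depth hops.

    get_links(url) -> iterable of outgoing links (hypothetical fetcher,
    injected so the sketch needs no network access). URLs on the
    blacklist are skipped, and each URL is visited at most once.
    """
    visited = set()
    queue = deque((url, 0) for url in seed_urls)  # FIFO queue drives BFS
    order = []
    while queue:
        url, depth = queue.popleft()
        if url in visited or url in blacklist:
            continue
        visited.add(url)
        order.append(url)
        if depth < max_depth:
            for link in get_links(url):
                if link not in visited:
                    queue.append((link, depth + 1))
    return order
```

In a real run, `seed_urls` would come from the Google query and `get_links` would fetch a page and extract its anchors; using a queue instead of recursion is what makes this BFS rather than the DFS of the original version.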
# Acknowledgement
This project uses the git repo https://github.com/MarioVilas/google.git to implement the Google search step.

# To do
- Use Markdown for the README