https://github.com/wagoodman/simplewebcrawler

A simple java web crawler to visualize relations between web pages.
https://github.com/wagoodman/simplewebcrawler

Last synced: 7 months ago
JSON representation

A simple java web crawler to visualize relations between web pages.

Host: GitHub
URL: https://github.com/wagoodman/simplewebcrawler
Owner: wagoodman
Created: 2014-08-30T19:04:09.000Z (about 11 years ago)
Default Branch: master
Last Pushed: 2014-08-30T19:04:15.000Z (about 11 years ago)
Last Synced: 2025-01-23T16:23:18.229Z (9 months ago)
Language: Java
Size: 129 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          
Simple Web Crawler

==================

Given a web page this just finds all links in that page and visits them. Be careful though! This 

blindly visits as many pages as possible to any link found.

```

Compiler Compliance Level: 	1.6 (Oracle JDK 1.6)

IDE Used: 			Eclipse Kepler for Java Developers (Build 20130614-0229)

```

Import the Project

------------------

	1) Open Eclipse

	2) File > New > Java Project

	3) Unslelect 'Use Default Location' and input the path to the SimpleWebCrawler directory

	4) Select all projects from the detected project list

	5) Click Finish

Usage

-----

	1) In Eclipse, right click "CrawlerApp.java" > RunAs... > Java Application

	2) Input the following items:

		- Search Mechanism: BreadthFirst, BestFirst, DepthFirst

		- Seed URL

		- Threads to use

		- Maximum depth to search to

		- Maximum URL count to search to

	3) Click the button to start

A tree will form in the first tab showing all URL relations; you can drag and zoom in/out.

A table and chart will be shown in the second tab with the details about each URL.

Progress is shown on the bottom bar with the search status in the bottom right corner.

To start another search, close the program and reopen it.

Note: 	when using multiple threads the group will need to discover existing nodes, give the

	program about 15 seconds to get all threads in sync.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/wagoodman/simplewebcrawler

Awesome Lists containing this project

README