https://github.com/foo290/www-net-graph
An undirected graph of the World Wide Web
- Host: GitHub
- URL: https://github.com/foo290/www-net-graph
- Owner: foo290
- Created: 2021-07-09T12:31:41.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-07-09T13:17:59.000Z (over 4 years ago)
- Last Synced: 2025-02-24T13:48:03.974Z (about 1 year ago)
- Language: HTML
- Size: 4.76 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# www-net-graph
Given a link, this program draws all the web pages connected to that link, then the pages connected to those, and so on up to a threshold,
since one webpage can be connected to trillions of webpages.
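
The "up to a threshold" idea can be sketched as a breadth-first crawl that stops discovering new pages once a limit is reached. This is only an illustration, not the repository's actual code: the names `crawl_graph`, `get_links`, and `max_pages` are hypothetical, and a small in-memory dictionary stands in for real web pages.

```python
from collections import deque


def crawl_graph(start, get_links, max_pages):
    """Breadth-first crawl from `start`, discovering at most `max_pages` pages.

    `get_links(url)` is a hypothetical callable returning the links on a page.
    Returns a set of undirected edges, each stored as a frozenset of two URLs.
    """
    visited = {start}
    queue = deque([start])
    edges = set()
    while queue:
        page = queue.popleft()
        for link in get_links(page):
            if link == page:
                continue  # skip self-links
            if link not in visited:
                if len(visited) >= max_pages:
                    continue  # threshold reached: ignore undiscovered pages
                visited.add(link)
                queue.append(link)
            # frozenset makes (a, b) and (b, a) the same undirected edge
            edges.add(frozenset((page, link)))
    return edges


# Toy in-memory "web" standing in for real HTTP fetches (an assumption).
toy_web = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": [],
}
edges = crawl_graph("a", lambda url: toy_web.get(url, []), max_pages=3)
```

With `max_pages=3`, page `d` is never added because the limit is already reached when it is first seen, so only the edges among `a`, `b`, and `c` are kept.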
The input to the href parser grows exponentially with crawl depth rather than factorially: one webpage gives you at least 40 links, each of those 40 gives about 40 more, and so on, so depth d yields on the order of 40^d links.
Because this work is not computational overhead but is bound by IO (fetching each webpage),
this program uses threads to improve performance.
Without threading, parsing 1000 links can take 5-15 minutes; with threading, the same work finishes in less than 10 seconds (also dependent on your internet speed).
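
As a rough sketch of why threads help with IO-bound fetching, the example below downloads many pages concurrently with a thread pool and extracts the `href` values from each one. It uses only the Python standard library; the names `HrefParser`, `extract_links`, `fetch_html`, and `links_for_pages`, as well as the default of 32 workers, are assumptions made for illustration and may not match how the repository implements it.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen


class HrefParser(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html):
    """Return the list of href values found in an HTML string."""
    parser = HrefParser()
    parser.feed(html)
    return parser.hrefs


def fetch_html(url, timeout=10):
    """Download one page; returns an empty string on common network errors."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def links_for_pages(urls, fetch=fetch_html, max_workers=32):
    """Fetch pages concurrently and map each URL to the links found on it."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = pool.map(fetch, urls)  # results come back in input order
        return {url: extract_links(html) for url, html in zip(urls, pages)}
```

While one thread waits on a network response, the others keep fetching, which is where the speedup over a sequential loop comes from. The `fetch` parameter is injectable so the function can be tried with canned HTML instead of live requests.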
## Sparse
[Screenshot of a sparse graph: Screenshot from 2021-07-09 17-45-25.png]
## Dense
[Screenshot of a dense graph: Screenshot from 2021-07-09 13-39-36.png]
### Example:
For the input `https://twitter.com`:
[Screenshot of the resulting graph: twitter_net.png]