https://github.com/rudrakshi99/web_crawler

A Spider🕷 or search engine bot that downloads and indexes content from all over the Internet.

crawler python spider

Last synced: 11 months ago

Host: GitHub
URL: https://github.com/rudrakshi99/web_crawler
Owner: rudrakshi99
Created: 2020-08-05T15:04:45.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2020-08-05T15:20:15.000Z (almost 6 years ago)
Last Synced: 2025-04-07T03:41:20.238Z (about 1 year ago)
Topics: crawler, python, spider
Language: Python
Homepage:
Size: 7.81 KB
Stars: 2
Watchers: 2
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Web Crawler 🕸

A web crawler, **spider** 🕷, or search engine bot downloads and indexes content from all over the Internet. The goal of such a bot is to learn what (almost) every webpage on the web is about, so that the information can be retrieved when it's needed. These bots are called "web crawlers" because crawling is the technical term for automatically accessing a website and obtaining data via a software program.

Web crawlers go by many names, including spiders, robots, and bots. These descriptive names sum up what they do: they crawl across the World Wide Web to index pages for search engines.

Search engines don’t magically know what websites exist on the Internet. Their crawlers have to crawl and index pages before the search engine can deliver the right pages for the keywords and phrases people use to find a useful page.
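To make the indexing step concrete, here is a minimal sketch of an inverted index, a common structure for mapping keywords to pages. It is an illustration only and is not taken from this repository; the names `build_index` and `search`, and the simple word-splitting rule, are assumptions made for the example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it.

    `pages` is a hypothetical {url: text} dictionary of crawled content.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        # Assumption: a "word" is any run of ASCII letters or digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Calling `search(build_index(pages), "python crawler")` would return only the URLs whose text contains both words. A real search engine also ranks the results, which this sketch leaves out.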

# How does a web crawler work?

The Internet is constantly changing and expanding. Because it is not possible to know how many total webpages there are on the Internet, web crawler bots start from a seed, or a list of known URLs. They crawl the webpages at those URLs first. As they crawl those webpages, they will find hyperlinks to other URLs, and they add those to the list of pages to crawl next.
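The seed-and-follow loop described above can be sketched with the Python standard library alone. This is a hedged illustration, not the code in this repository: `LinkParser`, `fetch`, `crawl`, the `fetch_page` parameter, and the `max_pages` limit are all hypothetical names chosen for the example, and the sketch ignores politeness rules such as rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its HTML as text (assumes UTF-8)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seeds, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl from the seed URLs; returns {url: html}."""
    frontier = deque(seeds)  # URLs waiting to be crawled
    seen = set(seeds)        # URLs already queued, to avoid repeats
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

Passing the download function as `fetch_page` keeps the loop easy to try without network access, for example by supplying a dictionary lookup over a few hand-written pages. The `seen` set is what stops the crawler from revisiting a URL that several pages link to.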

Search engines crawl, or visit, sites by following the links between pages. However, if you have a new website with no links connecting your pages to others, you can ask Google to crawl your site by submitting your URL in Google Search Console.
