Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
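As a rough illustration of what "systematically browses" can mean, the following is a minimal sketch of a breadth-first crawler using only the Python standard library. It is not taken from any project listed here. The names `LinkParser`, `crawl`, `fetch`, `max_pages`, and the in-memory `PAGES` site are assumptions made for the example; a real crawler would pass a `fetch` function that performs HTTP requests and would also respect robots.txt and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl.

    `fetch` is a caller-supplied function mapping a URL to an HTML string
    (hypothetical interface, chosen so the sketch runs offline).
    Returns the URLs in the order they were visited.
    """
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # frontier of URLs still to visit
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "site" so the sketch needs no network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A</a>',
}

order = crawl("http://example.test/", lambda url: PAGES.get(url, ""))
print(order)
```

The `seen` set is what keeps the crawl from looping on pages that link back to each other, and `max_pages` bounds the work on sites of unknown size.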

https://github.com/ronierisonmaciel/crawler

A crawler built with BeautifulSoup that aims to extract information from websites efficiently and in a structured way. BeautifulSoup is a Python library that simplifies parsing and extracting data from HTML and XML pages. The project collects and organizes relevant information.

beautifulsoup4 crawler crawling python python3

Last synced: 30 Jan 2025

https://github.com/kbychkov/simplecrawler-app

The GUI for Simplecrawler

crawler simplecrawler spider

Last synced: 18 Jan 2025

https://github.com/beckkramer/puppeteer-traverse

Puppeteer utility to easily run a function you define per route on a set of routes.

crawler crawling nodejs puppeteer

Last synced: 19 Jan 2025

https://github.com/jplitza/urlsearch

Index typical webserver directory listings and then search for arbitrary terms

crawler search

Last synced: 24 Jan 2025

https://github.com/iamtonmoy0/sitemap-crawler

Sitemap crawler built with Go and goquery

crawler

Last synced: 05 Jan 2025

https://github.com/izh318/genie-music-artist-album-crawler

A Python script that crawls, in one pass, the album information of a specific artist registered on Genie Music.

crawler genie genie-music gui

Last synced: 28 Dec 2024

https://github.com/datamine/twitter-name-and-shame

Crawler to find Twitter accounts following more than a million users

crawler flask python python-2 twitter

Last synced: 19 Jan 2025

https://github.com/zigai/crawlwright

Web crawling framework powered by Playwright

crawler crawling playwright python scraping wrighter

Last synced: 02 Feb 2025

https://github.com/amirsorouri00/crawler

Page-Rank: public Python 2 projects which have been converted to Python 3.

crawler page-rank python

Last synced: 19 Jan 2025

https://github.com/mattmoony/webcrawler.py

A very simple Python web crawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python and web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler

Last synced: 19 Jan 2025

https://github.com/tryagi/firecrawl

Generated C# SDK based on official Firecrawl OpenAPI specification

ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk

Last synced: 14 Oct 2024

https://github.com/d-w-arnold/local-news-data-collection

Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎

crawler data-collection python

Last synced: 14 Dec 2024

https://github.com/onetail/crawler-with-kafka-docker

Homework project for crawling and analysis

analysis crawler kafka-docker

Last synced: 24 Jan 2025

https://github.com/onetail/applenews

simple crawler

crawler simple

Last synced: 24 Jan 2025

https://github.com/grayhat12/grawler

A web-based crawler that takes two inputs (search item, number of sites to search) and currently displays readable content in text format; the code can be modified to display the HTML code instead.

crawler scraping scraping-websites scrapper scrapy-crawler

Last synced: 01 Feb 2025

https://github.com/ri0n/unboxer

MP4 crawler and extractor

crawler extractor mp4 object-oriented-design qt

Last synced: 13 Jan 2025

https://github.com/indrasaputra/sulong

Simple application that crawls a specific fundraising website and notifies users if there is a new project

bot crawler go golang telegram telegram-bot

Last synced: 19 Jan 2025

https://github.com/tsaohucn/crawler_fb_page

A crawler that uses Selenium to crawl Facebook pages

crawler facebook-page rails ruby selenium

Last synced: 20 Jan 2025

https://github.com/ariefrahmansyah/crawler

Simple website crawler using Go programming language.

crawler go

Last synced: 01 Feb 2025

https://github.com/lilchen96/pokemon-crawler

Crawl JSON-formatted data for Pokémon, based on the PokeAPI.

crawler pokemon

Last synced: 19 Jan 2025

https://github.com/avsbharadwaj/web_crawler

A basic web crawler that recursively prints out the links and descriptions present on a website

crawler web

Last synced: 19 Jan 2025

https://github.com/sbstjn/tatort

Query information for upcoming Tatort shows

crawler node nodejs tatort

Last synced: 05 Jan 2025

https://github.com/hvtuananh/twitter_crawler

Daemon to call and get tweets from Twitter Public Stream API

crawler java streaming-api tweets twitter twitter-crawler

Last synced: 23 Oct 2024

https://github.com/triekai/review-radar

An intelligent tool that analyzes Google Maps reviews to detect potential fake reviews and suspicious patterns.

crawler google-maps nextjs openai react

Last synced: 24 Jan 2025

https://github.com/tsaohucn/crawler_fb_user_group

A crawler that uses Selenium to crawl Facebook user groups

crawler facebook-user-groups rails ruby

Last synced: 20 Jan 2025

https://github.com/ma-pony/playwright-spider-utils

Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.

crawl crawler playwright python scrapy selenium spider spiderman

Last synced: 09 Oct 2024

https://github.com/abx123/crawler

Simple lambda function to crawl daily web novel updates.

crawler firebase-database golang lambda-functions

Last synced: 02 Feb 2025

https://github.com/abx123/coronachan

Simple lambda function to crawl MKN twitter account for daily Malaysia COVID-19 updates.

crawler lambda-functions python

Last synced: 02 Feb 2025

https://github.com/usethisname1419/connectioncrawler

crawls a website and checks for connections

connection crawler http-headers reporting website-analyzer

Last synced: 26 Jan 2025

https://github.com/jayzhan211/python-crawler-startups

python crawler learning

crawler python

Last synced: 25 Jan 2025

https://github.com/aminehsan/datamining-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scraping

Last synced: 31 Jan 2025

https://github.com/rafaelmoraes003/tech-news

Analysis and manipulation of news data from a technology website obtained through data scraping using Python.

crawler data-scraping https mongodb parsel pymongo python web-scraping

Last synced: 26 Jan 2025

https://github.com/jlenon7/sef_automation

📑 Crawler that automatically enrolls in open vacancies on the SEF website.

athenna crawler esm nodejs playwright portugal residence sef typescript

Last synced: 13 Dec 2024

https://github.com/frostming/daily-wallpaper

A small crawler to get wallpapers from Unsplash

crawler python requests unsplash wallpaper

Last synced: 25 Jan 2025

https://github.com/viko16/hatcher

🐣[WIP] Provides APIs by simple configuration.

api api-server cli crawler koa-middleware nodejs spider

Last synced: 26 Jan 2025

https://github.com/gxjansen/website-to-pdf

Creates a PDF based on the content of a website/subdomain

claude-3-sonnet crawler python3

Last synced: 05 Feb 2025

https://github.com/istador/mediaindexer

Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.

crawler website

Last synced: 22 Jan 2025

https://github.com/jpleorx/tagblender

A simple java API to retrieve hashtags from https://www.tagblender.net/

api crawler hashtags java jsoup parser

Last synced: 25 Jan 2025

https://github.com/ark930/douban-movie-crawler

Douban movie review crawler

crawler douban movie python

Last synced: 24 Jan 2025

https://github.com/pourmand1376/crawler

Simple Crawler, Indexer and Search Engine Web Application

crawler csharp csharp-code dotnet mvc

Last synced: 14 Jan 2025

https://github.com/rayspock/go-web-crawler

A web crawler to fetch all the links from a given website via go routines.

concurrency crawler golang goroutine

Last synced: 14 Jan 2025

https://github.com/arman2409/datafalcon

Web crawler

crawler extract-data

Last synced: 15 Dec 2024

https://github.com/amazingcoderpro/pythonup

Have fun with Python! For improving Python skills

crawler python

Last synced: 28 Jan 2025

https://github.com/nextlevelshit/node-crawl

Webcrawler for nodejs

crawl crawler javascript nodejs

Last synced: 20 Jan 2025

https://github.com/eivindarvesen/naive-spider

A minimal web crawler

crawler python spider

Last synced: 17 Jan 2025

https://github.com/zenoyang/webcrawler

Some crawler code

crawler scrapy spider web-crawler

Last synced: 17 Jan 2025

https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp

Web crawler in C# that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page's HTML and retrieves information such as event title, date, venue, and links.

anglesharp crawler minhaentrada

Last synced: 31 Dec 2024

https://github.com/sedrubal/webcrawler

Crawl sites and search for security issues.

crawler script security website-auditing

Last synced: 24 Jan 2025

https://github.com/kaymen99/imdb-scraper

IMDb scraper that collects movie and TV show data from the IMDb website

crawler python scraper scraping scrapy

Last synced: 22 Jan 2025

https://github.com/leegeunhyeok/python-gongucrawler

Python 3 crawler for Gongu Madang (공유마당) images and their details

crawler python

Last synced: 22 Dec 2024

https://github.com/mstephen19/apify-click-events

Like TypeScript, but for clicking ;) Manage automated clicks, and ensure your Apify web-crawler is only clicking exactly what you allow it to

apify apify-sdk crawler scraper web-automation

Last synced: 04 Feb 2025

https://github.com/mahdijamebozorg/cryptonewscrawler

An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.

crawler crypto cryptocurrency data-mining datamining information-retrieval llm python

Last synced: 16 Jan 2025

https://github.com/phanletrunghieu/webcrawler

A web crawler with Spring MVC

crawler java servlet spring-mvc springframework

Last synced: 28 Jan 2025

https://github.com/jyasskin/pbot-crawler

Crawler for PBOT's website to show what has changed.

crawler

Last synced: 28 Jan 2025

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 12 Jan 2025

https://github.com/serge45/pytwgasprices

APIs to fetch the latest Taiwan gas prices

crawler gas price python taiwan

Last synced: 14 Jan 2025

https://github.com/nagilum/focus

Simple CLI tool, written in C#, to crawl a site and log the responses.

cli crawl crawler csharp playwright

Last synced: 16 Jan 2025

https://github.com/Kissaki/website-downloader

A website Crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 23 Oct 2024

https://github.com/ilovebacteria/digikala-api

This Python package sends requests to the Digikala API and retrieves the details of a product.

crawler digikala pypi

Last synced: 14 Nov 2024

https://github.com/bingxyz/btcethcrawler

Telegram broadcast channel for Bitcoin and Ethereum

bash bash-script crawler telegram-bot

Last synced: 22 Jan 2025

https://github.com/kimi0230/pstocks

Stock market crawler in Python

crawler numpy pandas python python3 stocks

Last synced: 16 Jan 2025

https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper

Web scraper that scrapes the entire Codeforces problemset into a CSV or JSON file.

codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider

Last synced: 16 Jan 2025

https://github.com/brnrajoriya/india-s-states-and-cities-crawler

Crawler that collects all of India's states and cities

cities crawler india php script states

Last synced: 16 Jan 2025

https://github.com/erickj3/strike-api

A web scraping API built with NestJS

api crawler flow nestjs scraping typescript

Last synced: 24 Jan 2025

https://github.com/aristotelesbr/api_quotes

Test project for a job.

crawler mongodb rails5

Last synced: 17 Jan 2025

https://github.com/thiiagoms/car-stealth

REST API for all cars that were stolen

api cars crawler student

Last synced: 16 Jan 2025

https://github.com/bruce-lee-ly/crawler

Several fun crawler cases implemented in Python.

crawler python

Last synced: 16 Jan 2025

https://github.com/manikantasanjay/stackoverflow_tag_generator_webcrawler

StackOverFlow Tag Generator Using a WebCrawler.

crawler python

Last synced: 22 Dec 2024

https://github.com/brianbruggeman/vax

A vaccination signup tool

covid-19 crawler signup vaccination

Last synced: 16 Jan 2025

https://github.com/notreeceharris/webstalker

🕸 A Powerful Relational Web Crawler

crawler datamining webcrawler

Last synced: 14 Jan 2025

https://github.com/frobware/grawler

Web Crawler

crawler go

Last synced: 16 Jan 2025

https://github.com/mustafadalga/website-crawler

A web crawler script that lists links by scanning the target website.

crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling

Last synced: 18 Jan 2025

https://github.com/tssujt/async-crawler-sample

A simple crawler sample based on asyncio~

aiohttp asyncio crawler

Last synced: 22 Jan 2025

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 22 Dec 2024

https://github.com/mg98/ipfs-replicate

Replicate IPFS' distributed data structure locally, based on network traces.

crawler dag ipfs redisgraph scraper

Last synced: 29 Jan 2025

https://github.com/ymdarake/otenki-crawler

Yet another weather data scraper.

crawler weather weather-data

Last synced: 16 Jan 2025

https://github.com/shentengtu/cht-yp-crawler

Simple Crawler of www.iyp.com.tw.

crawler node-js nodejs yellow-pages yellowpages

Last synced: 11 Jan 2025

https://github.com/kartikmehta8/pycrawler

PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.

crawler cybersecurity python

Last synced: 16 Jan 2025

https://github.com/mawkler/go-web-crawler

Toy web server written in Go

crawler go

Last synced: 31 Jan 2025

https://github.com/pvital/cra-cra

Another web crawler

crawler python

Last synced: 23 Jan 2025

https://github.com/pinpox/go-random-downloader

Download HTML using "Random Page"

crawler golang

Last synced: 28 Jan 2025

https://github.com/jeanluc162/prnt-sc-crawler

Crawler for the Website prnt.sc

crawler net5 net50 prntsc screenshots

Last synced: 16 Jan 2025

https://github.com/kodemartin/webcrawler

A simple webcrawler

crawler rust

Last synced: 25 Jan 2025

https://github.com/not-raspberry/aio_crawler

AIO single website crawler

asyncio crawler python3

Last synced: 29 Jan 2025

https://github.com/zahraarshia/cti_crawl

This cyber threat intelligence crawler can be used to gather information from various sources, including open-source and commercial feeds.

crawler cti cyber-news-bot cyber-threat-intelligence mongodb python scrapy sqlite3 web-scraper

Last synced: 09 Jan 2025

https://github.com/joyceannie/moviespider

This project crawls movie data from IMDb. The Scrapy framework is used to extract relevant information such as movie title, datePublished, summary, genres, and director.

crawler datascience python scrapy spider webscraper

Last synced: 29 Jan 2025