Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/microlinkhq/ua

Simple Redis primitives to incr() and top() user agents

crawler redis user-agent user-agent-parser

Last synced: 12 Nov 2024

https://github.com/dangdungcntt/crawl-fb-v2

Simple script to detect emails and phone numbers in Facebook comments.

crawler facebook

Last synced: 17 Nov 2024

https://github.com/pnguyen215/instagram-crawler

Instagram Crawler is a Python script to download posts from a specified Instagram account.

crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler

Last synced: 12 Nov 2024

https://github.com/madis/flatcrawl

Clojure app for crawling apartment information from http://kv.ee

clojure crawler real-estate webapp

Last synced: 12 Nov 2024

https://github.com/captain-woof/zhi-zhu

Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages, and all URLs appearing in them, for specific (regex) words.

crawler crawler-python crawling-python python3

Last synced: 31 Dec 2024

https://github.com/feliz-szk/berserk

Berserk: Crawler to increase web traffic (based on Tor and Privoxy)

anonymizer anonymous-proxy command-line-tool crawler linux privoxy python scraping-websites tor webtraffic-increaser

Last synced: 13 Nov 2024

https://github.com/afuntw/misc-crawler

Some small crawlers for specific websites

crawler

Last synced: 13 Nov 2024

https://github.com/santhin/real-estate

Real estate crawler with ML on scraped data

crawler jupyter-notebook ml real-estate scrapy

Last synced: 24 Nov 2024

https://github.com/orsinium-labs/gpcc

Python library and CLI tool to fetch information from GPC Browser (https://gpc-browser.gs1.org/)

crawler gpc gs1

Last synced: 16 Nov 2024

https://github.com/zabuzard/wslotter

WSlotter is a Selenium-driven tool for signing up for events on 'https://www.gruppe-w.de'.

bot crawler gruppe-w

Last synced: 13 Nov 2024

https://github.com/zabuzard/songcrawler

Crawls all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.

command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler

Last synced: 13 Nov 2024

https://github.com/wondervictor/spiderman

2017 Software Course Project

crawler distribute-crawler zhihu-crawler

Last synced: 16 Nov 2024

https://github.com/yassilah/nuxt-crawler

Automatic crawler & search for Nuxt SSG.

algolia crawler nuxt search ssg

Last synced: 16 Nov 2024

https://github.com/tanja-4732/od-get

A Rust tool for recursively crawling & downloading data from open directories

cli crawler open-directory open-directory-downloader rust

Last synced: 14 Nov 2024

https://github.com/karantyagi/web-crawler

BFS and DFS implementations for a wikipedia crawler

beautifulsoup crawler

Last synced: 13 Nov 2024

https://github.com/par7133/splash-bot-crawler

Splash Bot creates splashes of your websites on the fly - GPL License 🔥

bot crawler gallery open-source opensource php splash

Last synced: 13 Nov 2024

https://github.com/weaming/simple-crawler

my simple crawler

crawler

Last synced: 13 Nov 2024

https://github.com/ryanchao2012/okbot

A conversation retrieval engine based on PTT corpus

chatbot crawler django ptt

Last synced: 13 Nov 2024

https://github.com/linux0hat/cpp-web-crawler

Explore the web.

cpp crawler sqlite3

Last synced: 13 Nov 2024

https://github.com/dean9703111/shopee_find_mac

Find a cheap Mac that meets your required specs as quickly as possible

argparse crawler mac pip python python2 xlsxwriter

Last synced: 13 Nov 2024

https://github.com/tungct/golangcrawler

Crawler goroutine Golang

crawler go

Last synced: 14 Nov 2024

https://github.com/dean9703111/ithelp_total_count

Counts the total views, likes, and comments of articles on IT邦幫忙 (ithelp)

crawler ithelp total-likes total-responses total-views

Last synced: 13 Nov 2024

https://github.com/dean9703111/humandesign_nodejs

Uses a Node.js crawler to scrape information from a Human Design web page, then saves it to Google Sheets in the cloud

crawler googlesheetapi googlesheets nodejs

Last synced: 13 Nov 2024

https://github.com/marcbperez/python-webcrawler

Crawls HTML pages for prices and other pieces of data.

crawler docker gradle python

Last synced: 19 Nov 2024

https://github.com/ycrao/some-spider-code

Some spider code: crawlers for financial news and for fund, stock, and forex prices

crawler economics fin-eco-news finance forex fund-value spider stock-price

Last synced: 19 Nov 2024

https://github.com/zhaotianff/qzone

Remembering that run beneath the setting sun that day; that was my lost youth

crawler crawling-sites csharp qzone qzone-photos qzone-spider wpf

Last synced: 15 Nov 2024

https://github.com/developerjosh/gogo-crawler

The tool kit for making an anime website with a database full of anime

crawler crawler-js gogoanime gogoanime-api gogoanime-scraper

Last synced: 16 Nov 2024

https://github.com/rbkgh/dailytext-crawler

Crawl jw.org to retrieve daily text

crawler dailytext java jsoup jw

Last synced: 15 Nov 2024

https://github.com/jorgeparavicini/medalytik-python

Python crawlers for a job mediation firm

crawler python scrapy

Last synced: 07 Dec 2024

https://github.com/hctilg/taaghche-dl

Save books purchased from taaghche.com !

crawler downloader pillow-library python3 selenium taaghche

Last synced: 09 Jan 2025

https://github.com/naveenaidu/google-crawler

Google Crawler - Curates the search results

beautifulsoup crawler scraper

Last synced: 17 Nov 2024

https://github.com/henkman/crawlers

:squirrel: some crawlers and downloaders

crawler

Last synced: 15 Nov 2024

https://github.com/mc256/node-static-webpage-crawler

Download an entire website with its directory structure.

cache-server crawler nodejs static-site

Last synced: 24 Nov 2024

https://github.com/fnkr/gocrawl

Simple web crawler.

crawler http-client

Last synced: 30 Nov 2024

https://github.com/mmqnym/pyppeteer-use-case

Shows how to crawl the web with pyppeteer

crawl crawler pyppeteer python

Last synced: 17 Nov 2024

https://github.com/krishpranav/gocralwer

An awesome crawler made in Go

crawler

Last synced: 17 Nov 2024

https://github.com/mustafadalga/website-crawler

A web crawler script that lists links by scanning the target website.

crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling

Last synced: 17 Nov 2024

https://github.com/uranusx86/dcard-crawler-analyzer

Get Dcard & Meteor forum content and analyze it!

crawl crawler dcard nlp python word-cloud word-count word-frequency

Last synced: 20 Nov 2024

https://github.com/amirzenoozi/aparat-videos-dataset

Some Simple Information About Aparat Videos for Data Scientists

aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video

Last synced: 20 Nov 2024

https://github.com/tcc0lin/magiccrawler

Collects all kinds of interesting crawler scripts and pits them against anti-crawling measures :bowtie::heavy_check_mark::heavy_check_mark::heavy_check_mark:

crawler python3 spider

Last synced: 17 Nov 2024

https://github.com/raphaelalmeidamartins/python-tech-news

Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course

crawler crawler-python data-science pytest python

Last synced: 17 Nov 2024

https://github.com/hoishing/selenium-crawler

a web crawler written in python, powered by Selenium and Tesseract OCR

crawler python selenium

Last synced: 17 Nov 2024

https://github.com/amirsorouri00/dsl-se

This is an MVP based on the "Search Engine And Data Mining" course. The idea behind this project is the forked project which its link provided is

container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine

Last synced: 18 Nov 2024

https://github.com/enansari/guess-price-car

Car price estimation with machine learning, based on the information of a car sales site | final project of Maktabkhooneh

crawler jadi machine-learning maktabkhoone maktabkhooneh python

Last synced: 09 Jan 2025

https://github.com/dhchenx/quick-crawler

A toolkit for quickly performing crawler functions

crawler crawler-python

Last synced: 01 Dec 2024

https://github.com/shgopher/retuo

A distributed crawler

crawler go

Last synced: 31 Dec 2024

https://github.com/lockblock-dev/crawlarr

Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.

crawler golang

Last synced: 24 Nov 2024

https://github.com/eea/eea-crawler

EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).

airflow-dags crawler elasticsearch etl-pipeline indexing

Last synced: 24 Nov 2024

https://github.com/sefinek/niedlascamu.pl-tracker

Tracks changes on the niedlascamu.pl website.

crawl crawler niedlascamu tracker tracking

Last synced: 07 Dec 2024

https://github.com/coghost/crawlers

crawlers in one

crawler python3 staticimg weibo

Last synced: 02 Jan 2025

https://github.com/mkfsn/chronos

A light cron-like container service - create cron job easily.

crawler cron cronjob golang

Last synced: 21 Nov 2024

https://github.com/srx-2000/swaiter

A program that waits until a Selenium element has loaded

crawler selenium selenium-python

Last synced: 21 Nov 2024

https://github.com/bingxyz/blackcat

Query Black Cat (黑貓) logistics with a Telegram bot

crawler nodejs telegram-bot

Last synced: 21 Nov 2024

https://github.com/zhs007/lottery-crawler

A crawler based on jarvis-task, mainly used to crawl lottery data.

crawler jarvis-task

Last synced: 03 Jan 2025

https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper

Crawls exclusive video games released for the platforms specified on GameFAQs website to generate a table in Markdown format with the crawled titles.

console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox

Last synced: 01 Dec 2024

https://github.com/redco/goose-phantom-environment

Environment for Goose parser which allows running it in PhantomJS

crawler environment goose goose-parser nodejs parse parser phantomjs scraper

Last synced: 22 Dec 2024

https://github.com/loggerhead/dianping_crawler

A Dianping (大众点评) crawler based on Scrapy (Python 3.5)

crawler python-3-5

Last synced: 22 Dec 2024

https://github.com/basemax/jadi-net-blog

This Python script is used to extract posts from a WordPress blog (https://jadi.net/) and save them in HTML format. The script fetches the RSS feed, parses the posts, and saves each post as an individual HTML file.

blog-copier copier crawler crawler-python crawlers jadi-blog jadi-clone jadi-net-blog jadi-net-clone jadinet-blog py python python-crawler wordpress wp

Last synced: 22 Dec 2024

https://github.com/juangesino/gazette

A personal news aggregator application using Meteor.

crawler meteor meteorjs news news-aggregator news-feed scraper

Last synced: 22 Nov 2024

https://github.com/camilamaia/crawl4us

[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️

beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping

Last synced: 08 Dec 2024

https://github.com/openpj/manifoldcf-sdk

Apache ManifoldCF SDK is a Maven project focused on helping developers to extend ManifoldCF with new connectors and extensions

apache crawler docker ecm extensions integrations manifoldcf migration sdk search

Last synced: 25 Nov 2024

https://github.com/simonrichardson/crwlr

Crawl all the things!

crawler meshuggah

Last synced: 01 Dec 2024

https://github.com/citiususc/polypus

Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis

analytics bigdata crawler scraper sentiment-analysis twitter

Last synced: 02 Dec 2024

https://github.com/maxgio92/package-crawler

A package crawler for most known Linux distros

crawler go linux package

Last synced: 26 Nov 2024

https://github.com/mouday/httpserver

A simple server for testing crawler request headers, built with Python + Flask

crawler flask python spider

Last synced: 26 Nov 2024

https://github.com/mouday/freeipproxy

Maintains a pool of working proxies by scraping free proxy IPs

crawler proxy python spider

Last synced: 26 Nov 2024

https://github.com/adamfisher/scrapyrt.client

A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.

crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider

Last synced: 26 Nov 2024

https://github.com/rayc2045/ghibli-crawler

Automatically download 1,178 photos of Studio Ghibli's works

axios crawler ghibli node node-js nodejs puppeteer rest-api restful restful-api

Last synced: 27 Nov 2024

https://github.com/fi1a/crawler

PHP crawler

crawler php

Last synced: 02 Dec 2024

https://github.com/opda0887/bahamut-crawler-to-gmail

Idea: use a Python crawler to fetch the latest forum posts on Bahamut (巴哈姆特) and send them to myself via Gmail.

crawler crawler-python

Last synced: 27 Nov 2024

https://github.com/deptno/nsdi

㉿ nsdi downloader built on puppeteer

crawler downloader nsdi openapi puppeteer

Last synced: 31 Dec 2024

https://github.com/altescy/mincrawler

A minimal web crawler.

configurable crawler python scraping

Last synced: 27 Nov 2024

https://github.com/jiamingla/mvdis_i18n

Multilingual-friendly version of motorcycle driving license exam booking (non-official)

crawler jquery koa koajs nodejs supertest

Last synced: 28 Nov 2024

https://github.com/gozeon/weibo-crawler

Weibo crawler

crawler web-crawler

Last synced: 28 Nov 2024

https://github.com/zephyrpersonal/github-trending-crawler

transform github-trending repos to json data

cheerio crawler fetch github node repository spider trending

Last synced: 28 Nov 2024

https://github.com/pxlrbt/website-diff

Utility tool that bundles a crawler and BackstopJS for visual regression testing.

backstopjs crawler visual-regression-testing

Last synced: 28 Nov 2024

https://github.com/victorhuu/amazonmovieintegration

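This repository is the first individual assignment for the Data Warehouse course at Tongji University: organizing Amazon movie data with a crawler and ETL tools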

crawler data-warehouse movies pandas scrapy xpath

Last synced: 28 Nov 2024

https://github.com/aleclarson/recrawl

Filesystem crawler

crawler fs nodejs

Last synced: 09 Jan 2025

https://github.com/konradlinkowski/wikipediafinder

Find words in wikipage

crawler scraper wikipedia

Last synced: 28 Nov 2024

https://github.com/konradlinkowski/mailcrawler

Crawler to find emails on websites

crawler scraper

Last synced: 28 Nov 2024

https://github.com/danielemoraschi/go-sitemap-common

Simple Go sitemap generator and crawler.

crawler golang sitemap sitemap-generator

Last synced: 31 Dec 2024

https://github.com/rxcai/python3-weibo-crawler

A small Weibo crawler implemented in Python 3

crawler python python3 spider weibo

Last synced: 28 Nov 2024

https://github.com/khilnani/spidey.py

Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.

cli crawler python scaper web-spider

Last synced: 02 Dec 2024

https://github.com/ccrashzer0/web_crawler

A Python-based web crawler

crawler internet python python3 webcrawler

Last synced: 28 Nov 2024

https://github.com/cryptoc1/earl

Earl is looking for URLs in your area.

crawler middleware nuget webscraping

Last synced: 28 Nov 2024

https://github.com/nelcifranmagalhaes/web_crawler

A web crawler for all Naruto characters

anime beautifulsoup characters crawler naruto python

Last synced: 03 Dec 2024

https://github.com/comigor/balances

Your checking and savings account balances at banks and brokers.

balance bank broker crawler node

Last synced: 09 Dec 2024

https://github.com/flavien-hugs/scrapy-test

Working with the Scrapy library. A mini script that extracts all cartoon characters from Wikipedia.

crawler python scraping scrapy

Last synced: 09 Dec 2024