Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/m-osource/cassiopeiabot
C++ multithread Linux Web Crawler
algorithm berkeleydb bot cassiopeia cplusplus crawler download engine hashing html-parser information-retrieval link-analysis multithread open-source regex search web web-crawler webcrawler www
Last synced: 15 days ago
- Host: GitHub
- URL: https://github.com/m-osource/cassiopeiabot
- Owner: m-osource
- Created: 2023-05-01T11:23:59.000Z (over 1 year ago)
- Default Branch: CassiopeiaBot
- Last Pushed: 2023-05-25T09:20:35.000Z (over 1 year ago)
- Last Synced: 2024-11-10T22:36:57.210Z (2 months ago)
- Topics: algorithm, berkeleydb, bot, cassiopeia, cplusplus, crawler, download, engine, hashing, html-parser, information-retrieval, link-analysis, multithread, open-source, regex, search, web, web-crawler, webcrawler, www
- Homepage:
- Size: 13.3 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Fork of the WIRE-Nic open source project, which is itself a fork of WIRE (Web Information Retrieval Environment), an open source project developed by the Center for Web Research at the University of Chile.
More information about it can be found at http://www.cwr.cl/projects/WIRE/ and https://sourceforge.net/projects/wire-nic/.
CassiopeiaBot was tested by downloading Sardinian web content and organizing it internally. The primary objective was to let the search engine return relevant results with drastically reduced latency, thanks in part to the indexing strategy adopted.