Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/diogok/naive-crawler
Very naive crawler implemented in Clojure
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/diogok/naive-crawler
- Owner: diogok
- Created: 2010-12-25T15:08:03.000Z (about 14 years ago)
- Default Branch: master
- Last Pushed: 2010-12-25T15:14:28.000Z (about 14 years ago)
- Last Synced: 2023-04-13T15:11:47.963Z (over 1 year ago)
- Language: Clojure
- Homepage:
- Size: 89.8 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Naive Crawler
A very naive crawler written a while ago in Clojure. It takes an initial URL, saves that page's content to disk, then recursively follows every link on the page that points to the same domain and saves those pages too, until there is nothing left... not even memory ;)
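The idea can be sketched in a few lines. What follows is not the code from this repository: it is an illustrative Python sketch (the project itself is Clojure), and the names `LinkParser`, `http_fetch`, `crawl`, `fetch`, and `save` are all hypothetical. It also swaps the recursion for an explicit queue, and it assumes "same domain" means "same host and port". Like the original, it keeps everything it has seen in memory.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Default fetcher: download a page and decode it leniently."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=http_fetch, save=None):
    """Visit start_url and every same-host page reachable from it.

    fetch(url) returns the page body as a string; save(url, body), if
    given, is called once per fetched page (for example to write it to
    disk). Returns a dict mapping each fetched URL to its body.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}            # every URL ever queued, kept in memory
    pages = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        try:
            body = fetch(url)
        except Exception:
            continue              # naive: silently skip pages that fail
        pages[url] = body
        if save is not None:
            save(url, body)
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling `crawl("http://example.test/", save=my_save)` with a hypothetical `my_save(url, body)` function that writes each body to a file would approximate the behaviour described above. There is no politeness delay, no `robots.txt` handling, and no limit on depth or memory, which is what makes it naive.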
For a real web crawler, take a look at [Bixo](http://github.com/bixo/bixo/), which uses Java and Hadoop. If you intend to build your own, a nice option is to use [clj-sys/work](https://github.com/clj-sys/work) and maybe [neo4j](http://neo4j.org).