Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/guillim/arachnida
App to scrape the web, for people without coding skills. Fully integrates web crawlers (Headless Chrome) and the interface to manage them.
- Host: GitHub
- URL: https://github.com/guillim/arachnida
- Owner: guillim
- License: mit
- Created: 2018-09-21T10:01:07.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-10-05T07:16:54.000Z (over 4 years ago)
- Last Synced: 2023-10-20T20:07:41.782Z (over 1 year ago)
- Topics: crawler, crawling, framework, headless-chrome, javascipt, meteor, scraper, scrapping
- Language: JavaScript
- Size: 296 KB
- Stars: 11
- Watchers: 4
- Forks: 12
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Arachnida: simple web interface to pilot crawlers (Under Construction)
======================================================================

Scrape the web easily -> no need to be a coding expert.

Arachnida provides a simple web interface to pilot powerful crawlers (running Headless Chrome).

# Install (2 seconds) #
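The one-line install below assumes the Meteor CLI is already on your PATH. A quick check before cloning (a sketch; the messages are mine, not part of Arachnida):

```shell
# Verify the Meteor CLI is available before running the install one-liner.
if command -v meteor >/dev/null 2>&1; then
  echo "meteor found"
else
  echo "meteor missing - see https://www.meteor.com/ for the installer"
fi
```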
open a terminal, and run:
```
git clone https://github.com/guillim/Arachnida.git arachnida && cd arachnida && meteor
```

**Finished!**
# Use (1 minute) #
Now open Google Chrome (or any browser) and go to http://localhost:3000
You will be able to add a crawler, configure it, and run it in seconds!
### 1. Create a crawler on the main page: ###
First give it a name, and leave the function empty (unless you know what you're doing).
![screenshot](https://ibin.co/4GSHblERpQfn.png)

### 2. Configure your crawler: ###
This is the only moment when a bit of coding knowledge is helpful. In the main part, you need to write a JavaScript function that will be executed on every page scraped by the crawler.
For instance, to extract the title of each page, write:
```
return {
  title: $('title').text(),
};
```
Yes, jQuery is already set up. You simply need to provide the selectors (id, class...).

![screenshot](https://ibin.co/4GSHWS9cgqUR.png)
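A crawler function can return several fields at once, one key per piece of data you want to extract. The sketch below shows that shape; since jQuery is only available inside the crawled page, `extractTitle` is a hypothetical stand-in for `$('title').text()` so the example can run on its own:

```javascript
// Stand-in for $('title').text() on a raw HTML string (hypothetical helper,
// only here so the example runs outside the crawler).
function extractTitle(html) {
  const match = html.match(/<title>([^<]*)<\/title>/i);
  return match ? match[1].trim() : '';
}

// The same object shape a crawler function would return for each page.
function crawlerResult(html) {
  return {
    title: extractTitle(html),
    length: html.length, // any computed field works as a value
  };
}

console.log(crawlerResult('<html><head><title>Arachnida</title></head><body></body></html>'));
```

Inside Arachnida itself you would write only the `return { ... }` body, using real jQuery selectors.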
### View the results: ###
![screenshot](https://ibin.co/4GSJEILx9T9s.png)

## What's included ##
* See screenshots of your running crawler
* Manually add URLs to be scraped, or upload a CSV
* Sign in / Sign up
* Account management: Profile Page, Username, Change password, Delete account...
* Admin for the webmaster: go to `/admin`
* Router
* MongoDB as the database

# Contribute #
I am looking for people to make pull requests to improve Arachnida. Please do it :)
TO DO:
1. Set up a live queue of URLs to be scraped (e.g. at the moment, you can't click straight on a link and scrape it)
2. Live logs from the server brought to the interface to help debugging
3. Results export functionality (CSV & JSON)

### Thanks ###
Boilerplate: yogiben.
HeadlessChrome layer: yujiosaka