Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/steveoro/goggles_import
Data-Import suite and tools, rebuilt from scratch
- Host: GitHub
- URL: https://github.com/steveoro/goggles_import
- Owner: steveoro
- License: MIT
- Created: 2019-06-06T15:42:22.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-01-19T13:22:35.000Z (almost 2 years ago)
- Last Synced: 2024-10-07T18:11:30.104Z (about 1 month ago)
- Language: Ruby
- Size: 2.89 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Goggles -o^o-
## Data-Import tools suite

Goggles is a Rails application developed to manage and browse the results obtained
dynamically from any official Swimming competition. The app is also designed to handle
a lot more, as long as it is related to Swimming.

This project covers the new data-import utility and is designed for internal usage only, together with the original `Admin` application.
The official framework Wiki is available [here](https://github.com/steveoro/goggles_admin/wiki).
### Dependencies & setup:
- [Goggles Core, for data structures](https://github.com/steveoro/goggles_core)
- [Kiba, for ETL](https://github.com/thbar/kiba)
- Yarn, as the main ES6 package manager; for installation, see [Yarn package manager](https://yarnpkg.com/lang/en/docs/install/#debian-stable)
- NodeJS + other packages for the data crawler tool: just run `rails yarn:install` and everything should be taken care of (after Yarn has been installed).

### Internal custom Crawler:
Basic dependencies installation (from Rails app root):
```bash
> rails yarn:install
```

Run (for example, the FIN crawler):
```bash
> cd crawler
> node fin-crawler.js
```

This expects a `list.csv` file containing the (editable) list of meeting URLs to be crawled.
The crawler loops over all URLs found in the `.csv` file, extracting data and producing a `.json` file for each meeting result page crawled.
Each JSON file is created in the current working directory (`crawler`) and has as its filename a semi-normalized meeting name with a prefixed unique code.
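As a quick sanity check after a crawl run, a hypothetical one-off Ruby snippet (not part of the suite) can list the produced files and their top-level keys from inside the `crawler` directory:

```ruby
require 'json'

# Print each crawl result file together with its top-level JSON keys.
# (Assumes each file holds a single JSON object for one meeting.)
Dir.glob('*.json').sort.each do |path|
  data = JSON.parse(File.read(path))
  puts "#{path} => #{data.keys.join(', ')}"
end
```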
Data fields for the `list.csv` input file (comma separated):
`URL`,`date`,`isCancelled`,`name`,`place`,`meetingUrl`,`year`
File sample (1 required header line + 1 data line):
```
----8<----
URL,date,isCancelled,name,place,meetingUrl,year
https://www.federnuoto.it/home/master/circuito-supermaster/riepilogo-eventi.html,21/10,,Distanze speciali Lombardia,Brescia,https://www.federnuoto.it/home/master/circuito-supermaster/eventi-circuito-supermaster.html#/risultati/134168:distanze-speciali-master-lombardia.html,"2018"
----8<----
```

The resulting data-crawl files should be moved by hand to the `data.new` folder before being processed by Kiba.
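For orientation only, here is a minimal sketch of how a Kiba job could be wired to consume the files moved into `data.new`; `MeetingJsonSource` and `CountingDestination` are hypothetical illustrations, not the suite's actual classes:

```ruby
require 'json'
require 'kiba'

# Hypothetical source: yields one parsed Hash per crawled meeting JSON file.
class MeetingJsonSource
  def initialize(dir)
    @dir = dir
  end

  def each
    Dir.glob(File.join(@dir, '*.json')).sort.each do |path|
      yield JSON.parse(File.read(path))
    end
  end
end

# Hypothetical destination: only counts rows; a real job would persist them.
class CountingDestination
  def initialize
    @count = 0
  end

  def write(_row)
    @count += 1
  end

  def close
    puts "Processed #{@count} meeting file(s)."
  end
end

job = Kiba.parse do
  source MeetingJsonSource, 'data.new'
  transform do |row|
    # Placeholder for the real normalization steps.
    row
  end
  destination CountingDestination
end

Kiba.run(job)
```

In the actual suite, the transform and destination steps would presumably map rows onto the `goggles_core` data structures.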