
https://github.com/steveoro/goggles_import

Data-Import suite and tools, rebuilt from scratch

Host: GitHub
URL: https://github.com/steveoro/goggles_import
Owner: steveoro
License: mit
Created: 2019-06-06T15:42:22.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-01-19T13:22:35.000Z (almost 2 years ago)
Last Synced: 2024-10-07T18:11:30.104Z (about 1 month ago)
Language: Ruby
Size: 2.89 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Goggles -o^o-

## Data-Import tools suite

Goggles is a Rails application developed to manage and browse the results obtained dynamically from any official Swimming competition. The app is also designed to handle much more, as long as it is related to Swimming.

This project covers the new data-import utility and is designed for internal use only, together with the original `Admin` application.

The official framework Wiki is [here](https://github.com/steveoro/goggles_admin/wiki).

### Dependencies & setup:

- [Goggles Core, for data structures](https://github.com/steveoro/goggles_core)

- [Kiba, for ETL](https://github.com/thbar/kiba)

- Yarn, as the main ES6 package manager; for installation, see [Yarn package manager](https://yarnpkg.com/lang/en/docs/install/#debian-stable)

- NodeJS + other packages for the data crawler tool: once Yarn has been installed, run `rails yarn:install` and everything should be taken care of.

### Internal custom Crawler:

Basic dependencies installation (from Rails app root):

```bash
rails yarn:install
```

Run (for example, the FIN crawler):

```bash
cd crawler
node fin-crawler.js
```

The crawler expects a `list.csv` file containing the (editable) list of meeting URLs to be crawled.

It loops over all the URLs found in the `.csv` file, extracts the data and produces one `.json` file for each meeting result page crawled.

Each JSON file is created in the current working directory (`crawler`); its filename is a semi-normalized meeting name prefixed with a unique code.
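To take a quick look at what a crawl produced, the generated files can be loaded from Ruby. The following is only a minimal sketch: `load_crawl_results` is a hypothetical helper (not part of this project), and nothing is assumed about the internal structure of the JSON files beyond them being valid JSON.

```ruby
require 'json'

# Hypothetical helper (not part of the project): loads every .json file
# found in +dir+ and returns a Hash mapping each filename to its parsed
# content. Non-JSON files (for example list.csv) are ignored.
def load_crawl_results(dir)
  Dir.glob(File.join(dir, '*.json')).sort.to_h do |path|
    [File.basename(path), JSON.parse(File.read(path))]
  end
end
```

For example, `load_crawl_results('crawler').keys` would list the filenames generated by the last run, assuming it is called from the Rails app root.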

Data fields for the `list.csv` input file (comma separated):

    `URL`,`date`,`isCancelled`,`name`,`place`,`meetingUrl`,`year`

File sample (1 required header line + 1 data line):

```
----8<----
URL,date,isCancelled,name,place,meetingUrl,year
https://www.federnuoto.it/home/master/circuito-supermaster/riepilogo-eventi.html,21/10,,Distanze speciali Lombardia,Brescia,https://www.federnuoto.it/home/master/circuito-supermaster/eventi-circuito-supermaster.html#/risultati/134168:distanze-speciali-master-lombardia.html,"2018"
----8<----
```
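Since the crawler depends on this exact header, it can be handy to check an edited `list.csv` before starting a run. This is a minimal sketch using Ruby's standard `csv` library; `read_meeting_list` and `EXPECTED_HEADERS` are hypothetical names introduced here for illustration, and the check only covers the column names listed above, not the values.

```ruby
require 'csv'

# Column names as documented above for list.csv.
EXPECTED_HEADERS = %w[URL date isCancelled name place meetingUrl year].freeze

# Hypothetical helper (not part of the project): parses the CSV text and
# returns one Hash per data line, keyed by column name.
# Raises ArgumentError if any expected column is missing from the header.
def read_meeting_list(csv_text)
  table = CSV.parse(csv_text, headers: true)
  missing = EXPECTED_HEADERS - table.headers
  raise ArgumentError, "missing columns: #{missing.join(', ')}" unless missing.empty?

  table.map(&:to_h)
end
```

A possible usage, assuming the file sits in the `crawler` folder: `read_meeting_list(File.read('crawler/list.csv'))`. Empty unquoted fields (such as `isCancelled` in the sample) are returned as `nil`.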

The files resulting from the data crawl must be moved by hand to the `data.new` folder before they can be processed by Kiba.
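The manual move could also be scripted. The sketch below is an illustration only: `move_crawl_results` is a hypothetical helper, and both directory paths are passed in by the caller because the exact location of `data.new` is not stated here.

```ruby
require 'fileutils'

# Hypothetical helper (not part of the project): moves every .json file
# from +source_dir+ into +target_dir+ (created if missing) and returns
# the list of destination paths. Other files, such as list.csv, stay put.
def move_crawl_results(source_dir, target_dir)
  FileUtils.mkdir_p(target_dir)
  Dir.glob(File.join(source_dir, '*.json')).sort.map do |path|
    destination = File.join(target_dir, File.basename(path))
    FileUtils.mv(path, destination)
    destination
  end
end
```

For example, `move_crawl_results('crawler', 'data.new')` when run from the Rails app root, assuming `data.new` lives there.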