https://github.com/maxvalue/kem-contacts

Get all metadata about the Klima- und Energieregionen from https://www.klimaundenergiemodellregionen.at/modellregionen/liste-der-regionen/

Host: GitHub
URL: https://github.com/maxvalue/kem-contacts
Owner: MaxValue
License: mit
Created: 2019-09-02T13:34:19.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2024-05-14T22:16:06.000Z (about 2 years ago)
Last Synced: 2025-03-11T02:47:58.143Z (about 1 year ago)
Language: Python
Size: 89.8 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt


# Web Crawler - KEM Contacts

Get all metadata about the Klima- und Energieregionen from https://www.klimaundenergiemodellregionen.at/modellregionen/liste-der-regionen/

## Contents

* [Getting Started](#getting-started)
  * [Prerequisites](#prerequisites)
* [Deployment](#deployment)
* [Built With](#built-with)
* [Contributing](#contributing)
  * [Roadmap](#roadmap)
* [Versioning](#versioning)
* [Authors](#authors)
* [License](#license)
* [Acknowledgments](#acknowledgments)
  * [Project History](#project-history)

## Getting Started

These instructions will get a copy of the project up and running on your local machine for development. See [Deployment](#deployment) for notes on how to run the crawler and export its results.

### Prerequisites

What you need to install the software, and how to install it.

The recommended way is to run the `setup_development.sh` script:

```
./setup_development.sh
```

If you would rather not use the script, here is the complete list of dependencies:

* [Python 3.6.8](https://www.python.org/downloads/)

* [pip 9.0.1](https://pip.pypa.io/)

* [Python venv 3.6.7-1](https://docs.python.org/3/library/venv.html)

* [Scrapy 1.7.3](https://scrapy.org/)

* [SQLAlchemy 1.3.7](https://www.sqlalchemy.org)

* [Pyexcel 0.5.15](https://github.com/pyexcel/pyexcel/)

* [Pyexcel-ods 0.5.6](https://github.com/pyexcel/pyexcel-ods)

* [Pyexcel-xls 0.5.8](https://github.com/pyexcel/pyexcel-xls)

* [Pyexcel-xlsxw 0.4.2](https://github.com/pyexcel/pyexcel-xlsxw)
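If you set up the environment by hand, the following is a minimal sketch of one way to do it from Python with the standard-library `venv` module. It is not the contents of `setup_development.sh`, which may do more. It assumes the PyPI package names match the dependency names listed above, and the helper names (`venv_python`, `pip_install_command`, `setup`) are hypothetical. The pinned versions date from 2019 and may not install on current Python releases.

```python
import os
import subprocess
import venv

# Pinned versions from the dependency list; the PyPI names are an assumption.
REQUIREMENTS = [
    "Scrapy==1.7.3",
    "SQLAlchemy==1.3.7",
    "pyexcel==0.5.15",
    "pyexcel-ods==0.5.6",
    "pyexcel-xls==0.5.8",
    "pyexcel-xlsxw==0.4.2",
]


def venv_python(env_dir):
    """Return the path of the Python interpreter inside env_dir."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def pip_install_command(env_dir, requirements=REQUIREMENTS):
    """Build the argv list that installs the requirements into env_dir."""
    return [venv_python(env_dir), "-m", "pip", "install"] + list(requirements)


def setup(env_dir="venv"):
    """Create the virtual environment with pip, then install the pins into it."""
    venv.create(env_dir, with_pip=True)
    subprocess.check_call(pip_install_command(env_dir))
```

Calling `setup("venv")` creates a `venv/` directory in the current working directory, which is the directory the `source venv/bin/activate` step expects.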

## Deployment

Activate the virtual environment:

```
source venv/bin/activate
```

Change into the Scrapy project directory:

```
cd kem
```

Start the crawler:

```
scrapy crawl getcontacts
```
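If you want to start the crawl from a Python script (for example from a scheduler), a minimal sketch is to run the same `scrapy crawl getcontacts` command with `subprocess` inside the `kem` directory. It assumes the `scrapy` executable from the activated virtual environment is on your `PATH`; the function names are hypothetical.

```python
import subprocess


def crawl_command(spider="getcontacts"):
    """Return the argv for `scrapy crawl <spider>`."""
    return ["scrapy", "crawl", spider]


def run_crawl(project_dir="kem", spider="getcontacts"):
    """Run the spider from inside the Scrapy project directory.

    Raises subprocess.CalledProcessError if the scrapy process exits
    with a non-zero status.
    """
    subprocess.run(crawl_command(spider), cwd=project_dir, check=True)
```

Shelling out keeps the behaviour identical to the manual command, including where Scrapy picks up the project settings.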

After the crawler finishes, you'll want to export the data:

```
./export.py results.db kem getcontacts 1 KEM-Contacts_YYYY-MM-DD
```

where `1` is the job ID and `YYYY-MM-DD` should be replaced by the date on which you crawled the website.

The job ID is printed near the beginning of the log file `log.txt` as `Job ID is: XX`.
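The two manual steps above (looking up the job ID in `log.txt` and filling in the date) can be sketched in Python as follows. This assumes the job ID is a whole number, as in the example `1`. The `kem` and `getcontacts` arguments are passed through unchanged; the parameter names `project` and `spider` are only a guess at their meaning, and the helper names are hypothetical.

```python
import re
from datetime import date

# Matches the "Job ID is: XX" line near the top of log.txt (numeric ID assumed).
JOB_ID_PATTERN = re.compile(r"Job ID is: (\d+)")


def find_job_id(log_text):
    """Return the first job ID in the log text as an int, or None if absent."""
    match = JOB_ID_PATTERN.search(log_text)
    return int(match.group(1)) if match else None


def export_command(job_id, crawl_date=None, db="results.db",
                   project="kem", spider="getcontacts"):
    """Build the argv for ./export.py, naming the output after the crawl date."""
    if crawl_date is None:
        crawl_date = date.today()
    output = "KEM-Contacts_" + crawl_date.isoformat()
    return ["./export.py", db, project, spider, str(job_id), output]
```

For example, reading the log with `open("log.txt").read()`, passing the text to `find_job_id`, and handing the result to `export_command` yields the argument list for the export call shown above.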

## Built With

* [Ubuntu 18.04.3 LTS](https://ubuntu.com/) - The operating system I use

* [Sublime Text 3](https://www.sublimetext.com/) - The code editor I use

* [Python 3.6.8](https://www.python.org/downloads/) - The programming language

* [pip 9.0.1](https://pip.pypa.io/) - The package manager of the programming language

* [Python venv 3.6.7-1](https://docs.python.org/3/library/venv.html) - The virtual environment tool of the programming language

* [Scrapy 1.7.3](https://scrapy.org/) - The crawling framework

* [SQLAlchemy 1.3.7](https://www.sqlalchemy.org) - The database interface library

* [Pyexcel 0.5.15](https://github.com/pyexcel/pyexcel/) - For exporting to spreadsheet formats

* [Pyexcel-ods 0.5.6](https://github.com/pyexcel/pyexcel-ods) - For exporting as ODS spreadsheet

* [Pyexcel-xls 0.5.8](https://github.com/pyexcel/pyexcel-xls) - For exporting as XLS spreadsheet

* [Pyexcel-xlsxw 0.4.2](https://github.com/pyexcel/pyexcel-xlsxw) - For exporting as XLSX spreadsheet

## Contributing

Please open an issue if you want to help or have questions.

### Roadmap

Things I already plan to implement but haven't done yet:

- [ ] Change the database schema to be specific to each crawler, and make the exporter export the corresponding table.

## Versioning

We use [SemVer](http://semver.org/) for versioning. For the versions available, see the tags on this repository.

## Authors

* **Max Fuxjäger** - *Initial work* - [MaxValue](https://gitlab.com/MaxValue)

## License

This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.

### Project History

This project was created because I (Max) was asked to crawl this website.
