https://github.com/redjax/ohio_utility_scraper
Find rates for Ohio gas & electric utilities. Pulls from https://energychoice.ohio.gov/ApplesToApplesComparision.aspx for utility prices.
- Host: GitHub
- URL: https://github.com/redjax/ohio_utility_scraper
- Owner: redjax
- Created: 2023-05-01T05:19:08.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-25T05:40:27.000Z (about 2 years ago)
- Last Synced: 2025-01-17T14:42:59.878Z (over 1 year ago)
- Topics: pdm, python, python3, ruff, scraper, scrapy, sqlalchemy
- Language: Python
- Homepage:
- Size: 525 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 10
Metadata Files:
- Readme: README.md
README
# Ohio Energy Provider Comparison
Managed with [PDM](https://pdm.fming.dev).
Scrapes [Energy Choice Ohio](https://energychoice.ohio.gov/ApplesToApplesComparision.aspx)'s provider comparison tables.
- [ELECTRIC](https://energychoice.ohio.gov/ApplesToApplesComparision.aspx?Category=Electric&TerritoryId=6&RateCode=1)
- [GAS](https://energychoice.ohio.gov/ApplesToApplesComparision.aspx?Category=NaturalGas&TerritoryId=8&RateCode=1)
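The two links above differ only in their query parameters (`Category`, `TerritoryId`, `RateCode`). As a minimal sketch of how such a URL could be assembled, the snippet below rebuilds both links with the standard library. `BASE_URL` and `build_comparison_url` are hypothetical names used for illustration and are not taken from this repository; the README does not say what other `TerritoryId` or `RateCode` values mean, so only the values from the links above are used.

```python
from urllib.parse import urlencode

# Base URL copied from the links above (the "Comparision" spelling is the site's own).
BASE_URL = "https://energychoice.ohio.gov/ApplesToApplesComparision.aspx"


def build_comparison_url(category: str, territory_id: int, rate_code: int = 1) -> str:
    """Return a comparison-table URL for one category and territory.

    Hypothetical helper for illustration; not part of the scraper itself.
    """
    query = urlencode(
        {"Category": category, "TerritoryId": territory_id, "RateCode": rate_code}
    )
    return f"{BASE_URL}?{query}"


# Values taken from the ELECTRIC and GAS links above.
electric_url = build_comparison_url("Electric", territory_id=6)
gas_url = build_comparison_url("NaturalGas", territory_id=8)

print(electric_url)
print(gas_url)
```

A spider could take URLs built this way as its `start_urls`, though how the project's own spiders define theirs is not shown here.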
## Usage
### With PDM
- Set up environment
  - `$ pdm install`
- Run start script
  - `$ pdm start`
### Without PDM
- Create `venv`
  - `$ virtualenv .venv`
- Activate `venv`
  - Linux: `$ . .venv/bin/activate`
- Install requirements
  - `$ pip install -r requirements.txt`
- `cd` to app directory
  - `$ cd ohioenergy`
- Run crawler(s)
  - `$ python main.py`
## Notes
### Run Scrapy spiders from a Python script
#### Scrapy's CrawlerRunner, for running multiple crawlers
`CrawlerRunner` uses `twisted` for async crawls. The script has to start and stop the Twisted reactor itself.
Example single crawler, using the `ohioenergy.spiders.ohioenergyproviders.OhioenergyprovidersSpider` spider:
```
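from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

from ohioenergy.spiders.ohioenergyproviders import OhioenergyprovidersSpider

## Example log format (an assumption; `default_fmt` was not defined in the original snippet)
default_fmt = "%(levelname)s: %(message)s"

if __name__ == "__main__":
    configure_logging({"LOG_FORMAT": default_fmt})
    settings = get_project_settings()
    runner = CrawlerRunner(settings=settings)

    ## Schedule the crawl; crawl() returns a Twisted Deferred
    electric_providers = runner.crawl(OhioenergyprovidersSpider)
    ## Stop the reactor when the crawl finishes, whether it succeeds or fails
    electric_providers.addBoth(lambda _: reactor.stop())

    ## Run the reactor; blocks until reactor.stop() is called
    reactor.run()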
```
Example multiple crawlers, using hypothetical `Spider1` and `Spider2`:
```
import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings


class Spider1(scrapy.Spider):
    ...


class Spider2(scrapy.Spider):
    ...


if __name__ == "__main__":
    configure_logging()
    settings = get_project_settings()
    runner = CrawlerRunner(settings)

    ## Add spiders to runner
    runner.crawl(Spider1)
    runner.crawl(Spider2)

    ## Join crawlers; the returned Deferred fires once every scheduled crawl has finished
    crawl = runner.join()

    ## Stop Twisted's reactor when all crawls finish
    crawl.addBoth(lambda _: reactor.stop())

    ## Run crawlers
    reactor.run()
```
#### Scrapy's CrawlerProcess
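Use `scrapy.crawler.CrawlerProcess` to run spiders from a script. Unlike `CrawlerRunner`, it starts and stops the Twisted reactor itself, so no `reactor.stop()` callback is needed. Make sure to import the spiders into the script.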
Example using the `ohioenergy.spiders.ohioenergyproviders.OhioenergyprovidersSpider` spider:
```
## main.py
import scrapy

## Import CrawlerProcess
from scrapy.crawler import CrawlerProcess
## Import scrapy project's settings
from scrapy.utils.project import get_project_settings

## Import OhioenergyprovidersSpider
from ohioenergy.spiders.ohioenergyproviders import OhioenergyprovidersSpider

if __name__ == "__main__":
    ## Create CrawlerProcess object. Initialize with Scrapy project's settings
    process = CrawlerProcess(get_project_settings())

    ## Prepare crawl
    process.crawl(OhioenergyprovidersSpider)

    ## Start crawl
    process.start()
```