https://github.com/indaco/predix-catalog-scraper
Scrape the Predix.io Catalog and generate an excel file listing all the services available on it.
- Host: GitHub
- URL: https://github.com/indaco/predix-catalog-scraper
- Owner: indaco
- License: gpl-3.0
- Created: 2016-11-04T19:15:58.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-11-15T09:44:09.000Z (about 7 years ago)
- Last Synced: 2024-11-11T04:54:53.490Z (2 months ago)
- Topics: excel, predix, predix-catalog, python, script, utility
- Language: Python
- Homepage:
- Size: 881 KB
- Stars: 1
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# predix-catalog-scraper
GE Digital's Predix Catalog is a fast-growing list of microservices delivered on top of the Predix operating system for the Industrial Internet of Things (IIoT).
This script is something like an **Export to Excel** for the Predix Catalog: it scrapes the Predix.io Catalog and generates an Excel file listing all the services available there.
For each tile in the Predix.io Catalog, it collects the following information:
- Service Name
- Service Category (e.g. Edge Software and Services, Security, Data Management, ...)
- Service Status (Available, Beta or Soon)
- Vendor Name
- Short Description
- Long Description
- Link to the service specific web page
- Publishing Date

## Screenshots
Below are a couple of screenshots from the generated Excel file, showing part of the "services" and "analytics" sheets:

**Services**:
![Services Screenshot](/pictures/1_services.png)
**Analytics**:
![Analytics Screenshot](/pictures/2_analytics.png)
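To make the per-tile fields listed above more concrete, here is a minimal sketch of turning tile markup into one record per service. It is not the script's actual code: the real script relies on Selenium/PhantomJS to render the page and BeautifulSoup to parse it, while this sketch swaps in the standard library's `html.parser` so it runs without extra dependencies. The `CatalogTile` and `TileParser` names, the CSS classes (`tile`, `tile-name`, ...), and the sample markup are all hypothetical, since the real Predix.io markup is not documented here. It covers only the fields visible on a tile, needs Python 3.7+, and takes the first link inside a tile as the service page URL.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class CatalogTile:
    """One catalog tile (hypothetical subset of the fields listed above)."""
    name: str = ""
    category: str = ""
    status: str = ""
    vendor: str = ""
    short_description: str = ""
    link: str = ""


# Hypothetical CSS classes; the real Predix.io markup is not documented here.
CLASS_TO_FIELD = {
    "tile-name": "name",
    "tile-category": "category",
    "tile-status": "status",
    "tile-vendor": "vendor",
    "tile-description": "short_description",
}


class TileParser(HTMLParser):
    """Builds one CatalogTile per element carrying the (assumed) "tile" class."""

    def __init__(self):
        super().__init__()
        self.tiles = []
        self._field = None  # CatalogTile attribute that receives the next text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "tile" in classes:
            self.tiles.append(CatalogTile())
        if not self.tiles:
            return  # ignore markup outside any tile
        current = self.tiles[-1]
        # The first link inside a tile is taken as the service page URL.
        if tag == "a" and attrs.get("href") and not current.link:
            current.link = attrs["href"]
        for cls in classes:
            if cls in CLASS_TO_FIELD:
                self._field = CLASS_TO_FIELD[cls]

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self.tiles and self._field and text:
            setattr(self.tiles[-1], self._field, text)


# Sample markup with made-up content, for illustration only.
SAMPLE_HTML = """
<div class="tile">
  <a href="/services/example"><span class="tile-name">Example Service</span></a>
  <span class="tile-category">Data Management</span>
  <span class="tile-status">Beta</span>
  <span class="tile-vendor">Example Vendor</span>
  <p class="tile-description">An example short description.</p>
</div>
"""

parser = TileParser()
parser.feed(SAMPLE_HTML)
print(parser.tiles[0].name, parser.tiles[0].status)  # Example Service Beta
```

Each `CatalogTile` then maps naturally to one row of a sheet like the "services" one shown in the screenshots.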
## Dependencies
Make sure to install the required dependencies at both the OS and Python level.
#### OS
- libxml2 (Visit the [official web site](http://www.xmlsoft.org/downloads.html) to download the latest compatible version for your OS)
- PhantomJS (Visit the [official web site](http://phantomjs.org/) for the installation guide)

#### Python
- BeautifulSoup
- LXML
  - Windows users can download the binary for their installed Python version directly from [here](https://pypi.python.org/pypi/lxml/3.6.4)
- Selenium
- XlsxWriter

Install the Python libraries using _pip_ (_`sudo` may be required depending on your OS_):
`$ pip install lxml beautifulsoup4 selenium XlsxWriter`
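Before running the script, it can help to confirm that everything above is in place. The following is a small sketch, not part of the repository, assuming Python 3.4+; `missing_dependencies` is a hypothetical helper. It looks up the import names of the pip packages (`beautifulsoup4` installs as `bs4`, `XlsxWriter` as `xlsxwriter`) and the `phantomjs` executable on `PATH`. libxml2 is a C library and is not checked directly, and `find_spec` only locates modules without importing them, so a broken native build could still fail at import time.

```python
import importlib.util
import shutil

# Import names of the pip packages listed above.
PYTHON_MODULES = ["lxml", "bs4", "selenium", "xlsxwriter"]
# Executables expected on PATH.
EXECUTABLES = ["phantomjs"]


def missing_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return the names of modules and executables that cannot be found."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```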
## How to use it?
This script has been developed and tested with Python 2.7.11 and 3.5.2 on Linux (Ubuntu) and Mac OS X 10.x. Windows users should be able to run it once all the dependencies are installed on their local machine.
> **Note for GE internal users:** MyApps Anywhere blocks HTTP traffic from unrecognized browsers such as PhantomJS. To use this script, temporarily disable MyApps Anywhere, run the script, and then re-enable it. I'll try to fix this as soon as possible.
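Python 3 uses the default `master` branch, while Python 2.7 needs the `python2.7` branch. As a small illustrative sketch (the `checkout_branch` helper is hypothetical and not part of the repository), the branch to check out can be derived from the running interpreter. The code also runs under Python 2.7.

```python
import sys


def checkout_branch(version_info=sys.version_info):
    """Return the repository branch matching the interpreter's major version."""
    # Python 2.x uses the dedicated "python2.7" branch; Python 3 uses "master".
    return "python2.7" if version_info[0] == 2 else "master"


print(checkout_branch())
```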
##### Python 3.5
```
$ git clone https://github.com/indaco/predix-catalog-scraper
$ cd predix-catalog-scraper
$ python main.py
```

##### Python 2.7
```
$ git clone https://github.com/indaco/predix-catalog-scraper
$ cd predix-catalog-scraper
$ git checkout python2.7
$ python main.py
```

The script writes its output to `output/predix-catalog.xlsx`.
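To quickly check which sheets ended up in the generated file without installing a spreadsheet library, note that an `.xlsx` file is a zip archive whose `xl/workbook.xml` part lists the worksheets. The sketch below relies only on that file format and the standard library. The `sheet_names` helper is hypothetical and not part of the repository.

```python
import xml.etree.ElementTree as ET
import zipfile

# SpreadsheetML main namespace used by xl/workbook.xml.
NS = {"s": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(xlsx_path):
    """Return worksheet names from an .xlsx path (or file-like object)."""
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # <workbook><sheets><sheet name="..."/>...</sheets></workbook>
    return [sheet.get("name") for sheet in root.findall("s:sheets/s:sheet", NS)]
```

For example, `sheet_names("output/predix-catalog.xlsx")` should list the sheets shown in the screenshots, among them "services" and "analytics".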
- - -
#### DISCLAIMER
This is **not** an official project of the [GE Digital's Predix Team](https://github.com/predixdev).