https://github.com/figurehook/figure_parser
A web parser focused on parsing product information from Japanese ACG figure sites.
- Host: GitHub
- URL: https://github.com/figurehook/figure_parser
- Owner: FigureHook
- Created: 2021-07-02T21:59:47.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2024-01-14T14:37:24.000Z (over 2 years ago)
- Last Synced: 2025-09-27T23:16:33.603Z (8 months ago)
- Topics: hacktoberfest, hacktoberfest2022, parser, webparser
- Language: Python
- Homepage:
- Size: 1.07 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Support: SUPPORT_SITES.md
# Figure Parser
A web parser focused on parsing product information from Japanese ACG figure sites.
[Supported site list](SUPPORT_SITES.md)
## Install
```bash
pip install figure_parser
```
## Usage
```py
from pprint import pprint

import requests as rq
from bs4 import BeautifulSoup
from figure_parser.exceptions import FigureParserException
from figure_parser.factories import GeneralBs4ProductFactory

# Create the general-purpose product factory.
factory = GeneralBs4ProductFactory.create_factory()

url = "https://www.goodsmile.info/ja/product/11246/PA+15+%E9%AB%98%E6%A0%A1%E8%83%B8%E3%82%AD%E3%83%A5%E3%83%B3%E7%89%A9%E8%AA%9E.html"
resp = rq.get(url)

try:
    # Parse the fetched page and print the resulting product as a dict.
    product = factory.create_product(resp.url, BeautifulSoup(resp.content, 'lxml'))
    pprint(product.dict())
except FigureParserException as e:
    print(e)
```
```py
{'adult': False,
'category': 'フィギュア',
'copyright': '© SUNBORN Network Technology Co., Ltd. © SUNBORN Japan Co., '
'Ltd.',
'distributer': 'グッドスマイルカンパニー',
'images': ['https://images.goodsmile.info/cgm/images/product/20210521/11246/85011/large/346a0402da0a835b6969105e77c7bf7f.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85012/large/e1fb5ad64d58498477611082c7219759.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85013/large/cad59d379e0ac60b8d386eee93253502.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85014/large/4e4957b4783cc9b8cc6e6101aaf346b3.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85015/large/9bf879603be71259f2d673a84d1b3b2a.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85016/large/f464915a47d744441a0574e97016e8d0.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85017/large/f8ae4c2ebfb05d3b3c2c9a427d9dd9af.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85018/large/d03ebd90e1fd832e5909deba3c78432c.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85019/large/1a4421435c14c53857d5125c0f3da4aa.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85020/large/eede36da01b9ab86ba35a3e5f30a8394.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85021/large/8afccdd56243497830857ec612374266.jpg'],
'jan': None,
'manufacturer': 'Phat!',
'name': 'PA-15 高校胸キュン物語',
'og_image': 'http://images.goodsmile.info/cgm/images/product/20210521/11246/85023/medium/b1a1a49e9bb72ebd95670ca757e22735.jpg',
'order_period': {'end': datetime.datetime(2021, 7, 7, 21, 0),
'start': datetime.datetime(2021, 5, 27, 12, 0)},
'paintworks': ['緋色 (scarlet)'],
'releaser': 'ファット・カンパニー',
'releases': [{'announced_at': None,
'price': 19800,
'release_date': datetime.date(2022, 12, 1),
'tax_including': True}],
'rerelease': False,
'scale': 7,
'sculptors': ['Phat!'],
'series': 'ドールズフロントライン',
'size': 280,
'thumbnail': 'http://images.goodsmile.info/cgm/images/product/20210521/11246/85023/medium/b1a1a49e9bb72ebd95670ca757e22735.jpg',
'url': 'https://www.goodsmile.info/ja/product/11246/PA+15+%E9%AB%98%E6%A0%A1%E8%83%B8%E3%82%AD%E3%83%A5%E3%83%B3%E7%89%A9%E8%AA%9E.html'}
```
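The dictionary above contains `datetime` and `date` objects, which the standard `json` module cannot serialize by default. The following is a minimal sketch of one way to store such a result as JSON using only the standard library. The `product` dictionary is an abbreviated, hand-copied subset of the output above, and `encode_dates` is a hypothetical helper that is not part of `figure_parser`; the library may offer its own serialization, which this sketch does not rely on.

```python
import datetime
import json

# Abbreviated copy of the dictionary printed above (only a few keys kept).
product = {
    "name": "PA-15 高校胸キュン物語",
    "manufacturer": "Phat!",
    "order_period": {
        "start": datetime.datetime(2021, 5, 27, 12, 0),
        "end": datetime.datetime(2021, 7, 7, 21, 0),
    },
    "releases": [
        {
            "announced_at": None,
            "price": 19800,
            "release_date": datetime.date(2022, 12, 1),
            "tax_including": True,
        }
    ],
}


def encode_dates(value):
    """Hypothetical fallback for json.dumps: emit dates as ISO 8601 strings."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# ensure_ascii=False keeps the Japanese text readable in the output.
serialized = json.dumps(product, default=encode_dates, ensure_ascii=False, indent=2)
print(serialized)
```

Dates come back as plain strings when the JSON is loaded again, so a consumer would need to parse them (for example with `datetime.datetime.fromisoformat`) if it needs date objects.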
## Development
This project uses [Poetry](https://python-poetry.org/) as its package manager.
Install dependencies
```sh
poetry install
```
Activate the virtual environment
```sh
poetry shell
```
Install pre-commit
```sh
pre-commit install
```
Generate a new parser (the name should be in snake case)
```sh
python cli.py generate new_site
```
After generating the new site, the test data can be found [here](https://github.com/FigureHook/figure_parser/tree/main/tests/test_parsers/product_case).
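Since the generator expects a snake-case name, it can help to check a candidate name before running the command. The sketch below shows one common reading of snake case (lowercase letters and digits, with words separated by single underscores). `is_snake_case` is a hypothetical helper for illustration and is not part of `cli.py`, which may apply different rules.

```python
import re

# Lowercase words made of letters/digits, joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if the whole name matches the snake-case pattern above."""
    return SNAKE_CASE.fullmatch(name) is not None


for candidate in ("new_site", "NewSite", "new-site"):
    print(candidate, is_snake_case(candidate))
```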
Run the tests and the coverage report
```sh
tox
coverage combine
coverage report -m
```
Type check
```sh
mypy
```
Lint the code
```sh
isort -e .
black .
```
If you add or update dependencies, re-export `requirements.txt`
```sh
poetry export --without-hashes --dev -f requirements.txt --output requirements.txt
```
If you use the `Makefile`, it provides several useful commands.
```
clean-test-cache     Clean cache of test.
cov-report           Show the coverage of tests.
format               Format the code.
freeze               Export the requirements.txt file.
help                 Show this help message.
install              Install requirements of project.
lint                 Lint the code.
test                 Run the tests.
type-check           Type check with mypy.
```