https://github.com/nightmachinery/price_detector_fa
extracts product/price/amount tuples from Persian text using rule-based methods
nlp persian price rule-based-nlp
- Host: GitHub
- URL: https://github.com/nightmachinery/price_detector_fa
- Owner: NightMachinery
- Created: 2022-11-21T05:06:05.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2023-12-15T11:41:44.000Z (about 1 year ago)
- Last Synced: 2024-11-08T11:12:25.954Z (3 months ago)
- Topics: nlp, persian, price, rule-based-nlp
- Language: Jupyter Notebook
- Homepage:
- Size: 167 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: readme.org
README
#+TITLE: price_detector_fa
=price_detector_fa= extracts product/price/amount tuples from Persian text using rule-based methods.
* Contributors
- Feraidoon Mehri
- Fahime Hosseini
- Soroush Vafaie Tabar

* Installation
This library does not work on Windows.

1. Run the following in this project's directory:
   #+begin_example zsh
   pip install -e .
   bash install.sh
   #+end_example
2. Install =graphviz= using your OS package manager.
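
The two constraints above (no Windows support, =graphviz= required) can be checked before running the library. The following is a minimal sketch, not part of =price_detector_fa=: the function name =check_environment= is hypothetical, and it assumes that a working =graphviz= installation puts the =dot= executable on =PATH=.

```python
import shutil
import sys


def check_environment():
    """Hypothetical pre-flight check; returns a list of problems found."""
    problems = []
    # The README states that the library does not work on Windows.
    if sys.platform.startswith("win"):
        problems.append("Windows is not supported")
    # Assumption: graphviz is usable when its `dot` executable is on PATH.
    if shutil.which("dot") is None:
        problems.append("graphviz `dot` executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

An empty list means neither problem was detected; it does not confirm that =install.sh= itself succeeded.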
* Usage
#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
from price_detector_fa.samples import *
from price_detector_fa.utils import *
from price_detector_fa.extractors import *
from price_detector_fa.preprocessing import *
from price_detector_fa.hardcoded import *


def matching_extract(sample):
    output = []
    for s in sentence_tokenizer.tokenize(sample):
        s_tokens, s_spans = preprocess(s)

        s_parsed = parser.parse(s_tokens)
        s_spans = find_spans(s_parsed, s_spans)

        matchings = all_extract(s_parsed)
        output = output + list(
            matching_show(matching, s_spans) for matching in matchings
        )
    return output


import pprint
pp = pprint.PrettyPrinter(indent=2)
#+end_src

#+RESULTS:

#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
pp.pprint(matching_extract("عباسآقا ده فروند شتر را به بهای پنجاه قران خریداری نموده و و خوشال شدند"))
#+end_src

#+RESULTS:
: [ { 'price_amount': ['مقدار: پنجاه'],
:     'price_unit': ['مقدار: قران'],
:     'product_amount': ['مقدار: ده'],
:     'product_name': 'مقدار: شتر',
:     'product_name_span': (18, 21),
:     'product_unit': ['مقدار: فروند']}]

#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
pp.pprint(matching_extract("با سه هزار تومان میشود یک عدد بادکنک خرید."))
#+end_src

#+RESULTS:
: [ { 'price_amount': ['مقدار: سه هزار'],
:     'price_unit': ['مقدار: تومان'],
:     'product_amount': ['مقدار: یک'],
:     'product_name': 'مقدار: بادکنک خرید .',
:     'product_name_span': (30, 42),
:     'product_unit': ['مقدار: عدد']}]

#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
print(sample_16_2)
pp.pprint(matching_extract(sample_16_2))
#+end_src

#+RESULTS:
: قیمت هندوانه ارزان شد و قیمت هر گرم طلا هزار تومان است
: [ { 'price_amount': ['مقدار: هزار'],
:     'price_unit': ['مقدار: تومان'],
:     'product_amount': ['مقدار: یک'],
:     'product_name': 'مقدار: طلا',
:     'product_name_span': (37, 40),
:     'product_unit': ['مقدار: گرم']}]
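
In the results printed above, every value carries a =مقدار: = label prefix, and the amount and unit fields are lists. The sketch below shows one possible way to flatten such a result into plain strings for downstream use. It assumes the output shape is exactly as printed in this README; =strip_label= and =flatten= are hypothetical helpers and not part of =price_detector_fa=. The data is copied from the =sample_16_2= result so that the block runs without the library installed.

```python
# Label prefix observed on every value in the printed results.
LABEL = "مقدار: "

# Result copied verbatim from the sample_16_2 example in this README.
results = [
    {
        "price_amount": ["مقدار: هزار"],
        "price_unit": ["مقدار: تومان"],
        "product_amount": ["مقدار: یک"],
        "product_name": "مقدار: طلا",
        "product_name_span": (37, 40),
        "product_unit": ["مقدار: گرم"],
    }
]


def strip_label(value):
    """Hypothetical helper: drop the leading label from one string value."""
    return value[len(LABEL):] if value.startswith(LABEL) else value


def flatten(match):
    """Hypothetical helper: convert one match dict into plain values."""
    flat = {}
    for key, value in match.items():
        if isinstance(value, list):
            # Join list entries into a single space-separated string.
            flat[key] = " ".join(strip_label(v) for v in value)
        elif isinstance(value, str):
            flat[key] = strip_label(value)
        else:
            # Non-string values such as the span tuple pass through unchanged.
            flat[key] = value
    return flat


flat_results = [flatten(m) for m in results]
```

With the real library, =results= would be replaced by the return value of =matching_extract(...)=, provided the output keeps the shape shown here.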