Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/nightmachinery/price_detector_fa

extracts product/price/amount tuples from Persian text using rule-based methods
https://github.com/nightmachinery/price_detector_fa

nlp persian price rule-based-nlp

Last synced: 25 days ago
JSON representation

extracts product/price/amount tuples from Persian text using rule-based methods

Awesome Lists containing this project

README

        

#+TITLE: price_detector_fa

=price_detector_fa= extracts product/price/amount tuples from Persian text using rule-based methods.

* Contributers
- Feraidoon Mehri
- Fahime Hosseini
- Soroush Vafaie Tabar

* Installation
This library does not work on Windows.

1. Run the following in this project's directory:
#+begin_example zsh
pip install -e .
bash install.sh
#+end_example

2. Install =graphviz= using your OS package manager.

* Usage
#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
from price_detector_fa.samples import *
from price_detector_fa.utils import *
from price_detector_fa.extractors import *
from price_detector_fa.preprocessing import *
from price_detector_fa.hardcoded import *

def matching_extract(sample):
output = []
for s in sentence_tokenizer.tokenize(sample):
s_tokens, s_spans = preprocess(s)

s_parsed = parser.parse(s_tokens)
s_spans = find_spans(s_parsed, s_spans)

matchings = all_extract(s_parsed)
output = output + list(
matching_show(matching, s_spans) for matching in matchings
)
return output

import pprint
pp = pprint.PrettyPrinter(indent=2)
#+end_src

#+RESULTS:

#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
pp.pprint(matching_extract("عباس‌آقا ده فروند شتر را به بهای پنجاه قران خریداری نموده و و خوشال شدند"))
#+end_src

#+RESULTS:
: [ { 'price_amount': ['مقدار: پنجاه'],
: 'price_unit': ['مقدار: قران'],
: 'product_amount': ['مقدار: ده'],
: 'product_name': 'مقدار: شتر',
: 'product_name_span': (18, 21),
: 'product_unit': ['مقدار: فروند']}]

#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
pp.pprint(matching_extract("با سه هزار تومان میشود یک عدد بادکنک خرید."))
#+end_src

#+RESULTS:
: [ { 'price_amount': ['مقدار: سه هزار'],
: 'price_unit': ['مقدار: تومان'],
: 'product_amount': ['مقدار: یک'],
: 'product_name': 'مقدار: بادکنک خرید .',
: 'product_name_span': (30, 42),
: 'product_unit': ['مقدار: عدد']}]

#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
print(sample_16_2)
pp.pprint(matching_extract(sample_16_2))
#+end_src

#+RESULTS:
: قیمت هندوانه ارزان شد و قیمت هر گرم طلا هزار تومان است
: [ { 'price_amount': ['مقدار: هزار'],
: 'price_unit': ['مقدار: تومان'],
: 'product_amount': ['مقدار: یک'],
: 'product_name': 'مقدار: طلا',
: 'product_name_span': (37, 40),
: 'product_unit': ['مقدار: گرم']}]