https://github.com/nightmachinery/price_detector_fa

extracts product/price/amount tuples from Persian text using rule-based methods

Host: GitHub
URL: https://github.com/nightmachinery/price_detector_fa
Owner: NightMachinery
Created: 2022-11-21T05:06:05.000Z
Default Branch: master
Last Pushed: 2023-12-15T11:41:44.000Z
Last Synced: 2024-12-31T17:48:02.851Z
Topics: nlp, persian, price, rule-based-nlp
Language: Jupyter Notebook
Homepage:
Size: 167 KB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: readme.org

README

#+TITLE: price_detector_fa

=price_detector_fa= extracts product/price/amount tuples from Persian text using rule-based methods.
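The extractor returns one dictionary per match (see the results under Usage). As a rough illustration of the "product/price/amount tuple" idea, the hypothetical =PriceTuple= and =to_tuple= below (not part of the library) flatten one such dictionary. The key names are copied from the sample outputs; everything else is an assumption of this sketch.

#+begin_src python
from typing import NamedTuple


class PriceTuple(NamedTuple):
    """Hypothetical flat view of one matching (not defined by price_detector_fa)."""
    product: str
    product_amount: str
    product_unit: str
    price_amount: str
    price_unit: str


def _join(values) -> str:
    # String fields pass through; list-valued fields (e.g. 'price_amount')
    # are joined with spaces; a missing key becomes an empty string.
    if isinstance(values, str):
        return values
    return " ".join(values or [])


def to_tuple(matching: dict) -> PriceTuple:
    """Flatten one matching dict, using the key names seen in the sample outputs."""
    return PriceTuple(
        product=_join(matching.get("product_name")),
        product_amount=_join(matching.get("product_amount")),
        product_unit=_join(matching.get("product_unit")),
        price_amount=_join(matching.get("price_amount")),
        price_unit=_join(matching.get("price_unit")),
    )
#+end_src

=to_tuple= only changes the shape of the record; any display label in the values (such as =مقدار:=) is left untouched.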

* Contributors

- Feraidoon Mehri

- Fahime Hosseini

- Soroush Vafaie Tabar

* Installation

This library does not work on Windows.

1. Run the following in this project's directory:

#+begin_src zsh
pip install -e .
bash install.sh
#+end_src

2. Install =graphviz= using your OS package manager.
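Before running the steps above, you may want to check the two stated requirements (a non-Windows OS and an installed graphviz). The hypothetical =check_environment= below is not part of the project; it assumes that graphviz is usable when its =dot= executable is on =PATH=.

#+begin_src python
import shutil
import sys


def check_environment() -> list[str]:
    """Return problems that would block installation (empty list if none found).

    Assumption: graphviz is detected via its `dot` executable on PATH;
    the project itself does not ship such a check.
    """
    problems = []
    if sys.platform.startswith("win"):
        problems.append("price_detector_fa does not support Windows")
    if shutil.which("dot") is None:
        problems.append("graphviz `dot` executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
#+end_src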

* Usage

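#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
import pprint

from price_detector_fa.samples import *
from price_detector_fa.utils import *
from price_detector_fa.extractors import *
from price_detector_fa.preprocessing import *
from price_detector_fa.hardcoded import *

# sentence_tokenizer, preprocess, parser, find_spans, all_extract and
# matching_show are all provided by the star imports above.
def matching_extract(sample):
    """Return the product/price/amount matchings found in each sentence of `sample`."""
    output = []
    for s in sentence_tokenizer.tokenize(sample):
        # Preprocess the sentence into tokens and their character spans.
        s_tokens, s_spans = preprocess(s)
        # Parse the tokens, then map the spans onto the parse.
        s_parsed = parser.parse(s_tokens)
        s_spans = find_spans(s_parsed, s_spans)
        # Run the rule-based extractors and format each matching for display.
        matchings = all_extract(s_parsed)
        output.extend(matching_show(matching, s_spans) for matching in matchings)
    return output

pp = pprint.PrettyPrinter(indent=2)
#+end_src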

#+RESULTS:
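Each value in the results below starts with the label =مقدار:= ("value:"), which appears to be added by =matching_show= for display. If you need the bare strings, a post-processing step such as the hypothetical =strip_labels= below can remove it; this is a sketch based only on the printed outputs, not a library function.

#+begin_src python
import re

# Assumption: the display label is "مقدار:" followed by whitespace,
# as seen in the printed results of matching_extract.
_LABEL = re.compile(r"^مقدار:\s*")


def _strip(value: str) -> str:
    return _LABEL.sub("", value)


def strip_labels(matching: dict) -> dict:
    """Return a copy of `matching` with the display label removed from string values."""
    cleaned = {}
    for key, value in matching.items():
        if isinstance(value, str):
            cleaned[key] = _strip(value)
        elif isinstance(value, list):
            cleaned[key] = [_strip(v) if isinstance(v, str) else v for v in value]
        else:
            # Non-string values such as the 'product_name_span' tuple are kept as-is.
            cleaned[key] = value
    return cleaned
#+end_src

For example, =[strip_labels(m) for m in matching_extract(text)]= would yield the matchings without labels.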

#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
# "Abbas Agha bought ten camels for fifty qirans and was happy."
pp.pprint(matching_extract("عباس‌آقا ده فروند شتر را به بهای پنجاه قران خریداری نموده و و خوشال شدند"))

#+end_src

#+RESULTS:

: [ { 'price_amount': ['مقدار:  پنجاه'],
:     'price_unit': ['مقدار:  قران'],
:     'product_amount': ['مقدار:  ده'],
:     'product_name': 'مقدار:  شتر',
:     'product_name_span': (18, 21),
:     'product_unit': ['مقدار:  فروند']}]

#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
# "With three thousand tomans, you can buy one balloon."
pp.pprint(matching_extract("با سه هزار تومان میشود یک عدد بادکنک خرید."))
#+end_src

#+RESULTS:

: [ { 'price_amount': ['مقدار:  سه هزار'],
:     'price_unit': ['مقدار:  تومان'],
:     'product_amount': ['مقدار:  یک'],
:     'product_name': 'مقدار:  بادکنک خرید .',
:     'product_name_span': (30, 42),
:     'product_unit': ['مقدار:  عدد']}]
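The amounts come back as Persian number words, e.g. =سه هزار= ("three thousand") above. Turning them into integers is not something this README documents; the hypothetical =words_to_int= below is a minimal sketch with a small, assumed vocabulary. It expects the bare words without the =مقدار:= label, treats =صد= (hundred), =هزار= (thousand) and =میلیون= (million) as multipliers, and skips the conjunction =و= ("and").

#+begin_src python
# Small, assumed vocabulary; extend as needed.
UNITS = {
    "یک": 1, "دو": 2, "سه": 3, "چهار": 4, "پنج": 5,
    "شش": 6, "هفت": 7, "هشت": 8, "نه": 9, "ده": 10,
    "بیست": 20, "سی": 30, "چهل": 40, "پنجاه": 50,
    "دویست": 200, "سیصد": 300, "پانصد": 500,
}
MULTIPLIERS = {"هزار": 1_000, "میلیون": 1_000_000}


def words_to_int(text: str) -> int:
    """Convert e.g. 'سه هزار و پانصد' to 3500. Raises KeyError on unknown words."""
    total = 0    # value of completed thousand/million groups
    current = 0  # value of the group currently being built
    for word in text.split():
        if word == "و":  # 'and' only joins parts of a number
            continue
        if word == "صد":  # 'hundred' scales the current group ('صد' alone is 100)
            current = (current or 1) * 100
        elif word in MULTIPLIERS:
            total += (current or 1) * MULTIPLIERS[word]
            current = 0
        else:
            current += UNITS[word]
    return total + current
#+end_src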

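#+begin_src jupyter-python :kernel py_310 :session emacs_py_1 :async yes :exports both
# sample_16_2: "The price of watermelon has dropped, and the price of each gram of gold is one thousand tomans."
print(sample_16_2)
pp.pprint(matching_extract(sample_16_2))
#+end_src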

#+RESULTS:

: قیمت هندوانه ارزان شد و قیمت  هر گرم طلا هزار تومان است
: [ { 'price_amount': ['مقدار:  هزار'],
:     'price_unit': ['مقدار:  تومان'],
:     'product_amount': ['مقدار:  یک'],
:     'product_name': 'مقدار:  طلا',
:     'product_name_span': (37, 40),
:     'product_unit': ['مقدار:  گرم']}]
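Judging by the outputs above, =product_name_span= holds end-exclusive character offsets into the sentence: in =sample_16_2= the span =(37, 40)= covers =طلا= ("gold"). Assuming that reading is right (the README does not state it), the surface text of the product in a single-sentence input can be recovered with plain slicing; =product_surface= is a hypothetical helper, not a library function.

#+begin_src python
def product_surface(sentence: str, matching: dict) -> str:
    """Return the slice of `sentence` covered by matching['product_name_span'].

    Assumes the span is a (start, end) pair of end-exclusive character
    offsets into `sentence`, as the sample outputs suggest.
    """
    start, end = matching["product_name_span"]
    return sentence[start:end]


# The text of sample_16_2 as printed above (note the double space after the second قیمت).
sentence = "قیمت هندوانه ارزان شد و قیمت  هر گرم طلا هزار تومان است"
print(product_surface(sentence, {"product_name_span": (37, 40)}))
#+end_src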
