https://github.com/nxgeo/id-svo-extractor

id-svo-extractor: Extract SVO triples from Indonesian text.



Host: GitHub
URL: https://github.com/nxgeo/id-svo-extractor
Owner: nxgeo
License: apache-2.0
Created: 2024-09-27T06:21:37.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-09-27T19:09:17.000Z (almost 2 years ago)
Last Synced: 2025-08-28T06:26:16.221Z (11 months ago)
Topics: artificial-intelligence, indonesian-language, indonesian-linguistics, indonesian-nlp, information-extraction, knowledge-extraction, knowledge-representation, natural-language-processing, nlp, python, rdf-triples, spacy, spacy-stanza, stanza, text-analysis, triple-extraction
Language: Python
Homepage: https://pypi.org/project/id-svo-extractor/
Size: 7.81 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

![(id, svo, extractor)](https://i.imgur.com/FqUBo68.png)

# id-svo-extractor

**id-svo-extractor** is a heuristic tool designed to extract SVO (Subject-Verb-Object) triples from Indonesian text. It uses Stanza's state-of-the-art Indonesian language pipeline for NLP.

## Requirements

To use **id-svo-extractor**, you will need Python v3.10 or higher and the following Python package:

- [spacy-stanza](https://github.com/explosion/spacy-stanza) v1.0.4

You must also download Stanza's Indonesian models for the `tokenize`, `mwt`, `pos`, `lemma`, and `depparse` processors before initializing the pipeline.

## Installation

Install the package directly from PyPI:

```sh
pip install id-svo-extractor
```
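
To confirm that the environment meets the requirements listed above, a quick check along these lines may help. This is only a sketch: `installed_version` and `check_environment` are hypothetical helpers written for this example, not part of **id-svo-extractor**, and the distribution names are assumed to match the names used on PyPI.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


def check_environment():
    """Collect the facts relevant to the requirements (hypothetical helper)."""
    return {
        # The README asks for Python v3.10 or higher.
        "python_ok": sys.version_info >= (3, 10),
        # The README lists spacy-stanza v1.0.4 as the required package.
        "spacy_stanza": installed_version("spacy-stanza"),
        "id_svo_extractor": installed_version("id-svo-extractor"),
    }


print(check_environment())
```

A value of `None` for either package means it is not installed in the active environment.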

## Usage

Here's a basic example to get you started.

```python
from id_svo_extractor import create_pipeline
from id_svo_extractor.utils import collect_svo_triples
from stanza import download

# Download Stanza's Indonesian models for tokenize, mwt, pos, lemma, and depparse processors.
# This step is mandatory before initializing the NLP pipeline.
download("id", processors="tokenize,mwt,pos,lemma,depparse")

# Initialize the NLP pipeline.
nlp = create_pipeline()

doc = nlp("Niko dan Okin mendesain brosur promosi dan mencetak poster iklan.")

for sentence in doc.sents:
    # Extracted triples for each sentence are stored in `svo_triples` custom attribute.
    print(sentence._.svo_triples)
    # Output:
    # [ SVOTriple(s=[Niko], v=[mendesain], o=[brosur, promosi]),
    #   SVOTriple(s=[Okin], v=[mendesain], o=[brosur, promosi]),
    #   SVOTriple(s=[Niko], v=[mencetak], o=[poster, iklan]),
    #   SVOTriple(s=[Okin], v=[mencetak], o=[poster, iklan]) ]

print(collect_svo_triples(doc))
# Output:
# [ ('Niko', 'mendesain', 'brosur promosi'),
#   ('Okin', 'mendesain', 'brosur promosi'),
#   ('Niko', 'mencetak', 'poster iklan'),
#   ('Okin', 'mencetak', 'poster iklan') ]
```
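
Once the triples are collected, they can be post-processed with plain Python. The sketch below assumes that `collect_svo_triples` returns a list of `(subject, verb, object)` string tuples, as the printed output above suggests. To stay runnable without the Stanza models, it uses that output as hard-coded sample data. `group_by_subject` is a hypothetical helper, not part of the package.

```python
from collections import defaultdict

# Sample data copied from the `collect_svo_triples(doc)` output shown in the README.
triples = [
    ("Niko", "mendesain", "brosur promosi"),
    ("Okin", "mendesain", "brosur promosi"),
    ("Niko", "mencetak", "poster iklan"),
    ("Okin", "mencetak", "poster iklan"),
]


def group_by_subject(triples):
    """Map each subject to the (verb, object) pairs it appears with (hypothetical helper)."""
    grouped = defaultdict(list)
    for subject, verb, obj in triples:
        grouped[subject].append((verb, obj))
    return dict(grouped)


by_subject = group_by_subject(triples)
for subject, actions in by_subject.items():
    print(subject, actions)
```

Grouping by subject is just one option. The same loop could key on the verb or the object instead, depending on how the triples are consumed downstream.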

## License

This project is licensed under the Apache License 2.0.
