https://github.com/mhw32/adaware-nlp
A toolkit of random natural language predictive models.
https://github.com/mhw32/adaware-nlp
dependency-parsing knowledge-graph pos-tagging summarization
Last synced: 4 months ago
JSON representation
A toolkit of random natural language predictive models.
- Host: GitHub
- URL: https://github.com/mhw32/adaware-nlp
- Owner: mhw32
- Created: 2016-09-02T18:14:35.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-12-09T01:37:01.000Z (about 9 years ago)
- Last Synced: 2025-04-21T02:17:45.496Z (8 months ago)
- Topics: dependency-parsing, knowledge-graph, pos-tagging, summarization
- Language: Python
- Homepage:
- Size: 10.4 MB
- Stars: 8
- Watchers: 5
- Forks: 2
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Ada
Goal of this app is to extract information and output a summarized version of a given piece of text, image, or video. By doing so, the algorithm should produce a knowledge graph in which the user can ask more questions about the text.
Knowledge graphs should be shared between users into (multiple or single) graphs. The algorithm can then draw on more information to provide the user additional informations.
## vision
With the ability to convert data into a knowledge graph, the summarization tool is only a start. The larger goal is to convert the algorithm into a AI/ML based frontend for the web. Instead of googling things, you just ask questions and receive answers (more "ergonomic") approach.
**Step 1 :** a chrome extension that you can query.
**Step 2 :** a browser that isn't really a browser. It's just a AI that you can chat with to surf the semantic web.
**Step 3:** an operating system basically like the one from the movie "Her".
## technical pieces
- Text analysis
- Tokenizer (Learn keywords)
- Learn the high dimensional space that text lies in
- Convert the features into a knowledge graph (unsupervised clustering possibly)
- Summarization
- App
- Chrome Extension
- Server to store the knowledge graph (try neo4j)
## natural language processing: AdaNLP
A lot of Ada is using deep learning to extract semantic meaning from text data. Like CoreNLP, AdaNLP pulls information like part-of-speech, name-entity, and dependencies. These are stored in a JSON and are useful for processing later in the pipeline like creating a `SemanticGraph`.
```
{
'num_sentences': 0,
'sentence-0': {
'tokens': ['i', 'like', 'dogs'],
'lemmas': ['i', 'like', 'dog'],
'pos_tags': ['LS', 'IN', 'NNS'],
'ner_tags': ['O', 'O', 'O'],
'dep_tags': ...
}
}
```
## semantic graph: AdaGraph
The semantic graph is a list of `GraphNodes` with a dictionary of edges. The edges are stored in a `defaultdict` of `defaultdict` where the inner dictionary stores edge type. For example, an edge type can be `adjacency`, indicating neighbor ondes or `coref` which connect corefering nodes.
```
defaultdict(>,
{'e_0_1': defaultdict(>,
{'adjacency-1': 0.15, 'sentence-0': 0.05}),
'e_0_2': defaultdict(>,
{'adjacency-2': 0.1, 'sentence-0': 0.05}),
'e_0_3': defaultdict(>,
{'adjacency-3': 0.08, 'sentence-0': 0.05}),
'e_0_4': defaultdict(>,
{'dependency-det': 1.0,
'dependency-root': 1.0,
'sentence-0': 0.05}),
}
)
```
#### AdaGraph Example
The semantic graph for the sentence: The quick brown fox jumped over the lazy dog.