https://github.com/samraatz/insightshogun

This was a small assignment project I completed for Textify AI. It scrapes news, uses NER to recognise entities, maps them onto Neo4j, and uses OpenAI's API to find intricate personal/additional relationships, which it also outputs as graphs. The relationships, titles, entities, etc. are then uploaded to a local Postgres server.

knowledge-graph neo4j ner webscraping


# Insight Shogun

![Insight Shogun Banner](https://github.com/samraatz/InsightShogun/blob/main/banner.png)

### Unleashing the Power of NLP and Knowledge Graphs

**Insight Shogun** is a comprehensive tool that combines web scraping, natural language processing (NLP), and knowledge graph creation. Inspired by the strategic mastery of a Shogun, this project aims to draw deep insights from news articles by establishing meaningful connections between entities and topics, and then stores the results in a database.

### Features

- **Web Scraping**: Efficiently extract article titles, publication dates, content, and URLs from various news websites using `requests` and `BeautifulSoup`.
![Web Scraping](https://github.com/samraatz/InsightShogun/blob/main/ws.png)
- **Named Entity Recognition (NER)**: Identify key entities such as people, organizations, and locations within the content using `spaCy`.
![Named Entity Recognition](https://github.com/samraatz/InsightShogun/blob/main/ner.png)
- **Keywords Extraction & LLM Integration**: Use the OpenAI API to extract significant keywords from the articles and to uncover additional relationships between entities, adding another layer of analysis that surfaces deeper connections within the content.
![Keywords Extraction](https://github.com/samraatz/InsightShogun/blob/main/key.png)
- **Knowledge Graph Creation**: Build and visualize a knowledge graph using `Neo4j`, representing relationships between entities and articles.
![Knowledge Graph Creation](https://github.com/samraatz/InsightShogun/blob/main/kg.png)
- **PostgreSQL Database Integration**: Upload extracted data to a PostgreSQL database for persistent storage and further analysis.
![Database Integration](https://github.com/samraatz/InsightShogun/blob/main/db.png)
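
A minimal sketch of the first two features above, scraping and NER, might look like the following. The `Article` record, the `<h1>`/`<time>`/`<p>` selectors, and all helper names are assumptions made for illustration and are not taken from the repository; real news sites usually need site-specific selectors. `requests` and `bs4` are imported inside the functions that use them so the pure helpers can be used without those packages installed.

```python
from dataclasses import dataclass, field

# spaCy labels for people, organizations, and locations in en_core_web_sm.
KEPT_LABELS = {"PERSON", "ORG", "GPE", "LOC"}


@dataclass
class Article:
    """One scraped article. Field names are illustrative, not the project's schema."""
    title: str
    published: str
    content: str
    url: str
    entities: list = field(default_factory=list)  # (text, label) pairs


def parse_article(html: str, url: str) -> Article:
    """Pull title, publication date, and body text out of one HTML page."""
    from bs4 import BeautifulSoup  # third-party, imported lazily

    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")      # assumed location of the title
    time_tag = soup.find("time")   # assumed location of the date
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return Article(
        title=heading.get_text(strip=True) if heading else "",
        published=time_tag.get("datetime", "") if time_tag else "",
        content="\n".join(paragraphs),
        url=url,
    )


def fetch_article(url: str) -> Article:
    """Download one page and parse it."""
    import requests  # third-party, imported lazily

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return parse_article(response.text, url)


def keep_entities(pairs):
    """Deduplicate (text, label) pairs, keeping only the labels in KEPT_LABELS."""
    seen = set()
    kept = []
    for text, label in pairs:
        key = (text.strip(), label)
        if label in KEPT_LABELS and key[0] and key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


def tag_entities(article: Article, nlp) -> Article:
    """Run an already loaded spaCy pipeline over the article content."""
    doc = nlp(article.content)
    article.entities = keep_entities((ent.text, ent.label_) for ent in doc.ents)
    return article
```

With the model from the installation steps loaded via `spacy.load("en_core_web_sm")`, a call such as `tag_entities(fetch_article(url), nlp)` would return an `Article` whose `entities` list is ready for the graph and database steps.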

### How It Works

1. **Scrape Articles**: Extract detailed information from news websites.
![Scrape Articles](https://github.com/samraatz/InsightShogun/blob/main/scrape.png)
2. **Process Content**: Use NLP techniques to identify entities and keywords within the articles.
![Process Content](https://github.com/samraatz/InsightShogun/blob/main/spacy.png)
3. **Build Knowledge Graph**: Create nodes for entities and articles, and establish relationships based on their co-occurrence and shared keywords.
![Build Knowledge Graph](https://github.com/samraatz/InsightShogun/blob/main/kg1.png)
4. **Enhance with LLM**: Use the OpenAI API to surface relationships between entities that co-occurrence and shared keywords alone do not capture.
![Enhance with LLM](https://github.com/samraatz/InsightShogun/blob/main/er.png)
5. **Upload to PostgreSQL**: Store the extracted data in a PostgreSQL database for persistent storage and further analysis.
![Upload to PostgreSQL](https://github.com/samraatz/InsightShogun/blob/main/db_upload.png)
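
The graph-building step (step 3) can be sketched as below. This is an illustration under stated assumptions, not the repository's code: the node labels `Article` and `Entity`, the relationship types `MENTIONS` and `CO_OCCURS_WITH`, the article dictionary shape, and the function names are all hypothetical. It covers co-occurrence only; relationships based on shared keywords would follow the same pattern. Statement building is kept separate from the database call so it can be inspected without a running Neo4j instance.

```python
from itertools import combinations

MERGE_ARTICLE = "MERGE (a:Article {url: $url}) SET a.title = $title"
MERGE_MENTION = (
    "MATCH (a:Article {url: $url}) "
    "MERGE (e:Entity {name: $name}) SET e.label = $label "
    "MERGE (a)-[:MENTIONS]->(e)"
)
MERGE_CO_OCCURRENCE = (
    "MATCH (x:Entity {name: $left}), (y:Entity {name: $right}) "
    "MERGE (x)-[:CO_OCCURS_WITH]->(y)"
)


def co_occurrence_pairs(entity_names):
    """Return each unordered pair of distinct names exactly once.

    Sorting makes the pair canonical, so the stored relationship always
    points from the alphabetically smaller name to the larger one.
    """
    unique = sorted(set(entity_names))
    return list(combinations(unique, 2))


def build_statements(article):
    """Turn one article dict into (query, params) pairs, in execution order."""
    statements = [
        (MERGE_ARTICLE, {"url": article["url"], "title": article["title"]})
    ]
    for name, label in article["entities"]:
        statements.append(
            (MERGE_MENTION, {"url": article["url"], "name": name, "label": label})
        )
    names = [name for name, _ in article["entities"]]
    for left, right in co_occurrence_pairs(names):
        statements.append((MERGE_CO_OCCURRENCE, {"left": left, "right": right}))
    return statements


def write_graph(statements, uri, user, password):
    """Run the statements against Neo4j using the official Python driver."""
    from neo4j import GraphDatabase  # third-party driver, imported lazily

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            for query, params in statements:
                session.run(query, params)
    finally:
        driver.close()
```

Because every statement uses `MERGE`, running the same article through `write_graph` twice should not create duplicate nodes or relationships.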

### Installation

1. Clone the repository:
```bash
git clone https://github.com/samraatz/insightshogun.git
```
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Download the spaCy model:
```bash
python -m spacy download en_core_web_sm
```

### Usage

1. Update the configuration with your Neo4j and OpenAI API credentials.
2. Run the main script:
```bash
python main.py
```
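
The README does not say where the credentials live, so the following is only one possible arrangement: reading them from environment variables. `OPENAI_API_KEY` is the variable the OpenAI Python library reads by default; the `NEO4J_*` names, the `Settings` class, and `load_settings` are hypothetical and do not come from the repository.

```python
import os
from dataclasses import dataclass

# Variable names are assumptions, except OPENAI_API_KEY (OpenAI's default).
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "OPENAI_API_KEY")


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    openai_api_key: str


def load_settings(env=None):
    """Read credentials from a mapping (os.environ by default).

    Raises RuntimeError listing every missing variable, so a misconfigured
    run fails before any scraping or API calls start.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        neo4j_uri=env["NEO4J_URI"],
        neo4j_user=env["NEO4J_USER"],
        neo4j_password=env["NEO4J_PASSWORD"],
        openai_api_key=env["OPENAI_API_KEY"],
    )
```

Keeping secrets in the environment rather than in a tracked file avoids committing them by accident; the resulting `Settings` object can then be passed to whatever code opens the Neo4j and OpenAI connections.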

### Requirements

- Python 3.6+
- BeautifulSoup4
- Requests
- spaCy
- Neo4j
- OpenAI API

### Contributing

Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.

### License

This project is licensed under the MIT License.