Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/samraatz/insightshogun
This was a small assignment project I had to complete for Textify AI. It scrapes news, uses NER to recognise entities, maps them onto Neo4j, and uses OpenAI's API to find intricate personal/additional relationships, which it also outputs as graphs. The relationships, titles, entities, etc. are then uploaded to a local Postgres server.
knowledge-graph neo4j ner webscraping
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/samraatz/insightshogun
- Owner: samraatz
- Created: 2024-07-24T17:34:59.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-07-24T20:33:34.000Z (6 months ago)
- Last Synced: 2024-11-03T08:42:00.421Z (3 months ago)
- Topics: knowledge-graph, neo4j, ner, webscraping
- Language: Python
- Homepage: https://www.youtube.com/watch?v=LSd4CCz1OUY
- Size: 8.01 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Insight Shogun
![Insight Shogun Banner](https://github.com/samraatz/InsightShogun/blob/main/banner.png)
### Unleashing the Power of NLP and Knowledge Graphs
**Insight Shogun** is a comprehensive tool that combines web scraping, natural language processing (NLP), and knowledge graph creation. Inspired by the strategic mastery of a Shogun, this project aims to provide deep insights from news articles by establishing meaningful connections between entities and topics and storing them in a database.
### Features
- **Web Scraping**: Efficiently extract article titles, publication dates, content, and URLs from various news websites using `requests` and `BeautifulSoup`.
![Web Scraping](https://github.com/samraatz/InsightShogun/blob/main/ws.png)
- **Named Entity Recognition (NER)**: Identify key entities such as people, organizations, and locations within the content using `spaCy`.
![Named Entity Recognition](https://github.com/samraatz/InsightShogun/blob/main/ner.png)
- **Keywords Extraction & LLM Integration**: Use the OpenAI API to extract significant keywords from the articles and to uncover additional relationships between entities, revealing deeper connections within the content and adding another layer of analysis.
![Keywords Extraction](https://github.com/samraatz/InsightShogun/blob/main/key.png)
- **Knowledge Graph Creation**: Build and visualize a knowledge graph using `Neo4j`, representing relationships between entities and articles.
![Knowledge Graph Creation](https://github.com/samraatz/InsightShogun/blob/main/kg.png)
- **PostgreSQL Database Integration**: Upload extracted data to a PostgreSQL database for persistent storage and further analysis.
![Database Integration](https://github.com/samraatz/InsightShogun/blob/main/db.png)
### How It Works
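As a rough illustration of the knowledge-graph idea, the sketch below derives co-occurrence edges between entities that appear in the same article, using only the standard library. The `cooccurrence_edges` helper, the dictionary layout, and the sample data are hypothetical and not taken from the project; in the actual pipeline the entities would come from spaCy and the edges would be written to Neo4j rather than held in memory.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(articles):
    """Count how many articles mention each unordered pair of entities.

    `articles` is an iterable of dicts with an "entities" list
    (a hypothetical shape, not the project's schema).
    """
    edges = Counter()
    for article in articles:
        # Deduplicate and sort so ("A", "B") and ("B", "A") share one key.
        unique = sorted(set(article["entities"]))
        for pair in combinations(unique, 2):
            edges[pair] += 1
    return edges


# Made-up sample data standing in for NER output.
articles = [
    {"title": "Acme buys Globex", "entities": ["Acme", "Globex", "Jane Doe"]},
    {"title": "Globex expands", "entities": ["Globex", "Jane Doe", "Globex"]},
]

edges = cooccurrence_edges(articles)
for (left, right), weight in sorted(edges.items()):
    print(f"{left} -- {right} (shared articles: {weight})")
```

Each pair's count could serve as an edge weight when the relationship is stored in a graph database; counting per article rather than per mention keeps one long article from dominating the weights.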
1. **Scrape Articles**: Extract detailed information from news websites.
![Scrape Articles](https://github.com/samraatz/InsightShogun/blob/main/scrape.png)
2. **Process Content**: Use NLP techniques to identify entities and keywords within the articles.
![Process Content](https://github.com/samraatz/InsightShogun/blob/main/spacy.png)
3. **Build Knowledge Graph**: Create nodes for entities and articles, and establish relationships based on their co-occurrence and shared keywords.
![Build Knowledge Graph](https://github.com/samraatz/InsightShogun/blob/main/kg1.png)
4. **Enhance with LLM**: Utilize the power of language models to reveal hidden relationships and deeper insights.
![Enhance with LLM](https://github.com/samraatz/InsightShogun/blob/main/er.png)
5. **Upload to PostgreSQL**: Store the extracted data in a PostgreSQL database for persistent storage and further analysis.
![Upload to PostgreSQL](https://github.com/samraatz/InsightShogun/blob/main/db_upload.png)
### Installation
1. Clone the repository:
```bash
git clone https://github.com/samraatz/insightshogun.git
```
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Download the spaCy model:
```bash
python -m spacy download en_core_web_sm
```
### Usage
1. Update the configuration with your Neo4j and OpenAI API credentials.
2. Run the main script:
```bash
python main.py
```
### Requirements
- Python 3.6+
- BeautifulSoup4
- Requests
- spaCy
- Neo4j
- OpenAI API
### Contributing
Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.
### License
This project is licensed under the MIT License.