Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
OpenAI GPT-4 API for Data Summarization
https://github.com/adesoji1/openai-gpt-4-api-for-data-summarization
- Host: GitHub
- URL: https://github.com/adesoji1/openai-gpt-4-api-for-data-summarization
- Owner: Adesoji1
- Created: 2024-06-11T08:48:13.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-06-11T10:06:00.000Z (5 months ago)
- Last Synced: 2024-06-12T13:05:03.372Z (5 months ago)
- Topics: discord-bot, mongodb, openai-api, pymongo-database, python
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
To set up an automated system that fetches data from MongoDB, processes and summarizes it with the OpenAI API, and posts the summaries to a Discord channel, you will need to follow several steps. The detailed plan and the scripts I wrote are below:
### Steps
1. **Set Up Environment**:
   - Install required libraries.
   - Set up MongoDB.
   - Set up the OpenAI API.
   - Set up the Discord bot.
2. **Fetch Data from MongoDB**:
   - Connect to MongoDB and fetch data every hour.
3. **Data Cleaning**:
   - Remove duplicates, tokenize, normalize text, remove stop words.
4. **Contextual and Semantic Filtering**:
   - Use NLP models to identify important posts.
5. **Generate Summaries Using OpenAI API**:
   - Summarize the filtered posts.
6. **Store Summaries in MongoDB**:
   - Save the generated summaries back to MongoDB.
7. **Post Summaries to Discord Channel**:
   - Use a Discord bot to post the summaries.

### Script
#### 1. Install Required Libraries
```bash
pip install pymongo discord.py openai nltk schedule
```
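The scripts below use placeholder strings such as `'YOUR_OPENAI_API_KEY'` directly in the code. As a minimal sketch of an alternative (my own suggestion with hypothetical variable names, not part of the original scripts), the credentials could instead be read from environment variables:

```python
# Minimal sketch (not from the original scripts): read credentials from environment
# variables instead of hard-coding them. The variable names here are assumptions.
import os

MONGO_URI = os.environ.get("MONGO_URI", "mongodb://localhost:27017/")
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
DISCORD_BOT_TOKEN = os.environ["DISCORD_BOT_TOKEN"]
DISCORD_CHANNEL_ID = int(os.environ["DISCORD_CHANNEL_ID"])  # Discord channel IDs are integers
```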
#### 2. Set Up MongoDB Connection
```python
from datetime import datetime, timedelta
from pymongo import MongoClient

def get_mongo_client():
    client = MongoClient("mongodb://localhost:27017/")
    return client

def fetch_posts(client):
    db = client['social_media']
    collection = db['posts']
    # Fetch only the posts created within the last hour (the job runs hourly).
    one_hour_ago = datetime.utcnow() - timedelta(hours=1)
    posts = list(collection.find({"timestamp": {"$gte": one_hour_ago}}))
    return posts
```
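The fetch and cleaning code assumes each document in the `posts` collection has `text` and `timestamp` fields (plus the `_id` MongoDB adds). A minimal sketch for seeding one hypothetical test post:

```python
# Minimal sketch: insert one made-up post so fetch_posts has something to return.
# The field values are hypothetical; only the field names matter to the scripts below.
from datetime import datetime
from pymongo import MongoClient

MongoClient("mongodb://localhost:27017/")['social_media']['posts'].insert_one({
    "text": "Announcement: a major update ships next week.",
    "timestamp": datetime.utcnow(),
})
```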
#### 3. Data Cleaning
```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

def clean_text(text):
    tokens = word_tokenize(text)
    tokens = [word.lower() for word in tokens if word.isalnum()]
    tokens = [word for word in tokens if word not in stopwords.words('english')]
    return ' '.join(tokens)
```
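The plan's cleaning step also mentions removing duplicates, which `clean_text` alone does not handle. A minimal sketch of one way to drop exact duplicates after cleaning (the helper name `dedupe_posts` is mine, not part of the original scripts):

```python
# Minimal sketch: drop posts whose cleaned text exactly matches an earlier post's text.
# dedupe_posts is a hypothetical helper added for illustration.
def dedupe_posts(posts):
    seen = set()
    unique_posts = []
    for post in posts:
        if post['text'] not in seen:
            seen.add(post['text'])
            unique_posts.append(post)
    return unique_posts
```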
#### 4. Contextual and Semantic Filtering
```python
from transformers import pipeline

def filter_posts(posts):
    nlp_model = pipeline("sentiment-analysis")
    important_posts = []
    for post in posts:
        if any(keyword in post['text'].lower() for keyword in ['update', 'announcement', 'change', 'development']):
            sentiment = nlp_model(post['text'])[0]
            # The default sentiment model only ever returns POSITIVE or NEGATIVE,
            # so this check keeps every keyword match; thresholding on
            # sentiment['score'] would make the filter stricter.
            if sentiment['label'] in ('POSITIVE', 'NEGATIVE'):
                important_posts.append(post)
    return important_posts
```
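A quick, hypothetical check of `filter_posts` on in-memory posts (the example posts are made up); only the first one should pass the keyword-plus-sentiment filter:

```python
# Hypothetical smoke test for filter_posts; the posts are invented for illustration.
sample_posts = [
    {"_id": 1, "text": "big announcement pricing will change next month"},
    {"_id": 2, "text": "good morning everyone have a nice day"},
]
print(filter_posts(sample_posts))  # expected: only the announcement post
```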
#### 5. Generate Summaries Using OpenAI API
```python
from openai import OpenAI

client = OpenAI(api_key='YOUR_OPENAI_API_KEY')

def generate_summary(post_text):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Summarize the following text: {post_text}"}],
        max_tokens=50
    )
    # In the v1 OpenAI Python SDK the message content is an attribute, not a dict key.
    summary = response.choices[0].message.content.strip()
    return summary
```
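Summarizing many posts in one run can hit OpenAI rate limits. A minimal retry wrapper (my own addition, not part of the original script), assuming the v1 SDK's `RateLimitError`:

```python
# Minimal sketch: retry generate_summary when the API responds with a rate-limit error.
# generate_summary_with_retry is an illustrative addition, not part of the original script.
import time
import openai

def generate_summary_with_retry(post_text, retries=3, wait_seconds=10):
    for attempt in range(retries):
        try:
            return generate_summary(post_text)
        except openai.RateLimitError:
            if attempt == retries - 1:
                raise
            time.sleep(wait_seconds)
```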
#### 6. Store Summaries in MongoDB
```python
def store_summaries(client, summaries):
    db = client['social_media']
    collection = db['summaries']
    if summaries:  # insert_many raises an error when given an empty list
        collection.insert_many(summaries)
```
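If the hourly job ever re-processes a post, the summaries collection will accumulate duplicates. A minimal sketch of an upsert-based variant (the function name `store_summaries_upsert` is my own, not part of the original script):

```python
# Minimal sketch: keep at most one summary per post by indexing post_id uniquely
# and upserting. store_summaries_upsert is an illustrative addition.
def store_summaries_upsert(client, summaries):
    collection = client['social_media']['summaries']
    collection.create_index("post_id", unique=True)
    for summary in summaries:
        collection.update_one(
            {"post_id": summary["post_id"]},
            {"$set": {"summary": summary["summary"]}},
            upsert=True,
        )
```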
#### 7. Post Summaries to Discord Channel
```python
import discord

# discord.py 2.x requires explicit intents; the defaults are enough for sending messages.
client = discord.Client(intents=discord.Intents.default())

@client.event
async def on_ready():
    print(f'We have logged in as {client.user}')

async def post_summary(channel_id, summary):
    channel = client.get_channel(channel_id)  # channel_id must be the channel's integer ID
    await channel.send(summary)

client.run('YOUR_DISCORD_BOT_TOKEN')
```

#### 8. Main Script to Orchestrate Everything
```python
import schedule
import time

def job():
    mongo_client = get_mongo_client()
    posts = fetch_posts(mongo_client)
    # Clean each post's text in place so filter_posts still receives the full documents
    # (the later steps need the 'text' and '_id' fields).
    for post in posts:
        post['text'] = clean_text(post['text'])
    important_posts = filter_posts(posts)
    summaries = [{'post_id': post['_id'], 'summary': generate_summary(post['text'])} for post in important_posts]
    store_summaries(mongo_client, summaries)
    for summary in summaries:
        # post_summary is a coroutine and client.run() in step 7 blocks the process,
        # so the posting has to be handed to the bot's event loop; one way to wire
        # this together is sketched after this block.
        post_summary('YOUR_DISCORD_CHANNEL_ID', summary['summary'])

schedule.every().hour.do(job)
while True:
    schedule.run_pending()
    time.sleep(1)
```
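As noted in the comment above, step 7's `client.run(...)` blocks the process and `post_summary` is a coroutine, so the scheduler cannot simply call it. A minimal sketch of one way to wire the bot and the scheduler together (my own arrangement, not from the original README; the names `bot`, `run_scheduler`, and `CHANNEL_ID` are assumptions, and the Discord client is renamed to avoid clashing with the OpenAI `client`):

```python
# Minimal sketch (an assumption, not from the original README): run the Discord bot in the
# main thread and the hourly schedule loop in a background thread, handing each summary to
# the bot's event loop with run_coroutine_threadsafe.
import asyncio
import threading
import time

import discord
import schedule

bot = discord.Client(intents=discord.Intents.default())  # renamed to avoid clashing with the OpenAI client
CHANNEL_ID = 0  # hypothetical placeholder: replace with the channel's integer ID

async def post_summary(channel_id, summary):
    channel = bot.get_channel(channel_id)
    await channel.send(summary)

def job():
    # In the real pipeline this is the step 8 body (fetch, clean, filter, summarize, store);
    # the empty list here just keeps the sketch self-contained.
    summaries = []
    for summary in summaries:
        asyncio.run_coroutine_threadsafe(
            post_summary(CHANNEL_ID, summary['summary']), bot.loop
        )

def run_scheduler():
    schedule.every().hour.do(job)
    while True:
        schedule.run_pending()
        time.sleep(1)

@bot.event
async def on_ready():
    # Start the scheduler once the bot's event loop is running
    # (a real implementation would guard against re-starting it on reconnects).
    threading.Thread(target=run_scheduler, daemon=True).start()

bot.run('YOUR_DISCORD_BOT_TOKEN')
```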
### Detailed Steps Explanation

1. **Set Up Environment**:
   - Install all necessary Python libraries.
   - Set up MongoDB with collections for posts and summaries.
   - Register and set up a Discord bot.
2. **Fetch Data from MongoDB**:
   - Connect to MongoDB and fetch posts from the last hour.
3. **Data Cleaning**:
   - Clean the text data by removing duplicates, tokenizing, normalizing text, and removing stop words.
4. **Contextual and Semantic Filtering**:
   - Use NLP models to keep only posts that contain important keywords and carry a clear sentiment.
5. **Generate Summaries Using OpenAI API**:
   - Summarize the filtered posts using the OpenAI API.
6. **Store Summaries in MongoDB**:
   - Save the generated summaries back into the MongoDB summaries collection.
7. **Post Summaries to Discord Channel**:
   - Use the Discord bot to post the summaries to a specific Discord channel.

By following these steps and using the provided scripts, you can set up an automated system that fetches, processes, summarizes, and posts social media data.