https://github.com/thesurlydev/feed-fetcher
Fetch data and feeds from a variety of sources
https://github.com/thesurlydev/feed-fetcher
Last synced: 9 months ago
JSON representation
Fetch data and feeds from a variety of sources
- Host: GitHub
- URL: https://github.com/thesurlydev/feed-fetcher
- Owner: thesurlydev
- Created: 2023-06-12T17:55:11.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-05T15:59:16.000Z (about 2 years ago)
- Last Synced: 2025-02-06T18:52:28.675Z (over 1 year ago)
- Language: Rust
- Homepage:
- Size: 258 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# feed-fetcher
## Features
* Auto-discovery and import of feeds given a website URL; prefix with `https://`
* Import from [OPML](https://en.wikipedia.org/wiki/OPML) file; prefix with `opml!` followed by path or URL
* Import a single feed from a URL; prefix with `feed!` followed by path or URL
* Import your ebooks and use [ChatGPT](https://openai.com/blog/chatgpt) to ask questions about them; prefix with `ebook!` followed by path or URL
## Roadmap
* Make it trivial to add data sources
* Watch directories and/or hosted files for changes and automatically import?
* Import any source supported by [LangChain](https://python.langchain.com/docs/ecosystem/integrations/)
* Percolator
* Bubble up new answers and associated source to a ChatGPT prompt as new posts come in.
* Follow a person and monitor their posts for new answers to your questions.
* Follow a topic with associated questions and monitor for new answers.
* Return source(s) of answers to questions
* Manage the data used to answer questions
* Source types and labels
* Investment opportunities as they are discovered
## Use cases
* Near real-time financial data and company news for investors
* Personalized assistant using your private data
* Highly targeted news for niche interests
* Personalized audit trail of your online activity
* Personalized search engine
* Monitor infrastructure and suggest improvements and cost savings
* Build niche databases for research and analysis
## Daily interactions
* [RSS](https://en.wikipedia.org/wiki/RSS)
* [Atom](https://en.wikipedia.org/wiki/Atom_(Web_standard))
* [JSON Feed](https://jsonfeed.org/)
* [OPML](https://en.wikipedia.org/wiki/OPML)
* [Google Takeout](https://takeout.google.com/)
* [Google Drive](https://www.google.com/drive/)
* [Google Search](https://www.google.com/)
* [Google Maps](https://www.google.com/maps)
* [Google Calendar](https://calendar.google.com/)
* [Google Trends](https://trends.google.com/trends/)
* [Slack](https://slack.com/)
* [Discord](https://discord.com/)
* [Twitter](https://twitter.com/)
* [Mastodon](https://joinmastodon.org/)
* [Reddit](https://www.reddit.com/)
* [GitHub](https://github.com)
* Books
* Email
* Slack
* News
* Reddit
* GitHub
* YouTube
* LinkedIn
* Wikipedia
## Typical questions
* What is the latest news about $COMPANY?
## Components
* Vector store
* FAISS
* Embeddings
* HuggingFace
* Model
* GPT4all, [privateGPT](https://github.com/imartinez/privateGPT)
---
To self-host a comprehensive GPT system that incorporates various data sources, including personal files, you would need several components. Here's an overview of the essential components:
1. Hardware Infrastructure: You'll require suitable hardware infrastructure to host and run the GPT system. This typically includes servers or powerful machines with sufficient processing power, memory, and storage capacity to handle the workload.
2. GPT Model: You'll need the actual GPT model, which can be obtained by training a model like GPT-3.5 or by using a pre-trained model. OpenAI provides pre-trained GPT models that can be used for a wide range of natural language processing tasks.
3. Data Storage: You'll need a storage system to store the various data sources, including personal files. This can be a local file storage system or a cloud-based storage solution. Ensure that the storage system is reliable, secure, and capable of handling the amount of data you plan to include in your GPT system.
4. Data Preprocessing Pipeline: Depending on the format and structure of your data sources, you may need to preprocess the data to make it compatible with the GPT model. This may involve cleaning, formatting, and converting the data into a suitable representation that the model can understand. Design and implement a preprocessing pipeline to handle these tasks efficiently.
5. Integration Layer: You'll need an integration layer to manage the interaction between your GPT system and the various data sources. This layer should provide the necessary APIs or interfaces for accessing and retrieving data from different sources, including personal files. It may involve setting up protocols, authentication mechanisms, and access controls to ensure secure and controlled access to the data.
6. Application Interface: Design and develop an application interface that allows users to interact with the GPT system. This can be a web-based interface, a command-line interface, or an API that other applications can integrate with. The interface should facilitate input to the GPT model and present the generated output in a user-friendly manner.
7. Security Measures: Implement appropriate security measures to protect the sensitive data in your GPT system. This includes encryption of data at rest and in transit, access controls, user authentication, and regular security audits to identify and address potential vulnerabilities.
8. Monitoring and Maintenance: Set up monitoring tools and processes to keep track of the system's performance, resource utilization, and potential issues. Regular maintenance tasks, such as model updates, data updates, and system upgrades, should be planned and executed to ensure the system's optimal functioning.
Remember, building a comprehensive GPT system with personal file integration can be a complex task, requiring expertise in machine learning, data engineering, and system administration. It's advisable to consult with experts or seek professional assistance to ensure a smooth implementation.