https://github.com/ayushverma135/multiformat-interpreter

MultiFormat Interpreter is a Python script designed to automate content extraction from various file types and generate code based on user prompts using the GPT-4All model.
https://github.com/ayushverma135/multiformat-interpreter

gpt4all interpreter llm models multiformat multiformats python

Last synced: 8 months ago
JSON representation

MultiFormat Interpreter is a Python script designed to automate content extraction from various file types and generate code based on user prompts using the GPT-4All model.

Host: GitHub
URL: https://github.com/ayushverma135/multiformat-interpreter
Owner: Ayushverma135
License: bsd-3-clause
Created: 2024-07-03T21:11:47.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-07-06T20:45:51.000Z (over 1 year ago)
Last Synced: 2025-01-01T01:45:46.135Z (9 months ago)
Topics: gpt4all, interpreter, llm, models, multiformat, multiformats, python
Language: Jupyter Notebook
Homepage:
Size: 12.7 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# MultiFormat-Interpreter
The MultiFormat Interpreter project involves developing a Python script to automate the extraction of content from various file types such as PDF, TXT, CSV, JSON, and XLSX. This script leverages libraries like `PyPDF2` for PDFs, `pandas` for CSV and Excel files, and `json` for JSON files to read and process the content. The user interacts with the script by providing the file path and a prompt for code generation. This uses the GPT-4All model to generate code based on the extracted content and the user’s prompt.

## GPT-4ALL
The GPT-4All model is a variant of the GPT (Generative Pre-trained Transformer) architecture designed to handle a wide range of natural language processing tasks. Developed by Anthropic, GPT-4All aims to be a versatile tool for generating text across various domains, including code generation, text completion, translation, and more. It builds upon advancements in transformer-based models and is tailored to support applications requiring nuanced understanding and generation of human-like text.
![](https://storage.googleapis.com/gweb-research2023-media/original_images/22bf59cc5a1f2256b0c458250f076d58-USP.gif)
Key features of the GPT-4All model are:
1. **Local Operation:** GPT-4All operates locally without needing API calls or GPUs, enhancing accessibility and reducing reliance on cloud services.
2. **Offline Capability:** Once downloaded, GPT-4All works offline, ensuring users can access AI-powered text generation tools even without internet connectivity.
3. **Versatile Applications:** The model supports various tasks like natural language understanding, text generation, and summarization across different domains and languages.
4. **User-Friendly Interface:** Designed for ease of use, GPT-4All offers an intuitive interface for inputting prompts and receiving text outputs efficiently.
5. **Privacy and Security:** Operating locally ensures data privacy and security compliance, crucial for handling sensitive information.
6. **Customization:** Users can customize GPT-4All for specific tasks, enhancing its adaptability to different applications.
7. **Continual Updates:** Anthropic provides regular updates and support, ensuring ongoing improvements for enhanced performance and usability.

Dowmload GPT_4ALL: [click here](https://github.com/nomic-ai/gpt4all?tab=readme-ov-file)

To know about Llama.cpp: [click here](https://github.com/ggerganov/llama.cpp)

## Objectives:
File Content Extraction: The script should be able to read and extract text content from various file formats including PDF, TXT, CSV, JSON, and XLSX.
Code Generation: Using the extracted content and a user-provided prompt, the script should generate code utilizing the GPT-4All model.

## Features:
1. __File Type Support:__

- PDF: Extracts text from PDF documents using the PyPDF2 library.
- TXT: Reads text files directly.
- CSV: Reads and converts CSV file content to a string format using the pandas library.
- JSON: Reads and pretty-prints JSON data.
- XLSX: Reads and converts Excel file content to a string format using the pandas library.

2. __User Interaction:__

- Prompts the user to enter the path to the file they wish to process.
- Displays the content of the selected file.
- Asks the user for a prompt to guide the code generation.

3. __Code Generation:__

- Uses the GPT-4All model to generate code based on the combined input of file content and user prompt.
- Outputs the generated code for the user.

## Workflow:
1. __User Input:__
- The user provides the path to a file.
- The user inputs a prompt for code generation.
2. __Content Reading:__
- The script reads the file content based on its type.
- The content is displayed to the user.
3. __Prompt Handling:__
- The script combines the file content with the user prompt to create a comprehensive input for the GPT model.
4. __Code Generation:__
- The GPT-4All model processes the input and generates relevant code.
- The generated code is displayed to the user.

## Prerequisites:
- Python 3.x
- Required libraries: PyPDF2, pandas, json, gpt4all

## Installation:
- Clone the repository:

git clone https://github.com/Ayushverma135/MultiFormat-Interpreter.git
cd MultiFormat-Interpreter

- Install required Python libraries:

- Ensure that the required libraries (PyPDF2, pandas, gpt4all) are imported at the beginning of the notebook.
- If not already installed, install the libraries directly in a code cell at the beginning of the notebook:

!pip install PyPDF2 pandas gpt4all
## Usage:
- Open and Run the Notebook:
- Open the Google Colab notebook `MultiFormat Interpreter.ipynb ` in your browser.
- Run each cell sequentially to execute the script.

## Contributing
Contributions to enhance or fix issues in the Google Colab notebook are welcome! Please fork the repository and submit pull requests for any improvements.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ayushverma135/multiformat-interpreter

Awesome Lists containing this project

README