Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/adritpal08/universal-web-scraper-using-generative-ai

Effortless Data Extraction, Powered by : Generative AI
https://github.com/adritpal08/universal-web-scraper-using-generative-ai

gemini gemma2-9b generative-ai groq groq-api llama3-8b llama31 llava-phi3 llm mixtral-8x7b python3 scraper streamlit websraping

Last synced: 2 days ago
JSON representation

Effortless Data Extraction, Powered by : Generative AI

Awesome Lists containing this project

README

        

# Smart & Universal Web Scraper: Effortless Data Extraction, Powered by Generative AI 🦑

The Smart & Universal Web Scrapper is an intelligent data extraction tool powered by Generative AI. It simplifies the process of scraping data from any website by allowing users to provide the website link and the required data fields. With its versatile capabilities, this tool can extract data seamlessly and present it in a tabular format, which can be downloaded in various formats such as Excel, JSON, or Markdown. Its smart, user-friendly interface ensures efficient and accurate data extraction for all your web scraping needs.

## How it Works

1. **Launch the Application:** Open the Universal Web Scrapper on your system.
2. **Select an LLM Model:** Choose the desired Large Language Model (LLM) from the available options.
3. **Input the Website Link:** Paste the URL of the website from which you want to scrape data.
4. **Define the Data Fields:** Specify the data fields you want to extract from the website.
5. **Automatic Data Extraction:** The application intelligently scrapes the data and organizes it into a clear, structured table.
6. **Download the Data:** Export the scraped data in your preferred format (Excel, JSON, Markdown).

## It leverages the following technologies:

`Python:` Python is a popular, versatile programming language known for its simplicity and readability. It is widely used for various applications, including web development, data analysis, machine learning, and automation tasks. Python's extensive ecosystem of libraries and frameworks makes it a powerful tool for developers.

`LLaMA 3.1 (70b):` LLaMA (Lean Large-Language Model) is a family of large language models developed by Meta AI. The 3.1 (70b) version refers to a specific model variant with 70 billion parameters. Large language models like LLaMA are trained on vast amounts of text data, allowing them to understand and generate human-like text for various natural language processing tasks.

`Groq API:` Groq API provides access to Groq's powerful AI inference platform. It enables developers to leverage their advanced hardware and software for rapid and efficient AI model execution.

`Streamlit:` Streamlit is an open-source Python library that simplifies the process of building interactive data visualization and machine learning web applications. It allows developers to create user interfaces by writing Python scripts, making it easier to share data-driven applications with others.

## Running the Project
1. **Fork or Clone the Repository::**

Fork or clone this repository to your local machine using Git.

2. **Install Requirements:**

Install the necessary libraries.

```bash
pip install -r requirements.txt

```

3. **Set Up Environment Variables:**

Create a `.env` file in your project directory and add any required API keys (e.g., Google API key, Groq API KEY).

4. **Run the streamlit application file:**

```bash
streamlit run app.py
```

## License :

[![GPLv3 License](https://img.shields.io/badge/License-GPL%20v3-yellow.svg)](https://opensource.org/licenses/) [GNU General Public License v3.0](https://github.com/AdritPal08/universal-web-scraper-using-generative-ai/blob/main/LICENSE)

## Follow Me :

[![linkedin](https://img.shields.io/badge/linkedin-0A66C2?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/adritpal/)

## Authors

- [@Adrit Pal](https://github.com/AdritPal08)

##
- If you like my work and it helped you in anyway then please do ⭐ the repository it will motivate me to make more amazing projects