https://github.com/benitomartin/scraping-to-sql
Open Source Contribution to Justicio Project
beautifulsoup fitz mysql pymupdf python requests
Last synced: 2 days ago
- Host: GitHub
- URL: https://github.com/benitomartin/scraping-to-sql
- Owner: benitomartin
- Created: 2024-07-31T06:08:34.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-09-05T13:02:40.000Z (4 months ago)
- Last Synced: 2024-11-08T10:09:59.365Z (about 2 months ago)
- Topics: beautifulsoup, fitz, mysql, pymupdf, python, requests
- Language: Jupyter Notebook
- Homepage: https://github.com/bukosabino/justicio
- Size: 6.46 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Justicio Web Scraping to SQL
Justicio is a question-answering assistant that answers user questions about the official state gazette of Spain: Boletín Oficial del Estado (BOE).
At the moment the service is free to use: [Website](https://justicio.es/v2/acerca) and [Repository](https://github.com/bukosabino/justicio)
All BOE articles are embedded as vectors and stored in a vector database. When a question is asked, it is embedded in the same latent space and used as a query against the vector database to retrieve the most relevant pieces of text. The retrieved text is then sent to the LLM, which constructs the answer.
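The retrieval step described above can be illustrated with a minimal sketch. It is not Justicio's actual code: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for the vector database, and the sample texts are invented for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, store: list, top_k: int = 1) -> list:
    """Embed the question and return the top_k most similar stored texts."""
    query = embed(question)
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Hypothetical article snippets, embedded once and kept in an in-memory "store".
articles = [
    "noise limits for bars and restaurants at night",
    "parking permits for residents in the city centre",
    "waste collection schedule and recycling rules",
]
store = [(text, embed(text)) for text in articles]

question = "what are the recycling rules"
context = retrieve(question, store)

# The retrieved text would be placed in the prompt sent to the LLM.
prompt = f"Context: {context[0]}\nQuestion: {question}"
print(prompt)
```

In a real deployment the toy `embed` would be replaced by an embedding model and the list by a vector database query, but the flow (embed, search, build the prompt) stays the same.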
## Tech Stack
![Jupyter Notebook](https://img.shields.io/badge/jupyter-%23FA0F00.svg?style=for-the-badge&logo=jupyter&logoColor=white)
![MySQL](https://img.shields.io/badge/mysql-%2300f.svg?style=for-the-badge&logo=mysql&logoColor=white)
![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)
![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white)

## Contributions
Web scraping of the municipal regulations of La Coruña and Oviedo, with the results saved as an SQL dump for later use in the vector database.
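As a rough sketch of the final step, the snippet below turns already-scraped records into a MySQL-style dump. The table name `regulations`, its columns, and the sample row are hypothetical and not taken from the repository; the scraping itself (which the project topics suggest uses requests, BeautifulSoup and PyMuPDF) is left out so that the example runs with the standard library alone.

```python
# Hypothetical column layout; the real schema in the repository may differ.
COLUMNS = ("city", "title", "url", "content")


def sql_quote(value: str) -> str:
    """Quote a string literal for MySQL: escape backslashes, double single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def build_dump(rows: list, table: str = "regulations") -> str:
    """Return a CREATE TABLE statement followed by one INSERT per scraped row."""
    lines = [
        f"CREATE TABLE IF NOT EXISTS `{table}` (",
        "  `id` INT AUTO_INCREMENT PRIMARY KEY,",
        "  `city` VARCHAR(100) NOT NULL,",
        "  `title` VARCHAR(500) NOT NULL,",
        "  `url` VARCHAR(1000) NOT NULL,",
        "  `content` LONGTEXT",
        ");",
    ]
    column_list = ", ".join(f"`{name}`" for name in COLUMNS)
    for row in rows:
        values = ", ".join(sql_quote(row[name]) for name in COLUMNS)
        lines.append(f"INSERT INTO `{table}` ({column_list}) VALUES ({values});")
    return "\n".join(lines) + "\n"


# Invented sample record standing in for one scraped regulation.
scraped = [
    {
        "city": "Oviedo",
        "title": "Ordenanza de ejemplo",
        "url": "https://example.org/ordenanza.pdf",
        "content": "Texto extraído del PDF.",
    }
]
dump = build_dump(scraped)
print(dump)
```

The resulting string can be written to a `.sql` file with UTF-8 encoding and loaded with the `mysql` client. For anything beyond a sketch, parameterized inserts through a database driver are a safer choice than hand-built string literals.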