https://github.com/giog97/find_similar_tables_on_pubtables-1m
Find similar tables on the PubTables-1M dataset
data-analysis data-visualization datamining dm tables
- Host: GitHub
- URL: https://github.com/giog97/find_similar_tables_on_pubtables-1m
- Owner: Giog97
- Created: 2025-04-03T14:12:16.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-04-03T14:48:21.000Z (7 months ago)
- Last Synced: 2025-04-03T15:38:03.022Z (7 months ago)
- Topics: data-analysis, data-visualization, datamining, dm, tables
- Language: Jupyter Notebook
- Homepage:
- Size: 6.72 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Similar Table Search on PubTables-1M
## Description
This notebook implements a system for analyzing and searching for similar tables in the **PubTables-1M** dataset. It parses the dataset's XML files to extract structural features of each table, then compares tables using metrics computed from those features.

## Features
- Loading XML files containing tables.
- Extracting structural features (number of rows, columns, and cells).
- Analyzing and comparing tables to identify similarities.

## Requirements
Before running the notebook, make sure you have installed the following libraries:
```bash
pip install numpy
```

## Usage
1. Load the **PubTables-1M** dataset.
2. Run the cells to extract table features.
3. Analyze the results to identify similar tables.

## Author
This project was developed to analyze the structure of tables in the PubTables-1M dataset. If you have any suggestions or questions, feel free to contribute!
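
## Example sketch
The feature extraction and comparison described above can be sketched roughly as follows. This is not the notebook's actual code. It assumes PASCAL VOC-style XML in which each `<object>` element has a `<name>` such as `table row` or `table column`, estimates the cell count as rows × columns, and ranks candidates by Euclidean distance. The label strings, the function names (`extract_features`, `most_similar`), and the choice of metric are all illustrative assumptions. It uses only the Python standard library.

```python
import math
import xml.etree.ElementTree as ET

# Assumed class labels; adjust them to match the labels in your XML files.
ROW_LABEL = "table row"
COLUMN_LABEL = "table column"


def extract_features(xml_text):
    """Return (rows, columns, cells) for one XML annotation document."""
    root = ET.fromstring(xml_text)
    names = [obj.findtext("name", default="").strip() for obj in root.iter("object")]
    rows = names.count(ROW_LABEL)
    columns = names.count(COLUMN_LABEL)
    # Assumption: the grid is regular, so the cell count is rows * columns.
    return (rows, columns, rows * columns)


def most_similar(query, candidates, k=3):
    """Return the ids of the k candidates closest to the query features.

    `candidates` maps a table id to its feature tuple. Euclidean distance
    is an illustrative choice of metric, not necessarily the notebook's.
    """
    ranked = sorted(candidates, key=lambda tid: math.dist(query, candidates[tid]))
    return ranked[:k]


# Hypothetical inline annotation with two rows and one column.
sample = """
<annotation>
  <object><name>table</name></object>
  <object><name>table row</name></object>
  <object><name>table row</name></object>
  <object><name>table column</name></object>
</annotation>
"""

query = extract_features(sample)
candidates = {"a": (2, 1, 2), "b": (10, 5, 50), "c": (3, 1, 3)}
print(query, most_similar(query, candidates, k=2))
```

To read annotations from disk instead of a string, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`. Because raw counts differ in scale (the cell count dominates the distance), normalizing the features before comparison may give more balanced results.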