https://github.com/giog97/find_similar_tables_on_pubtables-1m

# Similar Table Search on PubTables-1M

## Description
This notebook implements a system for analyzing and searching for similar tables in the **PubTables-1M** dataset. It parses the dataset's XML files to extract structural features of each table (number of rows, columns, and cells) and compares tables on those features.

## Features
- Loading XML files containing tables.
- Extracting structural features (number of rows, columns, and cells).
- Analyzing and comparing tables to identify similarities.
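
The feature-extraction step can be sketched as follows. This is a minimal illustration rather than the notebook's actual code. It assumes each XML file lists table elements as `<object>` entries with a `<name>` label such as `table row` or `table column`, and it approximates the cell count as rows times columns. The function name `extract_features` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract_features(xml_text):
    """Count rows, columns, and grid cells in one annotation document."""
    root = ET.fromstring(xml_text)
    # Assumption: every <object> has a <name> child holding its class label.
    labels = Counter(
        obj.findtext("name", default="").strip() for obj in root.iter("object")
    )
    rows = labels["table row"]
    cols = labels["table column"]
    # Assumption: cells are approximated by the row-by-column grid,
    # so spanning (merged) cells are not treated specially.
    return {"rows": rows, "columns": cols, "cells": rows * cols}


# Hypothetical, hand-written sample standing in for a real annotation file.
SAMPLE = """<annotation>
  <object><name>table</name></object>
  <object><name>table row</name></object>
  <object><name>table row</name></object>
  <object><name>table row</name></object>
  <object><name>table column</name></object>
  <object><name>table column</name></object>
</annotation>"""

features = extract_features(SAMPLE)
print(features)
```

To process a file on disk instead of a string, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.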

## Requirements
Before running the notebook, install the following library:
```bash
pip install numpy
```

## Usage
1. Load the **PubTables-1M** dataset.
2. Run the notebook cells to extract table features.
3. Analyze the results to identify similar tables.
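
One possible way to carry out the last step is sketched below. The README does not say which similarity metric the notebook uses, so this example assumes Euclidean distance between `(rows, columns, cells)` feature vectors. The function name `most_similar` and the sample feature values are hypothetical.

```python
import numpy as np


def most_similar(query, candidates, k=3):
    """Return the indices and distances of the k candidates closest to the query."""
    query = np.asarray(query, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    # Assumption: Euclidean distance on raw feature vectors; the notebook
    # may use a different metric or normalize the features first.
    distances = np.linalg.norm(candidates - query, axis=1)
    order = np.argsort(distances, kind="stable")[:k]
    return order.tolist(), distances[order].tolist()


# Hypothetical feature vectors: (rows, columns, cells) for four tables.
tables = [[3, 2, 6], [10, 4, 40], [3, 3, 9], [4, 2, 8]]
indices, dists = most_similar([3, 2, 6], tables, k=2)
print(indices, dists)
```

Because the cell count is much larger than the row and column counts, it dominates a raw Euclidean distance. Scaling each feature to a comparable range before comparing is a common adjustment.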

## Author
This project was developed to analyze the structure of tables in the PubTables-1M dataset. If you have any suggestions or questions, feel free to contribute!