An open API service indexing awesome lists of open source software.

https://github.com/solrikk/datadigger

DataDigger is a powerful and intuitive web application designed to extract and analyze data from web pages.
https://github.com/solrikk/datadigger

business-intelligence content-extraction data-analysis data-collection data-extraction data-mining go golang-api html-parser marketing-tools metadata-extraction research-tools seo-tools web-application web-crawling web-scraping web-tools

Last synced: 2 months ago
JSON representation

DataDigger is a powerful and intuitive web application designed to extract and analyze data from web pages.

Awesome Lists containing this project

README

        

![DataDigger Logo](https://github.com/Solrikk/DataDigger/blob/main/assets/result/images/orb6.png)

⭐English⭐ | Russian | German | Japanese | Korean | Chinese

-----------------

# DataDigger

## Overview

**DataDigger** is a powerful web application designed to extract and analyze structured data from websites. Built with Go, it provides a seamless experience for data extraction, analysis, and export.

## 📊 Example Output

DataDigger organizes extracted data into the following categories:

| Content Type | HTML Tag | Text | URL | Metadata | Date |
|--------------|----------|------|-----|----------|------|
| title | title | Website Title | | | 2023-05-20 |
| heading | h1 | Main Heading | | | 2023-05-20 |
| paragraph | p | Content text... | | | 2023-05-20 |
| link | a | Link text | https://example.com | | 2023-05-20 |
| image | img | Alt text | https://example.com/image.jpg | | 2023-05-20 |
| metadata | description | Site description | | | 2023-05-20 |

## Key Features

- **Comprehensive Data Extraction**: Automatically collects and organizes:
- Page titles and metadata
- Headings (H1-H6)
- Paragraph text
- Lists (ordered and unordered)
- Links with their text and URLs
- Images with their alt text and URLs
- Tables with formatted content

- **Excel Export**: One-click export to Excel (.xlsx) format with properly formatted sheets and columns

- **User-Friendly Interface**: Clean, intuitive design that requires no technical knowledge

- **Real-Time Processing**: Fast and efficient scraping engine with immediate results

## How It Works

1. Enter the URL of any website you want to analyze in the input field
2. Click "Extract Data" and let DataDigger work its magic
3. Receive a structured Excel file with all the extracted data
4. Review organized content categorized by type and HTML element

## Use Cases

- **Market Research**: Analyze competitor websites and product information
- **Content Aggregation**: Build databases of information from multiple sources
- **SEO Analysis**: Extract and analyze headings, metadata, and content structure
- **Data Journalism**: Collect data for reporting and analysis
- **Academic Research**: Gather information from online sources for studies

## Technical Details

DataDigger is built with:
- Go (Golang) for the backend processing
- GoQuery for HTML parsing
- Excelize for Excel file generation
- Clean HTML/CSS/JavaScript frontend

## Getting Started

### Prerequisites
- Go 1.19 or higher

### Running Locally
1. Clone the repository
2. Run `go mod download` to install dependencies
3. Start the server with `go run main.go`
4. Access the application at http://0.0.0.0:8080

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue.

-----------------

Made with ❤️ by Solrikk