https://github.com/solrikk/datadigger

DataDigger is a powerful and intuitive web application designed to extract and analyze data from web pages.


Host: GitHub
URL: https://github.com/solrikk/datadigger
Owner: Solrikk
License: mit
Created: 2024-06-25T17:00:48.000Z (12 months ago)
Default Branch: main
Last Pushed: 2025-03-02T08:20:16.000Z (4 months ago)
Last Synced: 2025-04-15T22:56:28.867Z (2 months ago)
Topics: business-intelligence, content-extraction, data-analysis, data-collection, data-extraction, data-mining, go, golang-api, html-parser, marketing-tools, metadata-extraction, research-tools, seo-tools, web-application, web-crawling, web-scraping, web-tools
Language: Go
Homepage: https://data-digger-sollrikk.replit.app
Size: 38.1 KB
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

![DataDigger Logo](https://github.com/Solrikk/DataDigger/blob/main/assets/result/images/orb6.png)

-----------------

# DataDigger

## Overview

**DataDigger** is a web application that extracts structured data from web pages and organizes it for analysis. It is built with Go and exports the results to Excel.

## 📊 Example Output

DataDigger organizes extracted data into the following categories:

| Content Type | HTML Tag | Text | URL | Metadata | Date |
|--------------|----------|------|-----|----------|------|
| title | title | Website Title | | | 2023-05-20 |
| heading | h1 | Main Heading | | | 2023-05-20 |
| paragraph | p | Content text... | | | 2023-05-20 |
| link | a | Link text | https://example.com | | 2023-05-20 |
| image | img | Alt text | https://example.com/image.jpg | | 2023-05-20 |
| metadata | description | Site description | | | 2023-05-20 |
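
One way to picture a row of this table in Go is a small struct with one field per column. This is only a sketch: the `Record` type, its field names, and the `Row` method are hypothetical and are not taken from the DataDigger source.

```go
package main

import "fmt"

// Record mirrors one row of the example output table.
// The type and its field names are hypothetical; the README only names the columns.
type Record struct {
	ContentType string // e.g. "title", "heading", "link"
	HTMLTag     string // e.g. "h1", "a", "img"
	Text        string // visible text, or alt text for images
	URL         string // set for links and images, empty otherwise
	Metadata    string // free-form extra information
	Date        string // extraction date, formatted as YYYY-MM-DD
}

// Row returns the record as a slice in the same column order as the table.
func (r Record) Row() []string {
	return []string{r.ContentType, r.HTMLTag, r.Text, r.URL, r.Metadata, r.Date}
}

func main() {
	link := Record{
		ContentType: "link",
		HTMLTag:     "a",
		Text:        "Link text",
		URL:         "https://example.com",
		Date:        "2023-05-20",
	}
	fmt.Println(link.Row())
}
```

Keeping the column order in one method means any exporter can iterate over `Row()` without knowing the field names.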

## Key Features

- **Comprehensive Data Extraction**: Automatically collects and organizes:
  - Page titles and metadata
  - Headings (H1-H6)
  - Paragraph text
  - Lists (ordered and unordered)
  - Links with their text and URLs
  - Images with their alt text and URLs
  - Tables with formatted content

- **Excel Export**: One-click export to Excel (.xlsx) format with properly formatted sheets and columns

- **User-Friendly Interface**: Clean, intuitive design that requires no technical knowledge

- **Real-Time Processing**: Fast and efficient scraping engine with immediate results
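
The extraction feature above can be illustrated with a dependency-free sketch. DataDigger itself uses GoQuery; the example below instead uses the standard library's `encoding/xml` decoder in its non-strict HTML mode so that it runs without extra modules. The `Item` type and `extract` function are hypothetical, only a few of the listed element types are handled, and the decoder can still reject badly malformed real-world pages.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Item is a hypothetical container for one extracted element.
type Item struct {
	Tag  string
	Text string
	URL  string
}

// collected lists the tags this sketch records.
var collected = map[string]bool{
	"title": true, "h1": true, "h2": true, "h3": true,
	"h4": true, "h5": true, "h6": true, "p": true, "a": true, "img": true,
}

// extract walks the token stream and gathers text, href/src and alt values.
func extract(page string) ([]Item, error) {
	dec := xml.NewDecoder(strings.NewReader(page))
	dec.Strict = false                // tolerate common HTML deviations from XML
	dec.AutoClose = xml.HTMLAutoClose // treat <img>, <br>, ... as self-closing
	dec.Entity = xml.HTMLEntity       // resolve entities such as &nbsp;

	var items []Item
	var stack []int // one entry per open element: index into items, or -1
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			name := strings.ToLower(t.Name.Local)
			idx := -1
			if collected[name] {
				item := Item{Tag: name}
				for _, at := range t.Attr {
					switch strings.ToLower(at.Name.Local) {
					case "href", "src":
						item.URL = at.Value
					case "alt":
						item.Text = at.Value
					}
				}
				items = append(items, item)
				idx = len(items) - 1
			}
			stack = append(stack, idx)
		case xml.EndElement:
			if len(stack) > 0 {
				stack = stack[:len(stack)-1]
			}
		case xml.CharData:
			// Text belongs to every collected element that is still open.
			for _, idx := range stack {
				if idx >= 0 {
					items[idx].Text += string(t)
				}
			}
		}
	}
	for i := range items {
		items[i].Text = strings.Join(strings.Fields(items[i].Text), " ")
	}
	return items, nil
}

func main() {
	page := `<html><head><title>Website Title</title></head><body>
<h1>Main Heading</h1><p>See <a href="https://example.com">Link text</a></p></body></html>`
	items, err := extract(page)
	if err != nil {
		fmt.Println("extract failed:", err)
		return
	}
	for _, it := range items {
		fmt.Printf("%s | %s | %s\n", it.Tag, it.Text, it.URL)
	}
}
```

A selector-based library such as GoQuery is more forgiving with broken markup, which is a sensible reason to prefer it in the real application.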

## How It Works

1. Enter the URL of any website you want to analyze in the input field
2. Click "Extract Data" and let DataDigger work its magic
3. Receive a structured Excel file with all the extracted data
4. Review organized content categorized by type and HTML element
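
On the server side, the first two steps amount to receiving a URL and checking it before any page is fetched. The sketch below shows one plausible shape for that check. The `/extract` route, the `url` form field, and the function names are assumptions made for illustration, not details of the actual application.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// validateTarget checks the user-supplied URL before any fetch is attempted.
func validateTarget(raw string) (*url.URL, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return nil, fmt.Errorf("parse url: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, errors.New("only http and https URLs are supported")
	}
	if u.Host == "" {
		return nil, errors.New("URL has no host")
	}
	return u, nil
}

// extractHandler is a hypothetical handler for the "Extract Data" step.
func extractHandler(w http.ResponseWriter, r *http.Request) {
	target, err := validateTarget(r.FormValue("url"))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// A real handler would fetch the page, extract its content and send
	// back an .xlsx file; this sketch only acknowledges the request.
	fmt.Fprintf(w, "would extract data from %s\n", target)
}

func main() {
	// Exercise the handler in memory instead of starting a server.
	body := strings.NewReader("url=https%3A%2F%2Fexample.com")
	req := httptest.NewRequest(http.MethodPost, "/extract", body)
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	rec := httptest.NewRecorder()
	extractHandler(rec, req)
	fmt.Println(rec.Code, strings.TrimSpace(rec.Body.String()))
}
```

Rejecting anything other than `http` and `https` early keeps inputs such as `file://` paths away from the fetching code.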

## Use Cases

- **Market Research**: Analyze competitor websites and product information
- **Content Aggregation**: Build databases of information from multiple sources
- **SEO Analysis**: Extract and analyze headings, metadata, and content structure
- **Data Journalism**: Collect data for reporting and analysis
- **Academic Research**: Gather information from online sources for studies

## Technical Details

DataDigger is built with:
- Go (Golang) for the backend processing
- GoQuery for HTML parsing
- Excelize for Excel file generation
- Clean HTML/CSS/JavaScript frontend
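
To show how extracted rows might be turned into a tabular file, here is a minimal sketch that writes them as CSV with the standard library. This is a deliberate stand-in: DataDigger generates `.xlsx` files with Excelize, which is not used here so the example stays dependency-free. The `header` variable and `toCSV` function are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// header repeats the column names from the example output table.
var header = []string{"Content Type", "HTML Tag", "Text", "URL", "Metadata", "Date"}

// toCSV renders the header followed by the given rows as CSV text.
func toCSV(rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(header); err != nil {
		return "", err
	}
	for _, row := range rows {
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	// Flush pushes buffered data to buf; Error reports any write failure.
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := toCSV([][]string{
		{"title", "title", "Website Title", "", "", "2023-05-20"},
		{"link", "a", "Link text", "https://example.com", "", "2023-05-20"},
	})
	if err != nil {
		fmt.Println("export failed:", err)
		return
	}
	fmt.Print(out)
}
```

An Excel export would follow the same loop over rows, writing each one into a worksheet instead of a CSV writer.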

## Getting Started

### Prerequisites
- Go 1.19 or higher

### Running Locally
1. Clone the repository
2. Run `go mod download` to install dependencies
3. Start the server with `go run main.go`
4. Open the application at http://localhost:8080 (the server listens on `0.0.0.0:8080`)

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue.

-----------------

Made with ❤️ by Solrikk
