{"id":22620408,"url":"https://github.com/solrikk/datadigger","last_synced_at":"2025-04-15T22:56:32.844Z","repository":{"id":246076814,"uuid":"820034160","full_name":"Solrikk/DataDigger","owner":"Solrikk","description":"DataDigger is a powerful and intuitive web application designed to extract and analyze data from web pages. ","archived":false,"fork":false,"pushed_at":"2025-03-02T08:20:16.000Z","size":39,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-15T22:56:28.867Z","etag":null,"topics":["business-intelligence","content-extraction","data-analysis","data-collection","data-extraction","data-mining","go","golang-api","html-parser","marketing-tools","metadata-extraction","research-tools","seo-tools","web-application","web-crawling","web-scraping","web-tools"],"latest_commit_sha":null,"homepage":"https://data-digger-sollrikk.replit.app","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Solrikk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-25T17:00:48.000Z","updated_at":"2025-03-02T08:20:20.000Z","dependencies_parsed_at":"2024-06-25T18:54:28.801Z","dependency_job_id":"29c938f3-774b-4f7e-ad9e-8b0e5e6c0389","html_url":"https://github.com/Solrikk/DataDigger","commit_stats":null,"previous_names":["solrikk/datadigger"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Solrikk%2FDataDigger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/Solrikk%2FDataDigger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Solrikk%2FDataDigger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Solrikk%2FDataDigger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Solrikk","download_url":"https://codeload.github.com/Solrikk/DataDigger/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249167439,"owners_count":21223505,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["business-intelligence","content-extraction","data-analysis","data-collection","data-extraction","data-mining","go","golang-api","html-parser","marketing-tools","metadata-extraction","research-tools","seo-tools","web-application","web-crawling","web-scraping","web-tools"],"created_at":"2024-12-08T22:13:36.596Z","updated_at":"2025-04-15T22:56:32.823Z","avatar_url":"https://github.com/Solrikk.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n![DataDigger Logo](https://github.com/Solrikk/DataDigger/blob/main/assets/result/images/orb6.png)\n\n\u003cdiv align=\"center\"\u003e \u003ch3\u003e \u003ca href=\"https://github.com/Solrikk/DataDigger/blob/main/README.md\"\u003e⭐English⭐\u003c/a\u003e | \u003ca href=\"https://github.com/Solrikk/DataDigger/blob/main/assets/docs/README_RU.md\"\u003eRussian\u003c/a\u003e | \u003ca href=\"https://github.com/Solrikk/DataDigger/blob/main/README_GE.md\"\u003eGerman\u003c/a\u003e | \u003ca 
href=\"https://github.com/Solrikk/DataDigger/blob/main/README_JP.md\"\u003eJapanese\u003c/a\u003e | \u003ca href=\"README_KR.md\"\u003eKorean\u003c/a\u003e | \u003ca href=\"README_CN.md\"\u003eChinese\u003c/a\u003e \u003c/h3\u003e \u003c/div\u003e\n\n-----------------\n\n# DataDigger\n\n## Overview\n\n**DataDigger** is a powerful web application designed to extract and analyze structured data from websites. Built with Go, it provides a seamless experience for data extraction, analysis, and export.\n\n## 📊 Example Output\n\nDataDigger organizes extracted data into the following categories:\n\n| Content Type | HTML Tag | Text | URL | Metadata | Date |\n|--------------|----------|------|-----|----------|------|\n| title | title | Website Title | | | 2023-05-20 |\n| heading | h1 | Main Heading | | | 2023-05-20 |\n| paragraph | p | Content text... | | | 2023-05-20 |\n| link | a | Link text | https://example.com | | 2023-05-20 |\n| image | img | Alt text | https://example.com/image.jpg | | 2023-05-20 |\n| metadata | description | Site description | | | 2023-05-20 |\n\n## Key Features\n\n- **Comprehensive Data Extraction**: Automatically collects and organizes:\n  - Page titles and metadata\n  - Headings (H1-H6)\n  - Paragraph text\n  - Lists (ordered and unordered)\n  - Links with their text and URLs\n  - Images with their alt text and URLs\n  - Tables with formatted content\n\n- **Excel Export**: One-click export to Excel (.xlsx) format with properly formatted sheets and columns\n\n- **User-Friendly Interface**: Clean, intuitive design that requires no technical knowledge\n\n- **Real-Time Processing**: Fast and efficient scraping engine with immediate results\n\n## How It Works\n\n1. Enter the URL of any website you want to analyze in the input field\n2. Click \"Extract Data\" and let DataDigger work its magic\n3. Receive a structured Excel file with all the extracted data\n4. 
Review organized content categorized by type and HTML element\n\n## Use Cases\n\n- **Market Research**: Analyze competitor websites and product information\n- **Content Aggregation**: Build databases of information from multiple sources\n- **SEO Analysis**: Extract and analyze headings, metadata, and content structure\n- **Data Journalism**: Collect data for reporting and analysis\n- **Academic Research**: Gather information from online sources for studies\n\n## Technical Details\n\nDataDigger is built with:\n- Go (Golang) for the backend processing\n- GoQuery for HTML parsing\n- Excelize for Excel file generation\n- Clean HTML/CSS/JavaScript frontend\n\n## Getting Started\n\n### Prerequisites\n- Go 1.19 or higher\n\n### Running Locally\n1. Clone the repository\n2. Run `go mod download` to install dependencies\n3. Start the server with `go run main.go`\n4. Access the application at http://localhost:8080\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## Contributing\n\nContributions are welcome! Feel free to submit a pull request or open an issue.\n\n-----------------\n\n\u003cp align=\"center\"\u003eMade with ❤️ by \u003ca href=\"https://github.com/Solrikk\"\u003eSolrikk\u003c/a\u003e\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolrikk%2Fdatadigger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsolrikk%2Fdatadigger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolrikk%2Fdatadigger/lists"}