Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bitartisan1/netdigger
A .NET 8.0 C# WPF desktop application for web scraping data into structured databases with a modern UI, comprehensive logging and optimized high performance.
csharp data data-scraper data-scraping database desktop dotnet internet logging scraper ui url web-scraper web-scrapers web-scraping web-scrapping
- Host: GitHub
- URL: https://github.com/bitartisan1/netdigger
- Owner: bitArtisan1
- License: agpl-3.0
- Created: 2024-07-17T19:01:17.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-06T17:40:35.000Z (5 months ago)
- Last Synced: 2024-08-06T21:07:48.167Z (5 months ago)
- Topics: csharp, data, data-scraper, data-scraping, database, desktop, dotnet, internet, logging, scraper, ui, url, web-scraper, web-scrapers, web-scraping, web-scrapping
- Language: C#
- Homepage:
- Size: 375 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# netDigger
netDigger is a web scraping application built using .NET 8.0 and C# WPF in Visual Studio. It collects various types of data and exports them into organized and structured databases. The application features a modern UI design with detailed and comprehensive logging.
A Powerful In-Depth Web Scraping Application.
## Features
- **Asynchronous Web Scraping**: Efficiently scrape web pages using asynchronous tasks with minimized latency and multi-threading for parallel processing.
- **Data Collection**: Collects data such as PDFs, CSVs, DOCX, XLS, PPTX, TXT, Images, Videos, JSON, DBSQL, XML, HTML, PHP, JS, Archives, and Miscellaneous files.
- **Comprehensive JSON, XML, and HTML Parsing**: Extracts information from JSON, XML, and HTML documents, including hidden element data and metadata.
- **Database Integration**: Organizes scraped URLs into SQLite databases based on their file types.
- **Modern UI Design**: User-friendly WPF interface with rich text logging.
- **Detailed Logging**: Comprehensive log messages with timestamps, log levels, and thread IDs.
- **Export Options**: Export scraped data to database files, CSV, and TXT formats.
- **Multi OS Support**: Compatible with Windows x64/x86/ARM, Linux and macOS.
## Technologies Used
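To make a few of the listed features concrete, here is a minimal C# sketch of sorting URLs by file type, processing them with bounded parallelism, and formatting a log line with a timestamp, level, and thread ID. It is not taken from the netDigger source: the names `UrlSorter`, `Classify`, `GroupAsync`, and `FormatLog` are hypothetical, the extension-to-category table is an assumption, and the per-URL work is only a placeholder for real downloading.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: none of these names come from the netDigger source.
public static class UrlSorter
{
    // Assumed extension-to-category table; the real application may differ.
    private static readonly Dictionary<string, string> Categories =
        new(StringComparer.OrdinalIgnoreCase)
        {
            [".pdf"] = "PDF", [".csv"] = "CSV", [".docx"] = "DOCX",
            [".json"] = "JSON", [".xml"] = "XML", [".html"] = "HTML",
            [".png"] = "Images", [".jpg"] = "Images", [".zip"] = "Archives",
        };

    // Map a URL to a category using the extension of its path (the query
    // string is ignored). Anything unrecognized becomes "Miscellaneous".
    public static string Classify(string url)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri))
            return "Miscellaneous";
        var ext = Path.GetExtension(uri.AbsolutePath) ?? string.Empty;
        return Categories.TryGetValue(ext, out var category) ? category : "Miscellaneous";
    }

    // Build a log line containing a timestamp, a level, and the managed
    // thread ID of the calling thread.
    public static string FormatLog(string level, string message, DateTime timestamp)
    {
        var stamp = timestamp.ToString("yyyy-MM-dd HH:mm:ss.fff", CultureInfo.InvariantCulture);
        return $"[{stamp}] [{level}] [Thread {Environment.CurrentManagedThreadId}] {message}";
    }

    // Process URLs concurrently, with at most maxParallel in flight, and
    // collect them in thread-safe collections keyed by category.
    public static async Task<ConcurrentDictionary<string, ConcurrentBag<string>>> GroupAsync(
        IEnumerable<string> urls, int maxParallel = 4)
    {
        var groups = new ConcurrentDictionary<string, ConcurrentBag<string>>();
        using var gate = new SemaphoreSlim(maxParallel);

        var tasks = urls.Select(async url =>
        {
            await gate.WaitAsync();
            try
            {
                // Placeholder: a real scraper would download or inspect the URL here.
                await Task.Yield();
                groups.GetOrAdd(Classify(url), _ => new ConcurrentBag<string>()).Add(url);
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        await Task.WhenAll(tasks);
        return groups;
    }
}
```

Calling `await UrlSorter.GroupAsync(urls)` returns a dictionary keyed by category name. Each bag could then be written to its own database table or export file, a step this sketch leaves out.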
- **.NET 8.0**
- **C#**
- **WPF (Windows Presentation Foundation)**
- **AngleSharp** for HTML parsing
- **PuppeteerSharp**: A headless browser automation library for .NET.
- **Newtonsoft.Json (Json.NET)**: A popular library for working with JSON in .NET.
- **SQLite** for database management
- **Concurrent Collections** for thread-safe operations
## Prerequisites
- .NET 8.0 Desktop Runtime or SDK.
- Visual Studio 2022 (only if you want to build it yourself).
## Installation
1. Clone the repository:
```sh
git clone https://github.com/bitArtisan1/netDigger.git
cd netDigger
```
2. Open the solution file (`netDigger.sln`) in Visual Studio.
3. Build the project: select `Build > Build Solution`.
4. Run the application: select `Debug > Start Debugging` or press `F5`.
## Contribution
1. Fork the repository.
2. Create a new branch (`git checkout -b feature-branch`).
3. Commit your changes (`git commit -m 'Add new feature'`).
4. Push to the branch (`git push origin feature-branch`).
5. Create a new Pull Request.
## License
This project is licensed under the __GNU Affero General Public License v3.0__. See the __LICENSE__ file for more details.
## Support Me
If you find netDigger useful, consider supporting me by:
- Starring the repository on GitHub
- Sharing the tool with others
- Providing feedback and suggestions
- Following me for more :)
---
For any issues or feature requests, please open an issue on GitHub. Happy coding!