https://github.com/raoumer/dwx

Deep Web Extractor (DWX): a system that uses statistical machine learning models to crawl and discover data in the Deep Web (i.e., the massive, high-quality portion of the World Wide Web) in order to build knowledge-base databases.

data-discovery data-science data-visualization machine-learning python


# Deep Web Extractor (DWX)
**Deep Web Extractor** (DWX) is a system that uses statistical machine learning models to crawl and discover data in the Deep Web (i.e., the massive, high-quality portion of the World Wide Web) in order to build knowledge-base databases.

The main objectives of the system are:
1. To discover and extract quality deep web content for web searchers.
2. To find automated means of identifying searchable web form interfaces and directing queries to them to dig out information.
3. To build domain-specific data repositories (e.g., real estate, newspapers, health) for purposeful analysis and for building knowledge-base databases.
4. To handle complex queries, such as queries containing ranges of values, that traditional search engines do not support.
5. To help law enforcement agencies detect fraudulent web users.
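
As a rough illustration of objective 2, the sketch below finds candidate searchable forms in an HTML page and builds a query URL for one of them. It is not DWX's implementation: DWX is described as using statistical machine learning models, whereas this example substitutes a simple hand-written heuristic (a form counts as searchable if it has a text, search, or select field and no password field). The names `FormCollector`, `is_searchable`, `find_searchable_forms`, and `build_query_url`, the sample HTML, and the `example.com` URL are all hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormCollector(HTMLParser):
    """Collect every <form> with its action, method, and named fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],  # list of (name, type) pairs
            }
        elif self._current is not None and tag in ("input", "select", "textarea"):
            # <input> defaults to type="text"; other tags use the tag name.
            kind = (attrs.get("type") or "text").lower() if tag == "input" else tag
            self._current["fields"].append((attrs.get("name") or "", kind))

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def is_searchable(form):
    """Assumed heuristic, standing in for a trained classifier."""
    kinds = [kind for _, kind in form["fields"]]
    if "password" in kinds:  # login or registration form
        return False
    return any(kind in ("text", "search", "select") for kind in kinds)


def find_searchable_forms(html):
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    return [form for form in parser.forms if is_searchable(form)]


def build_query_url(page_url, form, values):
    """Build a GET query URL, keeping only values the form declares.

    Simplification: assumes the form action has no query string of its own.
    """
    known = {name for name, _ in form["fields"] if name}
    params = {key: val for key, val in values.items() if key in known}
    return urljoin(page_url, form["action"]) + "?" + urlencode(params)


# Hypothetical page with one login form and one search form.
SAMPLE_PAGE = """
<form action="/login" method="post">
  <input type="text" name="user"><input type="password" name="pw">
</form>
<form action="/listings" method="get">
  <input type="text" name="city">
  <select name="max_price"><option>100000</option></select>
</form>
"""

if __name__ == "__main__":
    for form in find_searchable_forms(SAMPLE_PAGE):
        print(build_query_url("https://example.com/search", form,
                              {"city": "Lahore", "max_price": 100000}))
```

In a learned approach, `is_searchable` would be replaced by a classifier trained on labeled forms, with the collected field names and types serving as input features.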

The proposed architecture of the Deep Web Extractor (DWX) system is shown in the figure below:

![DWX](https://raoumer.github.io/dwx/img/dwx.png "DWX Architecture")