https://github.com/raoumer/dwx
Deep Web Extractor (DWX): The Deep Web Extractor system uses statistical machine learning models to crawl and discover data in the Deep Web (i.e., the massive, high-quality portion of the World Wide Web) in order to build knowledge-base databases.
data-discovery data-science data-visualization machine-learning python
- Host: GitHub
- URL: https://github.com/raoumer/dwx
- Owner: RaoUmer
- Created: 2017-03-17T17:14:57.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-03-17T17:20:30.000Z (over 8 years ago)
- Last Synced: 2025-03-29T12:30:35.891Z (7 months ago)
- Topics: data-discovery, data-science, data-visualization, machine-learning, python
- Language: HTML
- Homepage: http://raoumer.github.io/dwx/
- Size: 1.57 MB
- Stars: 5
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Deep Web Extractor (DWX)
**Deep Web Extractor** is a system that uses statistical machine learning models to crawl and discover data in the Deep Web (i.e., the massive, high-quality portion of the World Wide Web) in order to build knowledge-base databases.
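The README does not say which statistical models DWX uses. As one illustration of the general idea, the sketch below trains a from-scratch multinomial naive Bayes classifier that labels HTML forms as searchable or non-searchable. This is an assumption, not the project's code: the class name `NaiveBayesFormClassifier`, the bag-of-token features (e.g. `input:text`, `submit:login`) and the tiny training set are all hypothetical.

```python
import math
from collections import Counter


class NaiveBayesFormClassifier:
    """Hypothetical multinomial naive Bayes over bag-of-token form features."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha            # Laplace smoothing constant
        self.class_counts = Counter() # number of training forms per label
        self.token_counts = {}        # label -> Counter of feature tokens
        self.vocab = set()            # all tokens seen during training

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.token_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def _log_posterior(self, tokens, label):
        # log P(label) + sum of log P(token | label), with add-alpha smoothing
        total_docs = sum(self.class_counts.values())
        log_prob = math.log(self.class_counts[label] / total_docs)
        counts = self.token_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # ignore tokens never seen in training
                log_prob += math.log((counts[tok] + self.alpha) / denom)
        return log_prob

    def predict(self, tokens):
        # Return the label with the highest (unnormalized) log posterior
        return max(self.class_counts, key=lambda lbl: self._log_posterior(tokens, lbl))


# Hypothetical training data: feature tokens extracted from six forms
train_docs = [
    ["input:text", "name:q", "submit:search"],
    ["input:text", "name:keywords", "select:category", "submit:find"],
    ["input:text", "name:price_min", "name:price_max", "submit:search"],
    ["input:password", "name:username", "submit:login"],
    ["input:email", "name:email", "submit:subscribe"],
    ["input:password", "name:password", "input:email", "submit:register"],
]
train_labels = ["searchable"] * 3 + ["non-searchable"] * 3

clf = NaiveBayesFormClassifier().fit(train_docs, train_labels)
print(clf.predict(["input:text", "name:q", "submit:search"]))           # → searchable
print(clf.predict(["input:password", "name:username", "submit:login"]))  # → non-searchable
```

Naive Bayes is chosen here only because it is a small, classic statistical text classifier; a real system would need many more labeled forms and richer features.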
The system has the following main objectives:
1. To discover and extract high-quality Deep Web content for web searchers.
2. To find automated means of identifying searchable web form interfaces and directing queries to them to dig out information.
3. To build domain-specific data repositories (e.g., real estate, newspapers, health) for purposeful analysis and for building knowledge-base databases.
4. To handle complex queries that traditional search engines do not support, such as queries containing value ranges.
5. To help law enforcement agencies detect fraudulent web users.
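Objectives 2 and 4 involve locating a search form and submitting a range query through it. The following is a minimal sketch using only Python's standard library (`html.parser`, `urllib.parse`). It assumes a GET form whose range bounds are exposed as separate fields; the helper names `FormExtractor` and `build_range_query`, the field names `price_min`/`price_max`, and the example URL are hypothetical and not taken from DWX.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormExtractor(HTMLParser):
    """Hypothetical helper: collects each <form>'s action, method and named fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:  # unnamed controls (e.g. a plain submit button) are not sent
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def build_range_query(page_url, form, values):
    """Build a GET URL, keeping only values whose names the form declares."""
    if form["method"] != "get":
        raise ValueError("this sketch only handles GET forms")
    params = {k: v for k, v in values.items() if k in form["fields"]}
    return urljoin(page_url, form["action"]) + "?" + urlencode(params)


html = """
<form action="/search" method="GET">
  <input type="text" name="city">
  <input type="number" name="price_min">
  <input type="number" name="price_max">
  <input type="submit" value="Search">
</form>
"""
parser = FormExtractor()
parser.feed(html)
form = parser.forms[0]
url = build_range_query(
    "http://example.com/listings/",
    form,
    {"city": "Springfield", "price_min": 50000, "price_max": 90000, "rooms": 3},
)
print(url)  # → http://example.com/search?city=Springfield&price_min=50000&price_max=90000
```

The unsupported `rooms` value is dropped because the form does not declare it. Real Deep Web forms often use POST, JavaScript-driven submission, or select-box ranges, which would need further handling.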
The proposed architecture of the Deep Web Extractor (DWX) system is shown in the following figure:
