https://github.com/streamlined2/webcrawler

Web crawler application that collects statistical information about domains and saves it to a database

dao-layer freemarker front-controller-pattern heroku-deployment heroku-maven-plugin httpclient java-17 jetty-server jpa-hibernate jsoup-library lombok mvc-pattern postgresql service-layer servlet


# webCrawler

A simple web crawler that collects statistical information about domains and saves it to a database.

The application is deployed to Heroku:
https://very-simple-web-crawler.herokuapp.com/crawler


Database setup script: webCrawler/setup.sql


  • Java 17

  • JEE front controller servlet

  • MVC pattern

  • Service and DAO layers

  • Programmatic JPA transaction management

  • JPA/Lombok annotations for entity classes
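
The front controller and MVC items above can be illustrated with a minimal sketch. It is not taken from the repository: it uses plain Java types in place of the servlet API, and the names `FrontControllerSketch`, `ModelAndView`, `register`, and `dispatch` are hypothetical. The idea is only that one entry point receives every request, looks up a controller by action name, and returns a view name plus a model for the template layer to render.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: a single entry point that routes each action to its controller. */
final class FrontControllerSketch {

    /** What a controller returns: the view (template) name and the model to render. */
    record ModelAndView(String view, Map<String, Object> model) {}

    // Action name -> controller. A real servlet would derive the action from the request path.
    private final Map<String, Function<Map<String, String>, ModelAndView>> routes =
            new LinkedHashMap<>();

    /** Registers a controller for the given action name. */
    void register(String action, Function<Map<String, String>, ModelAndView> controller) {
        routes.put(action, controller);
    }

    /** Dispatches to the matching controller; unknown actions fall back to an error view. */
    ModelAndView dispatch(String action, Map<String, String> params) {
        Function<Map<String, String>, ModelAndView> controller = routes.get(action);
        if (controller == null) {
            return new ModelAndView("error", Map.of("message", "Unknown action: " + action));
        }
        return controller.apply(params);
    }

    public static void main(String[] args) {
        FrontControllerSketch front = new FrontControllerSketch();
        // Assumed example action; the real application's actions may differ.
        front.register("crawl",
                p -> new ModelAndView("result", Map.of("seed", p.getOrDefault("url", ""))));
        System.out.println(front.dispatch("crawl", Map.of("url", "https://example.com")));
    }
}
```

In a servlet-based version, `dispatch` would typically live in the servlet's request handling method and the returned view name would select a Freemarker template.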



  • Embedded Jetty server

  • PostgreSQL 13.4

  • Freemarker templates

  • Java HttpClient to fetch resources

  • Jsoup to analyze HTML
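
As a rough illustration of the fetch-and-analyze step listed above, the sketch below downloads a page with the JDK's `java.net.http.HttpClient` and counts outgoing links per host. It is an assumption about how such a step could look, not the repository's code. To stay dependency-free it swaps in a naive regular expression where the project uses Jsoup, so it only recognizes double-quoted `href` attributes; the names `CrawlSketch`, `fetch`, and `countLinksByHost` are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: fetch a page and count its links per host. */
final class CrawlSketch {

    // Naive matcher for double-quoted href values; a stand-in for Jsoup, not a real HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Downloads the response body as text using the built-in HttpClient (Java 11+). */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(20))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Resolves each link against the page URL and counts links per lower-cased host. */
    static Map<String, Integer> countLinksByHost(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Map<String, Integer> counts = new TreeMap<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            try {
                String host = base.resolve(matcher.group(1).trim()).getHost();
                if (host != null) { // mailto: and similar links have no host
                    counts.merge(host.toLowerCase(Locale.ROOT), 1, Integer::sum);
                }
            } catch (IllegalArgumentException ignored) {
                // Malformed link; skip it rather than abort the whole page.
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // The default URL is a placeholder; pass a real seed URL as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        System.out.println(countLinksByHost(fetch(url), url));
    }
}
```

With Jsoup on the classpath, the regular expression would normally be replaced by parsing the document and selecting `a[href]` elements, which also handles single-quoted and unquoted attributes.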