https://github.com/danpersa/address-updater
https://github.com/danpersa/address-updater
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/danpersa/address-updater
- Owner: danpersa
- Created: 2014-07-28T21:17:07.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2014-07-31T19:00:26.000Z (almost 11 years ago)
- Last Synced: 2025-01-22T04:31:22.592Z (4 months ago)
- Language: Java
- Size: 199 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
## README
### Technologies
- This is a **Java 8** application
- It uses **Spring 4** for dependency injection
- The application is a **Spring Batch** application.
- I'm using **Spring Boot** to quickly start the Spring Batch Job.
- I use **Spring Data JPA** to persist the Entities in a database.
- I'm using the **Apache Tika** library to extract the text inside of the HTML body.
- I'm using **Lucene** to index the text of the body of the HTML, in order to faster find the city of the addresses.
- I'm using **regular expressions** to identify the addresses inside of the text
- **Guava** for preconditionsFor tests:
- **Mockito** for mocking objects
- **Hamcrest Matchers** for matching inside assertions
- I've created some integration tests, with the **SpringJUnit4ClassRunner** runner### Architecture
- The classes follow the SRP - Single Responsibility Principle
- The entities are immutable, so they can be safely shared between layers
- There is a clear separation between the layers, with one way dependencies### Flow Description
I start a job. This job retrieves batches of companies.
- For each batch of companies:
* I use another thread to process the extraction of addresses. (I use the @Async annotation)
* For each company in the batch:
- I retrieve the html from the server, using the url
- I extract the body of the html, and transform it to text
- I process the text, to eliminate end of lines, white spaces
- I regulate the street names
- I search and extract addresses from the text of the page using regular expressions
- I add new addresses to the company, in case they changedI've written:
- **Integration Tests** for **Services** and **Repositories**
- **Unit Tests** for the other classes
## Sequence Diagrams
The first part of the flow:
The second part of the flow
## TODOS
There are two TODOS inside of the code:
- Implement the sending of the email
- Use a JMS queue to decouple the part where we identify the changed addresses and the part where we send emails and save them
- Create more UMLs and update the existing ones
- Create builders for the entities