https://github.com/csothen/htmlparser
Website parser and analyser
- Host: GitHub
- URL: https://github.com/csothen/htmlparser
- Owner: csothen
- License: mit
- Created: 2021-03-01T17:01:33.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-03-05T16:41:02.000Z (over 5 years ago)
- Last Synced: 2025-02-02T00:38:54.902Z (over 1 year ago)
- Topics: go, golang, parser
- Language: Go
- Homepage:
- Size: 27.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# HTML Parser Challenge
## Introduction
HTML Parser analyses the HTML content of a web page and returns an analysis result for that page, including the following:
- HTML Version
- Title of the website
- Number of headings by depth level
- Number of internal links (links to same domain)
- Number of external links (links to different domains)
- Number of inaccessible links (broken links or websites that aren't responding correctly)
- Whether the page contains a login form
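
As an illustration of the internal/external distinction above, here is a minimal sketch in Go. It is not taken from this repository: the `classifyLink` function is hypothetical, and it assumes that "same domain" means an exact, case-insensitive match on the host name, so a subdomain would count as external.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// classifyLink is a hypothetical helper, not part of the repository.
// It resolves href against pageURL and returns "internal" when the
// resolved host equals the page's host, and "external" otherwise.
func classifyLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	// Relative links such as "/about" inherit the page's scheme and host.
	target := base.ResolveReference(ref)
	if strings.EqualFold(target.Hostname(), base.Hostname()) {
		return "internal", nil
	}
	return "external", nil
}

func main() {
	page := "https://example.com/docs/"
	for _, href := range []string{"/about", "guide.html", "https://golang.org/"} {
		kind, err := classifyLink(page, href)
		if err != nil {
			fmt.Println("skipping", href, ":", err)
			continue
		}
		fmt.Println(href, "->", kind)
	}
}
```

With these inputs, `/about` and `guide.html` are classified as internal and `https://golang.org/` as external.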
## Quick Start
### Run the server
To run the HTML parser server:
- Make sure you have [docker installed](https://docs.docker.com/engine/install/)
- Have [docker-compose installed](https://docs.docker.com/compose/install/) (optional)
- Download the repository to your machine and change into its directory
#### With docker-compose
- Simply run `docker-compose up`
#### With docker
- Run `docker build -t htmlparser .`
- Run `docker run --publish 9090:9090 --name htmlparser htmlparser`
### Make requests
To make a request to the server:
- Open your API client of choice, for example [Postman](https://www.postman.com/).
- Make a POST request to `localhost:9090/api/parse` with a JSON body containing a `url` field, for example `{"url": "https://facebook.com"}`

- Or using curl:
``` bash
curl -X POST -H "Content-Type: application/json" \
-d '{"url": "https://facebook.com"}' \
localhost:9090/api/parse
```
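
The same request can also be sent from a Go program. The sketch below assumes only what the curl command shows: a POST to `/api/parse` on port 9090 with a JSON body holding a `url` field. The type and function names (`parseRequest`, `encodeBody`, `newParseRequest`) are hypothetical, and because the response format is not described here, the response is printed as raw text.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

// parseRequest mirrors the JSON body used in the curl example.
type parseRequest struct {
	URL string `json:"url"`
}

// encodeBody serialises the target URL into the request body.
func encodeBody(target string) ([]byte, error) {
	return json.Marshal(parseRequest{URL: target})
}

// newParseRequest builds the POST request without sending it.
func newParseRequest(endpoint, target string) (*http.Request, error) {
	body, err := encodeBody(target)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newParseRequest("http://localhost:9090/api/parse", "https://facebook.com")
	if err != nil {
		log.Fatal(err)
	}
	// The timeout is an arbitrary choice for this sketch.
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	out, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```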
## Improvements
- Shorten response time by introducing request caching
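
One possible shape for such a cache, shown only as a sketch: an in-memory map keyed by the requested URL, with entries that expire after a fixed time-to-live. Nothing here comes from the repository; `resultCache`, `getOrCompute`, and the string result type are assumptions made for illustration, and a real implementation would store the actual analysis result and might prefer an external cache.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry holds a cached result and the moment it stops being valid.
type entry struct {
	value   string
	expires time.Time
}

// resultCache is a hypothetical in-memory cache keyed by page URL.
type resultCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]entry
}

func newResultCache(ttl time.Duration) *resultCache {
	return &resultCache{ttl: ttl, entries: make(map[string]entry)}
}

// getOrCompute returns the cached value for key if it has not expired;
// otherwise it calls compute, stores the result, and returns it.
// The lock is released while compute runs, so concurrent requests for
// the same key may compute the value more than once.
func (c *resultCache) getOrCompute(key string, compute func() string) string {
	c.mu.Lock()
	e, ok := c.entries[key]
	c.mu.Unlock()
	if ok && time.Now().Before(e.expires) {
		return e.value
	}
	value := compute()
	c.mu.Lock()
	c.entries[key] = entry{value: value, expires: time.Now().Add(c.ttl)}
	c.mu.Unlock()
	return value
}

func main() {
	cache := newResultCache(5 * time.Minute)
	analyse := func() string {
		fmt.Println("analysing page (slow path)")
		return "analysis result"
	}
	// The second call is served from the cache, so analyse runs once.
	fmt.Println(cache.getOrCompute("https://example.com", analyse))
	fmt.Println(cache.getOrCompute("https://example.com", analyse))
}
```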