https://github.com/siddhant-vij/html-link-parser

Go-based package to parse links (<a> tags) from an HTML file.
dfs gophercises io-readers parsing-html recursion


# HTML Link Parser

[Gophercises](https://gophercises.com/) Exercise Details:

In this exercise your goal is to create a package that makes it easy to parse an HTML file and extract all of the links (`<a>` tags). For each extracted link you should return a data structure that includes the `href`.

Links will be nested in different HTML elements, and you will likely have to deal with HTML similar to the code below.

```html

<a href="/dog">
  <span>Something in a span</span>
  Text not in a span
  <b>Bold text!</b>
</a>

```

In situations like these we want to get output that looks roughly like:

```go
Link{
	Href: "/dog",
}
```

Once you have a working program, try to write some tests for it to practice using the `testing` package in Go.


## Technical Notes

- Use the `golang.org/x/net/html` package, which implements an HTML5-compliant tokenizer and parser.
- Ignore nested links. For example, with the following HTML:
```html

<a href="#">
  Something here <a href="/dog">nested dog link</a>
</a>

```
For the purposes of this exercise, it is okay if your code returns only the outer link.


*Include the nested links as well in the output.*
- Test the code with the example files included in the project repository. *Improve your tests and edge-case coverage.* Add examples and documentation for the code. Run the following in this order, using Go tooling:
  - tests
    - `go test`
  - coverage
    - `go test -cover`
    - `go test -coverprofile coverage.out`
  - coverage shown in a web browser
    - `go tool cover -html=coverage.out`
  - examples shown in documentation in a web browser
    - `godoc -http=:8080`
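
The nested-link requirement from the technical notes can be illustrated with a depth-first walk over the parsed tree. The following is only a minimal sketch, not this repository's implementation. To stay runnable with the standard library alone, it uses hypothetical stand-in types (`node`, `attribute`, `nodeType`) that copy a few field names from `golang.org/x/net/html`. With the real package, `html.Parse` would build the tree and the same loop would run over `*html.Node`. The names `collectLinks` and `sampleTree` and the `/outer` href are assumptions made for illustration.

```go
package main

import "fmt"

// Link mirrors the data structure shown in the exercise text.
type Link struct {
	Href string
}

// nodeType, attribute and node are hypothetical stand-ins for the
// NodeType, Attribute and Node types of golang.org/x/net/html,
// reduced to the fields this sketch reads.
type nodeType int

const (
	textNode nodeType = iota
	elementNode
)

type attribute struct {
	Key, Val string
}

type node struct {
	Type        nodeType
	Data        string
	Attr        []attribute
	FirstChild  *node
	NextSibling *node
}

// collectLinks walks the tree depth-first and appends a Link for every
// <a> element that has an href. It keeps descending into anchors, so
// links nested inside other links are reported as well.
func collectLinks(n *node, links []Link) []Link {
	if n == nil {
		return links
	}
	if n.Type == elementNode && n.Data == "a" {
		for _, a := range n.Attr {
			if a.Key == "href" {
				links = append(links, Link{Href: a.Val})
				break
			}
		}
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		links = collectLinks(c, links)
	}
	return links
}

// sampleTree builds an outer anchor (href "/outer", an assumed value)
// that contains a text node followed by a nested anchor (href "/dog").
func sampleTree() *node {
	inner := &node{Type: elementNode, Data: "a", Attr: []attribute{{"href", "/dog"}}}
	inner.FirstChild = &node{Type: textNode, Data: "nested dog link"}
	text := &node{Type: textNode, Data: "Something here", NextSibling: inner}
	return &node{Type: elementNode, Data: "a", Attr: []attribute{{"href", "/outer"}}, FirstChild: text}
}

func main() {
	for _, l := range collectLinks(sampleTree(), nil) {
		fmt.Printf("%+v\n", l)
	}
}
```

Because the walk visits a node before its children, the outer link is reported before the nested one. To get the original "ignore nested links" behaviour instead, the function could return right after appending an anchor rather than descending into it.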