Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Backend Engineer Take Home Test
https://github.com/futuresea-dev/backend-engineer-take-home-test
- Host: GitHub
- URL: https://github.com/futuresea-dev/backend-engineer-take-home-test
- Owner: futuresea-dev
- Created: 2021-10-05T00:24:20.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-10-05T16:43:15.000Z (over 3 years ago)
- Last Synced: 2024-11-07T20:50:38.444Z (2 months ago)
- Topics: bs4, collection, python, requests, urlib, validator
- Language: Python
- Homepage:
- Size: 2.93 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Backend-Engineer-Take-Home-Test
Backend Engineer Take Home Test
# Prompt

### Introduction
Please spend only 2-4 hours on this. We are interested in your code design and the choices you make. Anything that makes the code better (especially when working in a team) is also welcome. If you run out of time, please include a document outlining how you would finish.
Restrictions:
- Choose any programming language other than bash.
- Do not invoke unix utilities (e.g. curl, wget, etc.)

### Problem
Please implement a command line program that fetches web pages and saves them to disk for later retrieval and browsing.
**Section 1**
For example, if we invoked your program like this: `./fetch https://www.google.com`, then in our current directory we should have a file containing the contents of `www.google.com` (e.g. `/home/myusername/fetch/www.google.com.html`).
We should be able to specify as many URLs as we want:
```console
$> ./fetch https://www.google.com https://autify.com <...>
$> ls
autify.com.html www.google.com.html
```

If the program runs into any errors while downloading the HTML, it should print the error to the console.
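For illustration only, a minimal sketch of how Section 1 might look in Python (the repository's listed language), using the third-party `requests` library; the file name, function name, and error handling are assumptions, not a reference solution:

```python
# fetch.py -- illustrative sketch, not the reference solution.
import sys
from urllib.parse import urlparse

import requests


def fetch(url: str) -> None:
    """Download one URL and save it as <hostname>.html in the current directory."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    filename = urlparse(url).hostname + ".html"
    with open(filename, "w", encoding=response.encoding or "utf-8") as f:
        f.write(response.text)


if __name__ == "__main__":
    for url in sys.argv[1:]:
        try:
            fetch(url)
        except requests.RequestException as exc:
            # Report the error but keep processing the remaining URLs.
            print(f"error fetching {url}: {exc}", file=sys.stderr)
```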
**Section 2**
Record metadata about what was fetched:
- What was the date and time of the last fetch
- How many links are on the page
- How many images are on the page

Modify the script to print this metadata; one way to gather these counts is sketched below.
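A possible way to count the links and images is with BeautifulSoup (`bs4`, one of the repository's listed topics); the helper names and the treatment of `last_fetch` below are illustrative assumptions:

```python
# Illustrative metadata helpers; names and output format are assumptions.
from datetime import datetime, timezone

from bs4 import BeautifulSoup


def page_metadata(site: str, html: str) -> dict:
    """Count links and images in the fetched HTML and timestamp the fetch."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "site": site,
        "num_links": len(soup.find_all("a")),
        "images": len(soup.find_all("img")),
        # Timestamp of this fetch; persisting it (e.g. to a JSON file next to
        # the saved HTML) would let a later run report the previous fetch time.
        "last_fetch": datetime.now(timezone.utc).strftime("%a %b %d %Y %H:%M UTC"),
    }


def print_metadata(metadata: dict) -> None:
    for key, value in metadata.items():
        print(f"{key}: {value}")
```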
For example (it can work differently if you like):
```console
$> ./fetch --metadata https://www.google.com
site: www.google.com
num_links: 35
images: 3
last_fetch: Tue Mar 16 2021 15:46 UTC
```
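The `--metadata` flag above implies a little command-line parsing. One possible wiring uses the standard-library `argparse` module; the flag name comes from the example output, everything else is assumed:

```python
# Hypothetical CLI wiring for the --metadata flag, standard library only.
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="fetch")
    parser.add_argument("urls", nargs="+", help="one or more URLs to download")
    parser.add_argument(
        "--metadata",
        action="store_true",
        help="print the recorded metadata for each URL",
    )
    return parser.parse_args(argv)
```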
### Extra credit (only if you have time)

Archive all assets as well as the HTML so that you can load the web page locally with all assets loading properly.
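A rough sketch of one way this could work, again with `requests` and `bs4`; the asset-directory naming, the handled tags, and the collision-free filename assumption are all simplifications:

```python
# Sketch of the extra-credit idea: mirror page assets locally and rewrite
# references so the saved HTML renders offline. Names are assumptions.
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def archive(url: str, html: str) -> str:
    """Download referenced images/scripts/styles and point the HTML at local copies."""
    asset_dir = f"{urlparse(url).hostname}_files"
    os.makedirs(asset_dir, exist_ok=True)

    soup = BeautifulSoup(html, "html.parser")
    for tag, attr in (("img", "src"), ("script", "src"), ("link", "href")):
        for node in soup.find_all(tag):
            ref = node.get(attr)
            if not ref:
                continue
            asset_url = urljoin(url, ref)
            local_name = os.path.basename(urlparse(asset_url).path) or "index"
            local_path = os.path.join(asset_dir, local_name)
            try:
                data = requests.get(asset_url, timeout=30).content
            except requests.RequestException:
                continue  # leave the original reference if the asset can't be fetched
            with open(local_path, "wb") as f:
                f.write(data)
            node[attr] = local_path
    return str(soup)
```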