Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Backend Engineer Take Home Test
https://github.com/futuresea-dev/backend-engineer-take-home-test
- Host: GitHub
- URL: https://github.com/futuresea-dev/backend-engineer-take-home-test
- Owner: futuresea-dev
- Created: 2021-10-05T00:24:20.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-10-05T16:43:15.000Z (over 3 years ago)
- Last Synced: 2024-11-07T20:50:38.444Z (2 months ago)
- Topics: bs4, collection, python, requests, urlib, validator
- Language: Python
- Homepage:
- Size: 2.93 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Backend-Engineer-Take-Home-Test
Backend Engineer Take Home Test
# Prompt

### Introduction
Please spend only 2-4 hours on this. We are interested in your code design and the choices you make. Anything that makes the code better (especially when working in a team) is also welcome. If you run out of time, please include a document outlining how you would finish.
Restrictions:
- Choose any programming language other than bash.
- Do not invoke unix utilities (e.g. curl, wget, etc.)

### Problem
Please implement a command line program that fetches web pages and saves them to disk for later retrieval and browsing.
**Section 1**
For example, if we invoked your program like this: `./fetch https://www.google.com`, then in our current directory we should have a file containing the contents of `www.google.com` (e.g. `/home/myusername/fetch/www.google.com.html`).
We should be able to specify as many URLs as we want:
```console
$> ./fetch https://www.google.com https://autify.com <...>
$> ls
autify.com.html www.google.com.html
```

If the program runs into any errors while downloading the HTML, it should print the error to the console.
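For illustration only, a minimal sketch of how Section 1 might look in Python (the repository's listed language), using the third-party `requests` library; the file name, function name, and error handling are assumptions, not a reference solution:

```python
# fetch.py -- illustrative sketch, not the reference solution.
import sys
from urllib.parse import urlparse

import requests


def fetch(url: str) -> None:
    """Download one URL and save it as <hostname>.html in the current directory."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    filename = urlparse(url).hostname + ".html"
    with open(filename, "w", encoding=response.encoding or "utf-8") as f:
        f.write(response.text)


if __name__ == "__main__":
    for url in sys.argv[1:]:
        try:
            fetch(url)
        except requests.RequestException as exc:
            # Report the error but keep processing the remaining URLs.
            print(f"error fetching {url}: {exc}", file=sys.stderr)
```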
**Section 2**
Record metadata about what was fetched:
- What was the date and time of the last fetch
- How many links are on the page
- How many images are on the page

Modify the script to print this metadata; one way to gather these counts is sketched below.
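A possible way to count the links and images is with BeautifulSoup (`bs4`, one of the repository's listed topics); the helper names and the treatment of `last_fetch` below are illustrative assumptions:

```python
# Illustrative metadata helpers; names and output format are assumptions.
from datetime import datetime, timezone

from bs4 import BeautifulSoup


def page_metadata(site: str, html: str) -> dict:
    """Count links and images in the fetched HTML and timestamp the fetch."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "site": site,
        "num_links": len(soup.find_all("a")),
        "images": len(soup.find_all("img")),
        # Timestamp of this fetch; persisting it (e.g. to a JSON file next to
        # the saved HTML) would let a later run report the previous fetch time.
        "last_fetch": datetime.now(timezone.utc).strftime("%a %b %d %Y %H:%M UTC"),
    }


def print_metadata(metadata: dict) -> None:
    for key, value in metadata.items():
        print(f"{key}: {value}")
```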
For example (it can work differently if you like):
```console
$> ./fetch --metadata https://www.google.com
site: www.google.com
num_links: 35
images: 3
last_fetch: Tue Mar 16 2021 15:46 UTC
```
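The `--metadata` flag above implies a little command-line parsing. One possible wiring uses the standard-library `argparse` module; the flag name comes from the example output, everything else is assumed:

```python
# Hypothetical CLI wiring for the --metadata flag, standard library only.
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="fetch")
    parser.add_argument("urls", nargs="+", help="one or more URLs to download")
    parser.add_argument(
        "--metadata",
        action="store_true",
        help="print the recorded metadata for each URL",
    )
    return parser.parse_args(argv)
```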
### Extra credit (only if you have time)

Archive all assets as well as the HTML so that you can load the web page locally with all assets loading properly.
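A rough sketch of one way this could work, again with `requests` and `bs4`; the asset-directory naming, the handled tags, and the collision-free filename assumption are all simplifications:

```python
# Sketch of the extra-credit idea: mirror page assets locally and rewrite
# references so the saved HTML renders offline. Names are assumptions.
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def archive(url: str, html: str) -> str:
    """Download referenced images/scripts/styles and point the HTML at local copies."""
    asset_dir = f"{urlparse(url).hostname}_files"
    os.makedirs(asset_dir, exist_ok=True)

    soup = BeautifulSoup(html, "html.parser")
    for tag, attr in (("img", "src"), ("script", "src"), ("link", "href")):
        for node in soup.find_all(tag):
            ref = node.get(attr)
            if not ref:
                continue
            asset_url = urljoin(url, ref)
            local_name = os.path.basename(urlparse(asset_url).path) or "index"
            local_path = os.path.join(asset_dir, local_name)
            try:
                data = requests.get(asset_url, timeout=30).content
            except requests.RequestException:
                continue  # leave the original reference if the asset can't be fetched
            with open(local_path, "wb") as f:
                f.write(data)
            node[attr] = local_path
    return str(soup)
```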