https://github.com/gitname/sediment-api

CSV file parser and web server for WHONDRS sediment data (exercise)
https://github.com/gitname/sediment-api
docker fastapi mongodb python
Last synced: 21 days ago
JSON representation
CSV file parser and web server for WHONDRS sediment data (exercise)
Host: GitHub
URL: https://github.com/gitname/sediment-api
Owner: gitname
Created: 2023-03-21T19:59:40.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-04-11T17:23:41.000Z (about 3 years ago)
Last Synced: 2026-01-14T08:39:03.022Z (3 months ago)
Topics: docker, fastapi, mongodb, python
Language: Python
Homepage:
Size: 96.7 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

README

          # Sediment API

![test](https://github.com/gitname/sediment-api/actions/workflows/test.yml/badge.svg)

> Welcome to the Sediment API repository! I wrote the code in this repository as part of an exercise.

> The exercise prompt was, in summary, to (a) write a Python script someone could use to extract data records

> from a CSV file and store them in a MongoDB database;

> and (b) create an HTTP API that someone could use to retrieve a data record in JSON format.

## Table of contents

* [Sediment API](#sediment-api)

  * [Table of contents](#table-of-contents)

  * [Overview](#overview)

  * [Usage](#usage)

    * [Exercise-specific](#exercise-specific)

    * [General](#general)

  * [Development](#development)

    * [Environment](#environment)

    * [Testing](#testing)

    * [Static type checking](#static-type-checking)

    * [Code formatting](#code-formatting)

    * [Dependencies](#dependencies)

  * [Roadmap](#roadmap)

## Overview

This repository contains Python scripts people can use to extract data from CSV files,

store that data in a MongoDB database,

and provide access to that data via an HTTP API.

The scripts are:

1. `parser/parser.py`: A file parser people can use to extract data from a CSV file and insert it into a database

2. `server/server.py`: A web server people can use to provide access to that data via an HTTP API

Here's a diagram showing how data flows into, between, and out of those scripts.

```mermaid

%% This is a flowchart written using Mermaid syntax.

%% GitHub will render it as an image.

%%

%% References: 

%% - https://mermaid.js.org/syntax/flowchart.html

%% - https://github.blog/2022-02-14-include-diagrams-markdown-files-mermaid/

flowchart LR

    parser[[parser.py]]

    db[(Database)]

    file[CSV File]

    client[HTTP Client]

    server[[server.py]]

    parser --> db

    db --> server

    subgraph File Parser

        parser

    end

    file -. CSV .-> parser

    subgraph Web Server

        server

    end

    server -. JSON .-> client

```

## Usage

### Exercise-specific

Here's how you can produce the behavior described in the exercise prompt.

1. Install [Docker](https://www.docker.com/) onto your computer.

2. Clone (or download and extract) this repository onto your computer.

3. Open a console in the root folder of the repository.

4. Copy the example config file and name the copy "`.env`".

   ```shell

   cp .env.example .env

   ```

5. Start the web server and MongoDB (in Docker containers).

   ```shell

   docker-compose up

   ```

   > **Note:** That command will run the containers in the foreground, taking over your console. You can open a new

   > console to issue the remaining commands.

6. Run the parser (in the `app` container).

   ```shell

   docker exec -it app python parser/parser.py parser/example_data/WHONDRS_S19S_Sediment_GrainSize.csv

   ```

7. In a web browser, visit http://localhost:8000/samples/S19S_0001_BULK-D

   - The web browser will show a sample in JSON format.

### General

Here's how you can use the system in general.

1. Do steps 1-5 shown in the "Exercise-specific" section above.

2. (Optional) Put a CSV file you want to parse, anywhere within the repository's file tree.

   > **Note:** All files within the repository's file tree are accessible within the `app` container

   > (within the `app` container, the root folder of the repository is located at `/code`).

3. Run the parser, specifying the path to the CSV file you want to parse.

   ```shell

   # Specify the path as it would be specified within the `app` container.

   docker exec -it app python parser/parser.py 

   ```

   > **Note:** You can specify the path as either an absolute path, using `/code` to refer to the root folder of the

   > repository (e.g. `/code/path/to/file.csv`); or a relative path, relative to the root folder of the repository

   > (e.g. `./path/to/file.csv`).

4. Submit an HTTP GET request to a URL having the format: `http://localhost:8000/samples/`

5. (Optional) Visit the **interactive API documentation** at http://localhost:8000/docs

You can also **run tests**, **perform static type checking**, and **format the code**.

Instructions for doing those things are in the "Development" section below.

## Development

> **Note:** You can issue all the commands shown in this section from the root folder of the repository.

### Environment

This repository contains a Docker-based development environment.

You can configure the development environment (and the Python scripts) by copying the `.env.example` file

and naming it `.env`.

```shell

cp .env.example .env

```

> **Note:** The default values in `.env.example` are adequate for running the Python scripts in

> the development environment.

You can then instantiate the development environment by issuing the following command:

```shell

docker-compose up

# Or, if you've made changes to the Dockerfile or to `requirements.txt`:

docker-compose up --build

```

> **Note:** That will cause Docker to instantiate a container for each service described in `docker-compose.yml`.

> - The `mongo` container will automatically start running MongoDB.

> - The `app` container, which has all the Python scripts' dependencies installed,

>   will automatically start running the web server.

With the development environment up and running, you can access a `bash` shell running on the `app` container

by issuing the following command:

```shell

docker exec -it app bash

```

### Testing

The tests in this repository were written using [pytest](https://docs.pytest.org/), a Python test framework and test

runner.

With the development environment up and running, you can **run all the tests** in the repository by issuing the

following command:

```shell

# From the `app` container:

pytest

# Or, from the Docker host:

docker exec -it app pytest

```

> **Note:** You can invoke `pytest` with the `-v` option to see a list of the tests that were run.

In addition, you can use the tool, [coverage](https://coverage.readthedocs.io/), to measure code coverage while running

the tests—and to subsequently display a code coverage report—by issuing the following command:

```shell

# From the `app` container:

coverage run -m pytest && coverage report

# Or, from the Docker host:

docker exec -it app bash -c "coverage run -m pytest && coverage report"

```

> **Note:** You can invoke `coverage report` with the `-m` option (as in, `coverage report -m`) to see which lines of

> code were "missed" (i.e. not executed).

### Static type checking

You can use [mypy](https://mypy.readthedocs.io/en/latest/) to perform static type checking on the Python code in this

repository.

With the development environment up and running, you can **perform static type checking** by issuing the following

command:

```shell

# From the `app` container:

mypy

# Or, from the Docker host:

docker exec -it app mypy

```

> **Note:** When you run `mypy` as shown above, it will run according to the configuration specified in `mypy.ini`.

### Code formatting

The Python code in this repository is formatted using [Black](https://black.readthedocs.io/en/stable/), which is

an "[opinionated](https://black.readthedocs.io/en/stable/the_black_code_style/index.html)"—but still PEP 8-compliant—code

formatter.

With the development environment up and running, you can **format all the Python code** in the repository by issuing the

following command:

```shell

# From the `app` container:

black .

# Or, from the Docker host:

docker exec -it app black .

```

### Dependencies

I wrote the Python scripts in this repository using Python 3.10.

The `requirements.txt` file contains a list of all the dependencies of the Python scripts in this repository.

I generated the file by issuing the following command:

```shell

# From the `app` container:

pip freeze > requirements.txt

# Or, from the Docker host:

docker exec -it app pip freeze > requirements.txt

```

The table below contains the names of all the packages I explicitly installed via `pip install `:

| Name                | Description                    | I use it to...                  | References                                                     |

|---------------------|--------------------------------|---------------------------------|----------------------------------------------------------------|

| `black`             | Code formatter                 | Format Python code              | [Documentation](https://black.readthedocs.io/en/stable)        |

| `coverage`          | Code coverage measurement tool | Measure test coverage           | [Documentation](https://coverage.readthedocs.io/)                 |

| `fastapi`           | HTTP API framework             | Process HTTP requests           | [Documentation](https://fastapi.tiangolo.com/)                 |

| `httpx`             | HTTP client                    | Submit HTTP requests (in tests) | [Documentation](https://www.python-httpx.org/)                 |

| `mypy`              | Static type checker            | Verify data type consistency    | [Documentation](https://mypy.readthedocs.io/en/latest/)        |

| `pymongo`           | Synchronous MongoDB driver     | Interact with the database      | [Documentation](https://www.mongodb.com/docs/drivers/pymongo/) |

| `pytest`            | Test framework                 | Run the tests                   | [Documentation](https://docs.pytest.org/en/7.2.x/)             |

| `python-dotenv`     | Configuration loader           | Read the `.env` file            | [Documentation](https://pypi.org/project/python-dotenv/)       |

| `typer[all]`        | CLI framework                  | Process CLI input and output    | [Documentation](https://typer.tiangolo.com/)                   |

| `uvicorn[standard]` | ASGI web server                | Serve the FastAPI app           | [Documentation](https://www.uvicorn.org/)                      |

> **Note:** Packages listed in `requirements.txt` that are not listed above, are packages that were automatically

> installed by `pip` when I installed the packages listed above. In other words, they are "dependencies of

> dependencies" (i.e. dependencies of the packages listed above).

## Roadmap

Here are some additional things I'm thinking about doing in this repository:

1. Create a Pydantic model representing the "sample" object and use it to

   (a) [validate and sanitize](https://docs.pydantic.dev/usage/validators/) the data extracted from the CSV file

   (e.g. `"-9999" → None`); (b) display the API response's

   [JSON schema](https://fastapi.tiangolo.com/tutorial/response-model/#see-it-in-the-docs) in the API docs and

   (c) [filter out](https://fastapi.tiangolo.com/tutorial/response-model/#fastapi-data-filtering)

   the `_id` field from the API response. Item (a) would happen in `parser.py` and items (b) and (c) would happen

   in `server.py`. Items (a) and (c) are already happening, but not via a Pydantic model.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/gitname/sediment-api

Awesome Lists containing this project

README