https://github.com/kashirin-alex/data-engineer-interview
- Host: GitHub
- URL: https://github.com/kashirin-alex/data-engineer-interview
- Owner: kashirin-alex
- Created: 2021-12-18T16:52:15.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-12-18T17:03:31.000Z (over 3 years ago)
- Last Synced: 2025-02-12T18:53:35.906Z (2 months ago)
- Language: Python
- Size: 3.91 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Data-Engineer-interview
This is a minimal, non-persistent analyzer (it analyzes a single log file; no cross-logfile data is retained). \
Ideally it would run with database support, receiving the logs (CSV data) from requests to a queue service.
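A minimal sketch of that intake path, using an in-process `queue.Queue` as a stand-in for the real queue service (the names here are illustrative, not taken from the repository):

```python
import pathlib
import queue

# Stand-in for the queue service; in production this would be a broker client.
log_queue = queue.Queue()

def consume_one(target: pathlib.Path = pathlib.Path("output.csv")) -> None:
    # Block until a CSV payload arrives, then write it where
    # log_alerts.py expects its input file.
    csv_payload = log_queue.get()
    target.write_text(csv_payload)
    log_queue.task_done()
```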
1) The `log_alerts.py` script expects the logfile `output.csv` on its path
2) The logfile is renamed while it is being processed and, once processing is done, renamed again with a timestamp
3) Alerts can be defined in the `alerts` dict (see the sketch after this list)
* `distinct` - defines the key whose values are counted
* `by` - defines the field-to-value match that filters the rows
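
A minimal sketch of steps 1-3, assuming illustrative alert fields (`duration`, `count`) and CSV column names that are not taken from the repository's code:

```python
import csv
import os
import time

# Illustrative alert definition; the exact schema in log_alerts.py may differ.
alerts = {
    "too_many_errors": {
        "distinct": "request_id",   # key whose distinct values are counted
        "by": {"status": "500"},    # field-to-value match filtering the rows
        "duration": 60,             # seconds to aggregate over (assumed field)
        "count": 10,                # alert threshold (assumed field)
    },
}

def process_logfile(path: str = "output.csv") -> None:
    # Step 2: rename while processing, so a freshly written output.csv
    # is never read half-way through.
    working = path + ".processing"
    os.rename(path, working)
    with open(working, newline="") as fd:
        for row in csv.DictReader(fd):
            for alert in alerts.values():
                if all(row.get(f) == v for f, v in alert["by"].items()):
                    pass  # track row[alert["distinct"]] against alert["count"]
    # Step 2, second half: keep the processed file under a timestamped name.
    os.rename(working, f"{path}.{int(time.time())}")
```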
---
#### Database support (for the case of using [SWC-DB](https://www.swcdb.org))
1) define the alerts
2) insert/index by key=[rounded(ts/duration), distinct value/s, ..] with value=+1, into a COUNTER column whose TTL equals the alert duration \
(expired log cells are cleared automatically once the alert duration passes)
* there is no more need for the `self.tracker` object;
just iterate the CSV and update the cells per alert duration and distinct kind
3) select the cells with COUNTER >= alert-count (see the sketch below)
* size and tracked durations would no longer cause the worker host to consume more resources
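
A sketch of the keying scheme from the steps above, using a plain dict in place of an SWC-DB COUNTER column with a TTL; `db_insert`/`db_select` are hypothetical stand-ins, not SWC-DB client calls, and TTL expiry is not modeled:

```python
import csv

# Stand-in for the counter column; a real setup would use the SWC-DB client
# and a COUNTER column whose TTL equals the alert duration.
counters: dict[tuple, int] = {}

def db_insert(key: tuple) -> None:
    # Step 2: key = (rounded(ts / duration), *distinct values); value = +1.
    counters[key] = counters.get(key, 0) + 1

def db_select(threshold: int) -> dict:
    # Step 3: the cells whose COUNTER reached the alert count.
    return {key: count for key, count in counters.items() if count >= threshold}

def index_logfile(path: str, duration: int, distinct_fields: list[str]) -> None:
    # No self.tracker needed: just iterate the CSV and update the cells.
    with open(path, newline="") as fd:
        for row in csv.DictReader(fd):
            ts = float(row["timestamp"])  # column name assumed for illustration
            key = (int(ts // duration), *(row[f] for f in distinct_fields))
            db_insert(key)
```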