https://github.com/calcuis/structural-data-auditing-model

Last synced: 3 months ago
JSON representation

Host: GitHub
URL: https://github.com/calcuis/structural-data-auditing-model
Owner: calcuis
Created: 2024-08-09T02:18:49.000Z (10 months ago)
Default Branch: main
Last Pushed: 2024-08-21T23:03:11.000Z (10 months ago)
Last Synced: 2025-01-21T13:11:33.052Z (5 months ago)
Language: Python
Size: 38.1 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

#### Model-2 (basic level)
- Read the content of `file1.csv` and `file2.csv`
- Calculate the `Required Sum` of the `Score` field from `file1.csv`
- Iterate through the rows of `file2.csv`, accumulating rows until the sum of their `Score` fields matches the `Required Sum`
- Save the selected rows to `results.csv` as final output

test it out by
```
py model2.py
```
#### Model-5 (advanced level)
- Calculating `Required Sum`: the sum of non-null `Score` values from `file1.csv` is calculated
- Initialization: variables `current_sum` and `output_rows` are initialized to track the running sum of scores and store matching rows, respectively
- Auditing Logic: iterate through `file2.csv`, selecting rows where the `Score` matches and the running total remains within the `Required Sum` (adjustment will be made if necessary); if a `Score` in `file1.csv` is null, the corresponding row in `file2.csv` is skipped (null handler); if the running total reaches the `Required Sum`, the loop stops (or it will stop until all the rows are scanned)
- The selected rows are saved to `results.csv`, ensuring the sum of the `Score` field matches the sum in `file1.csv`, and the `Time` field sequence follows `file2.csv`

test it out by
```
py model5.py
```
#### Model-9b (unstable loop)
- Loop until the `current_sum` reach the `required_sum`
- Reference model only right away (not for production)

test it out by
```
py model9b.py
```
#### Coexisting record detector
- Compare each record of `file1.csv` and `file2.csv`
- Coexist column is created using a lambda function that assigns 1 to each field if a record exists in both files, and null if not
- Save the output as `results.csv`

test it out by
```
py detector.py
```
#### Unmatched record detector
- Scan all csv files within the current directory
- List out those unmatched record(s) with source info

test it out by
```
py unmatched.py
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/calcuis/structural-data-auditing-model

Awesome Lists containing this project

README