https://github.com/ans-4175/sstest

playing with python

Host: GitHub
URL: https://github.com/ans-4175/sstest
Owner: ans-4175
Created: 2016-02-17T18:15:15.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2016-02-17T18:37:59.000Z (over 9 years ago)
Last Synced: 2024-10-12T10:03:50.913Z (9 months ago)
Language: Python
Size: 5.86 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# README
## Problem 1
Run `python prob1.py [filename]`. It produces an output file named `[filename].out`.
> If you need a seed file, run `python seeding1.py [filename] [number_of_lines]`.

For example:
```
python seeding1.py age.jkt 10000000
python prob1.py age.jkt
```
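
The seeding scripts are described only by their arguments. As a rough illustration of that `[filename] [number_of_lines]` contract, here is a minimal sketch of what a Problem 1 seeder could look like. It assumes (guessing from the file name `age.jkt`) that each line holds one random integer age between 1 and 100; that line format, the age range, and the `seed` function name are hypothetical and not taken from the actual `seeding1.py`.

```python
import random
import sys


def seed(filename, number_of_lines, seed_value=None):
    """Write `number_of_lines` random ages to `filename`, one per line.

    Hypothetical format: one integer age in 1..100 per line. The real
    seeding1.py may write something different.
    """
    rng = random.Random(seed_value)  # optional fixed seed for reproducible files
    with open(filename, "w") as out:
        # File writes are buffered, so writing line by line stays fast
        # even for 10,000,000 lines and never holds the whole file in memory.
        for _ in range(number_of_lines):
            out.write(f"{rng.randint(1, 100)}\n")


if __name__ == "__main__":
    # Usage mirrors the README: python seeding1.py [filename] [number_of_lines]
    if len(sys.argv) == 3:
        seed(sys.argv[1], int(sys.argv[2]))
    else:
        print("usage: python seeding1.py [filename] [number_of_lines]")
```

Passing `seed_value` makes two runs produce identical files, which helps when comparing timings of `prob1.py` across changes.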

## Problem 2
I believe my script solves it, but it may take more time. I would like to split the work wisely across CPU cores and find suitable tuning: how many chunks to split the input file into, and how many parallel processes to run.
> Based on my benchmark and my clumsy use of multiprocessing (my first time using Python's `Pool`): on my laptop (an i5 with 8 GB of memory, using as many pool processes as CPU cores and splitting the input into 1024 × the number of cores chunks), processing 7 million entries needs about 2 GB of memory. That is probably my own fault for not patiently trying better tuning or making better use of multiprocessing.
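
The tuning described above (one `Pool` process per CPU core, input split into 1024 × cores chunks) can be sketched roughly as follows. The README does not say what Problem 2 computes, so the per-chunk work here, counting how often each line value occurs, is a hypothetical stand-in, and `chunk_ranges`, `process_chunk`, and `count_values` are illustrative names rather than the author's code. Workers receive `(path, start, end)` byte ranges and read their own slice, so neither the parent nor the task queue holds the whole input at once.

```python
import os
import sys
from collections import Counter
from multiprocessing import Pool


def chunk_ranges(path, n_chunks):
    """Split `path` into at most `n_chunks` byte ranges that end on line boundaries."""
    size = os.path.getsize(path)
    step = max(1, -(-size // n_chunks))  # ceiling division
    ranges = []
    with open(path, "rb") as f:
        start = 0
        while start < size:
            end = min(start + step, size)
            if end < size:
                f.seek(end)
                f.readline()  # move `end` forward to the end of the current line
                end = f.tell()
            ranges.append((path, start, end))
            start = end
    return ranges


def process_chunk(task):
    """Hypothetical per-chunk work: count occurrences of each line value."""
    path, start, end = task
    counts = Counter()
    with open(path, "rb") as f:
        f.seek(start)
        for line in f.read(end - start).splitlines():
            counts[line.strip()] += 1
    return counts


def count_values(path, workers=None, chunks_per_worker=1024):
    """Count line values in `path`, using one process per core by default."""
    workers = workers or os.cpu_count() or 1
    ranges = chunk_ranges(path, chunks_per_worker * workers)
    total = Counter()
    if workers == 1:
        # Serial path: same chunking, no subprocesses (useful for testing).
        for partial in map(process_chunk, ranges):
            total.update(partial)
        return total
    with Pool(processes=workers) as pool:
        # imap_unordered hands back small partial Counters as workers finish,
        # so the parent merges them incrementally instead of collecting a big list.
        for partial in pool.imap_unordered(process_chunk, ranges):
            total.update(partial)
    return total


if __name__ == "__main__":
    # Required when workers are started with "spawn" (default on Windows/macOS).
    if len(sys.argv) == 2:
        for value, n in count_values(sys.argv[1]).most_common(5):
            print(value.decode(errors="replace"), n)
```

`workers` and `chunks_per_worker` are the two knobs the paragraph above wants to tune, so they are parameters here rather than constants. Sending file offsets instead of lists of lines is one possible way to keep memory well below the ~2 GB observed above, though how much it helps depends on what the real per-chunk work keeps in memory.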

## Problem 3
Run `python prob3.py [filename] [name_to_find] [phone_to_find]`. It prints `True` or `False` to the console, depending on whether an entry matching the given name and phone exists in the file.
> If you need a seed file, run `python seeding3.py [filename] [number_of_lines]`.

For example:
```
python seeding3.py buyers.jkt 10000000
python prob3.py buyers.jkt namasiapa 14045
```
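
A single lookup like the one above can be done with a plain streaming scan. This is a minimal sketch, not the author's `prob3.py`: it assumes a hypothetical `buyers.jkt` layout of one `name,phone` pair per line (the real `seeding3.py` format is not documented here), and `exists` is an illustrative name.

```python
import sys


def exists(path, name, phone, sep=","):
    """Return True if some line of `path` holds exactly this name and phone.

    Assumes the hypothetical layout "name<sep>phone" on each line.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:  # streams one line at a time; memory stays flat
            # rpartition splits on the last separator, so a name containing
            # the separator still keeps the phone as the final field.
            line_name, found, line_phone = line.rstrip("\r\n").rpartition(sep)
            if found and line_name == name and line_phone == phone:
                return True  # stop at the first match
    return False


if __name__ == "__main__":
    # Mirrors the README: python prob3.py [filename] [name_to_find] [phone_to_find]
    if len(sys.argv) == 4:
        print(exists(sys.argv[1], sys.argv[2], sys.argv[3]))
    else:
        print("usage: python prob3.py [filename] [name_to_find] [phone_to_find]")
```

For one query, a linear scan over 10 million lines needs no extra memory and stops at the first match. If many lookups ran against the same file, loading the `(name, phone)` pairs into a `set` once would make each lookup constant time on average, at the cost of holding the whole file's pairs in memory.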

ecosyste.ms
