Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/denver-code/for_test_python
Speed while generating id from uuid4x4
https://github.com/denver-code/for_test_python
c for generating loops python speed timepy uuid4
Last synced: 3 days ago
JSON representation
Speed while generating id from uuid4x4
- Host: GitHub
- URL: https://github.com/denver-code/for_test_python
- Owner: denver-code
- Created: 2022-07-30T15:54:18.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-07-30T17:27:13.000Z (over 2 years ago)
- Last Synced: 2023-03-05T14:17:06.031Z (almost 2 years ago)
- Topics: c, for, generating, loops, python, speed, timepy, uuid4
- Language: Python
- Homepage:
- Size: 38.1 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Speed while generating id from uuid4x4
### Description:
A simple script for checking speed and making predictions for large numbers of IDs out of 4 UUID4s.
There are several loops in the file, the first three are just a speed test of the standard for loop.
from 4 to 8 - test cycles, in which I tested different generation methods to choose the fastest and most correct solution for me.# How to run:
## Use terminal or cmd to run this project.
```bash
$ git clone https://github.com/denver-code/for_test_python
$ cd for_test_python
$ python -m venv .venv
Linux:
$ source .venv/bin/activate
Windows:
$ .venv\scripts\activate.bat
$ python main.py
```## Execution example:
![Result image](result_main.png)# Checking
## Checking forecasts with a real figure:We will use for the projected forecast - the picture above, there is all the approximate time, which is built from a different number of necessary IDs.
## Step 1
In the first line, we see the value of the need_items variable, this is 10. an example of the ID I need is:
```cmd
8e6ca7e8-bc09-4f1f-9313-c2fc10aee4d-231903b8-191f-4fe4-b6b9-27b76aab7a93-87e4d3b1-d3b1-4feb-a9c9-eff58b407 -45b3-8a6e-6caeffde97db
```
Its length is 147 characters, among them - 128 characters (letters and numbers). That is, 36 possible options for 1 position, there are 128 such positions.
We recall the 11th grade of the school, or rather, combinatorics and constellations, we make the equation 128 ^ 36, and we get a lot of options for IDs.
With a large number of ids already ready, there may be problems with generation, but this can be fixed by calling the generator function again (the execution time will increase noticeably.) approximately 7.
```python
237005577332262213973186563043e+75
```
or```python
94.672220753319050022924562134291
```
I don’t know how much it is exactly, but it’s a lot if the number of 128 character IDs is only from numbers - this is:
```python
1180591620717411303424(128^10)
```
possible combinations, then there are even more).
The upper bound of int32 is 2,147,483,647.
What is noticeably more than our number of combinations, bigint, int64 or other data structures can fix it.
## Step 2
Now you can observe the real execution time of each value from the screenshot, I repeat, it is approximate there.### Need items = 10
Execution time:
```python
#loop9: 0.00024s
```
Approximate time from example: `0.00024s`### Need items = 20
Execution time:
```python
#loop9: 0.00062s
```
Approximate time from example: `0.00053s`### Need items = 100
Execution time:
```python
#loop9: 0.00331s
```
Approximate time from example: `0.00359s`### Need items = 1000
Execution time:
```python
#loop9: 0.03314s
```
Approximate time from example: `0.03587s`### Need items = 10000
Execution time:
```python
#loop9: 1.20738s
```
Approximate time from example: `1.07595s`## Checking conclusion
On each computer, the result may be different, it depends on the characteristics.
Usually, even on the same computer, you have to repeat the same process several times to get the (rounded up, it will never be the same) answer as the last time.
Because of this, it is impossible to predict the exact execution time for the following values from the approximate statistics.
so when running the speed test on 10, it will take us `0.00024` seconds, but this time changes with each run, due to rounding, we can achieve a repetition of "exactly" (`0.0024`), but if we do not round, the result is different.
So from the statistics we can take for the number 20 - in the example it is `0.00053`, in the real test of the number 20 it is `0.00062`, that is, the difference is small, but it is present.
Predicting accurately - I didn't succeed, because I selected the coefficients based on the analysis of past launches of a certain number of IDs. It is different for each Hx increase.
However, we have an approximate time, we can imagine how long it will take to generate 1000 IDs.
At 10,000, we have a slightly larger deviation, due to the large number of IDs, compared to the small number of the original 10 IDs.
### The higher the number, the more difficult it is to predict even the approximate time. Only by analysis and selection of the coefficient.# Loops source code
need_items = 10
## loop1
Basic loop iterating current position.
```python
def loop1() -> int:
result = 0
for num in range(need_items):
result += num
return result
```## loop2
Basic loop iterating only one.
```python
def loop2() -> int:
result = 0
for num in range(need_items):
result += 1
return result
```## loop3
Basic loop with iteration of one, and output to the console (time slows down noticeably)
```python
def loop3() -> int:
result = 0
for num in range(need_items):
result += 1
print(f"loop3 printed {result=}", end='\r')
print(end="\n")
return result
```## loop4
the first code of the generator, which has a nested function for creating an ID, in which a list is formed, and uuid4 is filled 4 times from the for loop.
```python
def loop4() -> int:db = []
def generate_id():
_uuid_list = []for i in range(1, 5):
_uuid_list.append(str(uuid4()))_id = "-".join(_uuid_list)
if _id not in db:
return _id
return generate_id()for num in range(need_items):
db.append(generate_id())return len(db)
```## loop5
Everything is the same as in the 4th loop, but instead of a loop - 4 lines of adding uuid4.
```python
def loop5() -> int:db = []
def generate_id():
_uuid_list = []_uuid_list.append(str(uuid4()))
_uuid_list.append(str(uuid4()))
_uuid_list.append(str(uuid4()))
_uuid_list.append(str(uuid4()))_id = "-".join(_uuid_list)
if _id not in db:
return _id
return generate_id()for num in range(need_items):
db.append(generate_id())return len(db)
```## loop6
Everything is the same as in the 5th loop, but instead of adding lines, and for loop - elements are added directly from the list during initialization.
```python
def loop6() -> int:db = []
def generate_id():
_uuid_list = [
str(uuid4()),
str(uuid4()),
str(uuid4()),
str(uuid4())
]_id = "-".join(_uuid_list)
if _id not in db:
return _id
return generate_id()for num in range(need_items):
db.append(generate_id())return len(db)
```## loop7
We get rid of the built-in generation function, this reduces the time very noticeably. We also remove the initialization of the list in a separate anonymous variable. And add a list with an attachment of IDs.
And also remove the check of whether the element is in the list.
```python
def loop7() -> int:
db = []for num in range(need_items):
_id = "-".join([
str(uuid4()),
str(uuid4()),
str(uuid4()),
str(uuid4())
])db.append(_id)
return len(db)
```## loop8
Exactly identical to option 7 - but already fixing, and designed to check the time with 7 and 8 loops.
In the example for 7 it is `0.00012` and for 8 it is `0.00022` seconds, the difference is almost 2 times, and here we can understand why it is impossible to accurately predict, calculate what time will be for a larger number of elements, even if we have the same number of elements different results, with a difference of 2 times, the deviation is very large.
```python
def loop8() -> int:
db = []for num in range(need_items):
_id = "-".join([
str(uuid4()),
str(uuid4()),
str(uuid4()),
str(uuid4())
])db.append(_id)
return len(db)
```## loop9 - The final version on which statistics are generated.
Difference from 7 and 8 loop - added check if there is already an id in the list, and added a print to see if there will be a repeated element during generation.
On small counts, it is unlikely that there will be matches, and the result will be like nid_items.
But on large numbers, repetitions are already possible during generation, and we will be able to find out how many there were.
```python
def loop9() -> int:
db = []for num in range(need_items):
_id = "-".join([
str(uuid4()),
str(uuid4()),
str(uuid4()),
str(uuid4())
])if _id not in db:
db.append(_id)
else:
continue
print(f"nine loop db count: {len(db)}")
return len(db)
```
# Generate analysis and selection of the coefficient.
I use a multiplication factor a little more than 2, 10, 100, 1000 - due to the fact that it did not give the correct result.
I end up using:
rounding to 5 decimal places.
to increase by 2 times - coefficient 2.2
to 10 times - 15.
to 100 - 150.
so that in 1000 - 4500.
As a result, it works approximately correctly. But it can work worse for large numbers.
```python
def loop9() -> int:
db = []for num in range(need_items):
_id = "-".join([
str(uuid4()),
str(uuid4()),
str(uuid4()),
str(uuid4())
])if _id not in db:
db.append(_id)
else:
continue
print(f"nine loop db count: {len(db)}")
return len(db)loop9_time = timeit.timeit(loop9, number=1)
print(f"loop9: {round(loop9_time, 5)}s")
print(
f"loop9 {need_items}*2={need_items*2} about time: {round(loop9_time*2.2, 5)}s")
print(
f"loop9 {need_items}*10={need_items*10} about time: {round(loop9_time*15, 5)}s")
print(
f"loop9 {need_items}*100={need_items*100} about time: {round(loop9_time*150, 5)}s")
print(f"loop9 {need_items}*1000={need_items*1000} about time: {round(loop9_time*4500, 5)}s")
```