https://github.com/scrapfly/fingerprint-generator
Browser fingerprint data generator
Last synced: 9 months ago
- Host: GitHub
- URL: https://github.com/scrapfly/fingerprint-generator
- Owner: scrapfly
- License: apache-2.0
- Created: 2025-02-18T02:06:19.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-04-02T07:40:06.000Z (9 months ago)
- Last Synced: 2025-04-06T06:53:52.686Z (9 months ago)
- Language: Python
- Homepage: https://pypi.org/project/fpgen
- Size: 118 KB
- Stars: 46
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Fingerprint Generator
A fast browser data generator that mimics actual traffic patterns in the wild, with extensive data coverage.
Created by daijro. Data provided by Scrapfly.
---
## Features
- Uses a Bayesian generative network to mimic real-world web traffic patterns
- Extensive data coverage for **nearly all known** browser data points
- Creates complete fingerprints in a few milliseconds ⚡
- Easily specify custom criteria for any data point (e.g. "only Windows + Chrome, with Intel GPUs")
- Simple for humans to use 🚀
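
To give a feel for what "Bayesian generative network" means here, the following is a minimal sketch of ancestral sampling over two variables. It is not fpgen's model or code: the table names `P_OS` and `P_BROWSER_GIVEN_OS`, the helper functions, and every probability are invented for illustration.

```python
import random

# Toy conditional probability tables. These numbers are invented for
# illustration and are NOT taken from fpgen's model.
P_OS = {"Windows": 0.7, "MacOS": 0.3}
P_BROWSER_GIVEN_OS = {
    "Windows": {"Chrome": 0.8, "Firefox": 0.2},
    "MacOS": {"Chrome": 0.6, "Safari": 0.4},
}


def sample(dist, rng):
    """Draw one key from a {value: probability} mapping."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def sample_fingerprint(rng, os=None):
    """Sample the parent node (os) first, then the child conditioned on it."""
    chosen_os = os if os is not None else sample(P_OS, rng)
    browser = sample(P_BROWSER_GIVEN_OS[chosen_os], rng)
    return {"os": chosen_os, "browser": browser}


rng = random.Random(0)
print(sample_fingerprint(rng))              # unconstrained draw
print(sample_fingerprint(rng, os="MacOS"))  # draw with the OS fixed
```

Because each child is drawn from a table conditioned on its parent, a sampled combination such as Safari on Windows cannot occur in this toy network, which is the property that keeps generated data internally consistent.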
## Demo Video
Here is a demonstration of what fpgen generates & its ability to filter data points:
https://github.com/user-attachments/assets/5c56691a-5804-4007-b179-0bae7069a111
---
# Installation
Install the package using pip:
```bash
pip install fpgen
```
### Downloading the model
Fetch the latest model:
```bash
fpgen fetch
```
This runs automatically on the first import, and again every 5 weeks.
To decompress the model for faster generation (_up to 10-50x faster!_), run:
```bash
fpgen decompress
```
Note: Decompressing uses an additional 100 MB+ of storage.
CLI Usage
```
Usage: python -m fpgen [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  decompress  Decompress model files for speed efficiency (will take 100mb+)
  fetch       Fetch the latest model from GitHub
  recompress  Compress model files after running decompress
  remove      Remove all downloaded and/or extracted model files
```
---
# Usage
### Generate a fingerprint
Simple usage:
```python
>>> import fpgen
>>> fpgen.generate(browser='Chrome', os='Windows')
```
Or use the Generator object to pass filters downward:
```python
>>> gen = fpgen.Generator(browser='Chrome') # Filter by Chrome
>>> gen.generate(os='Windows') # Generate Windows & Chrome fingerprints
```
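The inheritance described above can be pictured as a dictionary merge. The class below is a hypothetical stand-in that only tracks conditions; it is not `fpgen.Generator`, and the rule that call-time values override constructor values is an assumption.

```python
class ToyGenerator:
    """Hypothetical stand-in that only tracks conditions (not fpgen.Generator)."""

    def __init__(self, conditions=None, **conditions_kwargs):
        # Conditions may arrive as a dict, as keyword arguments, or both.
        self.conditions = {**(conditions or {}), **conditions_kwargs}

    def resolve(self, conditions=None, **conditions_kwargs):
        """Return the conditions a generate() call would end up using."""
        # Assumption: later sources override earlier ones.
        return {**self.conditions, **(conditions or {}), **conditions_kwargs}


gen = ToyGenerator(browser="Chrome")
print(gen.resolve(os="Windows"))  # {'browser': 'Chrome', 'os': 'Windows'}
```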
Parameters list
```
Initializes the Generator with the given options.
Values passed to the Generator object will be inherited when calling Generator.generate()

Parameters:
    conditions (dict, optional): Conditions for the generated fingerprint.
    window_bounds (WindowBounds, optional): Constrain the output window size.
    strict (bool, optional): Whether to raise an exception if the conditions are too strict.
    flatten (bool, optional): Whether to flatten the output dictionary
    target (Optional[Union[str, StrContainer]]): Only generate specific value(s)
    **conditions_kwargs: Conditions for the generated fingerprint (passed as kwargs)
```
[See example output.](https://raw.githubusercontent.com/scrapfly/fingerprint-generator/refs/heads/main/assets/example-output.json)
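
As a rough picture of what the `flatten` option implies, here is a sketch that collapses a nested dictionary into a single level. The dot-joined key format is an assumption borrowed from the dot-separated paths used elsewhere in this README, and the helper name `flatten_dict` and the sample data are hypothetical.

```python
def flatten_dict(nested, prefix=""):
    """Collapse nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the path so far.
            flat.update(flatten_dict(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical sample data, not real fpgen output.
sample = {"client": {"browser": {"major": "131"}}, "os": "Windows"}
print(flatten_dict(sample))  # {'client.browser.major': '131', 'os': 'Windows'}
```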
---
## Filtering the output
### Setting fingerprint criteria
You can narrow down generated fingerprints by specifying filters for **any** data field.
```python
# Only generate fingerprints with Windows, Chrome, and Intel GPU:
>>> fpgen.generate(
... os='Windows',
... browser='Chrome',
... gpu={'vendor': 'Google Inc. (Intel)'}
... )
```
This can also be passed as a dictionary.
```python
>>> fpgen.generate({
... 'os': 'Windows',
... 'browser': 'Chrome',
... 'gpu': {'vendor': 'Google Inc. (Intel)'},
... })
```
### Multiple constraints
To let the generator choose among several allowed values for a field, pass them as a tuple.
```python
>>> fpgen.generate({
... 'os': ('Windows', 'MacOS'),
... 'browser': ('Firefox', 'Chrome'),
... })
```
If you are passing many nested constraints, run `fpgen decompress` to improve model performance.
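
The semantics of tuple constraints can be sketched as a membership test over candidate records. This is only an illustration of the matching rule, with invented sample records and hypothetical helper names; fpgen conditions its network on the constraints rather than filtering a list of candidates.

```python
def matches(value, constraint):
    """True if value satisfies one constraint (a scalar or a tuple of options)."""
    if isinstance(constraint, tuple):
        return value in constraint
    return value == constraint


def satisfies(record, conditions):
    """True if every constrained field of the record matches."""
    return all(matches(record.get(key), c) for key, c in conditions.items())


# Invented candidate records for illustration.
candidates = [
    {"os": "Windows", "browser": "Chrome"},
    {"os": "Linux", "browser": "Firefox"},
    {"os": "MacOS", "browser": "Safari"},
    {"os": "MacOS", "browser": "Firefox"},
]
conditions = {"os": ("Windows", "MacOS"), "browser": ("Firefox", "Chrome")}
print([c for c in candidates if satisfies(c, conditions)])
```

Only the Windows/Chrome and MacOS/Firefox records pass, since each field must match one of its allowed values.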
## Custom filters
Data can be filtered by passing in callable functions.
### Examples
Set the minimum browser version:
```python
# Constrain client:
>>> fpgen.generate(client={'browser': {'major': lambda ver: int(ver) >= 130}})
# Or, just pass a dot-separated path to client.browser.major:
>>> fpgen.generate({'client.browser.major': lambda ver: int(ver) >= 130})
```
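The two calls above express the same constraint. One way to picture the equivalence is a helper that expands a dot-separated path into the nested form. The function `expand_path` is hypothetical and is not part of fpgen.

```python
def expand_path(path, value):
    """Turn 'a.b.c' plus a value into {'a': {'b': {'c': value}}}."""
    nested = value
    # Wrap from the innermost key outward.
    for key in reversed(path.split(".")):
        nested = {key: nested}
    return nested


def min_version(ver):
    """Example predicate: accept major versions 130 and above."""
    return int(ver) >= 130


print(expand_path("client.browser.major", min_version))
```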
Only allow NVIDIA GPUs:
```python
# Note: Strings are lowercased before they're passed.
>>> fpgen.generate(gpu={'vendor': lambda vdr: 'nvidia' in vdr})
```
Limit the maximum/minimum window size:
```python
# Set allowed ranges for outerWidth & outerHeight:
>>> fpgen.generate(
... window={
... 'outerWidth': lambda width: 1000 <= width <= 2000,
... 'outerHeight': lambda height: 500 <= height <= 1500
... }
... )
```
Or, filter the window dictionary directly.
```python
def window_filter(window):
    if not (1000 <= window['outerWidth'] <= 2000):
        return False
    if not (500 <= window['outerHeight'] <= 1500):
        return False
    return True

fpgen.generate(window=window_filter)
```
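To make the rules in this section concrete, here is a sketch of how a nested constraint with callables could be evaluated against a finished fingerprint. It is not fpgen's implementation: the `check` helper and the sample fingerprint are hypothetical, and the lowercasing step simply follows the note in the NVIDIA example above.

```python
def check(value, constraint):
    """Recursively test a value against a scalar, dict, or callable constraint."""
    if callable(constraint):
        # Strings are lowercased before being handed to the callable.
        arg = value.lower() if isinstance(value, str) else value
        return bool(constraint(arg))
    if isinstance(constraint, dict):
        if not isinstance(value, dict):
            return False
        return all(
            key in value and check(value[key], sub)
            for key, sub in constraint.items()
        )
    return value == constraint


# Hypothetical fingerprint fragment, not real fpgen output.
fingerprint = {
    "gpu": {"vendor": "Google Inc. (NVIDIA)"},
    "window": {"outerWidth": 1280, "outerHeight": 720},
}
conditions = {
    "gpu": {"vendor": lambda vdr: "nvidia" in vdr},
    # A callable at the dictionary level receives the whole sub-dictionary.
    "window": lambda w: 1000 <= w["outerWidth"] <= 2000 and 500 <= w["outerHeight"] <= 1500,
}
print(check(fingerprint, conditions))  # True
```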
---
## Only generate specific data
To generate specific data fields, use the `target` parameter with a string or a list of strings.
### Examples
Only generate HTTP headers:
```python
>>> fpgen.generate(target='headers')
{'accept': '*/*', 'accept-encoding': 'gzip, deflate, br, zstd', 'accept-language': 'en-US,en;q=0.9', 'priority': 'u=1, i', 'sec-ch-ua': '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"', 'sec-ch-ua-mobile': None, 'sec-ch-ua-platform': '"Windows"', 'sec-fetch-dest': 'empty', 'sec-fetch-mode': 'cors', 'sec-fetch-site': 'same-site', 'sec-gpc': None, 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36'}
```
Or use the `generate_target` shortcut:
```python
>>> fpgen.generate_target('headers')
{'accept': '*/*', 'accept-encoding': 'gzip, deflate, br, zstd', 'accept-language': 'en-GB,en;q=0.9,en-US;q=0.8,sk;q=0.7', 'priority': 'u=1, i', 'sec-ch-ua': '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"', 'sec-ch-ua-mobile': None, 'sec-ch-ua-platform': '"Windows"', 'sec-fetch-dest': 'empty', 'sec-fetch-mode': 'cors', 'sec-fetch-site': 'same-site', 'sec-gpc': None, 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'}
```
Generate a User-Agent for Windows & Chrome:
```python
>>> fpgen.generate(
... os='Windows',
... browser='Chrome',
... # Nested targets must be separated by dots:
... target='headers.user-agent'
... )
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
```
Generate a Firefox TLS fingerprint:
```python
>>> fpgen.generate(
... browser='Firefox',
... target='network.tls.scrapfly_fp'
... )
{'version': '772', 'ch_ciphers': '4865-4867-4866-49195-49199-52393-52392-49196-49200-49162-49161-49171-49172-156-157-47-53', 'ch_extensions': '0-5-10-11-13-16-23-27-28-34-35-43-45-51-65037-65281', 'groups': '4588-29-23-24-25-256-257', 'points': '0', 'compression': '0', 'supported_versions': '772-771', 'supported_protocols': 'h2-http11', 'key_shares': '4588-29-23', 'psk': '1', 'signature_algs': '1027-1283-1539-2052-2053-2054-1025-1281-1537-515-513', 'early_data': '0'}
```
You can provide multiple targets as a list.
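
For instance, `target=['headers.user-agent', 'navigator.maxTouchPoints']` asks for two fields at once. The sketch below shows how several dot-separated targets could be resolved against a nested dictionary. The helpers `get_path` and `pick` are hypothetical, the sample data is invented, and returning a dictionary keyed by target path is an assumption about the output shape.

```python
def get_path(data, path):
    """Follow a dot-separated path through nested dictionaries."""
    node = data
    for key in path.split("."):
        node = node[key]
    return node


def pick(data, targets):
    """Resolve one target (str) or several (list/tuple) against data."""
    if isinstance(targets, str):
        return get_path(data, targets)
    # Assumption: multiple targets come back keyed by their path.
    return {target: get_path(data, target) for target in targets}


# Invented sample data, not real fpgen output.
sample = {
    "headers": {"accept": "*/*", "user-agent": "ExampleAgent/1.0"},
    "navigator": {"maxTouchPoints": 0},
}
print(pick(sample, "headers.accept"))
print(pick(sample, ["headers.user-agent", "navigator.maxTouchPoints"]))
```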
---
## Get the probabilities of a target
Calculate the probability distribution of a target given any filter:
```python
>>> fpgen.trace(target='browser', os='Windows')
# Returns a list of TraceResult objects, one per possible browser
```
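Conceptually, a trace is a conditional probability distribution. The sketch below estimates one by counting matching records. This is a simplification for intuition only: fpgen queries its Bayesian network rather than counting raw records, and the function name, record list, and resulting numbers are invented.

```python
from collections import Counter


def conditional_distribution(records, target, **conditions):
    """Estimate P(target | conditions) by counting matching records."""
    matching = [
        r for r in records
        if all(r.get(key) == value for key, value in conditions.items())
    ]
    counts = Counter(r[target] for r in matching)
    total = sum(counts.values())
    # Most probable value first; an empty list if nothing matched.
    return sorted(
        ((value, count / total) for value, count in counts.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Invented records for illustration.
records = [
    {"os": "Windows", "browser": "Chrome"},
    {"os": "Windows", "browser": "Chrome"},
    {"os": "Windows", "browser": "Chrome"},
    {"os": "Windows", "browser": "Firefox"},
    {"os": "MacOS", "browser": "Safari"},
]
print(conditional_distribution(records, "browser", os="Windows"))
# [('Chrome', 0.75), ('Firefox', 0.25)]
```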
Multiple targets can be passed as a list/tuple.
Here is an example of tracking the probability of browser & OS given a GPU vendor:
```python
>>> fpgen.trace(
... target=('browser', 'os'),
... gpu={'vendor': 'Google Inc. (Intel)'}
... )
# Returns a dict keyed by target ('browser' and 'os'), each holding a list of TraceResult objects
```
This also works in the Generator object:
```python
>>> gen = fpgen.Generator(os='ChromeOS')
>>> gen.trace(target='browser')
# Returns a list with a single TraceResult (value 'Chrome', probability 1.0)
```
Parameters for trace
```
Compute the probability distribution(s) of a target variable given conditions.

Parameters:
    target (str): The target variable name.
    conditions (Dict[str, Any], optional): A dictionary mapping variable names
    flatten (bool, optional): If True, return a flattened dictionary.
    **conditions_kwargs: Additional conditions to apply

Returns:
    A dictionary mapping probabilities to the target's possible values.
```
### Reading TraceResult
To read the output `TraceResult` object:
```python
>>> chrome = fpgen.trace(target='browser', os='ChromeOS')[0]
>>> chrome.probability
1.0
>>> chrome.value
'Chrome'
```
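Since each result exposes `.value` and `.probability`, a list of them is easy to post-process. The sketch below uses a hypothetical `ToyTraceResult` dataclass as a stand-in for fpgen's `TraceResult`, with invented probabilities, so that it runs without the model installed.

```python
from dataclasses import dataclass


@dataclass
class ToyTraceResult:
    """Hypothetical stand-in exposing the same two attributes as TraceResult."""
    value: str
    probability: float


def most_likely(results):
    """Return the result with the highest probability."""
    return max(results, key=lambda r: r.probability)


def as_mapping(results):
    """Convert results into a plain {value: probability} dictionary."""
    return {r.value: r.probability for r in results}


# Invented probabilities for illustration.
results = [
    ToyTraceResult("Firefox", 0.2),
    ToyTraceResult("Chrome", 0.7),
    ToyTraceResult("Edge", 0.1),
]
print(most_likely(results).value)  # Chrome
print(as_mapping(results))
```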
---
## Query possible values
You can get a list of a target's possible values by passing it into `fpgen.query`:
List all possible browsers:
```python
>>> fpgen.query('browser')
['Chrome', 'Edge', 'Firefox', 'Opera', 'Safari', 'Samsung Internet', 'Yandex Browser']
```
Passing a nested target:
```python
>>> fpgen.query('navigator.maxTouchPoints') # Dot-separated path
[0, 1, 2, 5, 6, 9, 10, 17, 20, 40, 256]
```
Parameters for query
```
Query a list of possibilities given a target.

Parameters:
    target (str): Target node to query possible values for
    flatten (bool, optional): Whether to flatten the output dictionary
    sort (bool, optional): Whether to sort the output arrays
```
> [!NOTE]
> Since fpgen is trained on live data, queries may occasionally return invalid or anomalous values. Values with a probability below 0.001% will not appear in traces or generated fingerprints.
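
The 0.001% cutoff mentioned in the note can be pictured as a simple filter over a probability table. This sketch only illustrates the threshold arithmetic: the constant name, the helper, and the sample distribution are invented, and how fpgen applies the cutoff internally is not described here.

```python
# 0.001% expressed as a fraction.
MIN_PROBABILITY = 0.001 / 100


def drop_rare(distribution, threshold=MIN_PROBABILITY):
    """Keep only values whose probability is at or above the threshold."""
    return {value: p for value, p in distribution.items() if p >= threshold}


# Invented distribution; 'OddBrowser' sits at 0.0005%, below the cutoff.
distribution = {"Chrome": 0.7, "Firefox": 0.299995, "OddBrowser": 0.000005}
print(drop_rare(distribution))  # {'Chrome': 0.7, 'Firefox': 0.299995}
```

This also explains why `fpgen.query` may list a value that never shows up in a trace: querying reports what exists in the data, while traces and generation skip values under the cutoff.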
---
## Generated data
Here is a rough list of the data fpgen can generate:
- **Browser data:**
  - All navigator data
  - All mimetype data: audio, video, media source, play types, PDF, etc.
  - All window viewport data (position, inner/outer viewport sizes, toolbar & scrollbar sizes, etc.)
  - All screen data
  - Supported & unsupported DRM modules
  - Memory heap limit
- **System data:**
  - GPU data (vendor, renderer, WebGL/WebGL2, extensions, context attributes, parameters, shader precision formats, etc.)
  - Battery data (charging, charging time, discharging time, level)
  - Screen size, color depth, taskbar size, etc.
  - Full fonts list
  - Cast receiver data
- **Network data:**
  - HTTP headers
  - TLS fingerprint data
  - HTTP/2 fingerprint & frames
  - RTC video & audio capabilities, codecs, clock rates, mimetypes, header extensions, etc.
- **Audio data:**
  - Audio signal
  - All Audio API constants (AnalyserNode, BiquadFilterNode, DynamicsCompressorNode, OscillatorNode, etc.)
- **Internationalization data:**
  - Regional internationalization (locale, calendar, numbering system, timezone, date format, etc.)
  - Voices
- **_And much more!_**
For a more complete list, see the [full example output](https://raw.githubusercontent.com/scrapfly/fingerprint-generator/refs/heads/main/assets/example-output.json).
---