https://github.com/avishrantssh/golicense-classifier
A Python package to find license expressions and copyright statements in a codebase.
copyright copyright-scan ctypes foss license-scan licensing python spdx
- Host: GitHub
- URL: https://github.com/avishrantssh/golicense-classifier
- Owner: AvishrantsSh
- License: mit
- Created: 2021-05-24T17:01:22.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-08-12T08:38:23.000Z (about 4 years ago)
- Last Synced: 2025-03-13T08:56:00.123Z (7 months ago)
- Topics: copyright, copyright-scan, ctypes, foss, license-scan, licensing, python, spdx
- Language: Python
- Homepage: https://pypi.org/project/golicense-classifier/
- Size: 21.9 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
GoLicense-Classifier
====================
___
A Python package to find license expressions and copyright statements in a codebase.
Based on **Google LicenseClassifier V2**, GoLicense-Classifier (or **glc** for short) focuses on performance without compromising accuracy.
Installation
------------
_Note: Currently, this package only supports Linux. Work is in progress for Windows and macOS._

Installing GoLicense-Classifier is as simple as
```sh
pip install golicense-classifier
```

Or, you can build the package from source as
```sh
git clone https://github.com/AvishrantsSh/GoLicense-Classifier.git
# Enter the cloned repository before running the make targets
cd GoLicense-Classifier
make dev
make package
```

Usage
-----
To get started, import the `LicenseClassifier` class from the module:

```python
from LicenseClassifier.classifier import LicenseClassifier
```

_Note: Copyright statement detection is still in beta. Expect some issues, mostly with binary files._

The class comes bundled with some handy methods, each suited to a different task.

1. `scan_directory`
This method is used to recursively walk through a directory and find license expressions and copyright statements. It returns a dictionary object with keys `header` and `files`.
### Usage
___
```python
classifier = LicenseClassifier()
res = classifier.scan_directory('PATH_TO_DIR')
```
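
Since the result is described as a dictionary with the keys `header` and `files`, one way to keep a scan around is to check for those keys and write the result out as JSON. The sketch below is only an illustration: `save_scan_result` and `sample_result` are hypothetical names, the placeholder values do not reflect the real contents of `header` or `files`, and it assumes the returned dictionary is JSON-serializable.

```python
import json
from pathlib import Path

# Top-level keys documented for the dictionary returned by scan_directory.
EXPECTED_KEYS = ("header", "files")


def save_scan_result(result, out_path):
    """Check the documented top-level keys, then write the result as JSON."""
    missing = [key for key in EXPECTED_KEYS if key not in result]
    if missing:
        raise KeyError(f"scan result is missing keys: {missing}")
    out_path = Path(out_path)
    # default=str is a fallback for any value json cannot encode natively.
    out_path.write_text(json.dumps(result, indent=2, default=str), encoding="utf-8")
    return out_path


# Placeholder standing in for `classifier.scan_directory('PATH_TO_DIR')`.
# The empty values are made up for illustration only.
sample_result = {"header": {}, "files": []}
```

With a real scan, `save_scan_result(res, "scan.json")` would take the `res` from the usage example above.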
### Optional Parameters
___
- `max_size`
  Maximum size of file in MB. Default is set to 10MB. Set `max_size < 0` to ignore size constraints.
- `use_buffer`
  `(Experimental)` Set to `True` to use buffered file scanning. `max_size` will be used as buffer size.
- `use_scancode_mapping`
  Set to `True` if you want to use Scancode license key mappings. Default is set to `True`.
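
The optional parameters above can be combined in a single call. The following is a minimal sketch, assuming they are accepted as keyword arguments under the names listed. `scan_tree` is a hypothetical helper, and the `use_buffer=False` default is an assumption, since no default is stated for it here. The classifier is passed in so that any object with a compatible `scan_directory` method can be used.

```python
from pathlib import Path


def scan_tree(classifier, directory, max_size=10, use_buffer=False,
              use_scancode_mapping=True):
    """Call scan_directory with the documented optional parameters.

    `classifier` is expected to be a LicenseClassifier instance.
    The use_buffer default of False is an assumption, not a documented value.
    """
    directory = Path(directory)
    if not directory.is_dir():
        raise NotADirectoryError(f"not a directory: {directory}")
    return classifier.scan_directory(
        str(directory),
        max_size=max_size,                          # in MB; negative ignores the limit
        use_buffer=use_buffer,                      # experimental buffered scanning
        use_scancode_mapping=use_scancode_mapping,  # Scancode license key mappings
    )
```

For example, `scan_tree(LicenseClassifier(), 'PATH_TO_DIR', max_size=-1)` would scan every file regardless of size.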
2. `scan_file`
This method is used to find license expressions and copyright statements in a single file.
### Usage
___
```python
classifier = LicenseClassifier()
res = classifier.scan_file('PATH_TO_FILE')
```
### Optional Parameters
___
- `max_size`
  Maximum size of file in MB. Default is set to 10MB. Set `max_size < 0` to ignore size constraints.
- `use_buffer`
  `(Experimental)` Set to `True` to use buffered file scanning. `max_size` will be used as buffer size.
- `use_scancode_mapping`
  Set to `True` if you want to use Scancode license key mappings. Default is set to `True`.
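
When only a handful of files need checking, `scan_file` can be called in a loop. The sketch below is one possible approach: `scan_files` is a hypothetical helper, it assumes the optional parameters are accepted as keyword arguments, and it catches exceptions broadly because the errors `scan_file` may raise are not described here.

```python
def scan_files(classifier, paths, **options):
    """Run scan_file on each path and collect the results keyed by path.

    Failures are recorded per file so that one problematic file
    (for example a binary) does not stop the whole run.
    """
    results, errors = {}, {}
    for path in paths:
        key = str(path)
        try:
            results[key] = classifier.scan_file(key, **options)
        except Exception as exc:  # broad on purpose: error types are not documented
            errors[key] = repr(exc)
    return results, errors
```

For example, `scan_files(LicenseClassifier(), ['LICENSE', 'setup.py'], max_size=-1)` would scan both files with the size limit disabled.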
Further Customization
---------------------
You can set a custom threshold for scanning that best suits your needs. Simply pass the `threshold` parameter when creating the object:
```python
classifier = LicenseClassifier(threshold=0.9)
```

Contributing
------------
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

To get started, read the [Contributing Guide](CONTRIBUTING.md).
References
----------
1. Google LicenseClassifier V2 https://github.com/google/licenseclassifier/
2. Ctypes Shared Library Code https://github.com/AvishrantsSh/Ctypes_LicenseClassifier
3. Ctypes https://docs.python.org/3/library/ctypes.html