https://github.com/spdx/tools-python

A Python library to parse, validate and create SPDX documents.
https://github.com/spdx/tools-python
licensing parsing python rdf spdx
Last synced: 9 months ago
JSON representation
A Python library to parse, validate and create SPDX documents.
Host: GitHub
URL: https://github.com/spdx/tools-python
Owner: spdx
License: apache-2.0
Created: 2015-03-23T21:54:39.000Z (almost 11 years ago)
Default Branch: main
Last Pushed: 2024-09-27T08:31:30.000Z (over 1 year ago)
Last Synced: 2025-04-10T15:10:25.117Z (9 months ago)
Topics: licensing, parsing, python, rdf, spdx
Language: Python
Homepage: http://spdx.org
Size: 3.38 MB
Stars: 206
Watchers: 24
Forks: 137
Open Issues: 87
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project

README

          # Python library to parse, validate and create SPDX documents

CI status (Linux, macOS and Windows): [![Install and Test][1]][2]

[1]: https://github.com/spdx/tools-python/actions/workflows/install_and_test.yml/badge.svg

[2]: https://github.com/spdx/tools-python/actions/workflows/install_and_test.yml

# Breaking changes v0.7 -> v0.8

Please be aware that the upcoming 0.8 release has undergone a significant refactoring in preparation for the upcoming

SPDX v3.0 release, leading to breaking changes in the API.

Please refer to the [migration guide](https://github.com/spdx/tools-python/wiki/How-to-migrate-from-0.7-to-0.8)

to update your existing code.

The main features of v0.8 are:

- full validation of SPDX documents against the v2.2 and v2.3 specification

- support for SPDX's RDF format with all v2.3 features

- experimental support for the upcoming SPDX v3 specification. Note, however, that support is neither complete nor 

  stable at this point, as the spec is still evolving. SPDX3-related code is contained in a separate subpackage "spdx3" 

  and its use is optional. We do not recommend using it in production code yet.

# Information

This library implements SPDX parsers, convertors, validators and handlers in Python.

- Home: https://github.com/spdx/tools-python

- Issues: https://github.com/spdx/tools-python/issues

- PyPI: https://pypi.python.org/pypi/spdx-tools

- Browse the API: https://spdx.github.io/tools-python

Important updates regarding this library are shared via the SPDX tech mailing list: https://lists.spdx.org/g/Spdx-tech.

# License

[Apache-2.0](LICENSE)

# Features

* API to create and manipulate SPDX v2.2 and v2.3 documents

* Parse, convert, create and validate SPDX files

* supported formats: Tag/Value, RDF, JSON, YAML, XML

* visualize the structure of a SPDX document by creating an `AGraph`. Note: This is an optional feature and requires 

additional installation of optional dependencies

## Experimental support for SPDX 3.0

* Create v3.0 elements and payloads

* Convert v2.2/v2.3 documents to v3.0

* Serialize to JSON-LD

See [Quickstart to SPDX 3.0](#quickstart-to-spdx-30) below.  

The implementation is based on the descriptive markdown files in the repository https://github.com/spdx/spdx-3-model (latest commit: a5372a3c145dbdfc1381fc1f791c68889aafc7ff).

# Installation

As always you should work in a virtualenv (venv). You can install a local clone

of this repo with `yourenv/bin/pip install .` or install it from PyPI 

(check for the [newest release](https://pypi.org/project/spdx-tools/#history) and install it like

`yourenv/bin/pip install spdx-tools==0.8.0a2`). Note that on Windows it would be `Scripts`

instead of `bin`.

# How to use

## Command-line usage

1. **PARSING/VALIDATING** (for parsing any format):

* Use `pyspdxtools -i ` where `` is the location of the file. The input format is inferred automatically from the file ending.

* If you are using a source distribution, try running:  

  `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json`

2. **CONVERTING** (for converting one format to another):

* Use `pyspdxtools -i  -o ` where `` is the location of the file to be converted

  and `` is the location of the output file. The input and output formats are inferred automatically from the file endings.

* If you are using a source distribution, try running:  

  `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json -o output.tag` 

* If you want to skip the validation process, provide the `--novalidation` flag, like so:  

  `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json -o output.tag --novalidation`  

  (use this with caution: note that undetected invalid documents may lead to unexpected behavior of the tool)

  

* For help use `pyspdxtools --help`

3. **GRAPH GENERATION** (optional feature)

* This feature generates a graph representing all elements in the SPDX document and their connections based on the provided

  relationships. The graph can be rendered to a picture. Below is an example for the file `tests/data/SPDXJSONExample-v2.3.spdx.json`:

![SPDXJSONExample-v2.3.spdx.png](assets/SPDXJSONExample-v2.3.spdx.png)

* Make sure you install the optional dependencies `networkx` and `pygraphviz`. To do so run `pip install ".[graph_generation]"`.

* Use `pyspdxtools -i  --graph -o ` where `` is an output file name with valid format for `pygraphviz` (check 

  the documentation [here](https://pygraphviz.github.io/documentation/stable/reference/agraph.html#pygraphviz.AGraph.draw)). 

* If you are using a source distribution, try running

  `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json --graph -o SPDXJSONExample-v2.3.spdx.png` to generate 

  a png with an overview of the structure of the example file.  

## Library usage

1. **DATA MODEL**

  * The `spdx_tools.spdx.model` package constitutes the internal SPDX v2.3 data model (v2.2 is simply a subset of this). All relevant classes for SPDX document creation are exposed in the `__init__.py` found [here](src%2Fspdx_tools%2Fspdx%2Fmodel%2F__init__.py).

  * SPDX objects are implemented via `@dataclass_with_properties`, a custom extension of `@dataclass`.

    * Each class starts with a list of its properties and their possible types. When no default value is provided, the property is mandatory and must be set during initialization.

    * Using the type hints, type checking is enforced when initializing a new instance or setting/getting a property on an instance

      (wrong types will raise `ConstructorTypeError` or `TypeError`, respectively). This makes it easy to catch invalid properties early and only construct valid documents.

    * Note: in-place manipulations like `list.append(item)` will circumvent the type checking (a `TypeError` will still be raised when reading `list` again). We recommend using `list = list + [item]` instead.

  * The main entry point of an SPDX document is the `Document` class from the [document.py](src%2Fspdx_tools%2Fspdx%2Fmodel%2Fdocument.py) module, which links to all other classes.

  * For license handling, the [license_expression](https://github.com/nexB/license-expression) library is used.

  * Note on `documentDescribes` and `hasFiles`: These fields will be converted to relationships in the internal data model. As they are deprecated, these fields will not be written in the output.

2. **PARSING**

  * Use `parse_file(file_name)` from the `parse_anything.py` module to parse an arbitrary file with one of the supported file endings.

  * Successful parsing will return a `Document` instance. Unsuccessful parsing will raise `SPDXParsingError` with a list of all encountered problems.

3. **VALIDATING**

  * Use `validate_full_spdx_document(document)` to validate an instance of the `Document` class.

  * This will return a list of `ValidationMessage` objects, each consisting of a String describing the invalidity and a `ValidationContext` to pinpoint the source of the validation error.

  * Validation depends on the SPDX version of the document. Note that only versions `SPDX-2.2` and `SPDX-2.3` are supported by this tool.

4. **WRITING**

  * Use `write_file(document, file_name)` from the `write_anything.py` module to write a `Document` instance to the specified file.

    The serialization format is determined from the filename ending.

  * Validation is performed per default prior to the writing process, which is cancelled if the document is invalid. You can skip the validation via `write_file(document, file_name, validate=False)`.

    Caution: Only valid documents can be serialized reliably; serialization of invalid documents is not supported.

## Example

Here are some examples of possible use cases to quickly get you started with the spdx-tools.

If you want more examples, like how to create an SPDX document from scratch, have a look [at the examples folder](examples).

```python

import logging

from license_expression import get_spdx_licensing

from spdx_tools.spdx.model import (Checksum, ChecksumAlgorithm, File, 

                                   FileType, Relationship, RelationshipType)

from spdx_tools.spdx.parser.parse_anything import parse_file

from spdx_tools.spdx.validation.document_validator import validate_full_spdx_document

from spdx_tools.spdx.writer.write_anything import write_file

# read in an SPDX document from a file

document = parse_file("spdx_document.json")

# change the document's name

document.creation_info.name = "new document name"

# define a file and a DESCRIBES relationship between the file and the document

checksum = Checksum(ChecksumAlgorithm.SHA1, "71c4025dd9897b364f3ebbb42c484ff43d00791c")

file = File(name="./fileName.py", spdx_id="SPDXRef-File", checksums=[checksum], 

            file_types=[FileType.TEXT], 

            license_concluded=get_spdx_licensing().parse("MIT and GPL-2.0"),

            license_comment="licenseComment", copyright_text="copyrightText")

relationship = Relationship("SPDXRef-DOCUMENT", RelationshipType.DESCRIBES, "SPDXRef-File")

# add the file and the relationship to the document 

# (note that we do not use "document.files.append(file)" as that would circumvent the type checking)

document.files = document.files + [file]

document.relationships = document.relationships + [relationship]

# validate the edited document and log the validation messages

# (depending on your use case, you might also want to utilize the validation_message.context)

validation_messages = validate_full_spdx_document(document)

for validation_message in validation_messages:

    logging.warning(validation_message.validation_message)

# if there are no validation messages, the document is valid 

# and we can safely serialize it without validating again

if not validation_messages:

    write_file(document, "new_spdx_document.rdf", validate=False)

```

# Quickstart to SPDX 3.0

In contrast to SPDX v2, all elements are now subclasses of the central `Element` class.

This includes packages, files, snippets, relationships, annotations, but also SBOMs, SpdxDocuments, and more.  

For serialization purposes, all Elements that are to be serialized into the same file are collected in a `Payload`.

This is just a dictionary that maps each Element's SpdxId to itself.

Use the `write_payload()` functions to serialize a payload.

There currently are two options:  

* The `spdx_tools.spdx3.writer.json_ld.json_ld_writer` module generates a JSON-LD file of the payload.

* The `spdx_tools.spdx3.writer.console.payload_writer` module prints a debug output to console. Note that this is not an official part of the SPDX specification and will probably be dropped as soon as a better standard emerges.

You can convert an SPDX v2 document to v3 via the `spdx_tools.spdx3.bump_from_spdx2.spdx_document` module.

The `bump_spdx_document()` function will return a payload containing an `SpdxDocument` Element and one Element for each package, file, snippet, relationship, or annotation contained in the v2 document.

# Dependencies

* PyYAML: https://pypi.org/project/PyYAML/ for handling YAML.

* xmltodict: https://pypi.org/project/xmltodict/ for handling XML.

* rdflib: https://pypi.python.org/pypi/rdflib/ for handling RDF.

* ply: https://pypi.org/project/ply/ for handling tag-value.

* click: https://pypi.org/project/click/ for creating the CLI interface.

* beartype: https://pypi.org/project/beartype/ for type checking.

* uritools: https://pypi.org/project/uritools/ for validation of URIs.

* license-expression: https://pypi.org/project/license-expression/ for handling SPDX license expressions.

# Support

* Submit issues, questions or feedback at https://github.com/spdx/tools-python/issues

* Join the chat at https://gitter.im/spdx-org/Lobby

* Join the discussion on https://lists.spdx.org/g/spdx-tech and

  https://spdx.dev/participate/tech/

# Contributing

Contributions are very welcome! See [CONTRIBUTING.md](./CONTRIBUTING.md) for instructions on how to contribute to the

codebase.

# History

This is the result of an initial GSoC contribution by @[ah450](https://github.com/ah450)

(or https://github.com/a-h-i) and is maintained by a community of SPDX adopters and enthusiasts.

In order to prepare for the release of SPDX v3.0, the repository has undergone a major refactoring during the time from 11/2022 to 07/2023.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/spdx/tools-python

Awesome Lists containing this project

README