{"id":48161715,"url":"https://github.com/certtools/tag2domain","last_synced_at":"2026-04-04T17:25:48.308Z","repository":{"id":49626867,"uuid":"262275867","full_name":"certtools/tag2domain","owner":"certtools","description":"A mapping project between tags (annotations, labels) and domain names","archived":false,"fork":false,"pushed_at":"2024-04-25T05:56:46.000Z","size":406,"stargazers_count":11,"open_issues_count":5,"forks_count":5,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-17T16:58:54.597Z","etag":null,"topics":["cybersecurity","machine-tags","misp","taxonomies","taxonomy","taxonomy-database"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/certtools.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2020-05-08T09:02:29.000Z","updated_at":"2023-08-19T09:59:42.000Z","dependencies_parsed_at":"2024-01-01T23:21:04.834Z","dependency_job_id":"8f752406-0504-43a5-9ad0-8cccc026a8c9","html_url":"https://github.com/certtools/tag2domain","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/certtools/tag2domain","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/certtools%2Ftag2domain","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/certtools%2Ftag2domain/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/certtools%2Ftag2domain/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/certtools%2Ftag2domain/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/certtools","download_url":"https://codeload.github.com/certtools/tag2domain/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/certtools%2Ftag2domain/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31407644,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cybersecurity","machine-tags","misp","taxonomies","taxonomy","taxonomy-database"],"created_at":"2026-04-04T17:25:47.156Z","updated_at":"2026-04-04T17:25:48.300Z","avatar_url":"https://github.com/certtools.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# tag2domain\n\n## Concept\nThe tag2domain project is a framework for creating mappings of tags (labels, annotations) to domain names. To get an overview of the concepts, you can read [this blog post](https://cert.at/en/blog/2020/7/tag2domain).\n\nTags are meant to be like little sticky notes that are attached to some entity, for example, a domain name. The tags that belong to a certain topic of interest are grouped together into _taxonomies_. Additionally, each tag can be associated with a _value_ that further specifies the tag. 
Within a taxonomy, tags can be grouped into _categories_. For the purposes of this documentation, we describe a tag by writing:\n\n\u003e (entity) taxonomy : category :: tag = value\n\nAs an example, take the taxonomy \"Website Languages\", which describes the language a website presents itself in. The tags are the languages, the categories are language families, and the values express the confidence that the language has been detected correctly. Some example tags would then be:\n\n\u003e (domain_example_one.at) Website Languages : Indo-European :: German = confidence-high\n\n\u003e (domain_example_two.at) Website Languages : Afro-Asiatic :: Arabic = manually-tagged\n\nThe same website could present itself in multiple languages, so one website can be associated with multiple tags from the same taxonomy. Additionally, the set of languages a website is available in can change over time, so each tag has a start timestamp and an end timestamp. The end timestamp may be left unspecified to indicate that a property still applies (i.e. its absence has not yet been observed).\n\nThis library provides a database schema to maintain such a tagging infrastructure, services that convert individual measurements to tags, and an API that provides access to the gathered data.\n\n## Measurements\n\nUsually, properties are detected in discrete measurements, e.g. a web crawler visits a homepage and determines that a web page is mostly written in English. 
In the context of tag2domain, measurements are represented by JSON objects that have the following fields:\n\n**MEASUREMENT**\n\n| key                   | example                               | description                                               | required |\n|-----------------------|---------------------------------------|-----------------------------------------------------------|----------|\n| version               | 1                                     | version of the measurement format                          | yes      |\n| tag_type              | \"domain\"                              | tag type - each type corresponds to an intersection table | yes      |\n| tagged_id             | 42                                    | ID of the entity that was measured. This ID must exist in the entity table that corresponds to the tag type | yes      |\n| taxonomy              | \"Website Languages\"                  | taxonomy that has been measured                           | yes      |\n| producer              | \"Language Detection Bot\"              | name of the process that performed the measurement        | yes      |\n| measured_at           | \"2020-12-20T12:00:00\"                 | time when the measurement was taken                       | yes      |\n| measurement_id        | \"ldb:576894\"                          | optional measurement ID                                   | no       |\n| autogenerate_tags     | true                                  | if true, tags that are not already in the taxonomy\u003cbr\u003eare inserted. This requires that the allows_autotags\u003cbr\u003ecolumn of the taxonomy is set to true. | no       |\n| autogenerate_values   | false                                 | if true, values that are not already in the taxonomy\u003cbr\u003eare inserted. This requires that the allows_autovalues\u003cbr\u003ecolumn of the taxonomy is set to true. 
| no       |\n| tags                  | [_tag_1_, _tag_2_, ...]               | list of tags that have been found                         | yes      |\n\nEach tag is itself a JSON object:\n\n**TAG**\n\n| key                   | example                               | description                                                   | required                         |\n|-----------------------|---------------------------------------|---------------------------------------------------------------|----------------------------------|\n| tag                   | \"de\"                                  | name or ID of the tag to be set                               | yes                              |\n| value                 | \"confidence:high\"                     | value to be set                                               | no                               |\n| description           | \"Language: German\"                    | description of the tag. Required if autogenerate_tags == true | yes if autogenerate_tags is true |\n| extras                | {}                                    | JSON object that contains further details about the tag       | yes if autogenerate_tags is true |\n\ntag2domain assumes that a measurement covers the whole taxonomy for the measured entity. This means that if a tag (e.g. 
\"en\") is missing from a measurement of the taxonomy \"Website Languages\", tag2domain assumes that the website has no English content.\n\nExamples of measurements can be found [here](examples/measurements/README.md).\nThe schema for measurements can be found [here](py_tag2domain/schema/measurement.json).\n\n## Components and database schema\nThis package provides:\n+ a PostgreSQL database schema that implements tag2domain,\n+ the tag2domain-api, a REST interface that allows querying of the tag2domain database and that (optionally) receives property measurements, and\n+ msm2tag2domain, a script that reads measurements from a Kafka queue or a file and turns them into tag2domain tags.\n\n![tag2domain components](static/components.svg)\n\nThe basic database schema is shown in the picture below. tag2domain is based on three core tables:\n+ _taxonomy_: contains the available taxonomies\n+ _tags_: contains the available tags\n+ _taxonomy_tag_val_: contains the available values\n\nThe tags defined in these tables are assigned to _entities_ using (possibly multiple) _intersection tables_. In the example below, the intersection table is named _domain_tags_ and assigns tags to domains. These domains are themselves rows in a table named _domains_. This _domains_ table must be maintained by the user (i.e. by some other program) and is not modified by tag2domain.\n\nThe glue section combines information from the intersection tables and the rest of the database into a single table that provides data to the API. In addition, entities can be filtered by other criteria by setting up appropriate filter tables. See [Advanced DB Configuration](docs/advanced_db_config.md) for details of this configuration.\n\n![EER Diagram](static/schema.svg)\n\n# Requirements\ntag2domain is written in Python and uses [PostgreSQL](https://www.postgresql.org/) databases. The included demos use Docker to run the code.\n\ntag2domain is written so that it can be integrated into an existing database. 
Doing so requires some knowledge of PostgreSQL databases.\n\n# Installation\nTo get started, clone the repository and navigate to its root directory:\n``` bash\ngit clone https://gitlab.sbg.nic.at/labs/dwh/tag2domain.git\ncd tag2domain/\n```\n\nThe three main components of the repository can be found in these subfolders:\n+ py_tag2domain (`py_tag2domain/`) - a Python library that manages tags\n+ tag2domain-api (`tag2domain_api/`) - a REST API for creating and fetching tags\n+ msm2tag2domain (`msm2tag2domain/`) - a service that reads measurements from a Kafka queue and writes them to the database\n\nTo use tag2domain, you will likely integrate its database tables into an existing database. A tutorial for this integration can be found [below](#customizing-the-tag2domain-setup). To get you started quickly, some \"all-in-one\" demo setups are included that come with a preconfigured database. In these setups you can explore the available features and use the configuration files as templates for your own configuration.\n\n## Demo setups\nThe demo setups use [Docker](https://www.docker.com/) and [docker-compose](https://docs.docker.com/compose/) to run the services. Please make sure both are installed and working before you continue.\n\n### The all-in-one setup\nThe _all-in-one_ setup consists of a mock database that mimics a registry and the tag2domain-api, which allows inserting and reading tags. 
To start the demo, move to the root folder of this repository and run the following commands:\n``` bash\n# copy the environment file\ncp docker/all-in-one-demo/example.env ./.env\n\n# open the .env file with your favourite editor and set the POSTGRES_USER and\n# POSTGRES_PASSWORD options\n\n# build the tag2domain-api container\ndocker-compose -f docker-compose.all-in-one.yml build\n\n# bring up the database and tag2domain-api\ndocker-compose -f docker-compose.all-in-one.yml up -d\n\n# check that the containers are running (both containers should be Up)\ndocker-compose -f docker-compose.all-in-one.yml ps\n```\nOpen http://localhost:8001/docs in your browser (or the equivalent address if you\ndon't run the containers on your local machine) and you should see the API documentation.\n\nTo inspect the database tables, you can either make the mock database available\noutside of the Docker network by modifying docker-compose.all-in-one.yml\nor exec into the db container:\n``` bash\n# open a bash process in the db container\ndocker-compose -f docker-compose.all-in-one.yml exec db bash\n\n# inside the container open psql\n# See the .env file for the POSTGRES_USER setting\ndb\u003e psql -U \u003cPOSTGRES_USER\u003e tag2domain_mock_db\n```\n\nA good place to start is the `/api/v1/meta/taxonomies` endpoint, which gives you\nan overview of the taxonomies that are available in the mock database. The\n`/api/v1/domains/bytaxonomy` endpoint with taxonomy `colors` can then be used\nto retrieve a list of domains that carry tags from this taxonomy. To retrieve the tags that are set\nfor `domain_test1.at`, use the `/api/v1/bydomain/` endpoint.\n\n### The Kafka setup\nThis setup uses the same mock database as the all-in-one setup, but measurements\nare fetched from a Kafka topic by the _msm2tag2domain_ service. 
This setup\nrequires an existing Kafka installation.\n\nFirst, configure the Kafka setup:\n``` bash\n# Run these commands from the root folder of the repository\n\n# copy the env file\ncp docker/all-in-one-demo/example.env ./.env\n\n# open the .env file with your favourite editor and set the POSTGRES_USER and\n# POSTGRES_PASSWORD options\n\n# copy the configuration for msm2tag2domain\ncp msm2tag2domain/docker/msm2tag2domain.cfg.example msm2tag2domain/docker/msm2tag2domain.cfg\n\n# open msm2tag2domain/docker/msm2tag2domain.cfg and configure the database\n# connection and the parameters for the Kafka connection\n\n# build the tag2domain-api and the msm2tag2domain containers\ndocker-compose -f docker-compose.all-in-one-kafka.yml build\n\n# bring up the database, tag2domain-api, and msm2tag2domain\ndocker-compose -f docker-compose.all-in-one-kafka.yml up -d\n\n# check that the containers are running (all three containers should be Up)\ndocker-compose -f docker-compose.all-in-one-kafka.yml ps\n```\n\nSee the [measurement examples](examples/measurements/README.md) for a guide on\nhow to submit measurements to the Kafka queue.\n\n## Customizing the tag2domain setup\nIn practice, you will likely want to integrate tag2domain into an existing\ndatabase, so that the tags can refer to entities that already exist. 
In this section\nwe show how tag2domain can be configured to work with an existing database.\n\n### Creating the tag2domain tables\nAs described in a [previous section](#components-and-database-schema), tag2domain\ncan be linked to an existing database table that contains the entities that are\nto be tagged.\n\nTo set up the tag2domain tables, first generate a config file:\n``` bash\n# copy the example configuration\ncp examples/db/db.config.sh.example db.config.sh\n\n# open db.config.sh in your favourite text editor and set the parameters\n```\n\nNow generate the tag2domain tables:\n``` bash\n# Load the configuration into your shell environment\nsource db.config.sh\n\n# Create the core tables\nbash db/db_master_script.sh\n```\n\nYou can now check your database: there should be a new schema that contains\nthree tables: tags, taxonomy and taxonomy_tag_val.\n\nTo generate an intersection table, first specify the name of the new intersection table, the entity table it\nwill refer to, and the column that contains the IDs to be referred to:\n``` bash\nexport TAG2DOMAIN_INTXN_TABLE_NAME=\u003cYOUR INTERSECTION TABLE NAME\u003e\nexport TAG2DOMAIN_ENTITY_TABLE=\u003cYOUR ENTITY TABLE NAME\u003e\nexport TAG2DOMAIN_ENTITY_ID_COLUMN=\u003cYOUR ID COLUMN\u003e\n# TAG2DOMAIN_ENTITY_NAME_COLUMN is only required if the create_glue.sh script\n# is used\nexport TAG2DOMAIN_ENTITY_NAME_COLUMN=\u003cYOUR NAME COLUMN\u003e\n```\nNote that the entity table must exist prior to creating the intersection\ntable. Now run the `create_intersection_table.sh` script:\n``` bash\nbash scripts/db/create_intersection_table.sh\n```\nThis will generate an intersection table in the tag2domain schema.\n\nTo configure the tag2domain services, we will need intersection table\nconfigurations. 
Such a configuration can be generated using this command:\n``` bash\nbash scripts/db/create_intxn_table_config.sh \u003cTAG TYPE\u003e\n```\n`\u003cTAG TYPE\u003e` is the tag type name (the `tag_type` field of a measurement) that will later be used to differentiate\nbetween different intersection tables.\n\nIf only a single intersection table is used, the glue part of the database can\nnow be generated using this command:\n``` bash\nbash scripts/db/create_glue.sh\n```\nThis script generates a single view named `v_unified_tags`, which combines the\nentity table with the intersection table, as well as the SQL functions that are used by\ntag2domain-api. If multiple intersection tables are used, the glue tables usually have to be\ntailored to the application at hand. See the\n[Advanced DB Configuration](docs/advanced_db_config.md) document for details.\nThis document also covers how domain filters can be defined.\n\n### Configuring the tag2domain services\nTo configure tag2domain, three files need to be created or edited:\n+ edit `docker-compose.external-db.yml`, which defines the API and msm2tag2domain services\n+ create a `.env` file. `docker/all-in-one-demo/example.env` is\na good place to start.\n+ create a file `docker/external-db/tag2domain.cfg`. A template can be found\nhere: `docker/external-db/tag2domain.cfg.example`. The intersection tables\nare specified in this configuration file. You can either use the script\n`scripts/db/create_intxn_table_config.sh` as described in the previous section,\nor use `scripts/db/intersection_table_config.template` as a template.\n\n## Running tests\nTests are provided in the `tests/` folder. 
Most of the tests require a running database:\n``` bash\n# copy an example environment file\ncp docker/all-in-one-demo/example.env .env\n\n# open .env with your favourite text editor and set the database parameters\n\n# deploy the database in a docker container\ndocker-compose -f docker-compose.test-db.yml up -d\n\n# check that the container is up and running\ndocker-compose -f docker-compose.test-db.yml ps\n```\n\nNow that the database is running, we can prepare the configuration for the tests\nand install the required packages:\n``` bash\n# copy the example configuration file\ncp tests/config/db.cfg.example tests/config/db.cfg\n\n# open tests/config/db.cfg in your favourite text editor and configure the\n# database connection\n\n# install the required python packages\npip install -r py_tag2domain/requirements.txt\npip install -r tag2domain_api/requirements.txt\n\n# install pytest\npip install pytest\n```\n\nThe tests can now be run using this command:\n``` bash\npytest tests/\n```\n\n# The global awesome taxonomy list project\n\nThere is a [global taxonomy list](https://github.com/aaronkaplan/awesome-taxonomyzoo-list) on GitHub, which serves as a place for anyone to propose taxonomies and document them.\nThis list is meant to grow through community contributions (should this project take off). The important aspect for us is that every taxonomy is described in the machine-readable [machine-tag](https://github.com/MISP/misp-taxonomies) format.\nHence, we can import other taxonomies into our database with little effort.\n\n# Funded by\n\nThis project was partially funded by the Connecting Europe Facility (CEF).\n\n![Co-financed by the Connecting Europe Facility of the European Union](static/cef_logo.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcerttools%2Ftag2domain","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcerttools%2Ftag2domain","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcerttools%2Ftag2domain/lists"}