Exploring FIBO Complexity With Crunchbase


Representing Crunchbase IPOs in FIBO


Vladimir Alexiev


14-Apr-2024


1 Introduction


(Read this as HTML)


The Financial Industry Business Ontology (FIBO) by the Enterprise Data Management Council (EDMC)
is a family of ontologies and a reference model for representing data in the financial world using semantic technologies.
It is used in fintech Knowledge Graph (KG) projects because it offers a comprehensive and principled approach to representing financial data,
and a wide set of predefined models that can be used to implement data harmonization and financial data integration.
The 2022Q2 FIBO release consisted of 290 ontologies using 380 prefixes
(see [5, 6] for details)
that cover topics such as
legal entities, contracts, agency, trusts, regulators, securities, loans, derivatives, etc.
FIBO's reach and flexible ontological approach allow the integration of a wide variety of financial data,
but it comes at the price of more complex representation.


Crunchbase (CB) is a well-known dataset by TechCrunch that includes companies, key people, funding rounds, acquisitions, Initial Public Offerings (IPOs), etc.
It has about 2M companies with a good mix of established enterprises (including 47k public companies), mid-range companies and startups.
We (Ontotext and other Wikidata contributors) have matched 72k CB companies to Wikidata, see this query.


I explore the representation of Crunchbase data (more specifically IPOs) in FIBO
and compare it to the simplest possible semantic representation.
I therefore illustrate the complexity of FIBO, and explain its flexibility along the way.
I finish with some discussion and conclusions as to when FIBO can bring value to fintech KG projects.



1.1 Open Source Project


This example is available as open source at https://github.com/VladimirAlexiev/crunchbase-fibo
and includes the following files:




  • Makefile: orchestrate file generation with make


  • ipos-sample.csv: 4 sample CSV rows


  • cb-model.ttl, cb-model.png: simple CB model (all CB data) and generated image


  • ipos-agents.ttl, ipos-agents.png: part of model and generated image


  • ipos-offering.png, ipos-offering.ttl: part of model and generated image


  • ipos-financials.png, ipos-financials.ttl: part of model and generated image


  • ipos-currencies.ttl, ipos-currencies.png: part of model and generated image


  • ipos-fibo.ttl, ipos-fibo.png: full FIBO IPO model (concatenated) and generated image


  • ipos-fibo.ru: OntoRefine SPARQL UPDATE transformation generated from the full FIBO IPO model


  • common.h: C preprocessor file with SPARQL "functions" used in generating transformation


  • prefixes.ttl, prefixes.rq: all used prefixes in Turtle and SPARQL format


  • README.md, README.html: this writeup


  • bibliography.bib, bibliography.html: bibliography source and rendered HTML


  • acm-sig-proceedings-long-author-list.csl: bibliography style


  • pandoc-defaults.yaml: pandoc settings


  • pandoc-header.html: JavaScript to enable syntax highlighting


  • css/*: styles for syntax highlighting



2 Crunchbase Data


Crunchbase consists of 18 tables (available as CSV and a JSON API) that cover companies, universities, persons, financial transactions, events (conferences and workshops), etc.
The gist Crunchbase Challenge describes the complete database, but in this blog post I focus on Initial Public Offerings (IPOs).
An IPO is when a company goes public (is listed at a stock exchange and starts trading), which is considered one possible "exit" for its investors and founders.


Crunchbase has a table ipos with the following fields:

| field | type | description |
|---|---|---|
| uuid | string | Unique identifier, never changes |
| name | string | Entity name (often empty for IPOs) |
| type | string | Entity type (always "ipo" for IPOs) |
| permalink | string | Suffix of cb_url. Despite the name, sometimes changes |
| cb_url | anyURI | Full Crunchbase URL of the IPO event |
| rank | integer | Crunchbase rank (smaller is "more important") |
| created_at | dateTime | When the Crunchbase record was created |
| updated_at | dateTime | When the Crunchbase record was updated |
| org_uuid | string | Points to the company that was listed |
| stock_exchange_symbol | string | Exchange code. Uses internal Crunchbase codes that are ambiguous |
| stock_symbol | string | Ticker on the exchange |
| went_public_on | dateTime | When the company went public (IPO date) |
| share_price_usd | decimal | The share price for the stock at the time of IPO, in US dollars |
| share_price | decimal | The share price for the stock at the time of IPO, in local currency |
| share_price_currency_code | string | Local currency of share price |
| valuation_price_usd | decimal | Valuation of the Organization at IPO, in US dollars |
| valuation_price | decimal | Valuation of the Organization at IPO, in local currency |
| valuation_price_currency_code | string | Local currency of the valuation |
| money_raised_usd | decimal | Total amount raised from the IPO, in US dollars |
| money_raised | decimal | Total amount raised from the IPO, in local currency |
| money_raised_currency_code | string | Local currency of the total amount raised |

There are also organization attributes (org_name, org_cb_url, country_code, state_code, region, city) that are redundant, thus not shown above.



  • The first 8 fields are present in pretty much every CB entity

  • The next 13 fields are specific to the IPO entity


  • stock_exchange_symbol uses internal Crunchbase exchange codes that are sometimes ambiguous
    (for example, "MSE" may mean the Madrid Stock Exchange, Metropolitan Exchange in India, Mongolia Stock Exchange, etc).
    We (Ontotext) have mapped all 156 codes to Wikidata, together with the unambiguous Market Identification Codes (ISO 10383 MIC),
    see this query:


select ?item ?itemLabel ?cb ?mic {
  ?item wdt:P7534 ?mic;                 # ISO MIC
        p:P528 [ps:P528 ?cb;            # catalog code
                pq:P972 wd:Q10846831].  # catalog
  service wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}



  • stock_symbol is the company code on that exchange.
    This is a valuable field for coreferencing public companies across datasets, but one should beware that:

    • Stock symbols are only 85% unique world-wide, so they need to be interpreted in conjunction with the Exchange field

    • The same company may have several stock symbols (even on the same exchange over time), and symbols may be sold and bought between companies over time



  • "Share price" is the stock price at the time of IPO (not sure whether the opening or closing price)

  • "Valuation" (market capitalization) is the number of outstanding shares multiplied by the share price

  • "Money raised" is the share price multiplied by the number of shares sold during the IPO

  • These 3 financial factors are represented with 3 fields each, to account for local currencies and USD
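
As a minimal sketch of the points above (not part of the original project), the Python below treats exchange code plus ticker as a composite key and spells out the two products behind "valuation" and "money raised". The function names, the `lse` exchange code, the `ABC` ticker and all share counts are hypothetical and serve only as illustration.

```python
from decimal import Decimal


def listing_key(exchange_code: str, ticker: str) -> tuple[str, str]:
    """Composite key: a ticker only identifies a company together with its exchange."""
    return (exchange_code.strip().lower(), ticker.strip().upper())


def valuation(share_price: Decimal, shares_outstanding: int) -> Decimal:
    """Market capitalization = share price x number of outstanding shares."""
    return share_price * shares_outstanding


def money_raised(share_price: Decimal, shares_sold: int) -> Decimal:
    """Money raised = share price x number of shares sold during the IPO."""
    return share_price * shares_sold


# The same ticker on two exchanges yields two distinct keys (hypothetical codes).
key_a = listing_key("nasdaq", "abc")
key_b = listing_key("lse", "ABC")

# Hypothetical figures, used only to show the arithmetic.
price = Decimal("10")
cap = valuation(price, 5_000_000)
raised = money_raised(price, 1_000_000)
```

Matching on the ticker alone would merge `key_a` and `key_b`; keeping the exchange in the key avoids that.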


For example, here are the first 4 CB IPO records:

| field | example1 | example2 | example3 | example4 |
|---|---|---|---|---|
| uuid | 72d30ebd-53ef-2486-6c29-22785c5173ce | 3ad2b068-2d97-f646-0b80-1e5f3d7adfc4 | a265c6f6-4b96-4079-096a-967a37f3da2b | ee426509-826e-5dd0-9309-e79c8f384904 |
| type | ipo | ipo | ipo | ipo |
| permalink | microsoft-ipo--72d30ebd | the-walt-disney-company-ipo--3ad2b068 | divx-ipo--a265c6f6 | xo-group-ipo--ee426509 |
| cb_url | https://www.crunchbase.com/ipo/microsoft-ipo--72d30ebd | https://www.crunchbase.com/ipo/the-walt-disney-company-ipo--3ad2b068 | https://www.crunchbase.com/ipo/divx-ipo--a265c6f6 | https://www.crunchbase.com/ipo/xo-group-ipo--ee426509 |
| rank | 31712 | 44186 | 14752 | 19369 |
| created_at | 2008-02-09 05:25:18 | 2008-02-09 05:40:32 | 2008-02-25 23:52:11 | 2008-02-29 00:31:34 |
| updated_at | 2018-02-12 23:11:05 | 2019-02-25 22:31:49 | 2018-02-12 23:57:54 | 2018-02-12 23:41:42 |
| org_uuid | fd80725f-53fc-7009-9878-aeecf1e9ffbb | 756936c0-c335-f0ae-0a3d-fe26bdff5695 | 73296f0d-85a5-78d5-90b3-86c5f8981ba9 | ff8439cf-097c-a88a-9bb9-dd83d23aa14b |
| stock_exchange_symbol | nasdaq | nyse | nasdaq | nyse |
| stock_symbol | MSFT | DIS | DIVX | XOXO |
| went_public_on | 1986-03-13 | 1978-01-13 | 2006-10-22 | 1999-12-02 |
| share_price_usd | | | 16.0 | |
| share_price | | | 16.0 | |
| share_price_currency_code | | | USD | |
| valuation_price_usd | | | 160000000 | |
| valuation_price | | | 160000000 | |
| valuation_price_currency_code | | | USD | |
| money_raised_usd | | 300000000 | 145000000 | 35000000 |
| money_raised | | 300000000 | 145000000 | 35000000 |
| money_raised_currency_code | | USD | USD | USD |


3 Simple Semantic Model of Crunchbase


The simplest possible semantic representation of CB uses one class per table
(subsidiary tables like org_descriptions and org_parent are merged into other classes; and org_parents is a direct RDF relation).


I make URLs based on entity type and uuid.
I leverage the fact that UUIDs are globally unique (no conflicts between classes) to put Organizations and People under the abstract superclass Agent and in the same URL namespace cb/agent:
this is needed since some financial transactions (in particular funding_round) can involve a mix of persons and organizations.
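
A minimal sketch of this URL scheme, assuming that the CB entity types acting as agents are named `organization` and `person` (these type names and the function name are my assumptions, not taken from the project):

```python
# Assumed names of the CB entity types that act as agents.
AGENT_TYPES = {"organization", "person"}


def entity_url(entity_type: str, uuid: str) -> str:
    """Mint a relative URL from entity type and uuid; agents share one namespace."""
    namespace = "agent" if entity_type in AGENT_TYPES else entity_type
    return f"cb/{namespace}/{uuid}"


org_url = entity_url("organization", "fd80725f-53fc-7009-9878-aeecf1e9ffbb")
ipo_url = entity_url("ipo", "72d30ebd-53ef-2486-6c29-22785c5173ce")
```

Because UUIDs do not clash across tables, a transaction can point to `cb/agent/<uuid>` without knowing whether the participant is a person or an organization.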


The following diagram shows the complete CB semantic model of 18 entities:



It is generated from a semantic model in Turtle format (cb-model.ttl) using rdfpuml [3] and PlantUML.
This model is concatenated from 18 individual per-table models that include source field names in parentheses (also seen on the overall model).


The individual models are used to generate semantic transformations for Ontotext Refine in the form of SPARQL Update queries.
The same updates are used for both initial ingest and update, leveraging a global timestamp and using one named graph per table row (over 10M graphs).
See gist Crunchbase Challenge and [1] for more details,
including the model source cb-model.ttl and timing (performance) information of the semantic conversions.


Important: Please note that this model covers all of CB, not just CB IPOs.
The IPO class (node) is shown with a red border.



4 FIBO Model of Crunchbase IPOs


Now I turn to representing IPOs in FIBO by using the ontologies from the FIBO 2024Q1 release (see gist Converting FIBO from RDF to Turtle).


I use the following colors in the diagrams below.



  • lightblue: agents (Offeror=Issuer, Exchange=RegistrationAuthority)

  • yellow: Shares and share-related events (Offering, Listing)

  • red: Financial factors (share price, market capitalization, money raised)

  • green: currencies (and their codes)

  • lightgreen: other codes and codesets


See the next sections for entity diagrams of different kinds of entities (partial models),
followed by an overall model diagram, and a binding of the colors to specific RDF classes.



4.1 IPO Agents


First I represent the agents involved in the IPO: Exchange and Issuer.
(Perhaps the main agent is the public that invests money into the shares, but they are not individually represented in FIBO).




  • Stock exchanges are identified by internal CB codes.
    (As mentioned above, some CB Exchange codes are ambiguous, unlike the ISO 10383 MIC standard).

    • FIBO distinguishes between entities and their identifiers/codes in various coding schemes, which allows us to capture the CB exchange code scheme.

    • A FIBO related project is working on representing the MIC coding scheme, which involves not just exchanges and their primary markets, but also secondary markets on these exchanges.

    • A future effort could coreference CB and MIC exchanges precisely in FIBO, leveraging our work in Wikidata.



  • FIBO distinguishes the share Issuer/Offeror (a role) from the Agent playing that role (a CB organization)

  • As you can see, many entities obtain multiple types (classes). This captures not only typical situations (as per CB) but also exceptional corner cases

    • The Issuer of shares is typically also their Offeror, but I guess it's possible for the Offeror to be a third party

    • The Exchange is also typically the RegistrationAuthority of tickers and shares on that exchange, but I guess exceptions are possible





4.2 Share, Offering, Listing, Ticker


Now I represent the main entities of the IPO event, which in FIBO are:



  • Share

  • Offering of the share to the general public

  • Listing of the share at a stock exchange

  • Ticker that is allocated to the share



Please note that PublicOffering includes a number of non-FIBO properties (i.e. CB custom properties):


cb:uuid      '(uuid)';
cb:name      '(name)';
cb:permalink '(permalink)';
cb:url       '(cb_url)'^^xsd:anyURI;
cb:rank      '(rank)'^^xsd:integer;
cb:createdAt 'fixDate(created_at)'^^xsd:dateTime;
cb:updatedAt 'fixDate(updated_at)'^^xsd:dateTime.

Truth be told, these are not very semantic:



  • They include CB data about an event (name, permalink, url, rank) that is not properly attributed to CB;


    • url could be mapped to a standard FIBO property.
      FIBO has a property for "home page" but it's not fair to say that CB's page about an IPO is its home page,
      because in all likelihood there are better pages at the Exchange and Company websites describing the IPO

    • If you need a proper ontology for representing web pages and their relation to real-world things, use schema.org.
      It has no less than 5 properties for relations of varying strength and nuance: url, sameAs, mainEntity, about, mentions.
      (And there's also keywords used for free-text keywords.)



  • They also include book-keeping information (uuid, createdAt, updatedAt)
    that is not about the event, but about CB's record of the event.


However, I didn't want to complicate the model even further by placing these fields in yet more FIBO nodes.
Furthermore, we'd have to repeat the same event nodes for both Offering and Listing.



4.3 Financial Factors


The 3 financial factors (share price, market capitalization, money raised) are expressed in 2 currencies each:



Of the 3 factors, Market Capitalization is represented using a more complex pattern:



  • The <.../marketCap/...> node being MarketCapitalization, which is based on the pricePerShare and has value <.../marketCapValue/...>

  • The <.../marketCapValue/...> node being MonetaryAmount


Please note the URL patterns used by the financial nodes in "national currency" vs in "USD":


<cb/ipo/(uuid)/pricePerShare/(share_price_currency_code)>      vs <cb/ipo/(uuid)/pricePerShare/USD>
<cb/ipo/(uuid)/marketCap/(valuation_price_currency_code)>      vs <cb/ipo/(uuid)/marketCap/USD>
<cb/ipo/(uuid)/marketCapValue/(valuation_price_currency_code)> vs <cb/ipo/(uuid)/marketCapValue/USD>

In an initial version of the mapping, I used a simpler pattern:


<cb/ipo/(uuid)/pricePerShare>  vs <cb/ipo/(uuid)/pricePerShareUsd>
<cb/ipo/(uuid)/marketCap>      vs <cb/ipo/(uuid)/marketCapUsd>
<cb/ipo/(uuid)/marketCapValue> vs <cb/ipo/(uuid)/marketCapValueUsd>

The difference is subtle but crucial: the current URL pattern effectively merges the financial nodes where the "national currency" is "USD".
Since many IPOs are denominated in USD, I can save a significant number of nodes by using URL patterns that incorporate the currency code.


I assume that the monetary fields (eg (money_raised) vs (money_raised_usd)) are identical for US IPOs.
If that's true, then the multiple instances of the same statement will be collapsed by the semantic repository on data ingestion,
so we won't have duplicate statements in the repository.
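
The effect of the currency-based URL pattern can be sketched in a few lines of Python. This is an illustration only: it uses one currency code for all three node kinds (the real mapping takes the code from a separate field per factor), and the `hasAmount` predicate and the amount are placeholders.

```python
KINDS = ("pricePerShare", "marketCap", "marketCapValue")


def financial_node_urls(uuid: str, currency_code: str) -> set[str]:
    """Collect the node URLs for one IPO in "national currency" and in USD."""
    urls = set()
    for kind in KINDS:
        urls.add(f"cb/ipo/{uuid}/{kind}/{currency_code}")  # "national currency" node
        urls.add(f"cb/ipo/{uuid}/{kind}/USD")              # USD node
    return urls


uuid = "72d30ebd-53ef-2486-6c29-22785c5173ce"
usd_nodes = financial_node_urls(uuid, "USD")  # both URLs coincide per kind
eur_nodes = financial_node_urls(uuid, "EUR")  # both URLs differ per kind

# Identical statements collapse, just like in a semantic repository
# (placeholder predicate and amount).
triples = {
    (f"cb/ipo/{uuid}/pricePerShare/USD", "hasAmount", "16.0"),  # from the local field
    (f"cb/ipo/{uuid}/pricePerShare/USD", "hasAmount", "16.0"),  # from the USD field
}
```

For a USD-denominated IPO the set holds 3 URLs instead of 6, and the two identical statements are stored once.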



4.4 Currencies


Finally, I represent the currencies used for the Financials.
This includes USD plus up to 3 national currencies for the 3 financials.
(I can't imagine an IPO that would use different national currencies for its 3 financials, but Crunchbase has used separate fields, so I represent separate nodes).



Please note that these nodes are shared between all IPOs, so they are not a large number.
And for IPOs that use the same currency for their 3 financials, only 1 pair of nodes will be generated.


FIBO distinguishes between currencies and their codes in a particular code set (in this case "CrunchBase currency code set").
But CB uses ISO 4217 standard currency codes, and FIBO already includes such data in the ontology FND/Accounting/ISO4217-CurrencyCodes.rdf, e.g.:


fibo-fnd-acc-4217:USD
  rdf:type fibo-fnd-acc-cur:CurrencyIdentifier , owl:NamedIndividual ;
  fibo-fnd-rel-rel:hasTag "USD" ;
  rdfs:label "USD" ;
  cmns-dsg:denotes fibo-fnd-acc-4217:USDollar ;
  cmns-id:identifies fibo-fnd-acc-4217:USDollar .

fibo-fnd-acc-4217:USDollar
  rdf:type fibo-fnd-acc-cur:Currency , owl:NamedIndividual ;
  cmns-dsg:hasName "US Dollar" ;
  rdfs:label "US Dollar" ;
  fibo-fnd-acc-cur:hasNumericCode "840" ;
  cmns-cxtdsg:isUsedBy lcc-3166-1:VirginIslandsBritish ...


So why didn't I reuse the FIBO currency nodes and instead made CB currency nodes?
The reason is subtle and is described in fibo/issues/1816:



  • As you can see above, currency nodes use the currency name, eg fibo-fnd-acc-4217:USDollar

  • But in CB we have the currency code, eg USD

  • The financial nodes must connect to the currency nodes, not the currency code nodes

  • Given fibo-fnd-acc-4217:USD, we can very easily find fibo-fnd-acc-4217:USDollar with a simple SPARQL query

  • However, most semantic ETL tools (e.g. TARQL, RML, Karma) cannot run a SPARQL query (or even access a simple RDF pattern) while performing ETL over tabular data

  • Only tools like OntoRefine, SPARQL Generate and XSPARQL can join tabular source data to RDF data in a repository.


So I decided to play it safe and use the CB currency nodes.
The EDM Council has agreed to change the node names a bit to:


fibo-fnd-acc-4217:CurrencyCode-USD, fibo-fnd-acc-4217:Currency-USDollar

And to add a separate coreferencing file that will have sameAs statements for each currency like this:


fibo-fnd-acc-4217:Currency-USDollar owl:sameAs fibo-fnd-acc-4217:Currency-USD

Generated with the following SPARQL query:


construct {
  ?curr owl:sameAs ?currAsCode
} where {
  ?code a fibo-fnd-acc-cur:CurrencyIdentifier; fibo-fnd-rel-rel:hasTag ?c; cmns-id:identifies ?curr.
  bind(iri(concat(str(fibo-fnd-acc-4217:),"Currency-",?c)) as ?currAsCode)
}

This will effectively add alias URLs for each currency (the two sameAs URLs shown above),
so now external data like CB can safely refer to currency nodes like fibo-fnd-acc-4217:Currency-USD.
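
To see why the aliases help ETL tools that cannot query a repository, here is a small Python emulation of the construct query over an in-memory list. It is a sketch, not FIBO tooling: prefixed names are kept as plain strings, only the USD row comes from the text above, and the function names are hypothetical.

```python
PREFIX = "fibo-fnd-acc-4217:"

# One row per currency identifier: (tag, currency node), as matched by the
# "where" clause of the query. Only the USD row is taken from the text.
identifier_rows = [("USD", PREFIX + "Currency-USDollar")]


def alias_statements(rows):
    """Emulate the construct: one owl:sameAs triple per currency."""
    return [(curr, "owl:sameAs", f"{PREFIX}Currency-{tag}") for tag, curr in rows]


def currency_node_for(code: str) -> str:
    """With the aliases in place, ETL can mint the currency node from the code alone."""
    return f"{PREFIX}Currency-{code}"


aliases = alias_statements(identifier_rows)
```

A tabular ETL tool only needs the string concatenation in `currency_node_for`; the lookup from code to named currency is resolved by the `owl:sameAs` statements in the repository.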



4.5 Overall FIBO IPOs Model


Finally, I present the overall FIBO IPOs model:



It's a complex graph:



  • It has 25 nodes, of which 11 are shared with other IPOs (the currencies and currency codes) and 14 are per-IPO

  • It has 136 triples; compare to the simplest model that uses 1 node and 22 triples

  • It uses terms from many ontologies: 13 from FIBO and 8 from the Object Management Group (OMG)'s Commons Ontology Library (see prefixes.ttl).


@prefix cb: <https://ontotext.com/crunchbase/ontology/> .
@prefix cmns-cds: <https://www.omg.org/spec/Commons/CodesAndCodeSets/> .
@prefix cmns-col: <https://www.omg.org/spec/Commons/Collections/> .
@prefix cmns-cxtdsg: <https://www.omg.org/spec/Commons/ContextualDesignators/> .
@prefix cmns-dsg: <https://www.omg.org/spec/Commons/Designators/> .
@prefix cmns-dt: <https://www.omg.org/spec/Commons/DatesAndTimes/> .
@prefix cmns-id: <https://www.omg.org/spec/Commons/Identifiers/> .
@prefix cmns-qtu: <https://www.omg.org/spec/Commons/QuantitiesAndUnits/> .
@prefix cmns-rlcmp: <https://www.omg.org/spec/Commons/RolesAndCompositions/> .
@prefix fibo-fbc-fct-mkt: <https://spec.edmcouncil.org/fibo/ontology/FBC/FunctionalEntities/Markets/> .
@prefix fibo-fbc-fct-ra: <https://spec.edmcouncil.org/fibo/ontology/FBC/FunctionalEntities/RegistrationAuthorities/> .
@prefix fibo-fbc-fi-fi: <https://spec.edmcouncil.org/fibo/ontology/FBC/FinancialInstruments/FinancialInstruments/> .
@prefix fibo-fbc-pas-fpas: <https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/FinancialProductsAndServices/> .
@prefix fibo-fnd-acc-cur: <https://spec.edmcouncil.org/fibo/ontology/FND/Accounting/CurrencyAmount/> .
@prefix fibo-fnd-agr-ctr: <https://spec.edmcouncil.org/fibo/ontology/FND/Agreements/Contracts/> .
@prefix fibo-fnd-arr-id: <https://spec.edmcouncil.org/fibo/ontology/FND/Arrangements/IdentifiersAndIndices/> .
@prefix fibo-fnd-rel-rel: <https://spec.edmcouncil.org/fibo/ontology/FND/Relations/Relations/> .
@prefix fibo-ind-mkt-bas: <https://spec.edmcouncil.org/fibo/ontology/IND/MarketIndices/BasketIndices/> .
@prefix fibo-sec-eq-eq: <https://spec.edmcouncil.org/fibo/ontology/SEC/Equities/EquityInstruments/> .
@prefix fibo-sec-sec-id: <https://spec.edmcouncil.org/fibo/ontology/SEC/Securities/SecuritiesIdentification/> .
@prefix fibo-sec-sec-iss: <https://spec.edmcouncil.org/fibo/ontology/SEC/Securities/SecuritiesIssuance/> .
@prefix fibo-sec-sec-lst: <https://spec.edmcouncil.org/fibo/ontology/SEC/Securities/SecuritiesListings/> .


What is the reason for this complexity?



  • FIBO distinguishes between Offering (the IPO), Share, Listing (of the share at an exchange), Ticker (a ReassignableIdentifier)

  • FIBO represents the financials as explicit nodes (and Market Capitalization as 2 nodes).

  • FIBO represents the stock exchange and its code as two separate nodes

  • Finally, FIBO represents currencies and their codes as two separate nodes



4.6 FIBO Changes


I have validated this representation with the EDM Council (see fibo/issues/1808):
performed 3 iterations and implemented all suggested additions/corrections.


The initial version of this paper (5-Sep-2023) used FIBO 2022Q2.
But there were numerous changes to FIBO since then. This includes:



  • Integration of a number of patterns that are now part of the OMG's Commons Ontology Library Specification v1.1 beta, many of which were derived from FIBO and improved on.

  • Additional content was added to FIBO in securities and derivatives that was not available when this example was initially created.


Elisa Kendall graciously updated the example to use the new ontology terms.
For the curious, a detailed diff can be seen on Github.
The following namespaces were changed:



  • ADDED


@prefix cmns-cds: <https://www.omg.org/spec/Commons/CodesAndCodeSets/> .
@prefix cmns-col: <https://www.omg.org/spec/Commons/Collections/> .
@prefix cmns-dsg: <https://www.omg.org/spec/Commons/Designators/> .
@prefix cmns-dt: <https://www.omg.org/spec/Commons/DatesAndTimes/> .
@prefix cmns-cxtdsg: <https://www.omg.org/spec/Commons/ContextualDesignators/> .
@prefix cmns-id: <https://www.omg.org/spec/Commons/Identifiers/> .
@prefix cmns-qtu: <https://www.omg.org/spec/Commons/QuantitiesAndUnits/> .
@prefix cmns-rlcmp: <https://www.omg.org/spec/Commons/RolesAndCompositions/> .


  • DELETED


@prefix fibo-fnd-utl-alx: <https://spec.edmcouncil.org/fibo/ontology/FND/Utilities/Analytics/> .
@prefix fibo-fnd-dt-fd: <https://spec.edmcouncil.org/fibo/ontology/FND/DatesAndTimes/FinancialDates/> .
@prefix lcc-lr: <https://www.omg.org/spec/LCC/Languages/LanguageRepresentation/> .


4.7 How I Made the Overall Model


I made the overall model by simply concatenating the Turtle files of the individual models presented in earlier sections.
Concatenating any Turtle files produces a valid Turtle file, since @prefix and @base can occur anywhere in Turtle.
(But don't abuse this feature by redefining prefixes mid-way!)
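
The warning about redefined prefixes can be turned into a simple guard. The following Python sketch (my own helper, not part of the project's Makefile) concatenates Turtle fragments and refuses to do so if a prefix is bound to two different namespaces. It only inspects `@prefix` directives via a regular expression and does not parse Turtle; the two fragments and the `http://example.org/other/` namespace are made up for the example.

```python
import re

# Matches directives such as: @prefix cb: <https://example.org/> .
PREFIX_RE = re.compile(r"^\s*@prefix\s+([A-Za-z0-9_.-]*):\s*<([^>]*)>\s*\.", re.MULTILINE)


def concat_turtle(parts: list[str]) -> str:
    """Concatenate Turtle fragments, rejecting conflicting prefix redefinitions."""
    seen: dict[str, str] = {}
    for part in parts:
        for prefix, iri in PREFIX_RE.findall(part):
            # setdefault returns the namespace first bound to this prefix
            if seen.setdefault(prefix, iri) != iri:
                raise ValueError(f"prefix {prefix}: rebound from <{seen[prefix]}> to <{iri}>")
    return "\n".join(parts)


agents_ttl = "@prefix cb: <https://ontotext.com/crunchbase/ontology/> .\n<cb/ipo/1> cb:rank 1 .\n"
offering_ttl = "@prefix cb: <https://ontotext.com/crunchbase/ontology/> .\n<cb/ipo/2> cb:rank 2 .\n"
merged = concat_turtle([agents_ttl, offering_ttl])
```

Repeating the same `@prefix` line in every fragment is harmless; only a changed namespace is rejected.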


If you have played with UML XMI files, the simplicity of composing semantic models by simply concatenating them should be refreshing.
It mirrors the simplicity of semantic data integration by "simply" converting any kind of data to RDF, using consistent URLs, and "pouring" all data into a semantic repository.


I have used some rdfpuml (PlantUML) layout instructions in the models to improve the look of the overall model.
These specify the direction and length of a few arrows, and set colored circles for the classes (called "UML stereotype").


cmns-cxtdsg:appliesTo puml:arrow puml:up.
cmns-id:identifies puml:arrow puml:up.
cmns-qtu:hasArgument puml:arrow puml:up.
cmns-rlcmp:isPlayedBy puml:arrow puml:up.
fibo-fbc-fi-fi:isDenominatedIn puml:arrow puml:down-4.
fibo-fnd-acc-cur:isPriceFor puml:arrow puml:up.
fibo-fnd-rel-rel:isIssuedBy puml:arrow puml:up.

cmns-cds:CodeSet puml:stereotype "(C,lightgreen)".
cmns-id:IdentificationScheme puml:stereotype "(S,lightgreen)". # also cmns-cds:CodeSet
cmns-id:Identifier puml:stereotype "(I,lightgreen)".
fibo-fbc-fct-mkt:Exchange puml:stereotype "(X,lightblue)". # also fibo-fbc-fct-ra:RegistrationAuthority
fibo-fbc-fi-fi:Issuer puml:stereotype "(I,lightblue)". # also fibo-fbc-pas-fpas:Offeror
fibo-fnd-acc-cur:Currency puml:stereotype "(C,green)".
fibo-fnd-acc-cur:CurrencyIdentifier puml:stereotype "(I,green)".
fibo-fnd-acc-cur:MonetaryAmount puml:stereotype "(A,red)".
fibo-fnd-acc-cur:MonetaryPrice puml:stereotype "(P,red)".
fibo-ind-mkt-bas:MarketCapitalization puml:stereotype "(C,red)".
fibo-sec-eq-eq:PricePerShare puml:stereotype "(P,red)".
fibo-sec-sec-id:TickerSymbol puml:stereotype "(T,lightgreen)".
fibo-sec-sec-iss:PublicOffering puml:stereotype "(O,yellow)".
fibo-sec-sec-lst:ListedSecurity puml:stereotype "(S,yellow)". # also fibo-sec-eq-eq:Share
fibo-sec-sec-lst:Listing puml:stereotype "(L,yellow)".



4.8 Generating Semantic Transformation


Although the FIBO IPO model is considerably more complex than the CB IPO model,
we can use the rdf2sparql script (part of the RDF by Example open source project)
to generate a semantic transformation automatically [1].


ipos-fibo.ru is an OntoRefine SPARQL UPDATE transformation (240 lines)
generated from the FIBO IPO model that has the following parts:



  • 22 prefixes, collected manually from the various model parts


  • delete graph that empties the named graph for the row,
    specified with a comment in the first model file: # GRAPH <cb/ipos/(uuid)>:


delete {graph ?cb_ipos_uuid_URL {?_s_ ?_p_ ?_o_}}
where {
  service <rdf-mapper:ontorefine:PROJECT_ID> {
    bind(?c_uuid as ?uuid)
    bind(iri(concat("cb/ipos/",?uuid)) as ?cb_ipos_uuid_URL)
    bind(?c_updated_at as ?updated_at)
  }
  <cb> cb:updatedAt ?UPDATED_AT_DT
  bind(replace(str(?UPDATED_AT_DT),'T',' ') as ?UPDATED_AT) filter(?updated_at > ?UPDATED_AT)
  graph ?cb_ipos_uuid_URL {?_s_ ?_p_ ?_o_}};

CrunchBase has about 18 tables and each row has uuid and updated_at timestamp.
This allows us to process the whole dump and daily updates using the same generated set of scripts,
by storing each row of each table in a separate named graph (about 14M graphs).
We clear only graphs that have been updated since the last ingest.
The service <rdf-mapper:ontorefine:PROJECT_ID> pattern is executed against an OntoRefine virtual SPARQL endpoint,
whereas cb:updatedAt fetches a global timestamp recorded in the real RDF repository.
See rdf2sparql: Global Filtering for details.



  • 144 lines in the insert graph part that come from the IPO model

  • 57 bind to compute values from source CSV fields.


Binds come in several varieties:



  • Simple data cleaning, eg


bind(REPLACE(?created_at,' ','T') as ?created_at_FIXDATE)


  • Attach datatypes, eg


bind(strdt(?created_at_FIXDATE,xsd:dateTime) as ?created_at_FIXDATE_xsd_dateTime)


  • Compute URLs, eg


bind(iri(concat("cb/ipo/",?uuid,"/pricePerShare/USD")) as ?cb_ipo_uuid_pricePerShare_USD_URL)
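
For readers less familiar with SPARQL, the three kinds of binds can be mirrored in plain Python. This is only an analogy with hypothetical function names, not code produced by rdf2sparql; the typed literal is rendered as a Turtle-like string.

```python
from datetime import datetime


def fix_date(value: str) -> str:
    """Data cleaning: turn 'YYYY-MM-DD hh:mm:ss' into the xsd:dateTime lexical form."""
    return value.replace(" ", "T")


def typed_literal(lexical: str, datatype: str) -> str:
    """Attach a datatype, written here in Turtle-like syntax."""
    return f'"{lexical}"^^{datatype}'


def ipo_url(uuid: str, *path: str) -> str:
    """Compute a node URL under cb/ipo/<uuid>/..."""
    return "/".join(("cb/ipo", uuid, *path))


row = {"uuid": "72d30ebd-53ef-2486-6c29-22785c5173ce", "created_at": "2008-02-09 05:25:18"}
created = fix_date(row["created_at"])
datetime.fromisoformat(created)  # raises ValueError if the cleaned value is malformed
created_literal = typed_literal(created, "xsd:dateTime")
price_usd_url = ipo_url(row["uuid"], "pricePerShare", "USD")
```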

Here are some of the binds.
Hopefully you can appreciate how a simple declarative model
is used to generate a complex (even cryptic) transformation:


bind(REPLACE(?created_at,' ','T') as ?created_at_FIXDATE)
bind(strdt(?created_at_FIXDATE,xsd:dateTime) as ?created_at_FIXDATE_xsd_dateTime)
bind(REPLACE(?updated_at,' ','T') as ?updated_at_FIXDATE)
bind(strdt(?updated_at_FIXDATE,xsd:dateTime) as ?updated_at_FIXDATE_xsd_dateTime)
bind(iri(concat("cb/ipo/",?uuid,"/listing")) as ?cb_ipo_uuid_listing_URL)
bind(iri(concat("cb/currency/",?share_price_currency_code)) as ?cb_currency_share_price_currency_code_URL)
bind(iri(concat("cb/ipo/",?uuid,"/ticker")) as ?cb_ipo_uuid_ticker_URL)
bind(iri(concat("cb/currency/",?valuation_price_currency_code)) as ?cb_currency_valuation_price_currency_code_URL)
bind(iri(concat("cb/currency/",?money_raised_currency_code)) as ?cb_currency_money_raised_currency_code_URL)
bind(strdt(?money_raised,xsd:decimal) as ?money_raised_xsd_decimal)
bind(strdt(?money_raised_usd,xsd:decimal) as ?money_raised_usd_xsd_decimal)
bind(iri(concat("cb/ipo/",?uuid,"/marketCap/",?valuation_price_currency_code)) as ?cb_ipo_uuid_marketCap_valuation_price_currency_code_URL)
bind(iri(concat("cb/ipo/",?uuid,"/marketCapValue/",?valuation_price_currency_code)) as ?cb_ipo_uuid_marketCapValue_valuation_price_currency_code_URL)
bind(iri(concat("cb/ipo/",?uuid,"/pricePerShare/",?share_price_currency_code)) as ?cb_ipo_uuid_pricePerShare_share_price_currency_code_URL)
bind(iri(concat("cb/ipo/",?uuid,"/marketCap/USD")) as ?cb_ipo_uuid_marketCap_USD_URL)
bind(iri(concat("cb/ipo/",?uuid,"/marketCapValue/USD")) as ?cb_ipo_uuid_marketCapValue_USD_URL)
bind(iri(concat("cb/ipo/",?uuid,"/pricePerShare/USD")) as ?cb_ipo_uuid_pricePerShare_USD_URL)


5 Conclusion


FIBO is good for capturing real-world complexity and integrating data from numerous sources, e.g.



  • If you need to integrate "ticker tape" information (data about stock prices),
    these will mesh nicely with the CB data on "price per share" on the day of the IPO.
    The FIBO IPO model uses single pricePerShare nodes, but it's easy to multiply these by adding a timestamp to the node URL.
    In contrast, the simple CB model has only one slot for "price per share".

  • You can represent evolutions like a ticker being reassigned to another company,
    or the exchange listing being purchased by another company
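The remark in the first bullet about adding a timestamp to the node URL could look as follows. This is a hedged sketch in Python: the exact URL layout (a trailing ISO date segment) and the function name are assumptions, since the text only says that a timestamp is added to the node URL.

```python
from datetime import date
from typing import Optional

def price_per_share_url(uuid: str, currency: str, day: Optional[date] = None) -> str:
    # Without a date this is the single per-IPO node used in the current model.
    base = f"cb/ipo/{uuid}/pricePerShare/{currency}"
    # With a date, each trading day gets its own node (assumed layout).
    return base if day is None else f"{base}/{day.isoformat()}"

print(price_per_share_url("1234", "USD"))
# cb/ipo/1234/pricePerShare/USD
print(price_per_share_url("1234", "USD", date(2024, 3, 1)))
# cb/ipo/1234/pricePerShare/USD/2024-03-01
```

Because the dated URLs extend the undated one, the IPO-day price and later daily prices can coexist in the same graph without colliding.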


However, that flexibility comes at the price of higher complexity. Compare:



  • The first figure that shows all of CB in the simplest possible semantic form, to

  • The last figure that shows only IPOs represented in FIBO


Should you use FIBO in fintech applications?



  • As a definitive reference model, yes.

  • As the "physical" semantic integration model: it depends on your requirements, including:

    • Number and variety of sources

    • Ability to ingest new data sources in the future without changing the model

    • Complexity of competency questions

    • Ability of data consumers (query writers) to navigate the complex graph efficiently

    • Possibility to introduce some query abstractions or "facades" to simplify querying.
      As examples, I can point to GraphQL (e.g. as implemented over RDF by Ontotext Semantic Objects)
      and to CIDOC CRM Fundamental Relations search [2, 4].




Such calls rest ultimately with the Semantic Data Architect, who should evaluate the tradeoffs of different representations.


Finally, I close with a couple of quips regarding FIBO:



  • FIBO definitely comes "With Batteries Included".

  • But as the Romans said, "Caveat Emptor".

  • And as Americans say, "Your Mileage May Vary".



5.1 Acknowledgements



  • I am grateful to Elisa Kendall and Pawel Garbacz for their feedback on my IPO representation in FIBO.

  • I am especially grateful to Elisa Kendall for updating the example to FIBO version 2024Q1.



5.2 References




[1]

Alexiev, V. 2023. Generation of Declarative Transformations from Semantic Models. European Data Conference on Reference Data and Semantics (ENDORSE 2023) (Mar. 2023).



[2]

Alexiev, V. 2012. Implementing CIDOC CRM Search Based on Fundamental Relations and OWLIM Rules. Workshop on Semantic Digital Archives (SDA 2012), part of International Conference on Theory and Practice of Digital Libraries (TPDL 2012) (Paphos, Cyprus, Sep. 2012).



[3]

Alexiev, V. 2016. RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation. Semantic Web in Libraries (SWIB) (Bonn, Germany, Nov. 2016).



[4]

Alexiev, V., Manov, D., Parvanova, J. and Petrov, S. 2013. Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM). Workshop practical experiences with CIDOC CRM and its extensions (CRMEX 2013) at TPDL 2013 (Valletta, Malta, Sep. 2013).



[5]

Allemang, D., Garbacz, P., Grądzki, P., Kendall, E. and Trypuz, R. 2021. An Infrastructure for Collaborative Ontology Development: Lessons Learned from Developing the Financial Industry Business Ontology (FIBO). Formal Ontology in Information Systems: Proceedings of the Twelfth International Conference (FOIS 2021). F. Neuhaus and B. Brodaric, eds. IOS Press.



[6]

Garbacz, P. and Kendall, E.F. 2022. Reasoning in the FIBO ontology - a challenge. 2nd Semantic Reasoning Evaluation Challenge and 3rd SeMantic Answer Type, Relation and Entity Prediction Tasks Challenge (SemREC/SMART@ISWC) (2022).