{"id":34941990,"url":"https://github.com/cdk/cdk-paper-2","last_synced_at":"2026-04-25T01:34:04.901Z","repository":{"id":137507743,"uuid":"143969518","full_name":"cdk/cdk-paper-2","owner":"cdk","description":"The green Open Access version of the second CDK paper.","archived":false,"fork":false,"pushed_at":"2023-08-15T22:09:51.000Z","size":294,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-12-28T06:48:49.451Z","etag":null,"topics":["cdk","cheminformatics","java","r","statistics"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cdk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2018-08-08T06:20:43.000Z","updated_at":"2023-08-15T22:09:55.000Z","dependencies_parsed_at":"2025-09-09T22:00:05.992Z","dependency_job_id":"d5986574-fcf6-479c-97c8-be5ed3933d04","html_url":"https://github.com/cdk/cdk-paper-2","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cdk/cdk-paper-2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdk%2Fcdk-paper-2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdk%2Fcdk-paper-2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdk%2Fcdk-paper-2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdk%2Fcdk
-paper-2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cdk","download_url":"https://codeload.github.com/cdk/cdk-paper-2/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdk%2Fcdk-paper-2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32247223,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T13:21:15.438Z","status":"ssl_error","status_checked_at":"2026-04-24T13:21:15.005Z","response_time":64,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdk","cheminformatics","java","r","statistics"],"created_at":"2025-12-26T19:20:24.424Z","updated_at":"2026-04-25T01:34:04.894Z","avatar_url":"https://github.com/cdk.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"## The green Open Access version of the second CDK paper.\n\nThe [second paper](https://www.ingentaconnect.com/content/ben/cpd/2006/00000012/00000017/art00005)\nwas published in [Current Pharmaceutical Design](http://benthamscience.com/journals/current-pharmaceutical-design/)\nwith DOI [10.2174/138161206777585274](https://doi.org/10.2174/138161206777585274), and\nwas not published as (CC-BY) Open Access. However, SHERPA/RoMEO [reports](http://www.sherpa.ac.uk/romeo/search.php?issn=1381-6128)\nthat pre- or post-print versions can be archived (just not the publisher PDF). 
So here goes...\n\nCopyright (C) 2016 The Authors.\n\nOh, and check out what [Wikidata](https://wikidata.org/) knows about [this article using Scholia](https://scholia.toolforge.org/work/Q27065423)...\n\n---\n\n\u003cscript type=\"application/ld+json\"\u003e\n{\n  \"@context\": \"https://schema.org\",\n  \"@graph\": [\n    {\n        \"@id\": \"#issue\",\n        \"@type\": \"PublicationIssue\",\n        \"issueNumber\": \"17\",\n        \"datePublished\": \"2006\",\n        \"isPartOf\": {\n            \"@id\": \"#periodical\",\n            \"@type\": [\n                \"PublicationVolume\",\n                \"Periodical\"\n            ],\n            \"name\": \"Current Pharmaceutical Design\",\n            \"issn\": [\n                \"1381-6128\",\n                \"1873-4286\"\n            ],\n            \"volumeNumber\": \"12\"\n        }\n    },\n    {\n        \"@type\": \"ScholarlyArticle\",\n        \"isPartOf\": \"#issue\",\n        \"description\": \"The Chemistry Development Kit (CDK) provides methods for common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Implemented in Java, it is used both for server-side computational services, possibly equipped with a web interface, as well as for applications and client-side applets. 
This article introduces the CDK's new QSAR capabilities and the recently introduced interface to statistical software.\",\n        \"sameAs\": \"https://doi.org/10.2174/138161206777585274\",\n        \"pageEnd\": \"2120\",\n        \"pageStart\": \"2111\",\n        \"name\": \"Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics\",\n        \"author\": [\n          \"Steinbeck, Christoph\",\n          \"Hoppe, Christian\",\n          \"Kuhn, Stefan\",\n          \"Floris, Matteo\",\n          \"Guha, Rajarshi\",\n          \"Willighagen, Egon L.\"\n        ]\n    }\n  ]\n}\n\u003c/script\u003e\n# Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics\n\nChristoph Steinbeck\u003csup\u003ef\u003c/sup\u003e \u003ca href=\"https://orcid.org/0000-0001-6966-0814\"\u003e\u003cimg src=\"orcid_16x16.png\" \u003e\u003c/a\u003e; Christian Hoppe\u003csup\u003ef\u003c/sup\u003e; Stefan Kuhn\u003csup\u003ef\u003c/sup\u003e \u003ca href=\"https://orcid.org/0000-0002-5990-4157\"\u003e\u003cimg src=\"orcid_16x16.png\" \u003e\u003c/a\u003e; Matteo Floris \u003ca href=\"https://orcid.org/0000-0003-4385-9336\"\u003e\u003cimg src=\"orcid_16x16.png\" \u003e\u003c/a\u003e; Rajarshi Guha\u003csup\u003e1\u003c/sup\u003e \u003ca href=\"https://orcid.org/0000-0001-7403-8819\"\u003e\u003cimg src=\"orcid_16x16.png\" \u003e\u003c/a\u003e; Egon L. 
Willighagen\u003csup\u003e§\u003c/sup\u003e \u003ca href=\"https://orcid.org/0000-0001-7542-0286\"\u003e\u003cimg src=\"orcid_16x16.png\" \u003e\u003c/a\u003e\n\n\u003csup\u003ef\u003c/sup\u003e Cologne University Bioinformatics Center (CUBIC), Cologne, Germany\n\n\u003csup\u003e1\u003c/sup\u003e Pennsylvania State University, PA, USA\n\n\u003csup\u003e§\u003c/sup\u003e Institute for Molecules and Materials, Radboud University Nijmegen, The Netherlands\n\n## Abstract\n\nThe Chemistry Development Kit (CDK) provides methods for common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Implemented in Java, it is used both for server-side computational services, possibly equipped with a web interface, as well as for applications and client-side applets. This article introduces the CDK's new QSAR capabilities and the recently introduced interface to statistical software.\n\n## Introduction\n\nChemoinformatics is a scientific discipline, which attempts to solve problems in chemistry with methods developed\nin computer science. This rather broad definition by\nJohann Gasteiger [1] covers a number of overlapping topics\nfrom a diverse set of fields - including mathematics, statistics, computer science, pattern recognition and machine\nlearning - applied to creating, processing and understanding\nchemical information. Examples of chemoinformatics applications include characterization of molecular structures using\ngraph theoretical methods, detecting structure property\ntrends using neural networks and other statistical methods\nand applications of efficient algorithms to detect substructures. 
With the advent of high throughput methods such as\nhigh throughput screening and combinatorial chemistry, the\ndemands - in terms of greater speed as well as greater accuracy - made on chemoinformatics methods and tools have\nincreased.\n\nThough the term, chemoinformatics, may have been\ncoined recently, work in this field has been in progress for\nthe last 20 years and the benefits made available to the drug\ndevelopment and design community from this field have\nresulted in it becoming an area for commercial ventures. A\nwhole software industry focused specifically on pharmaceutical chemoinformatics (such as, but not limited to, [2-4]), is\nnow competing for a rather small market.\nIn the academic community a lot of research has been\ndone in isolated areas of chemoinformatics, but for a long\ntime no attempt was made to create a general purpose, publicly available software package to support even the most\nprominent areas. Recently, this situation has changed significantly. The enormous progress made in the molecular sciences\nsuch as large scale genomics, proteomics and metabolomics projects has only been possible because it was\nsupported by a bioinformatics software culture of openness.\n\nFrom the very beginning, there was an understanding that\ncommunal progress could only be made if tools were\nopenly shared and if people were freed from wasting productivity by reinventing the wheel again and again. This\nculture of openness has clearly influenced the research\ncommunity in chemistry which has now widely adopted it\nand continues to publish high quality, peer-reviewed software under open source licenses at a good rate.\nHere we report recent advancements of the Chemistry\nDevelopment Kit (CDK), an open source Java library for\nstructural chemo- and bioinformatics. 
The CDK originated in\nthe lab of one of us (CS) but was quickly adopted by a community of researchers and is now an actively developed open\nsource project supported by more than 30 contributors\nworld-wide. The CDK is used in a number of academic and\ncommercial chemoinformatics projects [5-13]. Access to\nsource code and documentation is provided via [http://cdk.sourceforge.net/](http://cdk.sourceforge.net/).\nWe have discussed the general architecture\nof the Chemistry Development Kit in an earlier article [14].\nAn overview of CDK's basic capabilities is given in Fig. 1.\nHere, we will focus on recent advancements of CDK in\nareas of interest for pharmaceutical design, such as the ability to compute molecular descriptors and the ability to interface with the open source statistics package R [15].\n\n![Figure 1](./images-000.jpg)\n\n**Figure 1**: An overview of the functionality available in the Chemistry Development Kit (CDK).\n\n## Molecular Descriptors\n\nThe function of a chemoinformatics toolkit, by definition,\nis to represent, generate and process chemical information.\nOne such source of information may be found in molecular\ndescriptors. These are sets of numeric values that mathematically characterize the structure and environment of a\nmolecule. Molecular descriptors are used in a number of areas\nsuch as database searching and QSAR modeling. Recently,\none line of work on the CDK project has been to focus on\nfeatures that would make it useful for inclusion in QSAR\nmodeling environments. To this end a number of molecular\ndescriptor routines have been added to the framework. Table\n1 gives an overview of descriptors currently implemented in\nthe CDK. This section discusses the general design of the\ndescriptor package.\n\nA fundamental decision made in the design of the package was to supplement descriptor implementations with\nmeta-data. 
In this context, meta-data includes information\nregarding the author (called vendor), version and title of the\nimplementation, and a reference to the dictionary describing\nthe descriptors. Descriptor entries in this dictionary contain\ninformation such as a reference to original literature, mathematical formulae describing the descriptor, links to related\ndescriptors, and other details on the exact algorithm used to\ncalculate descriptor values. These dictionaries are not specific to the CDK but are developed within an independent\nopen source QSAR project ([http://qsar.sf.net/](http://qsar.sf.net/)) and the descriptor and\nmeta-data dictionaries are available online from this project.\n\n**Table 1**: A summary of the types of descriptors currently available in the CDK. Download as [CSV](table-000.csv).\n\n| Class          | Implemented descriptors                     | Ref.    |\n|----------------|---------------------------------------------|---------|\n| Constitutional | Atom and bond counts, molecular weight      |         |\n|                | Aromatic atom and bond counts               |         |\n|                | Hydrogen bond donor/acceptor counts         |         |\n|                | Rotatable bond count                        |         |\n|                | Proton type                                 |         |\n|                | Pi-contact of two atoms                     | [16]    |\n|                | Proton RDF                                  | [17]    |\n|                | Rule of Five                                | [18]    |\n|                | XLogP                                       | [19]    |\n| Topological    | Xₜ indices (°Xₜ and ¹Xₜ)                    | [20-22] |\n|                | Xᵥ indices (°Xᵥ and ¹Xᵥ)                    | [20-22] |\n|                | Wiener number                               | [23]    |\n|                | Zagreb index                                | [24]    |\n|                | Vertex adjacency 
information                |         |\n|                | Atomic degree                               |         |\n|                | Petitjean number                            | [25]    |\n|                | K shape indices (¹K,²K ,³K)                 | [26-28] |\n| Geometric      | Gravitational indices                       | [29]    |\n|                | Shortest path bond count                    | [16]    |\n|                | Moment of inertia                           | [30]    |\n|                | Distance in space                           | [16]    |\n| Electronic     | Sigma electronegativity                     |         |\n|                | Proton partial charges                      |         |\n|                | Van der Waals radii                         |         |\n|                | Number of valence electrons                 |         |\n|                | Polarizability (effective, sum, difference) |         |\n| Hybrid         | BCUT, WHIM                                  | [31-35] |\n|                | Topological surface area                    |         |\n\n\nThe goal of the meta-data is to allow the user to determine information regarding the descriptor as well as the\ndescriptor value itself. This is important, since in many cases\ndescriptor implementations and definitions are separate. As a result, one ends up with a large set of numbers which are not\nclosely tied to meaning. The inclusion of meta-data in the\nCDK descriptor implementations alleviates this problem.\nAnother example of the use of meta-data is to differentiate\nbetween descriptors that return different types of values. For\nexample the BCUT [31] descriptors are essentially the n\nhighest and lowest eigenvalues of the weighted Burden matrix [36]. Hence the descriptor value is a vector of numbers.\nOn the other hand constitutional descriptors, such as the\ncount of halogen atoms, return a single number. 
The use of\ndescriptor meta-data allows the user to identify the nature of\nthe return values of different descriptors.\n\nAn important use of the meta-data dictionaries is to allow, in conjunction with namespaces, multiple\nimplementations of a given descriptor to coexist. This is important in\napplications where a user already has descriptor routines\n(which may clash with descriptor routines present in the\nCDK) and would like to include them in the CDK framework. The use of meta-data allows different programs to\ncalculate descriptors from the dictionaries, and then mark the\ncalculated descriptor values with implementation details, so\nthat clashes will not occur.\n\n![Figure 2](./images-001.jpg)\n\n**Figure 2**: UML diagram of the Descriptor and DescriptorResult interfaces.\n\nTo allow for easy inclusion of new descriptor routines, a\nDescriptor interface was created (see Fig. 2). This interface\ndescribes a number of methods that each descriptor must\nimplement. These include methods to perform the\ncalculation, set parameters, extract meta-data and so on. Hence each\ndescriptor routine is a Java class that implements this \ninterface. The design of the descriptor package as a set of classes\nallows for the automated calculation of descriptors. This is\nachieved by a compile time feature, which recognizes \nimplemented descriptor classes (via JavaDoc tags) and builds a\nlist of these classes. This list is then available at runtime,\nallowing the user to use all or a subset of the available\ndescriptors. As a result new descriptor routines can simply be\nplaced in the correct location of the class hierarchy and a\nrecompile of the CDK will result in the new routines being\nautomatically available.\n\nAnother feature of the descriptor package is the uniform\ntreatment of descriptor return values. Different descriptors\nwill return different types of information. 
For example, a\nconstitutional descriptor such as the count of carbon atoms\nreturns a single number whereas the gravitational index descriptor returns nine values. Other descriptors (such as\nBCUT) return a variable number of values. To allow for uniform access to the descriptor return values, all descriptors\nreturn a class implementing the DescriptorResult interface.\nCurrently five classes are present in the org.openscience.cdk.qsar.result\npackage implementing three simple and two\ncomplex return types (see Fig. 3). As a result of this design,\nall descriptors return a uniform value, which can be inspected to correctly obtain the actual calculated values.\n\n![Figure 3](./images-002.jpg)\n\n**Figure 3**: UML diagram of the DescriptorResult interface and five classes from the org.openscience.cdk.qsar.result package that implement the interface.\n\nOnce descriptors are calculated we need to consider the\nquestion of storing the results. In many cases descriptors will\nbe calculated for a set of molecules and further processing\nwill be carried out within the program. However, a useful\nfeature is persistence of calculated values. A trivial approach\nis to write the descriptor values and associated meta-data to a\nplain text file. However a more structured approach is the\nuse of CML [37, 38], a subset of XML designed to encapsulate chemical information. The CDK contains functionality\nto store descriptor data in CML formatted files. This leads to\neasy transfer of data between CML enabled applications. The\neasy conversion of descriptor values and associated metadata\nto CML format also opens up the possibility of using the\nCDK descriptor package as a component of a web service\napplication. 
This would allow easy access to descriptor\nfunctionality (both numeric data as well as meta-data) for web\noriented services.\n\nTo conclude this section we present code snippets which\nshow the ease with which sets of descriptors can be calculated and an example from the CML output of such a\ncalculation, exhibiting descriptor values and associated meta-data.\nTo calculate all available descriptors, one simply instantiates\nthe DescriptorEngine class and calls the process method:\n\n```java\n// Calculate every available descriptor for the molecule\nDescriptorEngine engine = new DescriptorEngine();\nengine.process(molecule);\n```\n\nIn case a subset, such as topological and geometric descriptors, is required, a simple modification of the\npreceding code will suffice:\n\n```java\n// Restrict the engine to the listed descriptor classes\nString[] types = {\"topological\",\"geometric\"};\nDescriptorEngine engine = new DescriptorEngine(types);\nengine.process(molecule);\n```\n\nFinally, single descriptor values are calculated using the\nfollowing scheme:\n\n```java\nDescriptor descriptor = new XLogPDescriptor();\nObject[] params = {Boolean.TRUE};\ndescriptor.setParameters(params);\n// Unwrap the result object to obtain the primitive double value\ndouble xLogP = ((DoubleResult)descriptor\n .calculate(molecule).getValue()).doubleValue();\n```\n\nIn all of these examples the molecule variable is an object\nthat encapsulates information about a molecule (atoms,\nbonds etc.). The important feature here is that the descriptor\ninformation (values and meta-data) is stored within the object. This information can be accessed using keys which are\ndefined by the descriptor classes. To utilize the calculation\nresults, the information can be extracted and written out in\nthe CML format. 
An excerpt from the output of a calculation, showing the results from the calculation of the\ntopological surface area descriptor is shown below:\n\n```xml\n\u003cpropertyList\u003e\n  \u003cproperty xmlns:qsardict=\"http://qsar.sourceforge.net/dicts/qsar-descriptors\"\u003e\n    \u003cmetadataList xmlns:qsarmeta=\"http://qsar.sourceforge.net/dicts/qsar-descriptors-metadata\"\u003e\n     \u003cmetadata dictRef=\"qsarmeta:implementationTitle\" content=\"org.openscience.cdk.qsar.TPSADescriptor\"/\u003e\n     \u003cmetadata dictRef=\"qsarmeta:implementationIdentifier\"\n               content=\"$Id: cdk-article.tex,v 1.62 2005/02/28 15:07:21 stein Exp $\"/\u003e\n     \u003cmetadata dictRef=\"qsarmeta:implementationVendor\" content=\"The Chemistry Development Kit\"/\u003e\n     \u003cmetadataList title=\"qsarmeta:descriptorParameters\"\u003e\n       \u003cmetadata title=\"useAromaticity\" content=\"false\"/\u003e\n     \u003c/metadataList\u003e\n   \u003c/metadataList\u003e\n   \u003cscalar dataType=\"xsd:double\" dictRef=\"qsardict:tpsa\"\u003e34.14\u003c/scalar\u003e\n  \u003c/property\u003e\n\u003c/propertyList\u003e\n```\n\nClearly, a large amount of detailed information is available for a given descriptor. For example, the above excerpt\nindicates the CDK class used to calculate the descriptor, the\nimplementation identifier and vendor. Finally, the actual value\nof the descriptor along with its data type and a dictionary key are\nindicated in the \u003cscalar\u003e entry.\n \n## Interfacing the CDK framework with statistical software\n\nAs reported above, the CDK was recently enhanced by\nthe addition of a number of molecular descriptor classes. The\ngoal of these classes was to allow the use of the CDK\nframework in QSAR modeling environments. However,\nmolecular descriptors are only one part of the process of\nbuilding QSAR models. A vital component of a QSAR\nmodeling framework consists of statistical and mathematical\nmodeling capabilities. 
As a result, a recent addition to the\nCDK was an interface that would allow the CDK to be integrated\nwith statistical and mathematical software for the purposes\nof QSAR modeling. The goal of this statistical interface\nis to allow the user of the CDK to employ a statistical\npackage (such as R [15], Matlab, Weka [39], SAS) to develop\nQSAR models using chemical information generated\n(or processed) by the CDK.\n \nIn the context of interfacing the CDK with statistical environments,\nthere are two possible scenarios. First, we may\nconsider the situation where the CDK is used to provide\nchemoinformatics functionality within a statistical environment.\nSecond is the situation where the CDK uses a statistical\nenvironment to provide statistical and mathematical\nfunctionality to the user of the CDK framework. Recent\nwork allows the CDK to be used in both cases, using R as\nthe statistical environment. R is a language and environment\nfor statistical computing and graphics [15]. It is a GNU project\nwhich is similar to the S language and environment\nwhich was developed at Bell Laboratories (formerly AT\u0026T,\nnow Lucent Technologies) by John Chambers and colleagues.\nR can be considered as a different implementation\nof S. R provides a wide variety of statistical (linear and\nnon-linear modelling, classical statistical tests, time-series\nanalysis, classification, clustering, ...) and graphical techniques,\nand is highly extensible. 
The S language is often the vehicle\nof choice for research in statistical methodology, and R\nprovides an open source route to participation in that activity.\n\nThis section will describe some details of the use of the\nCDK framework as a chemoinformatics backend in a\nstatistical environment and the reverse situation - the use of R as a\nstatistical backend to the CDK framework.\n\n### Requirements\n\nThe underlying mechanism that allows the CDK\nframework to be interfaced with the R environment is the SJava\n[40] package for R that provides a bridge between Java\nprograms in general and the R environment. The SJava package\nallows the use of Java classes and methods in an R session as\nwell as access to R functions from Java code.\nThe use of the CDK within R is relatively straightforward\nas SJava provides methods to access Java objects and\nto associate methods in a similar fashion to R objects and\nfunction calls. The reverse case is a little more involved and\nrequires some infrastructure to be developed on both the Java\nside and R side.\n\n### Accessing the CDK from within R\n\nWe discuss the use of the CDK framework from within R\nby example. First, we consider a clustering of binary\nfingerprints and then we consider the calculation of molecular\ndescriptors using the CDK. In the following discussion we\nshow some examples of the code that performs these tasks.\nFurther details may be found in reference [41].\n\nThe CDK framework implements a binary fingerprint\nalgorithm based on a path generation step, followed by\nhashing the path strings and projecting the hash numbers on\na bit string using a pseudo random number generator seeded\nby the previously computed hash numbers. Fingerprints\nallow the user to rapidly calculate a structural representation of\nthe molecule and have been shown to be a very useful tool\nfor clustering molecular structures. 
A wide variety of\nclustering algorithms are available and R implements a number\nof them.\n\nIn this example, our aim is to calculate fingerprints for a\nset of molecules and then use these fingerprints to perform a\nclustering. The default values for bit length (1024) and path\nlength (6) were used to generate the fingerprints. The first\nstep is to load a molecular structure file. This can be\nachieved by the following R code:\n\n```R\nfilereader \u003c- .JNew('FileReader', .JNew('File',f) )\nreader \u003c- .Java(.JNew('ReaderFactory'), 'createReader', filereader)\ncontent \u003c- .Java(reader,'read', .JNew('ChemFile'))\ncontainer \u003c- .Java('ChemFileManipulator', 'getAllAtomContainers', content)\n```\n\nIt is clear that the sequence of calls to the CDK functions\nis very similar to what would be used if one were loading a\nstructure file from a Java program. In fact, Java code written\nusing the CDK can generally be converted very easily to R\nwith the help of the .JNew and .Java functions provided by\nthe SJava package. The result of the above code is that the R\nvariable container contains a reference to the Java object\nrepresenting the structure information for the molecule (or\nmolecules) contained in the file. It should be noted that this\nvariable cannot, in general, be used by other R functions,\nunless they are designed to manipulate Java objects. In this\ncase we will be manipulating the structures with the help of\nthe CDK and thus, we rely on the SJava package to handle\nthe details of transferring data between R and Java.\n\nOnce we have loaded the structure information, we can\nthen extract each molecule from the array and then evaluate\nthe fingerprint for that molecule. This can be accomplished\nby the following two lines:\n\n```R\nmolecule \u003c- .JavaGetArrayElement(container,0)\nfp \u003c- .Java('Fingerprinter','getFingerprint', molecule)\n```\n\nThe first line retrieves the first structure stored in the\narray, using the 
.JavaGetArrayElement function provided by\nSJava, and the second calculates the fingerprint with the default\nvalues mentioned above. At this stage the fingerprint is\nactually a Java BitSet object and, as described above, cannot be\ndirectly manipulated in R. To get around this we can convert\nthis object to a Java String, which is automatically converted\nto an R character vector. The resultant character vector\nrequires some simple processing, the result of which is a\nnumeric vector that specifies which bit positions were set in the\nfingerprint.\n\n![Figure 4](./images-003.jpg)\n\n**Figure 4**: A silhouette plot depicting the clustering of a set of molecules in the R environment.\n\nThe above procedure can be repeated for a set of molecules\nby creating an R function. The result of this would be\nto obtain a set of fingerprint vectors. These may then be\nmanipulated within R using the fingerprint tools package [42]\nfor R to obtain a similarity matrix. This matrix is to be used\nas input to the various clustering routines (such as pam,\nagnes, and hclust) available in R. Fig. 4 shows the silhouette\nplot [43] obtained from a clustering (using the pam [43]\nalgorithm) of a set of molecules using binary fingerprints. The\nplot indicates the quality of clustering, as measured by the\nextent of cluster structure detected by the algorithm and\ngraphically characterizes the silhouette values (si). In\ngeneral higher values of si indicate stronger membership to the\nassigned class and negative values indicate that an \nobservation belongs to the other class. Average values of si greater\nthan 0.5 indicate the presence of reasonable structure in the\nclustering. 
A more detailed description of this application\nmay be found in reference [41].\n\nThe second application that shows how the CDK can be\nused to provide chemoinformatics support in a general\nstatistical environment is the calculation of molecular descriptors\nfor use in subsequent modeling (such as linear\nregression models). As before, the procedure to access the CDK\ndescriptor functionality from within R is very similar to the\nsteps required in a corresponding Java program. We have\nalready described how one may load a structure file using the\nCDK, so we concern ourselves with calculating a molecular\ndescriptor for a given structure.\n\nSince molecular descriptors are designed as individual\nclasses, we first create an instance of the descriptor and then\ncall its calculate method to obtain the result.\n\n```R\ndesc \u003c- .JavaConstructor('GravitationalIndexDescriptor')\nvalue \u003c- .Java(desc, 'calculate', molecule)\n```\n\nHere, molecule is the variable that contains the structure\nloaded from the molecular structure file. As mentioned above,\nthe design of the descriptor package in the CDK framework\nis very general. As a result all descriptors return uniform\nobjects, which must be inspected to access the actual result\nof the calculation. Simple R functions may be written to hide\nthis aspect from the casual user. By providing a number of\ndescriptor names, a large pool of descriptors can be\ncalculated automatically and then used in subsequent calculations.\n\nSince the main purpose of descriptor calculation is\nfurther use in statistical modeling, information regarding the\ndescriptors may also be required to be carried over into the\nmodeling environment. Currently, there are no fixed \nguidelines on how this should be managed. 
This aspect is also\nconnected to the naming of descriptors within the R session.\nCurrently, some form of mangling must be carried out, by\nhand, on the descriptor name to obtain usable names for\nmanipulation within the R session. Future developments will let\nthe R user obtain a suitable name from the CDK framework\nitself.\n\n### Accessing R from within the CDK\n\nAs shown above, accessing CDK functionality from an\nexternal environment is relatively straightforward. The above\ndiscussion focused on R, but any environment that can link\nto Java libraries can use the CDK framework. In this section\nwe consider how the CDK has been designed to be able to\naccess external statistical packages, focusing on the interface\nwith R.\n\nAs before, the SJava package allows the CDK to access\nR functionality. In this case SJava wraps the R engine as a\nJava class and provides methods to call R functions from\nJava. The design of the CDK-R interface involved the\ndevelopment of some infrastructure on both the R and CDK sides.\nMuch of the design was driven by the problem of transferring\nJava objects to R and vice versa. Since the aim of this\ninterface is to obtain statistical models from R, we focus on the\ntransfer of a complex R object from an R session to the\nCDK. The mechanism by which complex R objects (such as\na linear regression or neural network model) are passed back\nto the CDK is based on the use of matcher and converter\nfunctions on the R side and wrapper classes on the CDK side. We\nfirst consider the matcher and converter functions.\n\nThe need for these functions arises because SJava knows\nhow it should convert, say, a simple numeric variable in R to\nits corresponding Java primitive. However, it does not know\nhow to convert a linear regression model object to a Java\nvariable. 
As a result, the developer must write an R function\nthat performs this conversion (via wrapper classes, see below)\nand then register it with the SJava package on the R side, at\ninitialization time. Thus, the developer may register several\nconverter functions for different types of R objects. When\nthe CDK calls an R function, the return value of the R\nfunction is converted to a Java type using one of the converters\nregistered previously. If multiple converter functions are\navailable, SJava selects a valid converter using a matcher\nfunction. These are simple functions that indicate which\nconverter function can be used to convert a given R object to\na Java type and essentially check the class associated with R\nobjects. Since arbitrary classes can be assigned to an R object,\nthis provides a lot of flexibility for the developer trying\nto return arbitrary R objects back to a CDK function. As with\nconverter functions, matcher functions are also registered\nwith SJava during initialization.\n\nGiven converters and matchers, a CDK function is able to\nreceive complex R objects from the R engine, a functionality\nmanaged by wrapper classes. These are CDK classes that are\nwritten to wrap the information contained in an R object. As\nan example, consider a linear regression model object in R.\nThe corresponding CDK wrapper class would contain fields\nrepresenting the estimated coefficients, residuals, fitted\nvalues and so on. In addition to wrapping the information\npresent in the R object, the wrapper classes must provide\nmethods to set and access these fields.\n\n![Figure 5](./images-004.jpg)\n\n**Figure 5**: UML diagram of the Model interface, RModel and LinearRegressionModel classes.\n\nAt this stage we have the required infrastructure that\nallows the CDK to call arbitrary R functions and receive\narbitrary R objects. 
With this infrastructure a user of the CDK\nframework has full access to the statistical capabilities of R.\nHowever, good object oriented design should hide\ncomplexity. The issue of complexity arises due to the flexibility of R.\nVarious R modeling routines allow data to be presented in\nmultiple forms, allow the setting of various parameters and\nso on. Furthermore, the wrapper classes described above are\nreally internal classes and the user should not be required to\ndeal with them. This situation led to the development of front\nend classes which represent specific types of statistical\nmodels. These classes allow the user to set input data and model\nparameters, build the model and then make subsequent\npredictions using the model, essentially wrapping access to R\nmodeling routines. All front end classes implement the\nModel interface. In the case of the R based modeling\nroutines, the front end classes do not directly implement the\nModel interface but are designed as subclasses of an abstract\nbase class, RModel. This design is due to the initialization\nrequirements of the R session. Fig. 5 shows the UML\ndiagram of the Model hierarchy. Currently, the CDK contains a\nfront end class that represents linear regression models. All\nthe information regarding the model itself (estimated\ncoefficients, fitted values, degrees of freedom etc.) is provided to\nthe user via this front end class. As a result, details of\nthe CDK-R interface are hidden from the user. Table 2\nsummarizes the wrapper and front end classes currently\nimplemented in the CDK and their corresponding R equivalents.\n\nAt this point let us consider what is involved in making calls\nto R from the CDK in a little more detail. R provides a\nnumber of functions for the development of various types of\nmodels. The return values of these functions are generally\ncomplex R objects. 
Though it would be possible to directly\ncall these R functions from the CDK, the design of the CDK-R\ninterface dictates that these R functions be wrapped. That\nis, rather than the CDK calling the original R function, it will\ncall the wrapper R function instead. This leads to a number\nof advantages. First, this approach allows for data validation\non the R side. For vector and matrix input, this is much more\neasily done in R than in Java. Second, if preprocessing is\nrequired (such as preparing a distance matrix from an input\ndata matrix) this can be done in the wrapper function. Third,\nthe use of the wrapper function allows the developer to\nassign unique classes to R objects and thus allow them to be\nconverted to corresponding Java objects. This is important\nwhen different R functions return objects of the same class.\nWithout unique class assignments, the SJava package would\nnot be able to determine which converter should be used to\nreturn the R object to the CDK. Fig. 6 summarizes the flow\nof execution in the CDK-R interface.\n\n![Figure 6](./images-005.jpg)\n\n**Figure 6**: The flow of execution in the CDK-R interface that occurs when a CDK based program uses R to obtain a statistical model.\n\nFinally, an important aspect of the CDK-R interface is\ninitialization. This stage is significant because the\nembedded R engine is not multithreaded. The initialization\nstage ensures that only one instance of the R engine is\nrunning at any time and is also responsible for loading the\nvarious R wrapper functions as well as registering matcher and\nconverter functions in the R session. In addition, the R\nlibraries required to support the statistical functionality\nused by the CDK are also loaded. Further details of the\nlow level design of this interface may be found in reference [44].\n\n**Table 2:** A summary of the wrapper and front end classes currently implemented in the CDK and the R objects that they represent. 
Download as [CSV](table-001.csv).\n\n| Type of Class | Class Name                   | R Equivalent      |\n|---------------|------------------------------|-------------------|\n| Wrapper       | LinearRegressionModelFit     | lm object         |\n|               | LinearRegressionModelPredict | lm.predict object |\n| Front end     | LinearRegressionModel        | lm function       |\n\nThe CDK-R interface described above provides access to\nthe full functionality of the R environment. This access\nrequires infrastructure to be implemented on the CDK side (in\nthe form of wrapper classes and front end classes) as well as\non the R side (matcher, converter and wrapper functions).\nHowever, due to the object oriented design of both R and\nCDK, much of the internal complexity can be hidden from\nthe user of the API. Currently, the interface provides\nmodeling capabilities using linear regression. Implementing\nsupport for other types of models is relatively straightforward\nand such support will appear in future versions of the CDK.\nThough the above discussion has focused on the CDK-R\ninterface, the design of the QSAR modeling package in the\nCDK is general enough to allow interfaces to other\nmodeling packages such as Matlab or Weka to be\nimplemented. Future work involves the development of such\ninterfaces, expanding the flexibility of QSAR modeling using the\nCDK framework.\n\n## 3D model builder\n\nIn order to propel the development of 3D modelling\napplications based on the CDK, a 3D model builder was added,\nwhich can quickly compute geometries for molecular models\nbased solely on connectivity information. In order to do this\nwe followed a common approach in the 3D structure\ngeneration process [45]. First, the molecule is\nfragmented into acyclic and cyclic portions, which are handled separately\nand re-assembled at the end of the whole process. The\ngeometry of acyclic parts is generated by a rule- and data-based\nmethod. 
Internal coordinates, such as bond lengths and\nangles, were taken from experimental or calculated data\ncollections such as the MMFF94 force field [46]. With these data\nand the assumption of extended chains (dihedral\nangle of 180 degrees) we create a Z-matrix for the whole\nchain, which is then converted into Cartesian coordinates.\n\n![Figure 7](./images-006.jpg)\n\n**Figure 7**: A SMILES is parsed into an internal connection matrix. 2D and 3D coordinates are then generated using the StructureDiagramGenerator and the 3DModelBuilder, respectively. The depictions are generated with JChemPaint [9,48] and Jmol [49].\n\nFor cyclic systems we followed a knowledge-based\napproach in collecting and storing unique ring systems\n(ignoring different conformations) to use them as templates in the\n3D structure generation process [45]. To this end, we\ndownloaded a collection of small molecules as MOL files from the\nNCI databank ([http://cactus.nci.nih.gov/ncidb2/download.html](http://cactus.nci.nih.gov/ncidb2/download.html)) [47].\nTo extract the molecule data stored in this file\n(249,071 3D-structures) the IteratingMDLReader from the\nCDK software package was used. Using various CDK\nfunctions, the ring systems are identified and partitioned into\nconnected rings which share at least an atom, a bond or three\nor more atoms with another ring. After a scan of all 249,071\nNCI molecules, we collected 11,610 unique ring systems.\n\nIn order to build a 3D structure for a new modeling\ncandidate, we first examine its molecular structure for the\nexistence of one of the template ring systems. If a template ring\nsystem can be identified, its coordinates are assigned to the\nmodeling candidate and aliphatic chains are laid out\nthereafter. Currently, molecules with unknown ring systems\ncannot be handled by this approach. 
For these cases, we are\ncurrently implementing a distance geometry algorithm.\n\nTo test robustness, the ability to handle large files, the\nvariety of chemical types supported and the conversion\nrate, we used the structures submitted to the NMRShiftDB [7].\n\nOf these 11,064 molecules, about 17% could not be\nconverted due to ring system problems. The method needs\non average 0.5 sec/molecule (Intel Pentium 2.66 GHz,\n512 KB cache, 1 GB RAM).\n\n# Conclusion\n\nWe have presented two new capabilities recently\nintroduced in the Chemistry Development Kit (CDK) related to\ndrug design. The CDK is available to the public at\n[http://cdk.sourceforge.net/](http://cdk.sourceforge.net/). Its new ability to compute 3D starting\ngeometries in a quick model building step will propel the\ncurrent development of force field methods within the CDK.\nThese, in turn, will aid our efforts to create a molecular\ndocking environment based on the CDK - an area of\npharmaceutical chemoinformatics, which is clearly underrepresented\nin the current package. The inclusion of molecular descriptors\nand the ability to interface with the open source statistical\nsoftware package, R, now provides QSAR modelling\ncapabilities which are essential for the use of the CDK in a\npharmaceutical chemoinformatics context.\n\nA lot of new functionality has been added to the CDK\nsince our last report on the toolkit in a scientific journal [14].\nFor more technical references, the authors would like to\npoint the interested reader to the \"CDK News\" (ISSN 1614-7553),\nwhich was established in the middle of 2004 and is\ncurrently seeing its fifth issue. The CDK News can be\ndownloaded from [http://cdk.sourceforge.net/](http://cdk.sourceforge.net/) in PDF format.\nIt is focused on publishing articles that provide practical and\ndetailed guidelines and examples on the use of specific CDK\nfunctionality, updates on newly added features and links to\narticles and projects related to the CDK. 
In addition to the\nabove mentioned documentation, a showcase web application\nfor CDK functionality has been created at\n[http://www.chemistry-development-kit.org/](http://www.chemistry-development-kit.org/).\nThis web site allows the\nuser to easily create molecular structures via file upload or\nby pasting SMILES, and to apply various CDK functions on\nthem. While the Chemistry Development Kit is already used\nin a variety of academic and commercial software projects,\nthe extensions reported in this article are expected to widen\nthe scope and use of the CDK even further.\n\n# Acknowledgment\n\nThe authors would like to thank all members of the CDK\nproject for their contributions, corrections and helpful\ncomments, and Jörg Wegner for discussion of the design of the\nQSAR interfaces. Financial support for CS, CH, MF and SK\nfrom the German Federal Ministry of Education and Research\n(BMBF) is highly acknowledged.\n\n# References\n\n1. Russo E. Chemistry plans a structural overhaul. Naturejobs 2002; 4-7.\n2. \"Daylight Chemical Information Systems, Inc.\", [http://www.daylight.com/](http://www.daylight.com/), accessed on Feb 2005.\n3. \"Accelrys, Inc.\", [http://www.accelrys.com/](http://www.accelrys.com/), accessed on Feb 2005.\n4. \"Chemical Computing Group, Inc.\", [http://www.chemcomp.com/](http://www.chemcomp.com/), accessed on Feb 2005.\n5. Steinbeck C. SENECA: A platform-independent, distributed, and parallel system for computer-assisted structure elucidation in organic chemistry. J Chem Inf Comput Sci 2001; 41: 1500-1507.\n6. Han Y, Steinbeck C. An evolutionary algorithm based strategy for computer-assisted molecular structure elucidation. J Chem Inf Comput Sci 2004; 44: 489-498.\n7. Steinbeck C, Kuhn S, Krause S. NMRShiftDB - Constructing a Chemical Information System with Open Source Components. J\nChem Inf Comput Sci 2003; 43: 1733-1739.\n8. Steinbeck C, Kuhn S. 
NMRShiftDB - Compound identification and structure elucidation support through a free community-built web database. Phytochemistry 2004; 65: 2711-2717.\n9. Steinbeck C, Krause S, Willighagen E. JChemPaint - Using the Collaborative Forces of the Internet to Develop a Free Editor for 2D Chemical Structures. Molecules 2000; 5: 93-98.\n10. Murray-Rust P, Rzepa H, Williamson M, Willighagen E. Chemical Markup, XML, and the World Wide Web. 5. Applications of Chemical Metadata in RSS Aggregators. J Chem Inf Comput Sci 2004; 44: 462-469.\n11. Wittig U, Weidemann A, Kania R, Peiss C, Rojas I. Classification of chemical compounds to support complex queries in a pathway database. Comp Funct Genom 2004; 5: 156-162.\n12. \"JOELib - a java based computational chemistry package\", [http://joelib.sourceforge.net/](http://joelib.sourceforge.net/), accessed on Feb 2005.\n13. Zhang Y, Murray-Rust P, Dove M, Glen R, Rzepa H, Townsend J, et al. JUMBO - An XML infrastructure for eScience. Proceedings of UK e-Science All Hands Meeting 2004.\n14. Steinbeck C, Han YQ, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci 2003; 43: 493-500.\n15. R Development Core Team, \"R: A language and environment for statistical computing\", R Foundation for Statistical Computing, Vienna, Austria 2004 ISBN 3-900051-07-0.\n16. Meiler J. PROSHIFT: Protein chemical shift prediction using artificial neural networks. J Biomol NMR 2003; 26: 25-37.\n17. De Sousa A, Hemmer M, Gasteiger J. Prediction of 1H-NMR Chemical Shifts Using Neural Networks. Anal Chem 2002; 74: 80-90.\n18. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv Drug Deliv Rev 1997; 23: 3-25.\n19. Wang R, Lai L. A New Atom-Additive Method for Calculating Partition Coefficients. 
J Chem Inf Comput Sci 1997; 37: 615-621.\n20. Kier L, Hall L, Murray W. Molecular connectivity I: Relationship to local anesthesia. J Pharm Sci 1975; 64.\n21. Kier L, Hall L. Molecular Connectivity in Structure Activity Analysis; Research Studies Press: Letchworth, Hertfordshire, England 1986.\n22. Kier L, Hall L. Molecular connectivity VII: Specific treatment to heteroatoms. J Pharm Sci 1976; 65: 1806-1809.\n23. Wiener H. Correlation of Heat of Isomerization and Difference in Heat of Vaporization of Isomers Among Paraffin Hydrocarbons. J Am Chem Soc 1947; 69: 17-20.\n24. Gutman I, Ruscic B, Trinajstic N, Wilcox Jr C. Graph Theory and Molecular Orbitals. XII. Acyclic Polyenes. J Chem Phys 1975; 62: 3399-3405.\n25. Petitjean M. Applications of the Radius Diameter Diagram to the Classification of Topological and Geometric Shapes of Chemical Compounds. J Chem Inf Comput Sci 1992; 32: 331-337.\n26. Kier L. A Shape Index from Molecular Graphs. Quant Struct-Act Relat Pharmacol Chem Biol 1985; 4: 109-116.\n27. Kier L. Shape Indexes for Orders One and Three from Molecular graphs. Quant Struct-Act Relat Pharmacol Chem Biol 1986; 5: 1-7.\n28. Kier L. Distinguishing Atom Differences in a Molecular Graph Index. Quant Struct-Act Relat Pharmacol Chem Biol 1986; 5: 7-12.\n29. Katritzky A, Mu L, Lobanov V, Karelson M. Correlation of Boiling Points with Molecular Structure. 1. A Training Set of 298 Diverse Organics and a Test Set of 9 Simple Inorganics. J Phys Chem 1996; 100: 10400-10407.\n30. Goldstein H. Classical Mechanics; Addison Wesley: Reading, MA, 1950.\n31. Pearlman R, Smith KM. Novel Software Tools for Chemical Diversity. Perspect Drug Disc Des 1998; 9: 339-353.\n32. Pearlman R, Smith K. Metric Validation and Receptor Relevant Subspace Concept. J Chem Inf Comput Sci 1999; 39: 28-35.\n33. Todeschini R, Lasagni R, Marengo E. New Molecular Descriptors for 2D and 3D Structures. Theory. J Chemometrics 1994; 8: 263-273.\n34. Todeschini R, Gramatica P. 
3D Modelling and Prediction by WHIM Descriptors. Part 5. Theory, Development and Chemical Meaning of WHIM Descriptors. Quant Struct Act Relat 1997; 16:113-119.\n35. Ertl P, Rohde B, Selzer P. Fast Calculation of Molecular Polar Surface Area as a Sum of Fragment Based Contributions and Its Application to the Prediction of Drug Transport Properties. J Med Chem 2000; 43: 3714-3717.\n36. Burden F. Molecular Identification Matrix for Substructure Searches. J Chem Inf Comput Sci 1989; 29: 225-227.\n37. Murray-Rust P, Rzepa H. Chemical Markup XML, and the World-wide Web. 1. Basic Principles. J Chem Inf Comp Sci 1999; 39:\n928-942.\n38. Murray-Rust P, Rzepa H. Chemical Markup XML, and the World-wide Web. 2. Information Objects and the CMLDOM. J Chem Inf Comp Sci 2001; 41: 1113-1123.\n39. Witten I, Frank E. Data Mining: Practical machine learning tools with Java implementations; Morgan Kaufmann: San Francisco 2000.\n40. \"SJava\", [http://www.omegahat.org/RSJava/](http://www.omegahat.org/RSJava/), accessed on Feb 2005.\n41. Guha R. Using the CDK as a backend to R. CDK News 2005; 2:2-6.\n42. \"Binary Fingerprint Tools\", [http://blue.chem.psu.edu/~rajarshi/code/R](http://blue.chem.psu.edu/~rajarshi/code/R), accessed on Feb 2005.\n43. Kaufman L, Rousseeuw P. Finding Groups in Data: An Introduction to Cluster Analysis; Wiley: New York 1990.\n44. Guha R. Using R to provide statistical functionality for QSAR modeling in CDK. CDK News 2005; 2: 2-6.\n45. Sadowski J. 3D Structure Generation; volume 1 of Handbook of Chemoinformatics Wiley-VCH 2003.\n46. Halgren T. Merck Molecular Force Field. I. Basis, Form, Scope,Parameterization, and Performance of MMFF94*. J Comp Chem 1996; 17: 490-519.\n47. Ihlenfeldt W, Takahasi Y, Abe H, Sasaki S. CACTVS: A Chemistry Algorithm Development Environment; Daijuukagakutouronkai Dainijuukai Kouzoukasseisoukan Shin-pojiumu Kouenyoushishuu Kyoto University Press 1992.\n48. 
\"The JChemPaint Structure Editor\", [http://jchempaint.sf.net/](http://jchempaint.sf.net/), accessed Feb 2005.\n49. \"The Jmol 3D Molecular Visualization Software\", [http://www.jmol.org/](http://www.jmol.org/), accessed Feb 2005.\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcdk%2Fcdk-paper-2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcdk%2Fcdk-paper-2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcdk%2Fcdk-paper-2/lists"}