Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/maisikoleni/javadoc-search
Fast Server-side Search Engine for Javadoc API
https://github.com/maisikoleni/javadoc-search
api java javadoc maven quarkus qute regular-expression search search-algorithm search-engine trie
Last synced: about 1 month ago
JSON representation
Fast Server-side Search Engine for Javadoc API
- Host: GitHub
- URL: https://github.com/maisikoleni/javadoc-search
- Owner: MaisiKoleni
- License: eupl-1.2
- Created: 2022-07-02T14:52:35.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-28T18:36:21.000Z (about 1 month ago)
- Last Synced: 2024-11-28T19:34:29.408Z (about 1 month ago)
- Topics: api, java, javadoc, maven, quarkus, qute, regular-expression, search, search-algorithm, search-engine, trie
- Language: HTML
- Homepage:
- Size: 349 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.adoc
- License: LICENSE
Awesome Lists containing this project
README
:encoding: utf-8
:lang: en
:title: Javadoc Search
:description: Fast Server-side Search Engine for Javadoc API
:keywords: search, java, api, search-engine, maven, regular-expression, trie, javadoc, search-algorithm, qute, quarkus
:author: Christian Femers
:showtitle:
:icons: font= Javadoc Search
A fast server-side search engine for the Javadoc API
of selected JDK releases and other libraries.*Available at https://javadoc-search.maisikoleni.net*
Can also be added to a bowser as custom search engine:
- Search URL Template: +
`https://javadoc-search.maisikoleni.net/api/v2/libraries/jdk-latest/search/redirect?query=%s`
- Search Suggestions URL Template: +
`https://javadoc-search.maisikoleni.net/api/v2/libraries/jdk-latest/search/suggestions?query=%s`[WARNING]
====
The above is a fairly weak server and not suitable for benchmarking.
====You can also replace `jdk-latest` which always points to the
latest full release with one of the following:- `jdk-11` for the JDK 11 API
- `jdk-17` for the JDK 17 API
- `junit5-latest` for the current JUnit 5 APIThe same works for the website in the browser as well.
== Details
Realized as https://quarkus.io[Quarkus] web application mainly with https://quarkus.io/guides/qute[Qute],
https://quarkus.io/guides/resteasy[RESTEasy] and https://microstream.one/platforms/microstream-for-java/[Microstream].
The client-side uses https://getbootstrap.com[Bootstrap] and https://htmx.org[htmx].The search engine is implemented as a https://en.wikipedia.org/wiki/Radix_tree[sparse radix trie]
generated from the `xxx-search-index.js` files of a Javadoc site.
It is also mixed with a https://en.wikipedia.org/wiki/Suffix_tree[suffix tree]
for name segments of the Javadoc elements.The following example shows the trie generated for `java.lang.Long`
and `java.util.OptionalLong`, both in the `java.base` module.
The trie will be compressed such that the equally colored nodes point to the same object..Example trie
image::docs/example-trie.svg[]To search in that trie like one can in the official Javadoc implementation,
the nodes in the trie are being traversed recursively
based on a custom regular expression matcher implementation.For example, `*j.u.OpL*` gets transformed into a regular expression
equivalent to
[source,regexp]
----
j\p{javaLowerCase}*+\s*+•*+\.\s*+u\p{javaLowerCase}*+\s*+•*+\.\s*+Op\p{javaLowerCase}*+L[^•]*+
----
and gets compiled into:
[source,java]
----
CompiledRegex [
0x0000006a, // matches 'j'
0xc0000000, // star referencing char class #0 -> "\p{javaLowerCase}"
0xc0000001, // star referencing char class #1 -> "\s"
0x80000378, // star referencing literal '•'
0x0000002e, // matches '.'
0xc0000001, // star referencing char class #1 -> "\s"
0x00000075, // matches 'u'
0xc0000000, // star referencing char class #0 -> "\p{javaLowerCase}"
0xc0000001, // star referencing char class #1 -> "\s"
0x80000378, // star referencing literal '•'
0x0000002e, // matches '.'
0xc0000001, // star referencing char class #1 -> "\s"
0x0000004f, // matches 'O'
0x00000070, // matches 'p'
0xc0000000, // star referencing char class #0 -> "\p{javaLowerCase}"
0x0000004c, // matches 'L'
0xc0000002 // star referencing char class #2 -> "[^•]"
]
----
These instructions are an array of `int` and
they are accompanied by an array of CharPredicates,
which are functions of type `char -> boolean`
and represent the character classes in the custom regex implementation.The instructions are encoded as follows:
[options="header"]
|===
| Bit 31 | Bit 30 | Bits 29 ... 0
//-------------------------------------------------------------------------------------
| Wrapped by a star? | Represents a char class? | `char` value or `CharPredicate` index
|===This compiled custom regex implementation has the
disadvantage that it does not support nested groups or
repetitions of more than a single character.
This is however not an issue for our purpose,
as we only need to star single characters.The advantage, however, is
that all its state can be stored in a single `long`,
which can easily be passed around methods
and avoids the generation of objects.This `long` value is composed as follows:
[options="header"]
|===
| Bit 63 | Bits 62 ... 32 | Bit 31 | Bits 30 ... 0
//-------------------------------
| Did the match fail?
| Number of stars that matched at least one character
| Did the current star already match one character?
| Next index in the instruction array
|===As the state is only a `long` that is stored by the user
and this possessive matcher implementation can advance the state
step by step consuming a single character,
collecting all matches by traversing the trie
and branching at the transitions is fairly simple.All of that allows for the following search syntax:
- Query `*ja.l.o*` will find:
* `java.base/java.lang.Object`
* `java.base/java.lang.Override`
* `java.base/java.lang.OutOfMemoryError`
- Query `*LDT*` will find:
* `java.base/java.time.LocalDateTime`
* `java.base/java.time.chrono.ChronoLocalDateTime`
* ...
- Query `*math max*` will find:
* `java.base/java.lang.Math.max(double, double)`
* `java.base/java.lang.Math.max(long, long)`
* ...
- Query `*j.u.s.Col.jo*` will find:
* `java.base/java.util.stream.Collectors.joining()`
* `java.base/java.util.stream.Collectors.joining(CharSequence, CharSequence, CharSequence)`
* `java.base/java.util.stream.Collectors.joining(CharSequence)`
- Query `*.C~rs*` will find:
* `java.base/java.util.stream.Collectors`
* ...The search engine implementation is _mostly_ compliant with the
https://docs.oracle.com/en/java/javase/20/docs/specs/javadoc/javadoc-search-spec.html[Javadoc Search Specification]
but since major changes got introduced with JDK 19
(search terms can be separated by spaces),
I felt free to include custom behavior.[NOTE]
====
One example for that is `~`, which allows skipping lower case characters
until the following part of the search term matches.
Like the custom regular expression implementation, `~` is possessive
and the engine will not backtrack to find a match.
====As a result of the trie and simplified regex matching combination,
the search is very fast and rarely exceeds 20 ms.
Most of the time, search completes in about 1 ms.
The longer the search term, the faster the results are available.The matching results are currently ranked by:
- Where the match of the query starts. +
A match starting at the beginning is the best case, a match starting
at or after a separator comes next and matches that start
within identifiers (upper-case letters, underscore) have the lowest rank.
`java.desktop/javax.print.attribute.SetOfIntegerSyntax`
is therefore a better match for `*Set*`
than `java.base/java.util.AbstractSet`.
- How good they match the query. +
`java.base/java.io.File.isHidden()` is a better match for `*File.isH*`
than `java.base/java.nio.file.Files.isHidden(Path)` because
it does not require further lower case characters after `File`.
- Natural order of the entries. +
Results with smaller char values at the same position come first.
The implementation is very much like `String.compareTo`
on the last segment of the qualified name.Case insensitive matching is only performed
if case-sensitive did not yield _any_ results.== Building
This project requires at least JDK 17.
Building should be straightforward since this is a regular Maven project.
- `mvn clean` to clean up generated artifacts (does not remove the `database` folder)
- `mvn test` runs all tests
- `mvn quarkus:dev` starts the
https://quarkus.io/guides/dev-mode-differences[Quarkus dev mode] (live reload)
- `mvn package` bundles the web application using the `prod` profile
into `target/quarkus-app`It might be helpful to add the missing search-index files
in `src/main/resources/net/maisikoleni/javadoc/service/jdk-latest`
(see `jdk-index-files-go-here.txt` there for details), but this is not required
as they are then fetched from the official site and cached in `database/javadoc-indexes`.To run outside Maven, `--add-exports java.base/jdk.internal.misc=ALL-UNNAMED` is required
as MicroStream requires Unsafe at the moment.== License
Javadoc Search was created by Christian Femers in 2022
and the program and the accompanying materials are made available
under the terms of the
https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12[European Union Public License 1.2]
(see the https://github.com/MaisiKoleni/javadoc-search/blob/main/LICENSE["LICENSE" file],
`SPDX-License-Identifier: EUPL-1.2-or-later`)
with the following exceptions:- `src/test/java/net/maisikoleni/javadoc/parser/JdkUrlGeneration.java` (`GPL-2.0-with-classpath-exception`)