Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ApryseSDK/pdftron-document-search
Build search across multiple documents client-side in your file storage
algolia-instantsearch extract-text seach-documents search-office-text search-pdf
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/ApryseSDK/pdftron-document-search
- Owner: ApryseSDK
- License: other
- Created: 2020-07-09T22:41:13.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-30T22:16:58.000Z (almost 2 years ago)
- Last Synced: 2024-10-18T20:41:20.471Z (3 months ago)
- Topics: algolia-instantsearch, extract-text, seach-documents, search-office-text, search-pdf
- Language: JavaScript
- Homepage:
- Size: 73.8 MB
- Stars: 43
- Watchers: 10
- Forks: 12
- Open Issues: 15
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PDFTron Document Search
PDFTron Document Search demonstrates building an application where users can search across multiple documents using:
- [PDFTron PDF SDK](https://www.pdftron.com) for text extraction and viewing of the documents
- [Firebase](https://firebase.google.com/) for storage
- [Algolia](https://www.algolia.com/) for searching

Watch [a quick video](https://youtu.be/IQATnzHTp7Q) that walks you through the app. I also put together [a blog post to help you get started](https://www.pdftron.com/blog/indexed-search/search-multiple-documents-using-javascript/).
![Screenshot](https://github.com/PDFTron/pdftron-document-search/blob/master/search.png)
This repo is designed to help you get started creating your own document search workflow.
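
To make the extract-then-index idea concrete, here is a minimal sketch of how extracted page text might be shaped into search records, one record per page. It is not the code used in this repo: `buildPageRecords` and the field names `docId`, `title`, `page`, and `text` are hypothetical. The one Algolia-specific detail is `objectID`, the field Algolia uses to identify a record.

```javascript
// Hypothetical helper (not part of this repo): turn the text extracted from
// each page of a document into one search record per page.
function buildPageRecords(docId, title, pages) {
  return pages
    .map((text, i) => ({
      objectID: `${docId}-p${i + 1}`, // stable ID, so re-indexing overwrites the same record
      docId,
      title,
      page: i + 1, // 1-based page number, useful for jumping to the hit in the viewer
      text: text.trim(),
    }))
    .filter((record) => record.text.length > 0); // skip pages with no text
}

// Illustrative input: three pages, the second one blank.
const records = buildPageRecords("report-2020", "Annual Report", [
  "Revenue grew in the third quarter.",
  "   ",
  "Outlook for next year.",
]);

console.log(records.map((r) => r.objectID));
```

Keeping one record per page, rather than one per document, is a design assumption here: it lets a search hit point at a specific page.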
## Install
```
npm install
```

## Algolia Configuration
This application uses Algolia to search documents. Algolia is not the only third-party search provider, so consider alternatives such as [Elasticsearch](https://www.elastic.co/) if they fit your needs better.
To get started with this sample, please register a new app with [Algolia](https://www.algolia.com/users/sign_up).
Create a new index called `document_search`:
![Screenshot](https://github.com/PDFTron/pdftron-document-search/blob/master/algolia2.png)

After you have configured your app, create a `.env` file in the root of the directory and place the following in it:
```
REACT_APP_ALGOLIA_APP_ID=your_key_goes_here
REACT_APP_ALGOLIA_API_KEY=your_key_goes_here
REACT_APP_ALGOLIA_SEARCH_KEY=your_key_goes_here
REACT_APP_ALGOLIA_INDEX_NAME=document_search
```
The above information can be found under API Keys in your Algolia Dashboard.
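
Since every value in the block above starts out as the same placeholder, a small startup check can catch a key that was never filled in. The following is a minimal sketch, assuming the variable names from the `.env` block; `findMissingEnvVars` is a hypothetical helper and is not part of this repo.

```javascript
// Variable names taken from the `.env` block above.
const ALGOLIA_ENV_VARS = [
  "REACT_APP_ALGOLIA_APP_ID",
  "REACT_APP_ALGOLIA_API_KEY",
  "REACT_APP_ALGOLIA_SEARCH_KEY",
  "REACT_APP_ALGOLIA_INDEX_NAME",
];

// Hypothetical helper: return the names that are unset, empty, or still hold
// the placeholder value from the sample `.env`.
function findMissingEnvVars(env, names) {
  return names.filter((name) => {
    const value = env[name];
    return !value || value === "your_key_goes_here";
  });
}

// In the app you would pass `process.env`; a plain object is used here so the
// example runs on its own.
const missing = findMissingEnvVars(
  {
    REACT_APP_ALGOLIA_APP_ID: "ABC123",
    REACT_APP_ALGOLIA_API_KEY: "your_key_goes_here",
    REACT_APP_ALGOLIA_INDEX_NAME: "document_search",
  },
  ALGOLIA_ENV_VARS
);

console.log(missing);
```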
![Screenshot](https://github.com/PDFTron/pdftron-document-search/blob/master/algolia.png)

## Firebase Configuration
This application uses Firebase to store documents. You can use any other backend of your choice.
However, to get started with this sample, please register a new app with [Firebase](https://firebase.google.com/). Make sure you create a storage bucket and enable authentication for email and Google.
![Screenshot](https://github.com/PDFTron/pdftron-sign-app/blob/master/firebase_authentication.png)

After you have registered an app, add the following to the `.env` file in the root of the directory (create the file if it does not exist yet):
```
REACT_APP_API_KEY=your_key_goes_here
REACT_APP_MESSAGING_SENDER_ID=your_key_goes_here
REACT_APP_APP_ID=your_key_goes_here
REACT_APP_AUTH_DOMAIN=your_domain_goes_here
REACT_APP_DATABASE_URL=your_database_go_here
REACT_APP_PROJECT_ID=your_project_id
REACT_APP_STORAGE_BUCKET=your_storage_bucket
```
The above information can be found under settings of your Firebase app.
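
As a rough illustration of how these variables relate to a Firebase web config object, the sketch below maps each one onto the corresponding config field. `buildFirebaseConfig` is a hypothetical helper; the code under `src/firebase/` in this repo may wire things up differently.

```javascript
// Hypothetical helper: map the `.env` variables above onto the field names of
// a Firebase web app config object (the kind passed to `initializeApp`).
function buildFirebaseConfig(env) {
  return {
    apiKey: env.REACT_APP_API_KEY,
    authDomain: env.REACT_APP_AUTH_DOMAIN,
    databaseURL: env.REACT_APP_DATABASE_URL,
    projectId: env.REACT_APP_PROJECT_ID,
    storageBucket: env.REACT_APP_STORAGE_BUCKET,
    messagingSenderId: env.REACT_APP_MESSAGING_SENDER_ID,
    appId: env.REACT_APP_APP_ID,
  };
}

// In the app you would pass `process.env`; these values are made up.
const firebaseConfig = buildFirebaseConfig({
  REACT_APP_API_KEY: "example-api-key",
  REACT_APP_AUTH_DOMAIN: "example.firebaseapp.com",
  REACT_APP_PROJECT_ID: "example",
  REACT_APP_STORAGE_BUCKET: "example.appspot.com",
});

console.log(firebaseConfig.storageBucket);
```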
![Screenshot](https://github.com/PDFTron/pdftron-sign-app/blob/master/firebase.png)

Change `Firestore Database` rules to:
```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /{document=**} {
      allow read, write: if request.auth != null;
    }
  }
}
```

Change `Storage` rules to:
```
rules_version = '2';
service firebase.storage {
  match /b/{bucket}/o {
    match /{allPaths=**} {
      allow read, write: if request.auth != null;
    }
  }
}
```

Now you can run the application and start uploading your documents.
## CORS
You will need to set up CORS on your Firebase Storage bucket to allow WebViewer to access the files stored in it. I created a CORS file called `cors.json`:
```
[
  {
    "origin": ["*"],
    "method": ["GET"],
    "maxAgeSeconds": 3600
  }
]
```

And then used gsutil to apply it to the bucket, for example with `gsutil cors set cors.json gs://your_storage_bucket`:
https://cloud.google.com/storage/docs/configuring-cors

## Run
```
npm start
```

## Project structure
```
src/
  app/             - Redux Store Configuration
  components/      - React components
    Navigate/        - Component responsible for navigating between different screens
    PasswordReset/   - Reset password
    Profile/         - Profile information and a sign out button
    Search/          - Search previously uploaded documents
    SignIn/          - Sign in
    SignUp/          - Sign up
    Upload           - Upload a document, which will be indexed for searching and saved to file storage.
    View/            - View document with the search result highlighted
  App              - Configuration for navigation, authentication
  index            - Entry point and configuration for React-Redux
  firebase/        - Firebase configuration for authentication, updating documents, storing PDFs
  tools/           - Helper function to copy over PDFTron dependencies into /public on post-install
```

## API documentation
See [API documentation](https://www.pdftron.com/documentation/web/guides/ui/apis).
## Contributing
See [contributing](./CONTRIBUTING.md).
## License
See [license](./LICENSE).