https://github.com/codelibs/fess-testdata
Test Data Repository for Crawling/Parsing
https://github.com/codelibs/fess-testdata
Last synced: 5 months ago
JSON representation
Test Data Repository for Crawling/Parsing
- Host: GitHub
- URL: https://github.com/codelibs/fess-testdata
- Owner: codelibs
- Created: 2014-07-03T22:23:42.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2025-01-09T07:00:02.000Z (over 1 year ago)
- Last Synced: 2025-01-28T10:36:21.916Z (over 1 year ago)
- Language: Rich Text Format
- Size: 824 KB
- Stars: 11
- Watchers: 8
- Forks: 11
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Test Data Repository for Search Systems
## Overview
A repository of test data for verifying whether search systems can crawl and index various file types.
Feel free to submit a pull request if you have files you want to test.
## Directory Structure
```
fess-testdata/
├── files/ # Test data files
├── docker/ # Docker configurations for crawling environments
├── tools/ # Utility scripts
└── build/ # Build-related files
```
## How to Create Test Files
### File Naming
Add the prefix "test" and use the appropriate file extension.
### File Content
Include the text "Lorem ipsum. (ロレム・イプサム) 吾輩は猫である。" in the content section of the file.
Do not include this text in metadata sections (to clearly identify where content was extracted from).
### Directory
Place files in the appropriate category directory under `files/`.
## Test Data Files
### Documents
| Type | Location |
|:----:|:-----------|
|Text|[files/text/test_utf8.txt](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/text/test_utf8.txt)|
|HTML|[files/html/test.html](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/html/test.html)|
|HTML|[files/html/test_utf8.html](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/html/test_utf8.html)|
|HTML|[files/html/test_sjis.html](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/html/test_sjis.html)|
|HTML|[files/html/test_hankaku.html](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/html/test_hankaku.html)|
|HTML|[files/html/test_nocharset.html](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/html/test_nocharset.html)|
|XML|[files/xml/test_utf8.xml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/xml/test_utf8.xml)|
|XML|[files/xml/test_sjis.xml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/xml/test_sjis.xml)|
|XML|[files/xml/test_entity.xml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/xml/test_entity.xml)|
|XML|[files/xml/test.mm](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/xml/test.mm)|
|PDF|[files/pdf/test.pdf](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/pdf/test.pdf)|
|PDF|[files/pdf/test.ps](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/pdf/test.ps)|
|Markdown|[files/markdown/test.md](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/markdown/test.md)|
|AsciiDoc|[files/markdown/test.adoc](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/markdown/test.adoc)|
|reStructuredText|[files/markdown/test.rst](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/markdown/test.rst)|
|LaTeX|[files/latex/test.tex](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/latex/test.tex)|
|EPUB|[files/ebook/test.epub](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/ebook/test.epub)|
|CHM|[files/help/test.chm](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/help/test.chm)|
### Office Documents
| Type | Location |
|:----:|:-----------|
|MS Word|[files/msoffice/test.doc](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.doc)|
|MS Word|[files/msoffice/test.docx](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.docx)|
|MS Excel|[files/msoffice/test.xls](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.xls)|
|MS Excel|[files/msoffice/test.xlsx](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.xlsx)|
|MS PowerPoint|[files/msoffice/test.ppt](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.ppt)|
|MS PowerPoint|[files/msoffice/test.pptx](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.pptx)|
|MS Visio|[files/msoffice/test.vsdx](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.vsdx)|
|MS Project|[files/msoffice/test.mpp](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.mpp)|
|MS Publisher|[files/msoffice/test.pub](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.pub)|
|RTF|[files/msoffice/test.rtf](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.rtf)|
|OpenDocument Text|[files/opendocument/test.odt](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/opendocument/test.odt)|
|OpenDocument Spreadsheet|[files/opendocument/test.ods](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/opendocument/test.ods)|
|OpenDocument Presentation|[files/opendocument/test.odp](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/opendocument/test.odp)|
|Apple Pages|[files/iwork/test.pages](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/iwork/test.pages)|
|Apple Numbers|[files/iwork/test.numbers](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/iwork/test.numbers)|
|Apple Keynote|[files/iwork/test.key](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/iwork/test.key)|
|Lotus 1-2-3|[files/lotus/test.123](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/lotus/test.123)|
|Hancom|[files/hancom/test.hwp](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/hancom/test.hwp)|
|Ichitaro|files/ichitaro/|
|DocuWorks|files/docuworks/|
### Database
| Type | Location |
|:----:|:-----------|
|MS Access|[files/database/test.accdb](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/database/test.accdb)|
|MS Access (Legacy)|[files/database/test.mdb](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/database/test.mdb)|
|FileMaker|[files/database/test.fmp12](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/database/test.fmp12)|
|dBase|[files/database/test.dbf](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/database/test.dbf)|
### Media & Images
| Type | Location |
|:----:|:-----------|
|PNG|[files/images/test.png](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.png)|
|JPEG|[files/images/test.jpg](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.jpg)|
|GIF|[files/images/test.gif](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.gif)|
|BMP|[files/images/test.bmp](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.bmp)|
|TIFF|[files/images/test.tiff](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.tiff)|
|SVG|[files/images/test.svg](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.svg)|
|MP3|[files/media/test.mp3](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/media/test.mp3)|
### Source Code
| Type | Location |
|:----:|:-----------|
|C|[files/source_code/test.c](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.c)|
|C++|[files/source_code/test.cpp](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.cpp)|
|Java|[files/source_code/test.java](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.java)|
|JavaScript|[files/source_code/test.js](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.js)|
|TypeScript|[files/source_code/test.ts](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.ts)|
|Python|[files/source_code/test.py](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.py)|
|Ruby|[files/source_code/test.rb](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.rb)|
|Go|[files/source_code/test.go](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.go)|
|Rust|[files/source_code/test.rs](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.rs)|
|Swift|[files/source_code/test.swift](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.swift)|
|Kotlin|[files/source_code/test.kt](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.kt)|
|PHP|[files/source_code/test.php](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.php)|
|SQL|[files/source_code/test.sql](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.sql)|
|CSS|[files/source_code/test.css](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.css)|
|SCSS|[files/source_code/test.scss](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.scss)|
### Scripts & Configuration
| Type | Location |
|:----:|:-----------|
|Bash|[files/scripts/test.bash](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/scripts/test.bash)|
|Perl|[files/scripts/test.pl](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/scripts/test.pl)|
|Lua|[files/scripts/test.lua](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/scripts/test.lua)|
|PowerShell|[files/scripts/test.ps1](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/scripts/test.ps1)|
|JSON|[files/config/test.json](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/config/test.json)|
|YAML|[files/config/test.yaml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/config/test.yaml)|
|TOML|[files/config/test.toml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/config/test.toml)|
|INI|[files/config/test.ini](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/config/test.ini)|
|Properties|[files/config/test.properties](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/config/test.properties)|
### Archives
| Type | Location |
|:----:|:-----------|
|ZIP|[files/archive/test.zip](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/archive/test.zip)|
|TAR|[files/archive/test.tar](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/archive/test.tar)|
|TAR.GZ|[files/archive/test.tar.gz](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/archive/test.tar.gz)|
|BZ2|[files/archive/test.txt.bz2](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/archive/test.txt.bz2)|
|XZ|[files/archive/test.txt.xz](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/archive/test.txt.xz)|
### Email
| Type | Location |
|:----:|:-----------|
|EML|[files/email/test.eml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/email/test.eml)|
|MSG|[files/email/test.msg](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/email/test.msg)|
### Data
| Type | Location |
|:----:|:-----------|
|CSV|[files/data/test.csv](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/data/test.csv)|
|TSV|[files/data/test.tsv](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/data/test.tsv)|
|GeoJSON|[files/geodata/test.geojson](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/geodata/test.geojson)|
|KML|[files/geodata/test.kml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/geodata/test.kml)|
|Jupyter Notebook|[files/notebooks/test.ipynb](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/notebooks/test.ipynb)|
|Log|[files/logs/test.log](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/logs/test.log)|
### Other
| Type | Location |
|:----:|:-----------|
|Adobe Illustrator|[files/ai/test.ai](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/ai/test.ai)|
|AutoCAD|files/cad/|
|Font (TTF)|[files/fonts/test.ttf](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/fonts/test.ttf)|
|ISO|[files/disk-images/test.iso](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/disk-images/test.iso)|
|Patch|[files/patches/test.patch](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/patches/test.patch)|
|Diff|[files/patches/test.diff](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/patches/test.diff)|
|Old-style Characters|[files/other/old_style.txt](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/other/old_style.txt)|
## Docker Environments
The `docker/` directory contains Docker Compose configurations for setting up various data source crawling environments.
| Environment | Description |
|:----:|:-----------|
|basic|Basic Authentication|
|digest|Digest Authentication|
|ldap|LDAP|
|ftp|FTP|
|samba|Samba|
|webdav|WebDAV|
|mysql|MySQL|
|postgresql|PostgreSQL|
|mariadb|MariaDB|
|oracle|Oracle|
|mssql|SQL Server|
|db2|DB2|
|mongodb|MongoDB|
|elasticsearch|Elasticsearch|
|solr|Solr|
|redis|Redis|
|cassandra|Cassandra|
|couchdb|CouchDB|
|minio|MinIO (S3 Compatible)|
|gitlab|GitLab|
|gitea|Gitea|
|redmine|Redmine|
|wordpress|WordPress|
|bugzilla|Bugzilla|
|mantis|MantisBT|
|taiga|Taiga|
|keycloak|Keycloak|
|authentik|Authentik|
## Tools
The `tools/` directory contains utility scripts for data store operations.
| Script | Description |
|:----:|:-----------|
|csvdatastore.sh|CSV Data Store|
|csvlistdatastore.sh|CSV List Data Store|
|csvgeodatastore.sh|CSV Geo Data Store|
|esdatastore.sh|Elasticsearch Data Store|
|eslistdatastore.sh|Elasticsearch List Data Store|
|create_roledata.sh|Role Data Creation|
|encrypt_roles.sh|Role Encryption|
|thumbnail_check.sh|Thumbnail Check|