{"id":21994472,"url":"https://github.com/codelibs/fess-testdata","last_synced_at":"2026-01-30T10:11:19.173Z","repository":{"id":18302689,"uuid":"21478731","full_name":"codelibs/fess-testdata","owner":"codelibs","description":"Test Data Repository for Crawling/Parsing","archived":false,"fork":false,"pushed_at":"2025-01-09T07:00:02.000Z","size":844,"stargazers_count":11,"open_issues_count":0,"forks_count":11,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-01-28T10:36:21.916Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rich Text Format","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codelibs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-07-03T22:23:42.000Z","updated_at":"2025-01-09T07:00:06.000Z","dependencies_parsed_at":"2025-01-09T08:19:10.537Z","dependency_job_id":"ba83258e-2343-44a2-b5b3-9f0e9bb5a8d5","html_url":"https://github.com/codelibs/fess-testdata","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Ffess-testdata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Ffess-testdata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Ffess-testdata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Ffess-testdata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codelibs","download_url":"https://codeload.github.com/codelibs/fess-testdata/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245052673,"owners_count":20553172,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-29T21:09:18.605Z","updated_at":"2026-01-30T10:11:19.167Z","avatar_url":"https://github.com/codelibs.png","language":"Rich Text Format","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Test Data Repository for Search Systems\n\n## Overview\n\nA repository of test data for verifying whether search systems can crawl and index various file types.\nFeel free to submit a pull request if you have files you want to test.\n\n## Directory Structure\n\n```\nfess-testdata/\n├── files/          # Test data files\n├── docker/         # Docker configurations for crawling environments\n├── tools/          # Utility scripts\n└── build/          # Build-related files\n```\n\n## How to Create Test Files\n\n### File Naming\n\nAdd the prefix \"test\" and use the appropriate file extension.\n\n### File Content\n\nInclude the text \"Lorem ipsum. (ロレム・イプサム) 吾輩は猫である。\" in the content section of the file.\nDo not include this text in metadata sections (to clearly identify where content was extracted from).\n\n### Directory\n\nPlace files in the appropriate category directory under `files/`.\n\n## Test Data Files\n\n### Documents\n\n| Type | Location |\n|:----:|:-----------|\n|Text|[files/text/test_utf8.txt](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/text/test_utf8.txt)|\n|HTML|[files/html/test.html](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/html/test.html)|\n|HTML|[files/html/test_utf8.html](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/html/test_utf8.html)|\n|HTML|[files/html/test_sjis.html](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/html/test_sjis.html)|\n|HTML|[files/html/test_hankaku.html](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/html/test_hankaku.html)|\n|HTML|[files/html/test_nocharset.html](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/html/test_nocharset.html)|\n|XML|[files/xml/test_utf8.xml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/xml/test_utf8.xml)|\n|XML|[files/xml/test_sjis.xml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/xml/test_sjis.xml)|\n|XML|[files/xml/test_entity.xml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/xml/test_entity.xml)|\n|XML|[files/xml/test.mm](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/xml/test.mm)|\n|PDF|[files/pdf/test.pdf](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/pdf/test.pdf)|\n|PDF|[files/pdf/test.ps](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/pdf/test.ps)|\n|Markdown|[files/markdown/test.md](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/markdown/test.md)|\n|AsciiDoc|[files/markdown/test.adoc](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/markdown/test.adoc)|\n|reStructuredText|[files/markdown/test.rst](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/markdown/test.rst)|\n|LaTeX|[files/latex/test.tex](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/latex/test.tex)|\n|EPUB|[files/ebook/test.epub](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/ebook/test.epub)|\n|CHM|[files/help/test.chm](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/help/test.chm)|\n\n### Office Documents\n\n| Type | Location |\n|:----:|:-----------|\n|MS Word|[files/msoffice/test.doc](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.doc)|\n|MS Word|[files/msoffice/test.docx](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.docx)|\n|MS Excel|[files/msoffice/test.xls](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.xls)|\n|MS Excel|[files/msoffice/test.xlsx](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.xlsx)|\n|MS PowerPoint|[files/msoffice/test.ppt](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.ppt)|\n|MS PowerPoint|[files/msoffice/test.pptx](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.pptx)|\n|MS Visio|[files/msoffice/test.vsdx](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.vsdx)|\n|MS Project|[files/msoffice/test.mpp](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.mpp)|\n|MS Publisher|[files/msoffice/test.pub](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.pub)|\n|RTF|[files/msoffice/test.rtf](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/msoffice/test.rtf)|\n|OpenDocument Text|[files/opendocument/test.odt](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/opendocument/test.odt)|\n|OpenDocument Spreadsheet|[files/opendocument/test.ods](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/opendocument/test.ods)|\n|OpenDocument Presentation|[files/opendocument/test.odp](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/opendocument/test.odp)|\n|Apple Pages|[files/iwork/test.pages](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/iwork/test.pages)|\n|Apple Numbers|[files/iwork/test.numbers](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/iwork/test.numbers)|\n|Apple Keynote|[files/iwork/test.key](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/iwork/test.key)|\n|Lotus 1-2-3|[files/lotus/test.123](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/lotus/test.123)|\n|Hancom|[files/hancom/test.hwp](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/hancom/test.hwp)|\n|Ichitaro|files/ichitaro/|\n|DocuWorks|files/docuworks/|\n\n### Database\n\n| Type | Location |\n|:----:|:-----------|\n|MS Access|[files/database/test.accdb](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/database/test.accdb)|\n|MS Access (Legacy)|[files/database/test.mdb](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/database/test.mdb)|\n|FileMaker|[files/database/test.fmp12](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/database/test.fmp12)|\n|dBase|[files/database/test.dbf](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/database/test.dbf)|\n\n### Media \u0026 Images\n\n| Type | Location |\n|:----:|:-----------|\n|PNG|[files/images/test.png](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.png)|\n|JPEG|[files/images/test.jpg](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.jpg)|\n|GIF|[files/images/test.gif](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.gif)|\n|BMP|[files/images/test.bmp](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.bmp)|\n|TIFF|[files/images/test.tiff](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.tiff)|\n|SVG|[files/images/test.svg](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/images/test.svg)|\n|MP3|[files/media/test.mp3](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/media/test.mp3)|\n\n### Source Code\n\n| Type | Location |\n|:----:|:-----------|\n|C|[files/source_code/test.c](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.c)|\n|C++|[files/source_code/test.cpp](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.cpp)|\n|Java|[files/source_code/test.java](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.java)|\n|JavaScript|[files/source_code/test.js](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.js)|\n|TypeScript|[files/source_code/test.ts](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.ts)|\n|Python|[files/source_code/test.py](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.py)|\n|Ruby|[files/source_code/test.rb](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.rb)|\n|Go|[files/source_code/test.go](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.go)|\n|Rust|[files/source_code/test.rs](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.rs)|\n|Swift|[files/source_code/test.swift](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.swift)|\n|Kotlin|[files/source_code/test.kt](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.kt)|\n|PHP|[files/source_code/test.php](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.php)|\n|SQL|[files/source_code/test.sql](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.sql)|\n|CSS|[files/source_code/test.css](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.css)|\n|SCSS|[files/source_code/test.scss](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/source_code/test.scss)|\n\n### Scripts \u0026 Configuration\n\n| Type | Location |\n|:----:|:-----------|\n|Bash|[files/scripts/test.bash](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/scripts/test.bash)|\n|Perl|[files/scripts/test.pl](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/scripts/test.pl)|\n|Lua|[files/scripts/test.lua](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/scripts/test.lua)|\n|PowerShell|[files/scripts/test.ps1](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/scripts/test.ps1)|\n|JSON|[files/config/test.json](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/config/test.json)|\n|YAML|[files/config/test.yaml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/config/test.yaml)|\n|TOML|[files/config/test.toml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/config/test.toml)|\n|INI|[files/config/test.ini](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/config/test.ini)|\n|Properties|[files/config/test.properties](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/config/test.properties)|\n\n### Archives\n\n| Type | Location |\n|:----:|:-----------|\n|ZIP|[files/archive/test.zip](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/archive/test.zip)|\n|TAR|[files/archive/test.tar](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/archive/test.tar)|\n|TAR.GZ|[files/archive/test.tar.gz](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/archive/test.tar.gz)|\n|BZ2|[files/archive/test.txt.bz2](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/archive/test.txt.bz2)|\n|XZ|[files/archive/test.txt.xz](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/archive/test.txt.xz)|\n\n### Email\n\n| Type | Location |\n|:----:|:-----------|\n|EML|[files/email/test.eml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/email/test.eml)|\n|MSG|[files/email/test.msg](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/email/test.msg)|\n\n### Data\n\n| Type | Location |\n|:----:|:-----------|\n|CSV|[files/data/test.csv](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/data/test.csv)|\n|TSV|[files/data/test.tsv](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/data/test.tsv)|\n|GeoJSON|[files/geodata/test.geojson](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/geodata/test.geojson)|\n|KML|[files/geodata/test.kml](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/geodata/test.kml)|\n|Jupyter Notebook|[files/notebooks/test.ipynb](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/notebooks/test.ipynb)|\n|Log|[files/logs/test.log](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/logs/test.log)|\n\n### Other\n\n| Type | Location |\n|:----:|:-----------|\n|Adobe Illustrator|[files/ai/test.ai](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/ai/test.ai)|\n|AutoCAD|files/cad/|\n|Font (TTF)|[files/fonts/test.ttf](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/fonts/test.ttf)|\n|ISO|[files/disk-images/test.iso](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/disk-images/test.iso)|\n|Patch|[files/patches/test.patch](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/patches/test.patch)|\n|Diff|[files/patches/test.diff](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/patches/test.diff)|\n|Old-style Characters|[files/other/old_style.txt](https://raw.githubusercontent.com/codelibs/fess-testdata/master/files/other/old_style.txt)|\n\n## Docker Environments\n\nThe `docker/` directory contains Docker Compose configurations for setting up various data source crawling environments.\n\n| Environment | Description |\n|:----:|:-----------|\n|basic|Basic Authentication|\n|digest|Digest Authentication|\n|ldap|LDAP|\n|ftp|FTP|\n|samba|Samba|\n|webdav|WebDAV|\n|mysql|MySQL|\n|postgresql|PostgreSQL|\n|mariadb|MariaDB|\n|oracle|Oracle|\n|mssql|SQL Server|\n|db2|DB2|\n|mongodb|MongoDB|\n|elasticsearch|Elasticsearch|\n|solr|Solr|\n|redis|Redis|\n|cassandra|Cassandra|\n|couchdb|CouchDB|\n|minio|MinIO (S3 Compatible)|\n|gitlab|GitLab|\n|gitea|Gitea|\n|redmine|Redmine|\n|wordpress|WordPress|\n|bugzilla|Bugzilla|\n|mantis|MantisBT|\n|taiga|Taiga|\n|keycloak|Keycloak|\n|authentik|Authentik|\n\n## Tools\n\nThe `tools/` directory contains utility scripts for data store operations.\n\n| Script | Description |\n|:----:|:-----------|\n|csvdatastore.sh|CSV Data Store|\n|csvlistdatastore.sh|CSV List Data Store|\n|csvgeodatastore.sh|CSV Geo Data Store|\n|esdatastore.sh|Elasticsearch Data Store|\n|eslistdatastore.sh|Elasticsearch List Data Store|\n|create_roledata.sh|Role Data Creation|\n|encrypt_roles.sh|Role Encryption|\n|thumbnail_check.sh|Thumbnail Check|\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodelibs%2Ffess-testdata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodelibs%2Ffess-testdata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodelibs%2Ffess-testdata/lists"}