{"id":28802698,"url":"https://github.com/ryanfb/papers-bics","last_synced_at":"2026-02-01T03:37:48.041Z","repository":{"id":1151425,"uuid":"1035334","full_name":"ryanfb/papers-BICS","owner":"ryanfb","description":"Chapter for Digital Classicist BICS volume","archived":false,"fork":false,"pushed_at":"2014-02-18T16:40:58.000Z","size":6683,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-18T08:07:32.525Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://ryanfb.github.io/papers-BICS/","language":"TeX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ryanfb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2010-10-29T16:28:46.000Z","updated_at":"2014-02-18T16:40:59.000Z","dependencies_parsed_at":"2022-08-16T12:15:28.035Z","dependency_job_id":null,"html_url":"https://github.com/ryanfb/papers-BICS","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/ryanfb/papers-BICS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ryanfb%2Fpapers-BICS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ryanfb%2Fpapers-BICS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ryanfb%2Fpapers-BICS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ryanfb%2Fpapers-BICS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ryanfb","download_url":"https://codeload.github.com/ryanfb/papers-BICS/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/ryanfb%2Fpapers-BICS/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28966775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T02:14:24.993Z","status":"ssl_error","status_checked_at":"2026-02-01T02:13:55.706Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-18T08:07:22.275Z","updated_at":"2026-02-01T03:37:48.036Z","avatar_url":"https://github.com/ryanfb.png","language":"TeX","funding_links":[],"categories":[],"sub_categories":[],"readme":"5000-7000 words\n\nLinks\n=====\n\n* [Digital Papyrology](http://www.stoa.org/archives/1263)\n* [Rome Wasn't Digitized in a Day: Building a Cyberinfrastructure for Digital Classicists](http://www.clir.org/pubs/archives/Babeu2010.pdf)\n* [Integrating Digital Papyrology](http://hdl.handle.net/2451/29592)\n* [Background and Funding](http://idp.atlantides.org/trac/idp/wiki/BackgroundAndFunding)\n* [Tools for Collaborative Editing](http://wiki.digitalclassicist.org/OSCE_Scaife_Paper)\n\nAbstract\n========\n\nThe Son of Suda On-Line (SoSOL) represents the first steps towards a\ncollaborative, editorially-controlled, online editor for the Duke Databank of\nDocumentary Papyri (DDbDP). Funded by the Andrew W. 
Mellon Foundation’s\nIntegrating Digital Papyrology Phase 2 (IDP2), SoSOL provides a strongly\nversion-controlled front-end for editing and reviewing papyrological texts\nmarked up in EpiDoc XML. This is accomplished in a tagless environment through\nthe use of a dual-syntax grammar which provides a bidirectional unambiguous\nmapping between EpiDoc and a plaintext Leiden-style markup dubbed Leiden+. For\nversion control, SoSOL uses the distributed version control system Git as its\nbackend. This allows us to essentially have a “forked” repository for each\nuser of the system while using very little space, yet still track change\nhistory in a robust way so as to enable intelligent automatic merging of\nsubmitted changes. While any user can edit anything, submitted changes must\npass through an editorial control workflow. Here editors can vote and comment\non the submission (as well as make editorial interventions) before it is\nincluded in the public, “canonical” version of the repository. The process is\ndesigned to maintain transparency and accurate attribution, with\npost-submission editorial interventions appropriately attributed to the editor\nwho made them. The entire framework is implemented as a Ruby on Rails web\napplication, tested and deployed with JRuby so that it can run in any Java\nServlet Container such as Tomcat.\n\nWhile development and documentation are still ongoing, in March of 2010 we\nbegan to introduce papyrologists to using SoSOL at EpiDoc workshops in order\nto gather feedback and make improvements. The results so far have been\nencouraging, with over one thousand changes made to the DDbDP through SoSOL in\njust four months. Previously, though the DDbDP was in electronic form, it was\nvery difficult for third parties to submit new texts to it or make emendations\nand corrections to existing texts. 
With the integration, consolidation, and\nEpiDoc standardization achieved under phase one of the Integrating Digital\nPapyrology project laying the groundwork, SoSOL has been able to provide a\nconvenient interface and scholarly workflow for editing the DDbDP’s large\ncorpus of ancient documentary papyri (approximately 55,000 texts). We also\nhope to extend the usefulness of this tool by making it a reusable open-source\nsoftware component. Effectively SoSOL will be the core component which manages\nversion control, users, and editorial workflow, while our project-specific\ncomponents for editing EpiDoc texts and aggregating disparate sources into\npublications would become a separate piece of software which uses the SoSOL\ncomponent, called the Papyrological Editor. The core of the tool, which allows\nanyone to edit while retaining scholarly integrity, transparently integrating\na rich distributed version control backend, could help reduce the friction of\ncontribution to a wide range of projects.\n\nDH Abstract\n===========\n\nThe Son of Suda On Line (SoSOL) is one of the main components of the\nIntegrating Digital Papyrology project (IDP), aiming to provide a repurposable\nweb-based editor for the digital resources in the DDbDP and HGV. SoSOL\nintegrates a number of technologies to provide a truly next-generation online\nediting environment. Using JRuby with the Rails web framework, it is able to\ntake advantage of Rails’s wide support in the web development community, as\nwell as Java’s excellent XML libraries and support. This includes the use of\nXSugar to define an alternate, lightweight syntax for EpiDoc XML markup,\ncalled Leiden+. Because XSugar uses a single grammar to define both syntaxes\nin a reversible and bidirectional manner, this is ideal for reducing side\neffects of transforming text in our version-controlled system. 
SoSOL uses the\nGit distributed version control system as its versioning backend, allowing it\nto use the powerful branching and merging strategies it provides, and enabling\nfully-auditable version control. SoSOL also provides for editorial control of\nchanges to the main data repository, enabling the democracy of allowing anyone\nto change anything they choose while preserving the academic integrity of\ncanonical published data. This talk will provide a demonstration of these\nfeatures of SoSOL as implemented for IDP2, as well as a discussion of its\nrepurposable design for applicability to other projects and the ongoing\ndocumentation work being done to increase usability and adoption in the wider\ncommunity.\n\nNext-Generation Version Control\n-------------------------------\nMany online editing environments, such as MediaWiki, use an SQL database as\nthe sole mechanism for storing revisions. This can lead to a number of\nproblems, such as scaling (most SQL servers are not performance optimized for\nlarge text fields) and distribution of data (see for example the database\ndownloads of the English Wikipedia, which have been notoriously problematic\nfor obtaining the full revision history). Most importantly, they typically\nimpose a centralized, linear, single-branch version history. Because Git is a\ndistributed version control system, it does not impose any centralized\nworkflow. As a result, branching and merging have been given high priority in\nits development, allowing for much more concurrent editing activity while\nminimizing the difficulty of merging changes. SoSOL’s use of Git is to have\none “canonical” Git repository for public, approved data and to which commits\nare restricted. Users and boards each get their own Git repositories which act\nas forks of the canonical repository. 
This allows them to freely make changes\nto their repository while preserving the version history as needed when these\nchanges are merged back into the canonical repository. These repositories can\nalso be easily mirrored, downloaded, and worked with offline and outside of\nSoSOL due to the distributed nature of Git. This enables a true democracy of\ndata, wherein institutions still retain control and approval of the data which\nthey put their names on, but any individual may easily obtain the full dataset\nand revision history to edit, contribute to, and republish under the terms of the\nlicense.\n\nAlternative Syntax for XML Editing\n----------------------------------\nWhile XML encoding has many advantages, users inexperienced with its use may\nfind its syntax difficult or verbose. It is still desirable to harness the\nexpertise of these users in other areas and ease their ability to add content\nto the system, while retaining the semantically explicit nature of XML markup.\nTo do this, we have used XSugar to allow the definition of a “tagless” syntax\nfor EpiDoc XML, which resembles that of the traditional printed Leiden\nconventions for epigraphic and papyrological texts where possible. Structures\nwhich are semantically ambiguous or undefined in Leiden but available in\nEpiDoc (e.g. markup of numbers and their corresponding value) have been given\nadditional text markup, referred to collectively as Leiden+. XSugar enables\nthe definition of this syntax in a single, bidirectional grammar file which\ndefines all components of both Leiden+ and EpiDoc XML as correspondences,\nwhich can be statically checked for reversibility and validity. 
This provides\nmuch more rigorous guarantees of these properties than alternatives such as\nusing separate XSLT stylesheets for each direction of the transform, as well\nas encoding the relation between the components of each syntax in a single\nlocation.\n\nRepurposable Design\n-------------------\nDue to institutional requirements, the DDbDP and HGV datasets needed separate\neditorial control and publishing mechanisms. In addition, their control over\ndifferent types of content necessitated different editing mechanisms for each\ncomponent. These requirements informed the design of how SoSOL interacts with\ndata under its control and how this design is repurposable for use in other\nprojects. The two high-level abstractions of data made by SoSOL are\n“publications” and “identifiers”. Identifiers are unique strings which can be\nmapped to a specific file path in the repository, while publications are\narbitrary aggregations of identifiers. By defining an identifier superclass\nwhich defines common functionality for interacting with the data repository,\nwe can then subclass this to provide functionality specific to a given\ncategory of data. The SoSOL implementation for IDP2, for example, provides\nidentifier subclasses for DDbDP transcriptions, HGV metadata, and HGV\ntranslations. Editorial boards consequently have editorial control for only\ncertain subclasses of identifiers. Publications in turn allow representation\nand aggregation of the complex many-to-many relationships these components can\nhave (for example, a document with two sides that may have one transcription\nand two metadata components). Packaging these related elements together both\nallows the user to switch between them and editorial boards to check related\ndata which they may not have editorial control over but still require to make\ninformed decisions about validity and approval. 
SoSOL can thus be integrated\ninto other systems by implementing the identifier subclasses necessary for the\ngiven dataset as well as coherent means for aggregating these components into\npublications.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fryanfb%2Fpapers-bics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fryanfb%2Fpapers-bics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fryanfb%2Fpapers-bics/lists"}