{"id":17285035,"url":"https://github.com/fgregg/smered","last_synced_at":"2025-06-23T07:02:16.477Z","repository":{"id":146617612,"uuid":"84686549","full_name":"fgregg/smered","owner":"fgregg","description":"Mirror of https://bitbucket.org/resteorts/smered","archived":false,"fork":false,"pushed_at":"2017-03-12T03:27:58.000Z","size":4697,"stargazers_count":5,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-14T10:49:29.438Z","etag":null,"topics":["deduplication","entity-resolution","record-linkage"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fgregg.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-11T23:25:09.000Z","updated_at":"2024-02-07T22:15:42.000Z","dependencies_parsed_at":"2023-05-14T20:30:57.406Z","dependency_job_id":null,"html_url":"https://github.com/fgregg/smered","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fgregg/smered","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fgregg%2Fsmered","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fgregg%2Fsmered/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fgregg%2Fsmered/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fgregg%2Fsmered/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fgregg","download_url":"https://codeload.g
ithub.com/fgregg/smered/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fgregg%2Fsmered/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261433935,"owners_count":23157197,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deduplication","entity-resolution","record-linkage"],"created_at":"2024-10-15T09:55:32.088Z","updated_at":"2025-06-23T07:02:11.466Z","avatar_url":"https://github.com/fgregg.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bayesian record linkage\n\nThis repository builds a program to perform Bayesian record linkage using the model described in a forthcoming paper (ref. when available). The source code comes with an [ant](http://ant.apache.org) build script. To compile the program simply run 'ant' from the base directory. This will create an executable jar file named MHSampler.jar. You can then run the program by calling\n\n    \u003e java -jar MHSampler.jar CONFIG_FILE FILE FILE ...\n\nwhere the first command-line argument, `CONFIG_FILE`, is an XML configuration file and the remaining command-line arguments are whitespace-delimited data files that you wish to link. You should supply at least two files to link. 
For example:\n\n    \u003e java -jar MHSampler.jar config.xml *.dat\n\n(assuming there are at least two `.dat` files in the current directory).\n\n## Configuration file format\n\nThe configuration file is an XML file with a top-level `\u003cconfiguration\u003e` element which contains `\u003coptions\u003e`, `\u003cschema\u003e`, and `\u003cblocking-fields\u003e` elements.\n\nThe `\u003coptions\u003e` element is optional. If present, it contains elements corresponding to the specific options you wish to set. Options are set using the `value` attribute. Supported options are:\n\n* `\u003callDedup\u003e`, boolean, if true then all files are assumed to be deduplicated (default: false).\n* `\u003cinnerIterations\u003e`, positive integer, number of split-merge (MH) steps per outer iteration (default: 10,000).\n* `\u003cthinIterations\u003e`, positive integer, write output every so many Gibbs iterations (default: 100).\n* `\u003cburnIn\u003e`, positive integer, begin taking averages only after this many Gibbs iterations (default: 5,000).\n* `\u003cmaxOuterIterations\u003e`, positive integer, number of Gibbs iterations (default: 1,005,001).\n\nFor example, to specify a burn-in of 7,000, you would write\n\n    \u003coptions\u003e\n        \u003cburnIn value=\"7000\" /\u003e\n        \u003c!-- additional options ... --\u003e\n    \u003c/options\u003e\n\nThe `\u003cschema\u003e` element contains a number of `\u003cfield\u003e` elements corresponding to the fields in the files you wish to match. Each `\u003cfield\u003e` element has a `name` and `type` attribute. The `type` must be one of `KEY` or `VAR`. There can be at most one field of type `KEY` and, if present, it must be the first field. The `\u003cschema\u003e` element is required.\n\nThe `\u003cblocking-fields\u003e` element is optional, and consists of a number of `\u003cfield\u003e` elements. 
Each `\u003cfield\u003e` element must have a `name` attribute, and the names given should correspond to names of fields in the `\u003cschema\u003e`.\n\nHere is a complete example configuration file:\n\n    \u003cconfiguration\u003e\n      \u003coptions\u003e\n        \u003callDedup value=\"true\" /\u003e\n        \u003cburnIn value=\"7000\" /\u003e\n      \u003c/options\u003e\n\n      \u003cschema\u003e\n        \u003cfield name=\"ID\" type=\"KEY\" /\u003e\n        \u003cfield name=\"Department\" type=\"VAR\" /\u003e\n        \u003cfield name=\"Occupation\" type=\"VAR\" /\u003e\n        \u003cfield name=\"Office\" type=\"VAR\" /\u003e\n      \u003c/schema\u003e\n\n      \u003cblocking-fields\u003e\n        \u003cfield name=\"Office\" /\u003e\n      \u003c/blocking-fields\u003e\n    \u003c/configuration\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffgregg%2Fsmered","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffgregg%2Fsmered","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffgregg%2Fsmered/lists"}