{"id":22279053,"url":"https://github.com/marklogic/marklogic-contentpump","last_synced_at":"2025-07-28T18:31:13.743Z","repository":{"id":38982653,"uuid":"61568170","full_name":"marklogic/marklogic-contentpump","owner":"marklogic","description":"MarkLogic Contentpump (mlcp)","archived":false,"fork":false,"pushed_at":"2024-04-14T07:11:46.000Z","size":313430,"stargazers_count":31,"open_issues_count":30,"forks_count":24,"subscribers_count":24,"default_branch":"develop","last_synced_at":"2024-04-15T02:05:46.246Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://developer.marklogic.com/products/mlcp","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marklogic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2016-06-20T17:48:12.000Z","updated_at":"2024-06-28T18:20:47.224Z","dependencies_parsed_at":"2023-12-21T10:19:41.459Z","dependency_job_id":"da45daf1-2d2a-45c3-9d20-c04f2aeebf16","html_url":"https://github.com/marklogic/marklogic-contentpump","commit_stats":null,"previous_names":[],"tags_count":53,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marklogic%2Fmarklogic-contentpump","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marklogic%2Fmarklogic-contentpump/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marklogic%2Fmarklogic-contentpump/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marklogic%2Fmarklogic-contentpump/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/marklogic","download_url":"https://codeload.github.com/marklogic/marklogic-contentpump/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227941964,"owners_count":17844683,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-03T15:18:02.241Z","updated_at":"2025-07-28T18:31:13.730Z","avatar_url":"https://github.com/marklogic.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MarkLogic Content Pump\n\nMarkLogic Content Pump (mlcp) is a command-line tool that provides the fastest way to import, export, and copy data to or from MarkLogic databases. Core features of mlcp include:\n\n* Bulk load billions of local files\n* Split and load large, aggregate XML files or delimited text\n* Bulk load billions of triples or quads from RDF files\n* Archive and restore database contents across environments\n* Export data from a database to a file system\n* Copy subsets of data between databases\n\nYou can run mlcp across many threads on a single machine or across many nodes in a cluster. Mlcp can now run against MarkLogic clusters hosted on AWS/Azure. \n\nThe MarkLogic Connector for Hadoop is an extension to Hadoop’s MapReduce framework that allows you to easily and efficiently communicate with a MarkLogic database from within a Hadoop job. 
Starting with release 10.0-5, the Hadoop Connector is no longer shipped as a separate release, but mlcp still uses it as an internal dependency.\n\n## Release Notes\n\n### What's New in mlcp 12.0\n\n- Replaced the commons-csv-marklogic library with the standard Apache Commons CSV library for improved compatibility and maintainability.\n- Upgraded commons-csv and commons-io libraries to the latest stable versions.\n- Upgraded commons-beanutils, xstream and avro libraries to mitigate security vulnerabilities.\n- Removed unused dependencies: hadoop-shaded-protobuf_3_25, jena-shaded-guava.\n- Excluded a few transitive dependencies to improve security and maintenance.\n\n## Getting Started\n\n- [Getting Started with mlcp](http://docs.marklogic.com/guide/mlcp/getting-started)\n\n## Documentation\n\nFor official product documentation, please refer to:\n\n- [mlcp User Guide](http://docs.marklogic.com/guide/mlcp)\n\nWiki pages of this project contain useful information when you work on development:\n\n- [Wiki Page of marklogic-contentpump](https://github.com/marklogic/marklogic-contentpump/wiki)\n\n## Required Software\n\n- [Required Software for mlcp](http://docs.marklogic.com/guide/mlcp/install#id_44231)\n- [Apache Maven](https://maven.apache.org/) (version \u003e= 3.6.3) is required to build mlcp and the Hadoop Connector.\n\n## Build\n\nSteps to build mlcp:\n\n``` bash\n$ git clone https://github.com/marklogic/marklogic-contentpump.git\n$ cd marklogic-contentpump\n$ mvn clean package -DskipTests=true\n```\n\nThe build writes to the respective **deliverable** directory under the root directory `marklogic-contentpump/`.\n\nFor information on contributing to this project see [CONTRIBUTING.md](https://github.com/marklogic/marklogic-contentpump/blob/develop/CONTRIBUTING.md). 
For information on working on the development of this project see [project wiki page](https://github.com/marklogic/marklogic-contentpump/wiki).\n\n## Tests\n\nThe unit tests included in this repository are designed to provide illustrative examples of the APIs and to sanity check external contributions. MarkLogic Engineering runs a more comprehensive set of unit, integration, and performance tests internally. To run the unit tests, execute the following command from the `marklogic-contentpump/` root directory:\n\n``` bash\n$ mvn test\n```\n\nFor detailed information about running unit tests, see [Guideline to Run Tests](https://github.com/marklogic/marklogic-contentpump/wiki/Guideline-to-Run-Tests).\n\n## Have a question? Need help?\n\nIf you have questions about mlcp or the Hadoop Connector, ask on [StackOverflow](http://stackoverflow.com/questions/tagged/mlcp). Tag your question with [**mlcp** and **marklogic**](http://stackoverflow.com/questions/tagged/mlcp+marklogic). If you find a bug or would like to propose a new capability, [file a GitHub issue](https://github.com/marklogic/marklogic-contentpump/issues/new).\n\n## Support\n\nmlcp and the Hadoop Connector are maintained by MarkLogic Engineering and distributed under the [Apache 2.0 license](https://github.com/marklogic/marklogic-contentpump/blob/develop/LICENSE). They are designed for use in production applications with MarkLogic Server. Everyone is encouraged [to file bug reports, feature requests, and pull requests through GitHub](https://github.com/marklogic/marklogic-contentpump/issues/new). This input is critical and will be carefully considered. However, we can’t promise a specific resolution or timeframe for any request. 
In addition, MarkLogic provides technical support for [release tags](https://github.com/marklogic/marklogic-contentpump/releases) of mlcp and the Hadoop Connector to licensed customers under the terms outlined in the [Support Handbook](http://www.marklogic.com/files/Mark_Logic_Support_Handbook.pdf). For more information or to sign up for support, visit [help.marklogic.com](http://help.marklogic.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarklogic%2Fmarklogic-contentpump","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarklogic%2Fmarklogic-contentpump","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarklogic%2Fmarklogic-contentpump/lists"}