{"id":21655934,"url":"https://github.com/koltyakov/cq-source-sharepoint","last_synced_at":"2025-04-11T21:32:57.265Z","repository":{"id":73404490,"uuid":"601777141","full_name":"koltyakov/cq-source-sharepoint","owner":"koltyakov","description":"🔌 CloudQuery SharePoint Source Plugin","archived":false,"fork":false,"pushed_at":"2023-11-19T21:31:05.000Z","size":3158,"stargazers_count":17,"open_issues_count":4,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-25T17:23:01.817Z","etag":null,"topics":["cloudquery","elt","etl","integration","plugin","sharepoint","sync"],"latest_commit_sha":null,"homepage":"https://hub.cloudquery.io/plugins/source/koltyakov/sharepoint","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/koltyakov.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-14T19:51:41.000Z","updated_at":"2024-01-14T12:48:31.000Z","dependencies_parsed_at":"2024-06-19T06:12:23.152Z","dependency_job_id":"5628b580-852f-45f8-908e-c6676a1c568f","html_url":"https://github.com/koltyakov/cq-source-sharepoint","commit_stats":{"total_commits":54,"total_committers":3,"mean_commits":18.0,"dds":0.03703703703703709,"last_synced_commit":"9203c7532c7b20aa4f69f94292bad9c241dae2fb"},"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koltyakov%2Fcq-source-sharepoint","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koltyakov%2Fcq-source-sharepoint/tag
s","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koltyakov%2Fcq-source-sharepoint/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koltyakov%2Fcq-source-sharepoint/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/koltyakov","download_url":"https://codeload.github.com/koltyakov/cq-source-sharepoint/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248482878,"owners_count":21111397,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloudquery","elt","etl","integration","plugin","sharepoint","sync"],"created_at":"2024-11-25T08:37:51.490Z","updated_at":"2025-04-11T21:32:57.216Z","avatar_url":"https://github.com/koltyakov.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# cq-source-sharepoint\n\n\u003c!-- ![Downloads](https://img.shields.io/github/downloads/koltyakov/cq-source-sharepoint/total.svg) --\u003e\n\n[CloudQuery](https://github.com/cloudquery/cloudquery) SharePoint Source community plugin: synchronize SharePoint data to any database destination at ease.\n\n\u003cp float=\"left\"\u003e\n  \u003cimg height=\"40px\" src=\"./assets/cq.svg\" /\u003e\n  \u003cimg height=\"40px\" src=\"./assets/sp.svg\" /\u003e\n\u003c/p\u003e\n\n## Features\n\n- Lists and Document Libraries data fetching\n- Content Types based rollup\n- User Information List data fetching\n- Search Query data source\n- User Profiles data source\n- Managed Metadata data source\n- SharePoint Online support\n- SharePoint 
On-Premise support\n- Fast and potentially blazing fast with [spsync](https://github.com/koltyakov/spsync)\n\n![demo](./assets/demo.gif)\n\nVote for a feature you need or create a PR.\n\n## Schema\n\n```yaml\nkind: source\nspec:\n  name: \"sharepoint\"\n  registry: \"github\"\n  path: \"koltyakov/sharepoint\"\n  version: \"v2.1.0\" # provide the latest version\n  destinations: [\"postgresql\"] # the list of used destinations\n  tables: [\"*\"] # required field, a list of tables to sync\n  spec:\n    # Spec is mandatory\n    # This plugin follows the ideology of explicit configuration\n    # we can change this in future based on the feedback\n```\n\n### Interactive schema builder\n\nSince v1.8.0, the plugin ships with the configuration utility `spctl`.\n\n![](./assets/spctl.gif)\n\nIt can be downloaded from [releases](https://github.com/koltyakov/cq-source-sharepoint/releases): `spctl_[OS]_[ARCH].zip`.\n\nOn macOS, an allowance in System Settings / Security is needed for it to run.\n\n### Authentication options\n\n```yaml\n# sharepoint.yml\n# ...\nspec:\n  auth:\n    strategy: \"azurecert\"\n    creds:\n      siteUrl: \"https://contoso.sharepoint.com/sites/cloudquery\"\n      tenantId: \"e1990a0a-dcf7-4b71-8b96-2a53c7e323e0\"\n      clientId: \"2a53c7e323e0-e1990a0a-dcf7-4b71-8b96\"\n      certPath: \"/path/to/cert.pfx\"\n      certPass: \"certpass\"\n```\n\n`creds` options are unique for different auth strategies. See more details in [Auth strategies](https://go.spflow.com/auth/strategies).\n\nWe recommend Azure AD (`azurecert`) or Add-In (`addin`) auth for production scenarios for SharePoint Online. Yet, other auth strategies are also available, e.g. `saml`, `device`. 
Some APIs require user-context auth; for instance, the Search API can't work without a user context.\n\nSharePoint On-Premise auth is also supported; depending on your farm configuration, you can use `ntlm` or `adfs`, to name a few.\n\nNeed to get hands-on quickly without configuring Azure Apps or Addins or asking an admin to turn on app password? Try On-Demand auth:\n\n```yaml\n# sharepoint.yml\n# ...\nspec:\n  auth:\n    strategy: \"ondemand\"\n    creds:\n      siteUrl: \"https://contoso.sharepoint.com/sites/cloudquery\"\n```\n\n### Entities configuration\n\nA single source `yml` configuration assumes fetching data from a single SharePoint site. If you need to fetch data from multiple sites, you can create multiple source configurations. Alternatively, search-based data fetching can be used for rollup scenarios grabbing data from as many sites as needed.\n\n```yaml\n# sharepoint.yml\n# ...\nspec:\n  # A map of URIs with the list configurations\n  # If no lists are provided, nothing will be fetched\n  lists:\n    # List or Document library URI - a relative path without a site URL\n    # Can be checked in the browser URL (exclude site URL and view page path)\n    Lists/ListEntityName:\n      # REST `$select` OData modificator, fields entity properties array\n      # Wildcard selectors `*` are intentionally not supported\n      # If not provided, only default fields will be fetched (ID, Created, AuthorId, Modified, EditorId)\n      select:\n        - Title\n        - Author/Title\n        # Fields mapping via `-\u003e` arrow alias, when a specific field name is considered\n        - EditorId -\u003e editor\n      # REST `$expand` OData modificator, fields entity properties array\n      # When expanding an entity use selection of a nested entity property(s)\n      # Optional, and in most cases we recommend to avoid it and\n      # prefer to map nested entities to the separate tables\n      expand:\n        - Author\n      # REST `$filter` OData 
modificator, a filter string\n      # Don't use filters for large entities which potentially\n      # can return more than 5000 in a view\n      # such filtering will throttle no matter top limit is set\n      filter: \"Active eq true\"\n      # Optional, an alias for the table name\n      # Don't map different lists to the same table - such scenario is not supported\n      alias: \"my_table\"\n    Lists/AnotherList:\n      select:\n        - Title\n  # content_types: # see more below\n  # mmd: # see more below\n  # search: # see more below\n  # profiles: # see more below\n```\n\n#### Content Types based rollup\n\nContent Types based rollup allows to fetch data from multiple lists or document libraries based on the Content Type configuration.\n\nAll items based on the parent content type are fetched from all lists and subwebs below the context site URL.\n\n```yaml\n# sharepoint.yml\n# ...\nspec:\n  # A map of Content Types with the rollup configurations\n  content_types:\n    # Base Content Type name or ID (e.g. \"0x0101009D1CB255D\" must be in quotes)\n    Task:\n      # REST `$select` OData modificator, fields entity properties array\n      select:\n        - Title\n        - AssignedTo/Title\n      # REST `$expand` OData modificator, fields entity properties array\n      expand:\n        - AssignedTo\n      # Optional, an alias for the table name\n      # the name of the alias is prefixed with `rollup_`\n      alias: \"task\"\n```\n\n#### User Information List\n\nQuite often you'd need getting User Information List for Author and Editor fields joining. 
This is a special case, and we have a dedicated configuration for it.\n\n```yaml\n# sharepoint.yml\n# ...\nspec:\n  lists:\n    _catalogs/users: # UIL list URI, source of People Picker lookup\n      select:\n        - Title\n        - FirstName\n        - LastName\n        - JobTitle\n        - Department\n        - EMail\n        - Deleted\n      filter: \"UserName ne null\"\n      alias: \"user\"\n```\n\n#### Document libraries\n\nDocument libraries are the same as lists in SharePoint, but with a few differences. And it's common to expand File entity to get file metadata.\n\nAlso, a document library URI usually doesn't contain `Lists/` prefix.\n\n```yaml\n# sharepoint.yml\n# ...\nspec:\n  lists:\n    Shared Documents:\n      select:\n        - FileLeafRef\n        - FileRef\n        - FileDirRef\n        - File/Length\n      expand:\n        - File\n      alias: \"document\"\n```\n\n#### Managed Metadata\n\nTo configure managed metadata fetching, you need to provide a term set ID (GUID) and an optional alias for the table name.\n\n```yaml\n# sharepoint.yml\n# ...\nspec:\n  # A map of MMD term sets IDs (GUIDs)\n  mmd:\n    # Term set ID\n    8ed8c9ea-7052-4c1d-a4d7-b9c10bffea6f:\n      # Optional, an alias for the table name\n      # the name of the alias is prefixed with `mmd_`\n      alias: \"department\"\n```\n\n#### User Profiles\n\nUser Profiles are fetched via Search API, so the search should be configured in the farm.\n\nSearch-driven data sources can be used only with user-associated authentication strategies. E.g. it won't work with `addin` strategy.\n\n```yaml\n# sharepoint.yml\n# ...\nspec:\n  # Include `profiles` property to fetch user profiles\n  # Object structure for extensibility (adding custom properties)\n  profiles:\n    enabled: true\n    # Optional, an alias for the table name\n    alias: \"profile\"\n```\n\n#### Search queries\n\nSearch-driven data sources can be used only with user-associated authentication strategies. E.g. 
it won't work with `addin` strategy.\n\n```yaml\n# sharepoint.yml\n# ...\nspec:\n  # A map of search queries\n  search:\n    # Query name (whatever you want to name a resulted table)\n    # Should be unique within other compound aliases\n    documents:\n      # Required, search query text\n      # https://learn.microsoft.com/en-us/sharepoint/dev/general-development/sharepoint-search-rest-api-overview#querytext-parameter\n      query_text: \"*\"\n      # Optional, the managed properties to return in the search results\n      # https://learn.microsoft.com/en-us/sharepoint/dev/general-development/sharepoint-search-rest-api-overview#selectproperties\n      # By defining the list of properties, you also tell the plugin\n      # to have corresponding columns in the table\n      select_properties:\n        - Size\n        - Title\n        - ContentTypeId\n        - IsDocument\n        - FileType\n        - DocId\n        - SPWebUrl\n        - SiteId\n        - WebId\n        - ListId\n      # Optional, whether duplicate items are removed from the results\n      # https://learn.microsoft.com/en-us/sharepoint/dev/general-development/sharepoint-search-rest-api-overview#trimduplicates\n      trim_duplicates: true\n    profiles:\n      query_text: \"*\"\n      trim_duplicates: false\n      # The result source ID to use for executing the search query.\n      # https://learn.microsoft.com/en-us/sharepoint/dev/general-development/sharepoint-search-rest-api-overview#sourceid\n      source_id: \"b09a7990-05ea-4af9-81ef-edfab16c4e31\"\n```\n\n## Get started\n\n### Install CloudQuery\n\nFollow [quickstart instructions](https://www.cloudquery.io/docs/quickstart/).\n\n### Source sample data\n\nProvision and seed some sample data. [See more](./cmd/seed/README.md). 
Which satisfy the schema below.\n\n### Auth configuration\n\n```bash\n# .env or env vars export\n# See more details in https://go.spflow.com/auth/strategies\nSP_SITE_URL=https://contoso.sharepoint.com/sites/site\n```\n\nor use \"ondemand\" auth.\n\n### Source configuration\n\n```yaml\n# sharepoint.yml\nkind: source\nspec:\n  name: \"sharepoint\"\n  registry: \"github\"\n  path: \"koltyakov/sharepoint\"\n  version: \"v2.1.0\" # https://github.com/koltyakov/cq-source-sharepoint/releases\n  destinations: [\"sqlite\"]\n  tables: [\"*\"]\n  spec:\n    auth:\n      strategy: \"ondemand\"\n      creds:\n        siteUrl: ${SP_SITE_URL}\n        # align creds with the used strategy\n    lists:\n      _catalogs/users:\n        select:\n          - Title\n          - FirstName\n          - LastName\n          - JobTitle\n          - Department\n          - EMail\n          - Deleted\n        filter: \"UserName ne null\"\n        alias: \"user\"\n      Shared Documents:\n        select:\n          - FileLeafRef\n          - FileRef\n          - FileDirRef\n          - Author/Title\n          - File/Length\n        expand:\n          - Author\n          - File\n        alias: \"document\"\n      Lists/Managers:\n        select:\n          - Title\n        alias: \"manager\"\n      Lists/Customers:\n        select:\n          - Title\n          - RoutingNumber\n          - Region\n          - Revenue\n          - ManagerId\n        alias: \"customer\"\n      Lists/Orders:\n        select:\n          - Title\n          - CustomerId\n          - OrderNumber\n          - OrderDate\n          - Total\n        alias: \"order\"\n```\n\n### Destination configuration\n\nFor the sake of simplicity, we'll use SQLite destination.\n\n```yaml\n# sqlite.yml\nkind: destination\nspec:\n  name: sqlite\n  path: cloudquery/sqlite\n  version: \"v2.4.15\"\n  spec:\n    connection_string: ./sp.db\n```\n\n### Run CloudQuery\n\n```bash\n# With auth environment variables exported\ncloudquery sync 
sharepoint.yml sqlite.yml\n```\n\nYou should see the following output:\n\n```bash\nLoading spec(s) from sharepoint_reg.yml, sqlite.yml\nDownloading https://github.com/koltyakov/...sharepoint_darwin_arm64.zip\nDownloading 100% |█████████████████████████████████████| (5.2/5.2 MB, 10 MB/s)\nMigration completed successfully.\nStarting sync for: sharepoint (v2.0.0) -\u003e [sqlite (v2.4.15)]\nSync completed successfully. Resources: 37478, Errors: 0, Panics: 0, Time: 21s\n```\n\nCheck for destination database data.\n\n---\n\nPowered by [gosip](https://github.com/koltyakov/gosip).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkoltyakov%2Fcq-source-sharepoint","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkoltyakov%2Fcq-source-sharepoint","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkoltyakov%2Fcq-source-sharepoint/lists"}