Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wsmelton/stackdb
PowerShell module to build a SQL Server database from the StackExchange data archives
https://github.com/wsmelton/stackdb
database powershell sql-server-database stackexchange stackoverflow
Last synced: about 1 month ago
JSON representation
PowerShell module to build a SQL Server database from the StackExchange data archives
- Host: GitHub
- URL: https://github.com/wsmelton/stackdb
- Owner: wsmelton
- License: mit
- Created: 2017-01-01T05:59:01.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2019-08-21T12:17:12.000Z (over 5 years ago)
- Last Synced: 2024-11-12T11:46:24.244Z (3 months ago)
- Topics: database, powershell, sql-server-database, stackexchange, stackoverflow
- Language: PowerShell
- Homepage: https://archive.org/details/stackexchange
- Size: 84 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
README
# Summary
PowerShell module to build an SQL Server database(s) from the [StackExchange Archives](https://archive.org/details/stackexchange). You can use this to create the database, tables and then import the data.
![]()
Dev Build Status
CI Status
# Example
The example below shows the general process to utilize in order to create a database from StackExchange data dumps:
```powershell
Install-Module stackdbC:\> Get-StackArchive -SiteName woodworking -ListSite | Select-Object TinyName, Name, Total* | Format-Table
TinyName Name TotalQuestions TotalAnswers TotalUsers TotalComments TotalTags
-------- ---- -------------- ------------ ---------- ------------- ---------
woodworking Woodworking 2142 4531 4852 12451 205
woodworkingme Woodworking Meta 124 228 335 596 72
```I want to create a database for the woodworking data dump.
```powershell
C:\> Get-StackArchive -SiteName woodworking -DownloadPath c:\temp
[Get-StackArchive] Downloading https://archive.org/download/stackexchange/woodworking.stackexchange.com.7z to c:\temp\woodworking.stackexchange.com.7z][Get-StackArchive] Download completed!
C:\> Get-ChildItem C:\temp\woodworking.stackexchange.com.7zDirectory: C:\temp
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 2018-06-08 8:47 PM 7038655 woodworking.stackexchange.com.7z
```I need to expand that 7z file to access the XML files of the data dump. This will assume that you have 7z installed using the Windows installer. If you have happen to have just the executable (exe) you can set this at the module level using `Set-StackdbConfig -Name zpp.7zippath -Value 'C:\whatever\7z.exe'`
```powershell
C:\> Expand-StackArchive -FileName C:\temp\woodworking.stackexchange.com.7z -ExportPath c:\temp7-Zip 18.05 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-04-30
Scanning the drive for archives:
1 file, 7038655 bytes (6874 KiB)Extracting archive: C:\temp\woodworking.stackexchange.com.7z
--
Path = C:\temp\woodworking.stackexchange.com.7z
Type = 7z
Physical Size = 7038655
Headers Size = 306
Method = BZip2
Solid = +
Blocks = 1Everything is Ok
Files: 8
Size: 37051719
Compressed: 7038655
```The next thing I need is a database to import the data. I use containers for my most of my testing and all of my database files reside under `C:\sqlfiles`:
```powershell
C:\> New-StackDatabase -SqlServer 'localhost,1416' -DatabaseName 'woodworkingse' -DataPath c:\sqlfiles -LogPath c:\sqlfiles
[20:59:35][New-StackDatabase] woodworkingse created on [localhost,1416]
C:\> Get-DbaDatabase -SqlInstance 'localhost,1416' -Database 'woodworkingse' | ftComputerName InstanceName SqlInstance Name Status IsAccessible RecoveryModel LogReuseWaitStatus SizeMB Compatibility Collation
------------ ------------ ----------- ---- ------ ------------ ------------- ------------------ ------ ------------- ---------
67D218C1FB41 MSSQLSERVER 67D218C1FB41 woodworkingse Normal True Full Nothing 175 Version130 SQL_Latin1_Gen...
```Now we just need to import all the data into the pre-built tables.
```powershell
C:\> Import-StackArchive -Folder C:\temp\woodworking.stackexchange.com\ -SqlServer 'localhost,1416'-Database woodworkingse -Schema 'dbo'
C:\> Get-DbaTable -SqlInstance 'localhost,1416' -Database woodworkingse | select database, schema, name, rowcount
Database Schema Name RowCount
-------- ------ ---- --------
woodworkingse dbo Badges 9377
woodworkingse dbo CloseReasonIdDesc 5
woodworkingse dbo Comments 12451
woodworkingse dbo PostHistory 17261
woodworkingse dbo PostHistoryTypeIdDesc 22
woodworkingse dbo PostLinks 889
woodworkingse dbo PostLinkTypeIdDesc 2
woodworkingse dbo Posts 6903
woodworkingse dbo PostsTypeIdDesc 2
woodworkingse dbo Tags 205
woodworkingse dbo Users 4852
woodworkingse dbo Votes 33576
woodworkingse dbo VoteTypeIdDesc 13
```## ToDo
- Build out `Invoke-StackDatabase`, wrapper function that calls all supported commands in proper sequence. Can use splatting to handle all the parameters that will be required.
1. Get-StackArchive
2. Expand-StackArchive
3. New-StackDatabase (deal with if database does not exist, or if it does and tables don't)
4. Import-StackArchive (all of it)