September 29, 2023

Build Scripts in SQL Server

What is a build script?

A database build script contains the SQL statements that create the database and its objects in the correct dependency order. These statements are DDL (data definition language) statements (e.g. CREATE TABLE), but they could also be DML (data manipulation language) statements that, for example, load static data.

There is another kind of script called a migration script which, when applied to a database, alters existing objects and creates new ones to bring the database in line with a newer version (usually the version in source control). It keeps the data intact whereas a build script creates a brand new database or drops and creates objects, and therefore does not aim to preserve data or settings. For example, a migration script might simply add a new column to a table:

ALTER TABLE dbo.SomeTable ADD FooBar NVARCHAR(100) NOT NULL

A migration script is used to push changes from source control to a live production system.

What are build scripts used for?

The purpose of the build script is to prove that a working version of the database can be created successfully from the object definitions saved in source control. This is important, especially when a team is developing a database, as it builds (no pun intended) trust that the database works as expected. Ideally the build is automated and run frequently (e.g. after every commit), and is followed up with a suite of automated tests. This ensures that when the build fails the problem is spotted and corrected quickly. Taking it further any serious attempt to follow the best practices of database lifecycle management (DLM) should include a strategy to build the database.

My (lack of) experience

The topic of building databases is new to me which is why I decided to learn more about it. Most of my past jobs were in data warehouse (DW) scenarios meaning we weren't supporting an online transaction processing (OLTP) system. It was decision support (online analytical processing (OLAP)) involving the maintenance of in house data warehouses. I mention this because it always felt like the best practices of software development (e.g. source control, building, testing, CI/CD, DevOps) have had the slowest adoption rate in the small company DW shops that I've worked in.

So I never learned about this stuff because it didn't matter to the teams I worked in. Things like source control were an afterthought that "one day we'll get round to using". There was sometimes a development environment but it was always out of sync with production and there was no reliable way to push changes from development to production (i.e. continuous deployment (CD)). Sometimes there would be one developer in the team who owned a hand crafted CD process, but it was very Rube Goldberg, didn't really work, and definitely wasn't documented. So it was entirely normal for people to develop in production.

Therefore learning how to build the database seemed like a good place to start in addressing these neglected areas.

Back to build scripts

In exploring the topic I naively expected to find some sort of master program (probably written in PowerShell) capable of generating a build script for any SQL Server database that, most importantly, did not raise any errors when ran. And I could take this program with me to any new job and easily get to work building test versions of the databases.

But I quickly discovered that generating a build script is a complicated task, for lots of reasons. Generally it's because of the wide differences between any two databases and the practices of the teams developing the databases. Any automated solution that generates a build script would always be tailored to a team's unique way of doing things. Some typical differences are as follows:

Tools

There are tools (e.g. by Redgate) that can automatically create build and migration scripts, but these tools are developed by teams of professional software engineers over months and years, meaning you can't just roll your own on a quiet Friday afternoon! It involves things like tracking dependencies between objects, topological sorts (not my area of expertise), dynamically comparing schemas, and, I imagine, lots more complicated stuff.

Really the only sensible thing to do is pay for a tool that can do database DevOps tasks. It's very difficult to, on your own, as a side project at work, do better than professional software.

Having said that sometimes you have no choice but to come up with your own solution e.g. in the companies (like where I've worked) that don't have the budget to support its OLAP teams with fancy tools like it does the mission critical OLTP production database.

Example

Here's a simple way to create a build script for the AdventureWorks database. The technique is to use a manifest file that lists DDL scripts in the correct order. The manifest is a .sql file that uses SQLCMD commands to call the individual DDL files. It looks like this:

:r "C:\SQL\View\Purchasing.vVendorWithAddresses.View.sql"
:r "C:\SQL\View\Purchasing.vVendorWithContacts.View.sql"
:r "C:\SQL\View\Sales.vIndividualCustomer.View.sql"

Notice the :r command which takes a path to the SQL file to run.

This technique won't be an automated process because maintaining the manifest is a manual task. But at least it's a start. Lets describe the scenario.

Scripting object definitions

Generating object DDL statements can be done with mssql-sctiper which I wrote about previously. The following PowerShell script calls mssql-scripter and saves the objects to folders based on type (table, view, etc.):

cd "C:\work\mssql-scripter"
.\venv\scripts\activate
cd ".\AdventureWorks2017"
$server = 'localhost'
$databases = ('AdventureWorks2017')
$objects =  ('Table','StoredProcedure','View','Schema',
'UserDefinedFunction','UserDefinedDataType')
$databases | ForEach-Object {
        $database = $_

        $objects | ForEach-Object {
                $object = $_
                if(-not (Test-Path $object)){
                    mkdir $object
                }
                Push-Location
                Set-Location $object
                rm *.sql
                mssql-scripter.bat -S $server `
                    -d $database `
                    -f .\ `
                    --file-per-object `
                    --check-for-existence `
                    --script-drop-create `
                    --include-types $object `
                    --exclude-use-database `
                    --exclude-extended-properties `
                    --exclude-headers
                Pop-Location
        }
}        

A bit about this script. I've created a directory to work in (C:\work\mssql-scripter\AdventureWorks2017). Since mssql-scripter is a Python module I've created a virtual environment to use it in. The line .\venv\scripts\activate actives the virtual environment from which mssql-scripter will be used (if it's not activated then calls to mssql-scripter won't work).

Note the options I've used:

You end up with the directory looking like this with the object folders:

scripted objects

Create the manifest

The following PowerShell generates the manifest:

cd C:\work\mssql-scripter\AdventureWorks2017
$types = ('UserDefinedDataType', 'Schema','StoredProcedure', 
'UserDefinedFunction', 'Table','View')
$manifest = ""
$types | Foreach-Object{
    $type = $_
    Push-Location
    cd $type
    dir | select name | ForEach-Object{
        $name = $_.Name
        $manifest += ":r `$(sqlDirectory)""\$type\" + $name + """`n"
    }
    Pop-Location
}
$manifest | clip

It stores the script in the $manifest variable and copies it to the clipboard. Paste it somewhere and you should see something like the following (only the top few lines are shown for brevity):

:r $(sqlDirectory)"\Schema\HumanResources.Schema.sql"
:r $(sqlDirectory)"\Schema\Person.Schema.sql"
:r $(sqlDirectory)"\Schema\Production.Schema.sql"
:r $(sqlDirectory)"\Schema\Purchasing.Schema.sql"
:r $(sqlDirectory)"\Schema\Sales.Schema.sql"

$(sqlDirectory) is a SQLCMD scripting variable which can be assigned a value at runtime. As the name suggests the value should be the path to the directory containing the DDL files, so in this case C:\work\mssql-scripter\AdventureWorks2017. Doing it this way makes it more portable. Save the file as manifest.sql.

The DDL scripts probably won't be in the correct dependency order; that's the manual part. The ordering generally goes something like this. Objects that don't have dependencies, e.g. schemas, are created first, followed by stored procedures, which can be created even when the referenced objects don't exist because of deferred name resolution. After that it's tables, functions, and views, and this is where it's important to get the dependency order correct (e.g. a view that depends on another view needs to be created after the view it depends on).

This can be difficult when doing it by hand especially on a database like AdventureWorks that, because it is a sample database that has to demonstrate a lot of features, has a lot of dependencies between objects. This is where a tool comes in handy. But luckily Phil Factor has some helpful scripts for getting the dependency order of objects. I won't explore those techniques in this post. I'm just assuming we've done the manual part and have a manifest that lists object to create in the correct order.

Running the build script

Running the manifest can be done by calling SQLCMD from PowerShell e.g.:

$sqlDirectory = "C:\Work\ssis-framework\SQL"
$database = "Test"
cd C:\work\ssis-framework\sql\scripts
sqlcmd -d $database -i .\manifest.sql -v sqlDirectory=`"$sqlDirectory`"

Here we've specified with $database = "Test" that we want to create the objects in the Test database.

And that's it. The Test database should contain a copy of AdventureWorks (less the object types we've excluded for simplicity). This attempt at building the database clearly isn't a finished process; it's more of a starting point. It doesn't include any kind of logging or auditing that informs about the progress and result of the build. This is a problem I need to read up on.

Conclusion

Building a database (or software in general) is a huge topic and in attempting to write about it I bit off more than I could chew; this post barely scratches the surface. There's tons of information out there that goes into a lot more detail, and I've included the links to some sources below. The solution described is a very simple way of making a build script for a SQL Server database. It might be suitable for a solo project, but ideally you'd pay for a professional tool designed for the task.

Further reading