Developing Against Large Databases

The database for the main application I work on is fairly large, about 50 GB or so.  Not the largest SQL Server database I’ve seen, by far, but one that is non-trivial to move across the network.  This has a significant impact on our team’s development process.  The size of the database, combined with the modest hard drives in the laptops we use, means that keeping a local copy of production data isn’t feasible.

Getting a local database’s schema in sync with that on production would be easy, but in our case, that’s not enough to create a working system.  Our application has a large amount of what I call “static data”, such as menu structures or sets of permissions.  So getting a database that’s “up to date” means not only getting the schema in sync with production, but also ensuring this static data is current as well.

Using some sort of tool for evolutionary database design like Rails ActiveRecord migrations would alleviate some of these problems, because schema and static data changes would be a part of the source code. Developers could just run the migrations on their local databases after they updated from source control.  However, this still wouldn’t solve the whole problem.  In order to effectively develop features within the application, our developers need a reasonable set of test data so that the system isn’t just completely empty.
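To make that idea concrete, here’s a minimal sketch of what one versioned migration script might look like if we went that route.  The table, column, and permission names are hypothetical, and in practice you’d want a tool (or at least a convention) that tracks which scripts have already been run against a given database.

```sql
-- Hypothetical migration script 0042: the table, column, and permission names
-- are made up for the example.  The point is that the schema change and the
-- static data it depends on travel together in source control.
BEGIN TRANSACTION;

-- Schema change
ALTER TABLE dbo.Invoices ADD ApprovalStatusId INT NULL;

-- Static data the new feature needs
INSERT INTO dbo.Permissions (PermissionKey, Description)
VALUES ('Invoices.Approve', 'Allows a user to approve invoices');

-- Record that this migration has been applied, so it only ever runs once
INSERT INTO dbo.SchemaVersions (VersionNumber, AppliedOn)
VALUES (42, GETDATE());

COMMIT TRANSACTION;
```

Run in order, a stack of scripts like this would bring any developer’s database up to date with both the schema and the static data, but it still wouldn’t put any meaningful business data into the otherwise empty tables.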

There are tools out there, such as Red Gate’s SQL Data Generator or the data generators in Visual Studio, that do a pretty good job of creating test data by looking at database column and table names, foreign keys, and so on.  This might work even for a system as large as ours, except that a lot of key tables have “polymorphic” relationships, meaning that the foreign key they contain could point to the primary key in any number of different tables, depending on the particular piece of data.

For example, say we have an “Invoices” table.  We have a multi-tenant system, and our customers often base their invoicing on different things.  Some might invoice for each individual service they performed for their clients, while others might bill based on the amount of time logged in the time and expense module for a client.  In each case, the invoice record needs to point back to a record in whichever table is most relevant, given the customer’s business processes.  Another example of this kind of relationship might be audit records, which could point back to just about any other table in the system.
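To show what I mean, here’s a stripped-down, hypothetical sketch of the pattern.  The real tables have far more columns, but the important part is that the “foreign key” is really just a type/id pair, with nothing in the schema tying it to any one table.

```sql
-- Hypothetical, simplified version of the pattern; real tables have many more columns.
CREATE TABLE dbo.Invoices
(
    InvoiceId        INT IDENTITY(1,1) PRIMARY KEY,
    CustomerId       INT NOT NULL,           -- a normal FK; data generators can see this one
    SourceTableName  VARCHAR(100) NOT NULL,  -- e.g. 'ServicesPerformed' or 'TimeAndExpenseEntries'
    SourceRecordId   INT NOT NULL,           -- PK value in whichever table SourceTableName names
    InvoiceTotal     MONEY NOT NULL
);
-- No FOREIGN KEY constraint is possible on (SourceTableName, SourceRecordId),
-- so a data generator has no way to know what SourceRecordId should point at.
```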

Since these “polymorphic” associations are not defined as proper foreign keys in the database, those data generation tools wouldn’t be able to figure out that the columns are foreign keys at all, and as far as I’ve been able to tell, it’s not possible to manually define a foreign key relationship that spans a number of different tables.  And even if it were, I don’t think I could prevent the tool from associating an invoice from a company that bills for services performed with a time and expense entry.

There are a couple of ways our developers cope with this, neither of which is ideal.  The first, which most of our team members use, is to develop against one of several shared database instances on our data tier servers.  The problems with using shared databases for development are well established; developers simply can’t be as productive when they’re stepping all over each other with data and schema changes.

The second, which I use, is to keep an instance of the database on an external hard drive.  This keeps me isolated from the changes made by other developers, and it’s a significantly better experience than using a shared database, but problems start to crop up when I get the latest from source control.  Developers check in source code changes that require data or schema changes in order to work, and my local database won’t have those changes.

So, at the end of the day, the only reliable way to get an up-to-date copy of the schema is to restore a database from the last production backup.  Since the database is so big, that restore takes multiple hours, which can seriously impede development.  This actually hurts developers using shared databases even more than it hurts me, because when one of those shared databases has to be refreshed, multiple developers are put out of commission at once.

The only way I’ve thought of to make this a little better is to manually create a script that cherry-picks a certain number of rows from what’s essentially the “root” table of our database, spiders out to include all the data related to those cherry-picked rows, and also includes all rows from the tables that contain static data.  The end result would be a much smaller test database containing a meaningful subset of production data, one that could be moved around and refreshed in minutes or seconds rather than hours.  The problems with this idea are that creating the script in the first place would be onerous, since our database contains over 500 tables, and that keeping it up to date with any changes to tables or columns would be an ongoing chore.
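Very roughly, and with hypothetical table and column names, I picture the script starting out something like this; every statement past the first couple would have to be written and maintained by hand.

```sql
-- Sketch only: SELECT ... INTO copies data but not constraints or indexes,
-- and a real script would need a step like #3 for hundreds of related tables.

-- 1. Cherry-pick a small sample of rows from the "root" table.
SELECT TOP (25) *
INTO TestDb.dbo.Customers
FROM ProdCopy.dbo.Customers
ORDER BY CreatedDate DESC;

-- 2. Tables that hold static data (menus, permissions, etc.) come across in full.
SELECT * INTO TestDb.dbo.MenuItems   FROM ProdCopy.dbo.MenuItems;
SELECT * INTO TestDb.dbo.Permissions FROM ProdCopy.dbo.Permissions;

-- 3. Everything else comes across only where it relates to a sampled root row.
--    (Polymorphic references like SourceRecordId would need their own
--     hand-written handling on top of this.)
SELECT i.*
INTO TestDb.dbo.Invoices
FROM ProdCopy.dbo.Invoices AS i
JOIN TestDb.dbo.Customers AS c ON c.CustomerId = i.CustomerId;
```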

I wish there were an easier answer to this.  I have a few ideas in the back of my head about writing a tool that might help me create those scripts, but I think it would still end up being a very manual process of defining relationships that only a human being with knowledge of the system could come up with.  If any readers have experience with this kind of thing, I’d love to hear how you dealt with it.

TFS Installation: No Longer Rocket Science

I think one of the best things I’ve observed in playing around with TFS 2010 is how easy it is to install.  This was a pretty big hurdle in previous versions, but 2010 has installation pared down to a “Next, Next, Finish” level of complexity in some simple scenarios.  In particular, the “Basic” installation, which doesn’t include the SharePoint or Reporting Services components, is brain-dead simple.

In addition, TFS can now be installed on client OSes (Vista and above), and it can use the free SQL Server Express.  It will even go so far as to install SQL Express for you if you don’t already have it (you probably do if you’ve installed Visual Studio).  You can download Beta 2 of TFS 2010 from here.

However, if you don’t want to sully your pristine machine with Beta products, there’s a fully configured Virtual PC image available for download here.

In other words, it’s pretty trivial now to try out TFS yourself, whether you’re stuck on SourceSafe and looking to evaluate the mainstream alternatives, or (like me) your shop is already using TFS and you’d like to try your hand at some of the administrative features that are kept under lock and key.

A Plan for 2010

Everyone on my blogroll is taking the opportunity of the new year to take stock of 2009 and make public plans for 2010, so I thought I would join in.

In 2009, I made an effort to start speaking a little bit more.  I spoke at my first code camp, the Northwest Arkansas Code Camp; on the Harding and LSU campuses during recruiting trips for my employer; and at the Baton Rouge .NET User Group.  I hope to continue ramping that up in 2010.  I’m presenting at my home meeting, the Shreveport .NET User Group, later this month, and I have a trip scheduled to the northwest Arkansas area in March.  I’d like to make it down to southern Louisiana at some point, too; the guys down in Lafayette and New Orleans in particular need some love.

I also need to step up in promoting the SDNUG this year.  I felt like I was able to kind of coast for a lot of 2009 with our current set of attendees, and I didn’t really make as much of an effort as I could have to make more people aware of the group’s existence.  I’ve already begun to remedy that this year by seeking out area software companies and making phone calls, which will hopefully yield some more members.  I need to get on top of speaker scheduling, too, both within the group and without.  There are several group members who I think would make great presenters; I just need to convince them that it’s a good idea!

I want to start blogging more, as well.  One post a month just isn’t where I want to be.  I think having a regular schedule will help with that, so I’m pledging right now in public to post something on my blog at least once a week.  That plan may take a hit when my daughter is born in February, but I’m going to give it my best effort.

I think something else that may help with blogging is having an area of focus.  I’ve really been interested in application lifecycle management lately, especially after all the stuff I saw at PDC, so at least for now, I think I’ll focus on Team System and TFS for a little while.  I’ve been listening to the back catalog of Radio TFS, which has been great, and I plan to seek out additional resources for ideas.  I’ve recently installed TFS on a virtual machine to play around with, so hopefully that will lead to some ideas as well.

I hope you had a great 2009; here’s to 2010!