Why Make Up Test Data? Snag Some Government Data

… and by government data, I mean the mountain of data recently made available by the G-Men on Data.gov.  This site contains what must be terabytes of data on every topic from environmental measurements to crime statistics, from geographical data to labor statistics.  The Obama administration has committed to greater transparency, and the availability of this data is a significant step toward that goal.  The trendy geek magazine Wired.com recently did a feature on Data.gov that is worth reading.

It’s obvious that Data.gov is an immature portal.  Delivery types are inconsistent – some files are available only as flat files, others as only Excel, and a few claim to offer XML feeds.  The formatting can vary wildly from one set of data to the next, and often includes headers and footers which muddy up otherwise clean raw data files.

So why should you, as a database professional, care about this information? If you’re trying to improve your skills in database technologies (and especially in this economy, who isn’t trying to improve him/herself?), this data store is a great place to start.  Because of the sheer size and sometimes unusual layouts, this information is an excellent test bed for honing one’s skills at Integration Services, Analysis Services, or for creating VLDBs (very large databases) on which to practice.  And if you’re truly ambitious, there’s a contest to come up with the best application of this data, with a $10,000 bounty to the winner.

As for me, I’m currently pulling down some FBI crime data with the intention of using it in an upcoming SSIS class I’m presenting.  Perhaps I’ll think up an app that could win the $10K as well….

Social Networking in the SQL Server Community

Like many SQL Server (and other technology) people, I utilize several social networking vehicles to stay in touch with others in the SQL community.  Some of those who don’t use Facebook and Twitter (among others) have expressed skepticism of the real value of social networking.  Does it work? Does it provide any value beyond entertainment? Can it actually help your career?  I believe the answer to all of these queries is Yes.

I read a post by Chuck Boyce Jr. earlier today.  Chuck, a fellow SQL tweep, found himself suddenly looking for a new job.  He posted a brief message on Twitter indicating that he’s looking for a new opportunity, and almost immediately, his tweet was re-tweeted by at least two dozen others (could be many more – those were the ones that I observed from my list of friends).  Now I don’t know Chuck or the quality of his work, but I do know from his online activity that he cares enough about his career to share knowledge through Twitter and his blog.  As a result of that “relationship”, I was happy to share his job quest with those who follow me

We all have a relatively small geographic circle of associates, and a few more that we know through professional associations past and present.  I’ve found that social networking is an excellent way to broaden that reach.  I’ve asked a number of quick questions on Twitter, and have always received feedback, usually in a matter of minutes. 

I’ll bet if you ask Chuck in a few weeks, he’ll tell you that his virtual network generated leads which wouldn’t be otherwise available to him.  And it’s those kinds of favors that aren’t soon forgotten, and turn beneficiaries into benefactors down the road.

Being The Best vs. Being Affordable

I read a post on Brent Ozar’s blog last week that discussed employers’ expectations when hiring new team members.  Though the story was specific to database professionals, the same principles apply to almost any hiring situation.  The moral of Brent’s story is that when hiring, just like in real life, you have to compromise what you may really want to stay within the budget you have to spend.  If you had an unlimited budget, you’d hire Paul Randal to be your DBA, Emeril to be your cafeteria manager, that Sham-Wow guy would lead the janitorial team, and every employee would have a corner office and lunchtime massages.  Most situations don’t lend themselves to that kind of financial freedom, so you settle for more affordable talent.

There’s a flip side to this, specifically from the perspective of the candidate.  Everyone who has sat for an interview worries that they’ll be passed over in favor of someone who is better qualified.  Only the most arrogant truly believe that they are the best talent money can buy; the vast majority of people have enough self awareness to know that there are others who are better qualified, smarter, and willing to work for less money.

For the job candidate, the takeaway from this is to simply be yourself.  Understand that the employer wants to find the best person for the job, but they’re operating within a certain budget, and they won’t make their decision on skills alone.  Don’t try to convince your interviewer that you’re Seinfeld if you’re closer to being Carrot Top, or even Ben Stein.  Be honest about your strengths and your weaknesses, and don’t try too hard to impress. Your transparency will be apparent to any interviewer worth his/her salt, and even if you’re not a fit for that position, you’ll make an ally for the next time an opening appears.

Transaction Rollbacks with Identity Values

Have you ever noticed unexpected gaps in sequences in IDENTITY columns?  Even though you’ve got transactions set up for your inserts and a no-deletion policy, you find that there are missing values in what should be an unbroken sequence of numbers.  The problem could be partially related to transaction rollbacks.

Conventional wisdom would lead one to believe that a rolled back transaction would undo the logged data changes to a table. While a rollback will remove the data rows included in a transaction, it does not reset the identity value to its previous setting.  We can see this demonstrated in a brief example.

First, let’s create a table with an identity column and insert some data:

CREATE TABLE #idtest
(
theID INT IDENTITY(1,1)
, theValue VARCHAR(20) NULL
)

INSERT #idtest
VALUES (‘Plane’)
, (‘Train’)
, (‘Automobile’)

Now we’ll check the identity value by running:

DBCC CHECKIDENT(#idtest)

Which should return 3, the current identity value of the table.  Next, we’ll start a transaction, insert a few rows, and the roll back our changes.

BEGIN TRAN

INSERT #idtest
VALUES (‘Hop’)
, (‘Skip’)
, (‘Jump’)

ROLLBACK TRAN

We just inserted three rows but rolled back the transaction, so the new rows were never committed.  However, if you check the identity value again, you’ll see it’s been incremented to 6 even though no new rows have been committed to the table.

This is actually intended behavior and not a bug in the product.  If you think through some concurrency scenarios, you can understand why identity columns would be handled in this manner.  Just an FYI, you can reset the identity value using the same DBCC command listed above, but you should make sure you understand the potential effects of doing so before you roll it out to your production systems.