Tim Elhajj

Off the Microsoft stack!



TF18017 or TF250044: having problems with the TFS project creation wizard?

Has this ever happened to you? You install TFS with SharePoint for a colleague. You make sure TFS is running and SharePoint is up. You add your colleague to the local Administrators group. You add her in the TFS admin tool. You write her an email telling her where to find her fresh installation of TFS, and you’re just about to pat yourself on the back when you get a message from her saying she can’t run the project creation wizard (PCW). She’s getting an error message that looks like this one:

If you’re anything like me, you sigh mightily.

Why can’t she (user2) run PCW? What is this about!? I already added her to the Farm Admin group (see below), but that didn’t help with this error.

It’s a permission issue, but it’s not Farm Admin permissions that are needed. The solution is to add the user in question to the SharePoint site at the collection level. So, for example, navigate here on the SharePoint site (not the SharePoint administration site, but the site where the portal for the team project is created):

http://sharepoint:80/sites/defaultcollection/default.aspx

Here is what I did:

1) Someone who already has permissions has to go to /sites/defaultcollection/default.aspx and share the site with the new user.

2) Once you add the user, go to site permissions (click the wheel) and then give the user “full control.”

If you click the name, it lights up Edit User Permissions and you can click Full control on the next screen.

And voilà!

Redmond\user2 can now run PCW with success!



How to Install SQL Server 2008 Management Studio Express for TFS basic

Two nice things about SQL Server Express: its low cost (free!) and its ease of use. Often you don’t even install it yourself: another program handles the install for you, as Team Foundation Server can with its basic install. But occasionally you may find you need to do some management tasks with your database backend. For example, you might want to back up your data prior to an upgrade. For situations like that, you can download and use SQL Server 2008 Management Studio Express (SSMSE) to manage SQL Server Express.

This post describes how to install SSMSE.

If you’re installing SSMSE on Windows 7 or Windows Server 2008 R2, you get some crazy program compatibility error that looks like this when you run the installer:

You can safely ignore this message. On the page where you download SSMSE, you will find this note (in the middle of the page):

For Windows 7 and Windows Server 2008 R2, the install process displays the “Program Compatibility Assistant” dialog indicating that you must apply SQL Server 2008 Service Pack 1 or later. Select the option to “Run Program” to continue. Future releases of Microsoft SQL Server 2008 R2 Management Studio Express will not have this problem.

An auspicious start, but it is a free tool! Let’s move on:

Required Permissions

To perform this procedure, you must be a member of the Administrators security group on the Windows server.

To install SQL Server 2008 Management Studio Express
  1. Launch the install program you downloaded. If you get the compatibility error, choose Run Program.
  2. On the SQL Server Installation Center page, choose Installation, and then choose New SQL Server stand-alone installation or add features to an existing installation.
  3. On the Setup Support Rules page, choose OK.
  4. On the Setup Support Files page, choose Install.
  5. On the Setup Support Rules page, choose Next.

    Tip: A Windows Firewall warning might appear here, but you can ignore this warning. TFS automatically adds an exception for Windows Firewall during upgrade.
  6. On the Installation Type page, choose Perform a new installation of SQL Server 2008, and then choose Next.

    Tip: I tried to add features to my existing instance of SQL Server Express, and the installer wouldn’t give me the option to install the tool. 😦
  7. On the Product Key page, choose Next.
  8. On the License Terms page, accept the license agreement and choose Next.
  9. On the Feature Selection page, select the check box for Management Tools – Basic and then choose Next.
  10. On the Disk Space Requirements page, choose Next.
  11. (Optional) On the Error and Usage Reporting page, specify whether to send information about errors and then choose Next.
  12. On the Installation Rules page, choose Next.
  13. On the Ready to Install page, review the information, and then choose Install.
  14. The Installation Progress page shows the status of each component. Choose Next.
  15. On the Complete page, choose Close.



What Does Microsoft Do When the TFS Server Goes Down?

Over the July Fourth weekend this past summer, six thousand of the world’s most talented developers lost access to their bugs, planning data, and most importantly, the source code for their software products. Microsoft’s own TFS deployment went down. Over twelve terabytes of data were suddenly inaccessible. Within five days all systems were back online and fully operational. The culprit was a piece of hardware: a failed SAN storage unit, recently upgraded to satisfy the ever-increasing need for more storage.

This is the story of those five days and the lessons learned.

It started early Sunday morning, July 3, just past midnight, when one of the SAN storage units started to go bad after routine maintenance. The corruption caused the TFS deployment to go offline.

This TFS deployment is a souped-up cluster of TFS application tiers behind a load balancer, connected to another cluster of SQL Servers that carry the backend. If you’re familiar with TFS, you know the data is stored in Team Project Collections (TPCs). The Developer Division at Microsoft has many TPCs, but there are two important ones (one for TFS 2010, the other for the current project, TFS vNext), which are huge and stored on an attached SAN.


The OPS team at Microsoft went to work immediately on the failed server Sunday morning. The current project (vNext) was the highest priority. They restored the TPC from backup but soon discovered corruption in the restored database. On Monday, the OPS team decided to restore the vNext TPC and made an important discovery: the transaction log backups that were supposed to occur every fifteen minutes weren’t happening. That meant going back to the last full backup, which was taken the previous Friday (7/1), and losing the data from the weekend. Fortunately it was a paid US holiday, so hopefully many developers were out celebrating America’s independence. While the OPS team restored data, the rest of the team prepared for the loss of data by clearing all version control caches on all instances of TFS Proxy and TFS.
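A check along the following lines could surface the kind of silent gap described above, where scheduled log backups stop running. This is only a minimal sketch: the function name `find_backup_gaps`, the sample timestamps, and the year are illustrative assumptions, not details of the actual deployment or its tooling.

```python
from datetime import datetime, timedelta


def find_backup_gaps(backup_times, expected_interval=timedelta(minutes=15), now=None):
    """Return (start, end) pairs where backups are further apart than expected.

    backup_times: completion times of log backups (hypothetical input).
    now: optional current time, used to detect that backups have stopped.
    """
    times = sorted(backup_times)
    gaps = []
    # Compare each backup with the one that follows it.
    for earlier, later in zip(times, times[1:]):
        if later - earlier > expected_interval:
            gaps.append((earlier, later))
    # Also flag the stretch between the last backup and "now".
    if now is not None and times and now - times[-1] > expected_interval:
        gaps.append((times[-1], now))
    return gaps


# Illustrative timestamps only: three backups on Friday evening, then nothing.
backups = [
    datetime(2011, 7, 1, 22, 0),
    datetime(2011, 7, 1, 22, 15),
    datetime(2011, 7, 1, 22, 30),
]
gaps = find_backup_gaps(backups, now=datetime(2011, 7, 3, 0, 30))
for start, end in gaps:
    print(f"No log backup between {start} and {end}")
```

Run on a schedule against the real backup history, a check like this would report the problem when the backups stop rather than when a restore is needed.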

Meanwhile, the OPS team began copying the 2010 TPC to the SAN. This involved a copy operation with ten TB of data, so it took a considerable amount of time. By Tuesday the ten TB backup had still not finished copying to the SAN, but it was causing timeouts for people using the current project, which was already back up and running. The OPS team stopped the backup to the shared SAN and secured an isolated SQL Server, where they could house the 2010 project. OPS started the ten terabyte copy again. The additional time required to bring the 2010 project online without harming the health of the TFS server for the rest of the team was worth it.
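To get a feel for why a ten TB copy takes so long, here is a rough back-of-the-envelope sketch. The throughput figure is purely an assumption for illustration (the actual rate the OPS team saw is not known), and `estimate_copy_hours` is a hypothetical helper.

```python
def estimate_copy_hours(size_tb, throughput_mb_per_s):
    """Estimate how many hours a copy takes at a sustained throughput.

    Uses binary units: 1 TB = 1024 * 1024 MB.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    size_mb = size_tb * 1024 * 1024
    seconds = size_mb / throughput_mb_per_s
    return seconds / 3600


# Assumed sustained rate of 100 MB/s; the real figure is unknown.
hours = estimate_copy_hours(10, 100)
print(f"10 TB at 100 MB/s: about {hours:.1f} hours")
```

Even at a steady 100 MB/s, the copy runs for more than a day, and any contention on a shared SAN stretches that further.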


Just when everyone was beginning to breathe, on Tuesday July 5, the vNext TPC went down again.

This time a whole host of network engineers, developers and architects from across the corporate structure were called upon: IT hardware teams, SQL IT teams, and SQL dev teams. The SAN was the main suspect. By Wednesday, the newly formed emergency response team of network engineers isolated and resolved the problem with the SAN. By Thursday mail went out that OPS would attempt to restore from the 7/1 backup again. By Friday everything was back to normal.

Three days of data was irrecoverably lost. The two most valuable TPCs to the division were unavailable for a few days. And it took a few days after the recovery was finally in place for all the mirrors to sync up and for the cube to refresh itself with accurate data. Mitigating our losses, the period of time for which data was lost was over a US holiday weekend. And, of course, we discovered the problem with our transaction log backups, which were supposed to occur every fifteen minutes. Now the current TPC has its own dedicated SAN storage. With the transaction logs getting backed up, we should be able to recover from a similar disaster to within fifteen minutes instead of three days.
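The difference between those two recovery points can be worked through with a little date arithmetic. The sketch below is illustrative only: the clock times, the year, and the helper name `data_loss_window` are assumptions chosen to mirror the "three days versus fifteen minutes" comparison, not records from the incident.

```python
from datetime import datetime, timedelta


def data_loss_window(failure_time, backup_times):
    """Return how much work is lost when restoring to the latest usable backup."""
    usable = [t for t in backup_times if t <= failure_time]
    if not usable:
        raise ValueError("no backup taken before the failure")
    return failure_time - max(usable)


# Hypothetical times: a full backup on Friday evening and a failure
# three days and five minutes later.
full_backup = datetime(2011, 7, 1, 20, 0)
failure = full_backup + timedelta(days=3, minutes=5)

# Case 1: only the full backup exists (log backups were not running).
loss_full_only = data_loss_window(failure, [full_backup])

# Case 2: log backups every fifteen minutes since the full backup.
interval = timedelta(minutes=15)
log_backups = []
t = full_backup
while t <= failure:
    log_backups.append(t)
    t += interval
loss_with_logs = data_loss_window(failure, log_backups)

print("Full backup only:", loss_full_only)
print("With log backups:", loss_with_logs)
```

With working log backups, the loss can never exceed the backup interval, which is where the fifteen-minute figure comes from.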