Today’s Washington Post highlights problems with the new healthcare.gov site, the website used by citizens to get insurance under the Affordable Care Act. The article also talks about the problems the federal government is having in general managing information technology (IT). As someone who just happens to manage such a site for the government, I figure I have something unique to contribute to this discussion.
Some of the problems the health care site is experiencing were predictable, but some were embarrassingly unnecessary. Off the top of my head I can see two clear problems: splitting the work between multiple contractors and the hard deadline of bringing the website up on October 1, 2013, no matter what.
It’s unclear why HHS chose to have the work done by two contractors. The presentation (web-side) was done by one contractor and the back end (server-side) was done by another. This likely had something to do with federal contracting regulations. Perhaps it was seen as a risk mitigation strategy at the contracting level, or as a way to keep the overall cost low. It’s never a great idea for two contractors to do their work mostly unmindful of each other’s work. Each was doing subsystem development, and as subsystems it’s possible that each worked optimally. But from the public’s perspective it is just one system. What clearly got skipped was serious system testing. System testing is designed to test how the system behaves from a user’s perspective. A subset of system testing is load testing. Load testing shows how the system reacts when it is under a lot of stress. Clearly the requirements for initial use of the system wildly underestimated the traffic the site actually experienced. But it also looks like, in an effort to meet an arbitrary deadline, load testing and correcting the problems it uncovered could not happen in time.
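To make the idea of load testing concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `FakeBackend` is a hypothetical stand-in for a site that can serve only a fixed number of requests at once, and the capacity and service time are made-up numbers, not measurements from healthcare.gov or any real system. A real load test would drive the deployed site with a dedicated tool, but the principle is the same: send many simultaneous requests and watch what happens to response times.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeBackend:
    """Hypothetical stand-in for a site that serves `capacity` requests at once."""

    def __init__(self, capacity, service_time):
        self._slots = threading.Semaphore(capacity)
        self._service_time = service_time

    def handle(self):
        # Requests beyond capacity wait here until a slot frees up.
        with self._slots:
            time.sleep(self._service_time)


def run_load_test(backend, users):
    """Send `users` simultaneous requests; return sorted latencies in seconds."""

    def one_user(_):
        start = time.perf_counter()
        backend.handle()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=users) as pool:
        return sorted(pool.map(one_user, range(users)))


def summarize(latencies):
    """Reduce a sorted latency list to a few headline numbers."""
    return {
        "requests": len(latencies),
        "median": latencies[len(latencies) // 2],
        "worst": latencies[-1],
    }


if __name__ == "__main__":
    backend = FakeBackend(capacity=5, service_time=0.05)
    for users in (5, 50):
        print(users, "users:", summarize(run_load_test(backend, users)))
```

With more simultaneous users than the backend has slots, the later requests queue up and the worst-case latency grows well beyond the service time. That is exactly the kind of behavior you want to discover before launch day rather than after it.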
It also looks like the use cases, i.e. user interaction stories that describe how the system would be used, were bad. It turned out that most initial users were just shopping around and trying to find basic information. It resulted in a lot of browsing but little in the way of actual buying. Most consumers, particularly when choosing something as complex as health insurance, will want to have some idea of the actual costs before they sign up. The cost of health care is obviously a lot more than just the cost of premiums. Copays can add thousands of dollars a year to the actual cost of insurance. This requires reading, study, asking questions of actual human beings in many cases, and then making an informed decision. It will take days or weeks for the typical consumer to figure out which policy will work best for them, which means a lot of traffic to the web site, even when it is working optimally.
The Post article also mentions something I noticed more in my last job than in my current one: that federal employees who manage web sites really don’t understand what they are managing. This is because most agencies don’t believe federal employees actually need experience developing and maintaining web sites. Instead, this work is seen as something that should be contracted out. I was fortunate enough to bring hands-on skills to my last job, and it was one of the reasons I was hired. In general, the government sees the role of a federal employee as being to “manage” the system and the role of contractors as being to “develop and maintain” it. This typically leaves the federal employee deficient in the technical skills needed, and thus he or she can easily make poor decisions. Since my last employer just happened to be HHS, I can state this is how they do things. Thus, it’s not surprising the site is experiencing issues.
Even if you do have federal staff developing and maintaining the site, as I happen to have in my current job, there is no guarantee that they will have all the needed skills. Acquiring and maintaining those skills requires an investment in time and training, and adequate training money is frequently in short supply. Moreover, the technology changes incredibly quickly, which leads to mistakes. These bite me from time to time.
We recently extended our site to add controls that give the user more powerful ways to view data. One of these is a jQuery table sorter library. It allows long displays of data in tables to be filtered and sorted without going back to the server to refresh the data. It’s a neat feature but it did not come free. The software was free, but it added marginally to the time it took the page to fully load. It also takes time to put the data into structures where this functionality can work. The component gets slow with large tables or with multiple tables on the same page. Ideally we would have tested this prior to deployment, but we didn’t. It did not occur to me, to my embarrassment. I like to think that I usually catch stuff like this. This is not a fatal problem in our case. It is a little embarrassing, though it costs only a second or two of extra load time on certain web pages. Those who have tried the feature love it. We’re going to go back and reengineer this work so that we only use it with appropriately sized tables, because even a marginal increase in page load time may annoy some users enough that they leave the site.
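One way to think about “appropriately sized tables” is as a budget applied when the page is generated. The sketch below is only an illustration of that idea, not how our site actually does it: the `Table` class, the function name, and both limits are hypothetical, and real limits would have to come from measuring page load times in actual browsers.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    rows: int


def tables_to_enhance(tables, max_rows_per_table=500, max_rows_per_page=1500):
    """Return the names of tables that should get client-side sorting.

    Both limits are made-up placeholders. Smaller tables are considered
    first so that as many tables as possible fit within the page budget.
    """
    chosen = []
    budget = max_rows_per_page
    for table in sorted(tables, key=lambda t: t.rows):
        if table.rows <= max_rows_per_table and table.rows <= budget:
            chosen.append(table.name)
            budget -= table.rows
    return chosen


if __name__ == "__main__":
    page = [Table("daily", 100), Table("history", 2000), Table("sites", 400)]
    print(tables_to_enhance(page))
```

Tables that are left out would simply render as plain tables, so the page still works and only the largest tables lose the sorting feature.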
Our site, like healthcare.gov, is also highly trafficked. I expect that healthcare.gov will get more traffic than our site, which gets 30 to 40 million successful page requests per month. Still, scaling web sites is not easy. The latest theory is to put redundant servers “in the cloud” (commercial hosting sites) to use as needed on demand. Unfortunately, “the cloud” itself is an emerging technology. Its premier provider, Amazon Web Services, regularly has embarrassing issues managing its cloud. Using the cloud should be simple but it is not. There is a substantial learning curve and it all must work automatically and seamlessly. The federal government is pushing use of the cloud for obvious benefits including cost savings, but the cloud is really not ready for prime time, mission-critical use. Despite the hassles, if high availability is an absolute requirement, it’s better to host the servers yourself.
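For a sense of scale, 30 to 40 million page requests per month works out to roughly 12 to 15 requests per second on average. The short calculation below shows the arithmetic. It assumes a 30-day month, and the `peak_factor` parameter is a hypothetical multiplier included only to show that peaks matter more than averages; it is not a figure measured on our site.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def average_requests_per_second(requests_per_month):
    """Convert a monthly request count into an average per-second rate."""
    return requests_per_month / SECONDS_PER_MONTH


def peak_requests_per_second(requests_per_month, peak_factor):
    """Scale the average by a hypothetical peak-to-average multiplier."""
    return average_requests_per_second(requests_per_month) * peak_factor


if __name__ == "__main__":
    for monthly in (30_000_000, 40_000_000):
        average = average_requests_per_second(monthly)
        print(f"{monthly:,} per month is about {average:.1f} per second on average")
```

Traffic never arrives evenly, so the servers have to be sized for the busiest minutes rather than for the monthly average.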
The key nugget from the Post’s article is that the people managing these systems in many cases don’t have the technical expertise to do so. It’s sort of like expecting a guy in the front office of a dealership to disassemble and reassemble a car on the lot. The salesman doesn’t need this knowledge, but to competently manage a large federal website you really do need this experience. You need to come up from the technical trenches and then add managerial skills to your talents. In general, I think it’s a mistake for federal agencies to outsource web site development. Many of these problems were preventable, although not all of them were. Successful deployment of these kinds of sites depends to a large extent on having a federal staff that knows the right questions to ask. And to really keep up to date on a technology that changes so quickly, it’s better to have federal employees develop these sites themselves. Contractors might still be needed, but more for advice and coaching.
Each interactive federal web site is its own unique system, as healthcare.gov certainly is. The site shows the perils of placing too much trust in contractors and of having a federal managerial staff with insufficient technical skills. Will we ever learn? Probably not. Given shrinking budgets and the mantra that contracting out is always good, it seems we are doomed to repeat these mistakes in the future.
Don’t say I didn’t warn you.
(Update: 10/28/13. This initial post spawned a series of posts on this topic where I looked at this in more depth. You may want to read them, parts 1, 2, 3 and 4.)