The Thinker

The Coding Vacation

I must be a nerd. I am spending a significant part of my two weeks off programming.

While theoretically I can program at work, as a manager I usually do not have the leisure.
Therefore, I do it on my vacation as something of a hobby. The rest of the year, I cannot seem to work up the energy. For me to find joy in programming, I really need plenty of time without major interruptions or distractions. I also need a project that might actually turn into something meaningful, both for myself in the form of knowledge, and for a community of people.

My first project involves another phpBB modification. phpBB is forum software written in the (ta da!) PHP programming language. Two years ago, I created a popular E-Mail Digests modification for phpBB. It sends out customized digests of posts on a phpBB forum. It grew so popular that it turned into something like a second job for which I was not compensated. Fortunately, a small community of phpBB open source zealots eventually took over the project. What a job they have done! Their latest version 1.3 is amazing. For me it was a learning experience in the value of open source collaboration. If you have a good enough idea, others will often collaborate with you, or even take over the project when you get sick of it. If it is not interesting enough, it will simply join the voluminous list of dead open source projects, such as can be found at SourceForge.net and Freshmeat.net.

This new project is a variant of my E-Mail Digests modification, though considerably smaller in scope. As readers know, I am fascinated by syndication technology. The way I learn a new technology best is to try to build it from the ground up myself. There are a variety of competing news syndication formats out there, including RSS 1.0, RSS 2.0 and Atom. Each format has its evangelists, but in practice all are being used. Most newsreaders can read any of the formats.

A half dozen or so phpBB modifications have already been created to allow phpBB forums to be syndicated using these XML protocols. phpBB sites allow forums to be restricted to special user groups. You probably do not want the Google search engine scanning these protected forums. However, with the emergence of news feeds, many people would like to get access to these restricted forums using a standards-compliant newsreader. As best I can tell, none of the phpBB syndication modifications keeps users out of protected forums. In that sense, they are violating the architecture of phpBB. I made it my mission to figure out a secure way to allow authenticated members of a phpBB site to get content in protected forums as a news feed.

The trick is to authenticate the user. How do you do this when a news feed is accessed as a simple URL? The solution was more challenging than I thought. Somehow the URL has to carry both your username and your password. That is not secure. You do not want your password bookmarked as part of the URL. phpBB helps by actually encoding the password into its database. However, it is easily decoded by simply studying the phpBB source code and writing a simple PHP script. Consequently, the password has to be further encrypted somehow.

My solution was not elegant, but it works. I mixed the password in the database with the database password using a hash algorithm. The database password is about the only thing guaranteed to be unique on a phpBB site, so it was a good encryption key. While it does not meet NIST encryption requirements, it is reasonably secure. I even figured out a way to roll the user’s IP address into the authentication parameter. I made this optional, but if they select it, the news feed will only be served to the originating IP address.
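To make the idea concrete, here is a rough sketch in Python rather than the PHP I actually used. It is not my modification's code. The function names are made up for illustration, and I have swapped in HMAC-SHA256 as the hash, which is an assumption; the post above only says "a hash algorithm". The stored password value and the site secret (standing in for the database password) are plain strings here.

```python
import hashlib
import hmac


def make_feed_token(stored_password, site_secret, ip_address=None):
    """Derive a token for the feed URL (hypothetical helper).

    stored_password: the password value as it sits in the forum database.
    site_secret: a per-site secret, e.g. the database password.
    ip_address: optional; when given, the token is tied to that address.
    """
    message = stored_password
    if ip_address is not None:
        message += "|" + ip_address
    # HMAC-SHA256 is an assumption, chosen only for the sketch.
    return hmac.new(
        site_secret.encode("utf-8"),
        message.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


def verify_feed_token(token, stored_password, site_secret, ip_address=None):
    """Recompute the token on the server and compare in constant time."""
    expected = make_feed_token(stored_password, site_secret, ip_address)
    return hmac.compare_digest(token, expected)
```

The point of the design is that the URL never carries the password or even the value stored in the database, only something derived from it that is useless without the site secret. When the IP option is on, a token copied to another machine simply fails to verify.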

I also wanted to dig into the mechanics of the various syndication protocols. I even bought a book to help. I came away from the experience thinking that each protocol has its virtues and that people should pick the protocol they need. RSS 1.0 appears to be the most extensible, yet most complex. RSS 2.0 seems to have the most built-in features. Atom seems the most thought through. I try not to reinvent the wheel if possible. I hunted for existing PHP classes that created news feeds. The best I found was this class, extended by many authors. You simply pass as a parameter to the class the type of feed you want. As good as it was, I discovered a few minor problems. For example, the Atom 1.0 news feed did not properly validate HTML content. That was easily solved by modifying one line of the class. In addition, the class ignored some limitations of RSS 0.91, including the 15-item limit and length limits for the contents of certain tags. I coded around these too, and used FeedValidator.org to make sure my feeds were valid.
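For readers who have never looked inside a feed, this is roughly what "pass the type of feed you want" boils down to. The sketch is in Python and is not the PHP class mentioned above; `build_feed` and its parameters are names I invented for the example. It only handles the plain RSS layout and the one RSS 0.91 rule discussed here, the 15-item limit. The length limits on individual tags are left out.

```python
import xml.etree.ElementTree as ET

RSS_091_MAX_ITEMS = 15  # the RSS 0.91 item limit mentioned above


def build_feed(title, link, description, items, version="2.0"):
    """Return an RSS document as a string (illustrative only).

    items is a list of dicts with "title", "link" and "description" keys.
    """
    if version == "0.91":
        items = items[:RSS_091_MAX_ITEMS]  # silently drop the overflow

    rss = ET.Element("rss", version=version)
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # ElementTree escapes &, < and > in the text for us.
        ET.SubElement(item, "description").text = entry["description"]

    return ET.tostring(rss, encoding="unicode")
```

Whatever builds the feed, running the output through a validator is still the real test.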

I will likely keep tweaking the modification in the weeks ahead, but eventually it will be provided to phpBB as a modification. Whether it will be embraced like the E-Mail Digests modification remains to be seen. My payoff is that I now know that I truly understand news feeds. What I have learned is that creating news feeds is pretty darn easy. Therefore, there is little reason not to keep promoting them where I work. I get regular requests from the public for our data in a news feed format. Since it is easy to do, I will try to allocate some of my team’s time for the project. (This is always a challenge since they are asked to do too much.)

Similarly, I am fascinated by web services. Web services allow data encapsulated in XML to be requested like an Application Programming Interface (API) over the Internet. Arguably, news feeds are a low-tech web service. This observation alone has been one of my biggest discoveries. While there are times when the orchestration of web services is needed and consequently you need SOAP protocols, 95% of the time low-tech XML over HTTP is fine. Rather than reinvent the wheel, if possible serve your content by riding on top of an existing XML technology. RSS and Atom news feed formats are obvious examples, since there are so many consuming applications out there.
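The consuming side shows why I call a feed a low-tech web service: once the XML has arrived over HTTP, a few lines pull the data out. Here is a small Python sketch with a made-up function name. It works on a string so the fetch is left out, and it assumes a plain RSS 0.91 or 2.0 document; RSS 1.0 and Atom put their elements in XML namespaces and would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET


def read_feed_items(feed_xml):
    """Return (title, link) pairs from an un-namespaced RSS document."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]
```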

I also spent about a day trying, and succeeding, in using server-side XML technologies. Could I get data out of a database as XML? Could I then transform the XML data using XML style sheets and serve it as HTML? In short, could I replace the traditional plumbing in server-side scripting languages to use these new methods?

Yes, I could, but it sure was neither elegant nor efficient. Since I am comfortable with PHP, I did not venture into the Perl, Python or Java worlds. The first problem was that MySQL has no inherent way to render SQL queries as XML. I am sure that will come in time. Therefore, I went hunting for a class library to do just that. I found a number of solutions, all very primitive. The XML that came out was rather inelegant, but at least it was XML. The bigger problem turned out to be transforming the XML into HTML using a style sheet engine. The one currently built into PHP was primitive. It did the job, but it required both the XML and the style sheet to be in files. It would be much more efficient to do it all in memory. So while viable, it hardly looks like a way to save CPU and bandwidth. There are already elegant solutions like Hibernate for the Java world, but those of us in a LAMP environment are still quite constrained. I am sure these solutions will appear in time. As I remarked a couple of months ago, ActiveGrid looks like one such solution.
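The first half of that experiment, turning query results into XML without touching the file system, can be sketched as follows. This is Python with the built-in sqlite3 module standing in for MySQL, which is an assumption made purely so the example is self-contained, and `query_to_xml` is a name I made up. The style sheet step is not shown, since Python's standard library has no XSLT engine.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_xml(connection, sql, root_tag="rows", row_tag="row"):
    """Run a query and return the result set as an XML string, in memory.

    Column names become element names, so they must be valid XML names.
    """
    cursor = connection.execute(sql)
    columns = [column[0] for column in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, record):
            # NULL becomes an empty element; everything else is stringified.
            ET.SubElement(row, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")
```

As with the PHP libraries I tried, the XML that comes out is nothing fancy: one element per row and one child element per column.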

I do not see much value in doing all this work on the server side. It strikes me as inefficient. Therefore, I would let the other scripting template engines, or the application server for those who can afford one, do the dirty work using much more efficient methods. However, when transforming database information into something more than HTML, these tools should be given heightened consideration. If, for example, a business were to implement a true SOAP-compliant web service, then these technologies should be used. For an ordinary dynamically driven web site serving HTML, the overhead of transforming data from a relational database into XML, then transforming it again into HTML, is simply not worth the hassle.

With six days left in my vacation, I hope to keep experimenting with web services, and to carry my practical knowledge forward into my work during 2006.

 
