Archive for the ‘Book’ Category

Customizable code: writing future-proof code

Saturday, January 19th, 2008

Before code can be customizable, it must be clear. But clarity is not enough if you’re going to be using a codebase in multiple places.

Many open source projects excel at customization. People have enough different uses for an application that very few work perfectly out of the box for everybody. Most companies want to apply their branding to the software they use. Some people need an application localized and translated for their audience. Sometimes a company just needs a small change to make the software better fit its needs.

It’s relatively simple to customize any application, if you have the source code. What becomes a huge challenge is maintaining your customizations when the underlying software is updated. If the software is not designed with specific ways of customizing it, it’s going to end up being difficult to maintain, unless you have gotten your changes incorporated back into the original software.

Architecting for customization
Applications that are designed for customization have clear divisions of code. This can happen for several different areas:

  • Templates or Themes. Most people want to be able to change the look and feel of a web application. If it has a template or theme system, you can just create a new theme and turn it on. Upgrades can then happen without clobbering your changes.
  • Language. Most successful open source projects have separate language files containing all of the labels, instructions, menus, and other text the application shows. Many come with multiple translations, and accept others as people contribute them.
  • Add-ons, plugins, modules, and components. Content management systems like Joomla and Drupal are particularly strong at this. SugarCRM is, too. They have a well-defined way of adding new functionality to the application, keeping it self-contained in a separate unit of code that a site administrator can manage through the interface.
  • An override mechanism. Some programs make it easy to replace the default behavior with your own version. ZenCart does this well–you can take many different core files, copy them into a particular directory associated with your site, and change them to make it do what you want. Upgrades to ZenCart will still use your versions of the files, even if the underlying file changes.
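
The override mechanism in the last bullet can be sketched as a lookup that prefers a site-specific copy of a file and falls back to the core version. This is only an illustration of the idea, not ZenCart's actual code: the function name and the directory layout are assumptions.

```php
<?php
/**
 * Return the path of a file, preferring a site-specific override.
 * Hypothetical helper: the name and the directory layout are assumptions
 * for illustration, not any real application's implementation.
 */
function resolveOverride(string $relativePath, string $overrideDir, string $coreDir): string
{
    $custom = $overrideDir . '/' . $relativePath;
    if (is_file($custom)) {
        return $custom;                      // the site's own copy wins
    }
    return $coreDir . '/' . $relativePath;   // fall back to the shipped file
}
```

An upgrade replaces only the files under the core directory, so anything copied into the override directory keeps being picked up afterwards.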

When you’re customizing an application, all of the other aspects of quality code apply to your customizations, as well as the original code. Your add-on is faster and more secure if you use the application’s interface for retrieving data instead of including your own. Your add-on is more powerful, clear, maintainable, and reliable if it uses the application’s defined ways of customizing it.

While not all open source is designed to be customized, it’s a strong consideration we’re looking at when we evaluate a project. So what do you do if you need to customize something that’s core to an application?

Customizing software not designed to be customized
If you need to make changes to the core part of an open source project, you’re setting yourself up for a maintenance nightmare. All active server software has updates. No program is perfect. Somebody, somewhere, will find a way to crack into it, and if you have business data or unethical competitors or disgruntled customers or employees, you will get targeted eventually. In the security community, people publish vulnerabilities in programs so that they may be fixed. That means if you’re using common software packages, somebody needs to maintain them.

If you’re using software designed to be customized, and all your customizations are outside of the core code, this is not a major problem. A system administrator updates the core software, and if any of your customizations break, your developers update your customizations. However, if you had to make a lot of changes to core files, you’re in trouble. You either need to re-implement the security fixes in your code, or re-implement your customizations in the updated code.

There are basically 3 strategies for minimizing these issues:

  1. Use strong source-code management tools to manage your changes as patch-sets, and re-apply them at each upgrade, rewriting sections that no longer work.
  2. Fork the project, and take over responsibility for managing your branch. You’ll need to track the vulnerabilities in the parent project, and re-implement security fixes in your own.
  3. Contribute your changes back to the original project, and persuade the maintainer to incorporate them into the main code tree.

When you look at these alternatives, clearly #3 is far less expensive for you than the other two–your customizations are no longer customizations, but part of the core software. This is actually how open source develops, and how you may change from being an open source consumer to an open source contributor.

Clear code: Building understandable applications

Tuesday, January 15th, 2008

Programming is an exercise in understanding a problem. To program effectively, you need to fully understand, in intricate detail, the problem your program is solving. Sometimes as a programmer you don’t fully understand the problem until you’ve wrestled with it a few times in code.

Most experienced programmers will tell you that when creating a large program, you almost always have to scrap your work at least once. At some point, you find that you’ve programmed your way into a dead end, that you just can’t quite get where you’re trying to go without doing it again. This is part of the process of understanding the problem, and usually once you’ve made this leap, you can visualize the whole thing laid out before you, and the next go-around leads to a useful, functioning program. Not only that, but the next go-around has a much higher percentage of clear, understandable code.

Clarity in code is a sign of the maturity of the application. It’s also a sign of requirements that haven’t changed from the original. Inevitably, in the real world, code accumulates hairy sections to deal with changing requirements, accreting moss, dirt, and all sorts of cruft as the real world steps in to make things messy. The more clear, organized, well-defined, and well-documented a code base is, the longer it will last in the real world before needing a major revision.

If you see a project that seems completely transparent, easy to figure out, and easy to change, you’re probably looking at code that has been through some serious revision, and has been recently refactored to reflect the problem it’s trying to solve. As long as the fundamental assumptions of the design do not change, clean code is easy to enhance, extend, and otherwise adjust to meet new requirements. Until it gets hairy again and is time to start again.

Clean code is elegant. Clean code is flexible. Clean code is related to powerful code, but code can be powerful without being clean.

Here are some principles we use to develop or identify clean code.

Use a good overall architecture for your application.
Like many other software companies, we use a Model-View-Controller architecture for most of our projects. The Model defines the problem space, what data needs to be stored, and how it’s broken down. The View is the human interface, the presentation of the software to the user. The Controller connects the model to the view, and often enforces authorization rules and the interface to other systems.

In our applications, the model is almost always object-oriented. We build up classes of objects that correspond to what we’re modeling. We like using template systems like Smarty for the view, so our designers and front-end coders can change the presentation without affecting core business logic. Our controllers are a mix of objects and functional code, whatever seems most appropriate for the overall system.

Normalize data as much as practical.
In database terms, normalization is the process of identifying all the properties of all the objects that have a one-to-one relationship to each other, that fit cleanly in the same database table. For example, a contact has only one first name and one last name, one father, and one mother (at least in the biological sense), but might have more than one email address, mailing address, and phone number. When modeling this data structure, you might decide to have one contact table that allows for 3 email addresses. Or you might have a separate email address table that allows any number of email addresses associated with a contact. If you were going to fully normalize this data, you would have separate email address tables, phone number tables, and physical address tables. But is this really practical? Does your particular system need to track all the email addresses of a user, or is one (or two) enough? If you can limit it to one email address, it might make a fine unique identifier for your system, if you know your users don’t share email addresses.

But if you’re going to track three contacts for a company, why not normalize this into a separate table, and remove the arbitrary limitation? I shudder when I see fields named “email1, email2, email3, email4.”

Each database table should be owned by a single class.
If you have a contact table, you should probably have a contact class to manage it. While other classes may query this table in a join, those classes should be getting only specific fields from the table. Only the contact class should write to the contact table, and in most cases, all requests for any contact details should go through the contact class. The rest of your application should talk to a contact object, rather than the underlying data, except when you’re trying to optimize for speed.

The main benefit of this approach is that you can more easily change the structure of your database tables with minimal impact to your application. If you decide that you really do need more than one email address for a contact, you can do most of the heavy lifting in the contact class, and only need to make small changes to the template to show the new data. The other parts of your application should be unaffected, because they simply request the default email address from your contact object–which is smart enough to know that’s now coming from a different table.
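
Here is a minimal sketch of that idea. Plain arrays stand in for the database tables so the example runs on its own, and the class and method names are hypothetical. The point is that callers ask the contact object for its default email and never learn where it is stored.

```php
<?php
/**
 * Sketch of a class that owns its data. Plain arrays stand in for the
 * database tables here (an assumption to keep the example runnable);
 * the class and method names are hypothetical.
 */
class Contact
{
    private $firstName;
    private $emails;   // once a single column, now rows from a separate table

    public function __construct(string $firstName, array $emails)
    {
        $this->firstName = $firstName;
        $this->emails = array_values($emails);
    }

    /** Callers only ever ask for the default email; where it lives is our business. */
    public function getDefaultEmail(): ?string
    {
        return $this->emails[0] ?? null;
    }

    /** Capability added later without touching existing callers. */
    public function getAllEmails(): array
    {
        return $this->emails;
    }
}

$contact = new Contact('Ada', ['ada@example.com', 'ada@work.example']);
echo $contact->getDefaultEmail(), "\n";
```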

If you really need to do sophisticated table joins to make your application fast, consider setting up a query builder structure. We sometimes set up static methods on a class that modify the different parts of a query to add the desired fields and do the appropriate joins.
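
A rough sketch of what such a query builder might look like: the query is kept as an array of parts, and a static method on each class adds its own fields and joins. Every class, table, and column name below is made up for illustration.

```php
<?php
/**
 * Sketch of a query-builder approach: a query is kept as an array of parts,
 * and each class has a static method that adds its own fields and joins.
 * All class, table, and column names here are hypothetical.
 */
class ContactQuery
{
    /**
     * Add contact fields and the join needed to reach them.
     * $foreignKey must be a trusted column name, never user input.
     */
    public static function addTo(array $query, string $foreignKey): array
    {
        $query['fields'][] = 'contact.first_name';
        $query['fields'][] = 'contact.last_name';
        $query['joins'][]  = "LEFT JOIN contact ON contact.id = $foreignKey";
        return $query;
    }
}

/** Assemble the parts into a SQL string. */
function buildSelect(array $query): string
{
    $sql = 'SELECT ' . implode(', ', $query['fields']) . ' FROM ' . $query['from'];
    if (!empty($query['joins'])) {
        $sql .= ' ' . implode(' ', $query['joins']);
    }
    if (!empty($query['where'])) {
        $sql .= ' WHERE ' . implode(' AND ', $query['where']);
    }
    return $sql;
}

$query = [
    'fields' => ['task.id', 'task.title'],
    'from'   => 'task',
    'joins'  => [],
    'where'  => ['task.project_id = ?'],   // values are bound separately
];
$query = ContactQuery::addTo($query, 'task.owner_id');
echo buildSelect($query), "\n";
```

The task code never spells out contact columns itself, so the contact class still decides which of its fields other parts of the application may read.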

Define who is responsible for what.
I’m not talking about people here–I’m talking about classes, files, and functions. Just like classes in the model own particular database tables, you should define which part of the application is responsible for all of the major parts of an application: authentication, authorization, state, the structure of the URL, form handling, initialization, etc. Each one of these functions should be owned by a particular part of the application. This “meta” stuff about the system we usually leave in the controller, often with included files dedicated to particular features. We usually build helper methods into base classes inherited by all of our data objects in the model, specifically for state and authorization.

Authentication, verifying that a user is who they say they are, should be consistent across your application. You usually have people log in with a username and password. The problem is, because the Web is stateless, you need to verify that you’re still talking with the same user on every single request. To do this, you either use HTTP authentication, which passes the same credentials with each request, or you give the browser a token that you match up in a session. Your web application needs to verify the session or credentials with every single request, if it does anything that you don’t want the Internet at large to be able to do.

Authorization, granting access to particular objects and methods for particular users, can be a bit more complicated. There are several different models for authorization: simple ownership, group ownership, user levels, and full-fledged access control lists. Authorization can either be handled by the controller or by the model itself. If the code is clear, it should be apparent where authorization is handled, and how it may be changed.
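
As one illustration, simple ownership handled in the model might look like the sketch below: a helper on a shared base class answers the question, and each method that changes data checks it. The class and method names are hypothetical, and a real system would likely need group or level checks as well.

```php
<?php
/**
 * Sketch of simple-ownership authorization handled in the model.
 * The base class and its method names are hypothetical.
 */
abstract class DataObject
{
    protected $ownerId;

    public function __construct(int $ownerId)
    {
        $this->ownerId = $ownerId;
    }

    /** Simple ownership: only the owner may change the object. */
    public function canEdit(int $userId): bool
    {
        return $userId === $this->ownerId;
    }
}

class Project extends DataObject
{
    public $title = '';

    public function rename(int $userId, string $title): void
    {
        if (!$this->canEdit($userId)) {
            throw new RuntimeException('Not authorized to edit this project');
        }
        $this->title = $title;
    }
}
```

Because the rule lives in one method, switching from simple ownership to group ownership later means changing canEdit() rather than every caller.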

Small Pieces Loosely Joined.
Even more than powerful programming, clear programming means breaking things up into manageable, understandable chunks. Each class in the model should correspond to an object in the real world you’re modeling. The typical method on a class in our models is between 5 and 25 lines of PHP code. Some reach 30 or 40 lines, and only the really ugly ones reach 100 lines. If a method is reaching that threshold, it can probably be broken into several smaller helper methods that make the main method more readable. If these helper methods can be reused by other methods, well, you’re killing two birds with one stone. More often than not, this level of refactoring distills the essence of the problem down into components that make your code more powerful.

Most of the long methods in our code seem to be related to form processing, parsing different parameters to insert or update data across multiple database tables. Through a combination of setting up property maps inside the object, clever getter and setter methods, and utility methods that iterate across relevant properties, these long methods can be reduced to a few calls that make the method much more portable, resilient to bad data, and more easily overridden from subclasses, too.
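
A small sketch of the property-map idea, with a hypothetical class and map format: the map lists which form fields the object accepts and how to clean each one, so the form handler shrinks to a loop.

```php
<?php
/**
 * Sketch of a property map driving form processing. The class, the map
 * format, and the method names are hypothetical.
 */
class TaskRecord
{
    /** Property map: form field name => type used to clean the value. */
    protected static $propertyMap = [
        'title'    => 'string',
        'estimate' => 'int',
        'done'     => 'bool',
    ];

    protected $data = [];

    /** Setter that casts each value according to the map. */
    public function set(string $name, $value): void
    {
        $type = static::$propertyMap[$name] ?? null;
        if ($type === null) {
            return;                       // ignore fields we do not know
        }
        switch ($type) {
            case 'int':
                $value = (int) $value;
                break;
            case 'bool':
                $value = (bool) $value;
                break;
            default:
                $value = trim((string) $value);
        }
        $this->data[$name] = $value;
    }

    public function get(string $name)
    {
        return $this->data[$name] ?? null;
    }

    /** Utility method: iterate across mapped properties instead of one line each. */
    public function loadFromForm(array $params): void
    {
        foreach (array_keys(static::$propertyMap) as $name) {
            if (array_key_exists($name, $params)) {
                $this->set($name, $params[$name]);
            }
        }
    }
}

$task = new TaskRecord();
$task->loadFromForm(['title' => '  Write docs ', 'estimate' => '3', 'admin' => '1']);
```

The unexpected 'admin' field is dropped because it is not in the map, and a subclass can accept more fields by overriding only $propertyMap.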

Create effective documentation.
I’m just starting to get into the habit of creating JavaDoc/PHPDoc style of comments, documenting each function and method. I’m a long time user of the Komodo IDE from ActiveState, and it kindly shows you the comment immediately preceding a function you type, in a tooltip as you provide parameters. Being able to see what parameters your method is expecting, what it returns, and any gotchas about using it without opening the file containing the class, saves a lot of time during development. Those kinds of comments I consider to be required.
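
For illustration, this is the kind of PHPDoc block I mean, on a made-up helper function: it names the parameters, the return value, and the one gotcha a caller needs to know.

```php
<?php
/**
 * Count the business days between two dates.
 *
 * Gotcha: the start date is included, the end date is not.
 *
 * @param DateTimeInterface $start First day to count (inclusive).
 * @param DateTimeInterface $end   Day to stop at (exclusive).
 * @return int Number of weekdays (Monday to Friday) in the range.
 */
function countBusinessDays(DateTimeInterface $start, DateTimeInterface $end): int
{
    $days = 0;
    $day = new DateTime($start->format('Y-m-d'));
    $last = $end->format('Y-m-d');
    while ($day->format('Y-m-d') < $last) {
        if ((int) $day->format('N') <= 5) {   // 'N' is 1 (Monday) through 7 (Sunday)
            $days++;
        }
        $day->modify('+1 day');
    }
    return $days;
}
```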

On the other hand, a comment that states the obvious is a waste of space. Comment anything unusual or unexpected. For example, if I assign a variable in an “if” expression, I’ll put a comment that I meant to assign it, that it’s not just missing the extra =.
if ($a = $b->value) // assigns value to $a, skips section if value is false

Related to inline code comments, use descriptive variable names, and consistent placeholders. I use $i, $j, $k for loops, $ar for generic arrays in helper functions, $obj for an unknown object, $t for a global Smarty template object. Otherwise I’m referring to $task, $oldtask, $project, $user, and $todotomorrow.

For complex projects, inline comments are not enough. You need a solid architectural document that illustrates objects and their relationships, workflow, and how to customize. Diagrams are good.

Finally, clear code is tidy code. While PHP isn’t as picky about tabs and whitespace as Python, properly nested code blocks promote readability, help keep your code valid, and give you a quick indication of how deep you are inside a function.

Clear code invites customization, enhancement, and further development. Clear code is maintainable, and a sign that an application can likely be kept up-to-date for quite a while to come. Clear code takes more time to develop, but usually indicates a better understanding of the problem. Clear code is more portable, more reusable for other purposes, and more powerful.

Powerful code: Get more out of every line

Monday, January 14th, 2008

Programming borrows a lot from the construction industry. Many programming terms derive from construction: hacking, builds, development, architecture, scaffolding, frameworks, and dozens of others. But in some ways, programming has an element of power beyond construction.

Take, for example, a building. When you build a building, you start by pouring a foundation. On top of that, you construct a skeleton, add walls, a roof, sheetrock, siding, and all the plumbing and electrical. Each one of these details needs to be built by somebody–all four walls of each room need to be framed in, wired, and finished.

In the world of programming, however, you really only need to build one wall, and then the computer can create as many copies as you need. So when building your program, you might create a “wall” class, which is composed of a bunch of two by fours, sheathing, sheetrock, wiring, and outlets. You might give your wall a set of properties: width between studs, overall width, overall height, position of outlets, the number and dimensions of windows and doors, etc.

Once you have a wall defined with a bunch of appropriate variables, you can then work up to defining a room. Your room might have four walls, with windows and doors in particular positions. Obviously, there are new levels of complexity here, but you don’t have to build every single wall if you can just specify a new wall with particular characteristics.

Now that we have a generic room, we can extend our room model by creating specific types, or sub-classes, of rooms: bedroom, bathroom, kitchen, utility room. And then we can define an apartment as a particular combination of rooms, and an apartment building as a particular combination of apartments.

A powerful program is one that allows you to say, “give me an apartment building with this many apartments of this base floorplan, and put it here.” A few lines of code specifying any details that vary from your standard, and you’re done with the basic system–you can start creating custom trim.

Object-oriented programming is powerful because it lets you start with a basic model, and extend it to create variations. Each variation (or subclass) inherits all the hard work that went into the underlying class, but adds only the details that make it different. The bathroom extends a generic room by adding plumbing and fixtures.
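
The analogy translates into a few lines of PHP. This is only a sketch: the class names, properties, and fixture list are invented to mirror the wall, room, and bathroom described above.

```php
<?php
/** Sketch of the wall/room analogy; all class and property names are hypothetical. */
class Wall
{
    public $width;
    public $height;

    public function __construct(float $width, float $height)
    {
        $this->width = $width;
        $this->height = $height;
    }

    public function area(): float
    {
        return $this->width * $this->height;
    }
}

class Room
{
    protected $walls = [];

    public function __construct(float $length, float $width, float $height)
    {
        // Four walls from one definition: two of each dimension.
        $this->walls = [
            new Wall($length, $height), new Wall($length, $height),
            new Wall($width, $height),  new Wall($width, $height),
        ];
    }

    public function wallArea(): float
    {
        $total = 0.0;
        foreach ($this->walls as $wall) {
            $total += $wall->area();
        }
        return $total;
    }

    public function describe(): string
    {
        return 'room with ' . count($this->walls) . ' walls';
    }
}

/** The subclass inherits the walls and adds only what makes it different. */
class Bathroom extends Room
{
    protected $fixtures = ['sink', 'toilet', 'shower'];

    public function describe(): string
    {
        return parent::describe() . ' plus ' . implode(', ', $this->fixtures);
    }
}

$bath = new Bathroom(3.0, 2.0, 2.5);
echo $bath->describe(), "\n";
```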

To me, this ability to inherit properties from other objects is the main reason to write object-oriented code. Some languages (like Java) force you to do everything in an object-oriented way, which strikes me as less practical–you need to find design patterns that work with that model to accomplish what you’re trying to do. But object orientation provides a powerful way of modeling a system.

When I review code, I’m looking for object orientation used in an effective, sensible way. Each real world object being modeled in a system should have a corresponding class in the underlying system. Classes should extend some basic data class to avoid repeating the same methods in a bunch of separate classes. Code should be built up into units that can become parts of other units, so that individual chunks can be kept small and understandable. If any PHP file ends up longer than a thousand lines, I start looking for ways of simplifying, streamlining, sharing code with other modules. If any individual method ends up longer than a hundred lines, it should be doing something extremely unusual that isn’t necessary anywhere else.

The Unix architecture is often summarized as “small pieces loosely joined.” Each identifiable chunk should be small and have a clearly defined purpose. Assembling these small pieces into a larger system results in great power while also allowing for reliability, security, and actually getting the project finished.

It’s all a matter of scope. When you’re looking at a wall object, you are working with two by fours, nails, and sheetrock. When examining a room, you’re working with walls, a ceiling, and a floor. Programming should hide the details of lower layers, and allow the programmer to focus on the necessary detail for the scope of the module she’s working on. The result is powerful code.

Why would you not need powerful code?
Pascal (and many others) is credited with the idea that it takes longer to write shorter code. This series of blog entries certainly illustrates the concept… The same principle holds true in code. If you’re creating a web application that’s never going to need revision, it can be much quicker to just write as you go and end up with some big long pile of spaghetti code. The instant you need to change it, or worse, somebody else needs to change it, that quickly written, long-winded code takes a lot more time to update.

As far as I’m concerned, the only reason to not take a structured, measured, powerful approach to coding is that you need something temporary working today, and don’t care that you’ll probably have to scrap it and do it right later.

How do you create powerful code?
Powerful code comes from structure. Frameworks deliver structure. This does not mean a particular framework is powerful for your application.

A skyscraper needs a much stronger foundation, and far better design to prevent collapse than a house. In programming, you can either use somebody else’s framework, or build or grow your own.

Developers love building frameworks. It’s fun to think of all the things that people might someday do with your framework, and build in a mechanism that provides useful ways of doing those things. The problem is, build in too many features to the framework and you just end up with a large bloated blob of code that nobody uses entirely, that nobody even knows how to use properly. Make your framework too small, and people end up having to do more work in the actual application.

The hot framework right now is Rails. It has a lot going for it–a solid philosophy of convention over configuration, auto-creation of all sorts of things like database tables you otherwise have to build yourself, and other features I’m sure you’ve heard about already from all the Rails developers out there.

Personally, I think frameworks like Rails are overrated, hiding too much of the implementation to be valuable. The perfect analogy for this is photography. If you take a basic photography course, you learn about the basic fundamentals: lens focal length, aperture, shutter speed, focus distance, and film speed. That’s all you need to know to take great pictures with any camera–at least any that allows you to set these things manually. Most cameras these days try to automate all of this for you, and most of the time they do a reasonable job. But most cameras also have a whole set of special settings. My Casio has a “Best Shot” mode, designed to set the camera up for different scenarios: landscapes, portraits, evening shots, indoors, backlit, etc. Some of these modes do really sophisticated things, but is it better for a photographer to understand all the different programmed modes, or the fundamentals of photography? I would argue the latter–with an understanding of how photography works, you can operate any camera. With an understanding of the programmed settings of a particular camera, you’re lost as soon as you move to another.

That’s the problem with frameworks–you spend more time learning all the ins and outs and arbitrary ways of tweaking it, instead of focusing on the actual task at hand–taking good pictures. Then again, I prefer a stick shift to an automatic every time…

When it comes to frameworks, less is more. The simplest possible framework that fits your application requirements is the one to use. If you can’t find one that fits, start with some simple data objects, an effective template library, and build your own, but don’t spend too much time on it–let it grow as you need it.

In the grand scheme of things, I don’t need a framework to create a database table for me–that’s a lot of extra code for something that only happens once. But for all those things you do need more than once–for the walls, rooms, and apartments in your building, design with care and power in mind.

For more about power, go read Paul Graham’s essay, Succinctness is Power. Then follow it up with Holding a Program in One’s Head.

Fast code: Speed and Scalability in PHP applications

Sunday, January 13th, 2008

Continuing on the series, the next item on the list seems to be the mistake I see the most–putting slow code in loops, loading up things that don’t need to be loaded, making simple requests expensive.

In terms of processing time, it’s expensive to open a database connection. It’s expensive to connect to another computer. It’s expensive to load up a big framework to respond to a single request. It’s relatively cheap to retrieve a pre-constructed page out of a cache.

The single biggest mistake I see that kills performance in code is putting database calls inside a loop. One code project we picked up had display code that showed the results of a search. First, it did a search to identify all the matching rows in the database. Then it looped through that result set, grabbing the rest of the data for each individual row, one query at a time. Then it cut down this set to the page size, discarding all that data it had loaded up. If the search yielded over a thousand results, it took over a minute to run! All of this data could be loaded with a single smarter database query–and doing so made the same search practically instantaneous.

This type of performance penalty is the main reason I don’t care for frameworks all that much–they often trade performance for programmer convenience. This is fine when your site is small, but leads to a lot more optimizing work down the road if your site takes off. And while good frameworks can turn result sets into objects efficiently, it usually takes learning how to make the framework do this in the first place–which means that programmers are better off learning how to do all of the work themselves before using a framework, so they understand how to avoid these problems.

So here are some principles I use to make PHP applications speedy from day one.

Get as much data from each database query as you possibly can–but not much more
Unless a database table regularly contains a large blob we rarely need, go ahead and load up the entire row when creating a corresponding object. For example, in a project management tool, if asked to retrieve a task object, in my code you would provide a task id and get back a task object with all of its properties pre-loaded from the database. While you can call getter methods to get individual properties, these do not result in yet another call to the database.

When retrieving arrays of task items, I usually provide a static search method that does a single query getting all of the data for all matching rows, constructing each task object, and passing it the already retrieved data so there are no further database calls–request the first 30 matching tasks, and the system still only does a single query on the database.
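
A minimal sketch of such a static search method follows. To keep it runnable without a database, a callable that counts its own calls stands in for the real query function; the Task class, its methods, and the table and column names are all hypothetical.

```php
<?php
/**
 * Sketch of a static search method that builds every object from one query.
 * The Task class, its methods, and the $runQuery callable (standing in for
 * a real database call) are hypothetical.
 */
class Task
{
    private $row;

    /** Build a task from a row that has already been fetched. */
    public function __construct(array $row)
    {
        $this->row = $row;
    }

    public function getTitle(): string
    {
        return $this->row['title'];      // no database call here
    }

    /**
     * One query returns full rows for all matches; each object is
     * constructed from its row, so no per-task queries follow.
     */
    public static function search(callable $runQuery, int $projectId, int $limit = 30): array
    {
        $rows = $runQuery(
            'SELECT * FROM task WHERE project_id = ? ORDER BY id LIMIT ?',
            [$projectId, $limit]
        );
        $tasks = [];
        foreach ($rows as $row) {
            $tasks[] = new self($row);
        }
        return $tasks;
    }
}

// Stand-in for the database that counts how often it is called.
$queryCount = 0;
$fakeDb = function (string $sql, array $params) use (&$queryCount): array {
    $queryCount++;
    return [
        ['id' => 1, 'title' => 'Draft outline'],
        ['id' => 2, 'title' => 'Review chapter'],
    ];
};

$tasks = Task::search($fakeDb, 42);
```

However many tasks come back, the counter stays at one, and calling getTitle() on any of them does not raise it.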

Doing a database query is expensive, but making a sophisticated query doesn’t add much to that as long as the database is properly indexed. When you know you have to do one, wring as much data as you can from each query. Use JOINs and database functions to do as much work as you possibly can in a single query.

I’m not that big a fan of stored procedures, mainly because I haven’t learned how to manage them effectively across deployment instances. Make a change to a code base, all you have to do to get it elsewhere is commit it to the repository and update your working copies. Make a change to a stored Postgres function, and you need to manually replace the function using psql or some other tool. But a stored function can be a way to offload more processing to the database, possibly gaining some performance in the process.

In general, I think of the database as being in a separate silo from the business logic. The requests between these silos are what’s expensive–the processing in one side or the other is less so. Minimize the number of times you switch, and your application will be faster… As a side benefit, when your traffic outgrows what a single server can handle, and your database calls are actually on a different server, you won’t need to rewrite your application.

Avoid repeating yourself
Cut and paste when programming is a bad thing. Stepping through my own code with a debugger often reveals areas where I do the same thing twice. I loop through an array in one method to calculate some value. Then I loop through the same array somewhere else to perform some other operation. While loops can be fast, if you’re manipulating large objects or arrays, you still want to minimize this wherever you can. Sorting is expensive–wherever you can, let the database pre-sort your results for you. Look for opportunities to leverage work you’re doing in one part of your application to do double-duty and handle the task you’re doing elsewhere.

Writing code is a lot like writing anything else–it takes time to distill down to the essence. Early drafts can be much wordier than later drafts. If you have the time to go back and consolidate the areas of work, you’ll get a small performance benefit out of this.

Out of this list, this item is the least important. Try to consolidate as much as you can the first time through your code, but caching will far more than make up the difference. These are the slight improvements to save for future revisions–but if you see an obvious opportunity to combine and simplify code, take it.

Use Lazy-Loading wherever it makes sense
If your application needs to hit the database on every single request, go ahead and open a database connection early. If on some requests your application just returns static data, save a tiny bit of processing and skip the database connection. On a few projects, I’ve written code that connects to multiple databases, so I’ve written a simple stub class that maintains a singleton database connection object. In every method that connects to a database, it calls the static method that returns the database connection object, creating and establishing it if it doesn’t already exist.
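
Roughly, the stub class looks like the sketch below. The class name and the factory callable are assumptions made for this example; in a real application the factory might return a PDO object, and a version for multiple databases could keep one connection per name.

```php
<?php
/**
 * Sketch of a lazily created, shared database connection. The class name
 * and the factory callable are hypothetical; in a real application the
 * factory might return a PDO object.
 */
class Db
{
    private static $connection = null;
    private static $factory = null;

    /** Register how to connect, without connecting yet. */
    public static function configure(callable $factory): void
    {
        self::$factory = $factory;
        self::$connection = null;
    }

    /** Return the shared connection, creating it on first use. */
    public static function get()
    {
        if (self::$connection === null) {
            if (self::$factory === null) {
                throw new LogicException('Db::configure() has not been called');
            }
            self::$connection = (self::$factory)();
        }
        return self::$connection;
    }
}

$opened = 0;
Db::configure(function () use (&$opened) {
    $opened++;                 // a real factory would open the connection here
    return new stdClass();
});

// Nothing is opened until some method actually asks for the connection.
$first = Db::get();
$second = Db::get();
```

A request that only returns static data never calls Db::get(), so it never pays for a connection.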

We program extensively with Smarty, and in some projects use Smarty’s caching system. When used with a lazy-loading design, it’s extremely effective at speeding up page views. In our “standard” architecture, we have a controller stub that the browser requests. This stub examines the request, identifies the view and the data objects to load, and sometimes creates controller objects to handle specific requests. However, if you’re using a caching system, you need to check for a cached version before doing any of this processing. Either check the cache at the top of your controller, or move your controller itself into a file that’s loaded by a Smarty template. By having the template load the controller and decide what to do next, that processing never happens if Smarty retrieves the cached template instead.

Now that we program a lot with Ajax, we no longer automatically create a Smarty object for every request–first we check whether we’re returning HTML, XML, JSON, or something else, and only create the Smarty object for particular types of views.

These are examples of how we use lazy loading to avoid loading large chunks of code or establishing database connections we never use.

Plan early on for caching
When you first launch your application, you probably don’t need caching because you’re not getting that much traffic. Some applications only run in private networks and never need to do any caching. But if you’re building a Facebook application or expecting huge amounts of traffic someday, create strategies for caching early on.

As I mentioned earlier, Smarty does this extremely well. You need to provide a way to uniquely identify an item in the cache, and Smarty will do it for you. Just make sure you check for the cached version before doing a lot of extra processing.

Without Smarty, it’s relatively easy to use output buffering to capture the output of your code and store it somewhere for later retrieval.
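Here is a rough sketch of that output-buffering approach, assuming a simple file-based cache in the system temp directory. The function name cached_output(), the file naming scheme, and the time-to-live parameter are my own illustration, not part of any library.

```php
<?php
// Hypothetical helper: return cached output for $key if it is still fresh;
// otherwise run $render, capture what it prints, store it, and return it.
function cached_output(string $key, int $ttlSeconds, callable $render, ?string $cacheDir = null): string
{
    $cacheDir = $cacheDir ?? sys_get_temp_dir();
    $file = $cacheDir . DIRECTORY_SEPARATOR . 'page_' . md5($key) . '.html';

    // Check the cache before doing any expensive work.
    if (is_file($file) && (time() - filemtime($file)) < $ttlSeconds) {
        return (string) file_get_contents($file);
    }

    ob_start(); // start capturing everything the page code prints
    try {
        $render();
        $html = (string) ob_get_clean();
    } catch (Throwable $e) {
        ob_end_clean(); // discard partial output on failure
        throw $e;
    }

    file_put_contents($file, $html, LOCK_EX);
    return $html;
}
```

The important part is the order: the cache check comes first, so the expensive rendering code only runs on a miss.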

Many projects designed for traffic have simple switches you can just turn on to take advantage of caching, including Drupal and Joomla. After caching as much HTML as possible, the problem turns into more of a system administration project–installing an opcode cache like eAccelerator can help your server handle 30-40% more traffic, in our experience. These systems compile your PHP scripts to opcodes once and cache the result, so the scripts are not recompiled on every request.

The next level of caching, for truly large sites, is using a system like memcached. Memcached provides a system for distributing a cache across multiple data servers, so for the truly large sites, the problem starts involving developers again. PHP provides a memcache module you use to store and retrieve your pages in memcached. When your site outgrows what can be run on two servers, it’s time to have your system administrators set up a memcached cluster and rewrite your application to use it.

Avoid over-engineering your application
I inherited another project gone awry that had started with some really huge, complicated framework that seemed half-done. Most projects we’re called in to complete involve spaghetti code, mixed logic and presentation, and no clear architecture. This one, in contrast, was over-engineered for the problem. To figure out how the code worked, I ran it through a debugger. To get to my main class for a particular object, it ran through a series of no less than 8 inherited classes. And worse, some utility methods were copied between child classes, instead of being put once higher in the class hierarchy. I saw clear reasons for having 3 layers of inheritance in this application. Not 8.

Since then I’ve seen a few cases where developers seem to create more inherited classes just because it seems like the correct thing to do, not because there is any practical value in it. I rarely see the need for more than 3 levels of object inheritance, and never more than 4 (at least in a web application). When your application needs to open 20 files just to respond to a simple Ajax data request, that’s over-engineered. When you create an elaborate class structure just to avoid a simple function, that’s over-engineered.

There’s a scale here, from non-engineered spaghetti code to rigid, sophisticated frameworks. I suspect that most people without formal training start with spaghetti code and gradually learn how to create more structured code–while computer science majors start out with over-engineered structures and eventually loosen up in the real world after running their code through some profilers and realizing they don’t need all that complexity for a simple problem. Everyone over time, at least anyone with a knack for this stuff, ends up somewhere in the middle, with enough architecture to do the job–and little more. There’s definitely some variation here as a matter of taste, but there are measurable problems with either extreme.

I further suspect that Rails might be so popular now because a lot of web developers out there with no formal training are suddenly seeing the benefits of structured code and smart frameworks.

Keep in mind how expensive each operation is
Some actions take a while to complete. In our experience, the most expensive actions involve connecting to another server, especially ones not in the same data center. Keep these in mind when coding, and don’t do them if it isn’t necessary. For very expensive operations, especially when you need to do a bunch at once, consider forking a process using a call to the shell, or move to a maintenance routine called from a cron job.

Expensive:

  • curl to connect to another server
  • Other functions used to connect to remote servers: fopen, file, etc
  • domxml, SimpleXML on very large XML documents
  • Sending mail to multiple recipients

Moderately expensive:

  • Sorting on large arrays
  • Database connection to remote server
  • domxml, SimpleXML on medium-sized documents
  • Recursive functions

Somewhat expensive:

  • Individual database queries
  • domxml, SimpleXML
  • Creating complex objects
  • Loading large files

Inexpensive:

  • XML event-based parsers
  • Retrieving cached files
  • Loops on small arrays
  • Lookups in hashes stored in memory, retrieving constants

Do you have any other tips for writing fast PHP code? Please add a comment below…

Secure code: Understanding PHP vulnerabilities

Saturday, January 12th, 2008

There are many articles that cover PHP vulnerabilities, but I’ve run across a lot of programmers and code that seem oblivious to them. When interviewing programmers, I look for an understanding of these types of vulnerabilities, and of how to keep their programs from being vulnerable to them.

Aside from register globals issues, most of these attacks are not specific to PHP.

Register Globals issues
From early on, the developers of PHP had this great idea: accept any parameters passed from the browser, and automatically turn them into variables available in the code. Well, it turned out to not be such a great idea–it meant that improperly initialized variables could be seeded by attackers to potentially do all sorts of damage. Worse, sometime after PHP 5 came out, someone figured out that you could pass a particular variable that would load and execute any PHP file before running the actual code–and this file could be on a completely different server, in a regular PHP installation.

Most other web languages never offered this convenience–you have to retrieve parameters from a browser through a specific module or array. PHP now provides arrays like $_GET, $_POST, $_REQUEST that are simple to use, but make it so you need to specifically request the variable you want from your code.

Any code that depends on register_globals being set is completely broken, as far as I’m concerned. If it’s on a server with an older version of PHP, it’s just waiting to get cracked. Any developer that relies on registered globals is programming for 10 years ago, and needs some serious education.

The main point here is that software should never trust data coming from the browser. I don’t care how much validation you do with Javascript; you’d better double-check the request on the server, and make sure either you set variables before you use them, or work in functions/classes that are not in the global scope.
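As a small illustration of requesting and checking a variable explicitly, here is a sketch of a helper that reads an integer from a request array such as $_GET. The function name request_int() and its parameters are hypothetical; filter_var() and FILTER_VALIDATE_INT are standard PHP.

```php
<?php
// Hypothetical helper: read an integer parameter from a request array,
// falling back to a default when it is missing, malformed, or out of range.
function request_int(array $input, string $name, int $default, int $min, int $max): int
{
    if (!isset($input[$name]) || !is_scalar($input[$name])) {
        return $default;
    }
    $value = filter_var($input[$name], FILTER_VALIDATE_INT, [
        'options' => ['min_range' => $min, 'max_range' => $max],
    ]);
    return $value === false ? $default : $value;
}
```

A call like request_int($_GET, 'page', 1, 1, 100) always returns an integer between 1 and 100, whatever the browser sent, so the variable is set before it is ever used.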

SQL Injection vulnerabilities
This is the next most serious issue, and it affects pretty much all web languages, not just PHP. The most common way to interact with a database is to use a language called “structured query language” (SQL) to select rows of data from the database, update data, insert new data, or delete things. Once you learn the basic syntax and structure, it’s very easy to use. The problem is, you nearly always depend upon the user to identify what data to retrieve, or to provide the data to add or change.

Once again, we can’t ever trust data from the user. Most databases accept more than one query at a time, and most information used to select rows in a database is wrapped in single quotes:

SELECT first_name, last_name, salary FROM employees WHERE first_name LIKE 'John';

Beginner programmers drop the variable containing the search from the browser into the query, wrapping it in single quotes: LIKE '$firstname';

Attackers simply put a single quote in the field, and then add another SQL command to do something malicious. Like delete the entire database.

Now, when you know there might be a quote in the variable, you can escape it by adding a backslash in front of it. PHP actually does this for you automatically if you have an evil setting called magic_quotes_gpc turned on. That’s why you often see a lot of backslashes in forums, blog comments, etc by the way. But there are ways of getting around that, as well.

At a minimum, all variables used in a query should be escaped using a function known to handle all possibilities, usually those provided for the specific database engine. What I look for in code is someone using a database abstraction layer or interface that allows for parameterized queries: instead of putting the variables directly in the query, you create a query with placeholders (usually a question mark, ?) where variables are to be substituted, and then pass an array of variables. The abstraction layer handles all of the escaping for you, and you end up with much cleaner code.

We use PEAR::DB as a database abstraction layer in most of our projects. Others include ADODB, or PEAR::MDB. PHP5 provides a mysqli interface capable of this, as well. If I see a mysql_query command in general application code, it gets marked way down in my book.
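To make the placeholder idea concrete, here is a minimal sketch using PDO, which ships with PHP 5.1 and later. Note that PDO is swapped in here for illustration; it is not one of the layers named above, and the function name is hypothetical. The table and columns are the ones from the example query earlier in this section.

```php
<?php
// Hypothetical lookup using a parameterized query. PDO stands in for the
// abstraction layer; the employees table comes from the earlier example.
function find_employees_by_first_name(PDO $db, string $firstName): array
{
    // The ? placeholder keeps the user's value out of the SQL text entirely.
    $stmt = $db->prepare(
        'SELECT first_name, last_name, salary FROM employees WHERE first_name LIKE ?'
    );
    $stmt->execute([$firstName]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Because the value travels separately from the query, a stray single quote in $firstName is just part of the search text and cannot end the string and start a new command.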

Mail Header Injection
Many programmers don’t realize it’s not safe to use the PHP mail() function without special protection. I didn’t believe this was a vulnerable function until one of our clients got attacked with it. Basically, the mail() function on a Linux system is a wrapper to the system sendmail command. Sendmail takes a plain text email, looks for To, CC, and BCC addresses, and sends the message on its way. The problem is, attackers can inject fake headers into the message that basically hijack your server to send spam. Any field that ends up in the header of a message–to, from, subject, or any other arbitrary header you collect–could be used for this purpose.

I haven’t tested to or subject recently–there may be some built-in protection for these fields now. But to set the from address of a message, you pass it in an array or a string to the “header” parameter of mail(). This is ripe for exploit. All the attacker has to do is insert a newline, and then they can supply their own bcc field with hundreds of email addresses to spam. PHP and the sendmail binary will happily spew your attacker’s message to hundreds of users at a time. The next thing you know, your server will get on a blacklist for spamming, and nobody on that server will be able to send mail to domains like AOL or Comcast and other places that actively reject mail from known spammers.

Some kind soul posted a function to filter headers and ignore anything after a newline character to the comments section of the PHP documentation for the mail() function (the PHP documentation, and the comments, are a fantastic resource, and one of our favorite features of PHP). We have a simple safe_mail function that runs all the headers through this function, which also makes for a convenient way to intercept mail on a test environment.
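A stripped-down sketch of that idea follows. It is not the function from the PHP documentation comments, and not our actual safe_mail code; the names and signatures here are assumptions for illustration. It keeps only the text before the first carriage return or newline in any value that ends up in a header.

```php
<?php
// Hypothetical filter: keep only the text before the first CR or LF, so an
// injected "\r\nBcc: ..." can never become a header of its own.
function clean_header_value(string $value): string
{
    $parts = preg_split('/[\r\n]/', $value, 2);
    return trim($parts[0]);
}

// Hypothetical wrapper around mail(); From is the only extra header here.
function safe_mail(string $to, string $subject, string $body, string $from): bool
{
    $headers = 'From: ' . clean_header_value($from);
    return mail(clean_header_value($to), clean_header_value($subject), $body, $headers);
}
```

Having one wrapper also gives you a single place to redirect or log outgoing mail on a test environment.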

This one isn’t talked about that much, but a programmer that protects a mail function properly is an indication of an experienced PHP developer.

Cross-site scripting (XSS)
Cross-site scripting is the current favorite exploit of attackers. Unlike the other attacks, they’re not attacking your site directly but exploiting it to attack your visitors. Of course, if your visitors have access to an administrative interface on the site, they could then use this to attack your site.

The real problem is that cross-site scripting is a great way to spread spyware, and so many sites are vulnerable to it. MySpace was long a victim of XSS. eBay, too. Basically any site that allows users to add content that is shown to other users is vulnerable to XSS, unless the application developer has taken specific measures to prevent this. In this age of social networking, that is a huge number of sites.

If an attacker can find a way to get a script into a page shown to others, there are lots of things he can do. Sometimes it’s as simple as adding <script> and a chunk of JavaScript or a location to load a script from. Other times they will attach a mouseover event or hide the script in some other devious place. Sometimes they insert an object or iframe containing their malicious content.

If they can load an arbitrary script of their choosing, they can view anything on that page and watch anything the visitor types into that window. That’s expected, defined behavior, and that’s not going to change. So at a minimum, they can get passwords to your site and from there, they can do anything on your site that an attacked user can do.

But they don’t stop there. Both Internet Explorer and Firefox have contained vulnerabilities that allow an attacker to escape the sandbox of that browser window to be able to monitor other windows, or even at worst install malicious software on the user’s computer. That is how spyware is spread. And once they have their own malicious software installed on your computer, they own it–they can monitor every mouse movement and keystroke, they can use it to send spam or attack other computers or do whatever they want.

Cross-site scripting is diabolical. It doesn’t usually harm your site, because attackers don’t want you to know you’re carrying their malware. Application developers ignore these issues to the peril of the entire Internet…
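One common measure, for text that should never contain markup at all, is to escape it on output. This is a minimal sketch; escape_html() is a hypothetical wrapper name, while htmlspecialchars() and ENT_QUOTES are standard PHP.

```php
<?php
// Hypothetical helper: escape user-supplied text for output inside HTML,
// including both kinds of quotes so it is safe in quoted attribute values.
function escape_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

Escaped this way, a submitted <script> tag is displayed as text instead of being run by the visitor's browser. It only covers plain text, though. If you let users submit some HTML, you need a stricter filter that removes script, object, and iframe tags and event attributes, and that is much harder to get right.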

Session Hijacking
Web applications differ from most other applications in that they are considered “stateless”. That is, the server does not know the state of anything the user is doing, and starts in exactly the same condition for every request. In most applications, however, you are working through some sort of process and what you do next depends on the action you take when you’re in a particular state. What actions you have available to you depends upon the state of the object you’re working with.

For example, if you’re working with a user object, it might have several states: “unconfirmed”, “logged in”, “not logged in”, “suspended”. For users that are suspended, the application would prevent access to private data. For users who are unconfirmed, the application might offer to resend a confirmation link. For users who are logged in, the application would provide access to appropriate parts.

In a web application, it’s up to the programmer to define these states and handle them appropriately–PHP has no internal concept of state at all. Every request coming into your application must do all the work of loading the appropriate objects, defining what state they’re in, and doing whatever action is necessary.

PHP and other languages do however provide a mechanism for keeping track of users, with something called a session. PHP basically provides an automatic mechanism for storing variables associated with a user session on the server, instead of the browser. Since as we know well by now, you can’t trust anything coming from a browser, a session is a much safer place to store critical data to help you determine the state of your application and not have to reconstruct it completely on every page. It’s especially used for logins.

The problem is, sessions can be hijacked. PHP and other languages use a cookie to store a simple unique identifier for the session in the browser, which the browser helpfully returns on every request. If the browser has been compromised (by a cross-site scripting attack, or spyware, etc) an attacker can read these cookies and pass somebody else’s session identifier into your application, and if you don’t protect against this, hijack the original user’s session.

That takes some effort, however. Much more of the problem is when a user turns cookies off. Back in the late 1990s/early 2000s, many users got completely paranoid that cookies identified them wherever they went on the Internet, and many applications help users manage their cookies. So this general paranoia about cookies actually makes the situation worse, because if the user turns off cookies, your application either needs to force them to reauthenticate, or allow the browser to pass their session identifier through another means.

PHP has yet another configuration parameter to automatically allow session ids to be passed via a GET request instead of a cookie. The problem is, when this is done, the session identifier becomes part of the URL in the browser address bar. Users then bookmark their session id, post it to their blog or a forum, do whatever with it they want. And if your application is not written to handle this, other completely innocent users may find themselves logged into your application under a hijacked session id!

Applications using sessions must use some other source to verify that the session corresponds to the right user. In some cases, it may be enough to just require cookies and not allow session identifiers to come through any other vector. In others, programmers may need to consider using http authentication or other methods to verify that they have the right user.
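As a rough sketch of both ideas, the first function below uses two standard PHP session settings to keep session ids out of URLs, and the other two compare a stored fingerprint against the current request. The function names and the choice of the User-Agent header as the fingerprint are assumptions for illustration. A user agent is easy for a deliberate attacker to copy, so treat this as a guard against accidental sharing, not as real authentication.

```php
<?php
// Settings that keep the session id out of URLs; call before session_start().
function use_cookie_only_sessions(): void
{
    ini_set('session.use_only_cookies', '1'); // ignore ids passed in the URL
    ini_set('session.use_trans_sid', '0');    // never rewrite links to carry the id
}

// Hypothetical fingerprint: a hash of request details that should stay
// constant for one user during one session.
function client_fingerprint(array $server): string
{
    $agent = isset($server['HTTP_USER_AGENT']) ? (string) $server['HTTP_USER_AGENT'] : '';
    return hash('sha256', $agent);
}

// True when the session was started by a client with the same fingerprint.
function session_matches_client(array $session, array $server): bool
{
    return isset($session['fingerprint'])
        && is_string($session['fingerprint'])
        && hash_equals($session['fingerprint'], client_fingerprint($server));
}
```

At login you would store client_fingerprint($_SERVER) in $_SESSION['fingerprint'], and on later requests force a fresh login whenever session_matches_client($_SESSION, $_SERVER) returns false. Calling session_regenerate_id(true) at login also helps, since any id that leaked before authentication stops being valid.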

Session hijacking is one of the toughest vulnerabilities to manage, if you need to protect any sensitive data. Even if you don’t, the application should deal appropriately with accidental session hijacking, because it’s very common and easy for users to do.

Other vulnerabilities
The list doesn’t stop there, but those are the serious mistakes I see, sometimes on a weekly basis. It’s hard to write secure code, but starting with security as a mindset goes a long way towards preventing problems down the road.

To summarize, here are some general tips for keeping applications safe from these types of attacks. If I’m interviewing you for a programmer position, I will be asking you about these:

  1. Never trust input from the browser.
  2. Turn off register_globals, but always assume it’s on and protect your variables anyway.
  3. Use a database abstraction layer, and parameterized queries.
  4. Be extra careful with database statements that cannot be parameterized.
  5. Strip all script, object, and iframe tags out of user inputs. Strip all Javascript and event attributes from any HTML you do allow.
  6. Never trust input from the browser.
  7. Use wrapper functions to add extra protection to common functions like mail().
  8. Be extremely careful with sessions that are used to authenticate users.
  9. Provide an appropriate level of protection for private data.

Any other vulnerability types you care about, when writing or reviewing web application code?

Quality Code: How do you judge?

Friday, January 11th, 2008

We’re hiring programmers, over at Freelock. I’ve been going through lots of code samples to try to identify how experienced and competent a particular developer is. I also do this on a regular basis to evaluate how solid a particular open source project is.

I’ve seen a lot of code in various languages. As a technical writer, I used to write documentation for programmers teaching them how to use a particular interface or system. I’ve been involved with traditional software development projects at large software companies and startups. And I’ve done my share of actual programming of web applications.

I’m finding there are several indicators I look for when evaluating code, specifically for PHP, our language of choice. I’ll go in more depth on each of these qualities in future posts, but for now just thought I’d capture them while they’re fresh in my mind. So when I review code of a web application, here are some qualities I’m looking for:

  • Secure. Does the application trust users to provide good data? Does it protect its internals to prevent all the various types of exploits out there? Does it protect data from malicious users?
  • Fast. This could mean many things, but I’m looking for efficiency across layers. Is there a database call inside a loop that gets called a couple hundred times? That’s a huge speed killer. I look for code that has an appropriate level of abstraction to the size of the problem–and makes sensible choices about how much data to load for each request.
  • Powerful. This one is stolen from Paul Graham. Does the code use object-orientation and inheritance in a powerful way? I like seeing utility methods on base classes, which can then be leveraged to make very short, easy-to-understand final classes. Are the methods attached to the appropriate level of the class hierarchy? How short can you make the main logic of the application?
  • Clear. Going hand-in-hand with power, clarity is about making it apparent what each chunk of code is for, and how to go about changing it to make it work the way you want. Clear code is maintainable, well-documented, easy to customize.
  • Customizable. Was the program designed in a way that’s easy to override, easy to customize, easy to run in other environments? Can it be managed effectively, and work broken up into different units?
  • Reliable. Does each function or method cover all possible scenarios? Is there proper error-handling in the code? When an end user hits upon some combination of things that the programmer never anticipated, does the program die ungracefully, or provide useful feedback?

Very few programmers hit all of these. My biggest weak area is the reliability one–after reviewing other people’s code, I find a lot less exception handling in my code. We’ve all got something to learn. But reviewing other people’s code can help you spot weaknesses in your own, and develop a much stronger sense of how to do it right.

[Edit: Adding links to more detailed posts as I publish them]

How to get the best price

Wednesday, December 12th, 2007

… but are you sure price is the most important thing?

We’ve been on the receiving end of this type of call quite a bit these days. The unfortunate part about the whole deal is that pricing often seems entirely arbitrary.

When I got the bills for surgery on my ruptured Achilles tendon, I was amazed by the difference between the original price and the final price negotiated by my insurance company. Even though I had a very high deductible and had to pay most of the bill before the insurance kicked in, just having insurance lowered the cost dramatically, in some cases more than 50%.

As a service provider with payroll, taxes, and overhead, however, I’m less inclined to negotiate. With open source products, we’re providing incredible value to our business customers. But if we don’t get fairly compensated for our services, we wouldn’t be around to help businesses negotiate the open source bazaar for very long…

Project Planning, and response to Multi-Tasking is Killing Your Business

Thursday, December 6th, 2007

I met Bruce Henry at an MIT Enterprise Forum event last night. Turns out we’re both working on software for project management. Theirs is Liquid Planner, ours is Project Auriga.

I’m a bit skeptical about their statistical approach to project scheduling… I definitely agree with giving estimates in ranges, but how is your approach that different? Aren’t you depending on arbitrary guesses by the project manager users, to come up with those ranges and confidence values? I’d love to see how you’re addressing this.

Regarding this post: Bruce’s Brain: Multi-Tasking is Killing Your Business, my thinking in developing Project Auriga is that few of our projects are big enough to need a large project management system. What we needed is a task management system, a way of preventing any of the dozens of tasks we need to accomplish in a week from slipping through the cracks. And then making sure we’re billing the right customer for finishing each one. Yes, we’re multi-tasking, quite heavily. But then again, one thing missing from your multi-tasking post was context–you’re taking a very project-centric view of work. There are other ways of optimizing work, which we deal with all the time:

  • customer-centric
  • system-centric

We manage around 25 servers right now. A few of them we use to host customer accounts. When it’s time for a security upgrade, we might take our list of Joomla installations, and upgrade each one on a given server. Those tasks cross many customers and projects, but since they’re all grouped into one place, we take a system-centric view of this work. So I’m creating a view that makes it simple for system administrators to go down a list and check off systems as they’re upgraded.

Another reason to multi-task is that most of our projects have points where we must rely on outside input. Right now we have 3 Zen Cart projects, 3 Joomla projects, 3 new server installs, and 3 custom development projects that are highly active. We do a few tasks on one project, throw it over the fence to the customer, and when they’ve done what they need to do, it comes back to our plate. For example, one of the Zen Cart projects is waiting for the customer to regain control over their domain name so we can purchase an SSL cert and launch. The second one is waiting for the customer to give us a list of products to put into the calendar, and the third is on our developer’s calendar next week to do a custom payment module. Meanwhile, our Zen Cart person is able to work on one of the Joomla projects.

I guess it’s because software development is one third of our business, and our development projects tend to be small add-ons or changes to existing code. We’re not really in need of full-fledged project management software–we’re in need of a system to capture all the little tasks that need to get done, and make sure they get done and billed appropriately. Project Management, for us, is about identifying the specific tasks, dropping them on somebody’s schedule, and approving the time spent before billing.

Liquid Planner seems to be about managing uncertainty. I’d love to see how they’re attempting to do this–from our discussion last night, my impression is that the project manager estimates a range, and a confidence in that range, and the software then uses statistical calculations to quantify that into a figure to allocate on a calendar. But where does the range and confidence level come from? Isn’t this still relying on humans, on experience, on gut instinct to determine? Does this really add a useful tool to help project managers more accurately determine how long a given project will take, or just more numbers and complexity without actually solving the problem?

My assumption is that it takes 12 months to gain a year of experience… Project Managers only learn how to estimate accurately by doing, and mostly failing, at a bunch of projects. Most of our tasks have been done before; we take measured steps into new territory and allow large margins for error. For our developers, tasks get scheduled up to 75% – 6 hours of any 8 hour day. For large projects, the project manager leaves a few days unscheduled at the end of the project, to allow for the inevitable overruns.

For our sysadmins, who always have stuff to do that comes up each day, we schedule tasks at 50% – 4 hours of each 8 hour day. We add meetings, time off, holidays, etc to the scheduling system so that projects get scheduled around those known interruptions.

I had a client suggest that when presenting an estimate, give the high number first–we’re wired as humans to listen for the first number and take that as what it’s going to cost. Then we can give the factors that might make the project take less time, cost less. Auriga doesn’t try to estimate for you–it simply tracks your estimates, and tasks. We put the high number of our estimated range into the system for scheduling purposes, and try to beat it. That’s also what goes on the sales order, and unless there’s a clear change to the scope, that’s the most we bill. Project Auriga helps us know when we’re going over on a project, helps define how far along we are on a project, and makes our project management transparent to our customers. But it’s no substitute for an experienced project manager–it’s just a tool to keep details from falling through the cracks.

While there is plenty of value in having great tools, I’ve always detested technology that takes more effort to learn how to use than it takes to solve the original problem…

Upgrading to Gutsy

Monday, October 1st, 2007

Over the weekend, I upgraded my trusty Thinkpad to the new Beta release of Ubuntu, Gutsy Gibbon. Thought I would post my notes so far.

It’s a T43, and I got it around a year ago. The first thing I installed was the beta version of Edgy Eft, and then about a month before Feisty Fawn came out, I upgraded.

This time around, the upgrade wasn’t as clean. First off, my root partition was too full, so I had to do some shuffling to make enough room for the upgrade. Once I did that, it took several hours to download all the packages and start installing. At some point, a Latex package was broken, but the installation continued. When the installation progress bar was about 50% done, the installer crashed with a fatal python error, with the last messages indicating failing to configure Lyx, which depends upon Latex.

The installer couldn’t continue, and couldn’t roll back–I was stuck with a half-upgraded machine. Now you might think this is a serious issue, and for someone without much Linux experience, it might be. But my system never crashed, and I was able to finish the upgrade manually.

If you find yourself stuck halfway through an upgrade like this, maybe these notes can help you finish. First off, don’t reboot. As long as your system is running, you’ve got all your tools and Internet access. Here are some things I did to get through, all in a shell window:

  1. nm-applet & — restart the network applet, because at some point in the upgrade, the panels had crashed and my system had lost its IP address.
  2. dpkg-reconfigure -a — Configure every package that needs to be configured. This took a couple hours, and stopped frequently to ask a question about whether to keep my current configuration of a package, or replace with new. This command failed part way through the first time, but when I ran it again, it made it all the way through (repeating many of the same questions it had the first time).
  3. apt-get update, apt-get dist-upgrade — download system updates, and install. This fixed the broken Latex package.
  4. apt-get autoremove — Remove the old packages.
  5. apt-get -f install — force installation of packages that still need to be installed. This didn’t do anything in my case, since I had already run dpkg-reconfigure -a, which probably did everything this command would do.
  6. apt-get install app1 app2 — there were a couple of applications that were “held back” by the other install commands, not even completing with a dist-upgrade. So I named them directly and apt found a couple other missing packages they depended upon, and installed.
  7. Crossed my fingers, and rebooted.

When my system came back up, at first I couldn’t log in. But I was expecting that–one of the key features of Gutsy is a new version of X Windows, with a new configuration system. I had a fair amount of customizations of my xorg.conf file to support multiple monitors, and other things. I used Ctrl-Alt-F1 to switch to a VT console, logged in, and ran dpkg-reconfigure xserver-xorg, accepting all the defaults. The next time, the X server came up fine, and I could log in.

Now, with Gutsy finally installed, all its new features started to shine. The system immediately asked if I wanted to install the restricted ATI driver for my graphics card. After doing so, and rebooting, it asked if I wanted to enable Xgl, for enhanced desktop effects. I wasn’t expecting this to just work like this–I wasn’t counting on having desktop effects on this laptop at all, due to conflicts with things I need to do with it (multi-monitor, Google Earth, etc).

So I was quite surprised that once I did that, I could enable the Desktop Effects, and soon had all the glitzy stuff working. Well, wobbly windows anyway–the desktop cube didn’t seem to work.

A quick Google search led me to install the ccsm tool (compizconfig-settings-manager), and wow, has this come a long way. It didn’t seem to work at first, though–I had wobbly windows but nothing else. Finally, I tried running compiz --replace, and suddenly I had it all. Desktop cubes, Expo, windows that burn up when they close, windows that fold up into a paper airplane and fly off the screen–all the good stuff that can keep you away from productive work for hours!

Gutsy Beta initial impressions/issues

  • Even though the update-manager failed, everything seems to work fine right now. About the only issue related to installation that I currently see is a bunch of extra stuff in the “Other” menu, most of which is long gone. Have some menu cleanup to do. Also, Thunderbird disappeared, and I had to reinstall it.
  • Thunderbird profile moved from ~/.thunderbird to ~/.mozilla-thunderbird.
  • Evolution calendars lost all the color coding. Had to reapply a color to each of my calendars, and these didn’t get saved the first time. Now they save, but no colors in the gnome calendar widget.
  • Google Earth doesn’t work with Compiz or XGL. (to be expected, and I haven’t tried turning these off yet)
  • Suspend no longer works correctly (again to be expected with Xgl–in Feisty, I had trouble after using any video output).
  • Subpixel rendering is SWEET… everything looks fantastic.
  • Compiz task bar shows applications across all work spaces–cannot seem to limit it to just the current one. Also, the pager doesn’t turn the cube–it seems to track its own work spaces. It minimizes all the windows, then takes you to a clean desktop and you can’t spin back to the old one…
  • Tracker seems to be much better at searching than Beagle. After only a few hours, it had indexed everything, and returned more relevant results than Beagle did until recently…
  • In the gnome calendar, tasks can be hidden and appointments shown. This is a great improvement for me…
  • Every now and then, it seems like a key or a button press gets stuck. I went to close a few tabs in my development program, and after the first few closed, it then closed every tab I had open–not what I wanted.

These are impressions from less than a day of use. I’m sure I’ll find more to talk about soon–overall it seems quite nice, and I look forward to the new external monitor management, perhaps the key reason I upgraded. I’ll probably turn Compiz/Xgl off in a day or so, to get more of a sense of how well suspend and other OpenGL programs work. For now, it’s quite entertaining…

So, you want a web site…

Thursday, August 30th, 2007

The first thing to ask is, why? Web sites have lots of reasons for existence, but for business purposes, we tend to see some combination of four motivations:

  • To act as an online brochure
  • To attract new customers from search engines
  • To sell things online
  • To build a community of people who might someday buy something from you

A web site can do any or all of these, but generally the further down this list you get, the more the site is going to cost in terms of development cost and your time.

Web Site as Online Brochure

All businesses need a web site. It’s as crucial as having a Yellow Pages listing a couple decades ago–it’s the first place more and more people will look to find your address, phone number, and contact details. If you have nothing more than a single page with the basics about your business, it’s important to have at least that.

Your web site should not only tell your potential customers how to get ahold of you, but also why they should. What products or services do you sell? Who are your customers? Why do people buy from you instead of your competition?

A web site that answers those questions and nothing more is a sales tool. You are not likely to get new sales leads from such a basic web site, but it can help you close sales for prospects who already know who you are. When you put together such a site, you’ll need to consider your business brand, and there are a couple of radically different schools of thought here:

  1. Brand matters
  2. Brand doesn’t matter, but personal reputation does

The old school of thought is that a company develops a brand that is supposed to represent its values. The danger of this approach is letting the trappings of a brand–the logo, the slogans, the marketing material–matter more than delivering those values. It’s like worshiping idols instead of the gods they represent–sooner or later you’re gonna get smote.

The newer school of thought is eloquently expressed in an essay called "The Cluetrain Manifesto" and espoused by many new thinkers and thought leaders, such as Seth Godin, one of our favorite current marketing writers. The gist is that graphics, logos, all the rest of these trappings are completely irrelevant, that nothing but content–your quality of service, your core products–matters. Their approach is minimalist–use freely available tools to build your web site, don’t spend on graphic design, instead just make sure you take care of your customers.

Of course, we think delivering quality service is important, but having a coherent brand can help, especially if you’re trying to develop a consistent customer experience. Ignoring graphics, domain names, even business names, is fine for personality-driven businesses, but if you are trying to grow a business to be something more than the sum of its personalities, you need a visual identity that’s consistently expressed in your web site, your printed material, your contracts, in everything.

You can get started with a web site at your ISP, or a blog on a hosted service, for next to nothing more than a few hours of your time. We recommend that as soon as you can work it into your budget, hire a graphic designer to put together a business identity and a basic web site that incorporates it. Expect to spend around $3,000 to get something unique that expresses what you want your business to represent, though this price can vary substantially depending on the web designer you choose, how well you can express your ideas to your designer, and how intricate and detailed your design ends up. You can find cheaper solutions, such as cookie-cutter designs, pre-built templates, or off-shore design to get something going for a few hundred dollars–but it will definitely show. Depending on the values you are trying to represent with your brand, this may or may not be a good thing.

Prices for web design can vary by a huge amount. We recommend finding a designer with a portfolio of designs you like, interviewing them to see how well you can work with them to make your ideas a reality, and deciding what you’re willing to spend up front. Setting a budget for a web designer is perhaps the best way to go. Intricate designs take time to develop, which costs money–start with a logo and an overall concept, and refine until you’re happy or have reached your budget.

But before going crazy with design, read this post by Seth Godin for guidelines on what to put on your web site (and follow his suggestions for other places to post content).

Beyond an e-Brochure: Getting business from your web site

Just having a web site, however, does nothing to get customers beating down your doors. People need to find your web site somehow, amidst the millions of other web sites out there. Customers of small, local businesses usually don’t find the web site online–they find it from your business card, a sign on your car, word-of-mouth, or all the rest of the traditional ways people market their business.

If you want your web site to actually generate business for you, recognize that it’s going to take a substantial investment in your time, more than anything else. The critical ingredient in getting your site noticed by search engines is content. The more, the better–especially if it’s interesting, relevant, and unique. Having new, original content on your site helps it in two ways:

  1. It’s more raw material for Google and the other search engines to index. Sheer quantity helps.
  2. If you’re a decent writer, and write something useful, people will return to your site to see what you write next, and some will link to your pages.

Google is basically a popularity contest: it places the highest value on pages with the most links coming from other sites. Create a page that people want to read, and eventually it will boost your rank on Google. Create a bunch of pages, and soon you’ll be at the top of the search engines, and start to get business over the Internet.
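To make the "popularity contest" idea concrete, here is a minimal sketch of the simplified PageRank calculation as it is usually described publicly. It is not how Google actually ranks pages today (real ranking uses many more signals), and the page names, the pagerank function, and the 0.85 damping value are assumptions chosen for illustration. The point it shows: a page that other pages link to ends up with the highest score, and a link from a page that is itself linked to counts for more.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    links maps each page name to the list of pages it links to.
    All names here are hypothetical examples, not a real API.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally "popular".
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical example: three blogs link to your article.
site_links = {
    "your-article": [],
    "blog-a": ["your-article"],
    "blog-b": ["your-article"],
    "blog-c": ["your-article", "blog-a"],
}

ranks = pagerank(site_links)
best = max(ranks, key=ranks.get)

for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy example the article that everyone links to comes out on top, and blog-a scores above blog-b only because one other page links to it. That is the mechanism behind writing pages other people want to link to.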

You can jump-start online marketing by buying advertising. Pay-per-click ads work, and don’t cost all that much. But nothing beats the organic results you get by growing your site with regular additions of new content.

If you need a system to make it easy to add stories to your site on a regular basis, that’s where Freelock Computing can help. We work extensively with Joomla, MediaWiki, WordPress, and Serendipity, different systems that make it simple for you to manage your own content without needing a technical background. We regularly deploy, customize, host, and provide training for these systems. Let us know if we can help!

Brick and Order: Selling online

Many people suggest having some sort of "call to action" on every page of your site, whether you actually sell online or not. If your web site is for a business, you almost certainly want people to take some action, some small step that might eventually lead to a sale. Even if your product or service doesn’t lend itself to online sales, your web site can help develop a relationship with potential customers, help them gain trust in your expertise or familiarity with your services.

But if your products can be sold online, you almost certainly should set up some sort of online shopping cart. The more specialized your business, the more unique your products, the more potential customers you can find online.

The Internet can put distant customers on your virtual doorstep. Having a friendly, inviting catalog online can greatly expand your customer base, and there are some great tools out there to make developing such a site affordable.

At Freelock, we recommend and deploy ZenCart for retail operations looking to open an Internet store. For people who have some products to sell but still want to have an information-rich site, we’ve deployed a Joomla shopping cart system called VirtueMart.

Growing a Community

By far the most audacious goal you might have for a web site is to make a place where people hang out and talk to each other. Many, many businesses are learning that this is a great way to cultivate a devoted following, but it takes a lot of work.

Community web sites are like gardens. It takes some fertilizer, regular watering, and someone to pull weeds to make a vibrant community grow. If your business is large enough to devote a major part of somebody’s time to keep a community site in good shape, it can pay off with enthusiastic support of your business.

Opening a web site to direct interaction with your customers can be difficult for a lot of businesses. You need to be open to criticism as well as praise, willing to allow the world to see the warts on your business. But doing so nearly always helps people trust your business, and makes them more willing to do business with you.

What sort of work is involved? Quite a bit:

  • Writing stories and inviting comment on them
  • Responding to criticism and praise, both in a professional, business-like way
  • Deleting spam, or moderating posts (I recommend only moderating spam, not negative posts)
  • Generally making yourself available to your customers online

If you can’t put the time into managing such a site, I would suggest simplifying your goals and going with a marketing or an e-commerce site. Community sites are hard, and there’s not much worse for your business brand than a forum filled with spam, or negative posts that go unanswered.

But there’s not much better than having a community of vociferous fans of your business–they’ll help you with sales, marketing, and support.

We’re helping several companies put together or manage community-based web sites. Joomla has a number of common add-ons we deploy for this purpose–Community Builder and Fireboard provide a solid base of user profiles and forums. For more specialized web sites, Drupal is a more powerful content management system that makes a fine base for building entire custom applications.

Choosing a web site vendor

Lastly, a few words about hiring a web developer. There are lots of us around, with a wide range of prices. What one company can do for $2,000, another might be able to do for $10,000, and you might be able to get someone in India to do for $500. But the end result won’t be the same for any of these.

Spending more doesn’t always mean you’ll get a better result, either. Open Source software greatly lowers the entry cost to get powerful web sites, though these tools often come with a steeper learning curve before you can use them effectively.

The best way to find a good web developer is to ask people you know and trust for a recommendation. Make sure you talk to people who have worked with the web developer to get a sense for how the process went, and how satisfied they were with the results. There are two different skills used in putting together a web site: the graphical side, and the technical side. You need both, and you don’t tend to find both in the same person. Make sure your graphics person "gets" what you are trying to express, and make sure your technical person can explain things in terms you understand. These factors are far more important to the ultimate success of your web site, than the cost you pay up front.

So, here’s the final checklist of how to put together a great web site for your business:

  1. Decide upon your goals for the web site, what type of web site you want to create.
  2. Ask friends, colleagues, family members for recommendations to find a web designer and developer you can work with. You might also be able to find a developer by contacting the owner of a site you particularly like.
  3. Interview your potential web designer and developer. Ask to see samples of their work, and to talk with prior customers. For the designer, look for designs you like, and how well you connect with the designer–design can be extremely subjective, and you want someone who will deliver what you’re looking for. For the developer, make sure they’re competent, and that they can clearly explain what needs to be done and what your options are.
  4. Once you’ve decided upon the people, determine if they can do the job within your budget, and if so, you’re off and running!

My current desktop environment

Saturday, June 9th, 2007

Several others have listed the applications they use on a daily basis. I’ve been using Linux for my desktop environment for several years, and thought I would share what I use constantly.

  • Operating System: Ubuntu Feisty Fawn, 7.04
  • Desktop Environment: Gnome (though I tend to prefer KDE overall, Ubuntu does a better job with Gnome, especially integrating laptop features like power management, and Gnome has gotten good enough to use as the primary desktop. You’ll see lots of KDE applications listed here, though!)
  • Browser: Firefox (what else?)
  • Firefox plugins: Bookmarks Synchronizer, Firebug, Web Developer, Forecast Fox, Google Preview, HTML Validator, Sage, SearchStatus
  • Email client: Mozilla Thunderbird, occasionally Evolution
  • Thunderbird extensions: Asertiva Extension for Sugar, Display Mail User Agent, Enigmail, Lightning, QuickFile, QuickText
  • News Reader: Firefox Sage extension for general stuff, Thunderbird for security-related feeds
  • Calendar: Evolution
  • Address book: SugarCRM (okay, it’s not a desktop application)
  • Miscellaneous notes: Tomboy
  • IM: Kopete (though I mostly keep it off to avoid interruptions)
  • IRC: Konversation
  • Networking: NetworkManager with OpenVPN add-on, OpenVPN Admin for connecting/testing client VPN networks
  • Development: ActiveState Komodo Professional (almost the only proprietary software on the list!)
  • File management: Konqueror (hard to beat this for connecting to almost any type of server out there)
  • General editing: vi
  • Office software: OpenOffice.org (currently at 2.2)
  • Desktop search: Beagle
  • Graphic editing: Gimp
  • Database editing: Rekall
  • Multimedia playing: Kaffeine, Amarok, and DemocracyTV
  • Photo Management: F-Spot
  • Audio editing: Audacity, though I’m starting to play around with Jokosher
  • Disk Encryption: Truecrypt
  • Personal Finance: GnuCash
  • Repetitive Stress Injury Prevention: Workrave

I use a lot of server software, and spend much of my time in shell (terminal) windows… I usually have two terminal windows, each with 3 or 4 open tabs, all connected to different servers, many with multiple sessions running in screen.

There’s a bunch of other software I have installed, but don’t use regularly, including Scribus, Inkscape, Xara Extreme, and many others. Aside from some specialty industry applications, I have a hard time imagining anything I couldn’t do with my current desktop environment, and out of this entire list, there’s exactly one item I have to pay for: the Komodo IDE I use for development. Everything else here is free.

Random notes

Saturday, June 9th, 2007

Marc Andreessen, one of the authors of Mosaic, the first widely popular web browser, has taken up blogging, and in his first week he’s got some thought-provoking posts. I’ve adopted many of David Allen’s Getting Things Done ideas to help get my business off the ground, but Marc has some great tips here: blog.pmarca.com: The Pmarca Guide to Personal Productivity.

I particularly like structured procrastination and strategic incompetence…

What Is Drupal?

Tuesday, May 1st, 2007

At Freelock Computing, we’ve helped a few dozen companies get started with a content management system to manage their web sites. We’ve done a lot of work with the popular Joomla package, but have kept an eye on Drupal for customers with more sophisticated needs. Here’s a great introduction to the underlying architecture of Drupal, providing a context for developers and site administrators to start using it:
Dr. Dobb’s | What Is Drupal?

Condemned To Google Hell

Tuesday, May 1st, 2007

Search engines are crucial to marketing your business online, and Google is the most important of all. However, be careful what you do to try to get better search results–there’s a difference between getting organic search results and trying to game the system. If you get caught gaming the system, you may be
Condemned To Google Hell – Forbes.com.

Complete Web Marketing for $60

Sunday, April 15th, 2007

All businesses need a web site these days to be in business. It’s more important than a yellow pages ad. But it can be much more expensive and take much more knowledge to create than a yellow pages ad. When you’re struggling to get your business off the ground, how can you find the time and energy to put together a web site, too, especially if you’re not a computer person?

Seth Godin is one of my favorite business authors, and yesterday he wrote a simple recipe for small businesses needing to put together a web site. From the article:

The web has changed the game for a lot of organizations, but for the local business, it’s more of a threat and a quandary than an asset. My doctor went to a seminar yesterday ($100) where the ‘expert’ was busy selling her on buying a domain name, hiring a designer, using web development software, understanding site maps and navigation and keywords and metatags and servers…

He suggests making use of a few easy services that are free or very inexpensive to put together a site that tells your customer what they need to know about your business, and furthermore gets this content into places that Google loves to search. Read the full story at: Seth’s Blog: Memo to the very small.

When GPL software goes bad

Saturday, April 14th, 2007

Anyone still using SQL Ledger should be aware that new versions are no longer released under the GPL. When I wrote the book, and for several years afterward, SQL Ledger was the only game in town, and it’s been the only viable open source financial web application for quite some time. It’s been open source in name but never in spirit–the owner only helps people who have paid for the software, and discourages help from others that compete with his ~$300 manual. It’s quite the interesting drama.

Fortunately, last fall a group of developers forked the SQL Ledger codebase, starting LedgerSMB. LedgerSMB has the open source spirit of cooperation, open exchange of information, and a sense of excitement that’s been lacking in SQL Ledger. We’ve hopped on board LedgerSMB, and have switched our clients.

With the license change, we now see LedgerSMB as the only game in town–SQL Ledger just removed themselves from the competition. If you’re still using SQL Ledger, I encourage you to take a look at LedgerSMB. Read more about it here:
When GPL software goes bad | Realm of the Purple Dropbear.

Disclosure: We are contributing to LedgerSMB directly. We also paid for multiple years of SQL Ledger.

Truth in Numbers: the Wikipedia Story

Tuesday, April 3rd, 2007

Seen on Rocketboom: There’s a new documentary film in production about Wikipedia. It’s a non-profit project, and they’re looking for donations. Looks like a great project, and we’re delighted to see the cover of our book in the trailer.

Windows screwup forces Ubuntu shift

Monday, January 1st, 2007

Happy New Year! Here’s a quick story about why Linux is the future:

Windows screwup forces Ubuntu shift

Create Labels with Free Software

Thursday, December 28th, 2006

We’ve needed this sort of thing for a while:
Openoffice.org Label Templates for Ooo Writer free

Spam Revisited

Friday, December 8th, 2006

We’ve noticed a huge increase in the spam getting dropped into our spam quarantine–it’s doubled in the past two months. I have clients complaining about greatly increased spam as well. It turns out we’re not alone:
Spam Doubles, Finding New Ways to Deliver Itself – New York Times

Spam is back — in e-mail in-boxes and on everyone’s minds. In the last six months, the problem has gotten measurably worse. Worldwide spam volumes have doubled from last year, according to Ironport, a spam filtering firm, and unsolicited junk mail now accounts for more than 9 of every 10 e-mail messages sent over the Internet.

… What to do about it? Our next Freelock Irregular newsletter will offer some help. But meanwhile, for the technical folks, here’s a link to a How-To to implement checking for Yahoo domain keys:

Postfix with dkfilter (DomainKeys Implementation)