Alright, let’s look forward towards the time your business is going to be successful. If your website is like most, you’ll be generating less than $100 a year from the average user. That means your business will depend on scaling up the number of users, in order to pay for its development costs and realize the serious profits.
As you know, some companies (at the time of this writing, Twitter) haven’t even announced a business model that will make them profitable, yet they already have millions of dollars in funding, which they are using to amass millions of loyal users. So clearly, whether or not your business is actually profitable, there’s a point at which it may be very attractive to various investors. I will deal with the business model in a later article. The point I want to make is that if there’s one thing that can give you a good chance of raising money, it’s “traction”. Coming and saying “the site is growing at x% per week and currently has y members. We need money to scale up our servers and build features A, B and C” sounds much better than “I just know this site will become popular once we launch it.”
How to get more members
These days, your site will attract users via a combination of direct advertising and viral marketing. The more targeted the advertising, the better. One of the best deals is pay-per-click advertising, where your ad has to strike a balance: the text should be specific enough to weed out people who would not convert to bona fide members of your site, but not so specific that the click-through rate gets poor enough for Google and Facebook to bury the ad (because it makes them too little profit).
The rest of your user base will come from viral channels. To find viral channels, look at any social network, such as Facebook, Twitter, email, and tightly knit communities such as colleges. Try to choose viral channels that are not saturated with spam, so people are paying attention to the messages they receive from a friend (or someone they signed up to follow, such as a Twitter user). Finally, create a viral loop where a user of your site has a strong incentive to link their friends to your site (or something specific within your site) using one of these viral channels, and encourage them to check it out.
There are several types of viral growth:
- The most powerful is the grass-roots type, where people tell their friends about something. Because there is no hierarchy of popularity required, any receiver of a message can turn around and become a sender.
- Another type of viral growth is the “PR” type. This is where you pitch your message to bloggers, newspapers, media companies and other producers of content. They have user bases that subscribe to what they say and respect it. One guy with a mailing list can get you 1000 eyeballs. Because it is a hierarchy, there are politics involved, and getting this done well is an art. You should probably hire a PR person for this.
- Finally, there’s the search-engine based viral growth. I say viral because these days, search engines look at inbound links and other related factors to determine a site’s popularity and relevance. The more people link to your site on their webpages, the higher it ranks on search engines. Here, the network you’re using is the network of internet websites. And the viral channels are the links, embedded in blog articles or what-not.
Optimizing your virality
You can easily tell how quickly your site is growing by looking at the number of users. But there are other metrics that you should probably be measuring as well, in order to optimize your site.
One such metric is the viral coefficient. We’ve all heard the “myth” of exponential growth… that all you need is 1 guy to bring 5, and 5 guys to bring 25, and so forth. Well, in practice it seems more like a power function (e.g. u = t^2 rather than u = 5^t). I’m going to define viral growth in a different, but more useful way. Imagine you attract a user to your site using some means over which you have direct control, such as pay-per-click advertising. You can track this using a parameter embedded in the link (e.g. http://mysite.com/myapp/?adwords=8). This user may send out links (invites) to their friends using some viral channels (such as email, or whatever). These links will contain an id associated with this user (e.g. http://mysite.com/myapp/coolpage?inviter=23498). Whenever someone follows this link, you store this id in their session. If they wind up signing up for your site for the very first time, you give a point to the original user (who came in through your ad) for having brought a new user to the fray.
The viral growth coefficient, then, is the average number of new members that an “original” member (who was signed up after following an ad) brings in through viral efforts on their part. Notice that for this coefficient, it doesn’t matter whether those new members in turn brought other members. The viral growth originating from one member could die out after bringing in a total of 20 people. But that one person brought 20 more without your having to advertise to them.
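Here is a minimal sketch of that bookkeeping in Python. The `adwords` and `inviter` parameter names come from the example links above. Everything else is an assumption made for illustration: the function names, the `signups` mapping, and the choice to credit every viral signup to the ad-acquired member at the root of its invite chain.

```python
from urllib.parse import urlparse, parse_qs


def signup_source(url):
    """Classify a landing URL as ("invite", id), ("ad", id) or ("direct", None)."""
    params = parse_qs(urlparse(url).query)
    if "inviter" in params:
        return ("invite", params["inviter"][0])
    if "adwords" in params:
        return ("ad", params["adwords"][0])
    return ("direct", None)


def viral_coefficient(signups):
    """Average number of viral signups credited to each ad-acquired member.

    `signups` is a hypothetical mapping: member id -> id of the member who
    invited them, or None if they signed up after following an ad.
    """
    def original_member(member_id):
        # Walk up the invite chain until we reach an ad-acquired member.
        seen = set()
        while signups[member_id] is not None:
            if member_id in seen:
                raise ValueError("invite chain contains a cycle")
            seen.add(member_id)
            member_id = signups[member_id]
        return member_id

    originals = [m for m, inviter in signups.items() if inviter is None]
    if not originals:
        return 0.0
    credits = {m: 0 for m in originals}
    for member, inviter in signups.items():
        if inviter is not None:
            credits[original_member(member)] += 1
    return sum(credits.values()) / len(originals)
```

For example, with two ad-acquired members, one of whom starts a chain of three signups, `viral_coefficient({1: None, 2: None, 3: 1, 4: 3, 5: 1})` gives 1.5.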
You can go further and measure other viral metrics if you wish. In my opinion, the one above is the most vital to your business model (which I will discuss in another post). But other possibilities include the average number of users a user invites and the average conversion rate on each of those invitations. These can’t always be measured. For example, if a person shares a link on Facebook, sends out a tweet, posts it to a blog, etc., you don’t really know how many people they have “invited”.
Other metrics you’ll want to measure and optimize:
- Conversion rate from a visitor (someone who visits your site and is not a bot) to a member. You measure this by starting a session for each visitor, and if they wind up signing up for the first time, you count this as a conversion. If they simply log in, you use that for the following metric:
- Frequency of logging in. Every time a user visits the site and logs in, you can update their average frequency. Set a minimum coarseness level of, say, 1 day, meaning that if they log in 7 times on Monday, 3 times on Tuesday, and then once on Friday, you will store the frequency as “3 days out of 5”.
These are obviously very important metrics. You’ll need them to increase your site’s audience. The first one directly affects the viral coefficient and therefore your user acquisition cost (UAC). After all, people are only successfully “brought in” through viral means if they actually sign up as members. The second metric measures how often people return to your site. Each day is a new day the member can spread the word about your site, create content for others, invest their time into it, build a reputation on it, and in general build mindshare among your market. You should reward your members for doing this. You should also reach out to them by sending them notifications and updates.
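The two metrics above, visitor-to-member conversion and login frequency, could be computed along these lines. This is only a sketch: the function names and inputs are made up for illustration, and it assumes you already count visitor sessions, first-time signups, and login timestamps somewhere.

```python
from datetime import datetime


def conversion_rate(visitor_sessions, first_time_signups):
    """Fraction of (non-bot) visitor sessions that ended in a first-time signup."""
    if visitor_sessions == 0:
        return 0.0
    return first_time_signups / visitor_sessions


def login_frequency(login_times):
    """Return (active_days, span_days) at a coarseness of one day.

    Several logins on the same calendar day count once. The span runs from
    the first active day to the last one, inclusive.
    """
    active_days = {t.date() for t in login_times}
    if not active_days:
        return (0, 0)
    span_days = (max(active_days) - min(active_days)).days + 1
    return (len(active_days), span_days)
```

Feeding it 7 logins on a Monday, 3 on the Tuesday and 1 on the Friday of the same week returns `(3, 5)`, which matches the "3 days out of 5" example.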
I’m going to write a separate article on user retention and engagement strategies. I’m a big believer in giving people a good experience and not simply milking their time and attention through some kind of addiction. In other words, it’s not enough to ruthlessly bring people back to your site and make them stay there for a while, but you should strive to make it healthy and enjoyable for them. Internet addiction may or may not be a disorder.
What to watch out for
In my opinion, most businesses fail because they couldn’t survive until they got enough revenue to cover their expenses. This could be because the business was not viable (say, for legal reasons), or it turned out customers just weren’t going to pay as much as the founders anticipated, or because the costs of delivering service to a member were much higher than the founders anticipated, or because the costs of acquiring a member were too high. The saddest one is where each member would have brought in a great net profit, but there simply wasn’t enough initial investment to get over the hump, and the business could not raise more money. It ran out of funding and had to cut corners or fold. (Clearly, there may be other reasons for a business to fail, but most failures can be traced back to some form of this.)
Therefore, the first and most fundamental thing that can go wrong is that your business model will require incredible luck to actually make a profit. I’ll talk about the science of business models in a future article, but if your business model relies on a long shot, then you’re risking an insane amount from the get-go. Usually this means there is no fall-back plan. You either make it or you spectacularly fail.
On the other hand, there are business models that can survive a lot of sensitivity analysis, and remain profitable for most reasonable values you can throw at the variables. The great thing about websites and viral growth is that you can analyze a “small” sample of people to estimate what the larger population is going to do. You can do market studies to make an educated guess as to how much revenue a typical user is going to generate. Once you build your prototype and your metrics, you can start measuring the viral coefficient and trying to improve it, or user retention, or user conversion. Usually, these are all fixable problems, as long as the business model can remain profitable for a large range of values.
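To make the idea of sensitivity analysis concrete, here is a toy sketch that tries every combination of a few guessed values and reports what share of the scenarios stay profitable. The profit formula is a deliberately simple assumption of mine, not a real business model: it spreads the ad cost of one signup over that member plus the members they bring in virally, and all the names are hypothetical.

```python
from itertools import product


def profit_per_member(revenue, service_cost, ad_cost_per_signup, viral_coefficient):
    """Toy model: profit from one member after service and acquisition costs.

    Each ad-acquired member brings in `viral_coefficient` more members on
    average, so one ad signup's cost is spread over 1 + viral_coefficient members.
    """
    acquisition_cost = ad_cost_per_signup / (1 + viral_coefficient)
    return revenue - service_cost - acquisition_cost


def share_profitable(revenues, service_costs, ad_costs, viral_coefficients):
    """Fraction of all value combinations in which a member is profitable."""
    scenarios = list(product(revenues, service_costs, ad_costs, viral_coefficients))
    if not scenarios:
        return 0.0
    profitable = sum(1 for s in scenarios if profit_per_member(*s) > 0)
    return profitable / len(scenarios)
```

A model that stays profitable in most scenarios is the robust kind described above; one that is profitable in only a sliver of them is the long shot.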
The next thing that can go wrong is that you will run out of money before covering your initial costs (development, legal fees, hiring, etc.). You will have no money left over for scaling up your servers, for example. As long as you can maintain the existing servers, this shouldn’t be a huge problem either. What you do is temporarily disable invites until you can raise more money. You already have traction; in fact, you are fast becoming a victim of your own success. You’ll find many people willing to rescue that kind of damsel in distress, as long as you can prove your business model is empirically working and generating the profits you expected. Still, by cutting off viral growth, you run the risk of changing the culture of your site. If your site depends on spreading the word about stuff, you’re going to get lots of users who arrive but can’t sign up. On the other hand, it might be a blessing in disguise: in a classic case of reverse psychology, you tell them to enter their email and be notified when you accept more members into your “elite” society.
Perhaps a little ironically, one of the biggest problems you can have is growing too fast: too many users, or too much content accruing, for you to scale up fast enough. That is what we deal with in the next section, and it is the main focus of this article. If you ignore this problem in the beginning, you can really paint yourself into a corner, as unfortunately happened with pixelotto.com. The site started slowing down and experiencing a lot of scaling problems due to the traffic it was receiving. It was built in Ruby on Rails, and my guess is that, at the time, a single server at a hosting company may not have been enough to handle all the traffic.
Another thing that can go wrong once your site becomes popular is security flaws. If you have 1 million users and people hear all the time about your site, and suddenly it goes DOWN, or starts acting very strange, that is a big problem. That is over 1 million people and their friends saying your site sucks. Now hopefully, they are saying your site’s been great and it’s just acting weird. But the best you can hope for is to restore the site from a backup, to the state it was in 2 days ago (when presumably everything was fine), and then FIX the vulnerability as soon as possible. This is very, very expensive, and time sensitive. If it takes you a month to fix the vulnerability, you’ve alienated over a million people and the word’s going to spread. Worse still, if the hackers were able to steal some kind of important data, you’re never going to be able to live it down. That is why it is absolutely crucial to secure your site while it is still small. The good news is, you just have to develop it in the right way. I’m going to devote an entire article to this alone.
Another thing that can go wrong is that your datacenter is hit by a tornado or a truck. Or more probably, one of your machines suffers a disk failure. That is why it is nice to build on a cloud of interchangeable machines these days, and to back up your database (and images of your app server).
As you can see, most things that can go wrong kick in when you have a lot of users and it’s hard to turn things around. That is why you should do all your thrashing early, as Seth Godin said in his lecture at The 99%. There, he also said that you’re not paid to write code, you are paid to ship. When you run out of money, you ship. Ideally, of course, you should ship before you run out of money. You should launch your site, build metrics, analyze how you can improve them, and then have a period where you may or may not need to raise more money.
Scaling your back-end
Alright, now we get to the meat of the article. How do you build your website to handle all this scaling in traffic? Sometimes you’ll be growing smoothly and then get a huge spike in traffic from being “slashdotted,” or mentioned on the Yahoo front page.
The first thing to know about is shared-nothing architecture. If your application tier (the web server and your PHP applications, say) has a shared-nothing architecture, each box is self-sufficient and can handle a request as well as any other box. Then you can simply scale up with traffic, even to the point where you’re just bringing up new nodes on Amazon’s Elastic Compute Cloud with your application server’s image (already containing the operating system, web server, etc. ready to go). You can then put a load balancer in front to distribute incoming requests among all the web servers. Any code where you don’t store any state, such as the web server or your PHP script, can be put on a shared-nothing box.
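The routing idea can be sketched in a few lines of Python. This is a toy round-robin balancer over stateless handlers, not how any particular router or cloud service works, and the class and function names are invented for the example. The point is that because no handler keeps state between requests, any node can serve any request and new nodes can be added at any time.

```python
class RoundRobinBalancer:
    """Hands each incoming request to the next app server in turn."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0

    def add_node(self, node):
        # Scaling up is just bringing up another identical box.
        self._nodes.append(node)

    def route(self, request):
        if not self._nodes:
            raise RuntimeError("no app servers available")
        node = self._nodes[self._next % len(self._nodes)]
        self._next += 1
        return node(request)


def make_app_server(name):
    """Build a stateless handler: the response depends only on the request."""
    def handle(request):
        return f"{name} handled {request}"
    return handle
```

With two nodes, requests alternate between them; after `add_node`, the third box starts taking its share without any of the others needing to know.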
The persistence (data model) tier is not going to be shared-nothing. This is where your web application boxes will send queries of their own, in order to retrieve data. If you’ve had to deal with relational databases, you’re probably familiar with the idea of database normalization. Those principles are designed to promote data consistency and free you from having to update duplicate copies of data all over the place. However, they do not promote scaling. To scale a data layer, you will usually apply some variation of horizontal partitioning. This is where data is stored in different partitions (often referred to as shards) which may or may not be on distinct machines. You can still use relational databases, but when the app server makes a query, it is routed to the machine that has all the data that is being queried. Splitting things up for easier querying is an instance of data warehousing.
To partition a table horizontally, first you must pick a field by which to partition. Good candidates include the user id (if it is present), because human users should only be allowed to execute a limited number of actions in a minute (throttling), which keeps the load that any one key puts on its partition bounded. Similarly, you can use the API key if you have an external API. If none of these are present, then consider the primary key as the key to partition on.
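Once you have picked the field, you need a rule that maps each value to a partition. One common and simple rule is to hash the key and take it modulo the number of shards. The sketch below assumes that rule (the article does not prescribe one), and `shard_for` is a made-up name. It uses `hashlib` rather than Python's built-in `hash()`, because the built-in is randomized per process for strings and every app server must agree on the mapping.

```python
import hashlib


def shard_for(partition_key, num_shards):
    """Map a partition key (user id, API key, primary key) to a shard number."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(str(partition_key).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    return int.from_bytes(digest[:8], "big") % num_shards
```

One caveat with this simple rule: changing `num_shards` moves most keys to a different shard, so it suits a fixed number of logical partitions better than a constantly changing one.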
Joins become harder to do across partitions. The whole point of partitioning is to spread the load among different machines, so instead of doing a join, you will usually wind up getting a list of IDs and then send out 5-10 separate queries to grab one row for each ID (which may reside on different machines). It’s like you would query the database without using joins at all — this allows scaling smoothly through horizontal partitioning. The one exception is when two tables are sharded by the exact same key, and it is being used as the key for the join. For example, when you get a user and all their photos. If the photos are sharded by the id of the user who uploaded them, their row can be placed on the same machine as the row of the user. So you can still do joins but only between restricted subsets of your tables.
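Both cases can be sketched with plain dictionaries standing in for database machines. Everything here is illustrative: the shard layout, the modulo rule, and the function names are assumptions. `get_users` shows the "list of IDs, then one lookup per ID" pattern, and `get_user_with_photos` shows the exception where both tables are sharded by the same key, so the join stays on one machine.

```python
NUM_SHARDS = 3

# Each dict stands in for one database machine holding a slice of both tables.
shards = [{"users": {}, "photos": {}} for _ in range(NUM_SHARDS)]


def shard_of(user_id):
    return shards[user_id % NUM_SHARDS]


def add_user(user_id, name):
    shard_of(user_id)["users"][user_id] = {"id": user_id, "name": name}


def add_photo(photo_id, owner_id, title):
    # Photos are sharded by their owner's id, so they sit next to the user row.
    photos = shard_of(owner_id)["photos"].setdefault(owner_id, [])
    photos.append({"id": photo_id, "title": title})


def get_users(user_ids):
    """Instead of a cross-shard join: one single-row lookup per id."""
    found = []
    for uid in user_ids:
        row = shard_of(uid)["users"].get(uid)
        if row is not None:
            found.append(row)
    return found


def get_user_with_photos(user_id):
    """A 'join' that only ever touches one shard."""
    shard = shard_of(user_id)
    user = shard["users"].get(user_id)
    if user is None:
        return None
    return {**user, "photos": list(shard["photos"].get(user_id, []))}
```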
Key-value databases are the easiest to partition horizontally, because the lookup always happens based on a string key, and there are no joins. The key usually looks something like “tableName_keyValue”. This is how memcache works, and it can scale to huge numbers of machines. Still, you can improve performance even further if you store related items under the same key. For example, when you want to show the profile for a certain user, just store all the information under “user_profile_831551” or something to that effect.
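A small sketch of that key scheme, with a plain dictionary standing in for the key-value server (a real client would talk to memcache or similar over the network). The helper names and the choice of JSON for the stored value are assumptions for the example.

```python
import json

store = {}  # stand-in for a key-value server


def make_key(table_name, key_value):
    """Build a key of the form tableName_keyValue."""
    return f"{table_name}_{key_value}"


def save_profile(user_id, profile):
    # Everything needed to render the profile goes under one key,
    # so showing the page is a single lookup.
    store[make_key("user_profile", user_id)] = json.dumps(profile)


def load_profile(user_id):
    raw = store.get(make_key("user_profile", user_id))
    return None if raw is None else json.loads(raw)
```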
If you don’t want to deal with all this, you can check out new services like the Google App Engine. Because of the way it’s designed, you can write the apps and let it worry about the scaling. Then again, you won’t get a lot of the relational database goodness you’ll have if you use MySQL, etc. Here is an article that makes a good case for it. Currently it only supports Python and Java, though.
Make client-centric apps
There was a time when web browsers had no client side scripting, and everything was sent from the server every time. The page had to be re-rendered every time. Sure, you could cache the results, but you still took a hit in the sheer number of requests (back then it wasn’t usually in the millions), and all that memory having to maintain the cache, as well as the view state, on the server.
Fast-forward to now, and I know that this kind of thing is not only harder (to code, to document, to educate new developers in) but also doesn’t scale nearly as well. Consider an example:
- Let’s say you have a 5-star rater that uses AJAX. The AJAX response could either a) return the new value for the rater to display, or b) re-render the rater on the server side and return the markup to replace the existing one. Clearly, the second one consumes more resources for both computation and bandwidth.
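To see the difference in the rater example, here is a sketch of the two possible server responses. The JSON shape, the markup, and the function names are all invented for illustration. Option (a) sends a few bytes and leaves the drawing to the browser, while option (b) does the rendering work on the server and sends the result.

```python
import json


def rating_value_response(item_id, new_average):
    """Option (a): send only the new value; the browser updates the stars."""
    return json.dumps({"item": item_id, "average": new_average})


def rating_markup_response(item_id, new_average, max_stars=5):
    """Option (b): re-render the whole rater on the server and send markup."""
    filled = round(new_average)
    stars = []
    for position in range(max_stars):
        state = "on" if position < filled else "off"
        stars.append(f'<span class="star {state}"></span>')
    inner = "".join(stars)
    return f'<div class="rater" data-item="{item_id}">{inner}</div>'
```

Comparing `len()` of the two return values for the same rating shows the bandwidth gap, and the loop in option (b) is computation the server would repeat on every single vote.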
Finally, cache as much as you can. Caching rarely hurts. Do it at the browser level, at the webserver level (use a CDN to deliver static content), the app level (when someone is not logged in, show them relatively static data which a server-side cron job updates every few minutes), and the persistence level (memcache in front of your database, to store the results of queries as well as entire profiles).
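At the persistence level, the usual pattern is to check the cache first and only hit the database on a miss. Below is a minimal in-process sketch of that idea with a time-to-live, so entries refresh every so often. The `TTLCache` class is hypothetical and is not the memcache API; with a real cache server the get and set would be network calls, but the logic has the same shape.

```python
import time


class TTLCache:
    """Tiny cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive work
        value = compute()  # cache miss or expired: e.g. run the database query
        self._entries[key] = (now + self._ttl, value)
        return value
```

A call like `cache.get_or_compute("user_profile_831551", load_from_database)` (where `load_from_database` is whatever slow function you already have) runs the query at most once per TTL window, however many requests arrive.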
If you do these things, you should be able to scale. Keep in mind that slow and steady viral growth can actually be a good thing. It gives you time to raise money, to re-architect things correctly, before optimizing your virality metrics and buying your next round of ads.
Oh and I have to mention: check out http://highscalability.com .