A week of work on Zortit

October 1st, 2008

It has been a hectic week working on our webapp. I planned to put up regular updates but have been so busy coding that I haven’t been able to keep up. Here’s a big update on what we’ve had going on for the last week or so. We’re coming into the home stretch, with the contest deadline looming on Friday.

Marketing
Cherie has been hard at work. She’s been working on creating the product description, writing a rough draft of a marketing plan, doing competitive research, putting together a board of advisors, and coordinating the efforts of our graphics team.

Branding
Our graphic artist, Chris Barela, got us a great logo last week, and we’ve gotten some great website mockups. HTML/CSS guru Jean Leitner has been hard at work converting the mockups into code. Here’s the new logo:

Infrastructure

Our application runs in the AWS cloud. We’re using EC2 instances. We’re using S3 to cache some web API results. There are other AWS services we use which I won’t dive into here.

Right now our app runs on a single EC2 instance, but I’ve partitioned the components on that instance so that they can later be spread across machines. On the front end we’re using HAProxy, with Apache/mod_passenger (aka mod_rails) running the Rails instances and MySQL as the database. We’re using memcached to speed things up, as well as S3 as a cache. I’m doing deployments via Capistrano, which works pretty well.

‘I have a dream’ of having instances come up and configure themselves. Sometime in the future (probably when things are burning down) I’ll set up iClassify and Puppet, and perhaps even configure user auth via LDAP. Then systems will spin up, register with iClassify, and I’ll be able to provision them mostly automatically. I’d hoped to use PoolParty, but it’s being rewritten right now; perhaps when that is finished.

I also got Nagios set up to monitor from an existing machine, and in the process found that one of my nameservers was broken. Funny what you find out when you start monitoring things!

Collaboration
We’ve been using Trac to collaborate, and we’ve managed to proxy tickets from email into Trac’s bug tracking, so testers can click a mailto link when something breaks on the site. Very neat! Trac also has a Subversion browser that I’ve used on occasion, and I’ve been posting links to system management pages and whatnot there.

So that’s where we’re at. More news as it happens!


Some random links

September 24th, 2008

I’ve been cleaning out my email today and finding several gems among the cruft.

A friend sent me a link to a pretty rocking collaborative whiteboarding application: Dabbleboard

I found this great slide-deck from a presentation at the Velocity conference done by Adam Jacob. It’s a great introduction to the latest tools that you can leverage for ‘deploying to the cloud’. Check it out:
Building an Automated Infrastructure (Powerpoint Slides)

Right now I’m busy hacking on an AWS Startup Challenge entry. Our entry is using the theme ‘redefining search’. The product is called Zortit, and it will leverage the AWS cloud services and be built around Ruby on Rails. Stay tuned for more updates!


Sizing your infrastructure before launch

March 12th, 2008

So you’ve got a webapp. How do you decide how many servers to deploy? Even if you are still in development and don’t have a single outside user, you can make an informed decision about how big to build and what your future network infrastructure will look like.

By gathering some data and doing a little load testing you can launch a new application confident in the fact that you know how many users your application will support.

I will outline the process you can use to size your infrastructure. I’ll be discussing it in the context of a web-based application but these methods can be applied to other types of applications. At my last client, Avvenu, half the network communication was not HTTP based and I used these methods to scale it regardless.

At the end of this process you’ll have a spreadsheet where you’ll be able to plug in arbitrary numbers and get out the scaling information you need. If bizdev asks “what happens if we close this deal and double our user base?” or if engineering finds a way to increase server performance by 100% you’ll be able to quickly answer what the impact on your network would be.

Understanding your usage

The first step in building our scaling model is to understand how your users use the system. There is a series of questions that you’ll need to answer to get an idea of what that usage looks like.

First you’ll need to know how many active users to expect in the future. This data often comes from your marketing department.

The data is usually presented something like - in one month we’ll have X active users, in two months we’ll have Y, in three months we’ll have Z. You’ll need all these for your scaling spreadsheet.

Next you’ll need to find out how the typical user either uses the site (for existing sites) or is expected to use the site (for new sites). You’ll want this data in a given time period, such as per week. Some examples of what you’ll want to know are:

  • How many times a week does he visit?
  • When he visits what does he do?
      • Downloads a large file?
      • Looks at pages that require a large amount of processing?
          • How many times and which ones?
      • Looks at images that are dynamically created?
      • Looks at static pages?
      • Uploads data?

How much data do you have to maintain per user? This includes files, database rows, or, in some applications, constantly open connections. This will also have to be accounted for in your scaling model.

For an existing application you’ll be able to mine your access logs. Always keep and archive these logs when at all possible; they come in handy for mining usage pattern data. Throw together some scripts to extract the answers from your access logs.
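As a rough illustration of that kind of script, here is a minimal Ruby sketch that counts requests and bytes per client. It assumes your logs are in Apache’s Common Log Format and that the client IP is a good-enough stand-in for a user; the sample lines and the names LOG_LINE and summarize_log are made up for the example.

```ruby
# Summarize an Apache-style access log: requests and bytes per client.
# Assumes Common Log Format; adjust LOG_LINE for your own log layout.
LOG_LINE = /\A(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)/

def summarize_log(lines)
  stats = Hash.new { |hash, client| hash[client] = { requests: 0, bytes: 0 } }
  lines.each do |line|
    match = LOG_LINE.match(line)
    next unless match                      # skip lines that do not parse
    client = match[1]
    bytes  = match[6] == '-' ? 0 : match[6].to_i
    stats[client][:requests] += 1
    stats[client][:bytes]    += bytes
  end
  stats
end

# Made-up sample lines; in practice you would pass File.foreach(path).
sample = [
  '10.0.0.1 - - [10/Mar/2008:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
  '10.0.0.1 - - [10/Mar/2008:13:56:01 -0700] "GET /report HTTP/1.0" 200 10000',
  '10.0.0.2 - - [10/Mar/2008:14:02:12 -0700] "GET /logo.png HTTP/1.0" 304 -'
]

summarize_log(sample).each do |client, s|
  puts "#{client}: #{s[:requests]} requests, #{s[:bytes]} bytes"
end
```

If your application logs a user id, group on that instead of the IP address; the per-client totals are what feed the usage questions above.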

For new sites put together a detailed but not overly technical questionnaire for your product manager. The answers from the questionnaire can be used to model typical visitor usage patterns.

One final note on usage patterns. You’ll find that you’ll have some users that look at a few pages every couple of months, and then some users who integrate your site into their daily routine. You’ll need to find the /average/ across all your active users.
 
Distilling the estimated traffic

You now know how many users you expect and how active each user is, so you can determine how many requests your service will have to handle. Multiply the number of users by the number of operations each one performs in your time period, then divide by the number of seconds in that period (e.g. a week) to find the average number of operations you’ll have to perform per second.
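The arithmetic is simple enough to sketch in a few lines of Ruby. The user count and per-user activity below are invented numbers, not figures from a real site.

```ruby
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800

# Average operations per second for a user base over one week.
def average_ops_per_second(active_users, ops_per_user_per_week)
  (active_users * ops_per_user_per_week).to_f / SECONDS_PER_WEEK
end

# Hypothetical: 50,000 active users, each making 120 requests per week.
puts average_ops_per_second(50_000, 120).round(2)
```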

One important note when sizing your bandwidth: file sizes are measured in BYTES and bandwidth in BITS. Multiply all file sizes by 8 to find the number of bits they will be when crossing Ethernet.
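A small Ruby sketch of the bytes-to-bits conversion, using invented download figures. It ignores protocol overhead, so treat the result as a floor rather than an exact number.

```ruby
BITS_PER_BYTE    = 8
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

# Average bandwidth in megabits per second for a volume of bytes
# transferred over a period of time.
def average_mbps(bytes_transferred, period_seconds)
  bytes_transferred * BITS_PER_BYTE / period_seconds.to_f / 1_000_000
end

# Hypothetical: 50,000 users each download three 2,000,000-byte files a week.
weekly_bytes = 50_000 * 3 * 2_000_000
puts average_mbps(weekly_bytes, SECONDS_PER_WEEK).round(2)
```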

Load Testing

Once you’ve determined what your average user will do, you’ll need to automate that behavior for load testing. Typically you’ll set up a load testing cluster, or just test against your pre-production or development environment during off hours. You’ll need to ensure that the load-generating machines running your load testing scripts do not become your bottleneck. In this phase it is very useful to be running server monitoring and graphing software like Nagios and Cacti. Make sure your server graphing captures CPU, disk, memory, network, and process utilization so that you can identify which machines bottleneck and which parts of those machines have to be scaled. Sometimes you’ll think an application should bottleneck on CPU and find that it bottlenecks on memory. This helps you make informed purchasing decisions when you buy new machines for your production environment.

You can set up scripts and use tools such as ab (ApacheBench) to throw traffic at your servers and determine the number of operations per second they can handle. You’ll have to try to isolate each class of machine (e.g. DB, HTTP) and determine its maximum load. With unlimited resources you could load test a single webserver to determine its limits, then throw 100 load-testers against 100 web-servers to find your DB’s load limits. For most of us this is impractical, so you may have to be clever: profile the database traffic generated by the webserver load testing, then create a script to drive simulated load at your DB server directly.
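If you end up writing your own load script, the core of it is a pool of workers and a stopwatch. The Ruby sketch below is one way to do that, not a replacement for a real load-testing tool; measure_throughput is a made-up helper, and the sleep call stands in for whatever request you would actually send to a test server.

```ruby
require 'benchmark'

# Run `total` operations across `concurrency` threads and return the
# measured operations per second. The block is the operation under test.
def measure_throughput(total, concurrency)
  queue = Queue.new
  total.times { |i| queue << i }

  elapsed = Benchmark.realtime do
    workers = Array.new(concurrency) do
      Thread.new do
        loop do
          begin
            job = queue.pop(true)   # non-blocking pop raises ThreadError when empty
          rescue ThreadError
            break
          end
          yield job
        end
      end
    end
    workers.each(&:join)
  end

  total / elapsed
end

# Stand-in workload: replace the sleep with a real request, for example
# Net::HTTP.get_response(URI(...)) pointed at your test environment.
rate = measure_throughput(200, 10) { |_job| sleep 0.005 }
puts "#{rate.round} ops/sec"
```

Raise the concurrency until the rate stops climbing; that plateau is the per-server figure the spreadsheet needs, provided the load generator itself is not the bottleneck.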

It is important in this step to discover any horizontal scaling issues. If you find that adding new servers does NOT increase your capacity as you expect, then you’ll need to work with your software engineering team to fix the scaling problems, or warn management that there is a likely hard limit of X users that the system will support.

Peak vs. Average usage

You will need to determine the peak usage hour(s) of your service and how these relate to your average usage.

I have found that your peak usage will typically be double your average usage. If you have no other data then go ahead and size for that.

If you are sizing an existing application you already know your ratio of peak vs. average by looking at your log data.
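Given request counts per hour pulled from your logs, the ratio is one division. A minimal Ruby sketch, with invented hourly counts:

```ruby
# Ratio of the busiest hour to the average hour.
def peak_to_average(hourly_requests)
  average = hourly_requests.sum.to_f / hourly_requests.size
  hourly_requests.max / average
end

# Hypothetical counts for four hours of traffic.
puts peak_to_average([100, 100, 200, 400])
```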
   
Building the Spreadsheet

TOTAL          (users * per-user-usage / time-period-in-seconds) * peak/avg
REQUIRED  =  ----------------------------------------------------------------
SERVERS               benchmarked-requests-per-second-per-server

Do this for each class of server: web servers, app servers, DB servers, etc. Then make a column for each month of growth. Make your formula round up the number of servers. You can’t deploy 2.3333333 servers, can you?
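The same calculation works outside a spreadsheet. Here is a Ruby sketch with one row per month of growth; the user projections, the 120 requests per user per week, the 2x peak ratio, and the 15 requests per second per server are all assumed numbers for illustration.

```ruby
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

# Servers needed to carry peak load, rounded up to whole machines.
def servers_required(users, ops_per_user, period_seconds, peak_ratio, ops_per_sec_per_server)
  average_ops = users * ops_per_user.to_f / period_seconds
  (average_ops * peak_ratio / ops_per_sec_per_server).ceil
end

# Hypothetical marketing projections, one entry per month.
projected_users = { 'month 1' => 50_000, 'month 2' => 120_000, 'month 3' => 300_000 }

projected_users.each do |month, users|
  count = servers_required(users, 120, SECONDS_PER_WEEK, 2.0, 15)
  puts "#{month}: #{count} web servers"
end
```

Run it once per server class with that class’s benchmarked rate, just as you would add a block of rows per class in the spreadsheet.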

Often I’ll break this down into the number of active users each server can support. I can then divide the number of projected users by that figure to get the number of required servers.

USERS       benchmarked-requests-per-second-per-server
PER       = ---------------------------------------
SERVER     (per-user-usage / time-period-in-seconds ) * peak/avg

TOTAL                USERS
REQUIRED = ---------------------
SERVERS      USERS-PER-SERVER
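Expressed in Ruby, the users-per-server form looks like the sketch below. The benchmark rate, per-user usage, and peak ratio are assumed values chosen only to make the arithmetic concrete.

```ruby
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

# Active users one server can carry at peak load.
def users_per_server(ops_per_sec_per_server, ops_per_user, period_seconds, peak_ratio)
  ops_per_sec_per_server * period_seconds / (ops_per_user * peak_ratio).to_f
end

# Whole servers needed for a projected user count.
def total_servers(users, per_server)
  (users / per_server.to_f).ceil
end

# Hypothetical: 15 requests/sec per server, 120 requests per user per week,
# peak traffic at twice the average.
capacity = users_per_server(15, 120, SECONDS_PER_WEEK, 2.0)
puts "Users per server: #{capacity.floor}"
puts "Servers for 300,000 users: #{total_servers(300_000, capacity)}"
```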

Your total server numbers can drive other parts of the spreadsheet as well. Every so many servers you’ll need a new Ethernet switch, another rack at the colo, and perhaps increased headcount (try to reduce this by automating as much as possible!).

Make sure your spreadsheet also accounts for the amount of static data you have to maintain per user. For example, how many file servers will you need for the files your users upload? How many users will the disks on your DB server support?

Your model should also determine the maximum network traffic at peak times so that you’ll understand when you’ll need to order more bandwidth from your connectivity provider or will need bigger routers and load balancers.

In Conclusion

Using this process has allowed me to help size networks for many internet startups and kept my network operations groups from being caught with their pants down. Determining your scalability and using this data to anticipate required infrastructure growth will help you and the rest of your organization have confidence going forward with a growing userbase.


Maintaining documentation — It’s in the wiki!

February 19th, 2008

One of the important parts of maintaining a big network environment with a small staff is keeping up-to-date documentation on configurations, customizations, and instructions for frequently executed tasks. Commonly, when I walk into a new company, the documentation is terrible. Why? Because either no thought is given to maintaining documentation or the documentation system/procedure in place is too time-consuming to use.

If a documentation system is hard to use it won’t be used at all. It should take less effort to update a piece of documentation than to send an email. Locating a document should be just as easy and should support freeform text searching. That’s why the best documentation setup I’ve worked with is a wiki. It’s easy to create, locate, and change documentation, which encourages people to actually document things! You will have current, verbose documentation when you need it.

If you do use a wiki to maintain your documentation, produce an offline copy of it periodically and burn it to CD. Put this CD, along with one copy of every vendor-supplied CD, into a CD wallet and keep it at the datacenter. It will prove invaluable when you have outages.

Here’s the wiki engine I’ve used, and liked, in the past. It runs on top of your vanilla LAMP stack.

tikiwiki.org — TikiWiki CMS/Groupware


Small Business: How not to behave on the internet

February 17th, 2008

This is an example of how not to behave if you are a small business on the internet. A friend of mine simply posted a question on a forum. The entirety of his question was: “I’m curious if anybody knows anything about Lucas Environmental Stormwater Services, Inc.?” This simple question has led to the owner threatening legal action in email and via rambling voice-mails. It is never a good idea to threaten someone unless they are blatantly in the wrong and doing something clearly illegal. Otherwise you just rile people up and turn what should have been nothing into a huge negative-publicity exercise for your company. For more information see: mhalligan: Greatest voicemail transcript EVER


RoR: Testing with simple_captcha & HTTP-Auth

February 9th, 2008

While developing a small Ruby on Rails application for The Pilot’s Camping Directory website I ran into a few problems that weren’t solved by a simple Google search, so I’m documenting them here for posterity and future googling. I had problems with testing when using some security features to keep out riff-raff. It was not obvious how to handle simple_captcha or simple_http_auth in tests, so I scratched around the net and pieced together a solution for each of the problems. These work with Rails 1.2. With Rails 2.0 YMMV, but then 2.0 breaks every Rails tutorial ever written, so I don’t feel bad if this blows up in 2.0.

Using Mocks for testing with simple_captcha

Tests will fail when trying to save something protected by a captcha, obviously, as stopping automated lever-pulling is exactly what a captcha is designed to do. In my application I use the captcha at the model level, so I simply override the save_with_captcha method with a plain save.

Here’s what my mocks/test/recipient.rb looks like:

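# Can't fake captcha for testing - so we mock it out.
require_dependency 'models/recipient'

class Recipient < ActiveRecord::Base
  # save_with_captcha is called on a model instance, so the override is an
  # instance method that skips the captcha check and just saves.
  def save_with_captcha
    save
  end
end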


Functional Testing HTTP-Auth

To test HTTP Authorization / Authentication you must set up your request environment to pass the http authorization into the application. This is known to work with the simple_http_auth plugin, the plugin that I used for my application. Specify this in the setup section of your functional test.

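def setup
  @controller = SupersecretController.new
  @request    = ActionController::TestRequest.new
  # Rails functional tests also expect @response to be set up.
  @response   = ActionController::TestResponse.new
  # ADMIN_USER and ADMIN_PASSWORD are assumed to be defined elsewhere in the app.
  # Send HTTP Basic credentials with every request made in this test case.
  @request.env['HTTP_AUTHORIZATION'] = "Basic " + Base64.encode64(ADMIN_USER + ':' + ADMIN_PASSWORD)
end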
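One detail worth knowing about the header value: Base64.encode64 appends a newline (and inserts one after every 60 encoded characters), which decodes fine but leaves whitespace in the header. On Ruby 1.9 and later, Base64.strict_encode64 avoids that. The sketch below is standalone Ruby, not part of the Rails test; basic_auth_header is a made-up helper name and the credentials are placeholders.

```ruby
require 'base64'

# Build the value of an HTTP Basic Authorization header without any
# embedded newlines.
def basic_auth_header(user, password)
  'Basic ' + Base64.strict_encode64("#{user}:#{password}")
end

header = basic_auth_header('admin', 'secret')   # placeholder credentials
puts header

# Decode it again to confirm the round trip.
puts Base64.decode64(header.sub(/\ABasic /, ''))
```

On older Rubies without strict_encode64, calling delete("\n") on the result of encode64 gives the same value.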


Integration Testing HTTP-Auth

Integration testing simulates making requests directly to the webserver. To work with http authorization here you must pass in the appropriate authentication headers when making each get/post request. An example is below:

# Build the Basic auth header once, then pass it along with each request.
@htauth = "Basic " + Base64.encode64(ADMIN_USER + ':' + ADMIN_PASSWORD)
get("/supersecret/index", nil, {:authorization => @htauth})


Sharpening the saw, html and graphics.

January 16th, 2008

In my off-season (winter) I am usually traveling internationally, mostly to places that are sunnier and warmer than the San Francisco Bay Area. It’s often the perfect time for me to sharpen my various skills, being unconstrained by the usual grand infrastructure projects I do in the summer.

It’s often during these times that I brush up on my HTML/coding/graphics skills. Wifi bandwidth here in Puerto Vallarta has gotten much more ubiquitous and reliable, so I’ve got connectivity almost as good as back in SF. I’ve been diving back into apps like Gimp, Aptana, & Inkscape.

I also enjoy catching up on the avant-garde of web artistry and seeing what people are creating with HTML and CSS. I appreciate simple designs, so I really enjoyed the sites on display at the link below:

25 Beautiful, Minimalistic Website Designs - Part 2 | Vandelay Website Design


Home Fabrication

May 21st, 2007

This weekend I went to the Maker Faire here in Silicon Valley, put on by Make Magazine. Make is geared towards folks who enjoy making things with their own hands, inventing and creating instead of simply consuming what’s available at the store.

The most amazing technology at the faire was the home fabrication / 3D printer technology. There were several units there, but one caught my eye. The Fab @ Home unit, designed for hobbyists, is a unit that can be assembled for just two thousand dollars in parts and a weekend of work.

This will prove to be one of the most disruptive technologies to come along. Home fabrication will make the copyright issues with MP3s look like a cakewalk once you can print your own furniture, clothing, and other housewares just by downloading designs from your friends.


Yahoo Pipes, a very neat app!

February 8th, 2007

Well, web 2.0, for me, has a lot to do with making data available in an agnostic manner, whether that be via RSS or via a web services API. Data tied to a presentation layer, such as a traditional website, is data that has no future outside that website. The rise of mash-ups is enabled by data being decoupled from its presentation. Being combined with other data makes that data more valuable.

Until now you’ve needed to be a reasonably adept programmer to put together different data sources to create mash-ups. Not anymore. Yahoo has just launched an application that allows anyone with the most rudimentary conceptual knowledge of programming to create new mash-ups.

Yahoo Pipes is the new application, and it allows anyone to easily string together web data sources and funnel them through some rudimentary filters to create new mash-ups. Yahoo has been a bit absent with the whole innovation thing since Google became the industry’s darling, but I think this marks their comeback in a big way.

There is a good series of articles on the O’Reilly Radar about why it’s important and how it works. TechCrunch has a good mention about Yahoo! Launching Pipes, and there’s a nice bit about it from Yahoo MySQL guru Jeremy Zawodny.

The excitement about this product is very high in the tech community, resulting in even someone as big as Yahoo having their new service overwhelmed. So be patient when trying it out until they’ve got some new servers spun up!






Self-healing networks

February 1st, 2007

Last year I wrote an article on building a self-healing network with off-the-shelf software components. If you are responsible for managing a large UNIX/Linux network it’s a must-read…

An excerpt from the article:

Computer immunology is a hot topic in system administration. Wouldn’t it be great to have our servers solve their own problems? System administrators would be free to work proactively, rather than reactively, to improve the quality of the network.

This is a noble goal, but few solutions have made it out of the lab and into the real world. Most real-world environments automate service monitoring, then notify a human to repair any detected fault. Other sites invest a large amount of time creating and maintaining a custom patchwork of scripts for detecting and repairing frequently recurring faults. This article demonstrates how to build a self-healing network infrastructure using mature open source software components that are widely used by system administrators. These components are NAGIOS and Cfengine.

ONLamp.com — Building a Self-Healing Network
