
Coffee Powered

code and content

Safe action caching with Memcached

I’ve started using action caching more aggressively, to handle a large volume of not-signed-in search traffic. It composes a significant chunk of my site’s total traffic, but there’s no good reason to be recomputing full pages for all those long-tail hits. So, the obvious thing is to just implement a quick action cache.

# Controller
caches_action :show, :unless => :user?, :expires_in => 24.hours
# Sweeper
expire_action :controller => "nodes", :action => "show", :id => record.to_param

This all works dandy, but I generate pretty URLs, which means sometimes there are characters in the URL that Memcached doesn’t like. A few minutes after deploying my patch, I started getting IMs from my logger bot telling me things were unhappy.

blippr. com: [#1265856785] ArgumentError: illegal character in key "views/m.blippr.com/apps/346562-PicFo g.mobile"
blippr. com: [#1265857710] ArgumentError: illegal character in key "views/www.blippr.com/apps/336714-µTorrent  "
blippr. com: [#1265857897] ArgumentError: illegal character in key "views/www.blippr.com/apps/337076-ustre am"
blippr. com: [#1265857924] ArgumentError: illegal character in key "views/www.blippr.com/apps/336714-µTorrent  "

That’s memcached complaining about the hash keys we’re giving to it. This just won’t do. We could just regex out “bad” characters, but that means potential collisions, and potentially leaves edge cases. Why not just hash it instead?

A quick monkey patch later:

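require 'digest/sha1'

# Replace the path-based cache key with its SHA1 hex digest so the key
# only ever contains characters memcached accepts.
class ActionController::Caching::Actions::ActionCachePath
	def path
		@cached_path ||= Digest::SHA1.hexdigest(@path)
	end
end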

And we’re all dandy. Now, rather than caching by path, the path is hashed, and the hash is used as the path key. Since hashes will always be hexadecimal characters, we know that it’ll never make memcached unhappy.

Path is blippr.com/movies/6696-The-Silence-of-the-Lambs...
Cached fragment hit: views/9111cdefca4a52cb0e3a5ebac4f618127a30efd0 (1.1ms)
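If you want to poke at the idea outside of Rails, here’s a minimal standalone sketch. The safe_cache_key helper is a made-up name for illustration, not part of Rails or of the patch above; it just shows that one of the keys memcached rejected turns into 40 hex characters once it has been through SHA1.

```ruby
require 'digest/sha1'

# Hypothetical helper (not a Rails API): build a memcached-safe key from any path.
def safe_cache_key(path)
  "views/#{Digest::SHA1.hexdigest(path)}"
end

# One of the paths from the error log above, trailing spaces and all.
ugly_path = "www.blippr.com/apps/336714-µTorrent  "
key = safe_cache_key(ugly_path)

puts key
# The same path always produces the same key, so expiry by path still works.
puts safe_cache_key(ugly_path) == key
```

The trade-off is that the key is no longer human-readable, so log the original path next to it (as in the output above) if you need to debug cache hits.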

There is an argument for not using this technique if you’re using file-based caching, since it means your cached bits won’t be segregated into directories, but memcached doesn’t support expiry by regex anyhow, so there’s no good reason to not use it in this case.

Enjoy!

Eight tips for getting the most out of your Rails app

Rails does an awful lot to optimize page generation, but there are a number of hacks, tweaks, and usage patterns you should be using to get the most out of your app.

Configuration tweaks

There’s a lot of the Rails stack that’s written in Ruby, which is great – it’s portable, it’s flexible, it works out of the box. Unfortunately, for some things, this also means it’s slow. Other times, pieces of the framework aren’t implemented as optimally as they could be. What if you could improve your app’s performance just by installing a few gems and tweaking a few config parameters? Good news – it’s not hard.

1. Replace REXML with LibXML

By default, Rails uses a Ruby-native XML library called REXML. REXML is slow. REXML is very slow. REXML is personally responsible for me almost entirely giving up on Ruby due to a bad encounter with it in my first Ruby project. Fortunately, Rails provides a very easy way to avoid using REXML.

gem install libxml-ruby

Then, in your app’s config/environment.rb

ActiveSupport::XmlMini.backend = 'LibXML'

That’s it. Now, Rails will use the very lean, very fast libxml to parse XML documents, rather than the very fat, very slow REXML. If you’re doing feed parsing, Hash.from_xml, or anything of that nature, this will save you massive amounts of pain.

2. slim_attributes

If you’re using MySQL, there’s no reason why you shouldn’t be using slim_attributes.

Slim Attributes boosts speed in Mysql/Rails ActiveRecord Models by avoiding instantiating Hashes for each result row, and lazily instantiating attributes as needed.

Pretty self-explanatory. Rather than creating massive hashes of everything the DB gives you, slim_attributes causes ActiveRecord to only create ruby objects when you actually ask for them in code. This can reduce both your app’s memory usage and time spent on database queries. It’s not a massive increase, but given that it takes exactly one line of code to add to your project, there’s no reason not to use it.

3. slim_scrooge

From the developers of slim_attributes comes another drop-in database optimization.

SlimScrooge is an optimization layer to ensure your application only fetches the database content needed to minimize wire traffic, excessive SQL queries and reduce conversion overheads to native Ruby types.

SlimScrooge implements inline query optimisation, automatically restricting the columns fetched based on what was used during previous passes through the same part of your code.

Make your ORM work for you! By only fetching the content you need from your database, you reduce over-the-wire overhead, CPU overtime due to type conversion, and other such niceties. Again, just install the gem, require it in your project, and you’re off to the races.

4. fast_xs

By default, string escaping in Rails happens in native Ruby code. This is slow. We don’t like slow. This is particularly prominent in areas like Builder::XmlMarkup, which you are using if you have any templates like foo.xml.builder lying around.

In a modestly-sized document, this can result in a pretty substantial slowdown in view construction. Rather than re-hashing what others have already done, I’ll point you at Speed up your feed generation in Rails for the long and short of it all. This can result in builder views running upwards of 10x as fast, and all you have to do is install the fast_xs gem – Rails will automatically detect and patch it in if it’s on the system.

5. Erubis

Erubis is an alternative ERB implementation. It is written in pure Ruby, not C, but it still parses ERB templates very, very quickly. In fact, the Erubis benchmarks put it at upwards of 3x faster than the stock ERB implementation. Installation is easy – just check the using Erubis with Ruby on Rails guide and you’re off to the races.

Do note that if you’re entirely using Haml or similar, Erubis won’t do much for you. Erubis is much faster than Haml, but Haml is much prettier than ERB. What you end up using is up to you!

Reduce action runtimes

6. Use delayed_job

Sometimes in the course of any web service, you run into some action that takes a little while to process. This is generally a pain and causes a whole host of problems, including frustrated users clicking refresh and spawning a dozen instances of your app all running the same long-running request and tying up valuable request slots. Long-running jobs, or jobs that absolutely must succeed, are something of a royal pain in the patootie to handle gracefully. Fortunately, there’s DelayedJob, which is much like a double shot of codeine to ease that terrible pain.

The concept is pretty simple – rather than immediately executing a long-running task, you create a “job” for it, then use an asynchronous daemon to run your job for you.

For example, let’s say that your app wants to post to Twitter when you accomplish some task. This is all well and good if Twitter is up (ha!) and fast and isn’t experiencing any technical issues and you aren’t having any issues on your end and you don’t have any exceptions. In short, it’s fine when things don’t break, but we all know that things break and go wrong and generally end up sideways when you’re ever dealing with any kind of I/O, particularly of the remote web service kind. Rather than trying to post to Twitter in-process, we’ll create a job whose task is to post to Twitter.

Install the delayed_job gem, create the delayed_jobs table as indicated in its documentation, and write your first worker.

module Jobs
	class PostToTwitter < Struct.new(:username, :password, :tweet)
		def perform
			auth = Twitter::HTTPAuth.new(username, password)
			client = Twitter::Base.new(auth)
			client.update(tweet)
		end
	end
end

Now, in your controller code, or after_create in your model, or wherever, rather than posting to Twitter directly, just enqueue a job:


Delayed::Job.enqueue Jobs::PostToTwitter.new(params[:username], params[:password], params[:tweet])

Finally, you’ll want to fire up a DelayedJob daemon. This is pretty easy to do under Rails.

Create a file called script/worker.rb and stick the following in it:

#!/usr/bin/env ruby
require 'rubygems'
require 'daemons'
dir = File.expand_path(File.join(File.dirname(__FILE__), '..'))

daemon_options = {
  :multiple => false,
  :dir_mode => :normal,
  :dir => File.join(dir, 'tmp', 'pids'),
  :backtrace => true
}

Daemons.run_proc('job_runner', daemon_options) do
  if ARGV.include?('--')
    ARGV.slice! 0..ARGV.index('--')
  else
    ARGV.clear
  end

  Dir.chdir dir
  RAILS_ENV = ARGV.first || ENV['RAILS_ENV'] || 'development'
  require File.join('config', 'environment')

  Delayed::Worker.new.start
end

Now, all you have to do is call script/worker start and you’re up and running. Jobs will automatically be processed as they’re added to the queue. If they fail, the reason why will be logged and the job will be scheduled to be retried in the future. You can correct any mistakes and re-run the job and watch it happily succeed. If the mistake is on the remote end, then the worker will keep retrying it until it succeeds, and your user doesn’t have to sit there and wait while your app continually receives the API equivalent of the failwhale. Everyone is happy (eventually!)
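If the enqueue/perform/retry cycle is still a bit abstract, here’s a toy, in-memory sketch of the pattern. To be clear, this is not how DelayedJob works under the hood (it persists jobs in a database table and schedules retries for later); FlakyJob and run_jobs are names I made up purely for illustration.

```ruby
# A job is just an object that responds to #perform, like Jobs::PostToTwitter above.
# This one simulates a remote service that fails twice before succeeding.
FlakyJob = Struct.new(:message, :log) do
  def perform
    @attempts = (@attempts || 0) + 1
    raise "remote service unavailable" if @attempts < 3
    log << message
  end
end

# Hypothetical worker loop: run each queued job, and put it back on the
# queue when it raises, up to max_attempts tries.
def run_jobs(queue, max_attempts = 5)
  until queue.empty?
    job, attempts = queue.shift
    begin
      job.perform
    rescue StandardError
      queue.push([job, attempts + 1]) if attempts + 1 < max_attempts
    end
  end
end

delivered = []
queue = [[FlakyJob.new("hello world", delivered), 0]]
run_jobs(queue)

puts delivered.inspect
```

The point is that the code doing the enqueueing never sees the failures. It hands off the job and moves on, and the worker keeps at it until the job goes through or runs out of attempts.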

Once you start using DelayedJob, you’ll find that there are lots of things you can do with it to smooth out your app’s user-response speed. Processing user avatars or large file uploads, recomputing expensive queries (like a social graph update), talking to remote web services, or even sending emails can all be moved away from the realtime and into the background with total ease.

7. Use memcached

This should probably be tip #1. Good caching can make or break a project, and memcached is a fantastic method for managing your caching.

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

By default, Rails writes page and fragment cache bits to disk. This is slow, is difficult to clean up after, adds a lot of wear-and-tear to your disk, and is generally undesirable. It’s used because it’s easy. Memcached is a far better solution – it is very much a “giant hash table in the sky”. Dump a value into memory, read it back out of memory later. It is extremely fast, and comes with some super dandy features like time-based expiration that disk caching just won’t get you.

Implementation in Rails is easy. First, install both the memcached daemon and the memcache client. Second, in your environment file, add something like so:

require_library_or_gem 'memcache'
config.cache_store = :mem_cache_store, ["localhost:11211"]

By default, memcached runs on port 11211. Point Rails at it with the above directives and restart your app and that’s it. You’re running on memcached. No more ugly disk sweeping, and you get some really nice features. You can add multiple servers to the :mem_cache_store, too, which is several flavors of awesome. The memcached client will do automatic cluster management and balancing, so you can share the same cache between any number of servers, rather than each server having to have its own copy of that cache. Sweet!

<% cache("my_custom_fragment_name:#{@record_id}", :raw => true, :expires_in => 1.hour) do %>
	<%=render :partial => "some_expensive_partial", :object => @record %>
<% end %>

This is your standard fragment cache, but the :raw and :expires_in parameters are new.

:raw tells the Ruby memcached client to not marshal the content before sticking it in memcached. Since you’re just storing a document fragment (that is, a string), marshaling a ruby string and then unmarshaling it when you want to read it back is both unnecessary and slow.

:expires_in sets a maximum lifetime for this fragment. If we generate a fragment, memcached will timestamp it, and then if we try to read it back, say, 90 minutes later, memcached will recognize “oh hey, this fragment is expired! Sorry, I don’t have anything for you!”. Our view will regenerate and re-cache that fragment, and for the next 60 minutes, rather than trying to regenerate that fragment any time that view is called, it’ll just pull the cached copy from memcached.
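Here’s a toy sketch of that expiry behaviour, in case it helps to see it spelled out. This is an in-process stand-in, not memcached or the Rails cache store; ExpiringCache and its injectable clock are assumptions I’ve made so the example can fast-forward time.

```ruby
# Toy in-process cache that illustrates time-based expiry.
class ExpiringCache
  Entry = Struct.new(:value, :expires_at)

  # clock is any callable returning the current Time (injectable for the demo).
  def initialize(clock = -> { Time.now })
    @store = {}
    @clock = clock
  end

  # Return the cached value for key. When the entry is missing or older than
  # expires_in seconds, run the block to regenerate it and cache the result.
  def fetch(key, expires_in)
    now = @clock.call
    entry = @store[key]
    if entry.nil? || now >= entry.expires_at
      entry = Entry.new(yield, now + expires_in)
      @store[key] = entry
    end
    entry.value
  end
end

now = Time.at(0)
cache = ExpiringCache.new(-> { now })
renders = 0
render = -> { renders += 1; "<div>expensive fragment</div>" }

cache.fetch("my_custom_fragment_name:1", 3600, &render) # miss: renders and caches
cache.fetch("my_custom_fragment_name:1", 3600, &render) # hit: served from the cache
now += 90 * 60                                          # 90 minutes pass
cache.fetch("my_custom_fragment_name:1", 3600, &render) # expired: renders again

puts renders
```

With memcached the bookkeeping happens on the server, so every app server sharing the cache sees the same expiry.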

If you need to ever flush your cache, it’s as easy as just restarting memcached. That’s it, really. In one fell swoop, you get faster caching (yay!), easier cache management (yay!), and a cache that can scale across multiple servers (double yay!)

8. Use etags

etags are a nifty little feature that are woefully under-used by most web developers. You can think of them as a fingerprint for a given page. Consider the following process:

  1. I request a page for the first time. The app generates the page and sends me both a copy of the page and a small hash fingerprint.
  2. I request the page a second time, and send the fingerprint of my cached copy back to the server.
  3. The server compares the fingerprint I sent with the fingerprint of its latest copy of the page. If they match, it just sends back a 304 Not Modified header and stops rendering.

Sounds handy, right? Sure, and it’s really easy to implement in Rails. Let’s assume you have a BlogController which has a show method for showing a given blog post. You could use the following to implement etags:

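def show
	@post = BlogPost.find params[:id]
	# will_paginate takes its page and page size as an options hash
	@comments = @post.comments.paginate :page => params[:page], :per_page => 25
	# stale? sends the 304 itself when the etag matches, so there is nothing left to do
	return unless stale? :etag => [@post, @comments]
end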

Wait, that’s it? Yes, actually! What’s happening there is Rails builds a fingerprint of the object(s) you pass to the :etag parameter of the stale? method. If the objects don’t change, then the etag doesn’t change. This means that you would get different etags for the same blog post on a different page of comments (good!), or a different etag if a comment is added (good!) or a different etag if the post is edited (good!), but as long as those objects haven’t changed since the user’s last request of that action, the etag will be the same, and the action will stop running right there and tell the browser to just display its cached copy.
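To make the fingerprint handshake concrete, here’s a rough, framework-free sketch. It is not how Rails implements stale? internally; conditional_get and the cache-key strings are made up for illustration, and the only real library call is Digest::MD5.

```ruby
require 'digest/md5'

# Hypothetical stand-in for a controller action. cache_keys describes the
# objects on the page; if_none_match is the fingerprint the browser sent.
# Returns [status, etag, body].
def conditional_get(cache_keys, if_none_match)
  etag = %("#{Digest::MD5.hexdigest(cache_keys.join('/'))}")
  if if_none_match == etag
    [304, etag, nil]   # fresh: skip rendering entirely
  else
    [200, etag, yield] # stale: render the page and send the new fingerprint
  end
end

keys = ["posts/1-20091205120000", "comments/page-1"]

# First visit: no fingerprint yet, so the page is rendered.
status, etag, body = conditional_get(keys, nil) { "<html>post</html>" }

# Second visit: the fingerprint matches, so the render block never runs.
status2, _, body2 = conditional_get(keys, etag) { raise "should not render" }

# The post is edited, its cache key changes, and the old fingerprint is stale.
edited = ["posts/1-20091206090000", "comments/page-1"]
status3, etag3, _ = conditional_get(edited, etag) { "<html>edited post</html>" }

puts [status, status2, status3].inspect
```

The win is in the second call: the expensive part (rendering) is skipped completely, and all the server spends is one digest over a few short strings.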

On heavily-trafficked pages that aren’t easily customized on a global scale (for example, if you have custom per-user bits on the page that mean that you can’t serve the same page to everyone), this is a really decent way to prevent excessive and wasteful application work. If you don’t use the stale? method, Rails always assumes that the page is stale, and thus needs to be regenerated.

On something of a tangent, you can also use stale? :last_modified => @post.updated_at to determine if a page is fresh or stale. However, this does have the drawback of not being compatible with pagination, or sorted views, or anything of that nature. By using etags, you can ensure that each unique data set gets its own etag, and thus, doesn’t have cache collisions.

When you have to store user passwords…

Today we got word of yet-another-database-hack-with-plaintext-passwords. This time, it’s RockYou, purveyor of many of those Facebook and Myspace apps you use. Oops.

Every time this comes up, everyone says “How naive! They should have been using salted hashed passwords!” This is true in any case where you don’t need to use the password again on an external service. With OAuth solutions becoming more and more popular, the need to collect and store user passwords is fortunately becoming more and more rare. However, it does need to happen sometimes, so how do you take the proper precautions when you do need to?

The first step is to encrypt your data before it is persisted into your database. This is pretty easy to do, and there are a number of methods for it. Here’s an example of something I used in a Rails app to provide encryption services.

require 'openssl'
require 'base64'
module Encryption
	class OpenSSL_Key
		PUBLIC_KEY_FILE = "#{RAILS_ROOT}/config/public.pem"
		PRIVATE_KEY_FILE = "#{RAILS_ROOT}/config/private.pem"

		def self.encrypt(data)
			@@public_key ||= OpenSSL::PKey::RSA.new(File.read(PUBLIC_KEY_FILE))
			encrypted_data = @@public_key.public_encrypt(data)
			Base64.encode64(encrypted_data)
		end

		def self.decrypt(data)
			@@private_key ||= OpenSSL::PKey::RSA.new(File.read(PRIVATE_KEY_FILE))
			decoded_data = Base64.decode64(data)
			@@private_key.private_decrypt(decoded_data)
		end
	end

	class OpenSSL_RSA
		IV64 = "xxxxxxxxxxxxxxxxxxxxxxxxxx==\n"
		KEY64 = "xxxxxxxxxxxxxxxxxxxxxxxxxx=\n"
		CIPHER = 'aes-256-cbc'

		def self.encrypt(data)
			@@iv ||= Base64.decode64(IV64)
			@@key ||= Base64.decode64(KEY64)

			cipher = OpenSSL::Cipher::Cipher.new(CIPHER)
			cipher.encrypt
			cipher.key = @@key
			cipher.iv = @@iv
			encrypted_data = cipher.update(data)
			encrypted_data << cipher.final
			Base64.encode64(encrypted_data)
		end

		def self.decrypt(data)
			@@iv ||= Base64.decode64(IV64)
			@@key ||= Base64.decode64(KEY64)

			cipher = OpenSSL::Cipher::Cipher.new(CIPHER)
			cipher.decrypt
			cipher.key = @@key
			cipher.iv = @@iv
			decrypted_data = cipher.update(Base64.decode64(data))
			decrypted_data << cipher.final
		end
	end
end

This provides two classes, Encryption::OpenSSL_Key and Encryption::OpenSSL_RSA, which may be used to encrypt arbitrary strings. The OpenSSL_Key class uses an RSA public/private keypair (in our example, read out of the Rails config directory). The OpenSSL_RSA class, despite its name, uses symmetric AES-256-CBC with an initialization vector and secret key. The latter is probably easier, since it means you don’t have to worry about keypairs, and since all the encrypt/decrypt is done locally, there isn’t any need for public-key encryption.

Once you have that file in your project, using it is pretty simple.

# Our database is going to have a field called encrypted_password. We'll use attr_writer for the password itself.

class MySecretModel < ActiveRecord::Base
	before_save :encrypt_fields
	attr_writer :password

	# Return the plaintext password, decrypting the stored value on first use.
	def password
		@password ||= decrypt_field(:password)
	end

private

	# Only re-encrypt when a plaintext password is in memory; otherwise a save
	# on a freshly-loaded record would try to encrypt nil.
	def encrypt_fields
		write_attribute :encrypted_password, Encryption::OpenSSL_RSA.encrypt(@password) if @password
	end

	def decrypt_field(field)
		Encryption::OpenSSL_RSA.decrypt read_attribute("encrypted_#{field}")
	end
end

The net result is that we can still get access to the raw password if we need to, but the content in the database will be encrypted against a secret key in our application. This is still vulnerable if the attacker gains access to the file containing your IV/key, or if he gains access to your public/private keypair, but it is extremely resilient in the case that an attacker manages to simply dump your users table via SQL injection. You still need to practice good key management, and you absolutely should not use a technique this simplistic for storing financial data – there are a whole set of guidelines and procedures for that kind of information. However, for adding an extra layer of defense to save yourself and your customers from excess embarrassment in the case of a database breach, this is a quick, easy, and effective technique for hardening your data.
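One refinement worth considering: with a single fixed IV, two users with the same password end up with identical ciphertext in the database. Here’s a hedged sketch of a variant that generates a random IV per value and stores it in front of the ciphertext. RandomIvEncryption is a name I’ve made up, and the key is generated on the spot only for the demo; in a real app you’d load it from wherever you keep your secrets.

```ruby
require 'openssl'
require 'base64'

# Sketch of AES-256-CBC with a fresh random IV for every encrypted value.
module RandomIvEncryption
  CIPHER = 'aes-256-cbc'
  IV_LENGTH = 16 # AES block size in bytes

  # Returns Base64 of (IV + ciphertext) so decrypt can recover the IV.
  def self.encrypt(data, key)
    cipher = OpenSSL::Cipher.new(CIPHER)
    cipher.encrypt
    cipher.key = key
    iv = cipher.random_iv
    Base64.strict_encode64(iv + cipher.update(data) + cipher.final)
  end

  def self.decrypt(data, key)
    raw = Base64.strict_decode64(data)
    cipher = OpenSSL::Cipher.new(CIPHER)
    cipher.decrypt
    cipher.key = key
    cipher.iv = raw[0, IV_LENGTH]
    cipher.update(raw[IV_LENGTH..-1]) + cipher.final
  end
end

key = OpenSSL::Random.random_bytes(32) # demo key only: 32 bytes for AES-256
token_a = RandomIvEncryption.encrypt("hunter2", key)
token_b = RandomIvEncryption.encrypt("hunter2", key)

puts token_a == token_b # same password, different stored values
puts RandomIvEncryption.decrypt(token_a, key)
```

The IV isn’t a secret, so storing it alongside the ciphertext is fine. The key is the part you have to protect.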

This is a rather raw implementation, and there are ways you could package it up so that you could transparently apply it to any number of models or fields, but the basic technique is solid. You could even use something like sql_crypt to easily protect sensitive fields. The technology is there, and “We needed to be able to re-use the password!” isn’t an excuse anymore. Stop storing plaintext passwords – just like backups, it’s just extra work until you need it, and then you’ll be glad you put that extra work in.

Multibyte string slicing for fun and profit

Ran into a small issue in one of my user models. I was using a helper to display a user’s first name, last initial. It looked something like this:

def display_name(user)
  "#{user.first_name} #{user.last_name.slice(0,1)}"
end

Seems innocent enough, sure. Except…it doesn’t work in multibyte character sets. The first Cyrillic speaker to sign up blew that all up. When parsing an XML fragment with a name like this included, I was getting the following error:

ActionView::TemplateError: premature end of regular expression: /^\s*Елена\ �/

nokogiri (1.4.0) lib/nokogiri/xml/fragment_handler.rb:53:in `characters'

The issue, as it turned out, is that String#slice is a bytewise operation (in Ruby 1.8, at least), not a character-wise operation like I’d so naively assumed. The issue is pretty easy to observe:

>> "Журинова".slice(0, 1)
=> "\320"

Fortunately, Rails has multibyte support baked in already, so it’s an easy mistake to correct:

def display_name(user)
  "#{user.first_name} #{user.last_name.chars.first}"
end

And now…

>> "Журинова".chars.first
=> "Ж"

It’s very easy to make mistakes like this, and many times you may not even realize that they’re made unless you try to do something funny, like using it as a part of a regex. The safe operation is to never use String#slice or string subscripting on user data, but to instead treat all strings as multibyte strings. Very subtle, but the effects can be pretty nasty if you don’t.
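If you’re on a newer Ruby (1.9 or later, which is my assumption for this snippet), strings know their encoding and the byte/character split is easy to see side by side. String#byteslice reproduces the old bytewise behaviour, so you can watch a character get cut in half:

```ruby
name = "Журинова"

# Each Cyrillic letter takes two bytes in UTF-8.
puts name.length   # characters
puts name.bytesize # bytes

# Slicing by byte grabs half of the first character, which is not valid UTF-8.
first_byte = name.byteslice(0, 1)
puts first_byte.valid_encoding?

# Slicing by character gives you the whole letter.
first_char = name.chars.first
puts first_char
```

Half a character is exactly the kind of thing that blows up later in a regex or an XML parser, far away from the line that created it.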

System date considered important

I’ve been slamming my head against the wall for the past two hours. I had an OAuth connection to a remote service working just dandy in development, but as soon as I tried to use that exact same code with the exact same config and exact same gems in production…I was getting “401 unauthorized” errors back from the remote service when attempting to get a request token.

After an extremely tedious series of debugger checks to make sure my OAuth signature was right, I decided to just edit the oauth gem on my production box and add a little debugging statement to dump the HTTP request to stdout. What I found was…surprising.

>> OAuthConsumers::Netflix.new.consumer.get_request_token
opening connection to api.netflix.com...
opened
<- "POST /oauth/request_token HTTP/1.1\r\nAccept: */*\r\nConnection: close\r\nUser-Agent: OAuth gem v0.3.6\r\nAuthorization: OAuth oauth_nonce=\"E73sq4XMkG547EbuCB9GUfG4AtsjD2QFySwLPKj0tI\", oauth_callback=\"oob\", oauth_signature_method=\"HMAC-SHA1\", oauth_timestamp=\"1260049119\", oauth_consumer_key=\"xxxxxxxxxxxxx\", oauth_signature=\"QD5b5Oy8LFLvXWl%2B3R%2BQI0xlIcg%3D\", oauth_version=\"1.0\"\r\nContent-Length: 0\r\nHost: api.netflix.com\r\n\r\n"
-> "HTTP/1.1 401 Unauthorized\r\n"
-> "X-Lighty-Magnet-Uri-Path: /oauth/request_token\r\n"
-> "X-Mashery-Responder: proxyworker-i-e23bae8a.mashery.com\r\n"
-> "X-Mashery-Error-Code: ERR_401_TIMESTAMP_IS_INVALID\r\n"
-> "Content-Type: text/plain\r\n"
-> "Accept-Ranges: bytes\r\n"
-> "Content-Length: 20\r\n"
-> "Date: Sat, 05 Dec 2009 21:27:08 GMT\r\n"
-> "Server: Mashery Proxy\r\n"
-> "\r\n"
reading 20 bytes...
-> "Timestamp Is Invalid"
read 20 bytes
Conn close
OAuth::Unauthorized: 401 Unauthorized
        from /opt/ruby-enterprise-1.8.7-2009.10/lib/ruby/gems/1.8/gems/oauth-0.3.6/lib/oauth/consumer.rb:200:in `token_request'
        from /opt/ruby-enterprise-1.8.7-2009.10/lib/ruby/gems/1.8/gems/oauth-0.3.6/lib/oauth/consumer.rb:128:in `get_request_token'
        from (irb):1

Whoa there, there’s some info that the OAuth gem wasn’t giving back to me. “Timestamp is invalid.” Well then, a quick check of system time, and…oh, hey, it turns out that my system has drifted to about 10 minutes fast. Easily corrected, at least.

# ntpdate -b 0.centos.pool.ntp.org && service ntpd start

With that all done…

>> OAuthConsumers::Netflix.new.consumer.get_request_token
opening connection to api.netflix.com...
opened
<- "POST /oauth/request_token HTTP/1.1\r\nAccept: */*\r\nConnection: close\r\nUser-Agent: OAuth gem v0.3.6\r\nAuthorization: OAuth oauth_nonce=\"YIh5R3CBtAicneNREF5ZUcX80kao1zqRLLA5u8bQWA\", oauth_callback=\"oob\", oauth_signature_method=\"HMAC-SHA1\", oauth_timestamp=\"1260048573\", oauth_consumer_key=\"ksfa9rxmb8dzkxg4npwr74zv\", oauth_signature=\"%2B%2Fyd5sRsJ7qmmZWNRqSlCvByYxw%3D\", oauth_version=\"1.0\"\r\nContent-Length: 0\r\nHost: api.netflix.com\r\n\r\n"
-> "HTTP/1.1 200 OK\r\n"
-> "X-Lighty-Magnet-Uri-Path: /oauth/request_token\r\n"
-> "X-Mashery-Responder: proxyworker-i-7c31a414.mashery.com\r\n"
-> "Content-Type: text/plain\r\n"
-> "Server: Mashery_Server_Adapter_Query\r\n"
-> "Date: Sat, 05 Dec 2009 21:29:32 GMT\r\n"
-> "Accept-Ranges: bytes\r\n"
-> "Content-Length: 194\r\n"
-> "\r\n"
reading 194 bytes...
-> "oauth_token=xxxxxxxx&oauth_token_secret=xxxxxxxxx&application_name=xxxxx&login_url=https%3A%2F%2Fapi-user.netflix.com%2Foauth%2Flogin%3Foauth_token%3Dczjsmzw74nk2wy274g6drmwt"
read 194 bytes
Conn close

All better. Keep those datetimes synched, sports fans. Web services are becoming more and more interconnected, and if there’s one thing I’ve learned from heist movies, it’s that the first step in any successful job is to make sure your watches are synchronized. Nobody likes that guy who shows up 10 minutes late to everything!
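For the curious, the numbers in those two logs tell the whole story. Here’s a small sketch that compares each request’s oauth_timestamp with the Date header the server sent back. The 300-second tolerance is a guess on my part for illustration; I don’t know what window the provider actually enforces.

```ruby
# Hypothetical tolerance: the real window a provider enforces is not in the logs.
ALLOWED_SKEW = 300 # seconds

# Seconds by which the client's oauth_timestamp runs ahead of the server clock.
def clock_skew(oauth_timestamp, server_time)
  oauth_timestamp.to_i - server_time.to_i
end

def timestamp_valid?(oauth_timestamp, server_time, allowed_skew = ALLOWED_SKEW)
  clock_skew(oauth_timestamp, server_time).abs <= allowed_skew
end

# Timestamps and Date headers taken from the two requests logged above.
failed_server_time = Time.utc(2009, 12, 5, 21, 27, 8)
fixed_server_time  = Time.utc(2009, 12, 5, 21, 29, 32)

failed_skew = clock_skew("1260049119", failed_server_time)
fixed_skew  = clock_skew("1260048573", fixed_server_time)

puts failed_skew # client well over ten minutes ahead of the server
puts fixed_skew  # back in step after ntpdate
puts timestamp_valid?("1260049119", failed_server_time)
puts timestamp_valid?("1260048573", fixed_server_time)
```

A check like this in your own deploy scripts (compare the local clock to the Date header of any trusted HTTP response) is a cheap way to catch drift before a remote API does it for you.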

Sweet-ass performance hacks: better_assets

HTTP overhead is expensive. DNS lookups are expensive. Start dropping a bunch of Twitter widgets, Google ads, and GetSatisfaction buttons into your killer new Web 2.0 social networking site and you’ll find that your painstakingly-optimized site has slowed to a crawl while the server sits there waiting on Amazon S3 to get its act together and serve you a 300-byte CSS file.

That sorta blows. Let’s not do it.

Introducing better_assets

better_assets is a monkeypatch to the Rails 2.3.2 AssetTagHelper to enable some additional functionality. The key points are:

* Time-based expiry of cached asset files, which is primarily useful for…
* Caching and combining of remote assets
* Finally, you can post-process combined assets with blocks passed to javascript_include_tag and stylesheet_link_tag.

Examples

It’s easy. You use it just like normal:

<%=javascript_include_tag(
  "jquery-1.3.2",
  "foo",
  :cache => "all") {|text| Packr.pack(text, :base62 => true) } %>

Whoa! What is this block madness? Why, that’s an extension to allow you to do whatever you want. In this example, we’re using jcoglan’s Packr library to automatically pack our generated Javascript. This can result in filesize being reduced by pretty massive amounts, and will result in appreciable performance benefits.

Well, that’s all fine and dandy, but it’s not my combined Javascript that’s killing me, it’s all those pesky DNS lookups for all my widget code and CSS. Never fear, you’re covered there, too.

<%=javascript_include_tag(
  "http://rpxnow.com/openid/v2/widget",
  "http://partner.googleadservices.com/gampad/google_service.js",
  "http://s3.amazonaws.com/getsatisfaction.com/feedback/feedback.js",
  "http://blippr.tags.crwdcntrl.net/cc.js",
  :cache => "remote", :lifetime => 12.hours) %>

Madness! Sheer madness! All those remote Javascript files are sucked down, combined, and cached as “remote.js”. It’ll automatically expire after 12 hours, and be re-cached after that. That way, you can get all the performance benefits of serving a single combined JS file without having to stress out that someone over at WidgetHeadquarters is going to change a piece of code and completely screw you over until you notice that your local Javascript file doesn’t match theirs six weeks later.

This, oddly enough, works for CSS files, too.

<%=stylesheet_link_tag(
  "http://s3.amazonaws.com/getsatisfaction.com/feedback/feedback.css",
  "http://s3.amazonaws.com/getsatisfaction.com/feedback/widget.css",
  :cache => "remote", :lifetime => 12.hours
) %>

No more stalling out at requests to Amazon’s S3 for CSS files! No more extraneous DNS requests or HTTP connections! No fuss, no muss, no headaches for you or your user.

All this, and it makes crispy bacon, too.*

To get it, just…wait for it. Very complex procedure ahead:

script/plugin install git://github.com/cheald/better_assets.git

Restart your app, and that’s it. Your assets are now approximately 163% more awesome, while being leaner and looking better in that fabulous summer swimsuit at the same time.

Score.

* Not really.

Fine tuning your garbage collector

If you’re familiar with Ruby at all, you know that it can be a little wacky when it comes to memory usage. Most of us have observed a Mongrel/Passenger instance that starts out small and then grows by leaps and bounds, eventually settling on some uncomfortably high number. We’re going to fix that with Ruby Enterprise Edition and Scrap.

Quick tip: Strip URLs before parsing!

Rather than roll my own URL regexes, I prefer to let the existing libraries do the heavy lifting. Ruby has a uri library which is fantastic for parsing (and validating) URLs.

For example, something like this might be used in a model validation:

require 'uri'

def validate_url(url)
	parsed_uri = URI::parse(url)
rescue URI::InvalidURIError
	errors.add :url, "Sorry, that doesn't look like a valid URL"
end

I noticed a bit ago that I started getting invalid URL errors where there shouldn’t be any. After far too long spent in the library’s code, I realized my error: the URLs were being pasted with a trailing space. Stripping the string before attempting to parse it fixed it right up.

I’d argue that URI::parse should likely strip any incoming strings, but in the meantime, remember to strip your user input before trying to determine whether it’s valid or not, or you may end up with frustrated users.
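Here’s roughly what the fix looks like as a standalone helper. valid_url? is a name I’ve made up for the sketch (outside of a model there is no errors object to add to), but the URI.parse call and the InvalidURIError rescue are the same as in the validation above.

```ruby
require 'uri'

# Hypothetical helper: true when the input parses as a URI once leading and
# trailing whitespace has been trimmed.
def valid_url?(url)
  URI.parse(url.to_s.strip)
  true
rescue URI::InvalidURIError
  false
end

puts valid_url?("http://example.com/page ")  # pasted with a trailing space
puts valid_url?("http://exa mple.com/page")  # a space in the middle is still invalid
```

The to_s is there so a nil from an empty form field doesn’t raise NoMethodError before the parser ever sees it.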

Announcing Scrap

I do a lot of memory and garbage analysis on my Rails apps, and in upgrading to Rails 2.3, I discovered a practical use for the new Rails Metal middleware. Dumping memory stats to my log was just sorta unreadable in a practical scenario, and was more or less entirely unusable in production. Fortunately, Metal provides a really easy way to output readable information to the browser without invoking the full Rails stack. (It’s also an excuse to write a Metal endpoint because it’s new and shiny, but that’s beside the point.)

It’s up at github – installation is dead easy (assuming you’re on Rails 2.3+, of course) – just install the plugin, restart your app, and hit <your url>/stats/scrap in your browser. Bam, instant juicy memory goodness about your app at your fingertips. If you’d like an example of the output, good news! Check it out at http://tachyonsix.com/scrap.htm.

You can use it to troubleshoot heap leaks – just run a few requests, hit your Scrap URL, and see what your deltas look like. Seeing a huge growth in a certain type of object? Chances are pretty good that you have a heap leak, and can start tracking it down.

The request history can help you locate certain actions that might be causing spikes in memory usage. It’ll show the last N requests, along with memory and heap statistics before each request. If there’s a consistent memory usage leap after a certain action, chances are that it’s doing something naughty.

Want to get a bigger picture on what objects are hanging around? You can use the config/scrap.yml file to get Scrap to spit out more detailed reports on instances of a given class. There’s full documentation on it in the README.

Anyhow, give it a shot, let me know what you think.

Things to do when upgrading to Rails 2.3

I’m upgrading blippr to Rails 2.3. Here are some of the things that had to be changed to upgrade:

Switch the application entirely to LibXML for all its XML parsing needs

In config/environment.rb: Add the following

ActiveSupport::XmlMini.backend = 'LibXML'

This means that the faster_xml_simple monkeypatch is no longer needed. I don’t think we’re doing much else with XML on blippr, but it’ll be nice to have libxml-backed parsing all around. I must not use REXML. REXML is the app-killer. REXML is the little-death that brings total obliteration.

Fixes for will_paginate and SQL errors when counting records with a custom :select clause

* Upgrade will_paginate. Even after the upgrade, something about 2.3’s named scope handling was still breaking my app. I have a named scope like so:

  :select => "*, (blips.vote_score+2)/WEIGHT_FACTOR as weighted_score",
  :order => "weighted_score desc"

This was causing .paginate calls with this named scope to fail with an invalid SQL error. will_paginate should automatically clobber :select phrases before attempting to count records, but it wasn’t. The solution is to pass a :count option to my .paginate calls with the right select clause.

Blip.best.paginate(:page => current_page, :per_page => 30, :count => {:select => "blips.id"})

In general, any paginate call with a :select specified seems to break. The :count clause fixes them.

Upgrade my libmemcached plugin

A lot of the internal session stuff has changed. We use Evan Weaver’s libmemcached client, and an upgraded copy of 37signals’ libmemcached store for Rails. The plugin’s been upgraded to work with 2.3, and provides a session store on top of the general Rails store.

Our caching config now looks something like this:

GENERAL_CACHE_SERVERS = ["localhost:11211"]
GENERAL_CACHE_OPTIONS = {:untaint => true}
SESSION_CACHE_SERVERS = ["localhost:11212"]
SESSION_CACHE_OPTIONS = { :prefix_key => "session:blippr" }
SESSION_MEMCACHE_CLIENT = Memcached.new(SESSION_CACHE_SERVERS, SESSION_CACHE_OPTIONS)

config.cache_store = :libmemcached_store, GENERAL_CACHE_SERVERS, GENERAL_CACHE_OPTIONS
config.action_controller.session_store = :libmemcached_store
config.action_controller.session = {
	:cache => SESSION_MEMCACHE_CLIENT,
	:expires_after => 86400
}

Works great with libmemcached, with separate memcached instances for fragments and sessions (so that an over-populated fragment store won’t start clobbering sessions).

Update query parsing

I parse query parameters for some funky filtering. In 2.2.2 I used:

ActionController::AbstractRequest.parse_query_parameters(query_string)

In 2.3, that becomes:

Rack::Utils.parse_query(query_string)

That’s about it for now, but as problems arise I’ll be sure to add them.