<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Coffee Powered</title>
	<atom:link href="http://www.coffeepowered.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.coffeepowered.net</link>
	<description>code and content</description>
	<lastBuildDate>Wed, 24 Feb 2010 11:51:07 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Serving files out of GridFS, part 2</title>
		<link>http://www.coffeepowered.net/2010/02/24/serving-files-out-of-gridfs-part-2/</link>
		<comments>http://www.coffeepowered.net/2010/02/24/serving-files-out-of-gridfs-part-2/#comments</comments>
		<pubDate>Wed, 24 Feb 2010 11:44:24 +0000</pubDate>
		<dc:creator>Chris Heald</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.coffeepowered.net/?p=244</guid>
		<description><![CDATA[Since my initial experiments with GridFS and nginx-gridfs, I discovered a rather downer of a dealbreaker: compiling Passenger and nginx-gridfs into the same nginx binary makes nginx very unhappy. It hard-freezes (as in, blocks forever) when you request a GridFS file with Passenger enabled. Oops.
So, I sat down and fixed gridfs-fuse. You can grab my [...]]]></description>
			<content:encoded><![CDATA[<p>Since my initial experiments with <a href="http://www.mongodb.org/display/DOCS/GridFS+Specification">GridFS</a> and <a href="http://github.com/mdirolf/nginx-gridfs">nginx-gridfs</a>, I discovered a rather downer of a dealbreaker: compiling <a href="http://www.modrails.com/">Passenger</a> and nginx-gridfs into the same <a href="http://nginx.org/">nginx</a> binary makes nginx very unhappy. It hard-freezes (as in, blocks forever) when you request a GridFS file with Passenger enabled. Oops.</p>
<p>So, I sat down and fixed gridfs-fuse. You can grab <a href="http://github.com/cheald/gridfs-fuse">my branch at GitHub</a>. I made a few changes that make it ideal for serving files out of a GridFS DB, with a few caveats.<br />
<span id="more-244"></span></p>
<h2>Installation and Configuration</h2>
<p>Building it is relatively simple.</p>
<ol>
<li>Install SCons, the Python-based build tool (on Fedora/CentOS/RHEL, <code>yum install scons</code>)</li>
<li>Extract or symlink a copy of your <a href="http://www.mongodb.org/display/DOCS/Home">mongodb</a> install to <code>/opt/mongo</code></li>
<li>Run <code>scons</code></li>
<li>If all builds well, yay. If not, fix any missing dependencies or path issues. Edit SConstruct to change any paths that you need to.</li>
<li>Create a mount point for your GridFS filesystem; I used /mnt/gridfs (<code>sudo mkdir /mnt/gridfs</code>)</li>
<li>chown your mount point to your webserver&#8217;s user. If you run Apache, this is probably <code>apache</code>. If you run nginx, it&#8217;s probably <code>nobody</code>. (<code>sudo chown nobody:nobody /mnt/gridfs</code>)</li>
<li>Mount the database to the mount point.
<pre class="syntax-highlight:ruby">
sudo -u nobody ./mount_gridfs --db=your_database --host=localhost /mnt/gridfs
</pre>
<p>Change the user and db parameters as required.</p>
</li>
<li>Configure your webserver to serve files appropriately. In my case, I have <a href="http://github.com/jnicklas/carrierwave">carrierwave</a> set up to write files to <code>uploads/model/_id/filename.png</code>, and carrierwave is configured to use <code>/images/gfs</code> as my base URL. This means that for a given file, I might end up with a path like <code>/images/gfs/uploads/user/avatar/4b8475cc69e0dc57e7000005/thumb_untitled-20.png</code>. To cause the GridFS files to be served off of the mount point, I just symlinked the mount to /images/gfs.
<pre class="syntax-highlight:ruby">
cd public/images
ln -s /mnt/gridfs gfs
</pre>
</li>
</ol>
<p>Once that&#8217;s all set up, you should be able to use your webserver to serve images directly out of your Mongo database, and at pretty fair rates, too!</p>
<h2>143% Unscientific Benchmarks</h2>
<pre class="syntax-highlight:ruby">
[chris@polaris gridfs-fuse]# ab -n 5000 -c 25 http://advice:81/images/gfs/uploads/user/avatar/4b8347a698db740b30000057/thumb_adrine-big.png

Server Software:        nginx/0.8.33
Server Hostname:        advice
Server Port:            81

Document Path:          /images/gfs/uploads/user/avatar/4b8347a698db740b30000057/thumb_adrine-big.png
Document Length:        14332 bytes

Concurrency Level:      25
Time taken for tests:   5.029 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      72725000 bytes
HTML transferred:       71660000 bytes
Requests per second:    994.22 [#/sec] (mean)
Time per request:       25.145 [ms] (mean)
Time per request:       1.006 [ms] (mean, across all concurrent requests)
Transfer rate:          14121.93 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:    16   25   1.4     25      52
Waiting:        2   24   1.4     24      52
Total:         17   25   1.4     25      53

Percentage of the requests served within a certain time (ms)
  50%     25
  66%     25
  75%     25
  80%     25
  90%     25
  95%     26
  98%     27
  99%     32
 100%     53 (longest request)
</pre>
<h2>Caveats</h2>
<p>To get this working, I had to hack in directory support. GridFS stores files with paths, but doesn&#8217;t store them in a hierarchy; Fuse navigates a filesystem, which is hierarchical. In order to overcome this, I made gridfs-fuse respond to directory requests as valid. For a given file, gridfs-fuse will walk the following path hierarchy:</p>
<p><code>GET /uploads/user/avatar/4b8347a698db740b30000057/thumb_adrine-big.png</code><br />
Check for <code>uploads</code>, directory exists<br />
Check for <code>uploads/user</code>, directory exists<br />
Check for <code>uploads/user/avatar</code>, directory exists<br />
Check for <code>uploads/user/avatar/4b8347a698db740b30000057</code>, directory exists<br />
Check for <code>uploads/user/avatar/4b8347a698db740b30000057/thumb_adrine-big.png</code>, file exists, return file.</p>
<p>There are two things to be aware of here:</p>
<ol>
<li>The deeper your path hierarchy, the more steps gridfs-fuse will take to find your file. Less directory nesting means faster file serving. The performance difference won&#8217;t be massive, but it&#8217;s there.</li>
<li><strong>/!\ Big giant hack. /!\</strong> <em>gridfs-fuse assumes that any path part with a period in it is the path leaf</em>. This is done so that we don&#8217;t have to keep querying the DB with regexes, which degrades performance by about 90% in my testing. Always, always, always make sure your filenames have a period in them, and make sure your directories do not have a period in them. This is a rather hefty set of caveats, but if you&#8217;ll stick to them, you will be rewarded with easy GridFS file serving.</li>
</ol>
<h3>What happens if I don&#8217;t follow those rules?</h3>
<p>A few things happen. If you put periods in directory names, you&#8217;ll get 404s. They&#8217;ll be fast 404s, but they&#8217;ll be 404s. Even if a filepath is valid, like:</p>
<pre class="syntax-highlight:ruby">/images/foo.bar/baz/bin.png</pre>
<p>gridfs-fuse will short-circuit at <code>images/foo.bar</code>, assuming that is the leaf of the hierarchy.</p>
<p>If you don&#8217;t put a period in your filenames, then gridfs-fuse will keep returning &#8220;yup, that&#8217;s a directory&#8221;, even when your webserver requests <code>/images/foo.bar/baz/bin.png/index.html</code> and then <code>/images/foo.bar/baz/bin.png/index.html/index.html</code> and then <code>/images/foo.bar/baz/bin.png/index.html/index.html/index.html</code>, and so forth. There&#8217;s a built-in stop at 10 levels deep &#8211; at 10 levels, gridfs-fuse gives up and just returns a 404, but it&#8217;ll take you a relatively long time to get there, and it&#8217;s really very highly recommended that you don&#8217;t do that.</p>
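<p>To make the period rule concrete, here is a rough Ruby illustration of the behaviour described above. It is not how gridfs-fuse is implemented (gridfs-fuse isn&#8217;t written in Ruby); <code>resolve</code> and <code>MAX_DEPTH</code> are hypothetical names, and the exact depth comparison is my assumption.</p>

```ruby
# Illustration only: mimics the "a period means leaf" rule described above.
# resolve and MAX_DEPTH are hypothetical names, not part of gridfs-fuse.
MAX_DEPTH = 10 # assumed cut-off; the real comparison may differ

# Returns [:file, prefix] for the first path part containing a period,
# [:directory, path] if no part contains one,
# or [:not_found, path] once the path is MAX_DEPTH parts deep.
def resolve(path)
  parts = path.split("/").reject { |part| part.empty? }
  return [:not_found, path] if parts.length >= MAX_DEPTH
  parts.each_with_index do |part, index|
    return [:file, parts[0..index].join("/")] if part.include?(".")
  end
  [:directory, parts.join("/")]
end

p resolve("/uploads/user/avatar/4b8347a698db740b30000057/thumb_adrine-big.png")
p resolve("/images/foo.bar/baz/bin.png")
p resolve("/images/foo/baz/bin")
```

<p>The second call stops at <code>images/foo.bar</code>, which is the short-circuit described above, and the third never finds a leaf at all.</p>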
<h2>What about when gridfs-fuse isn&#8217;t running?</h2>
<p>Never fear, that&#8217;s easily fixed. Just use a Rack or Rails Metal middleware to serve images from GridFS. This is <strong>massively</strong> slower than serving files through gridfs-fuse, but at least your visitors won&#8217;t be treated to a site full of broken images if your mount point goes away for whatever reason. I&#8217;m using the following Metal endpoint. Just throw it into <code>app/metal/gridfs.rb</code>, add <code>config.metals = ["Gridfs"]</code> into your environment.rb, and you&#8217;re off to the races.</p>
<pre class="syntax-highlight:ruby">
# rails metal to be used with carrierwave (gridfs) and MongoMapper

require &#039;mongo&#039;
require &#039;mongo/gridfs&#039;

# Allow the metal piece to run in isolation
require(File.dirname(__FILE__) + &quot;/../../config/environment&quot;) unless defined?(Rails)

class Gridfs
  def self.call(env)
    if env[&quot;PATH_INFO&quot;] =~ /^\/images\/gfs\/(.+)$/
      key = $1
      if ::GridFS::GridStore.exist?(MongoMapper.database, key)
        ::GridFS::GridStore.open(MongoMapper.database, key, &#039;r&#039;) do |file|
          [200, {&#039;Content-Type&#039; =&gt; file.content_type}, [file.read]]
        end
      else
        [404, {&#039;Content-Type&#039; =&gt; &#039;text/plain&#039;}, [&#039;File not found.&#039;]]
      end
    else
      [404, {&#039;Content-Type&#039; =&gt; &#039;text/plain&#039;}, [&#039;File not found.&#039;]]
    end
  end
end
</pre>
<p>(I didn&#8217;t write that, but I can&#8217;t find the source to give credit at the moment).</p>
<p>That gives you a highly performant front-end solution with a reliable fallback. For any given request, the following should happen:</p>
<ol>
<li>Your webserver attempts to load the file out of GridFS. If it can&#8217;t be found (likely due to a missing mountpoint), then&#8230;</li>
<li>The request will fall through to your Metal handler. It will then attempt to serve it from GridFS.</li>
<li>If it still can&#8217;t be found, the request falls through to your Rails app.</li>
</ol>
<p>To prevent step 3 from happening, you might want to change line 18 of the Metal handler to return a 200 and read out a generic &#8220;missing image&#8221; image of some sort. That&#8217;ll prevent 404s from invoking a hit to your app.</p>
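<p>A minimal sketch of what that replacement response could look like, assuming a placeholder image lives at <code>public/images/missing.png</code>. Both that path and the <code>missing_image_response</code> helper are made up for illustration; they are not part of the handler above.</p>

```ruby
# Hypothetical helper: builds a Rack-style 200 response carrying a placeholder
# image, to be returned in place of the inner 404 in the Metal handler.
MISSING_IMAGE_PATH = "public/images/missing.png" # assumed location

def missing_image_response(path = MISSING_IMAGE_PATH)
  # Fall back to an empty body if the placeholder itself is missing.
  body = File.exist?(path) ? File.binread(path) : ""
  [200, { "Content-Type" => "image/png" }, [body]]
end

status, headers, _body = missing_image_response
puts status
puts headers["Content-Type"]
```

<p>Since the status is 200, the request stops at the Metal handler instead of falling through to the Rails app.</p>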
<p>Stick a CDN in front of it all, and you have a high-performance file upload solution with automatic replication and sharding that you can treat like any other piece of web data. Hooray!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coffeepowered.net/2010/02/24/serving-files-out-of-gridfs-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Serving files out of GridFS</title>
		<link>http://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/</link>
		<comments>http://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 20:54:11 +0000</pubDate>
		<dc:creator>Chris Heald</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.coffeepowered.net/?p=233</guid>
		<description><![CDATA[GridFS is a nifty little feature in MongoDB that allows you to store files of all shapes and sizes in Mongo itself, getting the benefits of Mongo&#8217;s sharding and replication. However, since they&#8217;re in a database, and not on the filesystem directly, how do we serve them?
There are lots of benchmarks and numbers under the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.mongodb.org/display/DOCS/GridFS+Specification">GridFS</a> is a nifty little feature in <a href="http://www.mongodb.org/display/DOCS/Home">MongoDB</a> that allows you to store files of all shapes and sizes in Mongo itself, getting the benefits of Mongo&#8217;s sharding and replication. However, since they&#8217;re in a database, and not on the filesystem directly, how do we serve them?</p>
<p>There are lots of benchmarks and numbers under the cut. Keep reading!</p>
<p><span id="more-233"></span></p>
<p>Right now, there are three options:</p>
<ol>
<li>Use a &#8220;low-level&#8221; script handler, like a Rack script or Rails Metal handler to serve them out of the database</li>
<li>Use something like <a href="http://github.com/mikejs/gridfs-fuse/">gridfs-fuse</a> to mount the database as a filesystem, and let your webserver read it directly</li>
<li>Use something like <a href="http://github.com/mdirolf/nginx-gridfs">nginx-gridfs</a> to talk directly to MongoDB from your webserver.</li>
</ol>
<p>I wasn&#8217;t able to get gridfs-fuse to build on my system, but I was able to build the nginx module. The question, of course, is how fast are you going to be serving files with each solution?</p>
<h2>Filesystem read through Apache</h2>
<p>First, I&#8217;ll establish a baseline. I&#8217;m running Apache as my frontend server, and we&#8217;ll use ab to benchmark its throughput.</p>
<pre class="syntax-highlight:ruby">[chris@polaris conf]# ab -n 5000 -c 10 http://advice/images/embed/normal_alliance-60.png

Server Software:        Apache/2.2.13
Server Hostname:        advice
Server Port:            80

Document Path:          /images/embed/normal_alliance-60.png
Document Length:        31596 bytes

Concurrency Level:      10
Time taken for tests:   1.904 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      159463760 bytes
HTML transferred:       158043192 bytes
Requests per second:    2625.37 [#/sec] (mean)
Time per request:       3.809 [ms] (mean)
Time per request:       0.381 [ms] (mean, across all concurrent requests)
Transfer rate:          81767.87 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.4      1       4
Processing:     1    3   0.5      3       6
Waiting:        0    1   0.4      1       4
Total:          2    4   0.4      4       8

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      4
  95%      4
  98%      5
  99%      5
 100%      8 (longest request)
</pre>
<p>Nice and fast, just like we&#8217;d expect.</p>
<h2>Filesystem read through nginx</h2>
<pre class="syntax-highlight:ruby">[chris@polaris conf]# ab -n 50000 -c 10 http://advice:81/images/embed/normal_alliance-60.png

Server Software:        nginx/0.8.33
Server Hostname:        advice
Server Port:            81

Document Path:          /images/embed/normal_alliance-60.png
Document Length:        31596 bytes

Concurrency Level:      10
Time taken for tests:   7.623 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      1590513618 bytes
HTML transferred:       1579863192 bytes
Requests per second:    6559.31 [#/sec] (mean)
Time per request:       1.525 [ms] (mean)
Time per request:       0.152 [ms] (mean, across all concurrent requests)
Transfer rate:          203763.10 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       9
Processing:     1    1   0.4      1      11
Waiting:        0    0   0.1      0       9
Total:          1    1   0.5      1      12

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      2
  90%      2
  95%      2
  98%      3
  99%      3
 100%     12 (longest request)
</pre>
<p>nginx <i>screams</i>. At 6500 requests/sec, it&#8217;s blisteringly fast.</p>
<h2>GridFS read through nginx-gridfs</h2>
<pre class="syntax-highlight:ruby">[chris@polaris conf]# ab -n 5000 -c 10 http://advice:81/images/gfs/uploads/user/avatar/4b7b2c0e98db7475fc000003/normal_alliance-60.png

Server Software:        nginx/0.8.33
Server Hostname:        advice
Server Port:            81

Document Path:          /images/gfs/uploads/user/avatar/4b7b2c0e98db7475fc000003/normal_alliance-60.png
Document Length:        31596 bytes

Concurrency Level:      10
Time taken for tests:   4.613 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      158580000 bytes
HTML transferred:       157980000 bytes
Requests per second:    1083.88 [#/sec] (mean)
Time per request:       9.226 [ms] (mean)
Time per request:       0.923 [ms] (mean, across all concurrent requests)
Transfer rate:          33570.65 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:     1    9   4.7      9     103
Waiting:        1    9   4.7      9     102
Total:          2    9   4.7      9     103

Percentage of the requests served within a certain time (ms)
  50%      9
  66%      9
  75%      9
  80%      9
  90%      9
  95%      9
  98%      9
  99%     11
 100%    103 (longest request)
</pre>
<p>Definitely a lot slower, but still very respectable. 1083 requests/sec is going to be more than adequate for most purposes, particularly if fronted with a CDN.</p>
<p>And finally&#8230;</p>
<h2>Rails Metal handler</h2>
<p>The nice thing about the Rails metal handler solution is that it&#8217;s easy. No recompiling, just drop the handler into your project and you&#8217;re off to the races. That said&#8230;</p>
<pre class="syntax-highlight:ruby">[chris@polaris nginx-gridfs]$ ab -n 250 -c 4  http://advice/images/gfs/uploads/user/avatar/4b7b2c0e98db7475fc000003/normal_alliance-60.png

Server Software:        Apache/2.2.13
Server Hostname:        advice
Server Port:            80

Document Path:          /images/gfs/uploads/user/avatar/4b7b2c0e98db7475fc000003/normal_alliance-60.png
Document Length:        31596 bytes

Concurrency Level:      4
Time taken for tests:   4.646 seconds
Complete requests:      250
Failed requests:        0
Write errors:           0
Total transferred:      7960000 bytes
HTML transferred:       7899000 bytes
Requests per second:    53.81 [#/sec] (mean)
Time per request:       74.338 [ms] (mean)
Time per request:       18.585 [ms] (mean, across all concurrent requests)
Transfer rate:          1673.10 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:    15   74  75.6     34     287
Waiting:        0   72  75.8     30     276
Total:         15   74  75.6     34     288

Percentage of the requests served within a certain time (ms)
  50%     34
  66%     39
  75%    139
  80%    192
  90%    201
  95%    210
  98%    239
  99%    245
 100%    288 (longest request)
</pre>
<p>I obviously ran far fewer requests this go-round. The reason is pretty obvious &#8211; running 5000 requests through the Ruby stack would have taken approximately <em>forever</em>. At 53 requests per second, this is not an attractive solution, particularly if you consider the CPU overhead that it&#8217;s incurring.</p>
<h2>Conclusions</h2>
<table class='data' border='1'>
<tr>
<th>Solution</th>
<th>Requests/second</th>
<th>% of Apache FS</th>
<th>% of Nginx FS</th>
<th>% of Nginx GridFS</th>
<th>% of Apache Ruby</th>
</tr>
<tr>
<td>Filesystem via Apache</td>
<td>2625.37</td>
<td>-</td>
<td>40.03%</td>
<td>242.22%</td>
<td>4,878.96%</td>
</tr>
<tr>
<td>Filesystem via Nginx</td>
<td>6559.31</td>
<td>249.84%</td>
<td>-</td>
<td>605.17%</td>
<td>12,189.76%</td>
</tr>
<tr>
<td>GridFS via nginx module</td>
<td>1083.88</td>
<td>41.28%</td>
<td>16.52%</td>
<td>-</td>
<td>2,014.27%</td>
</tr>
<tr>
<td>Rails metal handler via Passenger</td>
<td>53.81</td>
<td>2.05%</td>
<td>0.82%</td>
<td>4.96%</td>
<td>-</td>
</tr>
</table>
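<p>The percentage columns are just each row&#8217;s requests/second divided by each column&#8217;s. Here is a small Ruby sketch of that arithmetic, using the ab numbers from the runs above; <code>RATES</code> and <code>percent_of</code> are names I made up for the example.</p>

```ruby
# Requests/second as reported by ab in the runs above.
RATES = {
  "Filesystem via Apache"             => 2625.37,
  "Filesystem via Nginx"              => 6559.31,
  "GridFS via nginx module"           => 1083.88,
  "Rails metal handler via Passenger" => 53.81
}

# Throughput of `row` expressed as a percentage of `column`.
def percent_of(row, column)
  (RATES.fetch(row) / RATES.fetch(column) * 100).round(2)
end

# Print one line per solution, with "-" on the diagonal.
RATES.each_key do |row|
  cells = RATES.keys.map do |column|
    row == column ? "-" : format("%.2f%%", percent_of(row, column))
  end
  puts "#{row}: #{cells.join(' | ')}"
end
```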
<p>If you&#8217;re looking to abstract away from storing files on a filesystem, GridFS is a feasible solution. It can really crank some mean output numbers, and though it&#8217;s not up to par with a raw filesystem read, also consider that in many production environments, such a raw filesystem read might be happening via an NFS or GFS share, which is going to massively degrade the performance of that request. Given the no-hassle store-and-forget-about-it solution that GridFS offers, even when faced with the challenge of multi-server replication, it seems that you can get enough performance out of it to justify it as a solution.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>counter_cache for MongoMapper</title>
		<link>http://www.coffeepowered.net/2010/02/15/counter_cache-for-mongomapper/</link>
		<comments>http://www.coffeepowered.net/2010/02/15/counter_cache-for-mongomapper/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 02:30:48 +0000</pubDate>
		<dc:creator>Chris Heald</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.coffeepowered.net/?p=229</guid>
		<description><![CDATA[I&#8217;ve started playing with MongoMapper, and it&#8217;s quite excellent, but it does suffer very much from being young. There are lots of pieces missing that veterans of ActiveRecord will take for granted. I&#8217;ve been working around or patching them, for the most part, but I felt that my solution to `:counter_cache` deserved a post.
In short, [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve started playing with <a href="http://github.com/jnunemaker/mongomapper">MongoMapper</a>, and it&#8217;s quite excellent, but it does suffer very much from being young. There are lots of pieces missing that veterans of ActiveRecord will take for granted. I&#8217;ve been working around or patching them, for the most part, but I felt that my solution to `:counter_cache` deserved a post.</p>
<p>In short, I didn&#8217;t want to hack around with the MongoMapper associations code, so I just implemented my own little ride-along version.</p>
<pre class="syntax-highlight:ruby">
module SecretProject
  module CounterCache
    module ClassMethods
      def counter_cache(field)
        class_eval &lt;&lt;-EOF
          after_create &quot;increment_counter_for_#{field}&quot;
          after_destroy &quot;decrement_counter_for_#{field}&quot;
        EOF
      end
    end

    module InstanceMethods
      def method_missing(method, *args)
        if matches = method.to_s.match(/^(in|de)crement_counter_for_(.*)$/) then
          dir = matches[1] == &quot;in&quot; ? 1 : -1
          parent_association = matches[2]
          if parent = self.send(parent_association) then
            name = &quot;#{self.class.to_s.tableize}_count&quot;
            if parent.respond_to?(name)
              parent.collection.update({:_id =&gt; parent._id}, {&quot;$inc&quot; =&gt; {name =&gt; dir}})
            end
          end
        else
          super
        end
      end
    end

    def self.included(receiver)
      receiver.extend         ClassMethods
      receiver.send :include, InstanceMethods
    end
  end
end
</pre>
<p>Throw that into your lib directory, load it with an initializer, and then you can use it something like so:</p>
<pre class="syntax-highlight:ruby">
class Foo
  include MongoMapper::Document
  include SecretProject::CounterCache

  belongs_to :user
  counter_cache :user  # Will cause a foos_count field on the owning user to be maintained when a Foo is created or deleted.
end
</pre>
<p>This&#8217;ll only increment a counter if you&#8217;ve defined one on your parent object, via <code>key :foos_count, Integer</code> or similar, just so that it doesn&#8217;t go around updating every model you might associate it with.</p>
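<p>If the <code>method_missing</code> dispatch is hard to picture, here is a stripped-down, plain-Ruby sketch of the same idea with no MongoMapper involved. <code>Parent</code>, <code>Child</code> and the in-memory counter are hypothetical stand-ins; the real module issues a <code>$inc</code> update against the parent&#8217;s collection instead.</p>

```ruby
# Plain-Ruby sketch of the dispatch idea; no database involved.
# Parent and Child are hypothetical stand-ins for MongoMapper documents.
class Parent
  attr_accessor :children_count

  def initialize
    @children_count = 0
  end
end

class Child
  attr_reader :parent

  def initialize(parent)
    @parent = parent
  end

  # Handles increment_counter_for_X / decrement_counter_for_X for any association X.
  def method_missing(method, *args)
    matches = method.to_s.match(/^(in|de)crement_counter_for_(.*)$/)
    return super unless matches

    delta = matches[1] == "in" ? 1 : -1
    owner = send(matches[2])
    name  = "children_count"
    # Only touch the counter if the owner actually defines one.
    return if owner.nil?
    owner.send("#{name}=", owner.send(name) + delta) if owner.respond_to?(name)
  end

  def respond_to_missing?(method, include_private = false)
    method.to_s =~ /^(in|de)crement_counter_for_/ ? true : super
  end
end

owner = Parent.new
child = Child.new(owner)
child.increment_counter_for_parent
puts owner.children_count
```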
<p>Yay.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coffeepowered.net/2010/02/15/counter_cache-for-mongomapper/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Safe action caching with Memcached</title>
		<link>http://www.coffeepowered.net/2010/02/10/safe-action-caching-with-memcached/</link>
		<comments>http://www.coffeepowered.net/2010/02/10/safe-action-caching-with-memcached/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 04:04:04 +0000</pubDate>
		<dc:creator>Chris Heald</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.coffeepowered.net/?p=223</guid>
		<description><![CDATA[I&#8217;ve started using action caching more aggressively, to handle a large volume of not-signed-in search traffic. It composes a significant chunk of my site&#8217;s total traffic, but there&#8217;s no good reason to be recomputing full pages for all those long-tail hits. So, the obvious thing is to just implement a quick action cache.

# Controller
caches_action :show, [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve started using action caching more aggressively, to handle a large volume of not-signed-in search traffic. It composes a significant chunk of my site&#8217;s total traffic, but there&#8217;s no good reason to be recomputing full pages for all those long-tail hits. So, the obvious thing is to just implement a quick action cache.</p>
<pre class="syntax-highlight:ruby">
# Controller
caches_action :show, :unless =&gt; :user?, :expires_in =&gt; 24.hours
</pre>
<pre class="syntax-highlight:ruby">
# Sweeper
expire_action :controller =&gt; &quot;nodes&quot;, :action =&gt; &quot;show&quot;, :id =&gt; record.to_param
</pre>
<p>This all works dandy, but I generate pretty URLs, which means sometimes there are characters in the URL that Memcached doesn&#8217;t like. A few minutes after deploying my patch, I started getting IMs from my logger bot telling me things were unhappy.</p>
<pre class="syntax-highlight:ruby">
blippr. com: [#1265856785] ArgumentError: illegal character in key &quot;views/m.blippr.com/apps/346562-PicFo g.mobile&quot;
blippr. com: [#1265857710] ArgumentError: illegal character in key &quot;views/www.blippr.com/apps/336714-µTorrent  &quot;
blippr. com: [#1265857897] ArgumentError: illegal character in key &quot;views/www.blippr.com/apps/337076-ustre am&quot;
blippr. com: [#1265857924] ArgumentError: illegal character in key &quot;views/www.blippr.com/apps/336714-µTorrent  &quot;
</pre>
<p>That&#8217;s memcached complaining about the hash keys we&#8217;re giving to it. This just won&#8217;t do. We could just regex out &#8220;bad&#8221; characters, but that means potential collisions, and potentially leaves edge cases. Why not just hash it instead?</p>
<p>A quick monkey patch later:</p>
<pre class="syntax-highlight:ruby">
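require &#039;digest/sha1&#039;

# Use a SHA1 digest of the path as the cache key, so the key only ever
# contains hexadecimal characters.
class ActionController::Caching::Actions::ActionCachePath
	def path
		@cached_path ||= Digest::SHA1.hexdigest(@path)
	end
end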
</pre>
<p>And we&#8217;re all dandy. Now, rather than caching by path, the path is hashed, and the hash is used as the path key. Since hashes will always be hexadecimal characters, we know that it&#8217;ll never make memcached unhappy.</p>
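<p>Outside of Rails, the idea boils down to something like the sketch below. <code>safe_cache_key</code> is a made-up helper for illustration, not a Rails API; the input is one of the paths memcached rejected above, whitespace and all.</p>

```ruby
require 'digest/sha1'

# Hypothetical helper (not a Rails API): turns an arbitrary path into a key
# made of a fixed prefix plus 40 hexadecimal characters.
def safe_cache_key(path)
  "views/" + Digest::SHA1.hexdigest(path)
end

key = safe_cache_key("www.blippr.com/apps/336714-µTorrent  ")
puts key
puts key.length
```

<p>Whatever goes in, the key that comes out has no whitespace or control characters and stays well under memcached&#8217;s 250-byte key limit.</p>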
<pre class="syntax-highlight:ruby">
Path is blippr.com/movies/6696-The-Silence-of-the-Lambs...
Cached fragment hit: views/9111cdefca4a52cb0e3a5ebac4f618127a30efd0 (1.1ms)
</pre>
<p>There is an argument for not using this technique if you&#8217;re using file-based caching, since it means your cached bits won&#8217;t be segregated into directories, but memcached doesn&#8217;t support expiry by regex anyhow, so there&#8217;s no good reason to not use it in this case.</p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coffeepowered.net/2010/02/10/safe-action-caching-with-memcached/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eight tips for getting the most out of your Rails app</title>
		<link>http://www.coffeepowered.net/2009/12/23/eight-tips-for-getting-the-most-out-of-your-rails-app/</link>
		<comments>http://www.coffeepowered.net/2009/12/23/eight-tips-for-getting-the-most-out-of-your-rails-app/#comments</comments>
		<pubDate>Wed, 23 Dec 2009 10:04:57 +0000</pubDate>
		<dc:creator>Chris Heald</dc:creator>
				<category><![CDATA[Rails]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.coffeepowered.net/?p=205</guid>
		<description><![CDATA[Rails does an awful lot to optimize page generation, but there are a number of hacks, tweaks, and usage patterns you should be using to get the most out of your app.
Configuration tweaks
There&#8217;s a lot of the Rails stack that&#8217;s written in Ruby, which is great &#8211; it&#8217;s portable, it&#8217;s flexible, it works out of [...]]]></description>
			<content:encoded><![CDATA[<p>Rails does an awful lot to optimize page generation, but there are a number of hacks, tweaks, and usage patterns you should be using to get the most out of your app.</p>
<h2>Configuration tweaks</h2>
<p>There&#8217;s a lot of the Rails stack that&#8217;s written in Ruby, which is great &#8211; it&#8217;s portable, it&#8217;s flexible, it works out of the box. Unfortunately, for some things, this also means it&#8217;s slow. Other times, pieces of the framework aren&#8217;t implemented as optimally as they could be. What if you could improve your app&#8217;s performance just by installing a few gems and tweaking a few config parameters? Good news &#8211; it&#8217;s not hard.</p>
<h3>1. Replace REXML with LibXML</h3>
<p>By default, Rails uses a Ruby-native XML library called REXML. REXML is slow. REXML is very slow. REXML is personally responsible for me almost entirely giving up on Ruby due to a bad encounter with it in my first Ruby project. Fortunately, Rails provides a very easy way to avoid using REXML.</p>
<pre class="syntax-highlight:ruby">gem install libxml-ruby</pre>
<p>Then, in your app&#8217;s config/environment.rb</p>
<pre class="syntax-highlight:ruby">ActiveSupport::XmlMini.backend = &#039;LibXML&#039;</pre>
<p>That&#8217;s it. Now, Rails will use the very lean, very fast libxml to parse XML documents, rather than the very fat, very slow REXML. If you&#8217;re doing feed parsing, Hash.from_xml, or anything of that nature, this will save you massive amounts of pain.</p>
<h3>2. <a href="http://slim-attributes.rubyforge.org/">slim_attributes</a></h3>
<p>If you&#8217;re using MySQL, there&#8217;s no reason why you shouldn&#8217;t be using slim_attributes.</p>
<blockquote><p>Slim Attributes boosts speed in Mysql/Rails ActiveRecord Models by avoiding instantiating Hashes for each result row, and lazily instantiating attributes as needed.</p></blockquote>
<p>Pretty self-explanatory. Rather than creating massive hashes of everything the DB gives you, slim_attributes causes ActiveRecord to only create ruby objects when you actually ask for them in code. This can reduce both your app&#8217;s memory usage and time spent on database queries. It&#8217;s not a massive increase, but given that it takes exactly one line of code to add to your project, there&#8217;s no reason not to use it.</p>
<h3>3. <a href="http://github.com/sdsykes/slim_scrooge">slim_scrooge</a></h3>
<p>From the developers of slim_attributes comes another drop-in database optimization.</p>
<blockquote><p>SlimScrooge is an optimization layer to ensure your application only fetches the database content needed to minimize wire traffic, excessive <span>SQL</span> queries and reduce conversion overheads to native Ruby types.</p>
<p>SlimScrooge implements inline query optimisation, automatically restricting the columns fetched based on what was used during previous passes through the same part of your code.</p></blockquote>
<p>Make your ORM work for you! By only fetching the content you need from your database, you reduce over-the-wire overhead, CPU overtime due to type conversion, and other such niceties. Again, just install the gem, require it in your project, and you&#8217;re off to the races.</p>
<h3>4. <a href="http://fast-xs.rubyforge.org/">fast_xs</a></h3>
<p>By default, string escaping in Rails happens in native Ruby code. This is slow. We don&#8217;t like slow. This is particularly prominent in areas like Builder::XmlMarkup, which you are using if you have any templates like <code>foo.xml.builder</code> lying around.</p>
<p>In a modestly-sized document, this can result in a pretty substantial slowdown in view construction. Rather than re-hashing what others have already done, I&#8217;ll point you at <a href="http://samsaffron.com/archive/2008/03/29/Speed+up+your+feed+generation+in+rails">Speed up your feed generation in Rails</a> for the long and short of it all. This can result in builder views running upwards of 10x as fast, and all you have to do is install the fast_xs gem &#8211; Rails will automatically detect and patch it in if it&#8217;s on the system.</p>
<h3>5. <a href="http://www.kuwata-lab.com/erubis/">Erubis</a></h3>
<p><img src="http://www.coffeepowered.net/wp-content/uploads/2009/12/erubis01.png" alt="Erubis benchmarks" title="Erubis benchmarks" width="351" height="262" class="alignright size-full wp-image-213" /> Erubis is an alternative ERB implementation that is heavily optimized for speed (it is pure Ruby, so there is nothing to compile). As a result, it processes ERB templates very, very quickly. In fact, the Erubis benchmarks put it at upwards of 3x faster than the native ERB implementation. Installation is easy &#8211; just check the <a href="http://www.kuwata-lab.com/erubis/users-guide.05.html">using Erubis with Ruby on Rails guide</a> and you&#8217;re off to the races.</p>
<p>Do note that if you&#8217;re entirely using <a href="http://haml-lang.com/">Haml</a> or similar, Erubis won&#8217;t do much for you. Erubis is much faster than Haml, but Haml is much prettier than ERB. What you end up using is up to you!</p>
<h2>Reduce action runtimes</h2>
<h3>6. Use <a href="http://github.com/tobi/delayed_job">delayed_job</a></h3>
<p>Sometimes in the course of any web service, you run into some action that takes a little while to process. This is generally a pain and causes a whole host of problems, including frustrated users clicking refresh and spawning a dozen instances of your app all running the same long-running request and tying up valuable request slots. Long-running jobs, or jobs that absolutely must succeed, are something of a royal pain in the patootie to handle gracefully. Fortunately, there&#8217;s DelayedJob, which is much like a double shot of codeine to ease that terrible pain.</p>
<p>The concept is pretty simple &#8211; rather than immediately executing a long-running task, you create a &#8220;job&#8221; for it, then use an asynchronous daemon to run your job for you.</p>
<p>For example, let&#8217;s say that your app wants to post to Twitter when you accomplish some task. This is all well and good if Twitter is up (ha!) and fast and isn&#8217;t experiencing any technical issues and you aren&#8217;t having any issues on your end and you don&#8217;t have any exceptions. In short, it&#8217;s fine when things don&#8217;t break, but we all know that things break and go wrong and generally end up sideways when you&#8217;re ever dealing with any kind of I/O, particularly of the remote web service kind. Rather than trying to post to Twitter in-process, we&#8217;ll create a job whose task is to post to Twitter.</p>
<p>Install the delayed_job gem, create the delayed_jobs table as indicated in its documentation, and write your first worker.</p>
<pre class="syntax-highlight:ruby">
module Jobs
	class PostToTwitter &lt; Struct.new(:username, :password, :tweet)
		def perform
			auth = Twitter::HTTPAuth.new(username, password)
			client = Twitter::Base.new(auth)
			client.update(tweet)
		end
	end
end
</pre>
<p>Now, in your controller code, or after_create in your model, or wherever, rather than posting to Twitter directly, just enqueue a job:</p>
<pre class="syntax-highlight:ruby">

Delayed::Job.enqueue Jobs::PostToTwitter.new(params[:username], params[:password], params[:tweet])
</pre>
<p>Finally, you&#8217;ll want to fire up a DelayedJob daemon. This is pretty easy to do under Rails.</p>
<p>Create a file called <code>script/worker.rb</code> and stick the following in it:</p>
<pre class="syntax-highlight:ruby">#!/usr/bin/env ruby
require &#039;rubygems&#039;
require &#039;daemons&#039;
dir = File.expand_path(File.join(File.dirname(__FILE__), &#039;..&#039;))

daemon_options = {
  :multiple =&gt; false,
  :dir_mode =&gt; :normal,
  :dir =&gt; File.join(dir, &#039;tmp&#039;, &#039;pids&#039;),
  :backtrace =&gt; true
}

Daemons.run_proc(&#039;job_runner&#039;, daemon_options) do
  if ARGV.include?(&#039;--&#039;)
    ARGV.slice! 0..ARGV.index(&#039;--&#039;)
  else
    ARGV.clear
  end

  Dir.chdir dir
  RAILS_ENV = ARGV.first || ENV[&#039;RAILS_ENV&#039;] || &#039;development&#039;
  require File.join(&#039;config&#039;, &#039;environment&#039;)

  Delayed::Worker.new.start
end
</pre>
<p>Now, all you have to do is call <code>ruby script/worker.rb start</code> and you&#8217;re up and running. Jobs will automatically be processed as they&#8217;re added to the queue. If they fail, the reason why will be logged and the job will be scheduled to be retried in the future. You can correct any mistakes and re-run the job and watch it happily succeed. If the mistake is on the remote end, then the worker will keep retrying it until it succeeds, and your user doesn&#8217;t have to sit there and wait while your app continually receives the API equivalent of the failwhale. Everyone is happy (eventually!).</p>
<p>Once you start using DelayedJob, you&#8217;ll find that there are lots of things you can do with it to smooth out your app&#8217;s user-response speed. Processing user avatars or large file uploads, recomputing expensive queries (like a social graph update), talking to remote web services, or even sending emails can all be moved away from the realtime and into the background with total ease.</p>
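<p>If you want to see the enqueue/perform/retry cycle without a database or a daemon in the way, here&#8217;s a minimal in-memory sketch. Everything in it (<code>TinyQueue</code>, <code>FlakyTweet</code>, the three-attempt limit) is invented for illustration; DelayedJob itself persists jobs to the delayed_jobs table and has its own retry schedule.</p>
<pre class="syntax-highlight:ruby">
# Hypothetical in-memory stand-in for the delayed_jobs table and worker;
# names such as TinyQueue are made up for this sketch.
class TinyQueue
  MAX_ATTEMPTS = 3

  attr_reader :failed

  def initialize
    @jobs = []
    @failed = []
  end

  # Anything that responds to #perform can be a job.
  def enqueue(job)
    @jobs.push([job, 0])
  end

  # Run every queued job once; re-queue failures until MAX_ATTEMPTS is hit.
  # Returns the number of jobs still waiting for another attempt.
  def work_off
    pending, @jobs = @jobs, []
    pending.each do |job, attempts|
      begin
        job.perform
      rescue StandardError => e
        if attempts + 1 >= MAX_ATTEMPTS
          @failed.push([job, e.message])
        else
          @jobs.push([job, attempts + 1])
        end
      end
    end
    @jobs.size
  end
end

# A job that fails twice and then succeeds, like a flaky remote API.
FlakyTweet = Struct.new(:tweet, :log) do
  def perform
    @calls = (@calls || 0) + 1
    raise "remote service unavailable" unless @calls > 2
    log.push(tweet)
  end
end

sent = []
queue = TinyQueue.new
queue.enqueue(FlakyTweet.new("hello world", sent))
remaining = queue.work_off
remaining = queue.work_off while remaining > 0
puts sent.inspect
</pre>
<p>The request that enqueued the job returned long before the third attempt went through, which is the whole point.</p>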
<h3>7. Use memcached</h3>
<p>This should probably be tip #1. Good caching can make or break a project, and memcached is a fantastic method for managing your caching. </p>
<blockquote><p>Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.</p></blockquote>
<p>By default, Rails writes page and fragment cache bits to disk. This is slow, is difficult to clean up after, adds a lot of wear-and-tear to your disk, and is generally undesirable. It&#8217;s used because it&#8217;s easy. Memcached is a far better solution &#8211; it is very much a &#8220;giant hash table in the sky&#8221;. Dump a value into memory, read it back out of memory later. It is extremely fast, and comes with some super dandy features like time-based expiration that disk caching just won&#8217;t get you.</p>
<p>Implementation in Rails is easy. First, install both the memcached daemon and the memcache client. Second, in your environment file, add something like so:</p>
<pre class="syntax-highlight:ruby">
require_library_or_gem &#039;memcache&#039;
config.cache_store = :mem_cache_store, [&quot;localhost:11211&quot;]
</pre>
<p>By default, memcached runs on port 11211. Point Rails at it with the above directives and restart your app and that&#8217;s it. You&#8217;re running on memcached. No more ugly disk sweeping, and you get some really nice features. You can add multiple servers to the :mem_cache_store, too, which is several flavors of awesome. The memcached client will do automatic cluster management and balancing, so you can share the same cache between any number of servers, rather than each server having to have its own copy of that cache. Sweet!</p>
<pre class="syntax-highlight:ruby">
&lt;% cache(&quot;my_custom_fragment_name:#{@record_id}&quot;, :raw =&gt; true, :expires_in =&gt; 1.hour) do %&gt;
	&lt;%=render :partial =&gt; &quot;some_expensive_partial&quot;, :object =&gt; @record %&gt;
&lt;% end %&gt;
</pre>
<p>This is your standard fragment cache, but the <code>:raw</code> and <code>:expires_in</code> parameters are new.</p>
<p><code>:raw</code> tells the Ruby memcached client to not marshal the content before sticking it in memcached. Since you&#8217;re just storing a document fragment (that is, a string), marshaling a ruby string and then unmarshaling it when you want to read it back is both unnecessary and slow.</p>
<p><code>:expires_in</code> sets a maximum lifetime for this fragment. If we generate a fragment, memcached will timestamp it, and then if we try to read it back, say, 90 minutes later, memcached will recognize &#8220;oh hey, this fragment is expired! Sorry, I don&#8217;t have anything for you!&#8221;. Our view will regenerate and re-cache that fragment, and for the next 60 minutes, rather than trying to regenerate that fragment any time that view is called, it&#8217;ll just pull the cached copy from memcached.</p>
<p>If you need to ever flush your cache, it&#8217;s as easy as just restarting memcached. That&#8217;s it, really. In one fell swoop, you get faster caching (yay!), easier cache management (yay!), and a cache that can scale across multiple servers (double yay!)</p>
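<p>To make the <code>:expires_in</code> behavior concrete, here&#8217;s a small in-process model of it. This is only a sketch under my own assumptions: <code>TtlCache</code> is a hypothetical class, the clock is injectable purely so the example can fast-forward time, and a real memcached server handles expiry on its side rather than in your Ruby process.</p>
<pre class="syntax-highlight:ruby">
# Hypothetical in-process model of expires_in; a real memcached server
# tracks expiry itself. The clock is injectable so expiry can be simulated.
class TtlCache
  def initialize(clock = -> { Time.now })
    @clock = clock
    @store = {}
  end

  # Return the cached value if it is still fresh; otherwise run the block,
  # store its result with a new expiry time, and return it.
  def fetch(key, expires_in)
    now = @clock.call
    entry = @store[key]
    if entry
      return entry[:value] if entry[:expires_at] > now
    end
    value = yield
    @store[key] = { value: value, expires_at: now + expires_in }
    value
  end
end

now = Time.at(0)
cache = TtlCache.new(-> { now })
renders = 0
render = -> { renders += 1; "fragment v#{renders}" }

cache.fetch("post:1", 3600) { render.call }   # miss: renders and stores
cache.fetch("post:1", 3600) { render.call }   # hit: cached copy, no render
now += 90 * 60                                # 90 minutes pass
cache.fetch("post:1", 3600) { render.call }   # expired: renders again
puts renders
</pre>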
<h3>8. Use etags</h3>
<p>etags are a nifty little feature that is woefully under-used by most web developers. You can think of them as a fingerprint for a given page. Consider the following process:</p>
<ol>
<li>I request a page for the first time. The app generates the page and sends me both a copy of the page and a small hash fingerprint.</li>
<li>I request the page a second time, and send the fingerprint of my cached copy back to the server.</li>
<li>The server compares the fingerprint I sent with the fingerprint of its latest copy of the page. If they match, it just sends back a <code>304 Not Modified</code> header and stops rendering.</li>
</ol>
<p>Sounds handy, right? Sure, and it&#8217;s really easy to implement in Rails. Let&#8217;s assume you have a <code>BlogController</code> which has a <code>show</code> method for showing a given blog post. You could use the following to implement etags:</p>
<pre class="syntax-highlight:ruby">
def show
	@post = BlogPost.find params[:id]
	@comments = @post.comments.paginate :page =&gt; params[:page], :per_page =&gt; 25
	return unless stale? :etag =&gt; [@post, @comments]
end
</pre>
<p>Wait, that&#8217;s it? Yes, actually! What&#8217;s happening there is that Rails builds a fingerprint of the object(s) you pass as the <code>:etag</code> parameter of the <code>stale?</code> method. If the objects don&#8217;t change, then the etag doesn&#8217;t change. This means that you would get different etags for the same blog post on a different page of comments (good!), or a different etag if a comment is added (good!) or a different etag if the post is edited (good!), but as long as those objects haven&#8217;t changed since the user&#8217;s last request of that action, the etag will be the same, and the action will stop running right there and tell the browser to just display its cached copy.</p>
<p>On heavily-trafficked pages that aren&#8217;t easily customized on a global scale (for example, if you have custom per-user bits on the page that mean that you can&#8217;t serve the same page to everyone), this is a really decent way to prevent excessive and wasteful application work. If you don&#8217;t use the <code>stale?</code> method, Rails always assumes that the page is stale, and thus needs to be regenerated. </p>
<p>On something of a tangent, you can also use <code>stale? :last_modified => @post.updated_at</code> to determine if a page is fresh or stale. However, this does have the drawback of not being compatible with pagination, or sorted views, or anything of that nature. By using etags, you can ensure that each unique data set gets its own etag, and thus, doesn&#8217;t have cache collisions.</p>
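<p>If you&#8217;re curious what the fingerprint dance looks like stripped of the framework, here&#8217;s a rough sketch. The helper names (<code>etag_for</code>, <code>respond</code>) and the version strings are assumptions made up for the example, and this is not how Rails computes its etags internally; it only shows the compare-then-skip-rendering flow from the list above.</p>
<pre class="syntax-highlight:ruby">
require 'digest/md5'

# Hypothetical stand-in for what a framework does with :etag; the key
# strings and return shape here are assumptions, not Rails internals.
def etag_for(*parts)
  '"' + Digest::MD5.hexdigest(parts.join("/")) + '"'
end

# Returns [status, body, etag]. The block (the expensive render) only runs
# when the client's fingerprint does not match the current one.
def respond(if_none_match, *parts)
  etag = etag_for(*parts)
  return [304, nil, etag] if if_none_match == etag
  [200, yield, etag]
end

# First visit: no fingerprint yet, so the page is rendered.
first = respond(nil, "post-1-v1", "comments-page-1") { "rendered page" }
etag = first[2]

# Repeat visit with a matching fingerprint: the block never runs.
repeat = respond(etag, "post-1-v1", "comments-page-1") { raise "should not render" }

# The post was edited, so the fingerprint differs and we render again.
edited = respond(etag, "post-1-v2", "comments-page-1") { "rendered again" }

puts first[0], repeat[0], edited[0]
</pre>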
]]></content:encoded>
			<wfw:commentRss>http://www.coffeepowered.net/2009/12/23/eight-tips-for-getting-the-most-out-of-your-rails-app/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>When you have to store user passwords&#8230;</title>
		<link>http://www.coffeepowered.net/2009/12/15/when-you-have-to-store-user-passwords/</link>
		<comments>http://www.coffeepowered.net/2009/12/15/when-you-have-to-store-user-passwords/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 10:14:42 +0000</pubDate>
		<dc:creator>Chris Heald</dc:creator>
				<category><![CDATA[Rails]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.coffeepowered.net/?p=203</guid>
		<description><![CDATA[Today we got word of yet-another-database-hack-with-plaintext-passwords. This time, it&#8217;s RockYou, purveyor of many of those Facebook and Myspace apps you use. Oops.
Every time this comes up, everyone says &#8220;How naive! They should have been using salted hashed passwords!&#8221; This is true in any case where you don&#8217;t need to use the password again on an [...]]]></description>
			<content:encoded><![CDATA[<p>Today we got word of yet-another-database-hack-with-plaintext-passwords.<a href="http://www.techcrunch.com/2009/12/14/rockyou-hack-security-myspace-facebook-passwords/"> This time, it&#8217;s RockYou</a>, purveyor of many of those Facebook and Myspace apps you use. Oops.</p>
<p>Every time this comes up, everyone says &#8220;How naive! They should have been using salted hashed passwords!&#8221; This is true in any case where you don&#8217;t need to use the password again on an external service. With OAuth solutions becoming more and more popular, the need to collect and store user passwords is fortunately becoming more and more rare. However, it does need to happen sometimes, so how do you take the proper precautions when you do need to?</p>
<p>The first step is to encrypt your data before it is persisted into your database. This is pretty easy to do, and there are a number of methods for it. Here&#8217;s an example of something I used in a Rails app to provide encryption services.</p>
<pre class="syntax-highlight:ruby">
require &#039;openssl&#039;
require &#039;base64&#039;
module Encryption
	class OpenSSL_Key
		PUBLIC_KEY_FILE = &quot;#{RAILS_ROOT}/config/public.pem&quot;
		PRIVATE_KEY_FILE = &quot;#{RAILS_ROOT}/config/private.pem&quot;

		def self.encrypt(data)
			@@public_key ||= OpenSSL::PKey::RSA.new(File.read(PUBLIC_KEY_FILE))
			encrypted_data = @@public_key.public_encrypt(data)
			Base64.encode64(encrypted_data)
		end

		def self.decrypt(data)
			@@private_key ||= OpenSSL::PKey::RSA.new(File.read(PRIVATE_KEY_FILE))
			decoded_data = Base64.decode64(data)
			@@private_key.private_decrypt(decoded_data)
		end
	end

	class OpenSSL_RSA
		IV64 = &quot;xxxxxxxxxxxxxxxxxxxxxxxxxx==\n&quot;
		KEY64 = &quot;xxxxxxxxxxxxxxxxxxxxxxxxxx=\n&quot;
		CIPHER = &#039;aes-256-cbc&#039;

		def self.encrypt(data)
			@@iv ||= Base64.decode64(IV64)
			@@key ||= Base64.decode64(KEY64)

			cipher = OpenSSL::Cipher::Cipher.new(CIPHER)
			cipher.encrypt
			cipher.key = @@key
			cipher.iv = @@iv
			encrypted_data = cipher.update(data)
			encrypted_data &lt;&lt; cipher.final
			Base64.encode64(encrypted_data)
		end

		def self.decrypt(data)
			@@iv ||= Base64.decode64(IV64)
			@@key ||= Base64.decode64(KEY64)

			cipher = OpenSSL::Cipher::Cipher.new(CIPHER)
			cipher.decrypt
			cipher.key = @@key
			cipher.iv = @@iv
			decrypted_data = cipher.update(Base64.decode64(data))
			decrypted_data &lt;&lt; cipher.final
		end
	end
end
</pre>
<p>This provides two classes, <code>Encryption::OpenSSL_Key</code> and <code>Encryption::OpenSSL_RSA</code>, which may be used to encrypt arbitrary strings. The OpenSSL_Key class uses an RSA public/private keypair (in our example, read out of the Rails config directory), and the OpenSSL_RSA class, despite its name, uses symmetric AES-256-CBC with an initialization vector and secret key. The latter is probably easier, since it means you don&#8217;t have to worry about keypairs, and since all the encrypt/decrypt is done locally, there isn&#8217;t any need for public-key encryption.</p>
<p>Once you have that file in your project, using it is pretty simple.</p>
<pre class="syntax-highlight:ruby">
# Our database is going to have a field called encrypted_password. The plaintext password is only ever held in memory.

class MySecretModel &lt; ActiveRecord::Base
	before_save :encrypt_fields
	attr_writer :password

	# Decrypt lazily on first read, unless a new password has just been assigned.
	def password
		@password ||= decrypt_field(:password)
	end

private

	def encrypt_fields
		# Skip re-encryption when no password was assigned or read on this instance.
		write_attribute :encrypted_password, Encryption::OpenSSL_RSA.encrypt(@password) if @password
	end

	def decrypt_field(field)
		Encryption::OpenSSL_RSA.decrypt read_attribute(&quot;encrypted_#{field}&quot;)
	end
end
</pre>
<p>The net result is that we can still get access to the raw password if we need to, but the content in the database will be encrypted (with AES, if you use the OpenSSL_RSA class) against a secret key in our application. This is still vulnerable if the attacker gains access to the file containing your IV/key, or if he gains access to your public/private keypair, but it is extremely resilient in the case that an attacker manages to simply dump your users table via SQL injection. You still need to practice good key management, and you absolutely should not use a technique this simplistic for storing financial data &#8211; there are a whole set of guidelines and procedures for that kind of information. However, for adding an extra layer of defense to save yourself and your customers from excess embarrassment in the case of a database breach, this is a quick, easy, and effective technique for hardening your data.</p>
<p>This is a rather raw implementation, and there are ways you could package it up so that you could transparently apply it to any number of models or fields, but the basic technique is solid. You could even use something like <a href="http://github.com/Mechaferret/sql_crypt">sql_crypt</a> to easily protect sensitive fields. The technology is there, and &#8220;We needed to be able to re-use the password!&#8221; isn&#8217;t an excuse anymore. Stop storing plaintext passwords &#8211; just like backups, it&#8217;s just extra work until you need it, and then you&#8217;ll be glad you put that extra work in.</p>
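<p>If you want to poke at the symmetric round trip outside of Rails, here&#8217;s a standalone sketch. Note that it is a variation on the code above rather than a copy of it: it assumes a randomly generated key and a fresh IV per message (stored in front of the ciphertext), whereas the class above uses one fixed IV. The helper names <code>aes_encrypt</code> and <code>aes_decrypt</code> are made up for the example.</p>
<pre class="syntax-highlight:ruby">
require 'openssl'

# Assumption for this sketch: a random in-process key and a fresh IV per
# message. A real app would load the key from protected configuration.
CIPHER_NAME = "aes-256-cbc"

def aes_encrypt(plaintext, key)
  cipher = OpenSSL::Cipher.new(CIPHER_NAME)
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv                    # generates and sets a new IV
  ciphertext = cipher.update(plaintext) + cipher.final
  [iv + ciphertext].pack("m0")             # Base64, with the IV stored up front
end

def aes_decrypt(encoded, key)
  raw = encoded.unpack1("m0")              # undo the Base64 encoding
  cipher = OpenSSL::Cipher.new(CIPHER_NAME)
  cipher.decrypt
  cipher.key = key
  cipher.iv = raw[0, 16]                   # AES block size is 16 bytes
  cipher.update(raw[16..-1]) + cipher.final
end

key = OpenSSL::Random.random_bytes(32)     # 256-bit key
stored = aes_encrypt("hunter2", key)
puts stored                                # what would land in the database
puts aes_decrypt(stored, key)
</pre>
<p>Because the IV changes every time, encrypting the same password twice gives two different database values, so someone holding a dump can&#8217;t even tell which users share a password.</p>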
]]></content:encoded>
			<wfw:commentRss>http://www.coffeepowered.net/2009/12/15/when-you-have-to-store-user-passwords/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Multibyte string slicing for fun and profit</title>
		<link>http://www.coffeepowered.net/2009/12/06/multibyte-string-slicing-for-fun-and-profit/</link>
		<comments>http://www.coffeepowered.net/2009/12/06/multibyte-string-slicing-for-fun-and-profit/#comments</comments>
		<pubDate>Sun, 06 Dec 2009 20:53:33 +0000</pubDate>
		<dc:creator>Chris Heald</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.coffeepowered.net/?p=200</guid>
		<description><![CDATA[Ran into a small issue in one of my user models. I was using a helper to display a user&#8217;s first name, last initial. It looked something like this:

def display_name(user)
  &#34;#{user.first_name} #{user.last_name.slice(0,1)}&#34;
end

Seems innocent enough, sure. Except&#8230;it doesn&#8217;t work in multibyte character sets. The first Cyrillic speaker to sign up blew that all up. When [...]]]></description>
			<content:encoded><![CDATA[<p>Ran into a small issue in one of my user models. I was using a helper to display a user&#8217;s first name, last initial. It looked something like this:</p>
<pre class="syntax-highlight:ruby">
def display_name(user)
  &quot;#{user.first_name} #{user.last_name.slice(0,1)}&quot;
end
</pre>
<p>Seems innocent enough, sure. Except&#8230;it doesn&#8217;t work in multibyte character sets. The first Cyrillic speaker to sign up blew that all up. When parsing an XML fragment with a name like this included, I was getting the following error: </p>
<pre class="syntax-highlight:ruby">
ActionView::TemplateError: premature end of regular expression: /^\s*Елена\ �/

nokogiri (1.4.0) lib/nokogiri/xml/fragment_handler.rb:53:in `characters&#039;</pre>
<p>The issue, as it turned out, is that String#slice is a bytewise operation in Ruby 1.8, not a character-wise operation like I&#8217;d so naively assumed. The issue is pretty easy to observe:</p>
<pre class="syntax-highlight:ruby">&gt;&gt; &quot;Журинова&quot;.slice(0, 1)
=&gt; &quot;\320&quot;</pre>
<p>Fortunately, Rails has multibyte support baked in already, so it&#8217;s an easy mistake to correct:</p>
<pre class="syntax-highlight:ruby">
def display_name(user)
  &quot;#{user.first_name} #{user.last_name.chars.first}&quot;
end
</pre>
<p>And now&#8230;</p>
<pre class="syntax-highlight:ruby">&gt;&gt; &quot;Журинова&quot;.chars.first
=&gt; &quot;Ж&quot;</pre>
<p>It&#8217;s very easy to make mistakes like this, and many times you may not even realize that they&#8217;re made unless you try to do something funny, like using it as a part of a regex. The safe operation is to never use String#slice or string subscripting on user data, but to instead treat all strings as multibyte strings. Very subtle, but the effects can be pretty nasty if you don&#8217;t.</p>
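<p>For anyone reading this on a newer Ruby: the sketch below assumes Ruby 1.9.3 or later with a UTF-8 source file, where strings carry an encoding and <code>String#slice</code> counts characters. On those versions, <code>byteslice</code> is the way to reproduce the bytewise behavior that bit me, which makes the bytes-versus-characters difference easy to see side by side.</p>
<pre class="syntax-highlight:ruby">
# Assumes Ruby 1.9.3 or later with a UTF-8 source file.
name = "Журинова"

puts name.bytesize                 # bytes: each Cyrillic letter takes two in UTF-8
puts name.length                   # characters

first_byte = name.byteslice(0, 1)  # the old bytewise behavior, on purpose
puts first_byte.valid_encoding?    # a lone lead byte is not valid UTF-8

puts name.slice(0, 1)              # character-aware on 1.9 and later
puts name.chars.first              # same result, one whole character
</pre>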
]]></content:encoded>
			<wfw:commentRss>http://www.coffeepowered.net/2009/12/06/multibyte-string-slicing-for-fun-and-profit/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>System date considered important</title>
		<link>http://www.coffeepowered.net/2009/12/05/system-date-considered-important/</link>
		<comments>http://www.coffeepowered.net/2009/12/05/system-date-considered-important/#comments</comments>
		<pubDate>Sat, 05 Dec 2009 21:38:10 +0000</pubDate>
		<dc:creator>Chris Heald</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.coffeepowered.net/?p=193</guid>
		<description><![CDATA[I&#8217;ve been slamming my head against the wall for the past two hours. I had an OAuth connection to a remote service working just dandy in development, but as soon as I tried to use that exact same code with the exact same config and exact same gems in production&#8230;I was getting &#8220;401 unauthorized&#8221; errors [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been slamming my head against the wall for the past two hours. I had an OAuth connection to a remote service working just dandy in development, but as soon as I tried to use that exact same code with the exact same config and exact same gems in production&#8230;I was getting &#8220;401 unauthorized&#8221; errors back from the remote service when attempting to get a request token.</p>
<p>After an extremely tedious series of debugger checks to make sure my OAuth signature was right, I decided to just edit the oauth gem on my production box and add a little debugging statement to dump the HTTP request to stdout. What I found was&#8230;surprising.</p>
<pre class="syntax-highlight:ruby">
&gt;&gt; OAuthConsumers::Netflix.new.consumer.get_request_token
opening connection to api.netflix.com...
opened
&lt;- &quot;POST /oauth/request_token HTTP/1.1\r\nAccept: */*\r\nConnection: close\r\nUser-Agent: OAuth gem v0.3.6\r\nAuthorization: OAuth oauth_nonce=\&quot;E73sq4XMkG547EbuCB9GUfG4AtsjD2QFySwLPKj0tI\&quot;, oauth_callback=\&quot;oob\&quot;, oauth_signature_method=\&quot;HMAC-SHA1\&quot;, oauth_timestamp=\&quot;1260049119\&quot;, oauth_consumer_key=\&quot;xxxxxxxxxxxxx\&quot;, oauth_signature=\&quot;QD5b5Oy8LFLvXWl%2B3R%2BQI0xlIcg%3D\&quot;, oauth_version=\&quot;1.0\&quot;\r\nContent-Length: 0\r\nHost: api.netflix.com\r\n\r\n&quot;
-&gt; &quot;HTTP/1.1 401 Unauthorized\r\n&quot;
-&gt; &quot;X-Lighty-Magnet-Uri-Path: /oauth/request_token\r\n&quot;
-&gt; &quot;X-Mashery-Responder: proxyworker-i-e23bae8a.mashery.com\r\n&quot;
-&gt; &quot;X-Mashery-Error-Code: ERR_401_TIMESTAMP_IS_INVALID\r\n&quot;
-&gt; &quot;Content-Type: text/plain\r\n&quot;
-&gt; &quot;Accept-Ranges: bytes\r\n&quot;
-&gt; &quot;Content-Length: 20\r\n&quot;
-&gt; &quot;Date: Sat, 05 Dec 2009 21:27:08 GMT\r\n&quot;
-&gt; &quot;Server: Mashery Proxy\r\n&quot;
-&gt; &quot;\r\n&quot;
reading 20 bytes...
-&gt; &quot;Timestamp Is Invalid&quot;
read 20 bytes
Conn close
OAuth::Unauthorized: 401 Unauthorized
        from /opt/ruby-enterprise-1.8.7-2009.10/lib/ruby/gems/1.8/gems/oauth-0.3.6/lib/oauth/consumer.rb:200:in `token_request&#039;
        from /opt/ruby-enterprise-1.8.7-2009.10/lib/ruby/gems/1.8/gems/oauth-0.3.6/lib/oauth/consumer.rb:128:in `get_request_token&#039;
        from (irb):1
</pre>
<p>Whoa there, there&#8217;s some info that the OAuth gem wasn&#8217;t giving back to me. &#8220;Timestamp is invalid.&#8221; Well then, a quick check of system time, and&#8230;oh, hey, it turns out that my system has drifted to about 10 minutes fast. Easily corrected, at least.</p>
<pre class="syntax-highlight:ruby"># ntpdate -b 0.centos.pool.ntp.org &amp;&amp; service ntpd start</pre>
<p>With that all done&#8230;</p>
<pre class="syntax-highlight:ruby">&gt;&gt; OAuthConsumers::Netflix.new.consumer.get_request_token
opening connection to api.netflix.com...
opened
&lt;- &quot;POST /oauth/request_token HTTP/1.1\r\nAccept: */*\r\nConnection: close\r\nUser-Agent: OAuth gem v0.3.6\r\nAuthorization: OAuth oauth_nonce=\&quot;YIh5R3CBtAicneNREF5ZUcX80kao1zqRLLA5u8bQWA\&quot;, oauth_callback=\&quot;oob\&quot;, oauth_signature_method=\&quot;HMAC-SHA1\&quot;, oauth_timestamp=\&quot;1260048573\&quot;, oauth_consumer_key=\&quot;ksfa9rxmb8dzkxg4npwr74zv\&quot;, oauth_signature=\&quot;%2B%2Fyd5sRsJ7qmmZWNRqSlCvByYxw%3D\&quot;, oauth_version=\&quot;1.0\&quot;\r\nContent-Length: 0\r\nHost: api.netflix.com\r\n\r\n&quot;
-&gt; &quot;HTTP/1.1 200 OK\r\n&quot;
-&gt; &quot;X-Lighty-Magnet-Uri-Path: /oauth/request_token\r\n&quot;
-&gt; &quot;X-Mashery-Responder: proxyworker-i-7c31a414.mashery.com\r\n&quot;
-&gt; &quot;Content-Type: text/plain\r\n&quot;
-&gt; &quot;Server: Mashery_Server_Adapter_Query\r\n&quot;
-&gt; &quot;Date: Sat, 05 Dec 2009 21:29:32 GMT\r\n&quot;
-&gt; &quot;Accept-Ranges: bytes\r\n&quot;
-&gt; &quot;Content-Length: 194\r\n&quot;
-&gt; &quot;\r\n&quot;
reading 194 bytes...
-&gt; &quot;oauth_token=xxxxxxxx&amp;oauth_token_secret=xxxxxxxxx&amp;application_name=xxxxx&amp;login_url=https%3A%2F%2Fapi-user.netflix.com%2Foauth%2Flogin%3Foauth_token%3Dczjsmzw74nk2wy274g6drmwt&quot;
read 194 bytes
Conn close</pre>
<p>All better. Keep those datetimes synched, sports fans. Web services are becoming more and more interconnected, and if there&#8217;s one thing I&#8217;ve learned from heist movies, it&#8217;s that the first step in any successful job is to make sure your watches are synchronized. Nobody likes that guy who shows up 10 minutes late to everything!</p>
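<p>In hindsight, the drift was sitting right there in the dump: the request carries an <code>oauth_timestamp</code> and the response carries a <code>Date</code> header. Here&#8217;s a small sketch that compares the two, using the values from the failing request above. The <code>clock_skew</code> helper and the 300-second <code>MAX_SKEW</code> tolerance are my own assumptions for illustration; I don&#8217;t know what window the remote service actually enforces.</p>
<pre class="syntax-highlight:ruby">
require 'time'

# MAX_SKEW is an assumed tolerance for illustration only; the remote
# service's real limit is not known here.
MAX_SKEW = 300

# Seconds by which the local clock (as sent in oauth_timestamp) is ahead
# of the server clock (as reported in the HTTP Date header).
def clock_skew(oauth_timestamp, date_header)
  oauth_timestamp - Time.httpdate(date_header).to_i
end

# Values copied from the failing request and its response.
skew = clock_skew(1260049119, "Sat, 05 Dec 2009 21:27:08 GMT")
puts "local clock is #{skew} seconds ahead of the server"
puts "timestamp is likely to be rejected" if skew.abs > MAX_SKEW
</pre>
<p>Run the same comparison against the successful request and the two clocks agree to within a second or so.</p>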
]]></content:encoded>
			<wfw:commentRss>http://www.coffeepowered.net/2009/12/05/system-date-considered-important/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sweet-ass performance hacks: better_assets</title>
		<link>http://www.coffeepowered.net/2009/07/29/sweet-ass-performance-hacks-better_assets/</link>
		<comments>http://www.coffeepowered.net/2009/07/29/sweet-ass-performance-hacks-better_assets/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 10:29:18 +0000</pubDate>
		<dc:creator>Chris Heald</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.coffeepowered.net/?p=183</guid>
		<description><![CDATA[HTTP overhead is expensive. DNS lookups are expensive. Start dropping a bunch of Twitter widgets, Google ads, and GetSatisfaction buttons into your killer new Web 2.0 social networking site and you&#8217;ll find that your painstakingly-optimized site has slowed to a crawl while the server sits there waiting on Amazon S3 to get its act together [...]]]></description>
			<content:encoded><![CDATA[<p>HTTP overhead is expensive. DNS lookups are expensive. Start dropping a bunch of Twitter widgets, Google ads, and GetSatisfaction buttons into your killer new Web 2.0 social networking site and you&#8217;ll find that your painstakingly-optimized site has slowed to a crawl while the server sits there waiting on Amazon S3 to get its act together and serve you a 300-byte CSS file.</p>
<p>That sorta blows. Let&#8217;s not do it.</p>
<h2>Introducing better_assets</h2>
<p><a href="http://github.com/cheald/better_assets/tree/master">better_assets</a> is a monkeypatch to the Rails 2.3.2 AssetTagHelper to enable some additional functionality. The key points are:</p>
<p>* Time-based expiry of cached asset files, which is primarily useful for&#8230;<br />
* Caching and combining of remote assets<br />
* Finally, you can post-process combined assets with blocks passed to <code>javascript_include_tag</code> and <code>stylesheet_link_tag</code>.</p>
<h3>Examples</h3>
<p>It&#8217;s easy. You use it just like normal:</p>
<pre class="syntax-highlight:ruby">&lt;%=javascript_include_tag(
  &quot;jquery-1.3.2&quot;,
  &quot;foo&quot;,
  :cache =&gt; &quot;all&quot;) {|text| Packr.pack(text, :base62 =&gt; true) } %&gt;</pre>
<p>Whoa! What is this block madness? Why, that&#8217;s an extension to allow you to do whatever you want. In this example, we&#8217;re using <a href="http://blog.jcoglan.com/2009/02/22/packr-31-improved-compression-and-private-variable-support/">jcoglan&#8217;s Packr library</a> to automatically pack our generated Javascript. This can result in filesize being reduced by pretty massive amounts, and will result in appreciable performance benefits.</p>
<p>Well, that&#8217;s all fine and dandy, but it&#8217;s not my combined Javascript that&#8217;s killing me, it&#8217;s all those pesky DNS lookups for all my widget code and CSS. Never fear, you&#8217;re covered there, too.</p>
<pre class="syntax-highlight:ruby">
&lt;%=javascript_include_tag(
  &quot;http://rpxnow.com/openid/v2/widget&quot;,
  &quot;http://partner.googleadservices.com/gampad/google_service.js&quot;,
  &quot;http://s3.amazonaws.com/getsatisfaction.com/feedback/feedback.js&quot;,
  &quot;http://blippr.tags.crwdcntrl.net/cc.js&quot;,
  :cache =&gt; &quot;remote&quot;, :lifetime =&gt; 12.hours) %&gt;
</pre>
<p>Madness! Sheer madness! All those remote Javascript files are sucked down, combined, and cached as &#8220;remote.js&#8221;. It&#8217;ll automatically expire after 12 hours, and be re-cached after that. That way, you can get all the performance benefits of serving a single combined JS file without having to stress out that someone over at WidgetHeadquarters is going to change a piece of code and completely screw you over until you notice that your local Javascript file doesn&#8217;t match theirs six weeks later.</p>
<p>This, oddly enough, works for CSS files, too.</p>
<pre class="syntax-highlight:ruby">
&lt;%=stylesheet_link_tag(
  &quot;http://s3.amazonaws.com/getsatisfaction.com/feedback/feedback.css&quot;,
  &quot;http://s3.amazonaws.com/getsatisfaction.com/feedback/widget.css&quot;,
  :cache =&gt; &quot;remote&quot;, :lifetime =&gt; 12.hours
) %&gt;
</pre>
<p>No more stalling out at requests to Amazon&#8217;s S3 for CSS files! No more extraneous DNS requests or HTTP connections! No fuss, no muss, no headaches for you or your users.</p>
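<p>The time-based expiry behind <code>:lifetime</code> can be pictured as a simple file-age check. This is a sketch of the idea only: <code>cache_expired?</code> is a hypothetical helper written for this example, not a method the plugin exposes.</p>
<pre class="syntax-highlight:ruby">
require 'tempfile'

# Hypothetical helper, not better_assets' own code: a cached file is stale
# when it is missing or older than the allowed lifetime (in seconds).
def cache_expired?(path, lifetime, now = Time.now)
  return true unless File.exist?(path)
  now - File.mtime(path) > lifetime
end

Tempfile.create("remote.js") do |file|
  twelve_hours = 12 * 60 * 60
  # Just written, so still fresh.
  puts cache_expired?(file.path, twelve_hours)
  # Pretend a bit more than twelve hours have gone by.
  puts cache_expired?(file.path, twelve_hours, Time.now + twelve_hours + 60)
end
</pre>
<p>When the check comes back true, the remote files get fetched and combined again; otherwise the combined copy on disk is served as-is.</p>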
<p>All this, and it makes crispy bacon, too.*</p>
<p>To get it, just&#8230;wait for it. Very complex procedure ahead:</p>
<pre class="syntax-highlight:ruby">
script/plugin install git://github.com/cheald/better_assets.git
</pre>
<p>Restart your app, and that&#8217;s it. Your assets are now approximately 163% more awesome, while being leaner and looking better in that fabulous summer swimsuit at the same time.</p>
<p>Score.</p>
<p>* Not really.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coffeepowered.net/2009/07/29/sweet-ass-performance-hacks-better_assets/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Fine tuning your garbage collector</title>
		<link>http://www.coffeepowered.net/2009/06/13/fine-tuning-your-garbage-collector/</link>
		<comments>http://www.coffeepowered.net/2009/06/13/fine-tuning-your-garbage-collector/#comments</comments>
		<pubDate>Sun, 14 Jun 2009 02:52:13 +0000</pubDate>
		<dc:creator>Chris Heald</dc:creator>
				<category><![CDATA[Rails]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.coffeepowered.net/?p=173</guid>
		<description><![CDATA[If you&#8217;re familiar with Ruby at all, you know that it can be a little wacky when it comes to memory usage. Most of us have observed a Mongrel/Passenger instance that starts out small and then grows by leaps and bounds, eventually settling on some uncomfortably high number. We&#8217;re going to fix that with Ruby [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re familiar with Ruby at all, you know that it can be a little wacky when it comes to memory usage. Most of us have observed a Mongrel/Passenger instance that starts out small and then grows by leaps and bounds, eventually settling on some uncomfortably high number. We&#8217;re going to fix that with <a href="http://www.rubyenterpriseedition.com/">Ruby Enterprise Edition</a> and <a href="http://github.com/cheald/scrap/tree/master">Scrap</a>.<br />
<span id="more-173"></span><br />
The Ruby garbage collector&#8217;s behavior is controlled by a number of constants. In the MRI, these are compiled into Ruby itself, and don&#8217;t change. However, if you&#8217;re using REE you can override them with environment variables on startup. It&#8217;s terribly handy.</p>
<h3>First, the boring documentation</h3>
<p>All the juicy information is available <a href="http://www.rubyenterpriseedition.com/documentation.html#_garbage_collector_performance_tuning">in the documentation</a>, but I&#8217;m going to just go over the key points real quick.</p>
<p><code>RUBY_HEAP_MIN_SLOTS</code>: This is the number of &#8220;heap slots&#8221; that each Ruby instance starts up with. One heap slot can hold one Ruby object. By default, this is 10,000. By controlling this value, we can get our apps to stabilize very quickly. More on this later.</p>
<p><code>RUBY_HEAP_SLOTS_INCREMENT</code>: Once Ruby has allocated <code>RUBY_HEAP_MIN_SLOTS</code> objects on its first heap, it will have to allocate a second heap to make room for more. This variable controls the size of this second heap, and sets the baseline for future heaps, as well.</p>
<p><code>RUBY_HEAP_SLOTS_GROWTH_FACTOR</code>: For heaps #3 and onward, Ruby uses <code>RUBY_HEAP_SLOTS_INCREMENT</code> and this value to determine the size to allocate for the new heap. By default, this is 1.8, meaning that your third heap will end up with 10,000 * 1.8 = 18,000 slots in it.</p>
<p><code>RUBY_HEAP_FREE_MIN</code>: After each garbage collection run, if the number of free slots is less than <code>RUBY_HEAP_FREE_MIN</code>, a new heap will be allocated. The default is 4096.</p>
<p>So, let&#8217;s look at this practically. Presume that we have a Rails process that is going to require 50,000 Ruby objects before it&#8217;s fully initialized. The allocation process, when at defaults, will look something like this:</p>
<p>Allocate 10,000 slots (10,000 total available)<br />
Allocate 10,000 slots (20,000 total available)<br />
Allocate 18,000 slots (38,000 total available)<br />
Allocate 68,400 slots (106,400 total available)</p>
<p>So, we end up with about 53% of our slots sitting unused (more than double what we actually needed), and it took us four heap allocations to even boot the process. Surely we can do better.</p>
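<p>If you want to check that arithmetic yourself, here&#8217;s a quick sketch. It just adds up the heap sizes listed above and compares them to the 50,000 objects we said we needed; the <code>heap_summary</code> helper is a name I made up for illustration, and <code>round(1)</code> assumes a reasonably modern Ruby.</p>

```ruby
# Heap sizes from the walkthrough above, in allocation order.
heaps = [10_000, 10_000, 18_000, 68_400]
needed = 50_000 # objects the hypothetical Rails process needs at boot

# Hypothetical helper: totals the slots and reports how many go unused.
def heap_summary(heaps, needed)
  total = heaps.inject(0) { |sum, slots| sum + slots }
  unused = total - needed
  { :total => total, :unused => unused, :unused_pct => (unused * 100.0 / total).round(1) }
end

summary = heap_summary(heaps, needed)
puts "#{summary[:total]} slots, #{summary[:unused]} unused (#{summary[:unused_pct]}%)"
```

<p>Swap in your own heap sizes and object count to see how much slack a given set of allocations leaves behind.</p>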
<h3>Enter Scrap.</h3>
<p><a href="http://github.com/cheald/scrap/tree/master">Scrap</a> is a little <a href="http://weblog.rubyonrails.org/2008/12/17/introducing-rails-metal">Metal</a> handler I wrote for tracking memory usage and garbage statistics over an instance&#8217;s lifetime. Installing it is trivial &#8211; just drop it into your vendor directory, restart your app, and navigate to <code>http://yoururl.com/stats/scrap</code>.</p>
<p>With this in hand, we can peek at our memory usage and see what we can see.</p>
<p>There are some stats at the top, but for our purposes, we&#8217;re interested in the per-request garbage statistics. The newest request is near the top of the file, and the oldest request is at the bottom of the file. The last 50 requests are tracked. Each request looks something like this:</p>
<pre><code>
[71.92 MB] GET /apps/176568-WordPress

Number of objects    : 817571 (658305 AST nodes, 80.52%)
Heap slot size       : 20
GC cycles so far     : 503
Number of heaps      : 7
Total size of objects: 15968.18 KB
Total size of heaps  : 18036.81 KB (2068.63 KB = 11.47% unused)
Leading free slots   : 27104 (529.38 KB = 2.93%)
Trailing free slots  : 1 (0.02 KB = 0.00%)
Number of contiguous groups of 16 slots: 2829 (4.90%)
Number of terminal objects: 4307 (0.47%)
</code></pre>
<p>Key points here for the time being are <code>Number of objects</code> and <code>Number of heaps</code>. When we look at the number of objects &#8211; in this case, about 817,000 &#8211; it&#8217;s obvious that we&#8217;re going to have to allocate a number of heaps to handle all those objects. Rails&#8217; boot-up cost is fairly significant, and the default Ruby settings just really don&#8217;t cut it here. As you can see, we&#8217;ve allocated 7 heaps, and we&#8217;re using 15.9 of 18.0 MB allocated to the heap. Once a heap is allocated, it&#8217;s never de-allocated, so we&#8217;re perma-stuck at 18 MB of heap usage. Note that this isn&#8217;t the size of all the data in the program &#8211; just the space allocated for objects. A string that contains 100MB of data will only consume 20 bytes on the heap (that&#8217;s the &#8220;heap slot size&#8221;: the amount of memory each object on the heap consumes).</p>
<p>However, what if we could just allocate the whole startup cost in the initial heap, and save ourselves the problems of having to reallocate so often?</p>
<p>We note that we have 891k slots allocated, so we can guesstimate at a number to set our initial allocation to. In my production app, I set mine to 1,250,000 &#8211; I was observing peaks around the 1,100,000 mark, and just increased it by 10% and rounded up.</p>
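<p>Here&#8217;s a rough sketch of that guesstimate in Ruby. It turns Scrap&#8217;s &#8220;Total size of heaps&#8221; figure into a slot count using the 20-byte slot size, then pads an observed peak by 10% and rounds up. The helper names and the 250,000 rounding step are my own assumptions for illustration, not anything Scrap or REE provides.</p>

```ruby
SLOT_SIZE_BYTES = 20 # "Heap slot size" as reported by Scrap

# Convert Scrap's "Total size of heaps" figure (in KB) into a slot count.
def slots_for(heap_kb)
  (heap_kb * 1024 / SLOT_SIZE_BYTES).round
end

# Add headroom to an observed peak, then round up to a tidy step.
# The 250,000 step is an assumption chosen for illustration.
def suggested_min_slots(peak_slots, headroom = 0.10, step = 250_000)
  padded = peak_slots * (1 + headroom)
  (padded / step).ceil * step
end

puts slots_for(24414.08)            # a heap of roughly 1.25 million slots
puts suggested_min_slots(1_100_000) # a peak of 1.1 million objects, padded and rounded up
```

<p>Feed it your own peak from Scrap and pick a step that makes sense for the size of your app.</p>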
<p>So, my first custom environment variable is </p>
<p><code>RUBY_HEAP_MIN_SLOTS=1250000</code></p>
<p>And it results in something like this on the app&#8217;s first boot:</p>
<pre><code>
[137.99 MB] GET /movies/7505-Star-Wars-Episode-V-The-Empire-Strikes-Back

Number of objects    : 933037 (664785 AST nodes, 71.25%)
Heap slot size       : 20
GC cycles so far     : 12
Number of heaps      : 1
Total size of objects: 18223.38 KB
Total size of heaps  : 24414.08 KB (6190.70 KB = 25.36% unused)
Leading free slots   : 316963 (6190.68 KB = 25.36%)
Trailing free slots  : 0 (0.00 KB = 0.00%)
Number of contiguous groups of 16 slots: 19810 (25.36%)
Number of terminal objects: 25941 (2.08%)</code></pre>
<p>Yowza, a full 25% of my heap is unused after boot. But&#8230;well, that&#8217;s okay. We&#8217;ve only allocated 1 heap, and later on, my object allocation grows to around 1,100,000. This is still 150k under the heap size, and I&#8217;ve set <code>RUBY_HEAP_FREE_MIN=12500</code> (1% of the initial size), so if I have fewer than 12,500 heap slots free after a GC cycle, a new heap will be allocated. Stabilizing there means that I end up with 1 heap for the lifetime of my app, and I end up sitting comfortably under the threshold that&#8217;d cause a new heap to be born. If I have a leak, or a super heavy action or something, though, that might kick me over my limit and require a new heap. So, we come to&#8230;</p>
<p><code>RUBY_HEAP_SLOTS_INCREMENT=100000</code></p>
<p>This value says &#8220;Hey, if you have to allocate a second heap, start with this many slots&#8221;. If we go over our limit of 1.25 million slots, we&#8217;ll allocate a second heap that&#8217;s about 8% the size of the original. That seems awfully small, but consider that we&#8217;re hoping to never get to that heap.</p>
<p>Should we end up using that entire second heap, then we have to worry about our third setting, <code>RUBY_HEAP_SLOTS_GROWTH_FACTOR=1</code>. This says &#8220;Each new heap should be 1.0 times as large as the previous heap.&#8221; In this case, it means I&#8217;ll keep allocating 100k-slot heaps until the cows come home. In an untuned environment, this could be bad &#8211; we would either end up having to do a <em>ton</em> of allocations to get to our target, or we would overallocate very badly. However, because we know our app&#8217;s memory requirements, and know about where we want it to end up, a relatively small, linear growth factor is just what the doctor ordered here.</p>
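<p>To see what that looks like with the tuned settings, here&#8217;s a small simulation. Fair warning: it&#8217;s a simplified model built from the descriptions in this post (first heap gets the minimum, second heap gets the increment, each later heap is the growth factor times the one before it), not something lifted from REE&#8217;s source, and the function names are made up.</p>

```ruby
# Simplified model of heap growth as described in this post (not REE source):
# heap 1 has min_slots, heap 2 has increment slots, and each later heap is
# growth_factor times the size of the one before it.
def heap_sizes(min_slots, increment, growth_factor, count)
  sizes = [min_slots]
  next_size = increment
  (count - 1).times do
    sizes.push(next_size.round)
    next_size *= growth_factor
  end
  sizes
end

# How many heaps does this model need before it can hold `objects` objects?
def heaps_needed(objects, min_slots, increment, growth_factor)
  count = 1
  until heap_sizes(min_slots, increment, growth_factor, count).inject(0) { |s, n| s + n } >= objects
    count += 1
  end
  count
end

p heap_sizes(1_250_000, 100_000, 1, 4)
p heaps_needed(1_100_000, 1_250_000, 100_000, 1) # the usual peak
p heaps_needed(1_500_000, 1_250_000, 100_000, 1) # a leak or a very heavy action
```

<p>Under this model, the usual peak fits in the first heap, and even a runaway request only adds small, predictable chunks rather than one giant heap.</p>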
<h3>Okay, now what?</h3>
<p>So, we have a collection of settings with which to run our app. Great! Now, how do we use it?</p>
<p>Fortunately, it&#8217;s easy.</p>
<pre><code>
pushd `which ruby | xargs dirname`
sudo vim ruby-with-env
</code></pre>
<p>We&#8217;re going to create a little bash script with the following:</p>
<pre class="syntax-highlight:ruby">
#!/bin/bash
export RUBY_HEAP_MIN_SLOTS=1250000
export RUBY_HEAP_SLOTS_INCREMENT=100000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
export RUBY_GC_MALLOC_LIMIT=30000000
export RUBY_HEAP_FREE_MIN=12500
exec &quot;/opt/ree/bin/ruby&quot; &quot;$@&quot;
</pre>
<p>Note that last line &#8211; the path will have to match the path to your Ruby executable, which, fortunately, should be in the directory that you&#8217;re in.</p>
<p>Save it, don&#8217;t forget to <code>chmod a+x ruby-with-env</code>, and then edit your Apache or nginx configuration.</p>
<p>Under nginx, you&#8217;ll have a line like this:</p>
<p><code>passenger_ruby /opt/ruby-enterprise-1.8.6-20090610/bin/ruby;</code></p>
<p>Just change it to use your new wrapper script, like so:</p>
<p><code>passenger_ruby /opt/ruby-enterprise-1.8.6-20090610/bin/ruby-with-env;</code></p>
<p>The process is similarly easy for Apache &#8211; the line you need is something like:</p>
<p><code>PassengerRuby /opt/ruby-enterprise-1.8.6-20090610/bin/ruby</code></p>
<p>It might be in either your <code>httpd.conf</code> or <code>conf.d/passenger.conf</code>.</p>
<p>Once you&#8217;re all edited up, restart your webserver, and congratulations, you&#8217;ve got a fine-tuned garbage collector humming along with your app.</p>
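<p>If you&#8217;d like a sanity check that the wrapper is actually handing those variables to Ruby, a tiny script like this will do. It only confirms that the environment variables are visible to the process; it can&#8217;t tell you whether your interpreter honors them. The <code>missing_gc_vars</code> helper and the file name below are hypothetical.</p>

```ruby
# The five variables exported by the ruby-with-env wrapper above.
GC_VARS = %w[
  RUBY_HEAP_MIN_SLOTS
  RUBY_HEAP_SLOTS_INCREMENT
  RUBY_HEAP_SLOTS_GROWTH_FACTOR
  RUBY_GC_MALLOC_LIMIT
  RUBY_HEAP_FREE_MIN
]

# Returns the names of any GC variables absent from the given environment.
def missing_gc_vars(env = ENV)
  GC_VARS.reject { |name| env.key?(name) }
end

missing = missing_gc_vars
if missing.empty?
  GC_VARS.each { |name| puts "#{name}=#{ENV[name]}" }
else
  puts "Not set: #{missing.join(', ')}"
end
```

<p>Save it as something like <code>check_gc_env.rb</code> and run it with <code>./ruby-with-env check_gc_env.rb</code>. If anything shows up as not set, the wrapper isn&#8217;t being used or has a typo in it.</p>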
<h3>Taking out the garbage</h3>
<p>&#8220;But Chris!&#8221;, you say, &#8220;There&#8217;s a variable in there that you didn&#8217;t talk about! What gives?&#8221; You are indeed correct, astute reader. We&#8217;ve thus far avoided the <code>RUBY_GC_MALLOC_LIMIT</code> variable. This is a handy little setting that lets you tell Ruby how often to clean up after itself. Ruby is written in C, and C uses <code>malloc</code> to allocate memory. Ruby keeps a running tally of how many bytes it has requested through malloc, and it runs its garbage collector once that tally passes this limit. I haven&#8217;t found a great way to tune this one yet, except via experimentation, but here&#8217;s what to know about it:</p>
<ol>
<li>The lower this value is, the more often your garbage collector runs. Garbage collection is slow. Garbage collection is painfully slow. If a user is waiting on garbage collection, they are going to become impatient. You want as few users waiting on garbage collection as possible.</li>
<li>The higher this value is, the more memory Ruby will allocate before it tries to clean up after itself. If this value is too high, you&#8217;ll have dead objects hanging around eating up heap space, and possibly causing Ruby to crap itself and allocate a new heap. This is bad.</li>
<li>To tune this value, you want to find the happy medium, wherein you stabilize under your initial heap allocation value, but with as few garbage collection passes as possible. Read up on <a href="http://blog.evanweaver.com/articles/2009/04/09/ruby-gc-tuning/">Evan Weaver&#8217;s blog</a> for some more in-depth analysis of what garbage collection frequency tuning can do to your app&#8217;s performance.</li>
<li>If you have excess memory and want a faster app, err on the side of this being too high. If you are on a tight memory budget, and would prefer slower actions in exchange for not blowing your heap and allocating a whole new one, err on the side of this being too low.</li>
<li>Recommended values for this are all over the board. Evan recommends a setting of 50 million. I&#8217;m using a setting of 30 million. The Ruby default is 8 million. You&#8217;ll have to play around and find what works best for you. Just pay attention to how many requests there are in between that &#8220;GC cycles so far&#8221; number incrementing in Scrap, and you&#8217;ll be able to measure approximately how often you&#8217;re entering a GC cycle.</li>
</ol>
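<p>That last measurement is easy enough to rough out in code. This sketch takes the &#8220;GC cycles so far&#8221; value from a run of consecutive requests (oldest first, so flip Scrap&#8217;s newest-first listing) and works out the average number of requests per GC cycle. The <code>requests_per_gc</code> name and the sample numbers are made up for illustration.</p>

```ruby
# gc_counts holds the "GC cycles so far" value from consecutive requests,
# oldest first. Returns the average number of requests per GC cycle, or nil
# if there is not enough data or no GC cycle happened in the window.
def requests_per_gc(gc_counts)
  return nil unless gc_counts.length > 1
  cycles = gc_counts.last - gc_counts.first
  return nil if cycles.zero?
  (gc_counts.length - 1).to_f / cycles
end

# Made-up sample: seven requests, during which the counter went from 503 to 505.
sample = [503, 503, 503, 504, 504, 504, 505]
puts requests_per_gc(sample)
```

<p>If that number climbs after you raise <code>RUBY_GC_MALLOC_LIMIT</code>, fewer of your users are sitting around waiting on the garbage collector.</p>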
<p>Good luck with it, and have fun!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coffeepowered.net/2009/06/13/fine-tuning-your-garbage-collector/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
