.

Coffee Powered

code and content

Serving files out of GridFS

GridFS is a nifty little feature in MongoDB that allows you to store files of all shapes and sizes in Mongo itself, getting the benefits of Mongo’s sharding and replication. However, since they’re in a database, and not on the filesystem directly, how do we serve them?

There are lots of benchmarks and numbers under the cut. Keep reading!

Right now, there are three options:

  1. Use a “low-level” script handler, like a Rack script or Rails Metal handler to serve them out of the database
  2. Use something like gridfs-fuse to mount the database as a filesystem, and read it with the Fileserver directly
  3. Use something like nginx-gridfs to talk directly to MongoDB from your webserver.

I wasn’t able to get gridfs-fuse to build on my system, but I was able to build the nginx module. The question, of course, is how fast are you going be serving files with each solution?

Filesystem read through Apache

First, I’ll establish a baseline. I’m running Apache as my frontend server, and we’ll use ab to benchmark its throughput.

[chris@polaris conf]# ab -n 50000 -c 10 http://advice/images/embed/alliance-60.png

Server Software:        Apache/2.2.13
Server Hostname:        advice
Server Port:            80

Document Path:          /images/embed/normal_alliance-60.png
Document Length:        31596 bytes

Concurrency Level:      10
Time taken for tests:   1.904 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      159463760 bytes
HTML transferred:       158043192 bytes
Requests per second:    2625.37 [#/sec] (mean)
Time per request:       3.809 [ms] (mean)
Time per request:       0.381 [ms] (mean, across all concurrent requests)
Transfer rate:          81767.87 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.4      1       4
Processing:     1    3   0.5      3       6
Waiting:        0    1   0.4      1       4
Total:          2    4   0.4      4       8

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      4
  95%      4
  98%      5
  99%      5
 100%      8 (longest request)

Nice and fast, like like we’d expect.

Filesystem read through nginx

[chris@polaris conf]# ab -n 50000 -c 10 http://advice:81/images/embed/normal_alliance-60.png

Server Software:        nginx/0.8.33
Server Hostname:        advice
Server Port:            81

Document Path:          /images/embed/normal_alliance-60.png
Document Length:        31596 bytes

Concurrency Level:      10
Time taken for tests:   7.623 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      1590513618 bytes
HTML transferred:       1579863192 bytes
Requests per second:    6559.31 [#/sec] (mean)
Time per request:       1.525 [ms] (mean)
Time per request:       0.152 [ms] (mean, across all concurrent requests)
Transfer rate:          203763.10 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       9
Processing:     1    1   0.4      1      11
Waiting:        0    0   0.1      0       9
Total:          1    1   0.5      1      12

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      2
  90%      2
  95%      2
  98%      3
  99%      3
 100%     12 (longest request)

nginx screams. At 6500 requests/sec, it’s blisteringly fast.

GridFS read through nginx-gridfs

[chris@polaris conf]# ab -n 5000 -c 10 http://advice:81/images/gfs/uploads/user/avatar/4b7b2c0e98db7475fc000003/normal_alliance-60.png

Server Software:        nginx/0.8.33
Server Hostname:        advice
Server Port:            81

Document Path:          /images/gfs/uploads/user/avatar/4b7b2c0e98db7475fc000003/normal_alliance-60.png
Document Length:        31596 bytes

Concurrency Level:      10
Time taken for tests:   4.613 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      158580000 bytes
HTML transferred:       157980000 bytes
Requests per second:    1083.88 [#/sec] (mean)
Time per request:       9.226 [ms] (mean)
Time per request:       0.923 [ms] (mean, across all concurrent requests)
Transfer rate:          33570.65 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:     1    9   4.7      9     103
Waiting:        1    9   4.7      9     102
Total:          2    9   4.7      9     103

Percentage of the requests served within a certain time (ms)
  50%      9
  66%      9
  75%      9
  80%      9
  90%      9
  95%      9
  98%      9
  99%     11
 100%    103 (longest request)

Definitely a lot slower, but still very respectable. 1051 requests/sec is going to be more than adequate for most purposes, particularly if fronted with a CDN.

And finally…

Rails Metal handler

The nice thing about the Rails metal handler solution is that it’s easy. No recompiling, just drop the handler into your project and you’re off to the races. That said…

[chris@polaris nginx-gridfs]$ ab -n 250 -c 4  http://advice/images/gfs/uploads/user/avatar/4b7b2c0e98db7475fc000003/normal_alliance-60.png

Server Software:        Apache/2.2.13
Server Hostname:        advice
Server Port:            80

Document Path:          /images/gfs/uploads/user/avatar/4b7b2c0e98db7475fc000003/normal_alliance-60.png
Document Length:        31596 bytes

Concurrency Level:      4
Time taken for tests:   4.646 seconds
Complete requests:      250
Failed requests:        0
Write errors:           0
Total transferred:      7960000 bytes
HTML transferred:       7899000 bytes
Requests per second:    53.81 [#/sec] (mean)
Time per request:       74.338 [ms] (mean)
Time per request:       18.585 [ms] (mean, across all concurrent requests)
Transfer rate:          1673.10 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:    15   74  75.6     34     287
Waiting:        0   72  75.8     30     276
Total:         15   74  75.6     34     288

Percentage of the requests served within a certain time (ms)
  50%     34
  66%     39
  75%    139
  80%    192
  90%    201
  95%    210
  98%    239
  99%    245
 100%    288 (longest request)

I obviously ran far fewer requests this go-round. The reason is pretty obvious – running 5000 requests through the Ruby stack would have taken approximately forever. At 53 requests per second, this is not an attractive solution, particularly if you consider the CPU overhead that it’s incurring.

Conclusions

Solution Requests/second % Apache FS % Nginx FS % Nginx GridFS % Apache Ruby
Filesystem via Apache 2625.37 - 40.03% 242.22% 4,878.96%
Filesystem via Nginx 6559.31 249.84% - 605.17% 12,189.76%
GridFS via nginx module 1083.88 41.28% 16.52% - 2014.27%
Rails metal handler via Passenger 53.81 2.05% 0.82% 4.96% -

If you’re looking to abstract away from storing files on a filesystem, GridFS is a feasable solution. It can really crank some mean output numbers, and though it’s not up to par with a raw filesystem read, also consider that in many production environments, such a raw filesystem read might be happening via an NFS or GFS share, which is going to massively degrade the performance of that request. Given the no-hassle store-and-forget-about-it solution that GridFS offers, even when faced with the challenge of multi-server replication, it seems that you can get enough performance out of it to justify it as a solution.

  • http://chesmart.in Ches Martin

    Thanks for the write-up, it’s always nice to see some numbers! I know this article is getting a little old by now, but a great addition would be a rack-gridfs benchmark. One assumes it’ll fall somewhere between Rails Metal and nginx, it’d just be nice to know exactly where it falls — I’ll soon find out :-)

  • http://lgone.com/html/y2010/815.html 七号站台 » Blog Archive » MongoDB GridFS 数据读取效率 benchmark

    [...] 数据来源:http://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/ [...]

  • http://blog.nosqlfan.com/html/730.html MongoDB GridFS 数据读取效率 benchmark : NoSQLfan

    [...] 数据来源:http://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/ [...]

  • http://xanview.com Roman Gaufman

    I don’t understand – how did you serve the files in rails, did you use pull the data into a string and use send_data?

  • http://coffeepowered.net Chris Heald

    Roman, check part 2. It has the Metal handler in there. You would want to translate it to a middleware for later versions of Rails.

    http://www.coffeepowered.net/2010/02/24/serving-files-out-of-gridfs-part-2/ 

  • Anonymous

    Approx. summary:

    Apache 2500 reqs/sec
    NGINX 6500 reqs/sec
    GridFS NGINX 1,000 reqs/sec

  • Christoph Haas

    Thanks a lot for the benchmark. Serving files from the database is something I have always wanted. Storing metadata seperately from the images in a filesystem never felt right. All the syncing and housekeeping. I’ll try to implement nginx/gridfs on the upcoming relaunch of screenshots.debian.net.

  • Edemilson Lima

    There is a forked version of gridfs-fuse with more features at:
    https://github.com/wwoods/gridfs-fuse 

    I tried to compile it, but I got some errors, that I described here:https://github.com/mikejs/gridfs-fuse/issues/11 

    Could anyone here help me to figure out what is wrong? Thank you in advance!

  • http://www.xrchen.com/2012/05/using-varnish-to-accelerate-file-serving-out-of-gridfs/ Using varnish to accelerate file serving out of GridFS | xrchen.com

    [...] that serving files out of gridfs is significantly slower than before (This is also stated in this article.). Even worse, since there is no non-blocking implementation of mongodb c driver, fetching single [...]

  • http://blog.levelip.com/2013/01/mongodb-gridfs-disaster-recovery/ Disaster recovery: MongoDB e GridFS per la replicazione dei dati, ma non solo | Level IP weblog – Tutto su Server Dedicati, Hosting Dedicato, Server Dedicato in Cloud

    [...] la possibilità di servire i file direttamente da DB a webserver tralasciando il filesystem. Da un benchmark trovato in Rete, con questo sistema Nginx si aggirerebbe sulle 6.500 richieste al [...]

  • http://www.redazioneinrete.it/articolo/3880/disaster-recovery-mongodb-gridfs-la-replicazione-dei-dati-ma-solo/ Disaster recovery: MongoDB e GridFS per la replicazione dei dati, ma non solo | Comunicati stampa e Article Marketing

    [...] la possibilità di servire i file direttamente da DB a webserver tralasciando il filesystem. Da un benchmark trovato in Rete, con questo sistema Nginx si aggirerebbe sulle 6.500 richieste al [...]