Thursday, November 19, 2009

Igbinary, The great serializer
If you are using PHP, chances are that at some point you have needed to serialize PHP data, whether it was done transparently for you inside PHP's session handler or directly, so that complex PHP data types (objects and arrays) could be stored in a database or in files. Most people have done this.
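For reference, here is a minimal sketch of such a round trip using PHP's built-in serialize() and unserialize(). The array contents are made up purely for illustration.

```php
<?php
// Illustrative data only; the keys and values are invented for this sketch.
$data = ['user' => 'alice', 'admin' => true];

// serialize() turns the array into a string that can be stored anywhere.
$text = serialize($data);
echo $text, "\n"; // a:2:{s:4:"user";s:5:"alice";s:5:"admin";b:1;}

// unserialize() rebuilds the original structure from that string.
$copy = unserialize($text);
var_dump($copy === $data); // bool(true)
```

Note how every key and value carries its type, its length and the literal text.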
The default way of doing it is via the native PHP serializer, which creates a clear-text version of the data. If you are serializing a fair bit of information, that ends up being rather verbose (read: BIG). This means that you end up having to store more data in memory, read more data from disk, and so on, all of which slows down your application.

As I was reading the docs on Andrei's new memcache extension (memcached), I came across a binary serialization extension called Igbinary, written by Sulake Dynamoid Oy. This extension promised much more efficient serialization routines by using a binary rather than a clear-text format. Sounded good, so I decided to run a few benchmarks on it. For my tests I simply grabbed a whole pile of our existing session files, which range from 300 bytes all the way to 50 KB in size. Here are the results of the tests (Datalen represents the original session size; "old" is the native serializer, "new" is Igbinary):

Datalen: 422
Serialization: (old: 0.0837728977203) vs (new: 0.10560798645)
DeSerialization: (old: 0.0824909210205) vs (new: 0.0741739273071)
Datasize (old: 435) vs (new: 274)

Datalen: 0
Serialization: (old: 0.0176718235016) vs (new: 0.0239200592041)
DeSerialization: (old: 0.0165419578552) vs (new: 0.0155339241028)
Datasize (old: 6) vs (new: 6)

Datalen: 12134
Serialization: (old: 2.24982118607) vs (new: 2.8341999054)
DeSerialization: (old: 2.16276907921) vs (new: 1.58506608009)
Datasize (old: 12153) vs (new: 5165)

Datalen: 1022
Serialization: (old: 0.266027927399) vs (new: 0.328108072281)
DeSerialization: (old: 0.254096984863) vs (new: 0.19114112854)
Datasize (old: 1041) vs (new: 549)

Datalen: 383
Serialization: (old: 0.0765979290009) vs (new: 0.10179400444)
DeSerialization: (old: 0.0790250301361) vs (new: 0.0743551254272)
Datasize (old: 396) vs (new: 259)

Datalen: 52932
Serialization: (old: 13.5063810349) vs (new: 9.47544908524)
DeSerialization: (old: 18.6985230446) vs (new: 15.0079500675)
Datasize (old: 52951) vs (new: 15672)

Datalen: 1016
Serialization: (old: 0.319713115692) vs (new: 0.335229873657)
DeSerialization: (old: 0.376559972763) vs (new: 0.319846868515)
Datasize (old: 1035) vs (new: 546)

Datalen: 27150
Serialization: (old: 8.02781701088) vs (new: 6.41440796852)
DeSerialization: (old: 8.8329000473) vs (new: 7.9133450985)
Datasize (old: 27169) vs (new: 10288)

Datalen: 1021
Serialization: (old: 0.327615022659) vs (new: 0.350838899612)
DeSerialization: (old: 0.41247010231) vs (new: 0.334348917007)
Datasize (old: 1040) vs (new: 549)

Datalen: 80
Serialization: (old: 0.0359728336334) vs (new: 0.0439360141754)
DeSerialization: (old: 0.0504360198975) vs (new: 0.0389099121094)
Datasize (old: 93) vs (new: 59)

Datalen: 1020
Serialization: (old: 0.331864118576) vs (new: 0.33015704155)
DeSerialization: (old: 0.421649932861) vs (new: 0.337309122086)
Datasize (old: 1039) vs (new: 548)

Datalen: 8122
Serialization: (old: 2.38410592079) vs (new: 2.02453589439)
DeSerialization: (old: 3.10884404182) vs (new: 2.53294110298)
Datasize (old: 8141) vs (new: 3138)

As you can see from the tests, on most of the smaller data sets (up to roughly 10 KB) serialization with Igbinary is slightly slower, but that comes with the advantage of the data shrinking by nearly 50%, which in turn benefits de-serialization: it is faster than the native serializer in every test. When you get to large data sets, the smaller sizes also translate to faster serialization, as you can see with the 53 KB session sample.

Overall it seems to be fairly beneficial, especially when it comes to storing data in memory or on disk, since reading less data provides performance benefits beyond the speed improvements in serialization or deserialization. And when storing data in memory, it also means you have nearly 2x as much room to store your data.

There is, however, something you can do with Igbinary if you care more about speed than space or typically work with small data sets.
By default the extension sets the compact_strings parameter to true, which enables string compacting: it builds a hash table of strings and stores each unique string only once. This results in smaller data sets, but requires more processing during serialization. If this option is turned off, you still get pretty good compression of the data, 30-35% in most cases, and the serialization speed is radically improved. Now, in every case except the empty session, the Igbinary serializer is faster at serializing and either the same (within the margin of error) or better at de-serializing. Here are the results of a run with igbinary.compact_strings set to off:

Datalen: 422
Serialization: (old: 0.0849559307098) vs (new: 0.0546460151672)
DeSerialization: (old: 0.0798780918121) vs (new: 0.0715551376343)
Datasize (old: 435) vs (new: 287)

Datalen: 0
Serialization: (old: 0.0175089836121) vs (new: 0.0226271152496)
DeSerialization: (old: 0.016165971756) vs (new: 0.015634059906)
Datasize (old: 6) vs (new: 6)

Datalen: 12134
Serialization: (old: 2.24274802208) vs (new: 1.41059803963)
DeSerialization: (old: 2.12310409546) vs (new: 1.75957202911)
Datasize (old: 12153) vs (new: 8416)

Datalen: 1022
Serialization: (old: 0.264977931976) vs (new: 0.14998793602)
DeSerialization: (old: 0.25306892395) vs (new: 0.194103002548)
Datasize (old: 1041) vs (new: 656)

Datalen: 383
Serialization: (old: 0.0750911235809) vs (new: 0.0483870506287)
DeSerialization: (old: 0.077201128006) vs (new: 0.0721011161804)
Datasize (old: 396) vs (new: 267)

Datalen: 52932
Serialization: (old: 13.734609127) vs (new: 5.25420594215)
DeSerialization: (old: 19.5111608505) vs (new: 19.6129918098)
Datasize (old: 52951) vs (new: 31663)

Datalen: 1016
Serialization: (old: 0.331208944321) vs (new: 0.171414136887)
DeSerialization: (old: 0.381343126297) vs (new: 0.344583034515)
Datasize (old: 1035) vs (new: 650)

Datalen: 27150
Serialization: (old: 7.89811491966) vs (new: 2.80587387085)
DeSerialization: (old: 8.85835886002) vs (new: 8.89144587517)
Datasize (old: 27169) vs (new: 17552)

Datalen: 1021
Serialization: (old: 0.343111991882) vs (new: 0.155555963516)
DeSerialization: (old: 0.401265859604) vs (new: 0.340510129929)
Datasize (old: 1040) vs (new: 656)

Datalen: 80
Serialization: (old: 0.0351438522339) vs (new: 0.0331211090088)
DeSerialization: (old: 0.0486869812012) vs (new: 0.0400359630585)
Datasize (old: 93) vs (new: 64)

Datalen: 1020
Serialization: (old: 0.333083868027) vs (new: 0.172137022018)
DeSerialization: (old: 0.413018941879) vs (new: 0.341893911362)
Datasize (old: 1039) vs (new: 654)

Datalen: 8122
Serialization: (old: 2.46762108803) vs (new: 1.11415696144)
DeSerialization: (old: 3.09275889397) vs (new: 2.85179400444)
Datasize (old: 8141) vs (new: 5378)

As you can see, the serialization speed is pretty amazing: once your serialized data set is in excess of 100 bytes, you can expect a nearly 2x improvement together with a ~30% reduction in data size with igbinary. I think we got a winner!
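For anyone who wants to try a comparison like this on their own data, a harness along the following lines could be a starting point. This is only a sketch and not the script used for the numbers in this post: the helper names time_calls() and compare_serializers(), the sample array and the round count are all made up for illustration. It also assumes that igbinary.compact_strings can be changed with ini_set() at runtime; if that does not hold in your setup, set it in php.ini instead.

```php
<?php
// Run $fn $rounds times and return the elapsed wall-clock seconds.
function time_calls(callable $fn, int $rounds): float
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Compare serialize() against igbinary_serialize() for one value.
// Returns null when the igbinary extension is not loaded.
function compare_serializers($value, int $rounds = 1000): ?array
{
    if (!function_exists('igbinary_serialize')) {
        return null;
    }
    $old = serialize($value);
    $new = igbinary_serialize($value);

    return [
        'serialize' => [
            'old' => time_calls(function () use ($value) { serialize($value); }, $rounds),
            'new' => time_calls(function () use ($value) { igbinary_serialize($value); }, $rounds),
        ],
        'unserialize' => [
            'old' => time_calls(function () use ($old) { unserialize($old); }, $rounds),
            'new' => time_calls(function () use ($new) { igbinary_unserialize($new); }, $rounds),
        ],
        'size' => ['old' => strlen($old), 'new' => strlen($new)],
    ];
}

// Hypothetical sample with repeated keys and strings, loosely like session data.
$sample = [];
for ($i = 0; $i < 50; $i++) {
    $sample[] = ['id' => $i, 'name' => 'user' . $i, 'role' => 'member'];
}

// Assumption: the setting is changeable at runtime (see the note above).
foreach (['1', '0'] as $compact) {
    ini_set('igbinary.compact_strings', $compact);
    $result = compare_serializers($sample);
    if ($result === null) {
        echo "igbinary extension not loaded\n";
        break;
    }
    printf(
        "compact_strings=%s Serialization: (old: %f) vs (new: %f) " .
        "DeSerialization: (old: %f) vs (new: %f) Datasize (old: %d) vs (new: %d)\n",
        $compact,
        $result['serialize']['old'],
        $result['serialize']['new'],
        $result['unserialize']['old'],
        $result['unserialize']['new'],
        $result['size']['old'],
        $result['size']['new']
    );
}
```

Timings from a loop like this vary from run to run, so it is worth repeating the measurement a few times and feeding it real session data before drawing conclusions.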
Comments
Nice article. I think Mongo's Bson has a pretty efficient serialization too, -- http://www.mongodb.org/display/DOCS/BSON#BSON-PHP
I've been using this for a while as well combined with memcached for session storage, love it to bits, now for someone to implement a php.ini flag or similar to make it transparent to php on which serializer to use automagically and auto select the right unserialize method and it would be pure heaven! *stares at the bloatware known as wordpress*
Very good idea, will have to test it out. Thanks for a good tip!
Art
I was also thinking of using this for storing sessions. Another important advantage would be that if you have to work with a session "offline" (for example, modifying it via an admin interface), getting at the content of the session would be easier. Currently in PHP, session_decode and session_encode work with the actual $_SESSION, so you have to fire up a temporary session to use them, which is a little cumbersome.
have you run any tests of this vs say, gzip(serialize($foo), 6) ?
curious how this compares with regards to cpu util and size. guessing it's faster but larger?
I have not tried using gzip, simply because the overhead would be excessive. The big plus of igbinary is that not only does it produce smaller data representations, it can also do so faster than the standard serialize routine.
right; i was trying to get an idea of cpu cost vs compressed data size to understand the tradeoff between the two options.
i might do some testing with this myself; thanks for bringing it to my attn!
So happy that I found this article, thanks a lot for the advice.
So maybe it's a good idea to add a boolean "asBinary" to the native php serialize and deserialize functions? That way everybody can easily access this awesomeness.
Ron, it's a good idea to add something like "asBinary" to the native serialization routines. The compression rate and the increase in speed are so impressive, and in most cases I'm able to use binary data instead of clear text.