If you are using PHP, chances are that at some point you have needed to serialize PHP data, whether it was done transparently for you inside PHP's session handler, or directly, so that complex PHP data types (objects & arrays) could be stored in a database or in files. Most people have done this.
The default way of doing it is via the native PHP serializer, which creates a clear-text representation of the data; if you are serializing a fair bit of information, that output ends up being rather verbose (read: BIG). This means that you end up having to store more data in memory, read more data from disk, etc., all of which slows down your application. While reading docs I came across the igbinary extension, written by Sulake Dynamoid Oy. This extension promises more efficient serialization routines by using a binary, rather than a clear-text, format. Sounded good, so I decided to run a few benchmarks on it.
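For context, the extension exposes drop-in counterparts to PHP's native serialization functions. Here is a minimal sketch of the API; the sample payload is just an illustration, not one of the session files used below.

<?php
// Sample payload; purely illustrative data, not an actual session.
$data = array(
    'user_id' => 42,
    'name'    => 'example',
    'prefs'   => array('lang' => 'en', 'theme' => 'dark'),
    'visits'  => array(1, 2, 3, 4, 5),
);

// Native PHP serializer: clear-text, fairly verbose output.
$native = serialize($data);

// igbinary: binary format, typically much smaller.
$binary = igbinary_serialize($data);

printf("native: %d bytes, igbinary: %d bytes\n", strlen($native), strlen($binary));

// Both round-trip back to the same structure.
var_dump($data === unserialize($native));
var_dump($data === igbinary_unserialize($binary));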
For my tests I simply grabbed a whole pile of our existing session files, ranging from 300 bytes all the way to 50KB in size. Here are the results of the tests (Datalen represents the original session size in bytes):
Datalen: 422 Serialization: (old: 0.0837728977203) vs (new: 0.10560798645) DeSerialization: (old: 0.0824909210205) vs (new: 0.0741739273071) Datasize (old: 435) vs (new: 274)
Datalen: 0 Serialization: (old: 0.0176718235016) vs (new: 0.0239200592041) DeSerialization: (old: 0.0165419578552) vs (new: 0.0155339241028) Datasize (old: 6) vs (new: 6)
Datalen: 12134 Serialization: (old: 2.24982118607) vs (new: 2.8341999054) DeSerialization: (old: 2.16276907921) vs (new: 1.58506608009) Datasize (old: 12153) vs (new: 5165)
Datalen: 1022 Serialization: (old: 0.266027927399) vs (new: 0.328108072281) DeSerialization: (old: 0.254096984863) vs (new: 0.19114112854) Datasize (old: 1041) vs (new: 549)
Datalen: 383 Serialization: (old: 0.0765979290009) vs (new: 0.10179400444) DeSerialization: (old: 0.0790250301361) vs (new: 0.0743551254272) Datasize (old: 396) vs (new: 259)
Datalen: 52932 Serialization: (old: 13.5063810349) vs (new: 9.47544908524) DeSerialization: (old: 18.6985230446) vs (new: 15.0079500675) Datasize (old: 52951) vs (new: 15672)
Datalen: 1016 Serialization: (old: 0.319713115692) vs (new: 0.335229873657) DeSerialization: (old: 0.376559972763) vs (new: 0.319846868515) Datasize (old: 1035) vs (new: 546)
Datalen: 27150 Serialization: (old: 8.02781701088) vs (new: 6.41440796852) DeSerialization: (old: 8.8329000473) vs (new: 7.9133450985) Datasize (old: 27169) vs (new: 10288)
Datalen: 1021 Serialization: (old: 0.327615022659) vs (new: 0.350838899612) DeSerialization: (old: 0.41247010231) vs (new: 0.334348917007) Datasize (old: 1040) vs (new: 549)
Datalen: 80 Serialization: (old: 0.0359728336334) vs (new: 0.0439360141754) DeSerialization: (old: 0.0504360198975) vs (new: 0.0389099121094) Datasize (old: 93) vs (new: 59)
Datalen: 1020 Serialization: (old: 0.331864118576) vs (new: 0.33015704155) DeSerialization: (old: 0.421649932861) vs (new: 0.337309122086) Datasize (old: 1039) vs (new: 548)
Datalen: 8122 Serialization: (old: 2.38410592079) vs (new: 2.02453589439) DeSerialization: (old: 3.10884404182) vs (new: 2.53294110298) Datasize (old: 8141) vs (new: 3138)
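If you want to reproduce a comparison like this yourself, the timing loop can be as simple as the sketch below. The payload and iteration count here are assumptions for illustration; the numbers above came from real session files fed through a similar loop.

<?php
// Rough timing sketch: native serializer vs igbinary on the same payload.
$payload    = array_fill(0, 200, array('id' => 1, 'name' => 'user', 'active' => true));
$iterations = 1000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $old = serialize($payload);
}
$oldSer = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $new = igbinary_serialize($payload);
}
$newSer = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    unserialize($old);
}
$oldUnser = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    igbinary_unserialize($new);
}
$newUnser = microtime(true) - $start;

printf("Serialization: (old: %F) vs (new: %F)\n", $oldSer, $newSer);
printf("DeSerialization: (old: %F) vs (new: %F)\n", $oldUnser, $newUnser);
printf("Datasize (old: %d) vs (new: %d)\n", strlen($old), strlen($new));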
As you can see from the tests, on data sets smaller than 10KB the serialization process using igbinary is slightly slower, but that comes with the advantage of output that is nearly 50% smaller, which in turn makes the de-serialization routines consistently faster than the native serializer. Once you get to larger data sets, the smaller size also translates into faster serialization, as you can see with the 53KB session sample. Overall it seems fairly beneficial, especially when it comes to storing data in memory or on disk, since reading less data provides additional performance gains beyond the serialization and de-serialization speeds themselves. And when storing data in memory it also means you have nearly twice as much room for your data.
There is, however, something you can do with igbinary if you care more about speed than space, or if you typically work with small data sets. By default the extension sets the compact_strings parameter to true, which enables string compacting: it builds a hash table of strings and stores each unique string only once. This results in smaller output, but requires more processing during serialization. With this option turned off you still get a pretty good size reduction, 30-35% in most cases, and the serialization speed improves radically: igbinary is now faster at serializing in all cases, and de-serializing is either the same (within the margin of error) or faster. Here are the results of a run with igbinary.compact_strings set to off (a php.ini sketch for this setting follows after the results).
Datalen: 422 Serialization: (old: 0.0849559307098) vs (new: 0.0546460151672) DeSerialization: (old: 0.0798780918121) vs (new: 0.0715551376343) Datasize (old: 435) vs (new: 287)
Datalen: 0 Serialization: (old: 0.0175089836121) vs (new: 0.0226271152496) DeSerialization: (old: 0.016165971756) vs (new: 0.015634059906) Datasize (old: 6) vs (new: 6)
Datalen: 12134 Serialization: (old: 2.24274802208) vs (new: 1.41059803963) DeSerialization: (old: 2.12310409546) vs (new: 1.75957202911) Datasize (old: 12153) vs (new: 8416)
Datalen: 1022 Serialization: (old: 0.264977931976) vs (new: 0.14998793602) DeSerialization: (old: 0.25306892395) vs (new: 0.194103002548) Datasize (old: 1041) vs (new: 656)
Datalen: 383 Serialization: (old: 0.0750911235809) vs (new: 0.0483870506287) DeSerialization: (old: 0.077201128006) vs (new: 0.0721011161804) Datasize (old: 396) vs (new: 267)
Datalen: 52932 Serialization: (old: 13.734609127) vs (new: 5.25420594215) DeSerialization: (old: 19.5111608505) vs (new: 19.6129918098) Datasize (old: 52951) vs (new: 31663)
Datalen: 1016 Serialization: (old: 0.331208944321) vs (new: 0.171414136887) DeSerialization: (old: 0.381343126297) vs (new: 0.344583034515) Datasize (old: 1035) vs (new: 650)
Datalen: 27150 Serialization: (old: 7.89811491966) vs (new: 2.80587387085) DeSerialization: (old: 8.85835886002) vs (new: 8.89144587517) Datasize (old: 27169) vs (new: 17552)
Datalen: 1021 Serialization: (old: 0.343111991882) vs (new: 0.155555963516) DeSerialization: (old: 0.401265859604) vs (new: 0.340510129929) Datasize (old: 1040) vs (new: 656)
Datalen: 80 Serialization: (old: 0.0351438522339) vs (new: 0.0331211090088) DeSerialization: (old: 0.0486869812012) vs (new: 0.0400359630585) Datasize (old: 93) vs (new: 64)
Datalen: 1020 Serialization: (old: 0.333083868027) vs (new: 0.172137022018) DeSerialization: (old: 0.413018941879) vs (new: 0.341893911362) Datasize (old: 1039) vs (new: 654)
Datalen: 8122 Serialization: (old: 2.46762108803) vs (new: 1.11415696144) DeSerialization: (old: 3.09275889397) vs (new: 2.85179400444) Datasize (old: 8141) vs (new: 5378)
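As mentioned above, disabling string compaction is just an ini setting. A sketch of the relevant php.ini lines (the extension filename will differ on Windows builds):

; php.ini -- favour serialization speed over output size
extension = igbinary.so
igbinary.compact_strings = Off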
As you can see, the serialization speed is pretty amazing: once the serialized data set exceeds 100 bytes, you can expect nearly a 2x improvement along with a ~30% reduction in data size with igbinary. I think we've got a winner!
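Since my test data came from session files, the natural next step is to let the session handler use igbinary directly. If the extension was built with session support, this should be a matter of a php.ini change along these lines (shown as a sketch; check your build before relying on it):

; php.ini -- store session data in igbinary's binary format
extension = igbinary.so
session.serialize_handler = igbinary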