Archive

Monthly Archives: November 2010

I have recently posted on predictions that real time web applications are going in the direction of being entirely event driven, from client (WebSockets) to web-server (Node.js) to datastore (Redisql). My current project: Alchemy Database (formerly known as Redisql) hopes to be the final link in this event driven chain. Alchemy Database is taking a new step towards reducing communication between web-server and datastore (thereby increasing thruput), by implementing Datastore-Side-Scripting.

Often, in web applications, the communication between the web-server and the datastore requires 2+ trips involving sequential requests. For example: first the session is validated and then IFF the session is valid, the request’s data is retrieved from the datastore. The web-server must first wait for the “is-session-valid-lookup” to issue the “get-me-data-for-this-request-lookup” (the latter can even be a set of sequential lookups), which implies a good deal of blocking and waiting in the web-server’s backend. Having 2+ sequential steps in web-server datastore communication may seem trivial, but these steps are commonly the cause of SYSTEM bottlenecks. A single frontend request results in webserver threads blocking and waiting multiple times on sequential datastore requests. Each {block, wait, wake-up} in the web-server means 2 context switches, in addition to the latency of 1+ tcp request(s), which are 4-6 orders of magnitude slower than a RAM lookup.

In such cases, if sequential web-server-datastore request-response pairs can be reduced to a single request/response pair, overall system performance will increase substantially and the system will become more predictable/stable (fewer context-switches, less waiting, less requests, less I/O). Datastore-side-scripting can accomplish this, by pushing trivial logic into the datastore itself. In the example above, the only logic being performed is a “if(session_valid)” which has the cost of 2 context switches and a 2+ fold increase in response duration … which is absurd. In this use-case, pushing the “if(session_valid)” into the datastore makes sense on ALL levels.

Some argue, introducing scripting in the datastore is adding logic to the classic bottleneck in 3 tiered architectures, it will exacerbate bottlenecking, it is BAD. This point is valid in theory. In practice, the main bottleneck in ultra-high-performance in-memory-databases is network I/O. Adding NICs, not adding CPUs (or cores) is a better bet to scale vertically (ref: Handlersocket blog). This means, computers running Alchemy Database spend most of their time transporting packets, on request from: NIC->RAM->CPU and then on response from: CPU->RAM->NIC. The operating system’s marshaling of TCP packets takes up far more resources/time than the trivial in-memory lookup to GET/SET the request’s data. Meaning if a few more trivial commands (e.g. an IF block) are packed into the TCP request packet, the request’s duration is not significantly effected, but the overall system is benefited greatly, by taking (very lengthy) blocking/waiting steps in the web-server out of the equation.

Another name for Datastore-side-scripting is “Stored Procedures”, a term w/ lots of baggage. Stored Procedures are flawed on many levels, yet there was and is a need for them. The reality of Stored Procedures is their syntax is ugly in SQL and they open up all sorts of possibilities for developers, which have often been abused. Yet the basic concept of pushing logic into the database was recently mentioned as a future goal for the NOSQL movement. Tokyo Cabinet has embedded Lua deeply into its product (w/ 200 Lua commands), to allow datastore-side-scripting. Voltdb, a next generation SQL server, supports ONLY Stored Procedures and argues that in production they make more sense than ad-hoc SQL.

Stored Procedures are done in a hacked together SQL-like syntax, Datastore-side-scripting is done by embedding Lua. “The Lua programming language is a small scripting language specifically designed to be embedded in other programs. Lua’s C API allows exceptionally clean and simple code both to call Lua from C, and to call C from Lua”.

Lua provides a full (and tested) scripting language and you can define functions in Lua that access your datastore natively in C. I will explain this further, as it confused me at first. Alchemy Database has Lua embedded into it’s server, both Alchemy Database and the Lua engine are written in C. Alchemy Database will pass text given as the 1st argument of the Alchemy Database “LUA” command to the Lua engine. Additionally Alchemy Database has a Lua function called “client()” that can make Alchemy Database SET/GET/INSERT/SELECT/etc.. calls from w/in Lua. The Lua “client” function is written in C via Lua’s C bindings, so calling “client()” in Lua is actually running code natively in C. See its confusing, but the net effect is the following Alchemy Database command:
LUA 'user_id = client("GET", "session:XYZ"); if (user_id) then return client("GET", "data:" .. user_id); else return ""; end'
will perform the use case described above in a single webserver-to-datastore request and the Lua code is pretty easy to understand/write/extend.

Alchemy Database’s datastore-side-scripting was implemented in about 200 lines of C code, and defines a single Lua function “client()” in C, yet it opens up a whole new level of functionality. Performance tests have shown that running Lua has about a 42% performance hit server-side (i.e. 50K req/s) as compared to a single native Alchemy Database call (e.g. SET vs. LUA ‘return client(“SET”);), but packing multiple “client()” calls into a single LUA function does not cause a linear slowdown, meaning use cases that bottleneck on sequential webserver-datastore I/O will see a HUGE performance boost from pushing logic into the datastore.

I do want to stress Alchemy Database’s datastore-side-scripting is meant to be used w/ caution. Proper usage (IMO) of Lua in Alchemy Database is to implement very trivial logic datastore-side (e.g. do not do a for loop w/ 10K lookups), especially because Alchemy Database is single threaded and long running queries block other queries during their execution. Recommended usage is to try and pack trivial logical blocks into a single datastore request, effectively avoiding sequential webserver-to-datastore requests. Lua scripting does open up the possibility for map-reduce, Alchemy Database-as-a-RDBMS-proxy/cache, (and for pathological hackers) even usage as a zero-security front-end server … but that was not the intent of embedding Lua; if you use Lua in Alchemy Database in this manner, you are hacking, please know what you are doing.

Datastore-side-scripting is a balancing act that can be exploited for gain by careful/savvy developers or wrongly implemented for loss by sloppy programming/data-modelling … It gives more power to the Developer and consequently the developer needs to wield said power wisely.