The case for Datastore-Side-Scripting


I have recently posted on predictions that real time web applications are going in the direction of being entirely event driven, from client (WebSockets) to web-server (Node.js) to datastore (Redisql). My current project: Alchemy Database (formerly known as Redisql) hopes to be the final link in this event driven chain. Alchemy Database is taking a new step towards reducing communication between web-server and datastore (thereby increasing thruput), by implementing Datastore-Side-Scripting.

Often, in web applications, the communication between the web-server and the datastore requires 2+ trips involving sequential requests. For example: first the session is validated and then IFF the session is valid, the request’s data is retrieved from the datastore. The web-server must first wait for the “is-session-valid-lookup” to issue the “get-me-data-for-this-request-lookup” (the latter can even be a set of sequential lookups), which implies a good deal of blocking and waiting in the web-server’s backend. Having 2+ sequential steps in web-server datastore communication may seem trivial, but these steps are commonly the cause of SYSTEM bottlenecks. A single frontend request results in webserver threads blocking and waiting multiple times on sequential datastore requests. Each {block, wait, wake-up} in the web-server means 2 context switches, in addition to the latency of 1+ tcp request(s), which are 4-6 orders of magnitude slower than a RAM lookup.

In such cases, if sequential web-server-datastore request-response pairs can be reduced to a single request/response pair, overall system performance will increase substantially and the system will become more predictable/stable (fewer context-switches, less waiting, less requests, less I/O). Datastore-side-scripting can accomplish this, by pushing trivial logic into the datastore itself. In the example above, the only logic being performed is a “if(session_valid)” which has the cost of 2 context switches and a 2+ fold increase in response duration … which is absurd. In this use-case, pushing the “if(session_valid)” into the datastore makes sense on ALL levels.

Some argue, introducing scripting in the datastore is adding logic to the classic bottleneck in 3 tiered architectures, it will exacerbate bottlenecking, it is BAD. This point is valid in theory. In practice, the main bottleneck in ultra-high-performance in-memory-databases is network I/O. Adding NICs, not adding CPUs (or cores) is a better bet to scale vertically (ref: Handlersocket blog). This means, computers running Alchemy Database spend most of their time transporting packets, on request from: NIC->RAM->CPU and then on response from: CPU->RAM->NIC. The operating system’s marshaling of TCP packets takes up far more resources/time than the trivial in-memory lookup to GET/SET the request’s data. Meaning if a few more trivial commands (e.g. an IF block) are packed into the TCP request packet, the request’s duration is not significantly effected, but the overall system is benefited greatly, by taking (very lengthy) blocking/waiting steps in the web-server out of the equation.

Another name for Datastore-side-scripting is “Stored Procedures”, a term w/ lots of baggage. Stored Procedures are flawed on many levels, yet there was and is a need for them. The reality of Stored Procedures is their syntax is ugly in SQL and they open up all sorts of possibilities for developers, which have often been abused. Yet the basic concept of pushing logic into the database was recently mentioned as a future goal for the NOSQL movement. Tokyo Cabinet has embedded Lua deeply into its product (w/ 200 Lua commands), to allow datastore-side-scripting. Voltdb, a next generation SQL server, supports ONLY Stored Procedures and argues that in production they make more sense than ad-hoc SQL.

Stored Procedures are done in a hacked together SQL-like syntax, Datastore-side-scripting is done by embedding Lua. “The Lua programming language is a small scripting language specifically designed to be embedded in other programs. Lua’s C API allows exceptionally clean and simple code both to call Lua from C, and to call C from Lua”.

Lua provides a full (and tested) scripting language and you can define functions in Lua that access your datastore natively in C. I will explain this further, as it confused me at first. Alchemy Database has Lua embedded into it’s server, both Alchemy Database and the Lua engine are written in C. Alchemy Database will pass text given as the 1st argument of the Alchemy Database “LUA” command to the Lua engine. Additionally Alchemy Database has a Lua function called “client()” that can make Alchemy Database SET/GET/INSERT/SELECT/etc.. calls from w/in Lua. The Lua “client” function is written in C via Lua’s C bindings, so calling “client()” in Lua is actually running code natively in C. See its confusing, but the net effect is the following Alchemy Database command:
LUA 'user_id = client("GET", "session:XYZ"); if (user_id) then return client("GET", "data:" .. user_id); else return ""; end'
will perform the use case described above in a single webserver-to-datastore request and the Lua code is pretty easy to understand/write/extend.

Alchemy Database’s datastore-side-scripting was implemented in about 200 lines of C code, and defines a single Lua function “client()” in C, yet it opens up a whole new level of functionality. Performance tests have shown that running Lua has about a 42% performance hit server-side (i.e. 50K req/s) as compared to a single native Alchemy Database call (e.g. SET vs. LUA ‘return client(“SET”);), but packing multiple “client()” calls into a single LUA function does not cause a linear slowdown, meaning use cases that bottleneck on sequential webserver-datastore I/O will see a HUGE performance boost from pushing logic into the datastore.

I do want to stress Alchemy Database’s datastore-side-scripting is meant to be used w/ caution. Proper usage (IMO) of Lua in Alchemy Database is to implement very trivial logic datastore-side (e.g. do not do a for loop w/ 10K lookups), especially because Alchemy Database is single threaded and long running queries block other queries during their execution. Recommended usage is to try and pack trivial logical blocks into a single datastore request, effectively avoiding sequential webserver-to-datastore requests. Lua scripting does open up the possibility for map-reduce, Alchemy Database-as-a-RDBMS-proxy/cache, (and for pathological hackers) even usage as a zero-security front-end server … but that was not the intent of embedding Lua; if you use Lua in Alchemy Database in this manner, you are hacking, please know what you are doing.

Datastore-side-scripting is a balancing act that can be exploited for gain by careful/savvy developers or wrongly implemented for loss by sloppy programming/data-modelling … It gives more power to the Developer and consequently the developer needs to wield said power wisely.

About these ads
This entry was posted in Alchemy Database, concurrency, node.js, Redisql. Bookmark the permalink.

15 Responses to The case for Datastore-Side-Scripting

  1. Martin says:

    Lua is great language for “datastore scripting” and what you’re doing is great, but while youre experimenting can you try JavaScript(V8/JaegerMonkey) or Go to see how much the “overhead” will be.

    Keep up to great work!

    • jaksprats says:

      I am not sure what “overhead” you are talking about exactly (there are many different types).

      Alchemy Database has makefile directives to use LuaJIT (http://luajit.org/) instead of Lua, and in my simple tests, I saw up to a 20% speed improvement (https://gist.github.com/673570 – file:using_LuaJIT) using LuaJIT.

      On comparing language runtime speeds, here is a helpful site, that graphs performance: http://bit.ly/9gmay9 (I included C/Java/LuaJIT,V8/Jaegermonkey/Lua). Based on these results, LuaJIT looks like the smartest language to embed, its easy to embed, and its FAST.

      • jaksprats says:

        The overhead in question was the overhead of calling V8/Jaegermonkey vs Lua(JIT) inside of AclhemyDatabase (I was thick and didnt get that, sorry).

        My question is: is it possible to call C functions directly from V8/Jaegermonkey? is C embedded in V8/Jaegermonkey? because that is the only way to get the datastore-scripting to work correctly? if so, please give links, I would be interested in this ….. The Node.js guys would love that too, Javascript from client to app-server to datastore :)

  2. Pixy Misa says:

    The moment I saw the title of this post I thought Please tell me you’re using Lua!

    Yay! Looks really cool. :)

    • jaksprats says:

      The deeper I get into Lua, the more I love it.
      AlchemyDB’s test suite is being rewritten in Lua, to test via client and also test from w/in the server w/ the same code.
      I truly hope Lua explodes in popularity, so the much needed large community will start making the needed libraries.

      On a Lua related note (and a side project), I am working w/ the authors (agentzh and chaoslawful) of the modules

      https://github.com/agentzh/redis2-nginx-module

      https://github.com/agentzh/lua-redis-parser

      https://github.com/chaoslawful/lua-nginx-module

      to get a Nginx module that interprets Lua (via LuaJIT) and makes non-blocking/async calls to AlchemyDB.

      This will represent a dynamic web serving platform w/ an entirely event driven chain and its pieces are Nginx (fastest webserver), LuaJIT (fastest interpretive language) and AlchemyDB (fastest single-core RDBMS), so I am very excited about putting it together.

      • webdevguy says:

        The fastest webserver is probably G-WAN.

      • jaksprats says:

        WordPress is not letting me reply to the comment below:

        The fastest webserver is probably G-WAN.

        So I am replying above it.

        I have never heard of G-WAN, but I can see how it would be the fastest, it is all C, there is no interpretation step or no abstraction at all.

        I wish G-WAN was open source. I also imagine G-WAN is pretty weak w/ library support because it is not that popular.
        Finally, I would only feel comfortable using G-WAN for web-serving w/ VERY high performance requirements. C programming is slow, and much of web-application programming deals w/ string manipulation which is a pain in C.

        Still an interesting tool and duly noted :)

  3. eggyknap says:

    Just use PostgreSQL. All the fun of stored procedures, with pretty much any language you want (including Lua): http://wiki.postgresql.org/wiki/PL_Matrix

    • jaksprats says:

      I am a big fan of Postgres.

      Here are AlchemyDB’s main advantages over Postgres
      1.) for certain workloads 10-50X faster
      2.) redis data primitives built in (which can be Lua-scripted also)

      If those 2 advantages aren’t important for your datastore load, then use Postgres, and if they become an issue, switch to AlchemyDB.

  4. Fazal Majid says:

    Stored procedures are definitely the way to go for OLTP. There will always be people who quibble about the choice of language (Ruby for VoltDB, Lua for RediSQL). I have no issues with PL/SQL or PL/pgsql in PostgreSQL, a lot of the moaning about SQL and impedance mismatch comes from immature web developers who do not have a healthy appreciation for the primacy of data over logic. Lua is a fine language, but one that still has a relatively small community compared to Python or Ruby. If I were to pick a language for stored procedures that don’t suck(TM), it would be JavaScript, as the sheer effort put into optimizing JS by browser makers, with JITs like Google’s V8 or Mozilla’s TraceMonkey is likely to dwarf what any language community can do on its own.

    Now if you combine this with an emphasis on tuning for SSDs used as memory-mapped devices, as they should, rather than emulating legacy spinning rust drives, you would definitely have a winner on your hands.

    • jaksprats says:

      >> a lot of the moaning about SQL and impedance mismatch comes from immature web developers who do not have a healthy appreciation

      VERY TRUE, SQL used correctly is fantastic. But, Stored Procedural languages are simplistic and can not compare to LuaJIT in terms of flexibility/readability/performance(for some things). If you know what you are doing, Stored Procedures are very powerful (clunky) tools, but I always find myself wanting to do something they don’t support, so I need a scripting language.

      >> JavaScript … dwarf any language community
      Agreed, Javascript is big and getting bigger and has humungous support and is a very hot technology.

      Lua is different than Javascript in some key ways. LuaJIT is the fastest interpretive language that exists and will likely stay so, because Lua stays lean. AlchemyDB is for advanced programmers who detest bloat. The datastore is the common bottleneck, so keeping it lean is key. Lua’s community is also made up of people who are fans of lean implementations, so most Lua packages hold to the stripped down implementation philosophy, very light weight.
      If you want a ton of libraries to run in your datastore, then Javascript is the way to go, but running a ton of 3rd party libs IN your datastore is IFFY. If you dont mind searching a little, and want lean, FAST, simple, predictable libraries, Lua is good and its community is growing (albeit slowly).

      >> emphasis on tuning for SSDs
      AlchemyDB is going to change a lot in the coming years, but if the gods permit, it will NEVER be multi-threaded. SSDs imply threads, so they are a no-go for AlchemyDB. VoltDB takes the same approach, data is made persistent via K-safety and there is only ONE way to insure optimal performance: to have 100% of your data in RAM. This comment immediately begs the question: but what about Data that doesn’t fit in RAM? the best answer is Shard it and make it fit into RAM, and if this doesnt work for you, then most likely your workload is an analytics one (not an OLTP one), so use Hbase or Vertica. If you still cant fit 100% of your data in RAM, use Postgres, which is slower because the disk comes into play, and the software, that engineers around this slowness, is bloated, so you have lost optimal performance. Unfortunately, there is no best of both worlds possiblities here.

  5. Andrew says:

    Hi Jak, what’s up with your alchemy google group? I notice there is a ‘zip download’ notice by google on it, implying it’s being wound up. Will you be moving it to a new home? Great work btw on alchemy.

    • jaksprats says:

      yeah i will look into that. I have been busy making enterprise software out of Alchemy lately (got sponsorship) and have neglected the use-group. I also need to get many months of software updates/enhancements into the git repo, I have just been BUSY. If the use-group goes away I will have to start another one (very irritating). I need to write a new blog sometime soon too :) BUSY BUSY BUSY

  6. jenson says:

    Hi Jaksprats,

    Excellent work! BTW, can Alchemy Database work against ActiveRecord in Rails 3? I’d like to try to replace MySql in my rails app.

    Jenson

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s