Archive

redis

I recently added a fairly feature rich Graph Database to AlchemyDB (called it LuaGraphDB) and it took roughly 10 days to prototype. I implemented the graph traversal logic in Lua (embedded in AlchemyDB) and used AlchemyDB’s RDBMS to index the data. The API for the GraphDB is modeled after the very advanced GraphDB Neo4j. Another recently added functionality in AlchemyDB, a column type that stores a Lua Table (called it LuaTable), led me to mix Lua-function-call-syntax into every part of SQL I could fit it into (effectively tacking on Document-Store functionality to AlchemyDB). Being able to call lua functions from any place in SQL and being able to call lua functions (that can call into the data-store) directly from the client, made building a GraphDB on top of AlchemyDB possible as a library, i.e. it didn’t require any new core functionality. This level of extensibility is unique and I am gonna refer to AlchemyDB as a “Data Platform”. This is the best term I can come up with, I am great at writing cache invalidation algorithms, but I suck at naming things 🙂

To elaborate what I mean when I say I mixed Lua-function-call-syntax into SQL (because it’s not clear and maybe not even good english:), here is an example AlchemyDB SQL call, w/ a bunch of LUA mixed in:
SELECT func1(colX), func2(nested.obj.x.y.z), col5 FROM table WHERE index = 33 AND func3(col3) ORDER BY func4(colX)

This request does the following:
1.) finds all rows from table w/ index = 33 (RDBMS secondary index traversal)
2.) filter rows out that don’t match lua function call: func3(col3) { func3() returns [true,false]}
3.) for every row that passes #2, run the functions [func1(colX), func2(nested.obj.x.y.z)] where ‘func1’ & ‘func2’ are previously defined lua functions, and ‘colX’ is a normal column and ‘nested.obj.x.y.z’ is a nested element in a LuaTable column, and combine the return values from the two function calls with col5’s contents to form a response row.
4.) these response rows are then sorted using the return value of the function call ‘func4(colX)’ as the cmp() function.

Besides SELECT calls, Lua-function-call-syntax has also been mixed into the SQL commands: INSERT, UPDATE, & DELETE, covering the 4 horseman of SQL. Full syntax found here.

In and of itself, this mixing of Lua-function-call-syntax into SQL could be viewed as a very developer friendly User-Defined-Functions mechanism and it is that. But the integration of Lua in AlchemyDB is much deeper, and it is the deepness of the integration that opens up new ways of programming within a datastore.

Alchemy’s GraphDB implemented its graph traversal logic in Lua (found here). Any global Lua function that was turned into byte-code (or possibly LuaJIT’ed to machine code) during interpretation is callable from the client via the LUAFUNC command. These Lua functions can call into the data-store (e.g. to get/set data) w/ the Lua alchemy() function. SQL Insert/Update/Delete triggers can be created via the LUATRIGGER command, enabling SQL commands to pass row data to Lua. The ability for both languages to call each other, done in a highly efficient manner using Lua’s virtual stack calling lua byte-code (meaning no interpretation in these cross language calls), allows the developer to treat AlchemyDB as if it were an AppServer w/ an embedded RDBMS (which BTW is another Alchemy Experiment).

A simple example of the Alchemy commands “LUAFUNC” and the Lua function “alchemy(,,,)” would be the function addSqlUserRowAndNode which creates a SQL row, and adds a GraphNode (via the createNamedNode() call) inside this row’s LUATABLE column.

function addSqlUserRowAndNode(pk, citypk, nodename)
alchemy('INSERT', 'INTO', 'users', 'VALUES', "(" .. pk .. ", " .. citypk .. ", {})");
alchemy('SELECT',"createNamedNode('users', 'lo', pk, '" .. nodename .."')", 'FROM', 'users', 'WHERE', 'pk = ' .. pk);
return "OK";
end

The lua function addSqlUserRowAndNode() can be called from the frontend w/ the following command
./alchemy-cli LUAFUNC addSqlUserRowAndNode 1 10 'A'

It is worth noting (at some point, why not here:) that as long as you keep requests relatively simple, meaning they dont look at 1000 table-rows or traverse 1000 graph-nodes, your performance will range between 10K-100K TPS on a single core w/ single millisecond latencies, these are the types of numbers people should demand for OLTP.

The most challenging feature of the GraphDB was the indexing of arbitrary relationships, which does not lend itself well to SQL’s CREATE INDEX syntax. In the example on the LuaGraphDB documentation page, the indexed relationship is “Users who HAS_VISITED CityX”, but the essence of indexing arbitrary relationships means indexing “Anything w/ SOME_RELATIONSHIP to SomethingElse (in a given or both directions) (possibly at a certain depth)”. And these indexed relationships can be nested graph relationships (i.e. Dude who KNOWS people who KNOW people who HAVE lotsOfCash), it is a bitch to frame in SQL’s syntax, so it made sense to keep the logic of indexing in Lua, for this use-case.

AlchemyDB’s GraphDB implemented the indexing of relationships using AlchemyDB’s Pure Lua Function Index (which for the record may be one of the top 5 worst names ever:). This index constructs & destructs itself via user defined lua function calls (i.e. you have to write them) declared in the CREATE INDEX statement, and leaves the population of the index (i.e. addToIndex() & deleteFromIndex()) to user defined functions (again you have to write them) AND it allows the index to be used in SQL’s where-clause, exactly the same as an indexed column is used (this part you dont have to write:). The GraphDB Pure-Lua-Function-Indexes every “USER who HAS_VISITED CityX” inside the Lua routines responsible for addRelationships() via function hooks registered during “construction” (sounds fuck all tricky, but its 10 lines of code). If someone then wants to find all the users who have visited CityX, it can be done in SQL, w/ a query like “SELECT * FROM users WHERE pure_lua_func_index() = CityX.pk”. A SQL query like this could also be used to find multiple start nodes (via an indexed lookup) in a complex graph traversal. This level of flexibility is not needed in most use cases, but it again represents a cyclical calling ability (Lua<->SQL) on a very deep level, and cyclical calling enables usecases².

This little experiment houses a RDBMS, a Nosql datastore (redis), a Document-Store (via the LUATABLE column type), and a GraphDB under ONE roof. They are unified, to a sane degree, by extending SQL and they can be integrated tightly via Lua function calls, as is needed. Any use-case that requires multiple types of data-stores and serves high velocity OLTP requests and fits w/in AlchemyDB’s various other quirks (InRam, single-threaded, single-node), can benefit greatly by using this approach to reduce the number of moving parts in their system, not need to sync data across/between data-stores, not need to store data redundantly, etc…

But the part I find really exciting about this: is that I added a GraphDB to a datastore in ten days, and the GraphDB is not too shabby. AlchemyDB’s bindings open up the possibility to add functionality to an already very able data-store in a highly extensible manner. It allows you to move your functionality to your data, and THIS is HUGE. It should be easy to create all sorts of efficient data-stores & data-based-services using this Data-platform, the GraphDB implementation proves the case.

And yes this was a lot of fun to code, and it was confusing as hell until it all came together and then it was surprisingly easy to code, it fit together smoothly, it felt right, and a GraphDB is neither a small nor a simple piece of software.

AlchemyDB: the lightweight OLTP Data Platform. Strong on functionality, flexibility, and being thoroughly misunderstood 🙂

p.s. The GraphDB code is brand new, and has been tested, but not enough, so if you use it be aware, there may be some bugs, but I’m busting my ass getting it production hardened, so if you have problems, just shoot me an email, I will fix’em. The branch for this code is called release_0.2_rc1.

Advertisements

For about a year, I have been using the NOSQL datastore redis, in various web-serving environments, as a very fast backend to store and retrieve key-value data and data that best fits in lists, sets, and hash-tables. In addition to redis, my backend also employed mysql, because some data fits much better in a relational table. Getting certain types of data to fit into redis data objects would have added to the complexity of the system and in some cases: it’s simply not doable. BUT, I hated having 2 data-stores, especially when one (mysql) is fundamentally slower, this created a misbalance in how my code was architected. The Mysql calls can take orders of magnitude longer to execute, which is exacerbated when traffic surges. So I wrote Redisql which is an extension of redis that also supports a large subset of SQL. The Idea was to have a single roof to house both relational data and redis data and both types of data would exhibit similar lookup/insert latencies under similar concurrency levels, i.e. a balanced backend.

Redisql supports all redis data types and functionality (as it’s an extension of redis) and it also supports SQL SELECT/INSERT/UPDATE/DELETE (including joins, range-queries, multiple indices, etc…) -> lots of SQL, short of stuff like nested joins and Datawarehousing functionality (e.g. FOREIGN KEY CONSTRAINTS). So using a Redisql library (in your environment’s native language), you can either call redis operations on redis data objects or SQL operations on relational tables, its all in one server accessed from one library. Redisql morph commands convert relational tables (including range query and join results) into sets of redis data objects. They can also convert the results of redis commands on redis data objects into relational tables. Denormalization from relation tables to sets of redis hash-tables is possible, as is normalization from sets of redis hash-tables (or sets of redis keys) into relational tables. Data can be reordered and shuffled into the data structure (relational table, list, set, hash-table, OR ordered-set) that best fits your use cases, and the archiving of redis data objects into relational tables is made possible.

Not only is all the data under a single data roof in Redisql, but the lookup/insert speeds are uniform, you can predict the speed of a SET, an INSERT, an LPOP, a SELECT range query … so application code runs w/o kinks (no unexpected bizarro waits due to mysql table locks -> that lock up an apache thread -> that decrease the performance of a single machine -> which creates an imbalance in the cluster).

Uniform data access patterns between front-end and back-end can fundamentally change how application code behaves. On a 3.0Ghz CPU core, Redis SET/GET run at 110K/s and Redisql INSERT/SELECT run at 95K/s, both w/ sub millisecond mean-latencies, so all of a sudden the application server can fetch data from the datastore w/ truly minimal delay. The oh-so-common bottleneck: “I/O between app-server and datastore” is cut to a bare minimum, which can even push the bottleneck back into the app-servers, and that’s great news as app-servers are dead simple (e.g. add server) to scale horizontally. Redisql is an event-driven non-blocking asynchronous-I/O in-memory database, which i have dubbed an Evented Relational Database, for brevity’s sake.

During the development of Redisql, it became evident that optimizing the number of bytes a row occupied was an incredibly important metric, as Redisql is an In-Memory database (w/ disk persistence snapshotting). Unlike redis, Redisql can function if you go into swap space, but this should be done w/ extreme care. Redisql has lots of memory optimisations, it has been written from the ground up to allow you to put as much data as is possible into your machine’s RAM. Relational table per-row overhead is minimal and TEXT columns are stored in compressed form, when possible (using algorithms w/ negligible performance hits). Analogous to providing predictable request latencies at high concurrency levels, Redisql gives predictable memory usage overhead for data storage and provides detailed per-table, per-index memory usage via the SQL DESC command, as well as per row memory usage via the “INSERT … RETURN SIZE” command. The predictability of Redisql, which translates into tweakability for the seasoned programmer, changes the traditional programming landscape where the datastore is slower than the app-server.

Redisql is architected to handle the c10K problem, so it is world class in terms of networking speed AND all of Redisql’s data is in RAM, so there are no hard disk seeks to engineer around, you get all your data in a predictably FAST manner AND you can pack a lot of data into RAM as Redisql aggressively minimizes memory usage AND Redisql combines SQL and NOSQL under one roof, unifying them w/ commands to morph data betwixt them …. the sum of these parts, when integrated correctly w/ a fast app-server architecture is unbeatable as a dynamic web page serving platform with low latency at high concurrency.

The goal of Redisql is to be the complete datastore solution for applications that require the fastest data lookups/inserts possible. Pairing Redisql w/ an event driven language like Node.js, Ruby Eventmachine, or Twisted Python, should yield a dynamic web page serving platform capable of unheard of low latency at high concurrency, which when paired w/ intelligent client side programming, could process user events in the browser quickly enough to finally realize the browser as an applications platform.

Redisql: the polyglot that speaks SQL and redis, was written to be the Evented Relational Database, the missing piece in the 100% event driven architecture spanning from browser to app-server to database-server and back.

Web pages are quickly becoming instantly reactive. Typing into an HTML form does more than simply display the typed characters, under the covers the browser sends an Ajax request to a server that replies w/ instructions to modify the current webpage … the web page reacts to the keystroke “instantly”. When the client-server I/O follow this model, web pages served via a browser can yield a user experience akin to that of a desktop application.

The best prevalent example is Google’s Instant Search. The results of your search are shown AS you type them. If you search for “taco bell”, Google will display a different result set for each new letter typed. In the “taco bell” example, Google receives requests for “t”, “ta”, “tac”, “taco”, “taco “, “taco b”, “taco be”, “taco bel”, and “taco bell” (a total of 9 requests instead of one). Google goes on to state: it takes 300 milliseconds to type a letter and (one tenth of that) 30 milliseconds to glance at another part of the page. So the 9 result sets need to be returned and rendered quickly to be “Instant”. This represents an increase in request concurrency (9 in the time span where one used to be made) and a decrease in latency (“instant” as opposed to the time needed to load a new web page).

History has shown that once Google adds something to its search engine, everyone follows suit and extends the concept beyond the search engine. It is very probably that in the near future any button or any form on any web page will need to be “instant” to yield a competitive user experience.

Recent client side advances to give “instant” results include HTML5 Websockets and Google’s
v8 JavaScript engine (the latter has been optimized to render Ajax requests VERY quickly).

Server-side serving “instant” requests means serving more requests (which are usually pretty small) and serving them QUICKLY. Each user will be making loads of smallish requests and need them returned ASAP. The server-side challenge boils down to achieving Low Latency at High Concurrency, which is something of an engineering paradox. Recent advances to tackle what is often referred to as the c10K problem (web servers able to handle ten thousand clients simultaneously) include nginx and node.js.

Both nginx and node.js differ from traditional web servers by being single threaded, event driven, non-blocking, and asynchronous. They are coded to the reality that the network interface is the bottleneck in serving web pages. Under highly concurrent load they greatly out perform any web server built on a multi-threaded architecture. The multi-threaded architecture stress tested under high concurrency will bottleneck on context switches whereas the single threaded event driven architecture cannot context switch, so it bottlenecks on other things under MUCH (100x) higher concurrency.

A common mistake in web server benchmarking is to test the server under low concurrency. In the real world, this tests the case where a small set of users is pounding your web server w/ requests. The only users that match these benchmarks’ characteristics are web-spiders and denial-of-service attacks. Consequently such benchmarks’ results are not that meaningful. In the real world, web traffic comes from many concurrent users and likes to come in bursts. So concurrency (even in the non-“Instant” case) is vital in any web server benchmark.

Nginx serves only static webpages, but Node.js is capable of building dynamic web pages employing JavaScript SERVER side. Node.js is non blocking by nature and if it matures properly could be the platform of choice for “Instant” dynamic web pages.

There is, unfortunately, one piece missing in the full chain to build a “instant” dynamic web page platform: the database. All relational databases to date are multi-threaded. In the use case where many smallish requests are made from the browser (to provide the “Instant” experience), even if the web server is capable of delivering low latency web pages at high concurrency, the database will bottleneck, and the flow through the chain will develops kink and no progress over the multi threaded web server architecture will be evident.

The NOSQL movement has developed some event driven data stores. Of note is Redis, an event driven data store providing list, set, and hash table primitives. It is VERY fast. Pairing Redis w/ Node.js to serve dynamic web pages and using nginx to serve static web pages is an optimal platform for bleeding edge “Instant” web serving platforms as the entire chain has the same event driven architecture … i.e. no kinks.

But NOSQL is new and foreign to most, so the switch towards “Instant” web serving platforms is currently reserved only for the gutsy early adapter. What is missing is an event driven relational database that speaks SQL. When an Evented Relational Database is created, anyone w/ JavaScript and SQL knowledge will be able to create “Instant” web serving platforms (JavaScript on the server-side also provides a single language for client side and server side development).

Users demanding an “Instant” web experience will be the catalyst to move the event driven programming style into the mainstream. The means to accomplish the “Instant” web experience on the large scale can only be realised by simplifying the process of writing event driven programs. Javascript and SQL employed as an end to end solution accomplishes this: both are widely known and have low barriers to entry.

The “Instant” web requires Low Latency at High Concurrency and the tools to realize this are Ajax/Websockets, an Evented Web Server (Nginx, Node.js), and an Evented Relational Database … I will write a later post on the Evented Relational Database.

I am a big fan of the NOSQL datastore redis (http://code.google.com/p/redis/)

Redis is an astonishly fast datastore, but what people often overlook is that it performs amazingly well under concurrent loads.

I will write a later post on why being able to deliver low latency requests at high concurrency levels will be the defining metric for databases powering the next generation of webservers.

For now, here are some rather ugly bug accurate graphs


Redis 2.0.0 latency tests from 1-64K concurrent connections

Standard Hardware: Standard NIC (A780GM-A motherboard integrated), Phenom X4 CPU @3.0GHz, 8GB RAM PC3200.
Two machines connected via a standard 1GigE Netgear Switch.
OS: Ubuntu 9.10 64bit.
Tests are run using a single core for the client and a single core (on the other machine) for the server.
This is how the server performs in terms of requests per second as concurrency goes up. This is such an impressive graph:

REQUEST PER SECOND (Y-Axis) VS CONCURRENCY (X-Axis)

COMMENTS: below 5 is bad, 10 is already max, then a beautifully slow and smooth degradation to 64K concurrent connections

Reqs/s vs Concurrency

 

NEXT: Latency will be analyzed as concurrency varies. Latency is measured in milliseconds, 100K requests are made, and the graphs will show the percentage of those 100K requests took how long

NOTES: For the following graphs, the axis are all the same:

  • Z-axis (to the right) is PERCENTAGE
  • X-axis (to the left) is LEVEL OF CONCURRENCY
  • Y-axis is MILLISECONDS
  • THE BIG PICTURE 1-64K concurrent connections

    COMMENTS: quick to see that even at high concurrency, if you are below 90%, you are ok

    BIG PICTURE

    MOST IMPORTANT: CONCURRENCY 1 to 10K

    COMMENTS: lots of L’s … there are always gonna be some anomolies, so this is very strong

    Concurrency 1 to 10K

    THE BULK: 0-90% latency

    COMMENTS: its all fantastic until maybe 30K, which is absurdly high. Also even in the very high ranges it is smooth in both the X and Z directions until 80%.

    less than 90% latency

    VERY GOOD: 90-99% latencies

    COMMENTS: everything is kosher until 30K AND under 99%

    90-99% Latency

    THE END: 99% latencies

    COMMENTS: Remember this is a closeup of the worst of the worst. The good news is we see the same L patterns at this resolution, until 30K. The 0-5K range is cluttered, so on to the next graph.

    99th Percentile

    THE END: Concurrency 1 to 1000 – 99% latencies

    COMMENTS: This is the SLA graph. @400 and approx: 99.5%, there is a hop. This holds true until 900. This latency jump is always from below 20 direct to about 220, and over the whole data set this hop hovers around the 99th percentile and mostly to pretty far right of it.

    1-100 concurrency 99th percentile

    FINAL COMMENTS: I am presenting this as neutrally as possible. I am pretty impressed with these latencies, and the thing that strikes me about the graphs is their smoothness in all directions. This smoothness to me is as important as the numbers, it means the software is predictable.

    NOTES:

    1. The data for these graphs is in this directory: http://allinram.info/redis/concurrency/test2/
    2. Two files were needed to create these benchmarks: redis-benchmark.c and Concurrency_test_redis.sh
    3. additonal unix commands need to be done to allow greater than 28K concurrency, like: “ulimit -n 90000” , “echo 1024 65000 > /proc/sys/net/ipv4/ip_local_port_range”, “echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle”, “”echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse”
    4. in reds code, the following change must be made “ae.h:#define AE_SETSIZE (1024*10) … redefine to 1024*64” for concurrency higher than 28K

    I used gnu-plot for these graphs, and this is the way to look at them, cause you can rotate the axis in 3 dimensions
    Here are the instructions:
    1.) gnuplot> splot “FILE” u 1:2:3