AlchemyDB – The world’s first integrated GraphDB + RDBMS + KV Store + Document Store


I recently added a fairly feature-rich Graph Database to AlchemyDB (called it LuaGraphDB) and it took roughly 10 days to prototype. I implemented the graph traversal logic in Lua (embedded in AlchemyDB) and used AlchemyDB’s RDBMS to index the data. The API for the GraphDB is modeled after the very advanced GraphDB Neo4j. Another recent addition to AlchemyDB, a column type that stores a Lua table (called it LuaTable), led me to mix Lua-function-call-syntax into every part of SQL I could fit it into (effectively tacking Document-Store functionality onto AlchemyDB). Being able to call Lua functions from any place in SQL, and being able to call Lua functions (that can call into the data-store) directly from the client, made it possible to build a GraphDB on top of AlchemyDB as a library, i.e. it didn’t require any new core functionality. This level of extensibility is unique, so I am going to refer to AlchemyDB as a “Data Platform”. This is the best term I can come up with: I am great at writing cache invalidation algorithms, but I suck at naming things 🙂

To elaborate on what I mean when I say I mixed Lua-function-call-syntax into SQL (because it’s not clear and maybe not even good English:), here is an example AlchemyDB SQL call, w/ a bunch of Lua mixed in:
SELECT func1(colX), func2(nested.obj.x.y.z), col5 FROM table WHERE index = 33 AND func3(col3) ORDER BY func4(colX)

This request does the following:
1.) finds all rows from table w/ index = 33 (RDBMS secondary index traversal)
2.) filters out rows that don’t match the Lua function call func3(col3) { func3() returns true or false }
3.) for every row that passes #2, runs the functions [func1(colX), func2(nested.obj.x.y.z)], where ‘func1’ & ‘func2’ are previously defined Lua functions, ‘colX’ is a normal column, and ‘nested.obj.x.y.z’ is a nested element in a LuaTable column, and combines the return values from the two function calls with col5’s contents to form a response row.
4.) sorts these response rows, using the return value of the function call ‘func4(colX)’ for the comparison.
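
To make the four steps concrete, here is a rough pure-Lua model of them. It is only a sketch and not how AlchemyDB executes the query: the sample rows, the bodies of func1 through func4, and the run_query() helper are all invented for illustration, and a plain loop stands in for the secondary index lookup.

```lua
-- Pure-Lua model of the four steps; no AlchemyDB calls are made.
-- The rows and the bodies of func1..func4 are invented for illustration.
local rows = {
  { index = 33, colX = 3, col3 = 10, col5 = "a", nested = { obj = { x = { y = { z = 7 } } } } },
  { index = 33, colX = 1, col3 = -5, col5 = "b", nested = { obj = { x = { y = { z = 8 } } } } },
  { index = 33, colX = 2, col3 = 20, col5 = "c", nested = { obj = { x = { y = { z = 9 } } } } },
  { index = 44, colX = 9, col3 = 30, col5 = "d", nested = { obj = { x = { y = { z = 1 } } } } },
}

local function func1(v) return v * 10 end  -- projection on a normal column
local function func2(v) return v + 1 end   -- projection on a nested LuaTable element
local function func3(v) return v > 0 end   -- WHERE filter, returns true or false
local function func4(v) return -v end      -- ORDER BY key (here: descending colX)

local function run_query(tbl)
  local matched = {}
  for _, row in ipairs(tbl) do
    -- Step 1: a linear scan stands in for the secondary index lookup.
    -- Step 2: drop rows for which func3(col3) is false.
    if row.index == 33 and func3(row.col3) then
      -- Step 3: build the response row and remember its sort key.
      matched[#matched + 1] = {
        key = func4(row.colX),
        func1(row.colX), func2(row.nested.obj.x.y.z), row.col5,
      }
    end
  end
  -- Step 4: order the response rows by func4(colX).
  table.sort(matched, function(a, b) return a.key < b.key end)
  return matched
end

local result = run_query(rows)
for _, r in ipairs(result) do
  print(r[1], r[2], r[3])
end
```

With this sample data, two rows survive the filter and the row with the larger colX comes first, because the invented func4() negates its argument.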

Besides SELECT calls, Lua-function-call-syntax has also been mixed into the SQL commands INSERT, UPDATE, & DELETE, covering the four horsemen of SQL. Full syntax found here.

In and of itself, this mixing of Lua-function-call-syntax into SQL could be viewed as a very developer-friendly User-Defined-Functions mechanism, and it is that. But the integration of Lua in AlchemyDB goes much deeper, and it is the depth of the integration that opens up new ways of programming within a datastore.

Alchemy’s GraphDB implemented its graph traversal logic in Lua (found here). Any global Lua function that was turned into byte-code (or possibly LuaJIT’ed to machine code) during interpretation is callable from the client via the LUAFUNC command. These Lua functions can call into the data-store (e.g. to get/set data) w/ the Lua alchemy() function. SQL Insert/Update/Delete triggers can be created via the LUATRIGGER command, enabling SQL commands to pass row data to Lua. The ability for both languages to call each other, done in a highly efficient manner using Lua’s virtual stack calling lua byte-code (meaning no interpretation in these cross language calls), allows the developer to treat AlchemyDB as if it were an AppServer w/ an embedded RDBMS (which BTW is another Alchemy Experiment).

A simple example of the Alchemy command “LUAFUNC” and the Lua function “alchemy(,,,)” is the function addSqlUserRowAndNode, which creates a SQL row and adds a GraphNode (via the createNamedNode() call) inside this row’s LUATABLE column.

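function addSqlUserRowAndNode(pk, citypk, nodename)
  -- Insert the SQL row; the trailing {} is the (empty) LUATABLE column.
  alchemy('INSERT', 'INTO', 'users', 'VALUES', "(" .. pk .. ", " .. citypk .. ", {})");
  -- Create the GraphNode inside that row's LUATABLE column by calling
  -- createNamedNode() from the SELECT's column list.
  alchemy('SELECT', "createNamedNode('users', 'lo', pk, '" .. nodename .. "')", 'FROM', 'users', 'WHERE', 'pk = ' .. pk);
  return "OK";
end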

The Lua function addSqlUserRowAndNode() can be called from the frontend w/ the following command:
./alchemy-cli LUAFUNC addSqlUserRowAndNode 1 10 'A'

It is worth noting (at some point, why not here:) that as long as you keep requests relatively simple, meaning they don’t look at 1000 table-rows or traverse 1000 graph-nodes, your performance will range between 10K-100K TPS on a single core w/ single millisecond latencies. These are the types of numbers people should demand for OLTP.

The most challenging feature of the GraphDB was the indexing of arbitrary relationships, which does not lend itself well to SQL’s CREATE INDEX syntax. In the example on the LuaGraphDB documentation page, the indexed relationship is “Users who HAS_VISITED CityX”, but the essence of indexing arbitrary relationships means indexing “Anything w/ SOME_RELATIONSHIP to SomethingElse (in a given or both directions) (possibly at a certain depth)”. And these indexed relationships can be nested graph relationships (i.e. Dude who KNOWS people who KNOW people who HAVE lotsOfCash). That is a bitch to frame in SQL’s syntax, so for this use-case it made sense to keep the logic of indexing in Lua.
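
As a rough illustration of why a nested relationship is easier to state in Lua than in CREATE INDEX syntax, here is a small pure-Lua sketch. It does not use LuaGraphDB's API: the in-memory graph table, the follow() helper, and the node names are all hypothetical stand-ins.

```lua
-- Hypothetical in-memory graph: graph[node][REL] is a list of target nodes.
local graph = {
  dude  = { KNOWS = { "alice", "bob" } },
  alice = { KNOWS = { "carol" } },
  bob   = { KNOWS = { "dave" }, HAVE = { "lotsOfCash" } },
  carol = { HAVE = { "lotsOfCash" } },
  dave  = {},
}

-- Follow a fixed chain of relationship types from one start node and
-- return the set of nodes reached after the last hop.
local function follow(start, path)
  local frontier = { [start] = true }
  for _, rel in ipairs(path) do
    local nxt = {}
    for node in pairs(frontier) do
      local edges = graph[node] and graph[node][rel] or {}
      for _, target in ipairs(edges) do nxt[target] = true end
    end
    frontier = nxt
  end
  return frontier
end

-- "Dude who KNOWS people who KNOW people who HAVE lotsOfCash"
local function knows_people_who_know_rich_people(start)
  return follow(start, { "KNOWS", "KNOWS", "HAVE" }).lotsOfCash == true
end

print(knows_people_who_know_rich_people("dude"))   -- true
print(knows_people_who_know_rich_people("alice"))  -- false
```

The nested relationship is just a list of hops here, which is the kind of thing that has no natural home in a column-based index definition.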

AlchemyDB’s GraphDB implemented the indexing of relationships using AlchemyDB’s Pure Lua Function Index (which for the record may be one of the top 5 worst names ever:). This index constructs & destructs itself via user-defined Lua function calls (i.e. you have to write them) declared in the CREATE INDEX statement, and leaves the population of the index (i.e. addToIndex() & deleteFromIndex()) to user-defined functions (again, you have to write them), AND it allows the index to be used in SQL’s where-clause, exactly the same as an indexed column is used (this part you don’t have to write:). The GraphDB Pure-Lua-Function-Indexes every “USER who HAS_VISITED CityX” inside the Lua routines responsible for addRelationships() via function hooks registered during “construction” (sounds fuck all tricky, but it’s 10 lines of code). If someone then wants to find all the users who have visited CityX, it can be done in SQL, w/ a query like “SELECT * FROM users WHERE pure_lua_func_index() = CityX.pk”. A SQL query like this could also be used to find multiple start nodes (via an indexed lookup) in a complex graph traversal. This level of flexibility is not needed in most use cases, but it again represents a cyclical calling ability (Lua<->SQL) on a very deep level, and cyclical calling enables usecases².
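
The general shape of a user-populated relationship index can be sketched in plain Lua. This is not the Pure Lua Function Index itself, and none of the names below come from AlchemyDB: visited_index, add_to_index(), delete_from_index(), and users_who_visited() are hypothetical, and the real hooks are registered through CREATE INDEX rather than called by hand.

```lua
-- visited_index[city_pk] is a set of user pks: { [user_pk] = true, ... }
local visited_index = {}

-- Hypothetical hook: would run when a HAS_VISITED relationship is added.
local function add_to_index(user_pk, city_pk)
  local users = visited_index[city_pk]
  if not users then
    users = {}
    visited_index[city_pk] = users
  end
  users[user_pk] = true
end

-- Hypothetical hook: would run when the relationship is deleted.
local function delete_from_index(user_pk, city_pk)
  local users = visited_index[city_pk]
  if not users then return end
  users[user_pk] = nil
  -- Drop the entry once no user points at this city any more.
  if next(users) == nil then visited_index[city_pk] = nil end
end

-- Lookup: sorted list of the pks of users who have visited the city.
local function users_who_visited(city_pk)
  local pks = {}
  for user_pk in pairs(visited_index[city_pk] or {}) do
    pks[#pks + 1] = user_pk
  end
  table.sort(pks)
  return pks
end

add_to_index(1, 10)
add_to_index(2, 10)
add_to_index(3, 20)
delete_from_index(2, 10)
print(table.concat(users_who_visited(10), ","))  -- prints 1
```

The lookup function plays the role that the where-clause plays in the SQL query quoted above: given a city's pk, it returns the matching users without scanning every relationship.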

This little experiment houses an RDBMS, a NoSQL datastore (Redis), a Document-Store (via the LUATABLE column type), and a GraphDB under ONE roof. They are unified, to a sane degree, by extending SQL, and they can be integrated tightly via Lua function calls, as needed. Any use-case that requires multiple types of data-stores, serves high-velocity OLTP requests, and fits w/in AlchemyDB’s various other quirks (InRam, single-threaded, single-node) can benefit greatly from this approach: fewer moving parts in the system, no need to sync data across/between data-stores, no need to store data redundantly, etc…

But the part I find really exciting about this is that I added a GraphDB to a datastore in ten days, and the GraphDB is not too shabby. AlchemyDB’s bindings open up the possibility to add functionality to an already very able data-store in a highly extensible manner. It allows you to move your functionality to your data, and THIS is HUGE. It should be easy to create all sorts of efficient data-stores & data-based-services using this Data Platform; the GraphDB implementation proves the case.

And yes, this was a lot of fun to code. It was confusing as hell until it all came together, and then it was surprisingly easy to code: it fit together smoothly, it felt right, and a GraphDB is neither a small nor a simple piece of software.

AlchemyDB: the lightweight OLTP Data Platform. Strong on functionality, flexibility, and being thoroughly misunderstood 🙂

p.s. The GraphDB code is brand new, and has been tested, but not enough, so if you use it, be aware there may be some bugs. I’m busting my ass getting it production hardened, so if you have problems, just shoot me an email and I will fix ’em. The branch for this code is called release_0.2_rc1.

8 comments
  1. Awesome Jak. I particularly like the “programming within a datastore” like how couchdb intended it to be.

    • thanks David. It really is just that too, I wanted to be able to program in AlchemyDB, and I have changed the API like 30 times. The thing that made it so hard was prohibiting the interpretation for frontend requests. So first you have to interpret Lua and then call it, and this ensures performance. But it works now, it flows, it’s fun 🙂
