SegundoNet: a secondary cell network built cheaply

Quick link to slides

This is a concept I came up with in Jan 2014. Rumors of Google doing something similar to this concept started popping up recently, so it seemed like a good time to publish it.

The basic concept is to build a secondary cell network that complements the user’s existing (primary) cell network, offering reduced cellular data rates whenever the user is within the secondary network’s range. This secondary network would be significantly cheaper to construct because its two major costs (backhaul & spectrum) are covered more or less for free: Google Fiber serves as the backhaul, and TV White Spaces plus the unlicensed bands (900 MHz, 2.4 GHz, 5 GHz), combined with best Cognitive Radio practices, serve as the network’s spectrum. Select Google Fiber customers would receive reduced Google Fiber rates for hosting the network’s picocells. Additionally, each Google Fiber home would receive proprietary hardware in the form of a WiFi router to implement the system’s Cognitive Radio. This proprietary hardware could be used to do other cool stuff.

But without further ado … here are the slides. The concept is a pretty cool one, pretty relevant, and probably doable, and I expect some derivation of it to happen in real life pretty soon, cuz Google has the rep for pushing things forward.

Quick link to slides


Glass’ evolution, some suggestions

Short link for PRESENTATION

I am a huge fan of Google Glass’ concept. Having information pop up that only you can see, adding context to whatever is in your line of sight, is a wildly interesting technology.

I bought a Google Glass about 4 months ago and was totally OK w/ paying $1500. Since purchasing the Glass I have swung between euphoria about some features (e.g. the POV camera) and disappointment about the product’s many shortcomings.

Recently I started cataloging all the changes I would make to Glass and wrote up a (not perfectly organized) PowerPoint presentation. More recently, Google announced and released Android Wear, which is very much in line with some of my suggestions, so I felt validated that my ideas were relevant and thought I should share them.

Hopefully some of my other ideas are in line w/ what Google is thinking, because a technology like Glass is certain to end up in your household, just not in its current form.

Without further ado … CLICK FOR PRESENTATION

 

* On re-reading my presentation, it is pretty poorly organized. The interesting ideas start around page 20 and the even better ones start at page 30, which is to say the first 20 pages are boring :)


AlchemyDB is all grown up

I started the AlchemyDB project in early 2010. After two years, the project had graduated to become a hybrid Document-store/GraphDB/RDBMS/redis-store. AlchemyDB was then acquired by Aerospike in early 2012. The next two years were spent integrating AlchemyDB’s codebase/architecture/philosophy into Aerospike, eventually yielding Aerospike 3. Four years after its inception, AlchemyDB has provided the right ammunition to catapult Aerospike into the visionary enterprise NOSQL quadrant.

Integrating AlchemyDB into Aerospike was, of course, much harder than expected. The bulk of this integration work was accomplished not by merging code, but by redoing AlchemyDB’s innovative features in Aerospike’s architecture. Interestingly, the merging of the two platforms didn’t just result in features being dropped (e.g. GraphDB support); it also took an uncharted course, and some new features were born of the process (e.g. LargeDataTypes).

All in all, I am genuinely pleased with the architecture that emerged from the integration. Aerospike 3 is a document-store, w/ robust secondary indexing, a high performance User-Defined-Function (UDF) implementation, ultra-low-latency aggregations (e.g. <1ms possible), SSD-optimized LargeDataTypes, and support for a subset of SQL. What’s more, ALL of these new features have the same good-ole Aerospike guarantees of performance, elasticity, fault-tolerance, etc… Complexity was added w/o significant compromise on the original product’s strengths.

The rest of this post will dive into the details of Aerospike 3's LargeDataType feature. Aerospike’s LargeDataTypes are standard data-types (e.g. Stack, List, Map), but each instance is stored as many different physical pages tied together logically via a directory. LargeDataTypes deal w/ an inherent flaw in the document model: over time, documents themselves tend toward becoming BigData, and the hotter the document, the larger it becomes, and the more the DB bottlenecks on I/O & de/serialization.

Aerospike’s LargeDataTypes greatly benefit from piggybacking on Aerospike’s SSD-optimized data-layer. For example, the LargeStack data-type has near-optimal I/O when the use-case is predominantly pushing and popping a small number of items. This pattern of usage results in only the directory and the top page being accessed: near-optimal I/O that is independent of the size of the LargeStack, as the cold parts of the LargeStack simply sit idle on SSD.
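To make that access pattern concrete, here is a minimal in-memory Lua sketch of the idea. It is purely illustrative: the page capacity, field names, & structure are my assumptions, not Aerospike’s actual (private) implementation. Each “page” stands in for a physical SSD page, and a push or pop only ever touches the directory plus the top page:

local PAGE_CAPACITY = 100          -- assumed number of items per physical page

local stack = { pages = { {} } }   -- "directory": a list of pages, last = top

function lstack_push(s, item)
  local top = s.pages[#s.pages]    -- read: directory + top page only
  if #top >= PAGE_CAPACITY then    -- top page full: link a fresh page into the
    top = {}                       -- directory; the old page goes cold and
    s.pages[#s.pages + 1] = top    -- simply sits idle on SSD
  end
  top[#top + 1] = item             -- write: top page only
end

function lstack_pop(s)
  local top = s.pages[#s.pages]
  local item = table.remove(top)   -- read/write: top page only
  if #top == 0 and #s.pages > 1 then
    table.remove(s.pages)          -- unlink the emptied page from the directory
  end
  return item
end

for i = 1, 250 do lstack_push(stack, i) end
print(#stack.pages, lstack_pop(stack))  --> 3   250

No matter how many items the stack holds, the I/O per push/pop stays constant; only the directory grows, and slowly.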

Diving even deeper: Aerospike’s LargeDataTypes are native database types at the API level, but under the covers they are user-space, server-side code making calls into Aerospike’s linked-pages API (still private as of Mar 2014). The linked-pages API enables UDFs to access raw data pages on the server. It also ensures that these raw data pages are logically linked & bound to the data-structure’s primary-key record, such that they are guaranteed to always be co-located on the same server as the primary-key’s record. This guarantee holds before, during, & after cluster re-orgs (both elastic and fault-tolerant ones).

Implementing LargeDataTypes in server-side UDF space has one priceless advantage: extensibility. One customer benefited tremendously from byte-packing & compressing the data in their LargeStacks (which held the bulk of their data). This extensibility was easily accomplished by cloning the LargeStack code (we named the clone CompressedLargeStack) and then inserting a few lines of compression logic into the clone’s code. The customer’s client code switched to using CompressedLargeStacks and WHAMMY: 10X space reduction.
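As a toy illustration of how little code such a clone needs: the only change sits at the page boundary. The customer’s actual codec & the real page hooks are not public, so everything below is a made-up stand-in (a naive run-length encoding that would break on data containing digits):

-- illustrative placeholder for the customer's real compression algorithm
local function rle_compress(s)
  return (s:gsub('((.)%2*)', function(run, c) return #run .. c end))
end

local function rle_decompress(s)
  return (s:gsub('(%d+)(.)', function(n, c) return c:rep(tonumber(n)) end))
end

-- inside the hypothetical clone, page writes/reads get wrapped:
--   save_page(page_bytes)  becomes  save_page(rle_compress(page_bytes))
--   load_page(page_id)     becomes  rle_decompress(load_page(page_id))
assert(rle_decompress(rle_compress("aaaabbbcc")) == "aaaabbbcc")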

This customer story is an example of how such extensibility allows for the creation of new LargeDataTypes in UDF space. Aerospike users can now roll their own customized LargeDataTypes, as Aerospike’s engineers did for this customer’s use-case and, before that, for the LargeDataType implementations themselves. The linked-pages API, which guarantees co-location of the many pages bound to a single primary key, can be used to create a wide variety of LargeDataTypes (e.g. Graphs, ScoreBoards, Tries, Sparse-bitmaps) in server-side user-space code. With proper data-structure architecture, these user-created LargeDataTypes can grow arbitrarily large and still perform according to strict SSD-based SLAs. This makes the linked-pages API a fantastic tool for certain high-velocity BigData use-cases.

The power of being able to define a customized data-type that:

    1. is a single logical entity made up of many physical pages
    2. has strict transactional guarantees
    3. has a guarantee that all of the physical pages are located on a single server even during cluster re-orgs

enables customization down to the very definition & functionality of the data-type, to better fit a customer’s unique data.

My personal goal with Aerospike 3 was to deliver on NOSQL’s promise of storing & querying data in its most natural form. Enabling developers to easily roll their own data-types to better fit their unique data, and to tweak those data-types to improve performance, is an ace in the hole for those who need to continually push their system that much further.


Russ’ 10 Ingredient Recipe For Making 1 Million TPS On $5K Hardware

I wrote a blog post on highscalability.com on how to get to 1 million database TPS on $5,000 worth of hardware (a single machine). Hopefully there are some performance tips in the post that people can use themselves.

http://highscalability.com/blog/2012/9/10/russ-10-ingredient-recipe-for-making-1-million-tps-on-5k-har.html

I also wrote a brag-numbers post for Aerospike’s blog, and they let me quote Ricky Bobby (of the movie “Talladega Nights”)

http://www.aerospike.com/blog/all-about-speed/

I will update this blog every time I write posts in other places; it just seems like a good idea.

- Russ


AlchemyDB going on top of Aerospike

Aerospike is the former Citrusleaf: http://www.dbms2.com/2012/08/27/aerospike-the-former-citrusleaf/

Citrusleaf acquired AlchemyDB, and we are now incrementally porting AlchemyDB functionality to run on top of Citrusleaf’s proven distributed, high-availability, linearly scalable key-value store. The first functionalities planned are: Lua with DocumentStore functionality, secondary indexes, and real-time map-reduce. Further down the road, Pregel-like distributed GraphDB functionality or some next-generation StreamingDB may be integrated in. Incrementally building AlchemyDB on top of Citrusleaf will create a distributed computing fabric: functions can be shipped to data (which lies on a horizontally scalable, low-latency storage layer), and the functions can propagate their results across the fabric, calling other functions on them.

Full info at: http://www.aerospike.com/blog/alchemydb/


3 blog kings of data

I get my information on databases, datastores, big-data, etc… primarily from 3 bloggers: Todd Hoff of High Scalability, Alex Popescu of myNOSQL, and Curt Monash of DBMS2. Each of them specializes in different areas, and each has his own style and purpose; taken as a whole, they cover a lot of ground w/o a lot of dilution.

In 2-10 years you will look back at Todd Hoff’s blog High Scalability, and it will contain everything that is happening right now. He has a keen insight into which technologies are fads and which technologies may lead to big changes, and he is not at all full of shit, which is next to impossible given this task. He dabbles in reporting on truly innovative technologies, and he actually understands them. His style is strictly facts, and he rarely bad-mouths people/ideas/stuff. His blog is the remedy for the hardened cynic who thinks we are making no important advances. His weekly “Stuff the Internet Says on Scalability” is ALWAYS good for at least 3 links to stuff I find interesting (which is 3 links more than every other summary email I get).

Alex Popescu is the go-to guy for NOSQL. His blog myNOSQL covers the ENTIRE NOSQL gamut (GraphDBs, DocumentStores, KeyValueStores, Hadoop/MapReduce, etc…). He quickly sees thru most of the bullshit in the NOSQL world, and clearly explains the differences in a movement that is full of confusion. He likes to tear people’s points apart; his points are valid, and he also blasts NOSQL ideas/approaches/etc… when they have it coming. His tweet stream (@al3xandru) is a raging river of NOSQL information that will keep you on top of the NOSQL game (once you learn how to wade thru it); plug into it, and you can be up on most of NOSQL in probably a month.

Curt Monash knows the RDBMS market, especially the analytics market, better than you know anything :) He has been in the game forever, and he earns money w/ his blog DBMS2, so it can’t be called 100% objective, but he has the type of abrasive personality that is only comfortable telling mostly the truth, so IMO the info in his blog is basically objective. RDBMS technologies are relatively mature, advanced, and widespread, so having a good summary of what is going on in that market is a MUST for any fan of data. Monash is a definitions junkie, which can be boring/tiring to read, but it represents a mature approach and does help make sense of such a large, complicated, and polluted-by-enterprise-generated-bullshit market. Having a guy w/ such experience, who is still very up to date, reporting on one of the oldest (yet most active) fields in computing is of great value to all of us.

In conclusion:
Real smart people are reading these 3 blogs, learning what other real smart people are doing, and forming new, even smarter ideas. For this to happen, a medium of information exchange is required: one that does not waste smart people’s time, doesn’t insult smart people’s intelligence w/ obvious marketing ploys, and is written in a style that sparks their imagination. So go read their blogs, you will learn stuff :)


AlchemyDB – The world’s first integrated GraphDB + RDBMS + KV Store + Document Store

I recently added a fairly feature-rich Graph Database to AlchemyDB (called it LuaGraphDB), and it took roughly 10 days to prototype. I implemented the graph-traversal logic in Lua (embedded in AlchemyDB) and used AlchemyDB’s RDBMS to index the data. The API for the GraphDB is modeled after the very advanced GraphDB Neo4j. Another recently added piece of functionality in AlchemyDB, a column type that stores a Lua table (called it LuaTable), led me to mix Lua-function-call-syntax into every part of SQL I could fit it into (effectively tacking Document-Store functionality onto AlchemyDB). Being able to call Lua functions from any place in SQL, and being able to call Lua functions (that can call into the data-store) directly from the client, made building a GraphDB on top of AlchemyDB possible as a library, i.e. it didn’t require any new core functionality. This level of extensibility is unique, so I am gonna refer to AlchemyDB as a “Data Platform”. It’s the best term I can come up with; I am great at writing cache-invalidation algorithms, but I suck at naming things :)

To elaborate on what I mean when I say I mixed Lua-function-call-syntax into SQL (because it’s not clear, and maybe not even good English:), here is an example AlchemyDB SQL call w/ a bunch of Lua mixed in:
SELECT func1(colX), func2(nested.obj.x.y.z), col5 FROM table WHERE index = 33 AND func3(col3) ORDER BY func4(colX)

This request does the following:
1.) finds all rows in the table w/ index = 33 (an RDBMS secondary-index traversal)
2.) filters out rows that don’t match the Lua function call func3(col3) (func3() returns [true, false])
3.) for every row that passes #2, runs the functions [func1(colX), func2(nested.obj.x.y.z)], where 'func1' & 'func2' are previously defined Lua functions, 'colX' is a normal column, and 'nested.obj.x.y.z' is a nested element in a LuaTable column, and combines the two functions’ return values w/ col5's contents to form a response row
4.) these response rows are then sorted using the return value of the function call 'func4(colX)' as the cmp() function
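For a sense of what sits behind those calls: hypothetical definitions of the Lua functions used above might look like the following (how AlchemyDB actually passes column values into UDFs is not shown here, so treat the signatures as assumptions):

-- hypothetical UDFs for the SELECT above; argument types are assumptions
function func1(colX) return colX * 2    end  -- transforms colX per row
function func2(z)    return tostring(z) end  -- formats the nested LuaTable element
function func3(col3) return col3 > 10   end  -- where-clause filter: returns true/false
function func4(colX) return -colX       end  -- ORDER BY key (sorts colX descending)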

Besides SELECT calls, Lua-function-call-syntax has also been mixed into the SQL commands INSERT, UPDATE, & DELETE, covering the four horsemen of SQL. Full syntax found here.

In and of itself, this mixing of Lua-function-call-syntax into SQL could be viewed as a very developer-friendly User-Defined-Functions mechanism, and it is that. But the integration of Lua in AlchemyDB goes much deeper, and it is the depth of the integration that opens up new ways of programming within a datastore.

Alchemy’s GraphDB implemented its graph-traversal logic in Lua (found here). Any global Lua function that was turned into byte-code (or possibly LuaJIT’ed to machine code) during interpretation is callable from the client via the LUAFUNC command. These Lua functions can call into the data-store (e.g. to get/set data) w/ the Lua alchemy() function. SQL Insert/Update/Delete triggers can be created via the LUATRIGGER command, enabling SQL commands to pass row data to Lua. The ability for both languages to call each other, done in a highly efficient manner using Lua’s virtual stack calling Lua byte-code (meaning no interpretation in these cross-language calls), allows the developer to treat AlchemyDB as if it were an AppServer w/ an embedded RDBMS (which BTW is another Alchemy experiment).
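On the trigger side, the Lua half is just an ordinary global function that receives row data; the exact LUATRIGGER registration syntax lives on the syntax page, so the shape below is a hypothetical sketch:

-- hypothetical trigger body: keep a running per-city user count;
-- assumes the trigger passes the inserted row's (pk, citypk) values
local city_counts = {}
function onUserInsert(pk, citypk)
  city_counts[citypk] = (city_counts[citypk] or 0) + 1
end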

A simple example of the Alchemy command LUAFUNC and the Lua function alchemy() would be the function addSqlUserRowAndNode, which creates a SQL row and then adds a GraphNode (via the createNamedNode() call) inside this row’s LUATABLE column:

function addSqlUserRowAndNode(pk, citypk, nodename)
  -- create the SQL row, w/ an empty LuaTable ({}) in the LUATABLE column
  alchemy('INSERT', 'INTO', 'users', 'VALUES', "(" .. pk .. ", " .. citypk .. ", {})");
  -- create a named GraphNode inside the new row's LuaTable column
  alchemy('SELECT', "createNamedNode('users', 'lo', pk, '" .. nodename .. "')",
          'FROM', 'users', 'WHERE', 'pk = ' .. pk);
  return "OK";
end

The Lua function addSqlUserRowAndNode() can be called from the frontend w/ the following command:
./alchemy-cli LUAFUNC addSqlUserRowAndNode 1 10 'A'

It is worth noting (at some point, why not here:) that as long as you keep requests relatively simple, meaning they don’t look at 1000 table-rows or traverse 1000 graph-nodes, performance will range between 10K-100K TPS on a single core w/ single-millisecond latencies; these are the types of numbers people should demand for OLTP.

The most challenging feature of the GraphDB was the indexing of arbitrary relationships, which does not lend itself well to SQL’s CREATE INDEX syntax. In the example on the LuaGraphDB documentation page, the indexed relationship is “Users who HAS_VISITED CityX”, but the essence of indexing arbitrary relationships is indexing “Anything w/ SOME_RELATIONSHIP to SomethingElse (in a given or both directions) (possibly at a certain depth)”. And these indexed relationships can be nested graph relationships (i.e. Dude who KNOWS people who KNOW people who HAVE lotsOfCash); it is a bitch to frame in SQL’s syntax, so it made sense to keep the indexing logic in Lua for this use-case.

AlchemyDB’s GraphDB implemented the indexing of relationships using AlchemyDB’s Pure Lua Function Index (which, for the record, may be one of the top 5 worst names ever:). This index constructs & destructs itself via user-defined Lua function calls (i.e. you have to write them) declared in the CREATE INDEX statement, and leaves the population of the index (i.e. addToIndex() & deleteFromIndex()) to user-defined functions (again, you have to write them), AND it allows the index to be used in SQL’s where-clause exactly the same way an indexed column is used (this part you don’t have to write:). The GraphDB Pure-Lua-Function-Indexes every “USER who HAS_VISITED CityX” inside the Lua routines responsible for addRelationships(), via function hooks registered during “construction” (sounds fuck-all tricky, but it’s 10 lines of code). If someone then wants to find all the users who have visited CityX, it can be done in SQL w/ a query like “SELECT * FROM users WHERE pure_lua_func_index() = CityX.pk”. A SQL query like this could also be used to find multiple start nodes (via an indexed lookup) in a complex graph traversal. This level of flexibility is not needed in most use-cases, but it again represents a cyclical calling ability (Lua<->SQL) on a very deep level, and cyclical calling enables use-cases².
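Here is a hedged sketch of the user-written half of such an index. The hook signatures AlchemyDB actually expects are not reproduced here, so the names & arguments below are assumptions; only the construct/addToIndex/deleteFromIndex division of labor comes from the above:

-- reverse map built at index "construction": city_pk -> set of user_pks
local visited_index = {}

function addToIndex(user_pk, city_pk)      -- hooked into addRelationships()
  local set = visited_index[city_pk]
  if not set then set = {}; visited_index[city_pk] = set end
  set[user_pk] = true
end

function deleteFromIndex(user_pk, city_pk) -- hooked into relationship removal
  local set = visited_index[city_pk]
  if set then set[user_pk] = nil end
end

function indexLookup(city_pk)              -- what the SQL where-clause taps into
  local pks = {}
  for pk in pairs(visited_index[city_pk] or {}) do pks[#pks + 1] = pk end
  return pks
end

W/ those ~20 lines in place, a query like “SELECT * FROM users WHERE pure_lua_func_index() = CityX.pk” resolves via the reverse map instead of a full scan.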

This little experiment houses an RDBMS, a NOSQL datastore (redis), a Document-Store (via the LUATABLE column type), and a GraphDB under ONE roof. They are unified, to a sane degree, by extending SQL, and they can be integrated tightly via Lua function calls, as needed. Any use-case that requires multiple types of data-stores, serves high-velocity OLTP requests, and fits w/in AlchemyDB’s various other quirks (in-RAM, single-threaded, single-node) can benefit greatly from this approach: fewer moving parts in the system, no need to sync data across/between data-stores, no need to store data redundantly, etc…

But here is the part I find really exciting: I added a GraphDB to a datastore in ten days, and the GraphDB is not too shabby. AlchemyDB’s bindings open up the possibility of adding functionality to an already very able data-store in a highly extensible manner. They allow you to move your functionality to your data, and THIS is HUGE. It should be easy to create all sorts of efficient data-stores & data-based services using this Data Platform; the GraphDB implementation proves the case.

And yes, this was a lot of fun to code. It was confusing as hell until it all came together, and then it was surprisingly easy: it fit together smoothly, it felt right, and a GraphDB is neither a small nor a simple piece of software.

AlchemyDB: the lightweight OLTP Data Platform. Strong on functionality, flexibility, and being thoroughly misunderstood :)

p.s. The GraphDB code is brand new and has been tested, but not enough, so if you use it, be aware there may be some bugs. I’m busting my ass getting it production-hardened, so if you have problems, just shoot me an email and I will fix ’em. The branch for this code is called release_0.2_rc1.
