Recently I had the chance to play with a storage engine for MySQL called Spider, so I’m jotting down some notes on what it is and what my initial thoughts were.
Spider is not your typical storage engine: it allows you to redirect your data and queries to other MySQL instances depending on field values. That means you can use it to shard databases in a relatively easy way.

Spider for MySQL
Currently it’s available for MySQL versions higher than 5.1. Get it here:
Spider for MySQL in Launchpad
Here’s the creator’s blog where release info and such are posted (in Japanese):
Wild Growth 日本語
What to configure
- Set up four MySQL servers (A, B, C, and D)
- Create your normal schema in B, C, and D
- Create tables with the same schema as above, but with ENGINE=Spider. In this CREATE TABLE statement, specify B, C, D as databases where the data is actually stored.
You can get the details for configuration in the official docs. You’ll probably want to check the links below as well.
How it’s used
Just throw queries to server A like you normally would, and Spider will do the rest for you. No data is stored in A at all: everything is distributed over to the other three, which effectively achieves database sharding. What’s great about using Spider to shard your data is that you don’t have to make any changes in your client app code whatsoever.
What’s actually happening
Say you throw this query to server A: code>INSERT INTO users VALUES (‘someone@domain.com’, ‘Bobby’);. The INSERT statement is passed over to one of the three other servers depending on the conditions you set when you created the Spider table (for instance, by checking id mod 3). So what happens is that the row is stored in either B, C, or D, and A never holds any data at all.
From the client app point of view, all that happens is that you throw a query to A and A returns a result set, just like any old MySQL server. Other queries like SELECTs gets handled in the same transparent manner.
Of course, this won’t distribute the request load, but they say that IO loads make up for nearly all of a database server’s loads, don’t they?
This is all built upon a feature in MySQL called partitioning (which I didn’t know existed till recently, having been a very light user of MySQL). It’s a feature that lets you save data rows to different places in your local filesystem depending on the value of a particular column. What Spider does is it makes the three data tables (B, C, and D in this case) act as MySQL partitions.
Initial thoughts
It actually does what it says it does. It’s cool how in most cases you probably won’t have to change your app code at all thanks to the transparent query processing. So if your app code is not scale-ready, this might be a good solution to make some leeway.
But here’s the catch. The consequences of being built upon MySQL partitioning might be a pain in the ass. I think I’ll discuss that in another post.
Oh, and not being able to find a lot of tips, howtos and usage reports on the web doesn’t make me particularly happy either. There probably isn’t a whole lot of users yet. I pretty much like the concept and it does work, but I get this sort of fear of finding another pitfall along the way.
I’m looking forward to its gaining a bigger user base and getting better and more stable as it gets more bug reports and feature requests.
References
The Data Charmer: Test driving the Spider storage engine – sharding for the masses
Sharding for the masses: Introducing the SPIDER storage engine (OpenSQLCamp @ FrOSCon) | Colin Charles Agenda