Yeah, so our hosted solution is on EC2, and we started developing this in… Oh, man, I’m probably gonna get this wrong. This was 2016… Yes, 2016. I can’t believe it was that long ago. So yeah, this was in February of 2016, and what we were going to do for this kind of thing was essentially a managed database as a service. The customer comes in, they sign up, they’ll get new EC2 instances where the InfluxDB clustered implementation will be deployed in containers, with some additional monitoring bits and stuff like that.
Then there is a bunch of stuff so that we can deal with the inevitable EC2 instance restarts. Another thing that we ended up putting in there is being able to literally clone a customer cluster and take its data and test it and play around with it, or spin up a new cluster with a new version and replicate the read/write traffic coming in to both the live cluster and the new test cluster.
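[Editor's sketch: the clone-and-replicate idea described above can be illustrated roughly as follows. This is not InfluxData's orchestrator; `Cluster`, `MirroringProxy`, and `shadow_errors` are hypothetical names, and the clusters are in-memory stand-ins. The one design assumption made here is that a failure on the test cluster should never affect what the customer sees from the live cluster.]

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for a database cluster."""
    name: str
    healthy: bool = True
    points: List[str] = field(default_factory=list)

    def write(self, line: str) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        self.points.append(line)

    def clone(self, name: str) -> "Cluster":
        # Copy the data so the test cluster can be changed freely
        # without touching the live cluster.
        return Cluster(name=name, points=list(self.points))


class MirroringProxy:
    """Sends every write to the live cluster and mirrors it to a test cluster."""

    def __init__(self, live: Cluster, shadow: Cluster) -> None:
        self.live = live
        self.shadow = shadow
        self.shadow_errors = 0

    def write(self, line: str) -> None:
        # The live write decides success or failure for the caller.
        self.live.write(line)
        # The mirrored write is best effort: count failures, never raise.
        try:
            self.shadow.write(line)
        except Exception:
            self.shadow_errors += 1


live = Cluster("live")
live.write("cpu,host=a usage=0.5 1")

test = live.clone("test")          # starts with a copy of the live data
proxy = MirroringProxy(live, test)
proxy.write("cpu,host=a usage=0.7 2")

test.healthy = False               # simulate the test cluster falling over
proxy.write("cpu,host=a usage=0.9 3")
```

[In this sketch the live cluster ends up with all three writes, while the test cluster misses the one sent during its outage and the proxy records a single mirrored-write failure.]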
Because we were doing this in February 2016, Kubernetes wasn’t really mature at that point, since that project is really only like three and a half years old. So basically, we had pretty simple needs, and we just had a very small team working on it… They essentially wrote a container orchestrator from scratch in Go that also deals with the Amazon APIs and stuff like that.
But obviously, at this point, the writing is on the wall – Kubernetes is basically winning the orchestration game, and there’s a bunch of hooks and stuff that you can do within Kubernetes to customize it for your needs. Basically, probably about three or four months ago we took a look and we said, “Okay, one – this single-tenant architecture that we have is not really working as we scale up.” We run thousands of instances on EC2, and it means that, one, it’s a pain to coordinate all of that stuff and to monitor all of that stuff, but also we waste a lot of resources, because there are many customers who have very small workloads where a lot of their instances are basically just sitting idle, and this is exactly what cluster orchestration is for.
[00:27:45.15] Basically, our costs aren’t scaling properly with the number of customers, and we have to manage all these things… More importantly, if we want to release a feature in the database, we have to do it in the database, we have to test it as extensively as we can, and then we have to try and clone a few customer clusters and replicate the traffic, and then upgrade them. But the thing is we have to upgrade each of these clusters individually. It’s not like a SaaS service or a regular — a SaaS application usually, if it’s something that’s operating at scale, either in terms of the complexity or the traffic, you have a number of services underlying it, and you can deploy each of those individually. So it’s totally possible to develop and deploy a feature in a SaaS application, for instance, without deploying every single piece of code throughout the thing. Right now, clustered InfluxDB is very much a monolithic application. If you want to develop a new feature, you have to deploy the entire database, which means there’s a high risk to deploying code.
Basically, as we saw Kubernetes gaining in popularity and really maturing, I thought “Well, what if we started to think, for our cloud service and for a database in general, about what it would look like if we actually designed it to run on Kubernetes from day one?” We’d take advantage of the primitives that Kubernetes has in terms of being able to schedule things, and we’d kind of separate out the different kinds of workloads that you have within a database.
Most databases are monolithic things, but they do a bunch of different things. Sometimes they’re just storing a bunch of data, sometimes they’re doing a bunch of query processing for a query that the user is running, sometimes they’re doing some re-indexing, or in our case, compactions on the background data… Or, because we also are a monitoring platform, we could be doing real-time monitoring and learning, or batch monitoring and learning. And trying to make all of that work in a single monolithic application I think is very, very hard, whereas if you break each of those out into separate services, you can tune them for the workload that they have to be built for. Then, once you pair that up with Kubernetes, you can have it manage deployment and the shrinking and growing of those services individually.
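[Editor's sketch: to make "shrinking and growing of those services individually" concrete, the snippet below applies the basic replica formula documented for the Kubernetes Horizontal Pod Autoscaler, `ceil(currentReplicas * currentMetric / targetMetric)`, to each service separately. The service names and all the numbers are hypothetical, and the real controller also applies tolerances and stabilization windows that are left out here.]

```python
import math
from typing import Dict, Tuple


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Basic HPA-style calculation; never scales below one replica."""
    return max(1, math.ceil(current_replicas * current_metric / target_metric))


# Hypothetical services: (current replicas, observed CPU %, target CPU %).
services: Dict[str, Tuple[int, float, float]] = {
    "storage": (3, 50, 50),      # on target, stays as it is
    "query": (2, 90, 60),        # overloaded, should grow
    "compaction": (4, 20, 80),   # mostly idle, should shrink
}

# Each service is sized from its own load, independently of the others.
plan = {
    name: desired_replicas(replicas, observed, target)
    for name, (replicas, observed, target) in services.items()
}
```

[With these made-up numbers the plan keeps storage at 3 replicas, grows query from 2 to 3, and shrinks compaction from 4 to 1, which is the kind of per-workload tuning a monolith cannot do.]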
This year, for our cloud thing, that’s our big project – to try to move from this single-tenanted architecture to a multi-tenanted architecture that still has workload isolation across tenants, but it has the ability to decouple storage from compute, from processing for ETL monitoring tasks. Really the first part of that that we started doing last year was the development of our new query engine and query language, which is actually open source and up on GitHub.
What we did with that was we actually decoupled the engine and the language from the actual data storage tier. So the nice thing that gets you is you can deploy new query processors as basically shared-nothing application servers that can just be spun up on the fly… Which, again, is actually Kubernetes’ sweet spot.
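[Editor's sketch: a minimal illustration of a query processor decoupled from the storage tier. This is not the actual engine's API; `StorageTier`, `InMemoryStorage`, and `QueryProcessor` are hypothetical names. The point is only that the processor keeps no data of its own, so any number of identical instances can be started or stopped against the same storage.]

```python
from typing import Dict, Iterable, List, Optional, Protocol, Tuple

Point = Tuple[int, float]  # (timestamp, value)


class StorageTier(Protocol):
    """Anything that can return points for a series in a time range."""

    def read(self, series: str, start: int, stop: int) -> Iterable[Point]:
        ...


class InMemoryStorage:
    """Hypothetical storage backend used only for this example."""

    def __init__(self, data: Dict[str, List[Point]]) -> None:
        self._data = data

    def read(self, series: str, start: int, stop: int) -> List[Point]:
        # Half-open range: start <= timestamp < stop.
        return [(t, v) for t, v in self._data.get(series, []) if start <= t < stop]


class QueryProcessor:
    """Stateless compute node: holds a storage reference and nothing else."""

    def __init__(self, storage: StorageTier) -> None:
        self._storage = storage

    def mean(self, series: str, start: int, stop: int) -> Optional[float]:
        values = [v for _, v in self._storage.read(series, start, stop)]
        return sum(values) / len(values) if values else None


storage = InMemoryStorage({"cpu": [(1, 1.0), (2, 3.0), (10, 100.0)]})

# Two interchangeable processors; either one can serve any query.
processor_a = QueryProcessor(storage)
processor_b = QueryProcessor(storage)

result_a = processor_a.mean("cpu", 0, 5)
result_b = processor_b.mean("cpu", 0, 5)
```

[Because both processors read from the same storage tier and hold no state, they return the same answer, and adding a third one requires no data movement.]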