Why and When to use a Data Store


#1

It’s been a few years since we launched our Data Store feature and it quickly became a critical and widely used component of Chartio.

Creating a data store is very similar to creating a chart. You use the same data pipeline to query and manipulate your data, but in the end the database is piped into another database table rather than a visualization. You specify a name for that table and what time a day you would like to repopulate it and Chartio will continually keep it up to date. You can then make other visualizations and variables off of this data store just like any other table.

A number of people (especially new customers) are still unclear on why and when to use a data store so we’ve made the following list of the main use cases.

1. Your queries take a long time and you need a summary table.

Chartio customers have all sizes of databases. It’s not uncommon for a customer to be querying hundreds of billions of rows of data for several different charts on a dashboard.

Multiple queries like that can be incredibly taxing on your warehouse and take a long time to complete. A best practice for these cases is to create a pre-aggregated summary table. You can write your own scripts to create and manage these summary tables or you can use Chartio’s simple data stores.

2. You don’t want to repeat yourself.

You may find that many charts are relying on very similar or the same subquery. It’s also very common to have to frequently merge the same sets of data together as a basis for several different visualizations. Whatever your reason, if you’re finding you’re having to repeat yourself a lot you should consider storing the results of that base query into a data store.

From there your charts can more easily (and responsively) work off of that data store table and don’t have to include all the same building steps each time.

3. You want a simplified, cleaned data model.

Data stores are also commonly used to provide a simple, cleaned dataset for your team. Marketing for example, may have fairly complex schemas on many different sources in various stages of mess. They may find the raw version too complex or inconsistent to easily explore. You can simplify the process for them greatly by utilizing data stores with clean, aggregated, simplified tables for them.

Limitations and what to do

Currently data stores only support up to 1 million rows of data. This is by design, as these are intended to be summary/aggregate tables which ideally don’t get too big. It’s a helpful tool that covers about 80% of the use cases but definitely isn’t a full ETL solution. If you need larger aggregate tables you’re entering into ETL land and need to perform those on your database/warehouse.

Here are a few helpful tips for doing so.

- Summary Tables and or Views

Data stores are pretty much just summary tables that Chartio makes a bit easier. You can do them on your own database as well, and if you’re dealing with a lot of data, or just want to have more control this is a great option.

To do this create VIEWS or persistent TABLES that simply store the results of your query. Each database deals with this in different ways and the documentation is widely available.

- Extract Transform Load (ETL) tools (Xplenty)

If you need help automating creating these tables, and especially if you have data coming from multiple sources you may want to utilize one of the many great ETL tools available such as Xplenty. They’re somewhat similar to the Chartio data pipeline but designed for much larger sets of data and have more granular control over the timing and failover events of your operations.


#2

Great write up Dave,

I agree that Data Stores are a killer feature. What happens if your datastore hits the row limit? Say you have an aggregate thats is fine and then grows past 1,000,000 (or as recently happened to me, the default limit that I forgot to change when i created the data store)?

j