The headless data architecture is an organic emergence of the separation of data storage, management, optimization, and access from the services that write, process, and query it. With this architecture, you can manage your data from a single logical location, including permissions, schema evolution, and table optimizations. And, to top it off, it makes regulatory compliance a lot simpler, because your data resides in one place, instead of being copied around to every processing engine that needs it.
We call it a “headless” data architecture because of its similarity to a “headless server,” where you have to use your own monitor and keyboard to log in. If you want to process or query your data in a headless data architecture, you will have to bring your own processing or querying “head” and plug it into the data — for example, Trino, Presto, Apache Flink, or Apache Spark.
A headless data architecture can encompass multiple data formats, with data streams and tables as the two most common. Streams provide low-latency access to incremental data, while tables provide efficient bulk-query capabilities. Together, they give you the flexibility to choose the format that is most suitable for your use cases, whether it’s operational, analytical, or somewhere in between.
First, let’s take a look at streaming in the headless data architecture.