SingleStore

SingleStore
Genre	RDBMS
Founded	January 2011
Founders	Eric Frenkiel; Nikita Shamgunov; Adam Prout;
Headquarters	San Francisco, CA (HQ); Seattle, WA (hub); Raleigh, NC (hub);
Area served	Worldwide
Number of employees	250
Website	www.singlestore.com

SingleStore is a distributed, relational, SQL database management system^[2] (RDBMS) that features ANSI SQL support and is known for speed in data ingest, transaction processing, and query processing. SingleStore was formerly known as MemSQL.^[3]^[4]

SingleStore primarily stores relational data, though it can also store JSON data, graph data, and time series data. It supports blended workloads, commonly referred to as HTAP workloads, as well as more traditional OLTP and OLAP use cases. For queries, it compiles Structured Query Language (SQL) into machine code. The SingleStore database engine can be run in various Linux environments, including on-premise installations, public and private cloud providers, in containers via a Kubernetes operator, or as a hosted service in the cloud known as SingleStore Managed Service.

History[]

On April 23, 2013, SingleStore launched its first generally available version of the database to the public as MemSQL.^[5] Early versions only supported row-oriented tables, and were highly optimized for cases where all data can fit within main memory. This design was based on the idea that the cost of RAM would continue to decrease exponentially over time, in a trend similar to Moore's law. This would eventually allow most use cases for database systems to store their data exclusively in memory.

Shortly after launch, MemSQL added general support for an on-disk column-based storage format to work alongside the in-memory rowstore.^[6] The decreases in cost of memory slowed over time, and the market for purely in-memory database systems largely failed to materialize, with increasing demand for disk-based OLAP workloads. Thus, over time, MemSQL's columnstore became a major focus and a crucial feature for customers.

On October 27, 2020, MemSQL rebranded to SingleStore to reflect a shift in focus away from exclusively in-memory workloads. The new name highlights the goal of achieving a universal storage format capable of supporting both transactional and analytical use cases.^[7]

Architecture[]

Row and column table formats[]

SingleStore can store data in either row-oriented tables ("rowstores") or column-oriented tables ("columnstores"). The format used is determined by the user when creating the table.

Rowstore tables, as the name implies, store information in row format, which is the traditional data format used by RDBMS systems. Rowstores are optimized for singleton or small insert, update or delete queries and are most closely associated with OLTP (transactional) use cases. Data for rowstore tables is stored completely in-memory, making random reads fast, with snapshots and transaction logs persisted to disk.

Columnstores are optimized for complex SELECT queries, typically associated with OLAP (analytics) and data warehousing use cases. As an example, a large clinical data set for data analysis is best stored in columnar format, since queries run against it will typically be ad hoc queries where aggregates are computed over large numbers of similar data items. Data for columnstore tables is stored on-disk, supporting fast sequential reads and compression that typically reaches 5-10x.

Indexing[]

Rather than the traditional B-tree index, SingleStore rowstores use skiplists optimized for fast, lock-free processing in memory.^{[citation needed]} Columnstores store data indexed in sorted segments, in order to maximize on-disk compression and achieve fast ordered scans. SingleStore also supports using hash indexes as secondary indexes to speed up certain queries.

Distributed architecture[]

A SingleStore database is distributed across many commodity machines. Data is stored in partitions on leaf nodes, and users connect to aggregator nodes.^[1] A single piece of software is installed for SingleStore aggregator and leaf nodes; administrators designate each machine’s role in the cluster during setup. An aggregator node is responsible for receiving SQL queries, breaking them up across leaf nodes, and aggregating results back to the client. A leaf node stores SingleStore data and processes queries from the aggregator(s). All communication between aggregators and leaf nodes is done over the network using SQL. SingleStore uses hash partitioning to distribute data uniformly across the number of leaf nodes.^[8]

Real-time Ingest[]

SingleStore Pipelines allow for real-time data ingest. A pipeline is a native connector to data sources such as Apache Kafka, Apache Spark, Amazon S3 buckets, Microsoft Azure Blob Storage, or files on disk. The Pipeline pulls data into the system at high speed. Because of the lock-free skip lists, queries can retrieve the data as soon as it lands, but are not blocked from continuing while data is imported.

Durability[]

Durability for the in-memory rowstore is implemented with a write-ahead log and snapshots, similar to checkpoints. With default settings, as soon as a transaction is acknowledged in memory, the database will asynchronously write the transaction to disk as fast as the disk allows.^[9]

The on-disk columnstore is actually fronted by an in-memory rowstore-like structure, indexed using a skiplist. This structure has the same durability guarantees as the SingleStore rowstore. Apart from that, the columnstore is durable, since its data is stored on disk.

Replication[]

A SingleStore cluster can be configured in "High Availability" (HA) mode, where every data partition is automatically created with master and slave versions on two separate leaf nodes. In HA mode, aggregators send transactions to the master partitions, which then send logs to the slave partitions. In the event of an unexpected master failure, the slave partitions take over as master partitions, in a fully online operation with no downtime.

Distribution formats[]

SingleStore San Francisco office in 2020

SingleStore can be downloaded for free and run on Linux for systems up to 4 leaf nodes of 32 gigs RAM each; an Enterprise license is required for larger deployments and for official SingleStore support. SingleStore clusters can be managed in containers using the SingleStore Kubernetes Operator. SingleStore is also available as a managed service named SingleStore Managed Service, available in various regions in Google Cloud and Amazon Web Services, with a Microsoft Azure implementation promised for the near future. The underlying engine and potential system performance are identical in all distribution formats.

SingleStore ships with a set of installation, management, and monitoring tools called SingleStore Tools. When installing SingleStore, Tools can be used to set up the distributed SingleStore database across machines. SingleStore also provides a browser-based query and management UI called SingleStore Studio, which provides query processing and database monitoring, and shows health and informational details about the running cluster.

References[]

^ Jump up to: ^a ^b Tuesday (2012-08-14). "MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) and Familiar (SQL)". High Scalability. Retrieved 2019-08-13.
^ "MemSQL". Retrieved 2017-09-29.
^ "MemSQL is now SingleStore". businesswire.com (Press release). San Francisco. Archived from the original on 9 Apr 2021. Retrieved 9 Mar 2021.
^ Ravita, Domenic (21 Oct 2020). "MemSQL is Now SingleStore". Archived from the original on 9 Apr 2021.
^ Frenkiel, Eric (2013). "MemSQL ships 2.0. Scales in-memory database across hundreds of nodes, thousands of cores" (published 2013-04-23). Retrieved 2013-04-23.
^ https://www.singlestore.com/blog/celebrating-memsql-availability-two-years-in/
^ https://www.singlestore.com/blog/memsql-singlestore-then-there-was-one/
^ "Introduction to MemSQL | DBMS 2 : DataBase Management System Services".
^ "Using Durability and Recovery". docs.memsql.com. Retrieved 2018-01-19.

External links[]

[highscalability-sponsored-article-1] Jump up to: ^a ^b Tuesday (2012-08-14). "MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) and Familiar (SQL)". High Scalability. Retrieved 2019-08-13.

[2] "MemSQL". Retrieved 2017-09-29.

[3] "MemSQL is now SingleStore". businesswire.com (Press release). San Francisco. Archived from the original on 9 Apr 2021. Retrieved 9 Mar 2021.

[4] Ravita, Domenic (21 Oct 2020). "MemSQL is Now SingleStore". Archived from the original on 9 Apr 2021.

[5] Frenkiel, Eric (2013). "MemSQL ships 2.0. Scales in-memory database across hundreds of nodes, thousands of cores" (published 2013-04-23). Retrieved 2013-04-23.

[6] ttps://www.singlestore.com/blog/celebrating-memsql-availability-two-years-in/

[7] ttps://www.singlestore.com/blog/memsql-singlestore-then-there-was-one/

[8] "Introduction to MemSQL | DBMS 2 : DataBase Management System Services".

[9] "Using Durability and Recovery". docs.memsql.com. Retrieved 2018-01-19.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]