InfluxDB’s next-gen time-series engine is built on Rust and supports SQL

As enterprises see an unprecedented increase in real-time data analytics, InfluxDB announced Wednesday that it is launching a next-generation time series engine for its InfluxDB Cloud managed database service.

According to market research firm IDC, time series data can be defined as a collection of data points that are collected at regular intervals with fixed timestamps.

These types of datasets are primarily used to reveal patterns or seasonality among other trends and can help business analytics teams describe and understand what is happening with the data and why, in order to make better business decisions, Amy Machado, research manager at IDC, wrote in a research report.

Time series databases and datasets have recently gained prominence with the advent of streaming technologies, Machado wrote, adding that unlike the earlier practice of loading such data into a database through high-latency batch processing, streaming technologies allow time-series data to flow into the database in real time.

“A time-series database and analytics toolset works best to handle a large influx of continuous data first, and then successfully leverage massive data workloads for insights,” wrote Machado in the report.

Developed on Rust for performance, scale

The new engine, which is based on the company’s IOx open-source project introduced in 2020, was developed in the Rust programming language to improve scale and performance, the company said in a statement.

To support faster storage performance, the company says it has redesigned its column-oriented storage, allowing the engine to ingest high-volume data with unlimited cardinality.

Generally, a column-oriented database is faster than a row-oriented database for analytical queries because values from the same column are stored together, which compresses well and reduces the space needed to store data. It also improves query speeds because the system has to read only the columns a query touches, a smaller portion of the database.
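The difference between the two layouts can be illustrated with a minimal sketch. It is not how InfluxDB stores data internally; the readings and the field names (`time`, `host`, `cpu`) are hypothetical and chosen only for illustration.

```python
# Hypothetical sensor readings; field names are illustrative only.
rows = [
    {"time": 0, "host": "a", "cpu": 0.50},
    {"time": 10, "host": "b", "cpu": 0.70},
    {"time": 20, "host": "a", "cpu": 0.60},
]


def mean_cpu_rows(records):
    # Row-oriented: each record is kept together, so averaging one
    # field still walks every full record.
    return sum(r["cpu"] for r in records) / len(records)


# Column-oriented: each field is stored as its own contiguous list.
columns = {
    "time": [r["time"] for r in rows],
    "host": [r["host"] for r in rows],
    "cpu": [r["cpu"] for r in rows],
}


def mean_cpu_columns(cols):
    # Only the "cpu" column is read; "time" and "host" are never touched.
    cpu = cols["cpu"]
    return sum(cpu) / len(cpu)
```

Both functions return the same average, but the columnar version reads a single list, which is the access pattern that analytical queries over large time-series tables tend to benefit from.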

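In a time series database, cardinality refers to the number of unique series, that is, the distinct combinations of measurement and tag values (in relational modeling, the term instead describes the relationships between data in two database tables). The higher the cardinality an engine can handle, the better it scales to workloads with many distinct data sources.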
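In time-series databases, cardinality is commonly counted as the number of distinct series, meaning unique combinations of a measurement name and its tag values. A minimal sketch under that assumption follows; the point structure and the function name `series_cardinality` are hypothetical and not part of any InfluxDB API.

```python
# Hypothetical data points; the structure is an assumption for illustration.
points = [
    {"measurement": "cpu", "tags": {"host": "a", "region": "us"}, "time": 0, "value": 0.5},
    {"measurement": "cpu", "tags": {"host": "a", "region": "us"}, "time": 10, "value": 0.6},
    {"measurement": "cpu", "tags": {"host": "b", "region": "us"}, "time": 0, "value": 0.7},
    {"measurement": "cpu", "tags": {"host": "b", "region": "eu"}, "time": 0, "value": 0.4},
]


def series_cardinality(pts):
    # A series is identified by its measurement plus its sorted tag pairs;
    # timestamps and values do not affect the count.
    series = {
        (p["measurement"], tuple(sorted(p["tags"].items())))
        for p in pts
    }
    return len(series)
```

Here the four points belong to three series, because the first two share the same measurement and tags. Every new tag value, such as a new host or container ID, adds series, which is why high-volume workloads can push cardinality up quickly.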

The new engine can process queries on most time-series data in milliseconds, the company said, adding that it uses Apache Parquet files for on-disk storage and Apache Arrow for in-memory data operations between components.

Write queries in SQL

With the introduction of the new engine, the company said it is finally adding support for developers to write queries in SQL.

SQL is the most popular database query language and is used in most traditional relational databases.
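To give a sense of the kind of SQL a developer might write against time-series data, here is a minimal sketch. It uses Python's built-in SQLite module as a stand-in, not InfluxDB, and the table `cpu` and its columns are hypothetical; the SQL dialect and functions InfluxDB accepts may differ.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpu (time INTEGER, host TEXT, usage REAL)")
conn.executemany(
    "INSERT INTO cpu VALUES (?, ?, ?)",
    [(0, "a", 0.50), (30, "a", 0.70), (60, "a", 0.90), (90, "a", 0.30)],
)

# Bucket timestamps (in seconds) into 60-second windows using integer
# division, then average the readings in each window.
query = """
    SELECT (time / 60) * 60 AS bucket, AVG(usage) AS avg_usage
    FROM cpu
    WHERE host = 'a'
    GROUP BY bucket
    ORDER BY bucket
"""
result = conn.execute(query).fetchall()
```

The query returns one row per 60-second window, a typical downsampling pattern that developers who already know SQL can write without learning a dedicated query language.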

“The SQL capability that InfluxDB boasts was actually built in from the start by Timescale, which has always been based on PostgreSQL,” said Tony Baer, principal analyst at market research firm dbInsight.

Previously, InfluxDB allowed developers to write queries using APIs, Flux, and InfluxQL.

Flux, which is open source, is a self-contained scripting and query language focused on code reuse and optimized for extract, transform, and load (ETL), the company said.

InfluxQL, on the other hand, is a query language that has an SQL-like syntax.

Adding SQL support is a growing global trend for real-time data solutions, Machado said, noting that the number of developers who know SQL is significant. “SQL support can increase your adoption rates. You can use existing teams to add new use cases when offering SQL support.”

According to the company, all query languages can be accessed through the DataFusion Query Engine, which is an extensible query planning, optimization and execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

Additionally, the new engine will add support for observability use cases, as enterprises will have access to data needed for observability, such as traces, logs and metrics, the company said.

InfluxDB against the competition

InfluxDB is highly rated when it comes to time-series data workloads and competes with Graphite, Prometheus, TimescaleDB, QuestDB, Apache Druid, and DolphinDB, among others, according to database recommendation website dbengines.com.

Asked about InfluxDB’s momentum in the market, Baer said, “InfluxDB initially became an early favorite among developers, but they squandered that opportunity with incompatible forks that slowed their momentum.”

“In the meantime, time series data has become a checkbox with many operational and analytical cloud databases,” Baer added.

According to IDC, time series workloads have increased with the explosion of IoT and are in high demand for operations use cases in the oil and gas, logistics, supply chain, transportation and healthcare industries.

Copyright © 2022 IDG Communications, Inc.
