Scalable datastore for metrics, events, and real-time analytics
arrow-ipc
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Data Agent Ready Warehouse: one for Analytics, Search, AI, and Python Sandbox, rebuilt from scratch. Unified architecture on your S3.
Apache DataFusion SQL Query Engine
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
The open-source Observability 2.0 database. One engine for metrics, logs, and traces — replacing Prometheus, Loki & ES.
The live data layer for apps and AI agents. Create up-to-the-second views into your business, just using SQL
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
Official Rust implementation of Apache Arrow
A native Rust library for Delta Lake, with bindings into Python
An extensible, state-of-the-art framework for columnar compression, and the fastest FOSS columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LF AI & Data, part of the Linux Foundation.
A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.
Parseable is an observability datalake built from first principles.
📺(tv) Tidy Viewer is a cross-platform CLI CSV pretty printer that uses column styling to maximize viewer enjoyment.
Tonbo is an embedded database for serverless and edge runtimes.
Scalable graph analytics database powered by a multithreaded, vectorized temporal engine, written in Rust
A single-node analytical database engine with geospatial as a first-class citizen
GeoArrow in Rust, Python, and JavaScript (WebAssembly) with vectorized geometry operations
Protocol and libraries for sending and receiving OpenTelemetry data using Apache Arrow
Lakehouse native graph engine with git-style workflows
A time-series database created for events, logs, traces, and metrics. Speaks the Postgres dialect and stores data in S3 via the Delta Lake protocol.
DuckLake took Flight. Welcome to SwanLake.
On-device property graph database. Schema-as-code. One CLI → One Folder. No Server. Think: DuckDB for graphs.
Databricks's Zerobus Ingest SDKs
Manage Multimodal Agentic Context Lifecycle with Lance
High-performance, DSL-free stream processing
Scalable Observability
Open-source streaming SQL engine written in Rust using Apache Arrow and DataFusion. Supports continuous queries, temporal stream joins, tumbling/session windows, and CDC/Kafka connectors. Lightweight and embeddable, with sub-microsecond latency.
Uni is a modern, embedded database that combines property graph (OpenCypher), vector search, and columnar storage (Lance) into a single, cohesive engine. It is designed for applications requiring local, fast, and multimodal data access, backed by object storage (S3/GCS) durability.
KalamDB — a lightweight, real-time, storage-efficient SQL database. Designed for per-user data isolation and scalable performance — ideal for the AI era.
Building block library for using Apache Arrow in Rust WebAssembly modules.
A lightweight Rust ingester for GreptimeDB, compatible with the GreptimeDB protocol.
Orbit, aka the GitLab Knowledge Graph, is a project that aims to provide a unified context API for AI systems and human users. This project has both a local Knowledge Graph for your code and a backend service for the entire SDLC.