Masha Basmanova

Greater Boston, MA

Velox · Axiom · LinkedIn · velox-lib.io

Building the execution engine that unifies data processing at industry scale

I'm a software engineer who's spent 20+ years deep in systems code — from disaster recovery orchestration at VMware to co-creating Velox and Axiom, open-source C++ libraries that power query execution across Presto, Spark, and training data pipelines. Velox drives in-house data engines at large companies as well as cloud offerings from IBM and Google. I like hard problems at the intersection of performance and correctness.

The best execution engine isn't the one that's fastest on benchmarks — it's the one that makes every other system better by existing.

Velox was born from a simple observation: Meta's dozens of specialized data engines kept solving the same problems independently. Rather than optimize in isolation, we built a shared library of execution primitives — vectorized, adaptable, and designed for complex data types from day one. Axiom extends this vision to the front-end: SQL and DataFrame parsing, logical planning, and cost-based optimization that feed directly into Velox. Together, they make it possible to go from a SQL query or DataFrame program to optimized execution in a single, composable stack.

Deep Dive

Velox + Axiom: from SQL to execution

Velox is an open-source C++ library that provides the building blocks for data-intensive computation: expression evaluation, aggregation, sorting, joining, and more. Axiom completes the picture — it's the front-end layer that handles SQL parsing, logical planning, cost-based optimization, and query orchestration on top of Velox. Together, they form a full stack: write a SQL query, and the system parses, optimizes, and executes it end to end.

2–8× speedup (CPU-bound)

Key integrations

Presto → Prestissimo

Replacing Java workers with C++

Drop-in C++ replacement for Presto workers, translating plan fragments into Velox plans for execution.

Production workloads at Meta saw average 6–7× speedups, with some queries improving over 10×. The Prestissimo project was the first end-to-end Velox integration, and many of its components became the core of the library.

Spark → Gluten

C++ execution inside Spark

Intel-led integration allowing Velox to execute Spark SQL queries via a JNI bridge and Substrait plans.

Gluten decouples Spark's JVM from the execution engine, enabling Velox's vectorized processing within existing Spark environments without requiring migration.

PyTorch → TorchArrow

ML data preprocessing

Dataframe library for deep learning pipelines, translating operations into Velox plans under the hood.

Consolidates ML preprocessing with analytical engines, exposing the same functions and semantics so ML practitioners get consistent behavior across the stack.

Research

Published work

VLDB 2024

Simple (yet Efficient) Function Authoring for Vectorized Engines

L. Sakka, P. Pedreira, O. Erling, M. Basmanova, K. Wilfong, W. He, X. Meng, K. Pai, B. Vig

VLDB 2022

Velox: Meta's Unified Execution Engine

P. Pedreira, O. Erling, M. Basmanova, K. Wilfong, L. Sakka, K. Pai, W. He, B. Chattopadhyay

IEEE ICDE 2022

From Batch Processing to Real Time Analytics: Running Presto® at Scale

Z. Luo, L. Niu, V. Korukanti, Y. Sun, M. Basmanova, Y. He, B. Wang, et al.

Speaking

Conference talks

I'm a regular speaker at VeloxCon, the annual conference for the Velox open-source community, and have presented at all four editions since the event launched. In 2025, I was a keynote speaker at VeloxCon China in Beijing.

2026

VeloxCon 2026

Featured past speaker · Meta HQ, Menlo Park

2025

VeloxCon China 2025

Keynote Speaker · Beijing

2025

VeloxCon 2025

"Training Data Loading with Velox" · "PySpark Meets Velox: Redefining Efficiency in ML Workloads"

2024

VeloxCon 2024

Speaker · Hosted by IBM & Meta

2023

VeloxCon 2023

Speaker

I'm always happy to talk about query engines, open source, and hard systems problems.