Masha Basmanova
Greater Boston, MA

Building the execution engine that unifies data processing at Meta-scale

I'm a software engineer who's spent 20+ years deep in systems code — from disaster recovery orchestration at VMware to co-creating Velox and Axiom, the open-source C++ libraries that power query execution across Presto, Spark, and PyTorch at Meta. I like hard problems at the intersection of performance and correctness.

The best execution engine isn't the one that's fastest on benchmarks — it's the one that makes every other system better by existing.

Velox was born from a simple observation: Meta's dozens of specialized data engines kept solving the same problems independently. Rather than optimize in isolation, we built a shared library of execution primitives — vectorized, adaptable, and designed for complex data types from day one. Axiom extends this vision to the front-end: SQL parsing, logical planning, and cost-based optimization that feed directly into Velox. Together, they make it possible to go from a SQL query to optimized execution in a single, composable stack.

Career

Two decades of systems engineering

Meta (Facebook) Sep 2014 – Present · 11+ years
Software Engineer
Co-creator of Velox, a unified C++ execution engine providing reusable, high-performance data processing components across Meta's data infrastructure — and Axiom, the front-end layer that brings SQL parsing, logical planning, cost-based optimization, and query orchestration to Velox-powered systems. Together, they form a complete stack from SQL to execution. Published at VLDB 2022 and IEEE ICDE 2022. Co-authored follow-up work on vectorized function authoring (VLDB 2024).
IC — Deep Technical
VMware Apr 2011 – Sep 2014 · 3.5 years
Staff Engineer
Core designer and implementer on vCloud Director, collaborating across vCenter and ESX teams to architect cloud services for a 30+ developer organization. Mentored junior engineers while staying hands-on with Java, Spring, OSGi, and Hibernate at the platform's core.
IC + Leadership
VMware Oct 2006 – Apr 2011 · 4.5 years
Sr. MTS — Team Lead, Storage
Led the storage component of Site Recovery Manager, designing the Storage Replication Adapter interface for fully automated disaster recovery. Worked directly with EMC, NetApp, Dell, and HP to ship SRA integrations. Orchestrated failover and failback workflows across vCenter and ESX in C++.
IC + Leadership
Phase Forward Jul 2004 – Sep 2006 · 2 years
Principal Software Engineer
Clinical data management software — where correctness and reliability are non-negotiable.
IC
Education 1996 – 2001
MS, Mathematics
A math background that shows up everywhere: algorithm design, performance analysis, and the kind of precise thinking that systems engineering demands.
Foundation

Velox + Axiom: from SQL to execution

Velox is an open-source C++ library that provides the building blocks for data-intensive computation: expression evaluation, aggregation, sorting, joining, and more. Axiom completes the picture — it's the front-end layer that handles SQL parsing, logical planning, cost-based optimization, and query orchestration on top of Velox. Together, they form a full stack: write a SQL query, and the system parses, optimizes, and executes it end to end.

Architecture diagram, top to bottom:

Input: SQL Query · Dataframe

AXIOM: FRONT-END FOR VELOX
SQL Parser: PrestoSQL dialect
Logical Plan: relations + exprs
Optimizer: cost-based
Query Runner: orchestration
Produces optimized multi-fragment Velox plan

VELOX: UNIFIED EXECUTION ENGINE
Expression Eval: vectorized
Operators: join, agg, sort
Type System: nested, complex
Functions: simple + vectorized
Vectors / Memory: Arrow-compatible
Connectors & I/O: Parquet, ORC, DWRF
Storage Adapters: S3, HDFS, Tectonic
Resource Mgmt: memory, spill, cache

STORAGE: RAM CACHE · SSD · REMOTE (TECTONIC / S3 / HDFS)
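To give a rough feel for what "vectorized" expression evaluation means, here is a minimal C++ sketch that evaluates a * b + c one operator at a time over whole columns instead of row by row. It is an illustration under stated assumptions, not Velox code: the `Column` struct and the functions `applyBinary`, `multiply`, `add`, and `evalMulAdd` are hypothetical names invented for this example, and real engines use far richer vector encodings and null representations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical columnar batch for illustration only; not a Velox type.
struct Column {
  std::vector<std::int64_t> values;  // one entry per row
  std::vector<bool> isNull;          // true where the row is NULL
};

// Applies a binary operator to two whole columns in a single loop.
// A NULL in either input produces a NULL in the output row.
template <typename Op>
Column applyBinary(const Column& a, const Column& b, Op op) {
  assert(a.values.size() == b.values.size());
  assert(a.isNull.size() == a.values.size());
  assert(b.isNull.size() == b.values.size());
  const std::size_t n = a.values.size();
  Column out{std::vector<std::int64_t>(n), std::vector<bool>(n)};
  for (std::size_t i = 0; i < n; ++i) {
    if (a.isNull[i] || b.isNull[i]) {
      out.isNull[i] = true;
    } else {
      out.values[i] = op(a.values[i], b.values[i]);
    }
  }
  return out;
}

Column multiply(const Column& a, const Column& b) {
  return applyBinary(
      a, b, [](std::int64_t x, std::int64_t y) { return x * y; });
}

Column add(const Column& a, const Column& b) {
  return applyBinary(
      a, b, [](std::int64_t x, std::int64_t y) { return x + y; });
}

// Evaluates a * b + c operator by operator over the whole batch:
// first the full multiply column, then the full add column.
Column evalMulAdd(const Column& a, const Column& b, const Column& c) {
  return add(multiply(a, b), c);
}
```

Calling `evalMulAdd` on three-row columns a = {1, 2, 3}, b = {4, NULL, 6}, c = {10, 20, 30} yields 14, NULL, 48. Each operator runs a tight loop over contiguous arrays, which is the property that lets compilers and CPUs process such batches efficiently.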
2–8×
Speedup (CPU-bound)

Key integrations

Presto → Prestissimo

Replacing Java workers with C++

Drop-in C++ replacement for Presto workers, translating plan fragments into Velox plans for execution.

Production workloads at Meta saw average speedups of 6–7×, with some queries running more than 10× faster. Prestissimo was the first end-to-end Velox integration, and many of its components became the core of the library.
Spark → Gluten

C++ execution inside Spark

Intel-led integration allowing Velox to execute Spark SQL queries via a JNI bridge and Substrait plans.

Gluten decouples Spark's JVM from the execution engine, enabling Velox's vectorized processing within existing Spark environments without requiring migration.
PyTorch → TorchArrow

ML data preprocessing

Dataframe library for deep learning pipelines, translating operations into Velox plans under the hood.

Consolidates ML preprocessing with analytical engines, exposing the same functions and semantics so ML practitioners get consistent behavior across the stack.
Research

Published work

VLDB 2024
L. Sakka, P. Pedreira, O. Erling, M. Basmanova, K. Wilfong, W. He, X. Meng, K. Pai, B. Vig
VLDB 2022
P. Pedreira, O. Erling, M. Basmanova, K. Wilfong, L. Sakka, K. Pai, W. He, B. Chattopadhyay
IEEE ICDE 2022
Z. Luo, L. Niu, V. Korukanti, Y. Sun, M. Basmanova, Y. He, B. Wang, et al.
Speaking

Conference talks

I'm a regular speaker at VeloxCon, the annual conference for the Velox open-source community, and have presented at all four editions since the event launched. In 2025, I was a keynote speaker at VeloxCon China in Beijing.

2026
Featured past speaker · Meta HQ, Menlo Park
2025
Keynote Speaker · Beijing
Keynote
2025
"Training Data Loading with Velox" · "PySpark Meets Velox: Redefining Efficiency in ML Workloads"
2024
VeloxCon 2024
Speaker · Hosted by IBM & Meta
2023
VeloxCon 2023
Speaker

I'm always happy to talk about query engines, open source, and hard systems problems.

Whether it's about Velox, database internals, performance engineering, or interesting technical challenges — I'd love to hear from you.