At Voio, we’re rebuilding medical imaging for the people who rely on it every day. The data is immense, the challenges are real, and the impact is direct. If you want your work to matter — to clinicians, to patients, to the field itself — you’ll fit right in.
Role Description
We’re redefining how radiologists work. Today, medical imaging is slowed by fragmented tools: one system to view scans, another to dictate, and a third to search patient context. We’re building a unified system that connects them all: fast, intelligent, and deeply intuitive.
Our AI models originated from years of research at UC Berkeley and UCSF, but our mission goes far beyond the lab — we’re now building real-world systems that push the frontier of applied medical AI. Every line of code here helps doctors move faster, see clearer, and focus on care, not clicks.
Responsibilities
We’re looking for a Data Engineer (Machine Learning) to build and scale the pipelines that power Voio’s foundation models. You’ll develop reliable systems for ingesting, transforming, and serving multimodal medical data — enabling our AI models to learn from diverse imaging and clinical sources with precision and privacy.
You’ll work closely with ML researchers, backend engineers, and clinical partners to ensure that our data infrastructure meets the highest standards for performance, reproducibility, and compliance.
What You’ll Do
Design and maintain robust data pipelines to support large-scale medical imaging and multimodal datasets.
Build scalable ETL systems for model training, validation, and deployment workflows.
Develop data quality checks, lineage tracking, and observability across distributed systems.
Benchmark and optimize model performance in production environments, identifying system-level bottlenecks and driving improvements across the stack.
Collaborate with ML, infrastructure, and clinical teams to deliver efficient and reliable model training environments.
Deploy and manage GPU-based inference infrastructure using Triton and TensorRT for real-time AI applications.
Qualifications & Requirements
4+ years of experience in data engineering, ML infrastructure, or distributed systems.
Proficiency with Python and PyTorch, including experience in model training and deployment workflows.
Strong hands-on experience with NVIDIA Triton Inference Server and TensorRT.
Proven ability to design performant data pipelines and debug at the system level (GPU, I/O, or memory).
Familiarity with cloud environments (AWS, GCP) and containerized workflows (Docker, Kubernetes).
Preferred Qualifications
Experience with edge inference (Jetson, Orin, or equivalent).
Exposure to DICOM, HL7, or healthcare interoperability standards.
Prior experience building ML systems in regulated or safety-critical domains.
Desired Characteristics & Attributes
We hire for clarity, ownership, and judgment.
The ideal engineer:
Thinks in systems. Sees beyond individual tasks to how everything connects.
Executes with precision. Moves quickly without sacrificing long-term quality.
Owns outcomes. Takes responsibility across design, build, and delivery.
Builds with purpose. Writes code that improves lives, not just benchmarks.
Why Join Us
You’ll work directly with leading engineers, clinicians, and researchers from UC Berkeley and UCSF — building products that didn’t exist before. If you want to shape how AI enters the clinic, and you care about craft as much as impact, this is your team.