Forward Deployed Data Scientist (Healthcare)

Protege
12 days ago
Full-time
Remote

Company Overview:

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI — and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

Role Overview:

As a Forward Deployed Data Scientist (Healthcare Solutions Lead) in the Healthcare vertical, you will guide prospects and customers through the definition and delivery of healthcare datasets. Your job will be to understand what customers are building, identify the data that best fits their needs, and assemble and QA high-quality samples and final deliveries that meet their technical and conceptual specs. Along the way, you’ll ensure timelines and milestones are clearly communicated from the first stages of feasibility to the final data delivery.

What You Will Own

  • Lead solution architecture and deal design: translate customer requirements into a structured plan and drive it forward through internal project management, quality assurance, and cross-functional execution, while maintaining a customer-facing presence to ensure alignment and adapt solutions as needs evolve

  • Lead end-to-end program management from data specification and preparation through QA and delivery, ensuring cross-functional coordination and on-time execution

  • Work with Protege data partners to source cutting-edge healthcare data into the Protege ecosystem

  • Oversee the QA, packaging, and delivery of complex datasets (EHR, claims, radiology, pathology, unstructured text), ensuring HIPAA compliance in collaboration with privacy partners

Who You Are

  • Proven customer-facing experience: skilled at managing expectations, leading customer conversations, and delivering technical outcomes with clarity and confidence

  • Bring an analyst-first mindset to challenges. You are an expert in using SQL and Python to query data, construct complex patient cohorts, analyze data readiness for model training, validate clinical coverage, and support other customer-specific needs

  • Find satisfaction in bringing order to multiple simultaneous projects and masterfully juggling competing (and sometimes changing) priorities

  • Deep expertise in healthcare data modalities, including EHR, claims, radiology, pathology, and unstructured text

  • Familiarity with privacy-preserving techniques for healthcare data

  • Experience in healthcare AI, ML products, or enterprise data platforms

  • Prior startup experience

  • You treat those around you with kindness

Why Protege

  • Be the connective tissue between Protege’s platform, our data, and our customers

  • Build datasets that directly power the next generation of AI models

  • Operate at the cutting edge of multimodal data — where human judgment meets machine intelligence