Job Description

Posted on: November 1, 2024

CytoTronics is seeking a highly motivated and experienced Senior Data Engineer with a background in designing and building highly parallelized data processing pipelines. The ideal candidate will join a small, dynamic team and will bring experience with a variety of data engineering tools, along with a passion for working closely with cross-functional teams to deliver scalable, efficient, and impactful data solutions.

Who We Are: CytoTronics is disrupting the traditional drug discovery process. Spun out of Harvard after a decade of research, CytoTronics has built a proprietary semiconductor-cell interface that can deliver high-dimensional functional assessment of live-cell responses at scale. The Pixel™ family of cloud-enabled cell-plate readers enables high-resolution, multiplexed, real-time assessment of live-cell characteristics, providing a deeper understanding of how chemical or genetic perturbations affect cell function. The software team supports all activities at CytoTronics, from embedded software to data analysis tools running in the cloud. As an early member of the software team, you will play a key role in shaping our data infrastructure, pipelines, and processing tools, enabling deeper biological insights at scale. See this 90-second video to get a sense of our technology: Accelerating drug discovery with live cell insights at scale.

The Role: As a Senior Data Engineer at CytoTronics, you will be a hands-on contributor to designing and implementing scalable data pipelines that process large datasets generated by our instruments. Key responsibilities include:

  • Collaborating closely with biology, data science, and front-end teams to ensure seamless data flow from raw acquisition to advanced analysis.
  • Contributing to our data science toolbox API, used by internal and external data scientists.
  • Architecting and maintaining high-performance data pipelines that transform terabytes of raw data (e.g., images, videos, time-series measurements) into meaningful biological insights.
  • Working with the software team to design systems capable of handling the massive scale of our data operations.

Who You Are: You are a seasoned data engineer with significant experience in designing and building data processing pipelines. Key qualifications include:

  • 7+ years of experience in building highly parallelized data processing pipelines using Python.
  • Strong background in data manipulation, analysis, and visualization libraries (NumPy, pandas, SciPy, scikit-learn, Matplotlib, seaborn, TensorFlow, etc.).
  • Experience with distributed data processing frameworks such as Dask.
  • Understanding of cloud-based data storage and processing, preferably AWS.
  • Proactive in optimizing data processes, identifying performance bottlenecks, and implementing creative solutions for reliability and scalability.
  • Comfortable contributing to DevOps activities and continuously improving CI/CD processes related to data infrastructure.
  • Curiosity to explore the technology stack and develop impactful solutions.

Nice to Have:

  • Signal/Image Processing
  • Scientific Data Analysis
  • Bioinformatics
  • Data Visualization
  • Software for scientific/analytical/imaging devices
  • Cloud infrastructure DevOps (Docker, Kubernetes, Terraform, etc.)

CytoTronics is an equal opportunity employer based in Boston, United States. We offer a competitive salary and equity compensation package. This is a full-time role based in our Boston South End office, with flexible in-person and work-from-home options. This position reports to the Head of Software.