Job Description

Posted on:

October 31, 2024

About xAI

xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. Engineers are encouraged to work across multiple areas of the company, sharing the title 'Member of Technical Staff.' We operate with a flat organizational structure where all employees are expected to be hands-on and contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. A strong work ethic and prioritization skills are essential. Additionally, all engineers and researchers must possess strong communication skills to effectively share knowledge with their teammates. xAI does not use recruiters; every application is reviewed directly by a technical member of the team.

Tech Stack:

  • CUDA
  • CUTLASS
  • C/C++ and Python binding tools

Location:

The role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to be located near the Bay Area or be open to relocation.

Focus:

  • Developing and improving low-level CUDA kernel optimizations for a state-of-the-art inference and training software stack.
  • Profiling, debugging, and optimizing single- and multi-GPU operations using tools such as Nsight.
  • Understanding GPU memory hierarchy and computation capabilities.
  • Implementing the latest methods from the deep learning literature in low-level CUDA kernels.
  • Developing new ideas to optimize GPU performance.

Ideal Experiences:

  • Building high-performance GEMM CUDA kernels from scratch using Tensor Cores or CUDA cores, or with CuTe/CUTLASS.
  • Implementing features for attention kernels by extending existing kernels or writing them from scratch.
  • Writing both forward and backward kernels and verifying their correctness in the presence of floating-point error.
  • Optimizing for both memory-bound and compute-bound operations.
  • Reasoning about register pressure, shared-memory usage, and GPU utilization, using tools such as Nsight to remove bottlenecks.
  • Familiarity with the latest techniques in optimizing inference and training workloads.
  • Using pybind to integrate custom-written kernels into a framework, especially JAX/XLA.

Interview Process:

  • After submitting your application, the team reviews your CV and statement of exceptional work.
  • If your application passes this stage, you will be invited to a 15-minute phone interview with basic questions.
  • If you clear the initial phone interview, you will enter the main process consisting of four technical interviews:
    • Coding assessment in a language of your choice.
    • Hands-on systems demonstration in a live problem-solving session.
    • Project deep-dive: Present your past exceptional work to a small audience.
    • Meet and greet with the wider team.
  • Our goal is to finish the main process within one week, with all interviews conducted via Google Meet.

Annual Salary Range:

$180,000 - $440,000 USD
