Staff/Sr. AI Infra Performance Engineer

<strong>Summary<br><br></strong>Scaling machine learning workloads across thousands of GPUs and TPUs creates challenges that few engineers ever encounter. In Apple’s Machine Learning Platform Technologies organization, we build the infrastructure that powers large-scale ML training and inference workloads, bringing together expertise in distributed systems, machine learning infrastructure, and high-performance computing.<br><br><strong>Description<br><br></strong>As a performance engineer in the ML Compute Efficiency team, you’ll tackle ambiguous systems challenges, identify inefficiencies, and build solutions that maximize accelerator utilization, reduce idle and fragmented capacity, and minimize recovery periods. This includes analyzing accelerator performance, digging into various parallelism techniques, and refining workload scheduling and orchestration across the compute fleet.<br><br><strong>Responsibilities<br><br></strong><ul><li>Characterize ML workload behavior through profiling, benchmarks, and metrics.</li><li>Dive into unfamiliar codebases to prototype changes, evaluate tradeoffs, and build production-ready solutions.</li><li>Design systems for efficient recovery from failures and preemptions.</li><li>Create tools to identify and alert on bottlenecks across applications and frameworks.</li><li>Use workload-driven insights to influence next-generation hardware selection and procurement decisions.</li><li>Collaborate closely with ML researchers and infrastructure engineers to address inefficiencies.</li><li>Drive impact through hands-on contribution and mentorship.<br><br></li></ul><strong>Minimum Qualifications<br><br></strong><ul><li>Experience with large-scale distributed systems for AI/ML workloads running on GPUs or TPUs.</li><li>Strong software engineering skills with experience developing and optimizing training frameworks (e.g., PyTorch, JAX) using C/C++ or Python.
</li><li>Experience working on cross-functional projects with ML research and infrastructure teams.</li><li>Familiarity with model architectures and various training techniques.</li><li>Bachelor’s degree in Computer Science or equivalent experience, with 7+ years of industry experience.<br><br></li></ul><strong>Preferred Qualifications<br><br></strong><ul><li>Track record of delivering transformative performance improvements on large-scale infrastructure.</li><li>Ability to analyze ambiguous distributed systems problems and articulate both high-level strategic metrics and underlying technical complexity.<br><br></li></ul><strong>Pay & Benefits<br><br></strong>At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $181,100 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location.<br><br>Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. 
Learn more about Apple Benefits.<br><br>Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.<br><br>Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.<br><br>Apple accepts applications to this posting on an ongoing basis.<br><br>

