AI Cluster Architect

Intel
Hillsboro, OR 97123
  • Job Code
    JR0188248
Job Description

In this role, you will join a highly dynamic team building a best-in-class, fast scale-up and scale-out cluster to support the AI workloads of today and the future, on Intel.

Responsibilities include, but are not limited to:

  • Architect all of the software components and the glue between them (at times defining that glue) to provide a complete end-to-end system that supports dynamic provisioning, pervasive measurement/telemetry, frameworks and libraries, and visualization of results: essentially all of the elements needed to run and optimize AI workloads (from both a software and a hardware perspective) in an agile and sustainable way.
  • Conduct research and development on hardware and software optimization for deep learning training and inference frameworks.
  • Work with multiple teams across multiple departments to jointly deliver on this mission. You will calculate and model the maximum attainable bandwidth of various network/storage/system topologies to set goals for what is realistically achievable, and work with several engineers to measure and attain those goals.
  • Participate in discussions with external cloud service providers to understand their requirements; these requirements will influence the reference design being built and how it evolves.
  • Create a blueprint that defines the infrastructure for software development (build and release, test automation, telemetry components, etc.), the structural elements and how they work together (e.g. fabric, discrete graphics components, etc.), automation entry and exit points, use cases, and the service metrics required to meet specific performance aspirations. With this blueprint, you will have identified all of the integration points and dependencies, and you will drive completion of those elements.
  • This infrastructure needs to be compatible with current and future business needs as well as in line with industry best practices and trends. You will ensure this new infrastructure fits the established architecture, infrastructure, and security systems within Intel.


To complement the software charter, it will be helpful if you have expertise in data center layout, mechanical design systems, cooling, power delivery, and cluster topology.

In addition to the qualifications listed below, the ideal candidate will also have:

  • Relationship management
  • Effective influencing
  • Agile written and verbal communication


Qualifications

Minimum qualifications are required to be initially considered for this position. Preferred qualifications are in addition to the minimum requirements and are considered a plus factor in identifying top candidates.

Minimum Qualifications:

Bachelor's degree in Computer Science, Computer Engineering or any other related field and 9+ years of experience OR Master's degree in Computer Science, Computer Engineering or any other related field and 6+ years of experience OR PhD degree in Computer Science, Computer Engineering or any other related field and 4+ years of experience.

  • 7+ years of experience with the Linux operating system
  • 7+ years of experience administering high performance clusters for multiple users
  • 3+ years of experience designing clusters
  • 3+ years of experience with the technical concepts, architecture, systems, development methods, and disciplines associated with standing up, measuring and optimizing clusters


Preferred Qualifications:

Knowledge/expertise in one or more of these domains is highly desirable to best fit this need. Experience in:

  • Programming in at least one of the following languages (C, Python or Bash)
  • Managing cluster systems with 100+ nodes
  • Using external (public) clouds (e.g. AWS, GCP, Azure)
  • Data center network design (fabric, Ethernet, etc.) and virtualized networks
  • High performance fabric interconnects
  • Managing AI and HPC clusters with discrete GPUs (Nvidia, AMD or Intel)
  • Containers (Singularity, Podman, Charliecloud, Docker, Kubernetes, others)
  • Containerization as it pertains to HPC / AI workloads
  • Administering high performance cluster file systems (Lustre, GPFS, DDN, VAST, others)
  • Supporting AI frameworks (TensorFlow, others)
  • Writing AI applications and using AI frameworks
  • MPI libraries, preferably Intel MPI
  • oneAPI libraries: oneDNN, oneCCL
  • Provisioning capabilities (Kubernetes, OpenShift, SLURM, etc.)
  • Collecting and analyzing telemetry in all parts of the HW/SW stack in a cluster
  • Prometheus, Grafana

Inside this Business Group

Intel Architecture, Graphics, and Software (IAGS) brings Intel's technical strategy to life. We have embraced the new reality of competing at a product and solution level, not just a transistor one. We take pride in reshaping the status quo and thinking exponentially to achieve what's never been done before. We've also built a culture of continuous learning and persistent leadership that provides opportunities to practice until perfection and filter ambitious ideas into execution.



Other Locations

US, Arizona, Phoenix;US, California, Santa Clara;US, New Mexico, Albuquerque


Intel Corporation will require all new U.S. employees to be fully-vaccinated for Covid-19 as a condition of hire unless they have an approved accommodation in place under applicable law. Newly-hired employees will be required to provide proof of vaccination prior to their start date.



Posting Statement

All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.
