Senior High Performance Computing (HPC) Systems Engineer

Lawrence Livermore National Laboratory
Livermore, CA 94550

Join us and make YOUR mark on the world!

Come join Lawrence Livermore National Laboratory (LLNL), where we apply science and technology to make the world a safer place. LLNL was named one of Glassdoor's 2020 Best Places to Work!

We have an opening for a Senior High Performance Computing (HPC) Systems Engineer to support HPC clusters, including numerous high-speed, multi-petabyte Lustre file systems composed of Linux servers and high-performance RAID arrays, all connected via Ethernet and InfiniBand SANs. You will independently contribute to complex technical projects using creativity and imagination. This position is in the Livermore Computing (LC) Division within the Computing Directorate, supporting the LC Supercomputing Center.

Essential Duties
- Provide highly advanced system administration support for Linux-based HPC, Network Attached Storage (NAS), infrastructure, and parallel file system servers and clusters.
- Participate in the design and implementation of multiple complex Linux-based HPC, infrastructure, and parallel file system servers and clusters.
- Build, configure, and maintain multiple RAID controllers and disk enclosure systems.
- Deploy and maintain InfiniBand fabrics for compute and storage networks.
- Independently troubleshoot and determine the root cause of moderately complex system issues.
- Analyze and tune the performance of highly complex computer, network, file system, and disk subsystems.
- Investigate, evaluate, test, and recommend technical solutions for future systems.
- Develop and maintain tools and procedures to monitor and automate complex system tasks on servers and clusters.
- Perform other duties as assigned.

Qualifications
- Bachelor's degree in Computer Science or a related field, or the equivalent combination of education and related experience.
- Substantial experience administering Linux/Unix in support of multiple independent but interrelated systems, including installation, configuration, networking, backups, updates, and patching, as well as securing systems with up-to-date operating system patches and third-party utilities.
- Substantial experience with HPC environments and technologies such as InfiniBand, Slurm, Lustre, and GPFS.
- Advanced knowledge of and substantial experience with scripting and programming languages such as Python, Perl, and bash/csh/ksh.
- Significant experience with disk and storage systems, such as host-based RAID controllers, software RAID, and vendor RAID systems (e.g., Network Appliance, RAID Inc., DDN).
- Substantial experience with version control and configuration management systems such as Subversion, Git, Ansible, and CFEngine.
- Advanced knowledge of and significant experience providing innovative solutions to broadly defined tasks and problems.
- Expert verbal and written communication skills necessary to effectively collaborate in a team environment and present and explain technical information.
- Ability to work off-hours and on-call (intermittently either as needed or as part of a rotation).

Desired Qualifications
- Master's degree in Computer Science or a related field.
- Experience with local, parallel, and distributed file systems such as XFS, ZFS, GPFS, and Lustre; with NAS platforms such as Network Appliance cDOT; and with Docker containers and the Kubernetes ecosystem.
- Experience with HPC system design and architecture, working on system procurements, and with vendors on HPC related issues.

Pre-Employment Drug Test: External applicant(s) selected for this position will be required to pass a post-offer, pre-employment drug test. This includes testing for marijuana use, because federal law applies to us as a federal contractor.

Security Clearance:  This position requires a Department of Energy (DOE) Q-level clearance.

If you are selected, we will initiate a federal background investigation to determine whether you meet the eligibility requirements for access to classified information or matter. In addition, all L- or Q-cleared employees are subject to random drug testing. Q-level clearance requires U.S. citizenship. If you hold multiple citizenships (U.S. and another country), you may be required to renounce your non-U.S. citizenship before a DOE L or Q clearance will be processed or granted. For additional information, please see DOE Order 472.2.

Note: This is a Career Indefinite position. Lab employees and external candidates may be considered for this position.

About Us

Lawrence Livermore National Laboratory (LLNL), located in the San Francisco Bay Area (East Bay), is a premier applied science laboratory that is part of the National Nuclear Security Administration (NNSA) within the Department of Energy (DOE). LLNL's mission is strengthening national security by developing and applying cutting-edge science, technology, and engineering that respond with vision, quality, integrity, and technical excellence to scientific issues of national importance. The Laboratory has a current annual budget of about $2.3 billion and employs approximately 6,900 people.

LLNL is an affirmative action/equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, marital status, national origin, ancestry, sex, sexual orientation, gender identity, disability, medical condition, protected veteran status, age, citizenship, or any other characteristic protected by law.