Systems Engineer - Server Reliability

Intel
Hillsboro, OR 97123
  • Job Code
    JR0180305
Job Description

The Data Center Platform Architecture Integration and Validation (PAIV) team is looking for an At-Scale Validation Architect to join their dynamic and growing organization. At Intel, we are building new and exciting memory technologies to drive innovation in the datacenter. That requires us to bring in new technical leaders who can steer us to success and help build new products with these technologies.

You will bring a broad understanding of multiple system areas and interface with the Architecture, Design, and Post-Silicon Validation teams to improve at-scale validation content and provide feedback on future at-scale, platform-level remote debug features.

The scope of the deliverable will vary based on the target platform and the specific issue at hand, so you must have a solid big-picture understanding of the complete platform.

In this role, you will be responsible for:

  • Creating, defining, and developing the at-scale system validation environment and test plans.

  • Using and applying platform-level tools and techniques to ensure performance to specifications.

  • Validating, enabling, and debugging current and upcoming at-scale cloud-native technologies on server products.

  • Developing methodologies, executing validation plans, and debugging failures.

Successful candidates should be able to work across time zones and be willing to travel, with reasonable notice, to various Intel and customer sites to drive technologies and at-scale deployment, influence customers, and/or resolve problems.


Qualifications

You must possess the minimum qualifications below to be initially considered for this position. Preferred qualifications are in addition to the minimum requirements and are considered a positive factor in identifying top candidates. Requirements listed may be obtained through a combination of industry-relevant job experience, internship experience, and/or schoolwork/classes/research.

Minimum Requirements:

The candidate must have a Bachelor's degree in Electrical Engineering, Computer Engineering, Computer Science Engineering, or a related field with 4+ years of experience, or a Master's degree with 3+ years of experience, or equivalent experience.

Minimum Qualifications:

  • Experience with reliability, availability and/or serviceability features.

  • Experience in troubleshooting kernel and/or systems issues occurring in server platforms.

  • Experience in systems architecture, flows involving hardware, software, and/or firmware, and hands-on debug.

Preferred qualifications:

  • Experience with hardware test equipment such as logic and/or protocol analyzers.

  • Programming experience managing hardware equipment, with scripting experience in Python and/or C/C++.

  • Experience with debuggers such as GDB (GNU Debugger) or In-Target Probes, and/or the ability to create kernel patches.

  • PhD degree.

  • Must have delivered completed products.

  • Demonstrated understanding of buses, protocols, reset flows, and chip interconnects (PCIe, DRAM, etc.), and of memory hierarchy and topology, especially around multi-socket coherency (e.g., UPI), error flows and recovery, address translation, and advanced scale debug techniques.

  • Experience with persistent memory and storage technology.

  • Deep understanding of Linux distributions and solutions development; design of experiments for technologies such as virtualization, PnP, security, etc.

  • Experience running multi-geo, multi-discipline task forces around complex platform development and deployment.

  • Subject matter expert in one or more of the following areas, as demonstrated by product concepts/POCs, accomplishments, recognitions, and publications.

  • Ability to understand use cases and develop configuration, test, and deployment strategies for a complex cloud-scale platform, IT infrastructure, and telemetry, with proven execution skills such as scheduling and risk management.

  • Experience with accelerators such as FPGAs, and with platform efficiencies and optimizations.

Inside this Business Group

The Data Platforms Engineering and Architecture (DPEA) Group invents, designs & builds the world's most critical computing platforms, which fuel Intel's most important business and solve the world's most fundamental problems. DPEA enables the data center, which is the underpinning for every data-driven service, from artificial intelligence to 5G to high-performance computing, and DCG delivers the products and technologies (spanning software, processors, storage, I/O, and networking solutions) that fuel cloud, communications, enterprise, and government data centers around the world.



Other Locations

US, Texas, Austin



Posting Statement

All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.
