Adapting Continuous Integration and Delivery to Hardware Quality

A hyperscale infrastructure demands a high degree of automation in hardware testing to increase productivity and rigor. Our idea was to automate the traditional methods of qualifying servers and server components by applying the CI/CD (Continuous Integration and Continuous Deployment) principles of software development to the hardware development lifecycle.

What got us started

Systems hardware engineering has always had a lower degree of automation than software development; in some ways, the physical tests and infrastructure make it a harder problem. Traditionally, hardware engineers have qualified servers primarily with manual testing. This is not only time-consuming, but it also limits the rigor of testing. And we all know manual testing does not scale.

Wait, we do similar things in software

Qualifying hardware involves many of the same activities as qualifying software: configuration, writing and maintaining test cases and test scripts, triggering various test scenarios, collecting metrics, analyzing results, keeping track of logs, sending notifications, and so on. Orchestrating these tasks in the right sequence as an automated regression testing pipeline would simplify these tedious hardware validation processes.

We came up with a novel approach using CI/CD concepts: we developed four key modules in-house, and for the core we used familiar DevOps tools like Jenkins, Git, and Puppet, deploying our custom modules around a Jenkins pipeline.

The Whiteboard

[Figure: design of the hardware regression system]

The design is a hardware regression system that enables the hardware team to productively conduct reliability and performance testing of a continuously changing physical infrastructure at the click of a button. It interfaces seamlessly with platform software via standard Git repositories and familiar Grafana dashboards.

Something old, something new

The tools we use include the following:

Reuse

  • Source Control System: Git
  • Orchestrator: Jenkins
  • Server Configuration System: Puppet
  • Time-Series Data Analysis: InfluxDB/Grafana

Create

  • Lab Reservation System: In-house-developed tool to keep track of our assets in the lab
  • Composer: In-house-developed utility to compose the required testing sequence
  • Test Executor: In-house-developed framework to execute tests on the desired systems under test (a sketch of how the Composer and Test Executor might fit together follows this list)
  • Results Dashboard: In-house-developed UI to display test results, with links to logs and graphs
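To make the moving parts concrete: the in-house modules are not public, so the following is only a minimal sketch, assuming a Composer that orders requested tests into a plan and a Test Executor that runs them over SSH on a reserved system under test. Every name in it (TestStep, compose_sequence, run_sequence, the lab hostname) is hypothetical.

```python
# Minimal sketch of how a composer and test executor might fit together.
# All names here (TestStep, compose_sequence, run_sequence) are
# hypothetical; the actual in-house modules are not public.
import subprocess
from dataclasses import dataclass

@dataclass
class TestStep:
    name: str        # e.g. "ssd-firmware-unit-test"
    command: str     # shell command executed on the system under test
    timeout_s: int   # per-step timeout

def compose_sequence(steps: list[TestStep]) -> list[TestStep]:
    """Composer: order the requested tests into one pipeline run."""
    # A real composer would also resolve dependencies, SKUs, and lab
    # reservations; here we simply keep the requested order.
    return steps

def run_sequence(host: str, steps: list[TestStep]) -> dict[str, bool]:
    """Test Executor: run each step on the reserved system under test."""
    results = {}
    for step in steps:
        proc = subprocess.run(
            ["ssh", host, step.command],
            capture_output=True, timeout=step.timeout_s,
        )
        results[step.name] = proc.returncode == 0
    return results

if __name__ == "__main__":
    plan = compose_sequence([
        TestStep("disk-smoke", "smartctl -H /dev/sda", 60),
        TestStep("sysbench-cpu", "sysbench cpu run", 600),
    ])
    print(run_sequence("lab-sut-01.example.com", plan))
```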

Just as software code needs to be deployed into a production environment, we treat our servers as hardware to be deployed into our data center. Any code change made to a software application goes through a series of testing stages, starting with unit testing, then integration testing and regression testing. Similarly, any change made to our servers and server components, such as drives (SSD, HDD), DIMMs, NICs, BMC, BIOS, and even firmware/driver upgrades, goes through similar testing activities.

Features

This regression testing facility can trigger tests:

  • On a schedule
  • Manually, at the click of a start button
  • Automatically, when changes to the source control system are detected (see the sketch after this list)
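Scheduled runs and SCM-change detection are standard Jenkins capabilities; runs can also be queued remotely. As a hedged illustration, here is a minimal sketch that kicks off a run through Jenkins' remote build-trigger endpoint; the URL, job name, credentials, and SKU parameter are placeholders, not our actual configuration.

```python
# Hypothetical example: trigger the hardware regression job remotely
# via Jenkins' build-trigger REST endpoint. The URL, job name, token,
# and credentials below are placeholders.
import requests

JENKINS_URL = "https://jenkins.example.com"
JOB = "hardware-regression"
TOKEN = "build-token"  # configured under the job's build triggers

def trigger_regression(sku: str) -> None:
    # buildWithParameters queues a parameterized run of the job.
    resp = requests.post(
        f"{JENKINS_URL}/job/{JOB}/buildWithParameters",
        params={"token": TOKEN, "SKU": sku},
        auth=("ci-bot", "api-token"),  # Jenkins user + API token
        timeout=30,
    )
    resp.raise_for_status()  # HTTP 201 means the run was queued

trigger_regression("broadwell-2u-storage")
```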

It supports three categories of testing (an illustrative composed plan follows the list):

  1. Unit Testing: To qualify individual server components, including firmware/driver upgrades
  2. Integration Testing: To qualify the server as a whole (Sysbench testing)
  3. Regression Testing: Full-stack application benchmarking of various eBay applications that run in our data centers, such as Cassini (eBay Search Engine), Cloud (front-end applications), NoSQL, Hadoop, and Zoom (Object Store)
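To make the categories concrete, a composed run for, say, a new SSD firmware might cover all three levels in one plan. The structure and test names below are illustrative only, not our actual manifest format.

```python
# Illustrative only: how one composed run might cover all three
# categories for a new SSD firmware. Names and structure are
# hypothetical, not our actual manifest format.
regression_plan = {
    "target": {"sku": "broadwell-2u-storage", "change": "ssd-fw-1.2.3"},
    "unit": ["ssd-fw-flash", "ssd-smart-health", "fio-4k-randread"],
    "integration": ["sysbench-cpu", "sysbench-memory", "sysbench-fileio"],
    "regression": ["cassini-benchmark", "specjbb-cloud", "hadoop-teragen"],
}
```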

Proof of the Pudding…

We have successfully deployed this Regression Facility to qualify and release two generations of eBay servers (Intel Broadwell and Intel Skylake) and continue to onboard a variety of hardware/SKU combinations. Our customers are primarily our hardware engineers, who work with various external vendors to get evaluation hardware into our labs. We then partner with the internal eBay application teams that consume these servers to develop application performance benchmarks that qualify the servers using our automation framework.

… is in the Eating!

This automated regression system can execute tests at the click of a button, in parallel, on a variety of SKUs and different applications at the same time. The results are presented on a dashboard along with links to logs, metrics, and charts, making it easy for our hardware engineers to analyze the data and make quick, informed decisions about hardware reliability and performance.

  • People time slashed dramatically: at least an order-of-magnitude improvement in many cases
  • Manual errors avoided, except when someone writes the wrong test case, of course
  • Regressions run 24/7, triggered automatically
  • Hardware engineers now have more time for new, hard engineering challenges

The final product: an eBay-built, tested, and deployed server in our data center

[Figure: eBay server]

Metrics

  • Real-time Cassini application (eBay Search Engine) testing metrics: queries per second (QPS) and latency, captured during a Cassini benchmark regression run in our lab

[Figures: Cassini QPS and Cassini latency]

  • Real-time Cloud application testing using the SPECjbb benchmark: critical-jOPS and max-jOPS, captured during a regression run in our lab

[Figures: max-jOPS and critical-jOPS]

  • Real-time system metrics: CPU and memory utilization, captured during a regression run in our lab (a sketch of how such samples reach InfluxDB follows the figures)

[Figures: Cloud memory and CPU utilization]
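These dashboards are backed by the InfluxDB/Grafana pair from the tools list. As a rough sketch, assuming the influxdb-python client, a benchmark run might push samples like this; the host, database, measurement, tags, and values are all made up for illustration.

```python
# Rough sketch of pushing benchmark samples to InfluxDB so Grafana
# can chart them. Assumes the influxdb-python client; the database,
# measurement, tag names, and values are made up for illustration.
from datetime import datetime, timezone
from influxdb import InfluxDBClient

client = InfluxDBClient(host="influx.example.com", port=8086,
                        database="hw_regression")

def record_sample(run_id: str, sku: str, qps: float, latency_ms: float):
    # Each point carries tags for filtering and fields for the values.
    client.write_points([{
        "measurement": "cassini_benchmark",
        "tags": {"run_id": run_id, "sku": sku},
        "time": datetime.now(timezone.utc).isoformat(),
        "fields": {"qps": qps, "latency_ms": latency_ms},
    }])

record_sample("run-42", "skylake-1u-web", qps=1850.0, latency_ms=12.4)
```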

Acknowledgements

Implementing the regression system was indeed a collaborative effort. Thanks to my team, Mike Bernat, Jay Subramani, and Vedang Joshi for their contributions. And special thanks to Manoj Wadekar and Jay Shenoy for guiding us through this process.