Tech Blog Archive

Distributed Systems in Engineering

NuRaft: a Lightweight C++ Raft Core

We are excited to announce the public release of NuRaft, a lightweight C++ Raft core, under the Apache 2.0 open source license. NuRaft is based on the cornerstone C++ Raft implementation, with various additions and changes, and is the result of over two years of development and testing for production use in eBay's storage server data replication. This post discusses what NuRaft is and how it can be used.

By: Gene Zhang and Jung-Sang Ahn
Big Data in Engineering

Monitoring at eBay with Druid

At eBay, we switched one of our monitoring tech stacks from a legacy homegrown architecture to a Druid-based real-time monitoring system. In this article, we describe how we made the transition to the new stack and the benefits it offers.

By: Mohan Garadi
Big Data in Engineering

How eBay Governs its Big Data Fabric

At eBay, nearly everything we do is based on data. We deal with structured, unstructured, and semi-structured data, where Hadoop, as a big data platform, has provided key technology features. Keeping pace with the speed of innovation while continuing to help data consumers easily find and consume the data they need guides our architecture and investment in building out eBay’s Big Data Fabric.

By: Alex Liang
Java in Engineering

SRE Case Study: URL Distribution Issue Caused by an Application

One of the questions new site reliability engineers ask most often is: Where should I begin when troubleshooting a problem in a cloud environment? I always tell them: begin by understanding the problem. Let me demonstrate why, and how, with a real troubleshooting case.

By: Charles Li
Java in Engineering

SRE Case Study: Triaging a Non-Heap JVM Out of Memory Issue

Most Java virtual machine out of memory issues happen on the heap, but this time proved to be a little different.

By: Eric Tian
Distributed Systems in Engineering

Providing Metadata Discovery on Large-Volume Data Sets

Many big data systems collect petabytes of data on a daily basis. Such systems are often designed primarily to query raw data records for a given time range with multiple data filters. However, discovering or identifying unique attributes present in such large datasets can be difficult.

By: Satbeer Lamba and Sudeep Kumar
Performance Engineering in Engineering

Troubleshooting a Connection Timeout Issue with tcp_tw_recycle Enabled

Availability and stability are very important for eBay's site, especially for applications that handle high traffic and that many other applications depend on, such as CAL (our Centralized Application Logging framework). This post describes a recent issue that affected the availability and stability of CAL, and how we found the root cause using tcpdump and systemtap.

By: Edward Lin and Huai Jiang
Data Center Operations in Engineering

Working on the Engines While the Plane is Flying

Operators of large-scale networks will, from time to time, need to perform major upgrades to the network while keeping it available with no downtime. This type of work has been compared to working on the engines of an airliner while it is flying. At eBay, our Site Network Engineering team recently completed a migration of our data center aggregation layer from one platform to another under these conditions. By sharing our experience, we hope to help our peers in the industry plan for and successfully execute their own network transformations.

By: Brian Davies and Thilak Thankappan
Java in Engineering

SRE Case Study: Mysterious Traffic Imbalance

As an architect of a large website, I have spent over a decade working on all kinds of troubleshooting cases. Many of them were quite challenging, like finding a suspect in a megacity, but also quite rewarding, and I ended up with many Sherlock Holmes stories to tell. Today I am sharing a troubleshooting case of mysterious traffic imbalance.

By: Charles Li
Data Infrastructure and Services in Engineering

Unicorn—Rheos Remediation Center

Rheos is eBay's near-line data platform, and it owns thousands of stateful machines in the cloud. The Rheos team has been building and enhancing its automation system over the past two years. Now it’s time to unify that past work into a modern, automatic remediation system: Unicorn.

By: Lubin Liu