Tech Blog Archive

Distributed Systems in Engineering

NuRaft: a Lightweight C++ Raft Core

We are excited to announce the public release of NuRaft, a lightweight C++ Raft core, under the Apache 2.0 open source license. NuRaft is based on the cornerstone C++ Raft implementation, with various additions and changes, and is the result of more than two years of development and testing for production use in storage server data replication at eBay. This post discusses what NuRaft is and how it can be used.

By: Gene Zhang and Jung-Sang Ahn
Big Data in Engineering

Monitoring at eBay with Druid

At eBay, we switched one of our monitoring tech stacks from a legacy homegrown architecture to a Druid-based real-time monitoring system. In this article, we discuss how we made the transition to the new stack and the benefits it offers.

By: Mohan Garadi
Big Data in Engineering

How eBay Governs its Big Data Fabric

At eBay, nearly everything we do is based on data. We deal with structured, unstructured, and semi-structured data, where Hadoop, as a big data platform, has provided key technology features. Keeping pace with the speed of innovation while continuing to help data consumers easily find and consume the data they need guides our architecture and investment in building out eBay’s Big Data Fabric.

By: Alex Liang
Java in Engineering

SRE Case Study: URL Distribution Issue Caused by an Application

One of the questions new site reliability engineers frequently ask is: Where should I begin when troubleshooting a problem in a cloud environment? I always tell them: Begin by understanding the problem. Let me demonstrate why, and how, with a real troubleshooting case.

By: Charles Li
Java in Engineering

SRE Case Study: Triaging a Non-Heap JVM Out of Memory Issue

Most Java virtual machine (JVM) out-of-memory issues happen on the heap, but this one proved to be a little different.

By: Eric Tian
Distributed Systems in Engineering

Providing Metadata Discovery on Large-Volume Data Sets

Many big data systems collect petabytes of data on a daily basis. Such systems are often designed primarily to query raw data records for a given time range with multiple data filters. However, discovering or identifying unique attributes present in such large datasets can be difficult.

By: Satbeer Lamba and Sudeep Kumar
Performance Engineering in Engineering

Troubleshooting a Connection Timeout Issue with tcp_tw_recycle Enabled

Availability and stability are very important for eBay's site, especially for applications that take high traffic and that many other applications depend on, such as CAL (our Centralized Application Logging framework). This post describes a recent issue that impacted the availability and stability of CAL, and how we found the root cause using tcpdump and SystemTap.

By: Edward Lin and Huai Jiang
Data Center Operations in Engineering

Working on the Engines While the Plane is Flying

Operators of large-scale networks will, from time to time, need to perform major upgrades to the network while keeping it available with no downtime. This type of work has been compared to working on the engines of an airliner while it is flying. At eBay, our Site Network Engineering team recently completed a migration of our data center aggregation layer from one platform to another under these conditions. By sharing our experience, we hope to help our peers in the industry plan for and successfully execute their own network transformations.

By: Brian Davies and Thilak Thankappan
Java in Engineering

SRE Case Study: Mysterious Traffic Imbalance

As an architect of a large website, I have spent more than a decade working on all kinds of troubleshooting cases. Many of them were as challenging as finding a suspect in a megacity, yet quite rewarding, and I ended up with many Sherlock Holmes stories to tell. Today I am sharing one of them: a case of mysterious traffic imbalance.

By: Charles Li
Data Infrastructure and Services in Engineering

Unicorn—Rheos Remediation Center

Rheos is eBay's near-line data platform, and it manages thousands of stateful machines in the cloud. Over the past two years, the Rheos team has been building and enhancing its automation system. Now it is time to unify that work and build a modern, automatic remediation system: Unicorn.

By: Lubin Liu
Data Infrastructure and Services in Engineering

Adapting Continuous Integration and Delivery to Hardware Quality

A hyperscale infrastructure demands a high level of automation in hardware testing to increase productivity and rigor. The idea was to automate the traditional methods of qualifying servers and server components by applying the CI/CD (Continuous Integration and Continuous Deployment) principles of software development to the hardware development lifecycle.

By: Ashvini Mangalvedhekar
Big Data in Engineering

Big Data Governance: Hive Metastore Listener for Apache Atlas Use Cases

At eBay, we are obsessed with data quality and governance. Because eBay's Hadoop platform hosts 500 PB of data on more than 15,000 nodes, governance is of utmost importance. This article discusses our experiences handling data governance at scale.

By: Aroop Maliakkal Padmanabhan and Tiffany Nguyen
Cloud in Engineering

Managing HTTP Header Size on NetScaler Load Balancers

The way the NetScaler load balancer handles oversized HTTP headers is not straightforward when combined with layer 7 policies, and if overlooked, it can result in unexpected consequences and bad user experiences. This article explains how the header limit works and offers our recommendations for managing it properly.

By: Charles Li
Coding Practices in Engineering

Event Sourcing in Action with eBay's Continuous Delivery Team (Part 2)

In our first article, we introduced the concept and some of the benefits of event sourcing. In this article, we get very specific about how we implemented event sourcing for the Enterprise Continuous Delivery (ECD) project here at eBay.

By: John Long and Nataraj Sundar
Coding Practices in Engineering

Event Sourcing: Connecting the Dots for a Better Future (Part 1)

Using an event-centric approach has enabled our team at eBay to scale to millions of events, with the resiliency to recover from failures as quickly and reliably as possible. Though similar approaches have been widely adopted to augment large-scale data applications, for eBay's Continuous Delivery team, event sourcing is at the heart of decision-making and application development. To that end, we've built a system that continuously scales and tests our ability to handle an increasing volume of events and an ever-growing list of external data sources and partner integrations.

By: John Long and Nataraj Sundar
Frontend Engineering in Engineering

Optimization Study on Processing Order of NetScaler Load Balancer Layer 7 Policies

Traffic on ebay.com is processed by thousands of layer 7 policies on the load balancers. A clear understanding of the order in which these policies are processed helps ensure availability (by avoiding misconfigurations) and performance (by prioritizing the policies efficiently).

By: Charles Li, John Yang and Leona Zhang
Performance Engineering in Engineering

Optimizing CAL Report Hadoop MapReduce Jobs

eBay's Central Application Logging system (CAL) collects log data from all kinds of applications. The summary reports for log data are created using Hadoop MapReduce jobs. This article discusses our experiences optimizing these jobs.

By: Wanxue Li
Performance Engineering in Research

Faster E-commerce Search

The search engine plays an essential role in e-commerce: it connects the user's need, expressed as a query, with a set of relevant items. This is not a simple task: millions of queries per second need to be processed over possibly billions of items, and every query is expected to execute within a few hundred milliseconds using limited resources. In this article, we show how we improved the efficiency of eBay's search engine by over 25% using a technique inspired by web search.

By: Roberto Konow
Search Science in Engineering

Elasticsearch Performance Tuning Practice at eBay

Elasticsearch is an open-source search and analytics engine based on Apache Lucene that allows users to store, search, and analyze data in near real time. While Elasticsearch is designed for fast queries, its performance depends largely on your application's scenarios, the volume of data you are indexing, and the rate at which applications and users query your data. This article summarizes these challenges, as well as the process and tools the Pronto team has built to address them strategically. It also presents benchmarking results for various configurations as illustrations.

By: Pei Wang
Performance Engineering in Engineering

Beyond HTTPS

HTTPS is not just about security; many other benefits come along with it. One such benefit is access to modern web technologies. Check out how eBay leverages some of the new technologies that HTTPS opens up.

By: Senthil Padmanabhan