Tech Blog Archive

Platforms and Frameworks in Engineering

Discovering Continuous Automation With Request Mirroring

Because eBay's item page updates frequently, and because it depends on hundreds of libraries and services, discovering the unknowns and automating testing for all use-case combinations from production calls for a different approach to testing.

By: Lakshimi Duraivenkatesh and Vineet Bindal
Service Architecture in Research

eBay’s New Approach to Managing a Vast Service Architecture

Learn how eBay's architecture knowledge graph was developed, the benefits eBay has received from it, and the use cases we see now and in the future for this approach.

By: Hanzhang Wang, Chirag Shah and Sanjeev Katariya
Distributed Transactions in Engineering

GRIT: a Protocol for Distributed Transactions across Microservices

eBay technologists recently showed off a distributed transaction protocol called GRIT, for distributed ACID (atomicity, consistency, isolation, durability) transactions across microservices with multiple underlying databases.

By: Gene Zhang, Mohammad Roohitavaf, Jung-Sang Ahn and Kun Ren
Cloud in Engineering

Scalability Tuning on a Tess.IO Cluster

Tess.IO is eBay’s new unified cloud infrastructure based on Kubernetes. With more and more applications being deployed on the Tess.IO cluster, the requirements for scalability and capability of the cluster are growing. This article describes how to achieve 5000-node scalability for the Tess.IO cluster.

By: Yingnan Zhang
Distributed Systems in Engineering

NuRaft: a Lightweight C++ Raft Core

We are excited to announce the public release of NuRaft, a lightweight C++ Raft core, under the Apache 2.0 open source license. NuRaft is based on the cornerstone C++ Raft implementation, but with various additions and changes, and is the result of over two years of development and testing for production use within eBay for storage server data replication. This post discusses what NuRaft is, and how it can be used.

By: Gene Zhang and Jung-Sang Ahn
Big Data in Engineering

Monitoring at eBay with Druid

At eBay, we switched one of our monitoring tech stacks from a legacy homegrown architecture to a Druid-based real-time monitoring system. In this article, we discuss how we transitioned to the new stack and the benefits it offers.

By: Mohan Garadi
Big Data in Engineering

How eBay Governs its Big Data Fabric

At eBay, nearly everything we do is based on data. We deal with structured, unstructured, and semi-structured data, where Hadoop, as a big data platform, has provided key technology features. Keeping pace with the speed of innovation while continuing to help data consumers easily find and consume the data they need guides our architecture and investment in building out eBay’s Big Data Fabric.

By: Alex Liang
Java in Engineering

SRE Case Study: URL Distribution Issue Caused by an Application

One of the questions new site reliability engineers ask most often is: Where should I begin when troubleshooting a problem in a cloud environment? I always tell them: Begin by understanding the problem. Let me demonstrate the reasons and methods with a real troubleshooting case.

By: Charles Li
Java in Engineering

SRE Case Study: Triaging a Non-Heap JVM Out of Memory Issue

Most Java virtual machine out of memory issues happen on the heap, but this time proved to be a little different.

By: Eric Tian
Distributed Systems in Engineering

Providing Metadata Discovery on Large-Volume Data Sets

Many big data systems collect petabytes of data on a daily basis. Such systems are often designed primarily to query raw data records for a given time range with multiple data filters. However, discovering or identifying unique attributes present in such large datasets can be difficult.

By: Sudeep Kumar and Satbeer Lamba
Performance Engineering in Engineering

Troubleshooting a Connection Timeout Issue with tcp_tw_recycle Enabled

Availability and stability are very important for eBay's site, especially for applications that take high traffic and that many other applications depend on, such as CAL (our Centralized Application Logging framework). This post describes a recent issue that impacted the availability and stability of CAL, and how we found the root cause using tcpdump and systemtap.

By: Edward Lin and Huai Jiang
Data Center Operations in Engineering

Working on the Engines While the Plane is Flying

Operators of large scale networks will, from time to time, be required to perform major upgrades to the network while keeping the network available with no downtime. This type of work has been compared to working on the engines of an airliner while it is flying. At eBay, our Site Network Engineering team recently completed a migration of our data center aggregation layer from one platform to another under these conditions. By sharing our experience, we hope to help our peers in the industry plan for and successfully execute their own network transformations.

By: Brian Davies and Thilak Thankappan
Java in Engineering

SRE Case Study: Mysterious Traffic Imbalance

As an architect of a large website, I spent over a decade of my life working on all kinds of troubleshooting cases. Many of those cases were quite challenging, similar to finding a suspect in a megacity, yet quite rewarding. I ended up with many Sherlock Holmes stories to tell. What I am sharing today is a troubleshooting case of mysterious traffic imbalance.

By: Charles Li
Data Infrastructure and Services in Engineering

Unicorn—Rheos Remediation Center

Rheos is eBay's near-line data platform, and it owns thousands of stateful machines in the cloud. The Rheos team has been building and enhancing its automation system over the past two years. Now it’s time to unify that past work and build a modern, automatic remediation system: Unicorn.

By: Lubin Liu
Data Infrastructure and Services in Engineering

Adapting Continuous Integration and Delivery to Hardware Quality

A hyperscale infrastructure demands a high level of automation in hardware testing to increase productivity and rigor. The idea was to automate the traditional methods of qualifying servers and server components by applying the CI/CD (Continuous Integration and Continuous Deployment) principles of software development to the hardware development lifecycle.

By: Ashvini Mangalvedhekar
Big Data in Engineering

Big Data Governance: Hive Metastore Listener for Apache Atlas Use Cases

At eBay, we are obsessed with data quality and governance. Because eBay's Hadoop platform hosts 500 PB of data and runs on over 15,000 nodes, the focus on governance is of utmost importance. This article discusses our experiences handling data governance at scale.

By: Aroop Maliakkal Padmanabhan and Tiffany Nguyen
Cloud in Engineering

Managing HTTP Header Size on NetScaler Load Balancers

The way that the NetScaler load balancer handles oversized HTTP headers is not straightforward when combined with layer 7 policies and, if overlooked, may result in unexpected consequences and a bad user experience. This article explains how the header limit works and offers our recommendations on how to manage it properly.

By: Charles Li
Coding Practices in Engineering

Event Sourcing in Action with eBay's Continuous Delivery Team (Part 2)

In our first article, we introduced the concept and some of the benefits of event sourcing. For this article, we are going to get very specific about how we implemented event sourcing for the Enterprise Continuous Delivery (ECD) project here at eBay.

By: Nataraj Sundar and John Long
Coding Practices in Engineering

Event Sourcing: Connecting the Dots for a Better Future (Part 1)

Using an event-centric approach has enabled our team at eBay to scale to handle millions of events with the resiliency to recover from failures as quickly and reliably as possible. Though similar approaches have been widely adopted to augment large-scale data applications, for eBay's Continuous Delivery team, event sourcing is at the heart of decision-making and application development. To that end, we've built a system that continuously scales and tests our ability to handle an increasing volume of events and an ever-growing list of external data sources and partner integrations.

By: Nataraj Sundar and John Long
Frontend Engineering in Engineering

Optimization Study on Processing Order of NetScaler Load Balancer Layer 7 Policies

Traffic on ebay.com is processed by thousands of layer 7 policies on the load balancers. Clearly understanding the processing order ensures availability (by avoiding misconfigurations) and performance (by prioritizing the policies efficiently).

By: Charles Li, Leona Zhang and John Yang