The Christmas lights are long since down, the turkey was picked clean an eternity ago, and the mistletoe (if it was ever out) is safely back in the attic. Although 2019 is behind us, and as we move towards an even larger open source effort in 2020, it’s still not too late to review last year’s eBay open source work. This article highlights some of the new and popular 2019 eBay open source projects, as well as other notable works that should bring significant value to the open source community. The projects described here are grouped by technical field.
System Infrastructure Technologies
At eBay, huge amounts of listing data must be stored, retrieved, catalogued, and managed. It is not surprising that eBay is a leader in database and replication technology. Accordingly, in 2019, there were some notable releases in this field.
In terms of GitHub stars, the most popular project of 2019 was Akutan (named in honor of Mount Akutan, a “strato-volcano” in Alaska). This is a very significant project written in Go that involves many components. Akutan’s architecture is laid out in a separate blog article.
Importantly, Akutan has an architecture that scales out horizontally. It was run internally on a 20-server deployment, supported tens of thousands of changes per second, and was loaded with over 2.5 billion facts. As the authors put it, “we haven’t yet pushed Akutan to its limits.” See the detailed eBay Tech Blog article here. Contributions are always welcome for eBay open source projects; if you are interested in contributing to Akutan, more details are found here.
Later in 2019, another significant key-value store, Jungle, was released under an Apache license. Written in C++, it is an “embedded key-value storage library, based on a combined index of LSM-tree and copy-on-write (append-only) B+tree.” Jungle works with Akutan and serves as a replicated high-performance log store that can also be used with the NuRaft protocol. For further insight into how the components fit together, it is also worth reading the accompanying eBay Tech Blog article “NuRaft: a Lightweight C++ Raft Core.” For details about using or contributing to the project, please see the detailed README file in the root of the repo (a README accompanies every newly released eBay open source project).
As noted above, Jungle is part of a set of technologies intended to work together. Another key part of that architecture is NuRaft, an implementation of the consensus protocol “Raft.” Also released under an Apache license, NuRaft was written about here. The underlying Raft protocol itself was partially developed at Stanford University, but the eBay team added a number of enhancements, including logical snapshot support, a pre-vote mechanism to avoid disruption of a leader, custom quorum sizes, asynchronous replication, and many other features documented here.
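To make the pre-vote and quorum ideas a little more concrete, here is a minimal Python sketch of the general technique. It is not NuRaft's C++ API: the `Peer` class, the function names, and the `quorum` parameter are all hypothetical, chosen only for illustration. The idea is that a would-be candidate first asks its peers whether they would vote for it, and only starts a real election if enough of them say yes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Peer:
    """Hypothetical view of one follower's state (illustration only)."""
    last_log_term: int
    last_log_index: int
    heard_from_leader_recently: bool


def grants_pre_vote(peer: Peer, cand_last_term: int, cand_last_index: int) -> bool:
    # A peer that still hears from a live leader refuses, so a node returning
    # from a network partition cannot disrupt the current leader.
    if peer.heard_from_leader_recently:
        return False
    # Standard Raft check: the candidate's log must be at least as up to date.
    return (cand_last_term, cand_last_index) >= (peer.last_log_term, peer.last_log_index)


def majority(cluster_size: int) -> int:
    """Default quorum: a strict majority of the cluster."""
    return cluster_size // 2 + 1


def may_start_election(peers: Iterable[Peer], cand_last_term: int,
                       cand_last_index: int, quorum: Optional[int] = None) -> bool:
    peers = list(peers)
    cluster_size = len(peers) + 1  # the peers plus the candidate itself
    needed = quorum if quorum is not None else majority(cluster_size)
    # The candidate counts its own vote, then adds every granted pre-vote.
    votes = 1 + sum(grants_pre_vote(p, cand_last_term, cand_last_index) for p in peers)
    return votes >= needed
```

In a five-node cluster the default quorum is three, so a candidate needs two peers to grant the pre-vote. Passing a different `quorum` value mimics the idea of a custom quorum size.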
eBay uses machine learning to optimize listings in all kinds of ways, and from time to time, our developers also release this code and technology to the open source community. One important advance in 2019 was some very significant work to reduce training time for machine learning algorithms, released as the AutoOpt project.
In their own words, the architects described the reasoning behind AutoOpt:
Manual adjustment of hyperparameters is very costly and time-consuming, and even if done correctly, it lacks theoretical justification which inevitably leads to “rule of thumb” settings ... we propose a generic approach that utilizes the statistics of an unbiased gradient estimator to automatically and simultaneously adjust two paramount hyperparameters: the learning rate and momentum. ... The results match the performance of the best settings obtained through an exhaustive search and therefore, removes the need for a tedious manual tuning.
There’s a lot of math behind these gains in machine learning training time, but third-party developers need not dust off their college math books or wade through the equations (unless that was a New Year’s resolution for 2020). The code is available under the permissive Apache license here and can be used to improve machine learning training time.
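For readers who want to see where those two hyperparameters live, the sketch below shows a plain gradient descent step with momentum in Python. This is not the AutoOpt algorithm; it only illustrates the two values that AutoOpt is described as adjusting automatically. The function names and the toy objective are assumptions made for illustration.

```python
def sgd_momentum_step(weights, velocity, grads, learning_rate, momentum):
    """One update of classical SGD with momentum on plain Python lists."""
    # The velocity keeps a decaying memory of past gradients (momentum)
    # and takes a step against the current gradient (learning rate).
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def minimize_quadratic(start, learning_rate, momentum, steps):
    """Minimize the toy objective f(w) = w**2, whose gradient is 2*w."""
    weights, velocity = [start], [0.0]
    for _ in range(steps):
        grads = [2.0 * w for w in weights]
        weights, velocity = sgd_momentum_step(
            weights, velocity, grads, learning_rate, momentum)
    return weights[0]
```

With a learning rate of 0.1 and momentum of 0.9, `minimize_quadratic(1.0, 0.1, 0.9, 200)` settles close to zero, while a learning rate of 1.5 with no momentum makes the same loop diverge. That sensitivity is what makes manual tuning tedious.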
eBay continuously seeks to improve, strengthen, and contribute to code that helps with accessibility of software. In 2018, eBay researchers made a notable release of HeadGazeLib, software that allows a user to control a cursor by gaze rather than with fingers. In 2019, a number of developers led the effort to release a framework to help with accessibility testing of web sites.
That framework is Accessibility-Ruleset-Runner. In the words of its README, “[t]his project demonstrates how accessibility testing is done upstream during the development process. The project includes two rulesets, which is what we use internally (Custom Ruleset, aXe Ruleset). Developers can reuse our custom ruleset, exchange rulesets or add their own.” The project is explained further in the article “Automation via the Accessibility Ruleset Runner.”
The Developer division of eBay has a strong history of releasing SDKs as open source, and that was strengthened further in 2019, which saw five new SDK releases, including:
The astute reader may notice a predominant theme in the list: OAuth authentication that can be used to call eBay APIs. eBay continues to fully support the legacy APIs that use the SOAP protocol, but also encourages a move towards the newer REST APIs. The OAuth SDKs above, available in a programming language of choice, ease the transition to the newer REST APIs, a transition that can take place in minutes. Detailed eBay Tech Blog articles in 2019 also documented use of the CSharp OAuth library and the Python OAuth library: see “eBay OAuth Client Library” and “eBay OAuth Client Library in Python and Best Practice.” The principles of use for the OAuth SDKs remain the same in the other languages.
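As a rough idea of what such an SDK takes care of, the Python sketch below builds a standard OAuth 2.0 client credentials token request using only the standard library. It is not the API of any eBay SDK. The token URL and default scope shown are assumptions that should be checked against the current eBay developer documentation, and `build_token_request` is a hypothetical helper name.

```python
import base64
import urllib.parse
import urllib.request

# Assumed production values; verify against the eBay developer documentation.
TOKEN_URL = "https://api.ebay.com/identity/v1/oauth2/token"
DEFAULT_SCOPE = "https://api.ebay.com/oauth/api_scope"


def build_token_request(client_id, client_secret, scope=DEFAULT_SCOPE):
    """Build (but do not send) a client credentials token request."""
    # The application credentials travel in an HTTP Basic Authorization header.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    # The grant type and requested scope go in a form-encoded body.
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(build_token_request(app_id, cert_id))` would be expected to return a JSON body containing an access token, which is then passed as a bearer token on REST calls. The SDKs wrap these steps, along with token caching and the user consent flows, behind a few method calls.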
The one SDK outlier in the 2019 release set above was the FeedSDK in Python. It complements a prior FeedSDK written for Java clients. The intent and purpose of the feed libraries is to help with APIs that involve, as the name implies, feed files that can reach gigabytes in size. To quote from the detailed Python FeedSDK README, the SDK “abstracts the complexity involved in calculating the request header 'range' based on the response header 'content-range' and downloads and appends all the chunks until the whole feed file is downloaded.”
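The range calculation described in that quote can be sketched in a few lines of Python. This is not the FeedSDK's own code: `next_range` is a hypothetical helper, and the accepted header shapes (with or without a leading `bytes` unit) are an assumption about what a server might return.

```python
import re

# Matches "bytes 0-999/2500" as well as the bare form "0-999/2500".
_CONTENT_RANGE = re.compile(r"^(?:bytes\s+)?(\d+)-(\d+)/(\d+)$")


def next_range(content_range, chunk_size):
    """Return the next request 'Range' value, or None when the file is complete."""
    match = _CONTENT_RANGE.match(content_range.strip())
    if match is None:
        raise ValueError(f"unrecognized content-range: {content_range!r}")
    _start, end, total = (int(part) for part in match.groups())
    next_start = end + 1
    if next_start >= total:
        return None  # the last chunk has already been received
    # Byte ranges are inclusive, so the final chunk is clamped to total - 1.
    next_end = min(next_start + chunk_size - 1, total - 1)
    return f"bytes={next_start}-{next_end}"
```

A download loop would request the first chunk, append the response body to the output file, and keep calling `next_range` on each response's `content-range` header until it returns `None`.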
Looking Forward in 2020
eBay has a long history of releasing new open source code and contributing back to the open source community. Each open source project on the GitHub eBay site is intended to come with great documentation, well-written code, and a clearly defined open source license, and to represent a helpful addition to a technical field. In 2020, we fully expect to continue to expand the open source offerings that meet those criteria.
If it is not too late for well wishes, may you have an excellent and prosperous 2020, and if you are so inclined, please pop by any eBay repo and feel free to add a GitHub star, use a library, or better still, make an open source contribution to a project. (Hint: Even beginner developers are very welcome in the community, and can easily learn how to make a “pull request” by finding and fixing a typo in a README as a first contribution!)