Optimizing Analytics Data Processing on eBay’s New Open-Source-Based Platform
See how eBay adopted a five-level tuning strategy for an in-house analytics tool to exceed a legacy vendor’s performance.
Tianyou is a passionate technology enthusiast. He serves as the Director of Data Services and Solutions for the China Center of Excellence on eBay's Data Analytics Platforms team.
Automatic workflow generation is described. One or more files containing code statements for accessing and modifying information in a destination database are received. The code statements are parsed from the one or more files and dependencies between the code statements are determined. A dependency graph is built by arranging the code statements according to the dependencies between the code statements. The dependency graph is partitioned by identifying at least one barrier code statement having an unclear dependency and dividing the dependency graph between code statements occurring prior to the at least one barrier code statement and code statements occurring after the at least one barrier code statement. Jobs are scheduled based on the partitioned dependency graph, and the code statements are annotated according to the scheduled jobs. A workflow is then automatically generated based on the annotated code statements.
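The partition-and-schedule idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Stmt` record, its `reads`/`writes` sets, the `barrier` flag, and the function names are all hypothetical, and parsing is skipped by assuming each statement already lists the tables it touches. A statement depends on an earlier one when the two conflict on a table, and a barrier statement splits the graph so nothing is reordered across it.

```python
from dataclasses import dataclass


@dataclass
class Stmt:
    """Hypothetical parsed statement: the tables it reads and writes."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    barrier: bool = False  # True when its dependencies cannot be determined


def partition_at_barriers(stmts):
    """Split statements into segments; each barrier becomes its own segment."""
    segments, current = [], []
    for s in stmts:
        if s.barrier:
            if current:
                segments.append(current)
            segments.append([s])
            current = []
        else:
            current.append(s)
    if current:
        segments.append(current)
    return segments


def build_deps(segment):
    """Map each statement name to the earlier statements it conflicts with."""
    deps = {s.name: set() for s in segment}
    for i, later in enumerate(segment):
        for earlier in segment[:i]:
            conflict = (
                earlier.writes & later.reads      # read after write
                or earlier.reads & later.writes   # write after read
                or earlier.writes & later.writes  # write after write
            )
            if conflict:
                deps[later.name].add(earlier.name)
    return deps


def schedule(stmts):
    """Return jobs in order; statements inside one job are independent."""
    jobs = []
    for segment in partition_at_barriers(stmts):
        deps = build_deps(segment)
        done = set()
        while len(done) < len(segment):
            ready = [s.name for s in segment
                     if s.name not in done and deps[s.name] <= done]
            jobs.append(ready)
            done.update(ready)
    return jobs


stmts = [
    Stmt("load_a", writes=frozenset({"a"})),
    Stmt("load_b", writes=frozenset({"b"})),
    Stmt("join_ab", reads=frozenset({"a", "b"}), writes=frozenset({"c"})),
    Stmt("call_proc", barrier=True),  # unclear dependency: acts as a fence
    Stmt("report", reads=frozenset({"c"}), writes=frozenset({"r"})),
]
print(schedule(stmts))
# [['load_a', 'load_b'], ['join_ab'], ['call_proc'], ['report']]
```

In this sketch the two loads share a job because they touch different tables, while `report` waits behind the barrier even though its only real input was produced two jobs earlier. That is the conservative trade-off the abstract describes for statements with unclear dependencies.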
A database object used in a plurality of database operations is determined. A live range of the database object is computed. The computing of the live range includes determining occurrences of the database operations to the database object. Based at least in part on the live range of the database object, a memory is determined to be optimally assigned to store the database object based on at least one characteristic of the memory. A first time to allocate the database object to the memory is determined. A second time to deallocate the database object from the memory is determined. An output file comprising a first instruction to store the database object in the memory at the first time and a second instruction to deallocate the database object from the memory at the second time is written.
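As a rough illustration of the live-range idea, the sketch below treats an object's live range as the span from its first to its last use in an ordered list of operations. Everything specific here is an assumption made for the example: the `(operation, objects)` input shape, the `"fast"` and `"slow"` memory tiers, the use-count threshold standing in for "at least one characteristic of the memory", and the instruction text. The abstract writes its instructions to an output file; this sketch returns them as a list instead.

```python
def live_ranges(operations):
    """Compute {object: (first_use, last_use, use_count)}.

    operations: list of (operation_name, objects_used) in execution order.
    """
    ranges = {}
    for t, (_, objs) in enumerate(operations):
        for obj in objs:
            first, _, count = ranges.get(obj, (t, t, 0))
            ranges[obj] = (first, t, count + 1)
    return ranges


def plan(operations, hot_threshold=3):
    """Emit allocate/deallocate instructions ordered by time step.

    Objects used at least hot_threshold times go to the hypothetical
    "fast" tier; all others go to "slow".
    """
    instructions = []
    for obj, (first, last, count) in live_ranges(operations).items():
        memory = "fast" if count >= hot_threshold else "slow"
        instructions.append((first, f"ALLOCATE {obj} IN {memory}"))
        # Deallocate at the step right after the last use.
        instructions.append((last + 1, f"DEALLOCATE {obj} FROM {memory}"))
    # sorted() is stable, so ties keep their insertion order.
    return [text for _, text in sorted(instructions, key=lambda p: p[0])]


ops = [
    ("create_tmp", ["tmp"]),
    ("join", ["tmp", "orders"]),
    ("aggregate", ["tmp"]),
    ("export", ["orders"]),
]
for line in plan(ops):
    print(line)
# ALLOCATE tmp IN fast
# ALLOCATE orders IN slow
# DEALLOCATE tmp FROM fast
# DEALLOCATE orders FROM slow
```

Here `tmp` is used three times across steps 0 to 2 and lands in the fast tier, while `orders` is used twice across steps 1 to 3 and stays in the slow tier. A real planner would also have to account for memory capacity and object size, which this sketch ignores.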