All eBay-generated content is currently translated by our talented localization team, whereas eBay’s user-generated (UG) content is handled by our Machine Translation (MT) engine. It is well known that UG text can get pretty noisy due to typos, non-dictionary terms, and the like. At eBay, however, our MT engine has to deal with more than noise. We work with multiple types of UG content (search queries, item titles, and item descriptions), and each presents its own challenges. This post discusses one of those content types: search queries.
Translating search queries (SQ) is a very important step in providing our customers with a localized shopping experience, because before they even open a listing, customers look through the search results to choose the listings that interest them. We offer our users the opportunity to search for items in their native language by automatically translating their queries into the language of the market (English, German) and matching the translated queries against our inventory.
Search on eBay is a complex process (think of polysemy, broad context, the variety of inventory, etc.); try adding a machine translation step to that! Here are some of the main challenges we face when translating SQ.
- Training set (see SMT for more). Queries are translated from the user’s language, which means there has to be a separate training corpus for this direction in every language. Being able to use actual post-edited queries for training is very helpful.
- Lack of context. As you can imagine, search queries are quite short; the average length is 1-3 words (iPhone; iPhone case; blue iPhone case). The training data therefore has very little context for the engine to learn from.
- Polysemy. Because search queries carry no category information, polysemous terms are prime candidates for errors. Categories are listed for search results, but we have no way of knowing which category a user had in mind when they typed “pipe” in the search field: was it Plumbing, Motors, or Collectibles? Which translation do we choose? The same issue applies to all languages.
- Domain limitations. This goes hand in hand with polysemy. Guessing user intent and choosing the right category is not the only problem with polysemous terms; sometimes we also need to keep in mind legal aspects, domain specifics, and even shipping policies. For instance, some possible meanings of a term might point to items that would be very expensive or illegal to ship, or that for other reasons are very unlikely to be sold on eBay. In such cases, we might give preference to a less common meaning that is more likely to be sold and bought on eBay.
- Language trends. Sometimes the most obvious and common translation, the one a dictionary would offer, is not suitable for our purposes. Dictionaries cannot keep up with fast-changing language use and do not always reflect current trends. This is especially true for clothing and gadgets. Fashion is changing, technologies are developing, and new words are emerging, or new meanings are being assigned to existing terms. Many English words are also being adopted by other languages (or transliterated, in the case of Russian), often with a local “flavor.” We must keep up with these trends.
Here’s a specific example: the simple Russian word “шапка” means “hat” and would be translated as “hat” in most other contexts. On eBay, however, it should be translated as “beanie”, because when Russian users search for “шапка”, what they have in mind looks like a beanie; the search results for “hat” are much more diverse and less relevant.
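Two of the ideas above can be sketched as a small term-level selection step: a marketplace-specific translation that overrides the dictionary one (as with “шапка” → “beanie”), and, for polysemous terms, a preference for the sense that is more likely to be bought and sold on eBay. This is a minimal illustration under stated assumptions, not a description of how eBay’s MT engine actually works. The glossary, the `Candidate` fields `mt_score` and `inventory_share`, the `domain_weight` parameter, and the linear blend of the two scores are all hypothetical, and the scores in the usage example are invented.

```python
from dataclasses import dataclass

# Hypothetical domain glossary: terms whose marketplace translation differs
# from the dictionary one. The single entry comes from the example above.
DOMAIN_GLOSSARY: dict[str, str] = {
    "шапка": "beanie",
}


@dataclass(frozen=True)
class Candidate:
    """One possible translation of a source term (hypothetical fields)."""
    text: str
    mt_score: float         # assumed general-language MT probability, 0..1
    inventory_share: float  # assumed share of matching listings for this sense, 0..1


def choose_translation(term: str, candidates: list[Candidate],
                       domain_weight: float = 0.5) -> str:
    """Pick one translation for a single query term.

    1. A domain glossary entry, if present, wins outright.
    2. Otherwise, blend the MT score with the (hypothetical) inventory prior
       and return the highest-scoring candidate.
    3. With no candidates, leave the term untranslated.
    """
    key = term.lower()  # case-insensitive lookup; str.lower() handles Cyrillic
    if key in DOMAIN_GLOSSARY:
        return DOMAIN_GLOSSARY[key]
    if not candidates:
        return term

    def score(c: Candidate) -> float:
        # Linear blend: domain_weight=0 means "trust MT only".
        return (1 - domain_weight) * c.mt_score + domain_weight * c.inventory_share

    return max(candidates, key=score).text


if __name__ == "__main__":
    # Invented scores for the polysemous Russian word "ключ"
    # ("key", "wrench", or a water "spring").
    kluch = [
        Candidate("key", mt_score=0.6, inventory_share=0.2),
        Candidate("wrench", mt_score=0.3, inventory_share=0.7),
        Candidate("spring", mt_score=0.1, inventory_share=0.0),
    ]
    print(choose_translation("ключ", kluch))  # wrench
    print(choose_translation("Шапка", []))    # beanie
```

With `domain_weight=0`, the sketch falls back to the plain MT preference (“key”); raising the weight lets inventory evidence outweigh general-language frequency, which is the trade-off the Domain limitations point describes.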
Translating search queries correctly means helping eBay users find what they want. Providing an accurate, grammatically correct translation of a query is never enough; what we always keep in mind is user intent and relevance of the results.