Towards Agile AI
In this article, we propose a set of better practices, designed by and for eBay ML scientists, to help weave ML modeling into the cyclical Agile process flow.
Jean-David is a machine learning scientist and NLP lead at eBay.
An apparatus and method for predicting a brand name of a product are disclosed herein. A product identification number for the product is converted into a normalized global trade item number (GTIN). For each of a plurality of GTIN prefixes corresponding to the normalized GTIN, brand names and counts of each of the brand names using product information stored in a product catalog are identified. A probability distribution of the brand names is determined in accordance with the brand names and the counts of the brand names for the plurality of the GTIN prefixes. A predicted brand name for the product is identified from among the brand names for the plurality of the GTIN prefixes, the predicted brand name having a highest probability score in the probability distribution of the brand names.
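The prefix-counting idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The catalog layout (a list of `(gtin, brand)` tuples), the prefix lengths, and the choice to simply sum counts across prefixes are all assumptions made for illustration. The only detail taken as given is that shorter product identifiers are left-padded with zeros to the 14-digit GTIN form.

```python
from collections import Counter


def normalize_gtin(product_id: str) -> str:
    """Keep digits only and left-pad with zeros to the 14-digit GTIN form."""
    digits = "".join(ch for ch in product_id if ch.isdigit())
    if not digits or len(digits) > 14:
        raise ValueError(f"not a usable product identifier: {product_id!r}")
    return digits.zfill(14)


def predict_brand(product_id, catalog, prefix_lengths=(7, 8, 9, 10)):
    """Return (predicted_brand, distribution) for a product identifier.

    catalog: iterable of (normalized_gtin, brand_name) tuples (hypothetical layout).
    prefix_lengths: assumed prefix sizes; longer prefixes are more specific.
    """
    gtin = normalize_gtin(product_id)
    counts = Counter()
    for length in prefix_lengths:
        prefix = gtin[:length]
        # Count every catalog brand whose GTIN shares this prefix.
        for catalog_gtin, brand in catalog:
            if catalog_gtin.startswith(prefix):
                counts[brand] += 1
    total = sum(counts.values())
    if total == 0:
        return None, {}
    # Turn the pooled counts into a probability distribution over brands.
    distribution = {brand: n / total for brand, n in counts.items()}
    best = max(distribution, key=distribution.get)
    return best, distribution


catalog = [
    ("00012345678905", "Acme"),
    ("00012345600011", "Acme"),
    ("00012345999999", "Bolt"),
    ("00099999000000", "Other"),
]
brand, dist = predict_brand("0-12345-67890-5", catalog)
```

Because a brand that matches longer prefixes is counted once per matching prefix length, summing the counts implicitly favors brands that agree with the identifier on more leading digits. A real system would likely weight prefixes explicitly.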
A method of propagating annotations of content items to a search query is disclosed. A strength of a correspondence between a search query and a listing of an item on a network-based publication system is determined. The strength of the correspondence is based on an analysis of a set of actions by a set of users who submitted the search query. A set of annotations is generated. The set of annotations is propagated to a search engine and used to enhance search results.
In some embodiments, a method includes receiving an electronic document that comprises a plurality of sections. The method includes marking the plurality of sections as a content section or a non-content section using a visual attribute of the sections that includes at least one of a width of the section, a density of the plurality of hyperlinks in the section, a size of a font of text in the section and whether a title of the electronic document overlaps with text in the section. The method also includes storing the marking of the plurality of sections of the electronic document in a machine-readable medium.
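As a rough illustration of marking sections by visual attributes, the Python sketch below votes on the four signals the abstract names: width, hyperlink density, font size, and title overlap. The `Section` type, every threshold, and the three-of-four voting rule are hypothetical choices for this example, not details from the patent.

```python
from dataclasses import dataclass


@dataclass
class Section:
    text: str
    width_px: int
    font_size_px: float
    link_count: int


def link_density(section: Section) -> float:
    """Hyperlinks per word of text in the section."""
    words = section.text.split()
    return section.link_count / max(len(words), 1)


def overlaps_title(section: Section, title: str) -> bool:
    """True when the section shares at least half of the title's words."""
    title_words = {w.lower() for w in title.split()}
    section_words = {w.lower() for w in section.text.split()}
    needed = max(1, len(title_words) // 2)
    return len(title_words & section_words) >= needed


def mark_sections(sections, title, min_width_px=400,
                  max_link_density=0.3, min_font_px=12.0):
    """Label each section 'content' or 'non-content' (thresholds are assumed)."""
    marks = []
    for section in sections:
        votes = [
            section.width_px >= min_width_px,
            link_density(section) <= max_link_density,
            section.font_size_px >= min_font_px,
            overlaps_title(section, title),
        ]
        marks.append("content" if sum(votes) >= 3 else "non-content")
    return marks


body = Section("Agile AI practices help teams ship models faster every sprint",
               width_px=700, font_size_px=14.0, link_count=0)
nav = Section("Home About Careers Contact",
              width_px=160, font_size_px=10.0, link_count=4)
marks = mark_sections([body, nav], title="Towards Agile AI")
```

The resulting labels could then be serialized alongside the document, which corresponds to the storing step the abstract mentions.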
There is provided a method and system for qualification testing in a social network service. Qualification testing provides access control into a social network, wherein qualification is based on answers to questions related to a topic. In one example, members admitted to the network provide guidance, notes and research assistance to another member. The social network members access the social network from an external networked computing service, such as another social network, to facilitate easy connection to potential members. The social network may be implemented as an application overlay to the external service, or may access connections in the external network.
A system and method of metadata refinement using behavioral patterns is disclosed. In some embodiments, user behavioral data for results of a search query is received. The results can include an untagged item and a plurality of tagged items. A determination can then be made that the tagged items have been assigned a plurality of types of metadata. The untagged item can then be identified as a candidate to be tagged with at least one of the plurality of types of metadata assigned to the tagged items. In some embodiments, the user behavioral data comprises clickstream data indicating that a user selected the untagged item and at least one of the tagged items during a single search event.
A computer-implemented system and method for providing information tagging in a networked system is disclosed. The apparatus in an example embodiment includes a tag engine configured to process a database of categorized product listings; to receive a user-provided tag associated with at least one of the product listings; to retain the user-provided tag; and to serve the user-provided tag to a user viewing at least one of the product listings.
Methods and systems to perform image searches in a network-based publication system, such as image searches for items available for purchase via the network-based publication system, are described. In some example embodiments, the methods and systems access an image of an item, identify a purchaser of and/or user associated with the item in the image, query, using the image, a collection of images whose contents include items that are associated with the purchaser and that are offered for purchase by a network-based publication system, and match the item in the image to one or more of the items offered for purchase based on a result of the query. The network-based publication system may then present the matched one or more items, or recommendations for the matched items, to a user that provided the image.
A system and method of metadata refinement using behavioral patterns is disclosed. In some embodiments, user behavioral data for results of a search query is received. The results can include an untagged item and a plurality of tagged items. A determination can then be made that the tagged items have been assigned a first type of metadata not assigned to the untagged item. The untagged item can then be identified as a candidate to be tagged with the first type of metadata assigned to the tagged items based on the user behavioral data. In some embodiments, the user behavioral data comprises clickstream data indicating that a user selected the untagged item and the tagged items during a single search event.
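One way to picture the co-click signal described here is the Python sketch below. It assumes each search event is available as a set of clicked item IDs and that item metadata is a dictionary keyed by metadata type. Both structures, the function name, and the `min_co_clicks` threshold are hypothetical.

```python
from collections import defaultdict


def find_tagging_candidates(search_events, item_metadata, min_co_clicks=2):
    """Suggest metadata types for items that lack them.

    search_events: iterable of sets of item IDs clicked in one search event.
    item_metadata: dict of item ID -> dict of metadata type -> value.
    Returns a dict of item ID -> set of metadata types worth tagging.
    """
    co_clicks = defaultdict(int)  # (item, metadata type) -> co-click count
    for clicked in search_events:
        for item in clicked:
            own_types = set(item_metadata.get(item, {}))
            for other in clicked:
                if other == item:
                    continue
                # Each co-clicked tagged item contributes one count per
                # metadata type that the current item is missing.
                for metadata_type in item_metadata.get(other, {}):
                    if metadata_type not in own_types:
                        co_clicks[(item, metadata_type)] += 1

    candidates = defaultdict(set)
    for (item, metadata_type), count in co_clicks.items():
        if count >= min_co_clicks:
            candidates[item].add(metadata_type)
    return dict(candidates)


item_metadata = {
    "a": {"brand": "Acme"},
    "b": {"brand": "Acme", "color": "red"},
    "c": {},
}
search_events = [{"a", "b", "c"}, {"b", "c"}, {"a"}]
candidates = find_tagging_candidates(search_events, item_metadata)
```

In this toy data, item `c` is repeatedly clicked together with tagged items, so it becomes a candidate for both `brand` and `color`. Item `a` is co-clicked with a `color`-tagged item only once and stays below the assumed threshold.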
A method of propagating annotations of content items to a search query is disclosed. A strength of a correspondence between a search query and a title of a listing of an item on a network-based publication system is determined. The strength of the correspondence is based on an analysis of a set of actions by a set of users who submitted the search query. A set of annotations corresponding to the title is generated. The set of annotations is propagated to an additional search query such that the set of annotations and the strength of the correspondence are used by a search engine to enhance search results corresponding to the additional search query.
An item record in an item database contains an item description generated by a seller of an item. A server machine is configured to access the item database, analyze the item description, and extract descriptive information by inferring an attribute and a corresponding attribute value from the item description. The attribute and its attribute value constitute an attribute-value pair. The server machine uses the attribute-value pair to map the item record to a product record stored in a product database. The mapping of the item record to the product record is based on comparing the attribute-value pair of the item record to a reference attribute-value pair in the product record to identify the product record. The mapping is performed upon detection of a match between the attribute-value pairs.
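The item-to-product mapping can be sketched in a few lines of Python. The regular-expression patterns, the attribute names, and the rule that `brand` and `storage` must both match are assumptions chosen for illustration. The abstract does not specify how attributes are inferred or which ones must agree.

```python
import re

# Hypothetical extraction patterns; a production system would likely use
# learned models rather than hand-written expressions.
PATTERNS = {
    "brand": re.compile(r"\b(apple|samsung|sony)\b", re.IGNORECASE),
    "storage": re.compile(r"\b(\d+)\s?gb\b", re.IGNORECASE),
    "color": re.compile(r"\b(black|white|silver|blue)\b", re.IGNORECASE),
}


def extract_pairs(description: str) -> dict:
    """Infer attribute-value pairs from a seller-written description."""
    pairs = {}
    for attribute, pattern in PATTERNS.items():
        match = pattern.search(description)
        if match:
            pairs[attribute] = match.group(1).lower()
    return pairs


def map_to_product(description, products, required=("brand", "storage")):
    """Return the ID of the first product whose reference pairs match.

    products: dict of product ID -> dict of reference attribute-value pairs.
    required: attributes that must match exactly (an assumed rule).
    """
    pairs = extract_pairs(description)
    for product_id, reference in products.items():
        if all(attr in pairs and pairs[attr] == reference.get(attr)
               for attr in required):
            return product_id
    return None


products = {
    "p1": {"brand": "apple", "storage": "64"},
    "p2": {"brand": "apple", "storage": "128"},
}
matched = map_to_product("Apple iPhone 128GB Silver, lightly used", products)
unmatched = map_to_product("vintage lamp", products)
```

Requiring an exact match on a small set of attributes keeps the mapping conservative: an item with too little extractable information is left unmapped rather than guessed.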
In some embodiments, a method includes receiving an electronic document that comprises a plurality of sections. The method includes marking the plurality of sections as a content section or a non-content section using an attribute of the sections that includes at least one of a width of the section, a density of the plurality of hyperlinks in the section, a size of a font of text in the section and whether a title of the electronic document overlaps with text in the section. The method also includes storing the marking of the plurality of sections of the electronic document in a machine-readable medium.