Dr. Changyi Zhu, Principal Engineer

An H2O (version 2, h2o-2) REST API interface in Java has been created at ShareThis to automate model building and retraining.

To improve campaign performance, machine-learned models must be refreshed continuously so that they are tuned on recent data. Doing this reliably requires an automated pipeline. We use the H2O platform ( http://h2o.ai/ ) to build models. Its UI is handy for experimenting with new models, but once a model algorithm has been chosen, retraining on more recent data and pushing the result to production should be a lot more automated (API driven). Since much of our infrastructure runs on Java, we wanted to build an interface that makes this easy.

Prepare Model Builder Input Properties

A default input property file (for a GBM or GLM model) is provided by this interface. The JSON values in it can be modified for a specific model, such as GBM.
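
As a rough sketch of how defaults might be specialized for one model, the Java below layers model-specific overrides on top of a default set using java.util.Properties. The class name GbmInputProperties and the key names (ntrees, max_depth, learn_rate) are assumptions made for illustration; the actual keys in the interface's property file are not shown in this post.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of the ShareThis interface.
final class GbmInputProperties {

    // Loads the defaults, then layers the overrides on top. Keys that are
    // missing from the overrides fall back to the default values.
    static Properties load(Reader defaults, Reader overrides) throws IOException {
        Properties base = new Properties();
        base.load(defaults);
        Properties merged = new Properties(base); // base is the fallback
        merged.load(overrides);
        return merged;
    }

    public static void main(String[] args) throws IOException {
        // Assumed key names and values, for illustration only.
        String defaults = "ntrees=50\nmax_depth=5\nlearn_rate=0.1\n";
        String overrides = "ntrees=200\n";
        Properties p = load(new StringReader(defaults), new StringReader(overrides));
        System.out.println("ntrees = " + p.getProperty("ntrees"));       // taken from overrides
        System.out.println("max_depth = " + p.getProperty("max_depth")); // taken from defaults
    }
}
```

Keeping the defaults as a fallback layer, rather than editing the default file in place, leaves one shared baseline and a small per-model file that records only what differs.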

Build the Model

A default bash script shown below can be modified to run the build (for a GBM model).

#!/bin/bash
# Run the GBM model builder with the given input property file.
java -cp "$CLASSPATH" com.sharethis.service.h2o.model.GBM bin/res/gbm.properties

A jar file for the model will be generated by the interface and can be loaded dynamically by an application for optimization. Both the REST request and the REST response are saved in a log file for further analysis, to help automate the model building/retraining process.
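
Loading the generated jar dynamically could look roughly like the sketch below, which uses the standard java.net.URLClassLoader. The class name ModelJarLoader is hypothetical, and the sketch assumes the generated model class exposes a public no-argument constructor; neither the jar's file name nor the model's class name is given in this post, so both are left as parameters.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper, not part of the ShareThis interface.
final class ModelJarLoader {

    // Opens a class loader over the generated jar. The application's own
    // loader is the parent, so shared classes resolve as usual.
    static URLClassLoader open(File jar) throws MalformedURLException {
        URL[] urls = { jar.toURI().toURL() };
        return new URLClassLoader(urls, ModelJarLoader.class.getClassLoader());
    }

    // Loads the named class through the given loader and instantiates it.
    // Assumes a public no-argument constructor on the model class.
    static Object newInstance(URLClassLoader loader, String className)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className, true, loader);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

Because each retrained model gets its own class loader, an application can swap in a fresh jar by opening a new loader and closing the old one, without restarting.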

Analyze the REST Response for Model Building/Retraining

The log file can be loaded by an analyzer using java.util.Properties. The REST responses, which are saved as key/value pairs, can be accessed through the following keys:

public static final String IMPORT_FILE_RESPONSE
public static final String PARSE_FILE_RESPONSE
public static final String MODEL_BUILD_RESPONSE
public static final String MODEL_PREDICT_RESPONSE
public static final String MODEL_INSPECT_RESPONSE
public static final String MODEL_CALC_AUC_RESPONSE
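
A minimal sketch of such an analyzer's first step, reading the log with java.util.Properties and pulling out one response by key, might look like this. The class name ModelLogReader is hypothetical, and the string values behind the constants above are not given in this post, so the key is passed in as a parameter rather than hard-coded.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper, not part of the ShareThis interface.
final class ModelLogReader {

    // Reads the whole log file as key/value pairs.
    static Properties load(Path logFile) throws IOException {
        Properties log = new Properties();
        try (Reader in = Files.newBufferedReader(logFile, StandardCharsets.UTF_8)) {
            log.load(in);
        }
        return log;
    }

    // Returns the raw REST response stored under the given key, or fails
    // loudly if that step was never logged.
    static String response(Properties log, String key) {
        String value = log.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("No response logged under key: " + key);
        }
        return value;
    }
}
```

A caller would pass the value of one of the constants listed above as the key and then hand the returned string to whatever deserializer it uses.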

The response values can be deserialized into a list of Java objects using the classes included in the interface. An analyzer, planned as part of a future project, can then check those responses against a given set of rules in order to tune the input properties for a specific model.

Model Building/Retraining Automation @ShareThis Using H2O Platform