Beyond machine learning pipelines with MLJ

using MLJ
@load DecisionTreeRegressor

# load some data:
task = load_reduced_ames();
X, y = task();

# one-hot encode the inputs, X:
hot_model = OneHotEncoder()
hot = machine(hot_model, X)
fit!(hot)
Xt = transform(hot, X)

# fit a decision tree to the transformed data:
tree_model = DecisionTreeRegressor()
tree = machine(tree_model, Xt, y)
fit!(tree, rows = 1:1300)

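Note that a model in MLJ is just a struct containing hyperparameters. Wrapping a model in data creates a machine struct, which additionally records the results of training.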
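The model/machine split can be illustrated with a toy example in plain Julia. This is a minimal sketch, not MLJ's implementation: `ConstantModel`, `ToyMachine` and the local `fit!`/`predict` functions are hypothetical stand-ins that only mimic the idea that hyperparameters live in the model while learned parameters live in the machine.

```julia
# Hypothetical stand-ins, NOT MLJ's types: the model holds only hyperparameters,
# the machine binds a model to data and stores what training learned.

mutable struct ConstantModel
    shrinkage::Float64                    # hyperparameter; no learned state here
end

mutable struct ToyMachine
    model::ConstantModel
    y::Vector{Float64}                    # data the machine is bound to
    fitresult::Union{Nothing,Float64}     # filled in by fit!
end

ToyMachine(model::ConstantModel, y::Vector{Float64}) = ToyMachine(model, y, nothing)

# Training writes the learned parameter into the machine, leaving the model untouched.
function fit!(mach::ToyMachine; rows = eachindex(mach.y))
    m = sum(mach.y[rows]) / length(rows)
    mach.fitresult = (1 - mach.model.shrinkage) * m
    return mach
end

# Predict a constant (the learned, shrunk mean) for n new observations.
predict(mach::ToyMachine, n::Integer) = fill(mach.fitresult, n)

model = ConstantModel(0.5)
mach = ToyMachine(model, [2.0, 4.0, 6.0, 8.0])
fit!(mach, rows = 1:2)      # mean of [2, 4] is 3, shrunk by half to 1.5
yhat = predict(mach, 3)     # [1.5, 1.5, 1.5]
```

Because the model stays a plain hyperparameter container, the same `model` object could be bound to several machines with different data.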

Without a pipeline, each time we want to present new data for prediction, we must first apply the one-hot encoding ourselves:

Xnew = X[1301:1400,:];
Xnewt = transform(hot, Xnew);
yhat = predict(tree, Xnewt);
yhat[1:3]
 3-element Array{Float64,1}:
  223956.9999999999
  320142.85714285733
  161227.49999999994
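The reason new data must go through the already-fitted encoder is that one-hot encoding learns the category levels at fit time. The sketch below is a hypothetical, simplified encoder written in plain Julia (not MLJ's `OneHotEncoder`); `fit_onehot`, `transform_onehot` and the sample category values are made up for illustration, and mapping unseen levels to an all-zero row is an assumed design choice.

```julia
# Hypothetical one-hot encoder sketch, NOT MLJ's OneHotEncoder:
# fitting learns the category levels; transforming reuses those levels,
# so new data gets exactly the same columns as the training data.

struct OneHotFit
    levels::Vector{String}      # learned at fit time, in sorted order
end

fit_onehot(x::AbstractVector{<:AbstractString}) = OneHotFit(sort(unique(x)))

# One row per observation, one column per learned level.
function transform_onehot(f::OneHotFit, x::AbstractVector{<:AbstractString})
    M = zeros(Int, length(x), length(f.levels))
    for (i, v) in enumerate(x)
        j = findfirst(==(v), f.levels)
        j === nothing || (M[i, j] = 1)   # unseen levels give an all-zero row
    end
    return M
end

train = ["NAmes", "OldTown", "NAmes", "Edwards"]
enc = fit_onehot(train)                 # levels: Edwards, NAmes, OldTown
Xt = transform_onehot(enc, train)       # 4×3 matrix

# New data must go through the SAME fitted encoder to get matching columns:
Xnew = transform_onehot(enc, ["OldTown", "Gilbert"])
```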

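To build a pipeline, we simply wrap the supplied data in source nodes and repeat similar declarations, omitting the calls to fit!. The difference now is that each "variable" (e.g., Xt, yhat) is a node of our pipeline rather than concrete data: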

Xs = source(X)
ys = source(y)

hot = machine(hot_model, Xs)
Xt = transform(hot, Xs);

tree = machine(tree_model, Xt, ys)
yhat = predict(tree, Xt)
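To make the idea of "nodes instead of concrete data" concrete, here is a toy lazy graph in plain Julia. It is a minimal sketch under stated assumptions, not how MLJ implements nodes: `Source`, `Node` and `center` are hypothetical names, and a node here simply stores an operation and its upstream nodes, computing nothing until it is called.

```julia
# Hypothetical sketch, NOT MLJ's implementation: a source wraps concrete data,
# a node wraps an operation plus its input nodes, and nothing is computed
# until a node is called.

abstract type AbstractNode end

struct Source <: AbstractNode
    data::Vector{Float64}
end

struct Node <: AbstractNode
    op::Function                   # operation applied to the evaluated inputs
    args::Vector{AbstractNode}     # upstream nodes
end

# Calling a source returns its data, optionally restricted to some rows.
(s::Source)(; rows = Colon()) = s.data[rows]

# Calling a node evaluates its inputs on the same rows, then applies its operation.
(n::Node)(; rows = Colon()) = n.op((a(rows = rows) for a in n.args)...)

center(v) = v .- sum(v) / length(v)

Xs = Source([1.0, 2.0, 3.0, 4.0])
Xc = Node(center, [Xs])            # declared only; nothing computed yet
X2 = Node(v -> 2 .* v, [Xc])       # a second node downstream of the first

X2()             # evaluates the whole chain: [-3.0, -1.0, 1.0, 3.0]
X2(rows = 1:2)   # the chain restricted to rows 1:2: [-1.0, 1.0]
```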

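If we like, we can think of a node as dynamic data: "data" because it can be called (indexed on rows), but "dynamic" because the result depends on the outcome of training events, which in turn depend on hyperparameter values. For example, after fitting the completed pipeline, we can make new predictions like this: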

fit!(yhat, rows=1:1300)
 [ Info: Training NodalMachine @ 1…51.
 [ Info: Spawned 1300 sub-features to one-hot encode feature :Neighborhood.
 [ Info: Spawned 1300 sub-features to one-hot encode feature :MSSubClass.
 [ Info: Training NodalMachine @ 1…17.
 Node @ 1…79 = predict(1…17, transform(1…51, 1…07))

yhat(rows=1301:1302) # to predict on rows of source node
yhat(Xnew)           # to predict on new data
156-element Array{Float64,1}:
 223956.9999999999
 320142.85714285733
 ...
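The "dynamic" part can be sketched in plain Julia as a callable whose output depends on the machine's last training event. This is a hypothetical illustration, not MLJ internals: `ShrunkMean`, `Mach`, `train!` and `node` are made-up names, and the "node" is just a closure over a mutable machine.

```julia
# Hypothetical sketch, NOT MLJ internals: a node's output depends on the last
# training event, so the same call can return different results after the
# machine is refit with a new hyperparameter value.

mutable struct ShrunkMean
    shrinkage::Float64           # hyperparameter used during training
end

mutable struct Mach
    model::ShrunkMean
    fitresult::Float64           # learned (shrunk) mean of the training rows
end

function train!(m::Mach, x::Vector{Float64}, rows)
    m.fitresult = (1 - m.model.shrinkage) * sum(x[rows]) / length(rows)
    return m
end

# The "node": accepts any input vector and uses whatever the machine last learned.
node(m::Mach) = xnew -> xnew .- m.fitresult

X = [1.0, 3.0, 5.0, 7.0]
mach = Mach(ShrunkMean(0.0), 0.0)
yhat = node(mach)

train!(mach, X, 1:2)            # learned mean of [1, 3] is 2.0
a = yhat([4.0])                 # new data: [2.0]

mach.model.shrinkage = 0.5      # change a hyperparameter and refit
train!(mach, X, 1:2)            # learned value is now 1.0
b = yhat([4.0])                 # same call, different result: [3.0]
```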

Exporting and retraining

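Once a pipeline like this has been built and tested on sample data, it can be exported as a stand-alone model, ready to be trained on any dataset. For details, see the MLJ documentation. In the future, Julia macros will allow common architectures (e.g., linear pipelines) to be built in a few lines of code.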
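One way to picture an exported network is as a model whose hyperparameters are themselves models. The plain-Julia sketch below assumes that reading and is not MLJ's export mechanism; `Standardizer`, `Ridge1D`, `StandardizedRidge`, `fit` and `predict` are hypothetical, and the components (standardization followed by a closed-form one-dimensional ridge fit) were chosen only because they are short.

```julia
# Hypothetical sketch, NOT MLJ's export mechanism: an "exported" network is just
# another model whose hyperparameters are its component models, plus a fit
# function that trains every component, in order, on whatever data it is given.

struct Standardizer end                 # a component with nothing to tune

mutable struct Ridge1D
    lambda::Float64                     # L2 penalty on the slope
end

mutable struct StandardizedRidge        # the composite, stand-alone model
    standardizer::Standardizer
    regressor::Ridge1D
end

function fit(model::StandardizedRidge, x::Vector{Float64}, y::Vector{Float64})
    # Component 1: learn the mean and (population) standard deviation of x.
    mu = sum(x) / length(x)
    sigma = sqrt(sum((x .- mu) .^ 2) / length(x))
    z = (x .- mu) ./ sigma
    # Component 2: closed-form 1-D ridge fit on the standardized inputs
    # (z has mean zero, so the intercept is simply the mean of y).
    ybar = sum(y) / length(y)
    slope = sum(z .* (y .- ybar)) / (sum(z .^ 2) + model.regressor.lambda)
    return (mu = mu, sigma = sigma, intercept = ybar, slope = slope)
end

predict(f::NamedTuple, xnew::Vector{Float64}) =
    f.intercept .+ f.slope .* ((xnew .- f.mu) ./ f.sigma)

model = StandardizedRidge(Standardizer(), Ridge1D(0.0))
x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]

p0 = predict(fit(model, x, y), [4.0])   # ≈ [8.0]: no penalty recovers y = 2x
model.regressor.lambda = 3.0            # nested hyperparameter, tuned like any other
p3 = predict(fit(model, x, y), [4.0])   # ≈ [6.0]: slope shrunk toward zero
```

Because the composite is an ordinary mutable struct, nested hyperparameters such as `model.regressor.lambda` can be changed or tuned exactly like those of a single model.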

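Finally, we mention that MLJ learning networks, and their exported counterparts, are "smart" in the sense that changing a hyperparameter does not trigger retraining of component models upstream of the change: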

tree_model.max_depth = 4
fit!(yhat, rows=1:1300)
 [ Info: Not retraining NodalMachine @ 1…51. It is up-to-date.
 [ Info: Updating NodalMachine @ 1…17.
 Node @ 1…79 = predict(1…17, transform(1…51, 1…07))
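The skipping behaviour shown in the log above can be imitated with a simple bookkeeping scheme: each machine remembers the hyperparameters and upstream state it was last trained with. This plain-Julia sketch is a hypothetical illustration of that idea, not MLJ's actual mechanism; the `Machine` type, the hyperparameter dictionaries and the log messages are all made up.

```julia
# Hypothetical sketch, NOT MLJ's mechanism: each machine remembers the
# hyperparameter values and upstream state it was last trained with, and
# retrains only when one of them has changed.

mutable struct Machine
    name::String
    hyper::Dict{Symbol,Any}            # current hyperparameters (user may mutate)
    upstream::Union{Nothing,Machine}   # machine whose output feeds this one
    trained_hyper::Dict{Symbol,Any}    # snapshot taken at last training
    trained_upstream_version::Int
    version::Int                       # bumped on every (re)training
end

Machine(name, hyper, upstream = nothing) =
    Machine(name, hyper, upstream, Dict{Symbol,Any}(), -1, 0)

upstream_version(m::Machine) = m.upstream === nothing ? 0 : m.upstream.version

# Fit upstream first, then this machine only if something it depends on changed.
function fit!(m::Machine, events::Vector{String})
    m.upstream === nothing || fit!(m.upstream, events)
    if m.version > 0 && m.hyper == m.trained_hyper &&
       upstream_version(m) == m.trained_upstream_version
        push!(events, "Not retraining $(m.name). It is up-to-date.")
    else
        push!(events, "Training $(m.name).")
        m.trained_hyper = copy(m.hyper)
        m.trained_upstream_version = upstream_version(m)
        m.version += 1
    end
    return m
end

hot  = Machine("hot", Dict{Symbol,Any}(:drop_last => false))
tree = Machine("tree", Dict{Symbol,Any}(:max_depth => 10), hot)

events = String[]
fit!(tree, events)                # trains both machines
tree.hyper[:max_depth] = 4        # change a downstream hyperparameter
fit!(tree, events)                # only "tree" is retrained
hot.hyper[:drop_last] = true      # change an upstream hyperparameter
fit!(tree, events)                # both retrain: tree's input has changed
```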

Just "write the math!"

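Because of Julia's generic programming features, any kind of operation you would normally apply to data (arithmetic, row selection, column concatenation, etc.) can be overloaded to work on nodes. As a result, MLJ's network-building syntax is economical, intuitive and easy to read. In this respect we have been inspired by "On Machine Learning and Programming Languages".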
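Here is a small plain-Julia sketch of the overloading idea: extending `Base` operators so that combining nodes builds a new, still-unevaluated node. It is not MLJ's source; `LazyNode`, `value` and `lazy_source` are hypothetical names, and each node simply stores a zero-argument closure.

```julia
# Hypothetical sketch, NOT MLJ's source: overloading Base operators so that
# combining nodes builds a new, still-unevaluated node.

struct LazyNode
    f::Function                  # zero-argument closure producing the value
end

value(n::LazyNode) = n.f()
lazy_source(data) = LazyNode(() -> data)

# Arithmetic, column concatenation and row selection, all returning new nodes.
Base.:+(a::LazyNode, b::LazyNode) = LazyNode(() -> value(a) .+ value(b))
Base.:*(c::Real, a::LazyNode) = LazyNode(() -> c .* value(a))
Base.hcat(a::LazyNode, b::LazyNode) = LazyNode(() -> hcat(value(a), value(b)))
Base.getindex(a::LazyNode, I...) = LazyNode(() -> value(a)[I...])

x = lazy_source([1.0, 2.0, 3.0])
z = lazy_source([10.0, 20.0, 30.0])

combined = 2 * x + z             # "write the math": nothing computed yet
both = hcat(x, z)[2:3, :]        # concatenate columns, then select rows 2:3

value(combined)                  # [12.0, 24.0, 36.0]
value(both)                      # [2.0 20.0; 3.0 30.0]
```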

Invitation to the community

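We now invite the community to try out our newly registered MLJ package and to send us any feedback or suggestions you may have going forward. We are also particularly interested in hearing how you would use our package, and what features it may be lacking.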

May 2, 2019 | Anthony Blaom, Diego Arenas, Franz Kiraly, Yiannis Simillides, Sebastian Vollmer

Introduction

MLJ features

Learning networks

Building a simple network

Exporting and retraining

Just "write the math!"

Invitation to the community