autoML训练custom模型,支持label。ML ENGINE - Managed tensorflow service.

inference 预测 overfitting 过度学习

The Iris classification problem is an example of supervised machine learning: the model is trained from examples that contain labels. In unsupervised machine learning, the examples don’t contain labels. Instead, the model typically finds patterns among the features.

训练数据越大模型越复杂。 hyperparameter预设参数。parameter is learned. While it’s helpful to print out the model’s training progress, it’s often more helpful to see this progress. TensorBoard is a nice visualization tool that is packaged with TensorFlow, but we can create basic charts using the mathplotlib module.

A regression model predicts continuous values. A classification model predicts discrete values. empirical risk minimization squared loss L2 loss Mean square error (MSE) stochastic gradient descent minibatch gradient descent Categorical Data textual data Numerical Data periods granularity of reporting

The fundamental tension of machine learning is between fitting our data well, but also fitting the data as simply as possible.

The more often we evaluate on a given test set, the more we are at risk for implicitly overfitting to that one test set. We’ll look at a better protocol next.

training data validation data test data

one-hot encoding

feature should appear more than a small handful of times, clear obvious meaning shouldn’t change over time should not have crazy outlier value. A sparse representation for street name. feature should be meaningful data.

A feature cross is a synthetic feature that encodes nonlinearity in the feature space by multiplying two or more input features together. feature like latitude and longitude should be binned.

penalise model complexity

L2 Regularization quantify complexity weight normal distribution to zero

Logistic Regression sigmoid function true negative false positive class-imbalanced data set Precision and Recall Non-Linear Transformation Layer (a.k.a. Activation Function) Dropout Regularization softmax