This article explains how to configure TensorFlow Serving and run it in Kubernetes, working through the problems people commonly hit in practice along the way.
About TensorFlow Serving
Below is the architecture diagram of TensorFlow Serving:
For more of the basic concepts behind TensorFlow Serving, read the official documentation; no matter how good a translation is, it will never read as well as the original.
Here is a summary of the points I consider most important:
TensorFlow Serving uses a Model Version Policy to serve multiple versions of multiple models at the same time;
By default, only the latest version of a model is loaded;
Models are automatically discovered and loaded from the file system;
Request-handling latency is low;
It is stateless and scales horizontally;
Different versions of a model can be A/B tested;
TensorFlow models can be scanned and loaded from the local file system;
TensorFlow models can also be scanned and loaded from HDFS;
A gRPC interface is provided for client calls;
TensorFlow Serving Configuration
After combing through the entire official TensorFlow Serving documentation, I still could not find a complete description of how to write a model config, which was frustrating. It can't be helped: the project moves fast and the docs lag behind, so the only option was to read the code.
In the main method of model_servers, the complete set of tensorflow_model_server options and their descriptions looks like this:
tensorflow_serving/model_servers/main.cc#L314
int main(int argc, char** argv) {
  // (variable declarations and defaults elided)
  std::vector<tensorflow::Flag> flag_list = {
      tensorflow::Flag("port", &port, "port to listen on"),
      tensorflow::Flag("enable_batching", &enable_batching, "enable batching"),
      tensorflow::Flag("batching_parameters_file", &batching_parameters_file,
                       "If non-empty, read an ascii BatchingParameters "
                       "protobuf from the supplied file name and use the "
                       "contained values instead of the defaults."),
      tensorflow::Flag("model_config_file", &model_config_file,
                       "If non-empty, read an ascii ModelServerConfig "
                       "protobuf from the supplied file name, and serve the "
                       "models in that file. This config file can be used to "
                       "specify multiple models to serve and other advanced "
                       "parameters including non-default version policy. (If "
                       "used, --model_name, --model_base_path are ignored.)"),
      tensorflow::Flag("model_name", &model_name,
                       "name of model (ignored "
                       "if --model_config_file flag is set)"),
      tensorflow::Flag("model_base_path", &model_base_path,
                       "path to export (ignored if --model_config_file flag "
                       "is set, otherwise required)"),
      tensorflow::Flag("file_system_poll_wait_seconds",
                       &file_system_poll_wait_seconds,
                       "interval in seconds between each poll of the file "
                       "system for new model version"),
      tensorflow::Flag("tensorflow_session_parallelism",
                       &tensorflow_session_parallelism,
                       "Number of threads to use for running a "
                       "Tensorflow session. Auto-configured by default. "
                       "Note that this option is ignored if "
                       "--platform_config_file is non-empty."),
      tensorflow::Flag("platform_config_file", &platform_config_file,
                       "If non-empty, read an ascii PlatformConfigMap protobuf "
                       "from the supplied file name, and use that platform "
                       "config instead of the Tensorflow platform. (If used, "
                       "--enable_batching is ignored.)")};
  // (flag parsing and server startup elided)
}
So all of the model version configuration lives in --model_config_file. The complete structure of a model config is:
tensorflow_serving/config/model_server_config.proto#L55
// Common configuration for loading a model being served.
message ModelConfig {
  // Name of the model.
  string name = 1;

  // Base path to the model, excluding the version directory.
  // E.g. for a model at /foo/bar/my_model/123, where 123 is the version, the
  // base path is /foo/bar/my_model.
  //
  // (This can be changed once a model is in serving, *if* the underlying data
  // remains the same. Otherwise there are no guarantees about whether the old
  // or new data will be used for model versions currently loaded.)
  string base_path = 2;

  // Type of model.
  // TODO(b/31336131): DEPRECATED. Please use model_platform instead.
  ModelType model_type = 3 [deprecated = true];

  // Type of model (e.g. tensorflow).
  //
  // (This cannot be changed once a model is in serving.)
  string model_platform = 4;

  reserved 5;

  // Version policy for the model indicating how many versions of the model to
  // be served at the same time.
  // The default option is to serve only the latest version of the model.
  //
  // (This can be changed once a model is in serving.)
  FileSystemStoragePathSourceConfig.ServableVersionPolicy model_version_policy = 7;

  // Configures logging requests and responses, to the model.
  //
  // (This can be changed once a model is in serving.)
  LoggingConfig logging_config = 6;
}
There it is: model_version_policy is exactly the configuration we were looking for. It is defined as follows:
tensorflow_serving/sources/storage_path/file_system_storage_path_source.proto
message ServableVersionPolicy {
  // Serve the latest versions (i.e. the ones with the highest version
  // numbers), among those found on disk.
  //
  // This is the default policy, with the default number of versions as 1.
  message Latest {
    // Number of latest versions to serve. (The default is 1.)
    uint32 num_versions = 1;
  }

  // Serve all versions found on disk.
  message All {
  }

  // Serve a specific version (or set of versions).
  //
  // This policy is useful for rolling back to a specific version, or for
  // canarying a specific version while still serving a separate stable
  // version.
  message Specific {
    // The version numbers to serve.
    repeated int64 versions = 1;
  }
}
So model_version_policy currently supports three options:
all: {} loads every model version found;
latest: {num_versions: n} loads only the latest n versions; this is the default;
specific: {versions: m} loads only the listed versions, which is typically used for testing;
So when starting with tensorflow_model_server --port=9000 --model_config_file=<config_file>, a complete model_config_file can be modeled on the following:
model_config_list: {
  config: {
    name: "mnist",
    base_path: "/tmp/monitored/mnist_model",
    model_platform: "tensorflow",
    model_version_policy: {
      all: {}
    }
  },
  config: {
    name: "inception",
    base_path: "/tmp/monitored/inception_model",
    model_platform: "tensorflow",
    model_version_policy: {
      latest: {
        num_versions: 2
      }
    }
  },
  config: {
    name: "mxnet",
    base_path: "/tmp/monitored/mxnet_model",
    model_platform: "tensorflow",
    model_version_policy: {
      specific: {
        versions: 1
      }
    }
  }
}
Building TensorFlow Serving
Building and installing TensorFlow Serving is already covered fairly clearly in the GitHub setup docs. Here I only want to stress one point, and it is a very important one, namely this passage from the docs:
Optimized build
It's possible to compile using some platform specific instruction sets (e.g. AVX) that can significantly improve performance. Wherever you see bazel build in the documentation, you can add the flags -c opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 (or some subset of these flags). For example:
bazel build -c opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 tensorflow_serving/...
Note: These instruction sets are not available on all machines, especially with older processors, so it may not work with all flags. You can try some subset of them, or revert to just the basic -c opt which is guaranteed to work on all machines.
This matters a great deal. Initially we built without the corresponding copt options, and testing showed that the resulting tensorflow_model_server performed poorly (at least it could not meet our requirements): under concurrent client requests, latency was high, with essentially every request taking longer than 100ms. After rebuilding with these copt options and running the same concurrency test against the same model, 99.987% of request latencies came in under 50ms. The contrast is stark.
For what --copt=-O2 versus --copt=-O3 means and which to use, see the gcc optimizer documentation; it is not discussed here (because I don't really understand it either...).
So, should you just build with exactly the same copt options the official docs show? No! The right set depends on the CPU of the server that will run TensorFlow Serving; checking /proc/cpuinfo tells you which copt flags to build with:
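As a minimal sketch (Linux only; the instruction-set names are taken from the bazel flags quoted above), the following reads /proc/cpuinfo and reports which of those instruction sets the CPU actually supports:

# Report which optimized-build instruction sets this CPU supports.
wanted = ["sse4_1", "sse4_2", "avx", "avx2", "fma"]  # from the bazel flags above

flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())

for inst in wanted:
    print(inst, "yes" if inst in flags else "no")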
Usage Notes
Since TensorFlow Serving can serve multiple versions of multiple models at once, clients should specify both the model and the version they want in each gRPC call: each version corresponds to a different model and may return very different predictions.
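As an illustration, here is a minimal client sketch (assuming the tensorflow-serving-api Python package; the model name "mnist", input tensor name "images", input shape, and server address are all hypothetical) that pins both the model name and the version in the PredictRequest:

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8900")  # hypothetical address
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "mnist"       # which model to call
request.model_spec.version.value = 2    # pin the version; omit this line to get the latest
request.inputs["images"].CopyFrom(      # "images" must match the model's signature
    tf.make_tensor_proto(np.zeros((1, 784), dtype=np.float32)))

response = stub.Predict(request, 10.0)  # 10-second deadline
print(response.outputs)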
When copying a trained model into the model base path, pack it into a tar file first, copy the tarball over, and unpack it after it arrives. Models are large and copying takes time, so the exported model files may land before the corresponding meta file does; if TensorFlow Serving starts loading the model at that moment and cannot find the meta file, the load fails, and the server stops retrying that version.
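A sketch of that workflow (the paths and version number are hypothetical), packing the version directory into one tarball so the poller never sees a half-copied directory:

import os
import shutil
import tarfile

export_dir = "/tmp/export/00000123"       # freshly exported version directory
base_path = "/tmp/monitored/mnist_model"  # the model base path being polled

# 1. Pack the whole version directory into a single local tarball.
with tarfile.open("/tmp/00000123.tar", "w") as tar:
    tar.add(export_dir, arcname="00000123")

# 2. Copy the one tarball into the base path; a single file can never be
#    mistaken for a version directory with a missing meta file.
shutil.copy("/tmp/00000123.tar", base_path)

# 3. Unpack in place (local and fast compared to a network copy), then clean up.
with tarfile.open(os.path.join(base_path, "00000123.tar")) as tar:
    tar.extractall(base_path)
os.remove(os.path.join(base_path, "00000123.tar"))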
If you are using protobuf version 3.2.0 or earlier, be aware that TensorFlow Serving can only load models up to 64MB in size. You can check the protobuf version with pip list | grep proto. My environment uses 3.5.0.post1 and does not have this problem, but keep it in mind. See issue 582 for more.
Officially, model_config_list can be changed dynamically through a gRPC interface, but in practice you need to develop a custom resource for that, meaning it is not usable out of the box. Keep an eye on issue 380.
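If you would rather check from the interpreter that will actually make the calls (pip and the runtime can disagree across environments), a one-line equivalent:

import google.protobuf
print(google.protobuf.__version__)  # per the issue above, versions past 3.2.0 avoid the 64MB limit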
TensorFlow Serving on Kubernetes
Deploy TensorFlow Serving to Kubernetes as a Deployment; the corresponding Deployment yaml is below:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tensorflow-serving
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: tensorflow-serving
    spec:
      restartPolicy: Always
      imagePullSecrets:
      - name: harborsecret
      containers:
      - name: tensorflow-serving
        image: registry.vivo.xyz:4443/bigdata_release/tensorflow_serving1.3.0:v0.5
        command: ["/bin/sh", "-c", "export CLASSPATH=.:/usr/lib/jvm/java-1.8.0/lib/tools.jar:$(/usr/lib/hadoop-2.6.1/bin/hadoop classpath --glob); /root/tensorflow_model_server --port=8900 --model_name=test_model --model_base_path=hdfs://xx.xx.xx.xx:zz/data/serving_model"]
        ports:
        - containerPort: 8900
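Once the Pod is up, a quick smoke-test sketch (the in-cluster address is hypothetical and assumes a Service in front of the Deployment) can confirm that test_model is actually being served by asking for its metadata:

import grpc
from tensorflow_serving.apis import get_model_metadata_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel("tensorflow-serving.default.svc:8900")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = get_model_metadata_pb2.GetModelMetadataRequest()
request.model_spec.name = "test_model"          # matches --model_name above
request.metadata_field.append("signature_def")  # the only metadata field served
print(stub.GetModelMetadata(request, 10.0))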
That wraps up how to configure TensorFlow Serving in Kubernetes. Thanks for reading!