Dive into TensorFlow Series (3): Unveiling the Mystery of Tensor - 六虎

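A TensorFlow computation graph is made up of ops and tensors. So what do tensors usually represent? A model's input data, the network weights, and the outputs that ops produce from their inputs all need to be expressed as tensors or special tensors. Because tensors are so central to TensorFlow's architecture, this article walks through three topics, from basic to advanced: tensors as users see them, tensors inside the TensorFlow system, and an advanced use of tensors, DLPack (cross-framework programming, e.g. TensorFlow + PyTorch).

Note: this article is based on TensorFlow v1.15.5.

I. Tensor Through a Beginner's Eyes

1.1 Tensor HelloWorld

Define two tensors and add them together. The code is as follows: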

# segment 1
import tensorflow as tf
a = tf.constant(3.0, dtype=tf.float32)
b = tf.constant(4.0) # also tf.float32 implicitly
total = a + b
print(a)
print(b)
print(total)
### Output of the three print calls:
"""
Tensor("Const:0", shape=(), dtype=float32)
Tensor("Const_1:0", shape=(), dtype=float32)
Tensor("add:0", shape=(), dtype=float32)
"""
# Note: at this point the Tensors hold no actual values. The code above only builds the computation graph; each Tensor merely stands for the result of running an op (and no op has run yet).

To see the final value of total, create a Session object and run the computation graph. The code is as follows (added on top of segment 1):

with tf.Session() as sess:
    result = sess.run(total)
    print(result, type(result), type(total))
# Output: 7.0 <class 'numpy.float32'> <class 'tensorflow.python.framework.ops.Tensor'>

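As this shows, a Tensor is a representation of a result that has not been computed yet. Creating a Session object and running the computation graph yields the value 7.0 for total, and the result's type is now numpy.float32 rather than Tensor. Finally, note that the Tensor printed in this subsection is tf.Tensor, which is implemented by tensorflow.python.framework.ops.Tensor.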
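The build-first, run-later behaviour described above can be imitated in a few lines of plain Python. This is only a toy sketch: Node, constant, and run are hypothetical names invented for illustration and are not TensorFlow classes or functions. It shows that a symbolic node carries a recipe rather than a value until something plays the role of sess.run().

```python
# A toy stand-in for graph-mode execution; these are NOT TensorFlow classes.
class Node:
    """A symbolic value: it remembers how to compute itself but holds no result."""

    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = tuple(inputs)

    def __add__(self, other):
        # Adding two nodes only records a new "add" node; nothing is computed.
        return Node("add", lambda x, y: x + y, (self, other))

    def __repr__(self):
        return 'Node("%s:0")' % self.name


def constant(value, name="Const"):
    """Create a node with no inputs that yields a fixed value when run."""
    return Node(name, lambda: value)


def run(node):
    """Plays the role of sess.run(): evaluate the inputs first, then the node."""
    args = [run(i) for i in node.inputs]
    return node.fn(*args)


a = constant(3.0)
b = constant(4.0, name="Const_1")
total = a + b
print(total)       # a symbolic node, no value yet
print(run(total))  # the value appears only when the graph is run
```

The real system differs in many ways (graphs, devices, sessions), but the separation between describing a computation and executing it is the same idea.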

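1.2 Tensor Attributes and Special Tensors

From the user's point of view, tf.Tensor has three main attributes: name, dtype, and shape. Three more attributes matter but are used less often or are not directly visible: op, graph, and device. The op attribute records the operation that produces this Tensor, the graph attribute records the computation graph that contains it, and the device attribute records the name of the device that produces it.

TensorFlow has four kinds of special tensors (we do not strictly distinguish here between a Tensor and the op that produces it):

• tf.Variable: a tensor whose contents can change, typically used for model weights.

• tf.constant: a tensor whose contents cannot change; this API defines ordinary constant tensors.

• tf.placeholder: a placeholder tensor that describes the input specification of a static graph. A static graph is compiled first and executed later, so the input specification must be known when the graph is defined.

• tf.SparseTensor: a tensor structure tailored to sparse data.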
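To make the last item more concrete: tf.SparseTensor describes a tensor with three pieces, indices, values, and dense_shape. The sketch below uses plain Python lists instead of TensorFlow, and sparse_to_dense is a hypothetical helper written only for this illustration, restricted to the 2-D case.

```python
def sparse_to_dense(indices, values, dense_shape, default=0):
    """Expand a 2-D sparse description (indices, values, dense_shape) into nested lists."""
    rows, cols = dense_shape
    dense = [[default] * cols for _ in range(rows)]
    for (r, c), v in zip(indices, values):
        dense[r][c] = v
    return dense


# Only two of the twelve entries are non-zero, so only those two are stored.
indices = [(0, 0), (1, 2)]
values = [1, 2]
dense_shape = (3, 4)

dense = sparse_to_dense(indices, values, dense_shape)
for row in dense:
    print(row)
```

Storing only the non-default entries is what makes this layout attractive when most elements are zero.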

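1.3 The Relationship Between Tensor and op

We have said several times that a Tensor can be the input of an op, and that the op's processing produces a new Tensor as output. To understand this better, let's look again at the code from segment 1 (pay attention to how the Tensors are named):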

# segment 1
a = tf.constant(3.0, dtype=tf.float32)
b = tf.constant(4.0) # also tf.float32 implicitly
total = a + b
print(a)
print(b)
print(total)
### Output of the three print calls:
"""
Tensor("Const:0", shape=(), dtype=float32)
Tensor("Const_1:0", shape=(), dtype=float32)
Tensor("add:0", shape=(), dtype=float32)
"""
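# Note: at this point the Tensors hold no actual values. The code above only builds the computation graph; each Tensor merely stands for the result of running an op (and no op has run yet).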

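Given this code, we first identify which items are Tensors and which are ops, and then describe how each operation is carried out. To answer the first question, consider this comment from the TensorFlow source: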

"""
`tf.constant` creates a `Const` node in the computation graph with the
exact value at graph construction time.
"""

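So the code in segment 1 involves two kinds of op, Const and add: the former appears twice and the latter once. Segment 1 therefore adds three ops to the computation graph in sequence, which also answers the second question, namely what each operation does. In detail:

### Output of the three print calls (a, b, total):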
"""
Tensor("Const:0", shape=(), dtype=float32)
Tensor("Const_1:0", shape=(), dtype=float32)
Tensor("add:0", shape=(), dtype=float32)
"""
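# Add the first op (Const) to the graph. Its input is a scalar and its output is Tensor a, whose name has two parts: the op name, and the index of a among the op's outputs.
# Add the second op (Const_1, because op names must be unique). Its input is a scalar and its output is Tensor b, named by the same rule.
# Add the third op (add). Its inputs are Tensors a and b and its output is Tensor total, named by the same rule.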
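The naming rule just described ("op name:output index", with op names made unique by a numeric suffix) can be sketched in plain Python. NameScope, tensor_name, and parse_tensor_name are hypothetical helpers written for this illustration; they only reproduce the pattern visible in the printed output and are not TensorFlow's implementation.

```python
class NameScope:
    """Hands out unique op names following the observed pattern: Const, Const_1, ..."""

    def __init__(self):
        self._counts = {}

    def unique_name(self, base):
        n = self._counts.get(base, 0)
        self._counts[base] = n + 1
        return base if n == 0 else "%s_%d" % (base, n)


def tensor_name(op_name, output_index=0):
    """A tensor is named after the op that produces it plus its output slot."""
    return "%s:%d" % (op_name, output_index)


def parse_tensor_name(name):
    """Split 'Const_1:0' back into ('Const_1', 0)."""
    op_name, _, index = name.rpartition(":")
    return op_name, int(index)


scope = NameScope()
names = [tensor_name(scope.unique_name(op)) for op in ("Const", "Const", "add")]
print(names)
print(parse_tensor_name("Const_1:0"))
```

The three generated names match the three print results in segment 1.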

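II. A Closer Look at Tensor

2.1 Mapping Tensors Between Frontend and Backend

The TensorFlow white paper [7] describes the C API as the bridge between frontend user code and the backend execution engine. To understand this properly, readers are advised to build the source code themselves by following the TensorFlow website. TensorFlow v1.15.5 is built with Bazel, and the Python frontend talks to the C++ backend through SWIG. Before the main build, a SWIG code-generation step parses tensorflow.i and generates two wrapper files: pywrap_tensorflow_internal.py and pywrap_tensorflow_internal.cc. The former serves calls from the Python frontend, and the latter calls into the backend C API. After installing the official TensorFlow binary package you will only find the py file, not the cc file. If you build TensorFlow from source, both files can be found under bazel-bin in the project root, as shown in the figure below: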

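The so file in the red box of the figure above is compiled from the cc file, and the py module in the yellow box loads that shared library automatically the first time it is imported. The cc file behind the so statically registers a function table that maps Python functions to C functions. The table looks roughly like this: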

static PyMethodDef SwigMethods[] = {
          { (char *)"SWIG_PyInstanceMethod_New", (PyCFunction)SWIG_PyInstanceMethod_New, METH_O, NULL},
          { (char *)"TF_OK_swigconstant", TF_OK_swigconstant, METH_VARARGS, NULL},
          { (char *)"TF_CANCELLED_swigconstant", TF_CANCELLED_swigconstant, METH_VARARGS, NULL},
          { (char *)"TF_UNKNOWN_swigconstant", TF_UNKNOWN_swigconstant, METH_VARARGS, NULL},
          { (char *)"TF_INVALID_ARGUMENT_swigconstant", TF_INVALID_ARGUMENT_swigconstant, METH_VARARGS, NULL},
          // (much code omitted here)
};

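Without hands-on practice, the text above can be hard to follow. To make it easier, we summarize it in the following diagram:

Some curious readers may say this is too high-level: it sort of makes sense, but not quite. No problem. Next we take session.run(), the execution interface for static graphs, as an example and trace the frontend-to-backend mapping through the TensorFlow source. The details are shown in the figure below:

The figure above shows clearly that the C API layer separates the frontend from the backend. The C API layer consists of pywrap_tensorflow_internal.h/cc, tf_session_helper.h/cc, and c_api.h/cc. That completes the path of session.run() from frontend to backend. Next, how does a frontend tensor map to a backend Tensor? See the following code: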

// tf_session_helper.cc    line351
void TF_SessionRun_wrapper_helper(TF_Session* session, const char* handle,
                                  const TF_Buffer* run_options,
                                  const std::vector<TF_Output>& inputs,
                                  const std::vector<PyObject*>& input_ndarrays,
                                  const std::vector<TF_Output>& outputs,
                                  const std::vector<TF_Operation*>& targets,
                                  TF_Buffer* run_metadata,
                                  TF_Status* out_status,
                                  std::vector<PyObject*>* py_outputs) {
  DCHECK_EQ(inputs.size(), input_ndarrays.size());
  DCHECK(py_outputs != nullptr);
  DCHECK(py_outputs->empty());
  Status s;
  // Convert input ndarray PyObjects to TF_Tensors. We maintain a continuous
  // array of TF_Tensor*s as well as scoped containers to make sure they're
  // cleaned up properly.
  // Much code omitted. Here the frontend's ndarray-like objects are converted to TF_Tensors.
}
// c_api.cc  line2274
void TF_SessionRun(TF_Session* session, const TF_Buffer* run_options,
                   const TF_Output* inputs, TF_Tensor* const* input_values,
                   int ninputs, const TF_Output* outputs,
                   TF_Tensor** output_values, int noutputs,
                   const TF_Operation* const* target_opers, int ntargets,
                   TF_Buffer* run_metadata, TF_Status* status) {
  // TODO(josh11b,mrry): Change Session to be able to use a Graph*
  // directly, instead of requiring us to serialize to a GraphDef and
  // call Session::Extend().
  if (session->extend_before_run &&
      !ExtendSessionGraphHelper(session, status)) {
    return;
  }
  TF_Run_Setup(noutputs, output_values, status);
  // Convert from TF_Output and TF_Tensor to a string and Tensor.
  // Look here: this is where TensorFlow converts each TF_Tensor to a C++ Tensor.
  std::vector<std::pair<string, Tensor>> input_pairs(ninputs);
  if (!TF_Run_Inputs(input_values, &input_pairs, status)) return;
  for (int i = 0; i < ninputs; ++i) {
    input_pairs[i].first = OutputName(inputs[i]);
  }
  // Convert from TF_Output to string names.
  std::vector<string> output_names(noutputs);
  for (int i = 0; i < noutputs; ++i) {
    output_names[i] = OutputName(outputs[i]);
  }
}

2.2 The C++ Tensor Class

Reference 5 contains the definition of the C++ Tensor class. Its key parts (seg1) are:

class Tensor{
  public:
    // Tensor serialization/deserialization, covered in detail in Section 2.3
    bool FromProto(const TensorProto& other) TF_MUST_USE_RESULT;
    void AsProtoField(TensorProto* proto) const;
    void AsProtoTensorContent(TensorProto* proto) const;
    // A Tensor is really a view over the underlying data, which can be exposed as a vec or a matrix
    template <typename T>
    typename TTypes<T>::Vec vec() {
      return tensor<T, 1>();
    }
    template <typename T>
    typename TTypes<T>::Matrix matrix() {
      return tensor<T, 2>();
    }
    template <typename T, size_t NDIMS>
    typename TTypes<T, NDIMS>::Tensor tensor();
  private:
    TensorShape shape_;    // Maintains the Tensor's shape and data type
    TensorBuffer* buf_;    // Pointer to the underlying data buffer
};

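Let's analyze the two private members first. Start with the TensorBuffer class: it is an abstract class that inherits from a reference-counted base class and contains no implementation. From reference 6 we learn that BufferBase inherits from TensorBuffer and maintains a pointer to a memory allocator, and that Buffer inherits from BufferBase and maintains data_, a pointer to the actual data, and elem_, the number of elements. The inheritance relationships are shown in the figure below (member definitions are included for readability, so it is not standard UML):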

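Next we analyze the TensorShape class. It has its own inheritance hierarchy, with the core logic defined in the parent class TensorShapeRep. The hierarchy is shown in the figure below:

To understand what TensorShape does, we analyze part of the TensorShapeRep code (seg2):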

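class TensorShapeRep{
  private:
    // The 16-byte buf below represents a TensorShape. The first 12 bytes store the shape (as Rep16, Rep32, or Rep64).
    // The purpose of byte 13 is unclear. Bytes 14, 15, and 16 hold the data type enum, the number of dimensions, and the tag that identifies which representation is in use.
    union {
      uint8 buf[16];
      Rep64* unused_aligner;   // Force data to be aligned enough for a pointer.
    } u_;
  public:
    // Tensors can in principle have any number of dimensions, but 1-, 2-, and 3-dimensional tensors are the most common, so three representations are provided (12 bytes each).
    struct Rep16 {
      uint16 dims_[6];    // Up to 6 dimensions, each smaller than 2^16-1
    };
    struct Rep32 {
      uint32 dims_[3];    // Up to 3 dimensions, each smaller than 2^32-1
    };
    struct Rep64 {
      gtl::InlinedVector<int64, 4>* dims_;  // Any number of dimensions
    };
};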
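The choice among the three representations can be sketched as follows. This is an illustration in Python rather than the C++ logic itself: choose_rep and pack_inline are hypothetical helpers, and the per-dimension limits MAX_REP16 and MAX_REP32 are assumptions (the largest value of the integer type is taken to be reserved). The point is that both inline layouts fit exactly into the 12 bytes mentioned above.

```python
import struct

# Assumed per-dimension limits: the top value of each integer type is treated as reserved.
MAX_REP16 = 2**16 - 2
MAX_REP32 = 2**32 - 2


def choose_rep(dims):
    """Pick the most compact representation that can hold the given shape."""
    if len(dims) <= 6 and all(d <= MAX_REP16 for d in dims):
        return "Rep16"
    if len(dims) <= 3 and all(d <= MAX_REP32 for d in dims):
        return "Rep32"
    return "Rep64"


def pack_inline(dims):
    """Pack dims into the 12 inline bytes; return None when Rep64 (out-of-line) is needed."""
    rep = choose_rep(dims)
    if rep == "Rep16":
        padded = list(dims) + [0] * (6 - len(dims))
        return struct.pack("<6H", *padded)   # 6 x uint16 = 12 bytes
    if rep == "Rep32":
        padded = list(dims) + [0] * (3 - len(dims))
        return struct.pack("<3I", *padded)   # 3 x uint32 = 12 bytes
    return None


for shape in ([28, 28, 3], [100000, 16], [100000, 2, 2, 2]):
    print(shape, choose_rep(shape))
```

An image-sized shape fits in Rep16, a shape with one very long dimension falls back to Rep32, and anything with more than three dimensions plus a large dimension needs the heap-allocated Rep64.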

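To close this subsection, let's look at vec() and matrix() in the Tensor class definition. Both methods call the same method, tensor(), whose return type is TTypes<T, NDIMS>::Tensor, and TTypes is the key link between the TF Tensor and the Eigen library. See the following code (seg3):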

// tensorflow1.15.5\tensorflow\core\framework\tensor.h
class Tensor{
  public:
    // Returns the shape of the tensor.
    const TensorShape& shape() const { return shape_; }
    template <typename T>
    typename TTypes<T>::Vec vec() {
      return tensor<T, 1>();
    }
    template <typename T>
    typename TTypes<T>::Matrix matrix() {
      return tensor<T, 2>();
    }
    template <typename T, size_t NDIMS>
    typename TTypes<T, NDIMS>::Tensor tensor();
}
// tensorflow1.15.5\tensorflow\core\framework\tensor_types.h
template <typename T, int NDIMS = 1, typename IndexType = Eigen::DenseIndex>
struct TTypes {
  // Rank-<NDIMS> tensor of scalar type T.
  typedef Eigen::TensorMap<Eigen::Tensor<T, NDIMS, Eigen::RowMajor, IndexType>,Eigen::Aligned> Tensor;
  // (much code omitted)
};
// tensorflow1.15.5\tensorflow\core\framework\tensor.h
// shape() of a TF Tensor returns its TensorShape; base() returns a pointer to the actual data.
template <typename T, size_t NDIMS>
typename TTypes<T, NDIMS>::Tensor Tensor::tensor() {
  CheckTypeAndIsAligned(DataTypeToEnum<T>::v());
  return typename TTypes<T, NDIMS>::Tensor(base<T>(),
                                           shape().AsEigenDSizes<NDIMS>());
}

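As the code shows, calling tensor() converts a TF Tensor to a TTypes<T, NDIMS>::Tensor, which is essentially an Eigen::TensorMap. This clarifies the relationship between the TF Tensor and Eigen: the TF C++ Tensor can be regarded as a wrapper around Eigen::TensorMap, because the arguments to the Eigen::TensorMap constructor come from information stored in the TF Tensor (what base() and shape() return).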
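The idea of "a view built from a data pointer plus a shape" can be illustrated without Eigen. The sketch below is a conceptual analogue in Python, not the Eigen or TensorFlow API: TensorView is a hypothetical class that interprets one shared flat buffer in row-major order, the same order as the Eigen::RowMajor flag in TTypes.

```python
class TensorView:
    """Interprets a flat buffer as a row-major N-d array without copying it."""

    def __init__(self, data, shape):
        size = 1
        for d in shape:
            size *= d
        if size != len(data):
            raise ValueError("shape does not match buffer length")
        self.data = data            # shared with the caller, not copied
        self.shape = tuple(shape)

    def _offset(self, index):
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        offset = 0
        for i, d in zip(index, self.shape):
            if not 0 <= i < d:
                raise IndexError("index out of range")
            offset = offset * d + i   # row-major: the last index varies fastest
        return offset

    def __getitem__(self, index):
        return self.data[self._offset(index)]

    def __setitem__(self, index, value):
        self.data[self._offset(index)] = value


buf = list(range(6))
m = TensorView(buf, (2, 3))   # analogous to viewing the data as a matrix
v = TensorView(buf, (6,))     # analogous to viewing the same data as a vec

m[1, 0] = 42                  # write through the matrix view
print(v[(3,)])                # the vector view sees the same element
```

Both views read and write the same storage, which is why constructing such a view is cheap: only the shape interpretation changes.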

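2.3 C++ Tensor Serialization

Distributed training in TensorFlow involves a great deal of cross-machine communication, and what gets communicated is serialized tensors (send/recv ops work in pairs to do this). In this subsection we look at the Tensor serialization mechanism and at how to convert between a Tensor and its serialized form. The serialized counterpart of Tensor is TensorProto, which is generated from the corresponding proto file. The code is as follows (seg4):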

// tensorflow1.15.5\tensorflow\core\framework\tensor.proto
syntax = "proto3";
message TensorProto {
  DataType dtype = 1;
  TensorShapeProto tensor_shape = 2;
  int32 version_number = 3;
  bytes tensor_content = 4;
  repeated int32 half_val = 13 [packed = true];
  // DT_FLOAT.
  repeated float float_val = 5 [packed = true];
  // DT_DOUBLE.
  repeated double double_val = 6 [packed = true];
  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 7 [packed = true];
  // DT_STRING
  repeated bytes string_val = 8;
  // DT_COMPLEX64. scomplex_val(2*i) and scomplex_val(2*i+1) are real
  // and imaginary parts of i-th single precision complex.
  repeated float scomplex_val = 9 [packed = true];
  // DT_INT64
  repeated int64 int64_val = 10 [packed = true];
  // DT_BOOL
  repeated bool bool_val = 11 [packed = true];
  // DT_COMPLEX128. dcomplex_val(2*i) and dcomplex_val(2*i+1) are real
  // and imaginary parts of i-th double precision complex.
  repeated double dcomplex_val = 12 [packed = true];
  // DT_RESOURCE
  repeated ResourceHandleProto resource_handle_val = 14;
  // DT_VARIANT
  repeated VariantTensorDataProto variant_val = 15;
  // DT_UINT32
  repeated uint32 uint32_val = 16 [packed = true];
  // DT_UINT64
  repeated uint64 uint64_val = 17 [packed = true];
};

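We can compile tensor.proto with the protoc compiler, which generates two files, tensor.pb.h and tensor.pb.cc. They contain the declaration of the TensorProto class and the implementation of its member methods, respectively. Roughly speaking, TensorProto can be viewed as the binary form of a Tensor, and the two are converted into each other as follows (seg5):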

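// Serializing a Tensor (here `tensor` is a Tensor*)
auto tensor_proto = new TensorProto();
// Fills in `proto` with `*this` tensor's content.
// `AsProtoField()` fills in the repeated field for `proto.dtype()`, 
// while `AsProtoTensorContent()` encodes the content in `proto.tensor_content()` in a compact form.
// Either call alone is sufficient; both are shown for illustration.
tensor->AsProtoField(tensor_proto);
tensor->AsProtoTensorContent(tensor_proto);
// Deserializing a Tensor
Tensor restored;
// FromProto() takes a const TensorProto& and returns false on failure.
bool ok = restored.FromProto(*tensor_proto);
delete tensor_proto;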
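The round trip above can be mimicked with the standard library to show what "encoding the content in a compact form" means. This is a sketch under stated assumptions: the byte layout (rank, then dims, then raw float32 payload) is invented for illustration and is not the protobuf wire format of TensorProto, and serialize/deserialize are hypothetical helpers.

```python
import struct


def serialize(shape, values):
    """Encode float32 values as: rank (uint32), dims (int64 each), raw payload."""
    header = struct.pack("<I", len(shape)) + struct.pack("<%dq" % len(shape), *shape)
    content = struct.pack("<%df" % len(values), *values)  # plays the role of tensor_content
    return header + content


def deserialize(blob):
    """Inverse of serialize(): recover the shape and the flat list of values."""
    (rank,) = struct.unpack_from("<I", blob, 0)
    shape = struct.unpack_from("<%dq" % rank, blob, 4)
    count = 1
    for d in shape:
        count *= d
    values = struct.unpack_from("<%df" % count, blob, 4 + 8 * rank)
    return tuple(shape), list(values)


blob = serialize((2, 2), [1.0, 2.0, 3.0, 4.0])
shape, values = deserialize(blob)
print(len(blob), shape, values)
```

Packing all elements into one contiguous byte string is the same idea as the tensor_content field, as opposed to the typed repeated fields such as float_val.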

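III. Cross-Framework Programming: DLPack, a Common In-Memory Tensor

3.1 What Is DLPack

DLPack is an open in-memory tensor structure for sharing tensors between AI frameworks. Combining several frameworks to solve an AI problem lets each play to its strengths (some operations are better supported in one framework than another) and ultimately gives the best overall performance. One key problem has to be solved first: how do we pass an in-memory tensor from one framework to another without copying any data? Fortunately, Tianqi Chen's team provided the answer with DLPack.

DLPack is designed to be as lightweight as possible. It does not deal with memory allocation or device APIs and focuses only on the tensor data structure. It runs on multiple hardware platforms, and the frameworks that currently support it include NumPy, CuPy, PyTorch, Tensorflow, MXNet, TVM, and mpi4py. The DLPack developers do not intend to implement Tensor and Ops; instead DLPack serves as a common bridge for reusing tensors and operations across frameworks. Understanding DLPack in depth means mastering two parts: the C API and the Python API. The architecture of the DLPack C API is as follows: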

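The structs shown in dark blue in the figure above are all defined in [13]. DLTensor represents a plain C tensor object but does not manage memory. DLManagedTensor is also a C tensor object; it manages the memory of a DLTensor and is designed to help other frameworks borrow that DLTensor.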
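For readers who prefer code to diagrams, the two structs can be mirrored with Python's ctypes module. This is a sketch that assumes the field layout of a recent dlpack.h (older releases name the device field differently), so treat [13] as the authoritative definition. The device code 1 (kDLCPU) and type code 2 (kDLFloat) follow the enums in that header.

```python
import ctypes


class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int), ("device_id", ctypes.c_int)]


class DLDataType(ctypes.Structure):
    _fields_ = [("code", ctypes.c_uint8), ("bits", ctypes.c_uint8), ("lanes", ctypes.c_uint16)]


class DLTensor(ctypes.Structure):
    """Describes existing memory; it owns nothing."""
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("device", DLDevice),
        ("ndim", ctypes.c_int),
        ("dtype", DLDataType),
        ("shape", ctypes.POINTER(ctypes.c_int64)),
        ("strides", ctypes.POINTER(ctypes.c_int64)),  # NULL means compact row-major
        ("byte_offset", ctypes.c_uint64),
    ]


class DLManagedTensor(ctypes.Structure):
    """Adds ownership: a context pointer and a deleter callback."""


DLManagedTensorDeleter = ctypes.CFUNCTYPE(None, ctypes.POINTER(DLManagedTensor))
DLManagedTensor._fields_ = [
    ("dl_tensor", DLTensor),
    ("manager_ctx", ctypes.c_void_p),
    ("deleter", DLManagedTensorDeleter),
]

# Describe a 2x3 float32 array that lives in CPU memory owned by `data`.
data = (ctypes.c_float * 6)(*range(6))
shape = (ctypes.c_int64 * 2)(2, 3)

t = DLTensor()
t.data = ctypes.addressof(data)
t.device = DLDevice(1, 0)        # 1 = kDLCPU, device 0
t.ndim = 2
t.dtype = DLDataType(2, 32, 1)   # 2 = kDLFloat, 32 bits, 1 lane
t.shape = ctypes.cast(shape, ctypes.POINTER(ctypes.c_int64))

print(t.ndim, t.shape[0], t.shape[1], bool(t.strides))
```

Note that the descriptor holds no element data of its own: it only points at memory that something else (here the Python object data) must keep alive, which is exactly the gap DLManagedTensor fills with its deleter.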

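Next, we turn to the DLPack Python API. The DLPack Python interface is the standard data interchange API for Python arrays. It exchanges data through two interfaces:

• from_dlpack(x): takes an array object x that has a __dlpack__ method and uses that method to build a new array object containing x's data.

• __dlpack__(self, stream=None) and __dlpack_device__(): from_dlpack(x) calls these two methods of x internally, to obtain x's data and to determine which device the array x resides on, respectively.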

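Semantically, in y = from_dlpack(x), the library that created x is the producer and the library that provides from_dlpack() is the consumer. The producer provides a way to access x's data, and in general the data is shared between producer and consumer with zero copies, so y can be regarded as a view of x. Inside from_dlpack(x), the x.__dlpack__ method produces a PyCapsule object (or capsule) containing a DLManagedTensor, and this object can be consumed only once. The producer must name the PyCapsule "dltensor" so that it can be retrieved by name, and must also give it a PyCapsule_Destructor that calls the DLManagedTensor's deleter; that destructor takes effect when a capsule still named "dltensor" is no longer needed. The consumer takes ownership of the DLManagedTensor away from the capsule by renaming the capsule to "used_dltensor", which ensures the PyCapsule_Destructor will not call the deleter. Once ownership has moved to the consumer object, the consumer's own destructor can still call the DLManagedTensor's deleter.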
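The ownership hand-off is easier to follow as code. The sketch below models the protocol in plain Python: Capsule and Consumer are hypothetical classes standing in for a PyCapsule and a consuming framework's array, and a list append stands in for the deleter. It is not how any real library implements from_dlpack(); it only replays the renaming rule described above.

```python
class Capsule:
    """Pure-Python stand-in for a PyCapsule that wraps a DLManagedTensor."""

    def __init__(self, payload, deleter):
        self.name = "dltensor"      # the name required of the producer
        self.payload = payload
        self.deleter = deleter

    def destroy(self):
        # Mirrors the capsule destructor: free the tensor only if nobody consumed it.
        if self.name == "dltensor":
            self.deleter(self.payload)


class Consumer:
    """Stand-in for the array object created by from_dlpack()."""

    def __init__(self, capsule):
        if capsule.name != "dltensor":
            raise ValueError("capsule was already consumed")
        capsule.name = "used_dltensor"   # take ownership away from the capsule
        self.payload = capsule.payload
        self._deleter = capsule.deleter

    def release(self):
        # The consumer now owns the tensor, so it is the one that calls the deleter.
        self._deleter(self.payload)


freed = []                               # records what the deleter was called on
cap = Capsule([1, 2, 3], freed.append)
y = Consumer(cap)

cap.destroy()                            # no effect: the capsule no longer owns the data
print(freed)

try:
    Consumer(cap)                        # a capsule can be consumed only once
except ValueError as err:
    print(err)

y.release()
print(freed)
```

The rename acts as a one-bit ownership flag, which guarantees the deleter runs exactly once regardless of whether the capsule or the consumer is destroyed first.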

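3.2 DLPack in TensorFlow

As far as the author could find, TensorFlow supports DLPack starting from v2.2.0; earlier versions have no dlpack library. TensorFlow's dlpack interface follows the same semantics described in 3.1. The API can be exercised as follows: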

import tensorflow as tf
x = tf.constant(5)
x                     # <tf.Tensor: shape=(), dtype=int32, numpy=5>
r = tf.experimental.dlpack.to_dlpack(x)
print(r, type(r))     # <capsule object "dltensor" at 0x7f55a0431c30> <class 'PyCapsule'>
x_other = tf.experimental.dlpack.from_dlpack(r)
x_other               # <tf.Tensor: shape=(), dtype=int32, numpy=5>

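3.3 TVM and DLPack

If you want to build a deep learning compiler that works across AI frameworks, DLPack is a workable approach (it is the route TVM takes). For example, we can declare and compile a matrix multiplication operator in TVM, then build a wrapper based on the DLPack representation that lets this operator accept PyTorch Tensors. MxNet can be handled in a similar way. The figure below shows how DLPack provides an intermediate wrapper shared between AI frameworks and TVM:

The following code illustrates the idea: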

# Setup: compute the matrix multiplication in PyTorch
import numpy as np
import torch
import tvm
x = torch.rand(56,56)
y = torch.rand(56,56)
z = x.mm(y)
# Step 1: define and build a TVM matrix multiplication operator
n = tvm.convert(56)
X = tvm.placeholder((n,n), name='X')
Y = tvm.placeholder((n,n), name='Y')
k = tvm.reduce_axis((0, n), name='k')
Z = tvm.compute((n,n), lambda i,j : tvm.sum(X[i,k]*Y[k,j], axis=k))
s = tvm.create_schedule(Z.op)
fmm = tvm.build(s, [X, Y, Z], target_host='llvm', name='fmm')
# Step 2: wrap the TVM function so that it accepts PyTorch Tensors, then verify the result
from tvm.contrib.dlpack import to_pytorch_func
# fmm is the previously built TVM function (Python function)
# fmm_pytorch is the wrapped TVM function (Python function)
fmm_pytorch = to_pytorch_func(fmm)
z2 = torch.empty(56,56)
fmm_pytorch(x, y, z2)
np.testing.assert_allclose(z.numpy(), z2.numpy())
# Step 3: wrap it for MxNet in the same way as in step 2
import mxnet
from tvm.contrib.mxnet import to_mxnet_func
ctx = mxnet.cpu(0)
x = mxnet.nd.uniform(shape=(56,56), ctx=ctx)
y = mxnet.nd.uniform(shape=(56,56), ctx=ctx)
z = mxnet.nd.empty(shape=(56,56), ctx=ctx)
f = tvm.build(s, [X, Y, Z], target_host='llvm', name='f')
f_mxnet = to_mxnet_func(f)
f_mxnet(x, y, z)
np.testing.assert_allclose(z.asnumpy(), x.asnumpy().dot(y.asnumpy()))
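# Step 4: how to_pytorch_func() is defined
# TVM provides functions to convert between dlpack tensors and TVM NDArrays. At the lowest level, TVM functions operate on TVM NDArrays.
# The wrapper roughly does: AI Tensor -> dlpack tensor -> TVM NDArray -> call TVM function
# (ndarray below refers to TVM's ndarray module)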
def convert_func(tvm_func, tensor_type, to_dlpack_func):
    assert callable(tvm_func)
    def _wrapper(*args):
        args = tuple(ndarray.from_dlpack(to_dlpack_func(arg))\
            if isinstance(arg, tensor_type) else arg for arg in args)
        return tvm_func(*args)
    return _wrapper
def to_pytorch_func(tvm_func):
    import torch
    import torch.utils.dlpack
    return convert_func(tvm_func, torch.Tensor, torch.utils.dlpack.to_dlpack)

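IV. Summary

This article covers a lot of demanding material, so reading it more than once is recommended. To summarize, it covered three topics:

• Part I explained Tensor through a beginner's eyes, focusing on Tensor attributes and the relationship between Tensor and op.

• Part II explained Tensor as system developers see it, focusing on the frontend-to-backend Tensor mapping, the C++ definition of Tensor, and its serialization.

• Part III explained DLPack, the common in-memory tensor, focusing on what DLPack is, how it is used in TensorFlow, and the role it plays in TVM.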

References

1. TensorFlow Introduction: github.com/tensorflow/…

2. TensorFlow Tensors: github.com/tensorflow/…

3. tf.constant source code: github.com/tensorflow/…

4. tensorflow源码解析之framework-tensor: www.cnblogs.com/jicanghai/p…

5. TensorFlow C++ Tensor source code: github.com/tensorflow/…

6. TensorFlow C++ Tensor source code: github.com/tensorflow/…

7. "TensorFlow: A System for Large-Scale Machine Learning": www.usenix.org/system/file…

8. tensorflow-internals.pdf: github.com/horance-liu…

9. DLPack doc: dmlc.github.io/dlpack/late…

10. DLPack github: github.com/dmlc/dlpack

11. DLPack C API: dmlc.github.io/dlpack/late…

12. Python Specification for DLPack: dmlc.github.io/dlpack/late…

13. dlpack.h: github.com/dmlc/dlpack…

14. Building a Cross-Framework Deep Learning Compiler via DLPack: tvm.apache.org/2018/08/10/…
