{"id":13454703,"url":"https://github.com/vahidk/EffectiveTensorflow","last_synced_at":"2025-03-24T06:31:17.974Z","repository":{"id":47441637,"uuid":"98965177","full_name":"vahidk/EffectiveTensorflow","owner":"vahidk","description":"TensorFlow tutorials and best practices.","archived":false,"fork":false,"pushed_at":"2020-10-22T05:26:17.000Z","size":151,"stargazers_count":8619,"open_issues_count":0,"forks_count":908,"subscribers_count":345,"default_branch":"master","last_synced_at":"2024-10-10T08:40:51.848Z","etag":null,"topics":["deep-learning","ebook","machine-learning","neural-network","tensorflow"],"latest_commit_sha":null,"homepage":"https://twitter.com/VahidK","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vahidk.png","metadata":{"files":{"readme":"README(chs).md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-08-01T06:00:56.000Z","updated_at":"2024-10-09T17:12:43.000Z","dependencies_parsed_at":"2022-08-23T23:01:33.273Z","dependency_job_id":null,"html_url":"https://github.com/vahidk/EffectiveTensorflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vahidk%2FEffectiveTensorflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vahidk%2FEffectiveTensorflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vahidk%2FEffectiveTensorflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vahidk%2FEffectiveTensorflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vahidk","download_url":"https://codeload.github.com/vahidk/EffectiveTensorflow/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221939339,"owners_count":16904954,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","ebook","machine-learning","neural-network","tensorflow"],"created_at":"2024-07-31T08:00:57.087Z","updated_at":"2024-10-28T21:31:08.369Z","avatar_url":"https://github.com/vahidk.png","language":null,"readme":"# Effective TensorFlow 2 中文版\n\n目录\n=================\n## Part I: TensorFlow 2 基础\n1.  [TensorFlow 2 基础](#basics)\n2.  [广播](#broadcast)\n3.  [利用重载OPs](#overloaded_ops)\n4.  [控制流操作: 条件与循环](#control_flow)\n5.  [原型核和使用Python OPs可视化](#python_ops)\n6.  [TensorFlow中的数值稳定性](#stable)\n---\n\n_我们针对新发布的 TensorFlow 2.x API 更新了教程. 
_This tutorial has been updated for the newly released TensorFlow 2.x API. If you are looking for the TensorFlow 1.x version, see the [v1 branch](https://github.com/vahidk/EffectiveTensorflow/tree/v1)._

_To install TensorFlow 2.0 (alpha), follow the instructions on the [official website](https://www.tensorflow.org/install/pip):_
```
pip install tensorflow==2.0.0-alpha0
```

_We aim to gradually add new articles and keep the content in sync with the latest TensorFlow API. If you have any suggestions, please let us know._

# Part I: TensorFlow 2 Fundamentals
<a name="fundamentals"></a>

## TensorFlow Basics
<a name="basics"></a>
TensorFlow 2 was redesigned around a much easier-to-use API. If you are familiar with NumPy, you will feel right at home. Unlike TensorFlow 1, which was built entirely around static, symbolic computation graphs, TensorFlow 2 hides the graph machinery and behaves much like NumPy. Notably, although the way you interact with it has changed, TensorFlow 2 still benefits from the graph abstraction, so everything that was possible in TensorFlow 1 is still possible in TensorFlow 2.

Let's start with a simple example: multiplying two random matrices. First, here is how it looks in NumPy:
```python
import numpy as np

x = np.random.normal(size=[10, 10])
y = np.random.normal(size=[10, 10])
z = np.dot(x, y)

print(z)
```

Now the same thing in TensorFlow 2.0:
```python
import tensorflow as tf

x = tf.random.normal([10, 10])
y = tf.random.normal([10, 10])
z = tf.matmul(x, y)

print(z)
```
Just like NumPy, TensorFlow 2 executes the operation immediately and returns the result. The only difference is that TensorFlow stores the result as a tf.Tensor, which can easily be converted to a NumPy array by calling its tf.Tensor.numpy() method:

```python
print(z.numpy())
```

To appreciate the power of symbolic computation, let's look at another example. Suppose we have samples from a curve (say, f(x) = 5x^2 + 3) and want to estimate f(x) based on those samples. We define a parametric function g(x, w) = w0 x^2 + w1 x + w2, with input x and latent parameters w; our goal is to find the latent parameters such that g(x, w) ≈ f(x). We can do this by minimizing the loss L(w) = Σ (f(x) - g(x, w))^2. Although this problem has a closed-form solution, we prefer a general approach that applies to any differentiable function: stochastic gradient descent. We simply compute the average gradient of L(w) with respect to w over a batch of sample points and move in the opposite direction.

Here is how it can be done in TensorFlow:

```python
import numpy as np
import tensorflow as tf

# Assuming we know the underlying function is a polynomial of degree 2,
# we allocate a vector of size 3 and initialize it with random noise.
w = tf.Variable(tf.random.normal([3, 1]))

# Use the Adam optimizer with an initial learning rate of 0.1.
opt = tf.optimizers.Adam(0.1)

def model(x):
    # yhat is our estimate of y.
    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    yhat = tf.squeeze(tf.matmul(f, w), 1)
    return yhat

def compute_loss(y, yhat):
    # The loss is the L2 distance between y and yhat.
    # An L2 regularizer is added to keep w small.
    loss = tf.nn.l2_loss(yhat - y) + 0.1 * tf.nn.l2_loss(w)
    return loss

def generate_data():
    # Generate some training data based on the true function.
    x = np.random.uniform(-10.0, 10.0, size=100).astype(np.float32)
    y = 5 * np.square(x) + 3
    return x, y

def train_step():
    x, y = generate_data()

    def _loss_fn():
        yhat = model(x)
        loss = compute_loss(y, yhat)
        return loss

    opt.minimize(_loss_fn, [w])

for _ in range(1000):
    train_step()

print(w.numpy())
```
Running this code, you should see a result close to:
```python
[4.9924135, 0.00040895029, 3.4504161]
```
which is a relatively close approximation of the true parameters.
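opt.minimize hides the gradient computation behind the optimizer interface. If you want to see the gradient step explicitly, here is a minimal sketch of the same update written with tf.GradientTape (`train_step_explicit` is just an illustrative name, not part of the original code):

```python
def train_step_explicit():
    x, y = generate_data()
    with tf.GradientTape() as tape:
        loss = compute_loss(y, model(x))
    # Compute dL/dw and let the optimizer move w in the opposite direction.
    grads = tape.gradient(loss, [w])
    opt.apply_gradients(zip(grads, [w]))
```

Both versions perform the same parameter update; the explicit form is handy when you need to inspect or clip the gradients.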
Note that the code above runs eagerly (i.e. ops are executed immediately), which is not very efficient. TensorFlow 2.0 also provides a way to run computations as a static graph, enabling fast, parallel execution on GPUs and TPUs. Enabling it is as simple as decorating the training-step function with tf.function:

```python
@tf.function
def train_step():
    x, y = generate_data()

    def _loss_fn():
        yhat = model(x)
        loss = compute_loss(y, yhat)
        return loss

    opt.minimize(_loss_fn, [w])
```

What is really neat about tf.function is that it can also convert Python control flow, such as while and for loops, into graph ops. We will discuss this in more detail later.

This is just the tip of the iceberg of what TensorFlow can do. Complex neural networks with millions of parameters can be implemented in just a few lines of code, and TensorFlow can handle running them across multiple devices and threads.

## Broadcasting
<a name="broadcast"></a>
TensorFlow supports broadcasting for elementwise operations. Normally, when you want to add or multiply tensors, their shapes must match: for example, you cannot add a tensor of shape [3, 2] to a tensor of shape [3, 4]. The exception is when one of the tensors has a dimension of size 1; in that case TensorFlow implicitly tiles the tensor along that dimension so that the two shapes match (see NumPy's broadcasting rules for the same mechanism).

```python
import tensorflow as tf

a = tf.constant([[1., 2.], [3., 4.]])
b = tf.constant([[1.], [2.]])
# c = a + tf.tile(b, [1, 2])
c = a + b

print(c)
```

Broadcasting makes the code shorter and more memory efficient. A common use case is combining features of different lengths: to concatenate them, you typically tile one of them along a dimension before applying some nonlinear transformation, a pattern that shows up in many neural networks:

```python
a = tf.random.uniform([5, 3, 5])
b = tf.random.uniform([5, 1, 6])

# Concatenate a and b.
tiled_b = tf.tile(b, [1, 3, 1])
c = tf.concat([a, tiled_b], 2)
d = tf.keras.layers.Dense(10, activation=tf.nn.relu).apply(c)

print(d)
```

With broadcasting this becomes simpler. We use the fact that f(m(x + y)) is equivalent to f(mx + my): we apply the linear layers separately and let broadcasting do the concatenation implicitly:

```python
pa = tf.keras.layers.Dense(10).apply(a)
pb = tf.keras.layers.Dense(10).apply(b)
d = tf.nn.relu(pa + pb)

print(d)
```

In fact, the following snippet is fairly general and works whenever broadcasting between the two inputs is possible:

```python
def merge(a, b, units, activation=None):
    pa = tf.keras.layers.Dense(units).apply(a)
    pb = tf.keras.layers.Dense(units).apply(b)
    c = pa + pb
    if activation is not None:
        c = activation(c)
    return c
```

So far we have covered the good side of broadcasting. What is the bad side? Implicit broadcasting can make bugs hard to track down:

```python
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = tf.reduce_sum(a + b)

print(c)
```

What is the value of c? The answer is 12: because the tensors have different shapes, TensorFlow automatically broadcasts both to shape [2, 2] before reducing.

The way to avoid this problem is to be as explicit as possible, for example by specifying the dimension when reducing:

```python
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = tf.reduce_sum(a + b, 0)

print(c)
```

Here c is [5, 7], and the shape of the result makes it much easier to spot the problem. As a rule of thumb, always specify the dimension in reduction operations and when using tf.squeeze.

## Take advantage of overloaded operators
<a name="overloaded_ops"></a>
Just like NumPy, TensorFlow overloads a number of Python operators to make building graphs easier and the code more readable.

The slicing operator is a convenient way to index tensors:
```python
z = x[begin:end]  # z = tf.slice(x, [begin], [end-begin])
```
Be careful with slicing, though: it can be very inefficient. To see how inefficient, let's look at an example that reduce-sums a matrix along its first dimension by hand:

```python
import tensorflow as tf
import time

x = tf.random.uniform([500, 10])

z = tf.zeros([10])

start = time.time()
for i in range(500):
    z += x[i]
print("Took %f seconds." % (time.time() - start))
```
On my MacBook Pro this takes about 0.045 seconds, which is quite slow. The reason is that we are calling the slicing op 500 times. A faster alternative is to unstack the matrix into a list of row tensors first:
```python
z = tf.zeros([10])
for x_i in tf.unstack(x):
    z += x_i
```
This takes about 0.01 seconds. Of course, the right way to do this is to use tf.reduce_sum:
```python
z = tf.reduce_sum(x, axis=0)
```
This takes about 0.0001 seconds, roughly 100 times faster than the original approach.

TensorFlow also overloads a range of arithmetic and logical operators:
```python
z = -x  # z = tf.negative(x)
z = x + y  # z = tf.add(x, y)
z = x - y  # z = tf.subtract(x, y)
z = x * y  # z = tf.multiply(x, y)
z = x / y  # z = tf.divide(x, y)
z = x // y  # z = tf.math.floordiv(x, y)
z = x % y  # z = tf.math.mod(x, y)
z = x ** y  # z = tf.pow(x, y)
z = x @ y  # z = tf.matmul(x, y)
z = x > y  # z = tf.greater(x, y)
z = x >= y  # z = tf.greater_equal(x, y)
z = x < y  # z = tf.less(x, y)
z = x <= y  # z = tf.less_equal(x, y)
z = abs(x)  # z = tf.abs(x)
z = x & y  # z = tf.logical_and(x, y)
z = x | y  # z = tf.logical_or(x, y)
z = x ^ y  # z = tf.logical_xor(x, y)
z = ~x  # z = tf.logical_not(x)
```

You can also use the augmented versions of these operators, e.g. `x += y` and `x **= 2`.

Note that Python does not allow overloading the `and`, `or`, and `not` keywords.

Other operators that NumPy overloads, such as equality (==) and inequality (!=), are not overloaded by TensorFlow; use the functional versions `tf.equal` and `tf.not_equal` instead.
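For example, a quick sketch of the functional comparison ops (the values are just illustrative):

```python
a = tf.constant([1, 2, 3])
b = tf.constant([1, 0, 3])

# Elementwise comparisons return boolean tensors.
print(tf.equal(a, b).numpy())      # [ True False  True]
print(tf.not_equal(a, b).numpy())  # [False  True False]
```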
## Control flow operations: conditionals and loops
<a name="control_flow"></a>
When building complex models, such as recurrent neural networks, you may need conditionals and loops to control the flow of operations. In this section we introduce a number of commonly used control-flow ops.

Suppose you want to decide whether to multiply or add two tensors based on a predicate. This can be done either with Python's built-in conditional expression or with tf.cond:

```python
a = tf.constant(1)
b = tf.constant(2)

p = tf.constant(True)

# Alternatively:
# x = tf.cond(p, lambda: a + b, lambda: a * b)
x = a + b if p else a * b

print(x.numpy())
```
Since the predicate is true, the output is the sum, i.e. 3.

Most of the time, though, you work with large tensors and want to apply the operation across a whole batch. tf.where applies the predicate elementwise and picks from either branch accordingly:
```python
a = tf.constant([1, 1])
b = tf.constant([2, 2])

p = tf.constant([True, False])

x = tf.where(p, a + b, a * b)

print(x.numpy())
```
This returns [3, 2].

Another commonly used operation is tf.while_loop, which allows dynamic loops in TensorFlow, for example to process sequences of variable length. Let's look at an example:

```python
@tf.function
def fibonacci(n):
    a = tf.constant(1)
    b = tf.constant(1)

    for i in range(2, n):
        a, b = b, a + b

    return b

n = tf.constant(5)
b = fibonacci(n)

print(b.numpy())
```
This prints 5. Note that the tf.function decorator automatically converts the Python loop into tf.while_loop, so we don't need to deal with the TF API directly.

Now imagine we want to keep the whole Fibonacci sequence. We might update the code to keep a history of the values:
```python
@tf.function
def fibonacci(n):
    a = tf.constant(1)
    b = tf.constant(1)
    c = tf.constant([1, 1])

    for i in range(2, n):
        a, b = b, a + b
        c = tf.concat([c, [b]], 0)

    return c

n = tf.constant(5)
b = fibonacci(n)

print(b.numpy())
```

If you run this, TensorFlow will complain that the shape of one of the loop variables is changing across iterations.
You can fix this with "shape invariants", but that feature is only available in the low-level tf.while_loop API:

```python
n = tf.constant(5)

def cond(i, a, b, c):
    return i < n

def body(i, a, b, c):
    a, b = b, a + b
    c = tf.concat([c, [b]], 0)
    return i + 1, a, b, c

i, a, b, c = tf.while_loop(
    cond, body, (2, 1, 1, tf.constant([1, 1])),
    shape_invariants=(tf.TensorShape([]),
                      tf.TensorShape([]),
                      tf.TensorShape([]),
                      tf.TensorShape([None])))

print(c.numpy())
```
This is not only ugly but also inefficient, since we build a lot of unused intermediate tensors. TensorFlow has a better solution: use tf.TensorArray:
```python
@tf.function
def fibonacci(n):
    a = tf.constant(1)
    b = tf.constant(1)

    c = tf.TensorArray(tf.int32, n)
    c = c.write(0, a)
    c = c.write(1, b)

    for i in range(2, n):
        a, b = b, a + b
        c = c.write(i, b)

    return c.stack()

n = tf.constant(5)
c = fibonacci(n)

print(c.numpy())
```
TensorFlow while loops are especially useful when building complex recurrent neural networks. As an exercise, try implementing [beam search](https://en.wikipedia.org/wiki/Beam_search) with tf.while_loop. Can you make it more efficient with tensor arrays?

## Prototyping kernels and visualization with Python ops
<a name="python_ops"></a>
Operation kernels in TensorFlow are implemented in C++ for efficiency. But writing a TensorFlow kernel in C++ can be quite a pain, so before spending time on a real implementation you may want to prototype your idea and check whether it works at all. With tf.py_function() you can turn any piece of Python code into a TensorFlow op.
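As a minimal sketch of how the wrapping works (`_py_log1p` is just an illustrative helper, not from the original text):

```python
import numpy as np
import tensorflow as tf

def _py_log1p(x):
    # Runs as ordinary Python/NumPy code.
    return np.log1p(x)

x = tf.constant([0.0, 1.0, 2.0])
# tf.py_function(func, inp, Tout) wraps the Python call as an op.
y = tf.py_function(_py_log1p, [x], tf.float32)
print(y.numpy())
```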
A more complete example: here is how you might implement the ReLU nonlinearity in Python, together with its gradient:
```python
import numpy as np
import tensorflow as tf

def relu(inputs):
    # Define the op in python.
    def _py_relu(x):
        return np.maximum(x, 0.)

    # Define the op's gradient in python.
    def _py_relu_grad(x):
        return np.float32(x > 0)

    @tf.custom_gradient
    def _relu(x):
        y = tf.py_function(_py_relu, [x], tf.float32)

        def _relu_grad(dy):
            return dy * tf.py_function(_py_relu_grad, [x], tf.float32)

        return y, _relu_grad

    return _relu(inputs)
```
To verify that the gradient is correct, compare the analytical gradient with a numerical estimate:
```python
# Compute the analytical gradient.
x = tf.random.normal([10], dtype=tf.float32)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = relu(x)
g = tape.gradient(y, x)
print(g)

# Compute the numerical gradient.
dx_n = 1e-5
dy_n = relu(x + dx_n) - relu(x)
g_n = dy_n / dx_n
print(g_n)
```
The two values should be very close.

Note that this implementation is quite inefficient and only useful for prototyping, since the Python code is slow. You will probably want to reimplement the computation kernel in C++ afterwards.

In practice, Python ops are also commonly used for visualization on TensorBoard. Suppose you are building an image classifier and want to visualize your model's predictions during training. TensorFlow lets you save images with tf.summary.image():
```python
# Inside a summary-writer context:
tf.summary.image("image", images, step=step)
```
But this only visualizes the input images; to see the predictions as well you would have to draw the labels on top of them, which is impractical with existing TensorFlow ops. An easier way is to do the drawing in Python and wrap it in a Python op:
```python
import io

import matplotlib.pyplot as plt
import numpy as np
import PIL.Image
import tensorflow as tf

def visualize_labeled_images(images, labels, max_outputs=3, name="image"):
    def _visualize_image(image, label):
        # Do the actual drawing in python.
        fig = plt.figure(figsize=(3, 3), dpi=80)
        ax = fig.add_subplot(111)
        ax.imshow(image[::-1,...])
        ax.text(0, 0, str(label),
          horizontalalignment="left",
          verticalalignment="top")
        fig.canvas.draw()

        # Write the plot as a memory file.
        buf = io.BytesIO()
        data = fig.savefig(buf, format="png")
        buf.seek(0)

        # Read the image and convert to a numpy array with Pillow.
        img = PIL.Image.open(buf)
        return np.array(img.getdata()).reshape(img.size[0], img.size[1], -1)

    def _visualize_images(images, labels):
        # Only display the given number of examples in the batch.
        outputs = []
        for i in range(max_outputs):
            output = _visualize_image(images[i], labels[i])
            outputs.append(output)
        return np.array(outputs, dtype=np.uint8)

    # Run the python op.
    figs = tf.py_function(_visualize_images, [images, labels], tf.uint8)
    return tf.summary.image(name, figs)
```

Since summaries are usually evaluated only once in a while (not at every step), you don't need to worry about the efficiency of this implementation.

## Numerical stability in TensorFlow
<a name="stable"></a>
When using any numerical computation library such as NumPy or TensorFlow, it is important to keep in mind not only whether the math is correct, but also whether the computation is numerically stable.

For example, basic algebra says that x * y / y equals x for any non-zero y. Let's see how that holds in practice:
```python
import numpy as np

x = np.float32(1)

y = np.float32(1e-50)  # y is stored as zero
z = x * y / y

print(z)  # prints nan
```

The value y is simply too small for single precision and is stored as zero. The same kind of problem occurs when y is very large:

```python
y = np.float32(1e39)  # y is stored as inf
z = x * y / y

print(z)  # prints nan
```

The smallest positive value a single-precision float can hold is 1.4013e-45; anything below that is stored as zero. Similarly, anything above 3.40282e+38 is stored as infinity.

```python
print(np.nextafter(np.float32(0), np.float32(1)))  # prints 1.4013e-45
print(np.finfo(np.float32).max)  # prints 3.40282e+38
```
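As a side note, a small sketch to make the point concrete: the x * y / y failure above is purely a precision issue, since 1e-50 is representable in double precision and the identity then holds:

```python
x = np.float64(1)
y = np.float64(1e-50)  # representable in double precision
print(x * y / y)  # prints 1.0
```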
To keep your computations stable, you have to avoid values that are too small or too large in magnitude. This may sound obvious, but these kinds of problems can become very hard to debug, especially when doing gradient descent in TensorFlow: you need to make sure not only that the values in the forward pass stay in range, but also the values in the backward pass.

Let's look at a real example. We want to compute the softmax over a vector of logits. A naive implementation would look like this:
```python
import tensorflow as tf

def unstable_softmax(logits):
    exp = tf.exp(logits)
    return exp / tf.reduce_sum(exp)

print(unstable_softmax([1000., 0.]).numpy())  # prints [ nan, 0.]
```
The problem is that exponentiating the logits can produce huge numbers even for relatively small logits, possibly exceeding the single-precision range. The largest logit whose exponential does not overflow is ln(3.40282e+38) ≈ 88.7; anything larger leads to nan.

How do we make this stable? We use the identity exp(x - c) / Σ exp(x - c) = exp(x) / Σ exp(x): subtracting a constant from all logits does not change the softmax result, and we usually subtract the maximum logit. That way the input to exp is restricted to [-inf, 0] and its output to [0.0, 1.0], which is exactly what we want:

```python
import tensorflow as tf

def softmax(logits):
    exp = tf.exp(logits - tf.reduce_max(logits))
    return exp / tf.reduce_sum(exp)

print(softmax([1000., 0.]).numpy())  # prints [ 1., 0.]
```

Let's look at a more complicated case. Consider a classification problem: we use softmax to turn the logits into probabilities and then compute the cross entropy between the predictions and the labels, defined as xe(p, q) = -Σ p_i log(q_i). A naive implementation would be:

```python
def unstable_softmax_cross_entropy(labels, logits):
    logits = tf.math.log(softmax(logits))
    return -tf.reduce_sum(labels * logits)

labels = tf.constant([0.5, 0.5])
logits = tf.constant([1000., 0.])

xe = unstable_softmax_cross_entropy(labels, logits)

print(xe.numpy())  # prints inf
```

Since the softmax output can be arbitrarily close to zero, the log can approach negative infinity, making the computation unstable. We can fix this by expanding the softmax inside the log and simplifying:

```python
def softmax_cross_entropy(labels, logits):
    scaled_logits = logits - tf.reduce_max(logits)
    normalized_logits = scaled_logits - tf.reduce_logsumexp(scaled_logits)
    return -tf.reduce_sum(labels * normalized_logits)

labels = tf.constant([0.5, 0.5])
logits = tf.constant([1000., 0.])

xe = softmax_cross_entropy(labels, logits)

print(xe.numpy())  # prints 500.0
```

We can also verify that the gradient is computed correctly:
```python
with tf.GradientTape() as tape:
    tape.watch(logits)
    xe = softmax_cross_entropy(labels, logits)

g = tape.gradient(xe, logits)
print(g.numpy())  # prints [0.5, -0.5]
```
which is correct.

To repeat the warning: whenever you write code that will be differentiated, make sure the values stay within a valid range at every layer of the computation. exp and log deserve particular care, since they can turn small numbers into huge ones (and vice versa) and easily make the computation unstable.
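The stable cross entropy above relies on tf.reduce_logsumexp, which computes log(Σ exp(x)) without overflowing. A quick sketch contrasting it with the naive formulation:

```python
logits = tf.constant([1000., 0.])

# The naive log-sum-exp overflows in float32.
print(tf.math.log(tf.reduce_sum(tf.exp(logits))).numpy())  # inf

# tf.reduce_logsumexp shifts by the maximum internally and stays finite.
print(tf.reduce_logsumexp(logits).numpy())  # 1000.0
```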