TensorFlow 2.2.0-rc0: This Update Will Surprise You!


AI editor: 我是小将

Google has just announced TensorFlow 2.2 at the TensorFlow Dev Summit (currently available as the 2.2.0-rc0 release candidate). This version contains a lot of updates, but I think two of them in particular will make everyone ecstatic:

1. A synchronized BatchNormalization layer

The synchronized BN layer, tf.keras.layers.experimental.SyncBatchNormalization, is a great helper for distributed training. Its interface is similar to the existing BatchNormalization layer:

tf.keras.layers.experimental.SyncBatchNormalization(
    axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,
    beta_initializer='zeros', gamma_initializer='ones',
    moving_mean_initializer='zeros', moving_variance_initializer='ones',
    beta_regularizer=None, gamma_regularizer=None,
    beta_constraint=None, gamma_constraint=None,
    renorm=False, renorm_clipping=None, renorm_momentum=0.99,
    trainable=True, adjustment=None, name=None, **kwargs
)

Usage is as follows (the final line, adding the SyncBatchNormalization layer itself, completes the snippet):

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(16))
    model.add(tf.keras.layers.experimental.SyncBatchNormalization())

2. Custom training and testing logic with Model.fit
Model.fit now supports overriding Model.train_step, which lets us plug custom training logic into fit(). The default implementation is shown below for reference:
def train_step(self, data):
    """The logic for one training step.

    This method can be overridden to support custom training logic.
    This method is called by `Model._make_train_function`.

    This method should contain the mathematical logic for one step of training.
    This typically includes the forward pass, loss calculation, backpropagation,
    and metric updates.

    Configuration details for *how* this logic is run (e.g. `tf.function` and
    `tf.distribute.Strategy` settings), should be left to
    `Model._make_train_function`, which can also be overridden.

    Arguments:
      data: A nested structure of `Tensor`s.

    Returns:
      A `dict` containing values that will be passed to
      `tf.keras.callbacks.CallbackList.on_train_batch_end`. Typically, the
      values of the `Model`'s metrics are returned. Example:
      `{'loss': 0.2, 'accuracy': 0.7}`.
    """
    # These are the only transformations `Model.fit` applies to user-input
    # data when a `tf.data.Dataset` is provided. These utilities will be
    # exposed publicly.
    data = data_adapter.expand_1d(data)
    x, y, sample_weight = data_adapter.unpack_x_y_sample_weight(data)

    with backprop.GradientTape() as tape:
        y_pred = self(x, training=True)
        loss = self.compiled_loss(
            y, y_pred, sample_weight, regularization_losses=self.losses)
    # For custom training steps, users can just write:
    #   trainable_variables = self.trainable_variables
    #   gradients = tape.gradient(loss, trainable_variables)
    #   self.optimizer.apply_gradients(zip(gradients, trainable_variables))
    # The _minimize call does a few extra steps unnecessary in most cases,
    # such as loss scaling and gradient clipping.
    _minimize(tape, self.optimizer, loss, self.trainable_variables)

    self.compiled_metrics.update_state(y, y_pred, sample_weight)
    return {m.name: m.result() for m in self.metrics}

The benefit is that we can use Model.fit much more flexibly to train our own models while still getting everything fit() handles for us. Model also provides Model.test_step and Model.predict_step for customizing evaluation and prediction logic. I think this will definitely appeal to TensorFlow users.
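As a concrete illustration, here is a minimal sketch (assuming the TF 2.2 tf.keras API; the model, data, and layer sizes are made up for illustration) of a Model subclass that overrides train_step and is then trained with the ordinary compile()/fit() calls:

import tensorflow as tf

class CustomModel(tf.keras.Model):
    def train_step(self, data):
        # Assumes fit() is called with (features, labels) and no sample weights.
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        # Plain gradient update; fit() still handles callbacks, strategies, data, etc.
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

inputs = tf.keras.Input(shape=(32,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(tf.random.normal((64, 32)), tf.random.normal((64, 1)), epochs=2)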
The major updates and improvements are as follows
  • Replaced the scalar type for string tensors from std::string to tensorflow::tstring which is now ABI stable.
  • A new Profiler for TF 2 for CPU/GPU/TPU. It offers both device and host performance analysis, including input pipeline and TF Ops. Optimization advisory is provided whenever possible. Please see this tutorial for usage guidelines.
  • Export C++ functions to Python using pybind11 as opposed to SWIG as a part of our deprecation of swig efforts.
  • tf.distribute:
    • Update NVIDIA NCCL to 2.5.7-1 for better performance and performance tuning. Please see nccl developer guide for more information on this.
    • Support gradient allreduce in float16. See this example usage.
    • Experimental support of all reduce gradient packing to allow overlapping gradient aggregation with backward path computation.
    • Support added for global sync BatchNormalization by using the newly added tf.keras.layers.experimental.SyncBatchNormalization layer. This layer will sync BatchNormalization statistics every step across all replicas taking part in sync training.
    • Performance improvements for GPU multi-worker distributed training using tf.distribute.experimental.MultiWorkerMirroredStrategy
  • tf.keras:
    • Model.fit major improvements:
      • You can now use custom training logic with Model.fit by overriding Model.train_step.
      • Easily write state-of-the-art training loops without worrying about all of the features Model.fit handles for you (distribution strategies, callbacks, data formats, looping logic, etc.).
      • See the default Model.train_step for an example of what this function should look like. The same applies to validation and inference via Model.test_step and Model.predict_step.
    • The SavedModel format now supports all Keras built-in layers (including metrics, preprocessing layers, and stateful RNN layers).
  • tf.lite:
    • Enable TFLite experimental new converter by default.
  • XLA
    • XLA now builds and works on Windows. All prebuilt packages come with XLA available.
    • XLA can be enabled for a tf.function with “compile or throw exception” semantics on CPU and GPU.
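As a rough sketch of the last point (assuming TF 2.2, where the flag is named experimental_compile; the toy function is just for illustration), XLA compilation can be requested per tf.function, and an error is raised if the function cannot be compiled rather than silently falling back:

import tensorflow as tf

# Request XLA compilation for this function ("compile or throw" semantics).
@tf.function(experimental_compile=True)
def dense_relu(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal((8, 16))
w = tf.random.normal((16, 4))
b = tf.zeros((4,))
print(dense_relu(x, w, b).shape)  # (8, 4)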
Breaking Changes
  • tf.keras:
    • In tf.keras.applications the name of the "top" layer has been standardized to "predictions". This is only a problem if your code relies on the exact name of the layer.
    • Huber loss function has been updated to be consistent with other Keras losses. It now computes the mean over the last axis of per-sample losses before applying the reduction function (see the short sketch after this list).
  • AutoGraph no longer converts functions passed to tf.py_function, tf.py_func and tf.numpy_function.
  • Deprecating XLA_CPU and XLA_GPU devices with this release.
  • Increasing the minimum bazel version to build TF to 1.2.1 to use Bazel's cc_experimental_shared_library.
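To make the Huber change concrete, here is a small sketch (my own example, assuming the TF 2.2 behavior described above): with Reduction.NONE the loss now returns one value per sample, i.e. the mean over the last axis, instead of one value per element:

import tensorflow as tf

huber = tf.keras.losses.Huber(reduction=tf.keras.losses.Reduction.NONE)
y_true = tf.constant([[0.0, 1.0], [0.0, 0.0]])
y_pred = tf.constant([[0.5, 0.4], [0.1, 0.2]])
# Per-sample losses: shape (2,) in TF 2.2 (mean over the last axis),
# rather than the element-wise shape (2, 2).
print(huber(y_true, y_pred).shape)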
Known Caveats
  • macOS binaries are not available on PyPI under the tensorflow-cpu project, but they are identical to the binaries in the tensorflow project, since macOS has no GPU support.
Bug Fixes and Other Changes
  • tf.data:
    • Removed autotune_algorithm from experimental optimization options.
  • TF Core:
    • tf.constant always creates CPU tensors irrespective of the current device context.
    • Eager TensorHandles maintain a list of mirrors for any copies to local or remote devices. This avoids any redundant copies due to op execution.
    • For tf.Tensor & tf.Variable, .experimental_ref() is no longer experimental and is available as simply .ref() (see the short sketch after this list).
    • Support matrix inverse and solves in pfor/vectorized_map.
    • Set as much partial shape as we can infer statically within the gradient impl of the gather op.
    • Gradient of tf.while_loop emits StatelessWhile op if cond and body functions are stateless. This allows multiple gradients while ops to run in parallel under distribution strategy.
    • Speed up GradientTape in eager mode by auto-generating list of op inputs/outputs which are unused and hence not cached for gradient functions.
    • Support back_prop=False in while_v2 but mark it as deprecated.
    • Improve error message when attempting to use None in data-dependent control flow.
    • Add RaggedTensor.numpy().
    • Update RaggedTensor.__getitem__ to preserve uniform dimensions & allow indexing into uniform dimensions.
    • Update tf.expand_dims to always insert the new dimension as a non-ragged dimension.
    • Update tf.embedding_lookup to use partition_strategy and max_norm when ids is ragged.
    • Allow batch_dims==rank(indices) in tf.gather.
    • Add support for bfloat16 in tf.print.
  • tf.distribute:
    • Support embedding_column with variable-length input features for MultiWorkerMirroredStrategy.
  • tf.keras:
    • Added all_reduce_sum_gradients argument to tf.keras.optimizer.Optimizer.apply_gradients. This allows custom gradient aggregation and processing aggregated gradients in custom training loop.
    • Allow pathlib.Path paths for loading models via Keras API.
  • tf.function/AutoGraph:
    • AutoGraph is now available in ReplicaContext.merge_call, Strategy.extended.update and Strategy.extended.update_non_slot.
    • Experimental support for shape invariants has been enabled in tf.function. See the API docs for tf.autograph.experimental.set_loop_options for additional info.
    • AutoGraph error messages now exclude frames corresponding to APIs internal to AutoGraph.
    • Improve shape inference for tf.function input arguments to unlock more Grappler optimizations in TensorFlow 2.x.
    • Improve automatic control dependency management of resources by allowing resource reads to occur in parallel and synchronizing only on writes.
    • Fix execution order of multiple stateful calls to experimental_run_v2 in tf.function.
    • You can now iterate over RaggedTensors using a for loop inside tf.function.
  • tf.lite:
    • Migrated the tf.lite C inference API out of experimental into lite/c.
    • Add an option to disallow NNAPI CPU / partial acceleration on Android 10
    • TFLite Android AARs now include the C headers and APIs required to use TFLite from native code.
    • Refactors the delegate and delegate kernel sources to allow usage in the linter.
    • Limit delegated ops to actually supported ones if a device name is specified or NNAPI CPU Fallback is disabled.
    • TFLite now supports the tf.math.reciprocal op by lowering to the tf.div op.
    • TFLite's unpack op now supports boolean tensor inputs.
    • Microcontroller and embedded code moved from experimental to main TensorFlow Lite folder
    • Check for large TFLite tensors.
    • Fix GPU delegate crash with C++17.
    • Add 5D support to TFLite strided_slice.
    • Fix error in delegation of DEPTH_TO_SPACE to NNAPI causing op not to be accelerated.
    • Fix segmentation fault when running a model with LSTM nodes using NNAPI Delegate
    • Fix NNAPI delegate failure when an operand for Maximum/Minimum operation is a scalar.
    • Fix NNAPI delegate failure when Axis input for reduce operation is a scalar.
    • Expose option to limit the number of partitions that will be delegated to NNAPI.
    • If a target accelerator is specified, use its feature level to determine operations to delegate instead of SDK version.
  • tf.random:
    • Add a fast path for default random_uniform
    • random_seed documentation improvement.
    • RandomBinomial broadcasts and appends the sample shape to the left rather than the right.
    • Various random number generation improvements:
    • Added tf.random.stateless_binomial, tf.random.stateless_gamma, tf.random.stateless_poisson
    • tf.random.stateless_uniform now supports unbounded sampling of int types.
  • Math and Linear Algebra:
    • Add tf.linalg.LinearOperatorTridiag.
    • Add LinearOperatorBlockLowerTriangular
    • Add broadcasting support to tf.linalg.triangular_solve (#26204), tf.math.invert_permutation.
    • Add tf.math.sobol_sample op.
    • Add tf.math.xlog1py.
    • Add tf.math.special.{dawsn,expi,fresnel_cos,fresnel_sin,spence}.
    • Add a Modified Discrete Cosine Transform (MDCT) and its inverse to tf.signal.
  • TPU Enhancements:
    • Refactor TpuClusterResolver to move shared logic to a separate pip package.
    • Support configuring TPU software version from cloud tpu client.
    • Allowed TPU embedding weight decay factor to be multiplied by learning rate.
  • XLA Support:
    • Add standalone XLA AOT runtime target + relevant .cc sources to pip package.
    • Add check for memory alignment to MemoryAllocation::MemoryAllocation() on 32-bit ARM. This ensures a deterministic early exit instead of a hard to debug bus error later.
    • saved_model_cli aot_compile_cpu allows you to compile saved models to XLA header+object files and include them in your C++ programs.
    • Enable Igamma, Igammac for XLA.
    • XLA reduction emitter is deterministic when the environment variable TF_DETERMINISTIC_OPS is set.
  • Tracing and Debugging:
    • Add source, destination name to _send traceme to allow easier debugging.
    • Add traceme event to fastpathexecute.
  • Other:
    • Fix an issue with AUC.reset_states for multi-label AUC (#35852).
    • Fix the TF upgrade script to not delete files when there is a parsing error and the output mode is in-place.
    • Move tensorflow/core:framework/*_pyclif rules to tensorflow/core/framework:*_pyclif.
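As a small illustration of the .ref() change mentioned above (a hedged sketch; the dictionary contents are made up): Reference objects are hashable, so tensors and variables can be used as dict or set keys, and .deref() recovers the original object:

import tensorflow as tf

x = tf.constant([1.0, 2.0])
v = tf.Variable(3.0)
# .ref() returns a hashable Reference wrapper (formerly .experimental_ref()).
table = {x.ref(): 'tensor entry', v.ref(): 'variable entry'}
print(table[x.ref()])   # 'tensor entry'
print(v.ref().deref())  # the original variable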
Reference: TensorFlow releases, https://github.com/tensorflow/tensorflow/releases
