TensorFlow 2.2.0-rc0: This Update Will Surprise You!


AI editor: 我是小将

Google has just announced TensorFlow 2.2 at the TensorFlow Dev Summit (currently available as the 2.2.0-rc0 release candidate). This version contains a lot of updates, but I think two of them in particular will make everyone ecstatic:

1. A synchronized BatchNormalization layer

The synchronized BN layer, tf.keras.layers.experimental.SyncBatchNormalization, is a great helper for distributed training. Its interface is similar to the existing BatchNormalization layer:

tf.keras.layers.experimental.SyncBatchNormalization(
    axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,
    beta_initializer='zeros', gamma_initializer='ones',
    moving_mean_initializer='zeros', moving_variance_initializer='ones',
    beta_regularizer=None, gamma_regularizer=None,
    beta_constraint=None, gamma_constraint=None,
    renorm=False, renorm_clipping=None, renorm_momentum=0.99,
    trainable=True, adjustment=None, name=None, **kwargs
)

Usage is as follows (the final line, adding the SyncBatchNormalization layer itself, completes the snippet):

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(16))
    model.add(tf.keras.layers.experimental.SyncBatchNormalization())

2. Custom training and testing logic with Model.fit
Model.fit now supports overriding Model.train_step, which lets us plug custom training logic into fit(). The default implementation is shown below for reference:
def train_step(self, data):
    """The logic for one training step.

    This method can be overridden to support custom training logic.
    This method is called by `Model._make_train_function`.

    This method should contain the mathematical logic for one step of training.
    This typically includes the forward pass, loss calculation, backpropagation,
    and metric updates.

    Configuration details for *how* this logic is run (e.g. `tf.function` and
    `tf.distribute.Strategy` settings), should be left to
    `Model._make_train_function`, which can also be overridden.

    Arguments:
      data: A nested structure of `Tensor`s.

    Returns:
      A `dict` containing values that will be passed to
      `tf.keras.callbacks.CallbackList.on_train_batch_end`. Typically, the
      values of the `Model`'s metrics are returned. Example:
      `{'loss': 0.2, 'accuracy': 0.7}`.
    """
    # These are the only transformations `Model.fit` applies to user-input
    # data when a `tf.data.Dataset` is provided. These utilities will be
    # exposed publicly.
    data = data_adapter.expand_1d(data)
    x, y, sample_weight = data_adapter.unpack_x_y_sample_weight(data)

    with backprop.GradientTape() as tape:
        y_pred = self(x, training=True)
        loss = self.compiled_loss(
            y, y_pred, sample_weight, regularization_losses=self.losses)
    # For custom training steps, users can just write:
    #   trainable_variables = self.trainable_variables
    #   gradients = tape.gradient(loss, trainable_variables)
    #   self.optimizer.apply_gradients(zip(gradients, trainable_variables))
    # The _minimize call does a few extra steps unnecessary in most cases,
    # such as loss scaling and gradient clipping.
    _minimize(tape, self.optimizer, loss, self.trainable_variables)

    self.compiled_metrics.update_state(y, y_pred, sample_weight)
    return {m.name: m.result() for m in self.metrics}

The benefit is that we can use Model.fit much more flexibly to train our own models while still getting everything fit() handles for us. Model also provides Model.test_step and Model.predict_step for customizing evaluation and prediction logic. I think this will definitely appeal to TensorFlow users.
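As a concrete illustration, here is a minimal sketch (assuming the TF 2.2 tf.keras API; the model, data, and layer sizes are made up for illustration) of a Model subclass that overrides train_step and is then trained with the ordinary compile()/fit() calls:

import tensorflow as tf

class CustomModel(tf.keras.Model):
    def train_step(self, data):
        # Assumes fit() is called with (features, labels) and no sample weights.
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        # Plain gradient update; fit() still handles callbacks, strategies, data, etc.
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

inputs = tf.keras.Input(shape=(32,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(tf.random.normal((64, 32)), tf.random.normal((64, 1)), epochs=2)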
The major updates and improvements are as follows
  • Replaced the scalar type for string tensors from std::string to tensorflow::tstring which is now ABI stable.
  • A new Profiler for TF 2 for CPU/GPU/TPU. It offers both device and host performance analysis, including input pipeline and TF Ops. Optimization advisory is provided whenever possible. Please see this tutorial for usage guidelines.
  • Export C++ functions to Python using pybind11 as opposed to SWIG as a part of our deprecation of swig efforts.
  • tf.distribute:
    • Update NVIDIA NCCL to 2.5.7-1 for better performance and performance tuning. Please see nccl developer guide for more information on this.
    • Support gradient allreduce in float16. See this example usage.
    • Experimental support of all reduce gradient packing to allow overlapping gradient aggregation with backward path computation.
    • Support added for global sync BatchNormalization by using the newly added tf.keras.layers.experimental.SyncBatchNormalization layer. This layer will sync BatchNormalization statistics every step across all replicas taking part in sync training.
    • Performance improvements for GPU multi-worker distributed training using tf.distribute.experimental.MultiWorkerMirroredStrategy
  • tf.keras:
    • Model.fit major improvements:
      • You can now use custom training logic with Model.fit by overriding Model.train_step.
      • Easily write state-of-the-art training loops without worrying about all of the features Model.fit handles for you (distribution strategies, callbacks, data formats, looping logic, etc.).
      • See the default Model.train_step for an example of what this function should look like. The same applies to validation and inference via Model.test_step and Model.predict_step.
    • The SavedModel format now supports all Keras built-in layers (including metrics, preprocessing layers, and stateful RNN layers).
  • tf.lite:
    • Enable TFLite experimental new converter by default.
  • XLA
    • XLA now builds and works on Windows. All prebuilt packages come with XLA available.
    • XLA can be enabled for a tf.function with “compile or throw exception” semantics on CPU and GPU.
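As a rough sketch of the last point (assuming TF 2.2, where the flag is named experimental_compile; the toy function is just for illustration), XLA compilation can be requested per tf.function, and an error is raised if the function cannot be compiled rather than silently falling back:

import tensorflow as tf

# Request XLA compilation for this function ("compile or throw" semantics).
@tf.function(experimental_compile=True)
def dense_relu(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal((8, 16))
w = tf.random.normal((16, 4))
b = tf.zeros((4,))
print(dense_relu(x, w, b).shape)  # (8, 4)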
Breaking Changes
  • tf.keras:
    • In tf.keras.applications the name of the "top" layer has been standardized to "predictions". This is only a problem if your code relies on the exact name of the layer.
    • Huber loss function has been updated to be consistent with other Keras losses. It now computes the mean over the last axis of per-sample losses before applying the reduction function (see the short sketch after this list).
  • AutoGraph no longer converts functions passed to tf.py_function, tf.py_func and tf.numpy_function.
  • Deprecating XLA_CPU and XLA_GPU devices with this release.
  • Increasing the minimum bazel version to build TF to 1.2.1 to use Bazel's cc_experimental_shared_library.
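To make the Huber change concrete, here is a small sketch (my own example, assuming the TF 2.2 behavior described above): with Reduction.NONE the loss now returns one value per sample, i.e. the mean over the last axis, instead of one value per element:

import tensorflow as tf

huber = tf.keras.losses.Huber(reduction=tf.keras.losses.Reduction.NONE)
y_true = tf.constant([[0.0, 1.0], [0.0, 0.0]])
y_pred = tf.constant([[0.5, 0.4], [0.1, 0.2]])
# Per-sample losses: shape (2,) in TF 2.2 (mean over the last axis),
# rather than the element-wise shape (2, 2).
print(huber(y_true, y_pred).shape)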
Known Caveats
  • macOS binaries are not available on PyPI under the tensorflow-cpu project, but they are identical to the binaries in the tensorflow project, since macOS has no GPU support.
Bug Fixes and Other Changes
  • tf.data:
    • Removed autotune_algorithm from experimental optimization options.
  • TF Core:
    • tf.constant always creates CPU tensors irrespective of the current device context.
    • Eager TensorHandles maintain a list of mirrors for any copies to local or remote devices. This avoids any redundant copies due to op execution.
    • For tf.Tensor & tf.Variable, .experimental_ref() is no longer experimental and is available as simply .ref() (see the short sketch after this list).
    • Support matrix inverse and solves in pfor/vectorized_map.
    • Set as much partial shape as we can infer statically within the gradient impl of the gather op.
    • Gradient of tf.while_loop emits StatelessWhile op if cond and body functions are stateless. This allows multiple gradients while ops to run in parallel under distribution strategy.
    • Speed up GradientTape in eager mode by auto-generating list of op inputs/outputs which are unused and hence not cached for gradient functions.
    • Support back_prop=False in while_v2 but mark it as deprecated.
    • Improve error message when attempting to use None in data-dependent control flow.
    • Add RaggedTensor.numpy().
    • Update RaggedTensor.__getitem__ to preserve uniform dimensions & allow indexing into uniform dimensions.
    • Update tf.expand_dims to always insert the new dimension as a non-ragged dimension.
    • Update tf.embedding_lookup to use partition_strategy and max_norm when ids is ragged.
    • Allow batch_dims==rank(indices) in tf.gather.
    • Add support for bfloat16 in tf.print.
  • tf.distribute:
    • Support embedding_column with variable-length input features for MultiWorkerMirroredStrategy.
  • tf.keras:
    • Added all_reduce_sum_gradients argument to tf.keras.optimizer.Optimizer.apply_gradients. This allows custom gradient aggregation and processing aggregated gradients in custom training loop.
    • Allow pathlib.Path paths for loading models via Keras API.
  • tf.function/AutoGraph:
    • AutoGraph is now available in ReplicaContext.merge_call, Strategy.extended.update and Strategy.extended.update_non_slot.
    • Experimental support for shape invariants has been enabled in tf.function. See the API docs for tf.autograph.experimental.set_loop_options for additional info.
    • AutoGraph error messages now exclude frames corresponding to APIs internal to AutoGraph.
    • Improve shape inference for tf.function input arguments to unlock more Grappler optimizations in TensorFlow 2.x.
    • Improve automatic control dependency management of resources by allowing resource reads to occur in parallel and synchronizing only on writes.
    • Fix execution order of multiple stateful calls to experimental_run_v2 in tf.function.
    • You can now iterate over RaggedTensors using a for loop inside tf.function.
  • tf.lite:
    • Migrated the tf.lite C inference API out of experimental into lite/c.
    • Add an option to disallow NNAPI CPU / partial acceleration on Android 10
    • TFLite Android AARs now include the C headers and APIs required to use TFLite from native code.
    • Refactors the delegate and delegate kernel sources to allow usage in the linter.
    • Limit delegated ops to actually supported ones if a device name is specified or NNAPI CPU Fallback is disabled.
    • TFLite now supports the tf.math.reciprocal op by lowering to the tf.div op.
    • TFLite's unpack op now supports boolean tensor inputs.
    • Microcontroller and embedded code moved from experimental to main TensorFlow Lite folder
    • Check for large TFLite tensors.
    • Fix GPU delegate crash with C++17.
    • Add 5D support to TFLite strided_slice.
    • Fix error in delegation of DEPTH_TO_SPACE to NNAPI causing op not to be accelerated.
    • Fix segmentation fault when running a model with LSTM nodes using NNAPI Delegate
    • Fix NNAPI delegate failure when an operand for Maximum/Minimum operation is a scalar.
    • Fix NNAPI delegate failure when Axis input for reduce operation is a scalar.
    • Expose option to limit the number of partitions that will be delegated to NNAPI.
    • If a target accelerator is specified, use its feature level to determine operations to delegate instead of SDK version.
  • tf.random:
    • Add a fast path for default random_uniform
    • random_seed documentation improvement.
    • RandomBinomial broadcasts and appends the sample shape to the left rather than the right.
    • Various random number generation improvements:
    • Added tf.random.stateless_binomial, tf.random.stateless_gamma, tf.random.stateless_poisson
    • tf.random.stateless_uniform now supports unbounded sampling of int types.
  • Math and Linear Algebra:
    • Add tf.linalg.LinearOperatorTridiag.
    • Add LinearOperatorBlockLowerTriangular
    • Add broadcasting support to tf.linalg.triangular_solve (#26204), tf.math.invert_permutation.
    • Add tf.math.sobol_sample op.
    • Add tf.math.xlog1py.
    • Add tf.math.special.{dawsn,expi,fresnel_cos,fresnel_sin,spence}.
    • Add a Modified Discrete Cosine Transform (MDCT) and its inverse to tf.signal.
  • TPU Enhancements:
    • Refactor TpuClusterResolver to move shared logic to a separate pip package.
    • Support configuring TPU software version from cloud tpu client.
    • Allowed TPU embedding weight decay factor to be multiplied by learning rate.
  • XLA Support:
    • Add standalone XLA AOT runtime target + relevant .cc sources to pip package.
    • Add check for memory alignment to MemoryAllocation::MemoryAllocation() on 32-bit ARM. This ensures a deterministic early exit instead of a hard to debug bus error later.
    • saved_model_cli aot_compile_cpu allows you to compile saved models to XLA header+object files and include them in your C++ programs.
    • Enable Igamma, Igammac for XLA.
    • XLA reduction emitter is deterministic when the environment variable TF_DETERMINISTIC_OPS is set.
  • Tracing and Debugging:
    • Add source, destination name to _send traceme to allow easier debugging.
    • Add traceme event to fastpathexecute.
  • Other:
    • Fix an issue with AUC.reset_states for multi-label AUC (#35852).
    • Fix the TF upgrade script to not delete files when there is a parsing error and the output mode is in-place.
    • Move tensorflow/core:framework/*_pyclif rules to tensorflow/core/framework:*_pyclif.
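As a small illustration of the .ref() change mentioned above (a hedged sketch; the dictionary contents are made up): Reference objects are hashable, so tensors and variables can be used as dict or set keys, and .deref() recovers the original object:

import tensorflow as tf

x = tf.constant([1.0, 2.0])
v = tf.Variable(3.0)
# .ref() returns a hashable Reference wrapper (formerly .experimental_ref()).
table = {x.ref(): 'tensor entry', v.ref(): 'variable entry'}
print(table[x.ref()])   # 'tensor entry'
print(v.ref().deref())  # the original variable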
Reference: TensorFlow releases, https://github.com/tensorflow/tensorflow/releases
