
TensorFlow: restore a `tf.Session`-saved checkpoint using `tf.train.MonitoredTrainingSession`



By : Bryan
Date : November 22 2020, 03:01 PM
I figured it out. Apparently, tf.train.Saver does not restore all variables from a checkpoint by default. I tried restoring and then immediately saving again, and the output was half the size.
I used tf.train.list_variables to get all variables from the latest checkpoint, converted them into tf.Variable objects, and built a dict from them. Then I passed the dict to tf.train.Saver, and it restored all of my variables.
code :
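The snippet below is a minimal sketch of the approach described above (TF 1.x), not the author's original code; the checkpoint directory is a hypothetical placeholder:

import tensorflow as tf

ckpt_path = tf.train.latest_checkpoint('/tmp/model')  # hypothetical directory

# List every (name, shape) pair stored in the checkpoint.
var_shapes = tf.train.list_variables(ckpt_path)

# Recreate a tf.Variable for each entry, keyed by its checkpoint name.
var_dict = {name: tf.get_variable(name, shape=shape)
            for name, shape in var_shapes}

# Passing an explicit dict makes the Saver restore every listed variable.
saver = tf.train.Saver(var_dict)
with tf.Session() as sess:
    saver.restore(sess, ckpt_path)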


How to use tf.train.MonitoredTrainingSession to restore only certain variables



By : Atish Dharvesh
Date : March 29 2020, 07:55 AM
OK, so as I suspected, I got what I wanted by implementing a new RefinementSessionManager class based on the existing tf.train.SessionManager. The two classes are almost identical, except that I modified the prepare_session method to call the init_op regardless of whether the model was loaded from a checkpoint.
This allows me to load a list of variables from the checkpoint and initialize the remaining variables in the init_op.
code :
  def prepare_session(self, master, init_op=None, saver=None,
                      checkpoint_dir=None, wait_for_checkpoint=False,
                      max_wait_secs=7200, config=None, init_feed_dict=None,
                      init_fn=None):
    sess, is_loaded_from_checkpoint = self._restore_checkpoint(
        master,
        saver,
        checkpoint_dir=checkpoint_dir,
        wait_for_checkpoint=wait_for_checkpoint,
        max_wait_secs=max_wait_secs,
        config=config)

    # [removed] if not is_loaded_from_checkpoint:
    # We still want to run any supplied initialization on models that
    # were loaded from a checkpoint.

    if (not is_loaded_from_checkpoint and init_op is None and not init_fn
        and self._local_init_op is None):
      raise RuntimeError("Model is not initialized and no init_op or "
                         "init_fn or local_init_op was given")
    if init_op is not None:
      sess.run(init_op, feed_dict=init_feed_dict)
    if init_fn:
      init_fn(sess)

    # [...]
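A hypothetical usage sketch, assuming a RefinementSessionManager subclass of tf.train.SessionManager that overrides prepare_session as above; the master string and checkpoint directory are placeholders:

sm = RefinementSessionManager()
sess = sm.prepare_session(
    '',  # empty master string: create an in-process session
    init_op=tf.global_variables_initializer(),
    saver=tf.train.Saver(),
    checkpoint_dir='/tmp/ckpts')  # placeholder; restored from if a checkpoint exists
# With the override, init_op runs whether or not a checkpoint was restored.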
In TensorFlow, when the graph is modified, how to use "MonitoredTrainingSession" to restore only part of a checkpoint



By : sara
Date : March 29 2020, 07:55 AM
OK, finally I figured it out.
After reading monitored_session.py here: https://github.com/tensorflow/tensorflow/blob/4806cb0646bd21f713722bd97c0d0262c575f7e0/tensorflow/python/training/monitored_session.py, I found that the key (and very tricky) point is to switch to a new, empty checkpoint directory, so that MonitoredTrainingSession will not ignore init_op or init_fn. Then you can use the following code to build your init_fn (which restores the checkpoint) as well as the scaffold:
code :
# Assumed: ckpt points at the ORIGINAL checkpoint directory (FLAGS.log_root
# is a guess consistent with FLAGS.log_root_2 below).
ckpt = tf.train.get_checkpoint_state(FLAGS.log_root)

variables_to_restore = tf.contrib.framework.get_variables_to_restore(
    exclude=['XXX'])
init_assign_op, init_feed_dict = tf.contrib.framework.assign_from_checkpoint(
    ckpt.model_checkpoint_path, variables_to_restore)

def InitAssignFn(scaffold, sess):
  sess.run(init_assign_op, feed_dict=init_feed_dict)

scaffold = tf.train.Scaffold(saver=tf.train.Saver(), init_fn=InitAssignFn)

# checkpoint_dir must be a NEW, empty directory; otherwise the session
# restores from it and init_fn is never called. The scaffold must be
# passed in for InitAssignFn to run.
with tf.train.MonitoredTrainingSession(
    checkpoint_dir=FLAGS.log_root_2, scaffold=scaffold) as mon_sess:
  while not mon_sess.should_stop():
    mon_sess.run(_train_op)
tf.train.Saver failing to restore checkpoint from path



By : Samir Vyas
Date : March 29 2020, 07:55 AM
Found a solution: renaming the checkpoint file and its parent directories to Python-compliant names solved the issue.
I suspect that, since checkpoint files are treated like code, they and their parent directories must follow Python-compliant naming. Some characters, such as '[' or '-', will not work (this was my issue); I haven't confirmed that this is the reason, however.
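A hypothetical illustration of the fix (TF 1.x); the variable, paths, and offending characters are placeholders based on the description above:

import tensorflow as tf

# Hypothetical graph matching the saved checkpoint.
w = tf.get_variable('w', shape=[10])
saver = tf.train.Saver()

with tf.Session() as sess:
    # A path such as './runs/exp[1]/model.ckpt' failed to restore for the
    # author; renaming the directory and file to compliant names fixed it.
    saver.restore(sess, './runs/exp_1/model.ckpt')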
Is there any way to use tf.train.Checkpoint with MonitoredTrainingSession?



By : samn
Date : March 29 2020, 07:55 AM
I found a weird technique: call checkpoint.save with a temporary tf.Session first, to construct the save graph preliminarily (MonitoredTrainingSession finalizes the graph, so the save ops must already exist before it starts).
code :
# Assumes checkpoint = tf.train.Checkpoint(...) wrapping the model variables.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The first save with a plain session builds the save/restore ops.
    checkpoint.save("./parameter/sample")

with tf.train.MonitoredTrainingSession() as sess:
    # The graph is finalized here, but the save ops already exist.
    checkpoint.save(file_prefix="parameter/ckpt", session=sess)
How to restore variables of a particular scope from a saved checkpoint in TensorFlow?



By : JohnMD
Date : March 29 2020, 07:55 AM
Assume you have Google's InceptionNet model in scope InceptionV1, and you want to load all of it except the last layer, which lives in scope InceptionRetrained and which you want to retrain.
Assuming you already started retraining the last layer and created the last_layer.ckpt file via saver2.save(session, 'last_layer.ckpt'), here is how to restore the net from both checkpoints.
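A minimal sketch under the assumptions above (TF 1.x); 'inception_v1.ckpt' is a hypothetical filename for the pretrained checkpoint:

import tensorflow as tf

# Saver for the pretrained variables in scope InceptionV1 only.
saver1 = tf.train.Saver(
    tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='InceptionV1'))

# Saver for the retrained last layer in scope InceptionRetrained only.
saver2 = tf.train.Saver(
    tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='InceptionRetrained'))

with tf.Session() as session:
    saver1.restore(session, 'inception_v1.ckpt')  # hypothetical filename
    saver2.restore(session, 'last_layer.ckpt')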