monitor – Monitor Training of Neural Networks

class neural_monitor.monitor.Monitor[source]

Collects statistics and displays the results using various backends. The collected stats are stored in <root>/<model_name>/<prefix><#id> where #id is automatically assigned each time a new run starts.

The following snippet shows how to plot smoothed training losses and save images from the current iteration, and then display them every 100 iterations.

from neural_monitor import monitor as mon

# Tensorboard is turned on by default
mon.initialize(model_name='foo-model', print_freq=100, use_tensorboard=True)
...

def calculate_loss(pred, gt):
    ...
    training_loss = ...
    mon.plot('training loss', loss, smooth=.99, filter_outliers=True)

def calculate_acc(pred, gt):
    accuracy = ...
    mon.plot('training acc', accuracy, smooth=.99, filter_outliers=True)

...
for epoch in mon.iter_epoch(range(n_epochs)):
    for data in mon.iter_batch(data_loader):
        pred = net(data)
        calculate_loss(pred, gt)
        calculate_acc(pred, gt)
        mon.imwrite('input images', data['images'], latest_only=True)

    mon.dump('checkpoint.pt', {
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        ...
    }, method='torch', keep=5)  # keep only 5 latest checkpoints
...
current_folder

path to the current run.

writer

an instance of Tensorboard’s SummaryWriter when use_tensorboard is set to True.

plot_folder

path to the folder containing the collected plots.

file_folder

path to the folder containing the collected files.

image_folder

path to the folder containing the collected images.

hist_folder

path to the folder containing the collected histograms.

backup(files_or_folders: Union[str, List[str]], ignores: Union[str, List[str]] = None, includes: Union[str, List[str]] = None)[source]

saves a copy of the given files to current_folder. Accepts a str or list/tuple of file or folder names. You can backup your codes and/or config files for later use.

Parameters
  • files_or_folders – files or folders to be saved.

  • ignores – files or patterns to ignore. Default: None.

  • includes – files or patterns to include. Default: None.

Returns

None.

clear_hist_stats(key: Union[int, str, Tuple])[source]

removes the collected statistics for histogram plot of the specified key.

Parameters

key – the name of the histogram collection.

Returns

None.

clear_num_stats(key)[source]

removes the collected statistics for scalar plot of the specified key.

Parameters

key – the name of the scalar collection.

Returns

None.

dump(name: str, obj: Any, method: str = 'pickle', keep: int = - 1, **kwargs)[source]

saves the given object.

Parameters
  • name – name of the file to be saved.

  • obj – object to be saved.

  • method

    str or callable. If callable, it should be a custom method to dump object. There are 3 types of str.

    'pickle': use pickle.dump() to store object.

    'torch': use torch.save() to store object.

    'txt': use numpy.savetxt() to store object.

    Default: 'pickle'.

  • keep – the number of versions of the saved file to keep. Default: -1 (keeps only the latest version).

  • kwargs – additional keyword arguments to the underlying save function.

Returns

None.

dump_model(network, use_tensorboard=False, *args, **kwargs)[source]

saves a string representation of the given neural net.

Parameters
  • network – neural net to be saved as string representation.

  • use_tensorboard – use tensorboard to save network’s graph.

  • args – additional arguments to Tensorboard’s SummaryWriter() when use_tensorboard is True.

  • kwargs – additional keyword arguments to Tensorboard’s SummaryWriter() when ~se_tensorboard is True.

Returns

None.

dump_rep(name, obj)[source]

saves a string representation of the given object.

Parameters
  • name – name of the txt file containing the string representation.

  • obj – object to saved as string representation.

Returns

None.

property epoch: int

returns the current epoch.

Returns

_last_epoch.

flush()[source]

executes all the scheduled plots. Do not call this if using Monitor’s context manager mode.

Returns

None.

hist(name, value: Union[torch.Tensor, numpy.ndarray], n_bins: int = 20, latest_only: bool = False, **kwargs)[source]

schedules a histogram plot of (a batch of) points. A matplotlib figure will be rendered and saved every print_freq iterations.

Parameters
  • name – name of the figure to be saved. Must be unique among plots.

  • value – any-dim tensor to be histogrammed.

  • n_bins – number of bins of the histogram.

  • latest_only – whether to save only the latest statistics or keep everything from beginning.

  • kwargs – additional options to tensorboard

Returns

None.

property hist_stats

returns the collected tensors from beginning.

Returns

_hist_since_beginning.

imwrite(name: str, value: Union[torch.Tensor, numpy.ndarray], latest_only: Optional[bool] = False, **kwargs)[source]

schedules to save images. The images will be rendered and saved every print_freq iterations. There are some assumptions about input data:

  • If the input is 'uint8' it is an 8-bit image.

  • If the input is 'float32', its values lie between 0 and 1.

  • If the input has 3 dims, the shape is [h, w, 3] or [h, w, 1].

  • If the channel dim is different from 3 or 1, it will be considered as multiple gray images.

Parameters
  • name – name of the figure to be saved. Must be unique among plots.

  • value – 2D, 3D or 4D tensor to be plotted. The expected shape is (H, W) for 2D tensor, (H, W, C) for 3D tensor and (N, C, H, W) for 4D tensor. If the number of channels is other than 3 or 1, each channel is saved as a gray image.

  • latest_only – whether to save only the latest statistics or keep everything from beginning.

  • kwargs – additional options to tensorboard.

Returns

None.

initialize(model_name: Optional[str] = None, root: Optional[str] = None, current_folder: Optional[str] = None, print_freq: Optional[int] = 1, num_iters: Optional[int] = None, prefix: Optional[str] = 'run', use_tensorboard: Optional[bool] = True, with_git: Optional[bool] = False)None[source]
Parameters
  • model_name – name of the experimented model. Default: 'my-model'.

  • root – root path to store results. Default: 'results'.

  • current_folder – the folder that the experiment is currently dump to. Note that if current_folder already exists, all the contents will be loaded. This option can be used for loading a trained model. Default: None.

  • print_freq – frequency of stdout. Default: 1.

  • num_iters – number of iterations per epoch. If not provided, it will be calculated after one epoch. Default: None.

  • prefix – a common prefix that is shared between folder names of different runs. Default: 'run'.

  • use_tensorboard – whether to use Tensorboard. Default: True.

  • with_git – whether to retrieve some Git information. Should be used only when the project is initialized with Git. Default: False.

Returns

None.

property iter: int

returns the current iteration.

Returns

_iter.

iter_batch(iterator: Iterable)Any[source]

tracks training iteration and returns the item in iterator.

Parameters

iterator – the batch iterator. For e.g., enumerator(loader).

Returns

a generator over iterator.

>>> from neuralnet_pytorch import monitor as mon
>>> mon.print_freq = 1000
>>> data_loader = ...
>>> num_epochs = 10
>>> for epoch in mon.iter_epoch(range(num_epochs)):
...     for idx, data in mon.iter_batch(enumerate(data_loader)):
...         # do something here

iter_epoch()

iter_epoch(iterator: Iterable)Any[source]

tracks training epoch and returns the item in iterator.

Parameters

iterator – the epoch iterator. For e.g., range(num_epochs).

Returns

a generator over iterator.

>>> from neuralnet_pytorch import monitor as mon
>>> mon.print_freq = 1000
>>> num_epochs = 10
>>> for epoch in mon.iter_epoch(range(mon.epoch, num_epochs))
...     # do something here

iter_batch()

load(file: str, method: str = 'pickle', version: int = - 1, **kwargs)[source]

loads from the given file.

Parameters
  • file – name of the saved file without version.

  • method

    str or callable. If callable, it should be a custom method to load object. There are 3 types of str.

    'pickle': use pickle.dump() to store object.

    'torch': use torch.save() to store object.

    'txt': use numpy.savetxt() to store object.

    Default: 'pickle'.

  • version – the version of the saved file to load. Default: -1 (loads the latest version of the saved file).

  • kwargs – additional keyword arguments to the underlying load function.

Returns

None.

property model_name: str

returns the name of the model.

Returns

_model_name.

property num_stats

returns the collected scalar statistics from beginning.

Returns

_num_since_beginning.

plot(name: str, value: Union[torch.Tensor, numpy.ndarray, float], smooth: Optional[float] = 0, filter_outliers: Optional[bool] = True, **kwargs)[source]

schedules a plot of scalar value. A matplotlib figure will be rendered and saved every print_freq iterations.

Parameters
  • name – name of the figure to be saved. Must be unique among plots.

  • value – scalar value to be plotted.

  • smooth – a value between 0 and 1 to define the smoothing window size. See smooth(). Default: 0.

  • filter_outliers – whether to filter out outliers in plot. This affects only the plot and not the raw statistics. Default: True.

  • kwargs – additional options to tensorboard.

Returns

None.

plot_matrix(name: str, value: Union[torch.Tensor, numpy.ndarray, float], labels: Union[List[str], List[List[str]]] = None, show_values: bool = False)[source]

plots the given matrix with colorbar and labels if provided.

Parameters
  • name – name of the figure to be saved. Must be unique among plots.

  • value – matrix value to be plotted.

  • labels – labels of each axis. Can be a list/tuple of strings or a nested list/tuple. Defaults: None.

  • show_values – show values of the matrix

Returns

None.

property prefix: str

returns the prefix of saved folders.

Returns

_prefix.

read_log()[source]

reads the saved log file.

Returns

contents of the log file.

reset()[source]

factory-resets the monitor object. This includes clearing all the collected data, set the iteration and epoch counters to 0, and reset the timer.

Returns

None.

scatter(name: str, value: Union[torch.Tensor, numpy.ndarray], latest_only: bool = False, **kwargs)[source]

schedules a scattor plot of (a batch of) points. A 3D matplotlib figure will be rendered and saved every print_freq iterations.

Parameters
  • name – name of the figure to be saved. Must be unique among plots.

  • value – 2D or 3D tensor to be plotted. The last dim should be 3.

  • latest_only – whether to save only the latest statistics or keep everything from beginning.

  • kwargs – additional options to tensorboard.

Returns

None.

neural_monitor.monitor.collect_tracked_variables(name=None, return_name=False)[source]

Gets tracked variable given name.

Parameters
  • name – name of the tracked variable. can be str or``list``/tuple of str``s. If ``None, all the tracked variables will be returned.

  • return_name – whether to return the names of the tracked variables.

Returns

the tracked variables.

neural_monitor.monitor.get_tracked_variables()Dict[source]

Retrieves the values of tracked variables.

Returns

a dictionary containing the values of tracked variables associated with the given names.

neural_monitor.monitor.track(name: str, x: Union[torch.Tensor, torch.nn.Module], direction: Optional[str] = None)Union[torch.Tensor, torch.nn.Module][source]

An identity function that registers hooks to track the value and gradient of the specified tensor.

Here is an example of how to track an intermediate output

from neural_monitor import track, get_tracked_variables
import nueralnet_pytorch as nnt

input = ...
conv1 = track('op', nn.Conv2d(dim, 4, 3), 'all')
conv2 = nn.Conv2d(4, 5, 3)
intermediate = conv1(input)
output = track('conv2_output', conv2(intermediate), 'all')
loss = T.sum(output ** 2)
loss.backward(retain_graph=True)
d_inter = T.autograd.grad(loss, intermediate, retain_graph=True)
d_out = T.autograd.grad(loss, output)
tracked = get_tracked_variables()

testing.assert_allclose(tracked['conv2_output'], nnt.utils.to_numpy(output))
testing.assert_allclose(np.stack(tracked['grad_conv2_output']), nnt.utils.to_numpy(d_out[0]))
testing.assert_allclose(tracked['op'], nnt.utils.to_numpy(intermediate))
for d_inter_, tracked_d_inter_ in zip(d_inter, tracked['grad_op_output']):
    testing.assert_allclose(tracked_d_inter_, nnt.utils.to_numpy(d_inter_))
Parameters
  • name – name of the tracked tensor.

  • x – tensor or module to be tracked. If module, the output of the module will be tracked.

  • direction

    there are 4 options

    None: tracks only value.

    'forward': tracks only value.

    'backward': tracks only gradient.

    'all': tracks both value and gradient.

    Default: None.

Returns

x.