Description
tl;dr: SQLite will replace logs.json
Our current implementation
We use a Logger object that stores data as lists of "values" associated with "keys" in a Python dictionary. This dictionary lives in RAM. At the end of a train or eval epoch, Logger creates/overwrites a logs.json file in the experiment directory.
logs/myexperiment/logs.json
{
  "train_epoch.epoch": [0, 1, 2, 3, 4, 5],
  "train_epoch.acc_top1": [0.0, 5.7, 13.8, 20.4, 28.1, 37.9]
}
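For reference, the current behaviour is roughly the following (a minimal sketch, not the actual Logger API; method names are illustrative):

import json

class Logger:
    def __init__(self, path):
        self.path = path  # e.g. logs/myexperiment/logs.json
        self.values = {}  # everything lives in RAM until flush()

    def log(self, key, value):
        self.values.setdefault(key, []).append(value)

    def flush(self):
        # rewrites the whole file at each train/eval epoch boundary
        with open(self.path, 'w') as f:
            json.dump(self.values, f)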
Its problems
- If the code crashes before a flush, the data is lost, and we specifically want to use Logger to monitor things such as CPU and GPU memory usage right before a crash!
- We rewrite the full JSON file every time a new value is added.
- We reload the full JSON file every time we want to visualize anything.
Our constraints
- We want to keep our logs in the experiment directory (no SQL/NoSQL database servers; SQLite maybe?).
- We want to write new values only (for instance, at epoch 10 we write only the values of epoch 10).
- We want concurrent reads and writes (at least on different keys).
Some proposals
The following tools store the data on the file system (not in RAM).
h5py (one HDF5 file)
logs/myexperiment/logs.h5
Pros:
- Uses numpy
- Easy to access:
data['train_epoch.epoch'][10]
Cons:
- Extendible datasets (when you don't specify the size up front) seem to require an explicit "resize" before each append; see the sketch after this list.
- We encountered a lot of bugs in the past with HDF5 when reading or writing from multiple threads/processes.
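For the record, appending to an extendible dataset looks roughly like this (a sketch; the dataset name, dtype and file name are illustrative):

import h5py

with h5py.File('logs/myexperiment/logs.h5', 'a') as f:
    if 'train_epoch.acc_top1' not in f:
        # maxshape=(None,) is what makes the dataset extendible
        f.create_dataset('train_epoch.acc_top1', shape=(0,),
                         maxshape=(None,), dtype='f8')
    dataset = f['train_epoch.acc_top1']
    dataset.resize((dataset.shape[0] + 1,))  # explicit resize before each append
    dataset[-1] = 37.9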
LMDB
logs/myexperiment/logs/train_epoch.epoch.lmdb
Pros:
Cons:
- Cumbersome to use (keys and values are raw bytes); see the sketch below.
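To illustrate the point (a sketch with illustrative names): every scalar has to be packed to bytes and unpacked by hand.

import lmdb
import struct

env = lmdb.open('logs/myexperiment/logs/train_epoch.epoch.lmdb')
with env.begin(write=True) as txn:
    # key = epoch index, value = metric; both packed to bytes manually
    txn.put(struct.pack('>q', 10), struct.pack('>d', 37.9))
with env.begin() as txn:
    value = struct.unpack('>d', txn.get(struct.pack('>q', 10)))[0]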
netCDF
logs/myexperiment/logs.nc
One CSV (or binary file) per key
logs/myexperiment/logs/train_epoch.epoch.csv
Pros:
- Very easy to understand and track
Cons:
- Creates one file per tracked variable
- Associating different variables for the same time step requires reading different files and aligning them
- Tedious to implement robustly (we would be reinventing the wheel); see the sketch below.
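Appending itself would be trivial (a sketch; the file layout is illustrative); the hard part is all the bookkeeping around it:

import csv

# append-only: one row per new value, nothing is rewritten
with open('logs/myexperiment/logs/train_epoch.acc_top1.csv', 'a', newline='') as f:
    csv.writer(f).writerow([10, 37.9])  # (epoch, value)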
SQLite
logs/myexperiment/logs.sqlite
Pros:
- Can grow big enough for our logs
- Allows easy concurrent reads and writes
- Caching system (TODO: source)
- Binary encoding
- Indexing (easy to read only what we want)
- Meta-data: timestamp, epoch_id, iteration_id
- Fault-tolerant (committed values survive a crash)
Cons:
- Requires a library to read it, and users must know SQL for custom queries/applications (we could add a wrapper over SQLite in Logger; see the sketch below)
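Such a wrapper could look roughly like this (a sketch; the logs table schema is an assumption, not a decided design):

import sqlite3

conn = sqlite3.connect('logs/myexperiment/logs.sqlite')
conn.execute('CREATE TABLE IF NOT EXISTS logs ('
             'key TEXT, step INTEGER, value REAL, '
             'timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)')
conn.execute('CREATE INDEX IF NOT EXISTS idx_key ON logs (key, step)')

def log(key, step, value):
    # only the new value is written, and it is committed immediately
    with conn:  # commits on success, rolls back on error
        conn.execute('INSERT INTO logs (key, step, value) VALUES (?, ?, ?)',
                     (key, step, value))

log('train_epoch.acc_top1', 10, 37.9)

Reading back a single series is then one indexed query, e.g. SELECT value FROM logs WHERE key = 'train_epoch.acc_top1' ORDER BY step.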
Comparing experiments in SQLite
import sqlite3

# one connection per experiment (schema as sketched above)
databases = [sqlite3.connect(f'logs/{experiment}/logs.sqlite')
             for experiment in all_experiments]
for experiment, database in zip(all_experiments, databases):
    for metric in list_of_metrics:
        # SQLite computes the aggregates; pages may already be in its cache
        min_metric, max_metric = database.execute(
            'SELECT MIN(value), MAX(value) FROM logs WHERE key = ?',
            (metric,)).fetchone()
        # ... agglomerate min_metric / max_metric across experiments in Python