alegrigoriev/onenote2xml

py1note package

The package contains Python (3.9+) modules for reading and parsing Microsoft OneNote files (.one sections and .onetoc2 notebooks).

Contents

All OneNote parser code is in ONE directory and its subdirectories.

Command line applications are provided to invoke the parser.

Command line applications

The following command line applications are provided:

parse1note.py loads a OneNote file and dumps all its structures to the log file.

1note2xml.py command line application generates an XML file from the provided OneNote section or notebook file.

1note2json.py command line application generates a JSON file from the provided OneNote section or notebook file.

versions2git.sh Bash shell script converts the series of revision directories (generated with the --all-revisions --output-directory <directory> options) into a Git repository branch.

parse1note.py

parse1note.py application is invoked with the following command line:

python parse1note.py <OneNote filename> [common options] [--raw]

The following option is specific to parse1note.py only:

--raw (-w)

  • Dump raw structures to the log file. The default is to dump pretty decoded attributes and objects.

1note2xml.py

1note2xml.py application is invoked with the following command line:

python 1note2xml.py <OneNote filename> [common options]

1note2json.py

1note2json.py application is invoked with the following command line:

python 1note2json.py <OneNote filename> [common options]
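The invocations above can also be driven from Python. The following is a minimal sketch, assuming the scripts are run from the repository directory. It uses only options described under "Common options". The helper names `build_command` and `run_converter`, and the file names `Notes.one` and `Notes.xml`, are hypothetical and not part of the package.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(script: str, source: str, *, output: Optional[str] = None,
                  log: Optional[str] = None, all_revisions: bool = False,
                  verbose: Optional[int] = None) -> List[str]:
    """Assemble an argument list from options documented in this README."""
    cmd = [sys.executable, script, source]
    if output is not None:
        cmd += ["--output", output]
    if log is not None:
        cmd += ["--log", log]
    if all_revisions:
        cmd.append("--all-revisions")
    if verbose is not None:
        cmd += ["--verbose", str(verbose)]
    return cmd


def run_converter(cmd: List[str]) -> None:
    """Run the converter and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder file names; replace them with a real section file.
    cmd = build_command("1note2xml.py", "Notes.one", output="Notes.xml", verbose=2)
    print(" ".join(cmd))
    # run_converter(cmd)  # uncomment to actually invoke the script
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.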

versions2git.sh

versions2git.sh Bash command shell script is invoked with the following command line:

versions2git.sh <versions directory> <Git repository root> <branch name>

<versions directory> is a directory created by 1note2xml.py or 1note2json.py with the --all-revisions --output-directory <directory> options.

The versions2git.sh script creates a new, independent branch <branch name> (starting from its own initial commit) in <Git repository root>. The branch should not already exist in the repository.

Common options

These common options don't apply to versions2git.sh script.

--log <log filename> (-L <log filename>) option gives the name of the file to which the parser log is written.

--output <filename> (-O <filename>)

  • the name of the XML or JSON file to write. The file will contain the most recent revision of all pages stored in the source OneNote file. To produce a complete file with all revisions, add the --all-revisions command line option.

--all-revisions (-A)

  • include all page revisions in the generated file, not just the most recent versions.

--combine-revisions <minutes> (-c <minutes>)

  • maximum interval, from the first revision to the last, within which separate edits are combined into a single version. This reduces the number of insignificant revisions.

    Only revisions edited by the same author are combined. By default, revisions are not combined.

    If the option is given without a <minutes> value, 600 minutes (10 hours) is used, which covers a single workday.
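The combining rule can be illustrated with a short sketch. This is not the package's implementation. It assumes each revision has an author and a timestamp, and that only consecutive revisions by the same author are grouped. The `Revision` class and the `combine_revisions` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Revision:
    author: str
    timestamp: datetime


def combine_revisions(revisions: List[Revision], minutes: int = 600) -> List[List[Revision]]:
    """Group consecutive same-author revisions whose first-to-last span fits the window."""
    window = timedelta(minutes=minutes)
    groups: List[List[Revision]] = []
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        current = groups[-1] if groups else None
        if (current is not None
                and current[0].author == rev.author
                # The interval is measured from the FIRST revision in the group.
                and rev.timestamp - current[0].timestamp <= window):
            current.append(rev)
        else:
            groups.append([rev])
    return groups
```

Measuring the interval from the first revision of a group, rather than from the previous one, keeps a long series of closely spaced edits from growing into a single unbounded version.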

--include-oids (-o)

  • tag all structures with object IDs (extended GUIDs) in the generated files. This allows the generated elements to be matched against the raw object contents in the log file. It is only useful for debugging the OneNote file structure.

--list-revisions option prints a list of revisions of this OneNote section file to standard output.

--verbose <verbosity> (-v <verbosity>) sets the level of detail written to the generated XML and JSON files.

The following verbosity levels are defined:

0 - only objects and attributes relevant for content and history parsing. Rich text objects are converted from separate text run index and style arrays, and the text string, to a single array of text run elements. Empty text objects and outlines are dropped.
1 - only objects and attributes relevant for content and history parsing. Rich text objects are left as is.
2 - page layout attributes are included.
3 - some extra author and timestamp attributes included.
4 - all objects and attributes, except for those with undocumented IDs.
5 - all objects and attributes, including those with undocumented IDs.
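The rich text conversion performed at verbosity level 0 can be sketched as follows. This is an illustration only. It assumes that the run index array holds the start offset of the second and each subsequent run, and that there is one style per run. The function name `to_text_runs` and the `text`/`style` keys are hypothetical and do not reflect the actual element names in the generated files.

```python
from typing import Any, Dict, List


def to_text_runs(text: str, run_starts: List[int],
                 styles: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge a text string, run boundaries and per-run styles into one list of runs.

    Assumption: run_starts holds the start offset of the second and every
    subsequent run, so len(styles) == len(run_starts) + 1.
    """
    if len(styles) != len(run_starts) + 1:
        raise ValueError("expected exactly one style per text run")
    # Add the implicit start of the first run and the end of the last run.
    bounds = [0, *run_starts, len(text)]
    return [
        {"text": text[start:end], "style": style}
        for start, end, style in zip(bounds, bounds[1:], styles)
    ]
```

A single array of runs keeps each piece of text next to its style, which is easier to read and to diff than three parallel arrays.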

--output-directory <directory> (-R <directory>)

  • the name of a directory in which all pages of the OneNote section (a .one file) are written as separate .xml or .json files, one per page. Each page file is named according to its persistent GUID. This option is not applicable to parse1note.py, and cannot be used with .onetoc2 files. The directory also contains an index.txt file, which lists all pages by filename and title. Lines in the index.txt file are indented with TAB characters according to the page level.

    Conflict pages will be generated as separate files.

    The program gives a warning if the directory exists and is not empty.

    By default, only the most recent version is written. To save all versions, add --all-revisions command line option. The versions will be saved as separate directories, named with the version timestamp.

    If the --all-revisions command line option is present, the root of the output directory will contain a versions.txt file, which describes all version metadata.
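The TAB indentation of index.txt makes the page hierarchy easy to recover. Below is a minimal sketch, assuming only what is stated above: one page per line, with leading TAB characters indicating the page level. How the filename and title are separated within a line is not specified here, so the rest of the line is kept as an opaque string. The function name `parse_index` is hypothetical.

```python
from typing import List, Tuple


def parse_index(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (level, entry) pairs, where level is the number of leading TABs."""
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        stripped = line.lstrip("\t")
        entries.append((len(line) - len(stripped), stripped))
    return entries
```

Typical usage would be to open the file in the output directory and pass the file object's lines, for example `parse_index(open(path, encoding="utf-8").readlines())`. The encoding is an assumption.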

--timestamp <revision timestamp> (-T <timestamp>)

  • selects a specific revision snapshot to write to the directory specified by --output-directory (-R), or as a single JSON or XML file specified by --output (-O) option. <revision timestamp> values can be obtained from the list produced by --list-revisions command line option. By default, in absence of --all-revisions option, the most recent revision snapshot is generated.

--incremental (-i) option modifies the behavior of --output-directory with --all-revisions, so that only modified files are written to the version directories. Without this option, each version directory contains a full snapshot of the whole OneNote section.
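The difference between a full snapshot and an incremental one can be sketched as a comparison of two consecutive versions. This is an illustration of the idea, not the tool's actual logic. It assumes each version is held as a mapping from page file name to file contents. How the tool detects changes, or records deleted pages, is not described here. The function name `changed_files` is hypothetical.

```python
from typing import Dict


def changed_files(previous: Dict[str, bytes], current: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return the entries of `current` that are new or differ from `previous`."""
    return {
        name: data
        for name, data in current.items()
        if previous.get(name) != data
    }
```

With a full snapshot, every entry of `current` would be written to the version directory. With an incremental snapshot, only the result of `changed_files` would be.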
