The package contains Python (3.9+) modules for reading and parsing Microsoft OneNote files (.one sections and .onetoc2 notebooks).
All OneNote parser code is in ONE directory and its subdirectories.
The following command line applications are provided to invoke the parser:
parse1note.py loads a OneNote file and dumps all its structures to the log file.
1note2xml.py command line application generates an XML file from the provided OneNote section or notebook file.
1note2json.py command line application generates a JSON file from the provided OneNote section or notebook file.
versions2git.sh Bash shell script converts the series of revision directories (generated by the --all-revisions --output-directory <directory> options) into a Git repository branch.
parse1note.py application is invoked with the following command line:
python parse1note.py <OneNote filename> [common options] [--raw]
The following option applies to parse1note.py only:
--raw (-w) - dump raw structures to the log file. The default is to dump pretty-decoded attributes and objects.
1note2xml.py application is invoked with the following command line:
python 1note2xml.py <OneNote filename> [common options]
1note2json.py application is invoked with the following command line:
python 1note2json.py <OneNote filename> [common options]
versions2git.sh Bash shell script is invoked with the following command line:
versions2git.sh <versions directory> <Git repository root> <branch name>
versions directory is a directory created by 1note2xml.py or 1note2json.py with the --all-revisions --output-directory <directory> options.
The versions2git.sh script creates a new, separate branch <branch name> (starting from its own initial commit) in <Git repository root>. The branch should not already exist in the repository.
The common options described below do not apply to the versions2git.sh script.
--log <log filename> (-L <log filename>) - the file name to write the parser log.
--output <filename> (-O <filename>) - the file name to write the XML or JSON file.
The file will contain the most current revision of all pages stored in the source OneNote file.
To produce a complete file with all revisions, add the --all-revisions command line option.
--all-revisions (-A) - include all page revisions in the generated file, not just the most recent versions.
--combine-revisions <minutes> (-c <minutes>) - maximum interval from the first to the last revision within which separate edits are combined into a single version, to reduce the number of insignificant revisions.
Only revisions edited by the same author are combined. By default, revisions are not combined.
If the option is provided without the <minutes> specifier, it means 600 minutes (10 hours), which covers a single workday.
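The combining rule can be illustrated with a short sketch. This is not the package's implementation: the Revision record and the combine_revisions function are hypothetical names, and the sketch assumes that only consecutive revisions by the same author are merged, with the interval measured from the first revision of each group.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Revision:
    # Hypothetical record; the package's own revision objects may differ.
    author: str
    timestamp: datetime


def combine_revisions(revisions, minutes=600):
    """Group consecutive same-author revisions into combined versions.

    A group is closed when the author changes, or when the next revision
    falls more than `minutes` after the first revision of the group.
    """
    max_span = timedelta(minutes=minutes)
    groups = []
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        current = groups[-1] if groups else None
        if (current
                and current[0].author == rev.author
                and rev.timestamp - current[0].timestamp <= max_span):
            current.append(rev)
        else:
            groups.append([rev])
    return groups
```

With the default of 600 minutes, two edits by one author made 30 minutes apart fall into one group, while an edit by another author in between starts a new group.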
--include-oids (-o) - tag all structures with object IDs (extended GUIDs) in the generated files. This makes it possible to match the generated elements against the raw object contents in the log file. It is only useful for debugging the OneNote file structure.
--list-revisions option writes a list of revisions of this OneNote section file to the standard output.
--verbose <verbosity> (-v <verbosity>) sets the level of detail written to the generated XML and JSON files.
The following verbosity levels are defined:
0 - only objects and attributes relevant for content and history parsing. Rich text objects are converted from separate text run index and style arrays, and the text string, to a single array of text run elements. Empty text objects and outlines are dropped.
1 - only objects and attributes relevant for content and history parsing. Rich text objects are left as is.
2 - page layout attributes are included.
3 - some extra author and timestamp attributes included.
4 - all objects and attributes, except for those with undocumented IDs.
5 - all objects and attributes, including those with undocumented IDs.
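The rich text conversion performed at verbosity level 0 can be pictured with the sketch below. It is an illustration only, not the parser's code: merge_text_runs is a hypothetical name, and it assumes that the run index array holds the start offset of the second and each later run, with one style entry per run.

```python
def merge_text_runs(text, run_starts, styles):
    """Return a single list of text run elements.

    Assumptions (not confirmed by the package): `run_starts` holds the
    start offset of the second and each later run, and `styles` has
    exactly one entry per run.
    """
    # Run boundaries: start of text, each run start, end of text.
    bounds = [0, *run_starts, len(text)]
    if len(styles) != len(bounds) - 1:
        raise ValueError("expected one style per text run")
    return [
        {"text": text[start:end], "style": style}
        for start, end, style in zip(bounds, bounds[1:], styles)
    ]
```

The output keeps each piece of text next to its style, which is easier to read and compare than three parallel arrays.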
--output-directory <directory> (-R <directory>) - a directory name for writing all pages of the OneNote section (a .one file) as separate .xml or .json files, one per page, in the given directory. Each page file is named according to its persistent GUID. This option is not applicable to parse1note.py, and cannot be used for a .onetoc2 file. The directory also contains an index.txt file, which lists all pages by filename and their titles. Lines in the index.txt file are indented with TAB characters according to the page level. Conflict pages will be generated as separate files.
The program gives a warning if the directory exists and is not empty.
By default, only the most recent version is written. To save all versions, add the --all-revisions command line option. The versions will be saved as separate directories, named with the version timestamp. If the --all-revisions command line option is present, the root of the output directory will contain a versions.txt file, which describes all version metadata.
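Because index.txt encodes the page level as leading TAB characters, the page hierarchy can be recovered by counting them. The sketch below is a hypothetical helper (read_index is not part of the package). It leaves the rest of each line unsplit, since the exact layout of the filename and title on a line is not specified here.

```python
def read_index(lines):
    """Yield (level, entry) pairs from the lines of an index.txt file.

    `level` is the number of leading TAB characters. `entry` is the
    remainder of the line, left as is because the filename/title layout
    is an unknown in this sketch.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        stripped = line.lstrip("\t")
        yield len(line) - len(stripped), stripped
```

An open text file object can be passed directly as `lines`, since iterating over it yields one line at a time.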
--timestamp <revision timestamp> (-T <timestamp>) - selects a specific revision snapshot to write to the directory specified by --output-directory (-R), or as a single JSON or XML file specified by the --output (-O) option. <revision timestamp> values can be obtained from the list produced by the --list-revisions command line option. By default, in the absence of the --all-revisions option, the most recent revision snapshot is generated.
--incremental (-i) option modifies the behavior of --output-directory with --all-revisions, making it write only modified files to the version directories.
Without this option, each version directory contains the full snapshot of the whole OneNote section.
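The options above can be combined when driving the converters from another Python script. The following is a minimal sketch: build_export_command and run_export are hypothetical helper names, and the check that --incremental is only used together with --all-revisions is this sketch's own choice, not necessarily something the tools enforce.

```python
import subprocess
import sys


def build_export_command(script, source, out_dir, all_revisions=False,
                         incremental=False, combine_minutes=None):
    """Build an argument list for 1note2xml.py or 1note2json.py."""
    cmd = [sys.executable, script, source, "--output-directory", out_dir]
    if all_revisions:
        cmd.append("--all-revisions")
    if incremental:
        # --incremental modifies the --all-revisions output, so this
        # sketch rejects it when --all-revisions is not requested.
        if not all_revisions:
            raise ValueError("--incremental needs --all-revisions")
        cmd.append("--incremental")
    if combine_minutes is not None:
        cmd += ["--combine-revisions", str(combine_minutes)]
    return cmd


def run_export(*args, **kwargs):
    """Run the converter and raise if it exits with an error."""
    subprocess.run(build_export_command(*args, **kwargs), check=True)
```

For example, `run_export("1note2json.py", "notes.one", "out", all_revisions=True, incremental=True)` would write per-version directories under `out` that contain only the modified files; `notes.one` and `out` are placeholder names.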