@gboeing (Owner) commented Aug 14, 2025

This PR proposes a new module to read data from PBF files. It uses the osmium library to read PBF data and provides a familiar graph_from_pbf function for users. This is much faster than using the Overpass API and allows users to download data extracts locally for loading.
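
For context on what building a "familiar" graph from PBF data involves: once nodes and ways have been read, each way's consecutive node pairs become graph edges, in the same spirit as OSMnx's Overpass-based graph construction. Below is a minimal stdlib-only sketch of that step. It is not the PR's code: the function name `way_to_edges` is hypothetical, and its oneway handling is a simplified assumption rather than OSMnx's exact rules.

```python
# Hypothetical sketch: expand one OSM way into directed (u, v, way_id) edges.
# Simplified oneway handling; NOT OSMnx's exact rules.
ONEWAY_FORWARD = {"yes", "true", "1"}
ONEWAY_REVERSE = {"-1", "reverse"}


def way_to_edges(way_id, node_refs, tags, bidirectional=False):
    """Return (u, v, way_id) tuples for consecutive node pairs in a way."""
    pairs = list(zip(node_refs[:-1], node_refs[1:]))
    oneway = tags.get("oneway", "").lower()
    forward = [(u, v, way_id) for u, v in pairs]
    backward = [(v, u, way_id) for u, v in pairs]
    if bidirectional:
        # e.g., a walking network: ignore oneway tags entirely
        return forward + backward
    if oneway in ONEWAY_FORWARD:
        return forward
    if oneway in ONEWAY_REVERSE:
        # way is drawn against the direction of travel
        return backward
    # two-way street: add both directions
    return forward + backward


print(way_to_edges(7, [1, 2, 3], {"highway": "primary", "oneway": "yes"}))
# → [(1, 2, 7), (2, 3, 7)]
```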

Initially I felt we shouldn't provide any data filtering and should just load whatever data is in the user's PBF file. But we could provide some simple built-in filtering, with the expectation that users who need richer filtering will pre-filter their PBF file as they wish.

Here's a simple usage example, using Geofabrik's DC extract:

import osmnx as ox
filepath = "data/dc.osm.pbf"

# load PBF ways into graph with no filtering
G = ox.pbf.graph_from_pbf(filepath)
print(len(G))

# with simple filtering: only ways with highway=primary
tags = {"highway": ["primary"]}
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

Or you can load the state of Oregon's highway network in ~8 seconds:

filepath = "./data/oregon-latest.osm.pbf"
tags = {"highway": ["motorway", "motorway_link", "trunk", "trunk_link"]}
G = ox.pbf.graph_from_pbf(filepath, tags)

Or you can load all of Australia's highway network in ~45 seconds:

filepath = "./data/australia-latest.osm.pbf"
tags = {"highway": ["motorway", "motorway_link", "trunk", "trunk_link"]}
G = ox.pbf.graph_from_pbf(filepath, tags)

More simple filtering examples:

filepath = "data/dc.osm.pbf"
tags = ["highway"]
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

tags = {"railway": ["subway"], "highway": ["primary"]}
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

tags = {"railway": ["subway"], "highway": ["primary", "secondary"]}
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

tags = {"railway": ["subway"]}
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

tags = ["railway", "highway"]
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

tags = {"highway": ["motorway", "motorway_link", "trunk", "trunk_link", "primary", "secondary"]}
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

Gp = ox.projection.project_graph(G)
gdf_nodes, gdf_edges = ox.convert.graph_to_gdfs(Gp)
fig, ax = ox.plot.plot_graph(Gp, node_size=2)
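
The examples above suggest how `tags` behaves: a list keeps ways that have any of the listed keys (with any value), and a dict keeps ways whose value for a listed key is one of the listed values. Because a way is rarely both railway=subway and highway=primary, this sketch assumes that multiple keys are combined with OR. It is a hypothetical stdlib illustration of that matching rule (`way_matches` is an illustrative name, not the PR's API):

```python
def way_matches(way_tags, tags=None):
    """Return True if a way's tag dict passes the (hypothetical) tags filter."""
    if tags is None:
        # no filtering: keep every way
        return True
    if isinstance(tags, dict):
        # dict: any listed key whose value is among the listed values (OR across keys)
        return any(way_tags.get(key) in values for key, values in tags.items())
    # list: any listed key present, whatever its value
    return any(key in way_tags for key in tags)


# examples mirroring the tags used above
print(way_matches({"highway": "primary"}, {"highway": ["primary"]}))  # → True
print(way_matches({"railway": "subway"}, {"railway": ["subway"], "highway": ["primary"]}))  # → True
print(way_matches({"building": "yes"}, ["railway", "highway"]))  # → False
```

Note that this kind of rule can only include ways; it has no way to express exclusions such as "highway, but not motorways".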

I've tested projection, plotting, conversion, etc., and everything seems to work fine with the graphs created from PBF files.

You can also create a graph from Overpass with OSMnx, save to an OSM XML file, convert that to a PBF file, then load it back into OSMnx (as proof of concept only... this isn't a good workflow):

from pathlib import Path
import osmium
fp_xml = Path("./data/graph.osm")
fp_pbf = Path("./data/graph.pbf")
ox.settings.all_oneway = True
G = ox.graph.graph_from_address("Piedmont, California, USA", dist=300, network_type="drive", simplify=False)
ox.io.save_graph_xml(G, fp_xml)
if fp_pbf.is_file():
    fp_pbf.unlink()
with osmium.SimpleWriter(fp_pbf) as writer:
    for obj in osmium.FileProcessor(fp_xml):
        writer.add(obj)
filepath = "./data/graph.pbf"
G = ox.pbf.graph_from_pbf(filepath)
len(G)

Any comments, feedback, or testing would be much appreciated.

@codecov bot commented Aug 15, 2025

Codecov Report

❌ Patch coverage is 17.33333% with 62 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.28%. Comparing base (4afd7dd) to head (26f8fd7).

Files with missing lines | Patch % | Lines
osmnx/pbf.py | 16.21% | 62 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1338      +/-   ##
==========================================
- Coverage   98.57%   96.28%   -2.29%     
==========================================
  Files          25       26       +1     
  Lines        2591     2666      +75     
==========================================
+ Hits         2554     2567      +13     
- Misses         37       99      +62     


@shiqin-liu

This is impressive progress. Thanks for working on this feature!

I ran the feature locally, loading a sample Chicago metro OSM .pbf file, which took about 2.8 minutes. I also tested projection and routing, which also work as expected!

import osmnx as ox
import time

print(ox.__version__)
# 2.1.0.dev0

start_time = time.time()
filepath = "inputdata/osm_Chicago.pbf"
G = ox.pbf.graph_from_pbf(filepath, simplify=True)
print(f"Total processing time: {time.time() - start_time:.2f} seconds")
# Total processing time: 170.85 seconds

A quick thought/question on data filtering: the global-indicators software relies on the OSMnx default walk network-type filter to retrieve the pedestrian network (e.g., https://github.com/healthysustainablecities/global-indicators/blob/1ad567bfe93c56a0c93f6512a1c215f11f773dc7/process/subprocesses/_03_create_network_resources.py#L112). Many users might appreciate a default filter for convenience. With the current setup, suppose we want the OSMnx walk network from a .pbf file: what would be the best way to handle that? Does the current tags parameter act like a custom filter, so that we could pass a customized walk filter?

@carlhiggs commented Aug 19, 2025

Thanks again for your work on this, @gboeing. I agree with @shiqin-liu that the capacity to use a custom filter definition (or a preset typology) would be really useful. In the code @shiqin-liu linked to, we actually use a custom filter (a couple of lines above the network type) that modifies the walk definition to remove the cycling restriction (defined here):

'["highway"]["area"!~"yes"]["highway"!~"motor|proposed|construction|abandoned|platform|raceway"]["foot"!~"no"]["service"!~"private"]["access"!~"private"]'

So having options like those that currently exist for graph_from_polygon (e.g., network_type and custom_filter), but optionally with a PBF file as the source, would be amazing.
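
One way to check whether a filter like the one above could be applied to ways read from a PBF is to evaluate it in Python after reading. The sketch below is hypothetical (`parse_filter` and `matches` are not OSMnx functions) and supports only a subset of Overpass QL tag clauses: `["k"]`, `["k"="v"]`, `["k"!="v"]`, `["k"~"re"]` and `["k"!~"re"]`. It uses Python's `re.search` to approximate Overpass's regex matching and, following Overpass semantics, lets a `!~` or `!=` clause pass when the key is absent.

```python
import re

# one bracketed clause: ["key"] optionally followed by an operator and a quoted value
CLAUSE = re.compile(r'\["([^"]+)"(?:(!?[=~])"([^"]*)")?\]')


def parse_filter(filter_str):
    """Split a filter string into (key, operator, value) triples."""
    leftover = CLAUSE.sub("", filter_str).strip()
    if leftover:
        raise ValueError(f"unsupported filter syntax: {leftover!r}")
    return CLAUSE.findall(filter_str)


def matches(way_tags, clauses):
    """Return True if a way's tags satisfy every clause (clauses are ANDed)."""
    for key, op, value in clauses:
        tag = way_tags.get(key)
        if op == "":
            ok = tag is not None  # key must be present
        elif op == "=":
            ok = tag == value
        elif op == "!=":
            ok = tag != value  # also passes when the key is absent
        elif op == "~":
            ok = tag is not None and re.search(value, tag) is not None
        else:  # "!~": passes when the key is absent or the value does not match
            ok = tag is None or re.search(value, tag) is None
        if not ok:
            return False
    return True


WALK_FILTER = (
    '["highway"]["area"!~"yes"]'
    '["highway"!~"motor|proposed|construction|abandoned|platform|raceway"]'
    '["foot"!~"no"]["service"!~"private"]["access"!~"private"]'
)
clauses = parse_filter(WALK_FILTER)
print(matches({"highway": "cycleway"}, clauses))  # → True (cycleways kept)
print(matches({"highway": "motorway_link"}, clauses))  # → False
```

Applied as a post-filter, this would still read every way from the file first; pre-filtering the PBF remains the faster option for large extracts.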

I installed this branch using uv, but it didn't install the osmium module automatically; I had to install it separately with uv pip install osmium. Not sure if that's relevant (it might be an issue on my side), but I'm mentioning it in case osmium needs to be flagged as a dependency.

D:\projects\repos\osmnx-pbf

>.venv\Scripts\activate.bat

(osmnx-pbf) [---] Tue 19/08/2025 15:19:59.72
D:\projects\repos\osmnx-pbf

>python
Python 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import osmnx as ox
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\__init__.py", line 26, in <module>
    from . import pbf as pbf
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\pbf.py", line 13, in <module>
    import osmium
ModuleNotFoundError: No module named 'osmium'
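
The traceback shows osmnx/__init__.py importing pbf, which imports osmium at module load, so `import osmnx` fails in any environment without osmium. Besides declaring osmium as a (possibly optional) dependency, one common pattern is to defer the import until a PBF function is actually called. A minimal sketch, where the helper name `import_optional` is hypothetical:

```python
import importlib
from types import ModuleType


def import_optional(name: str, hint: str = "") -> ModuleType:
    """Import a module on first use, raising a clearer ImportError if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        msg = f"The {name!r} package is required for this function but is not installed. {hint}"
        raise ImportError(msg.strip()) from exc
```

Inside a PBF-reading function, a line such as `osmium = import_optional("osmium", "Try: pip install osmium")` would then raise only when PBF loading is attempted, leaving the rest of the package importable.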

I also had an issue when I tried using this with our example Las Palmas PBF from the global-indicators project:

>>> import osmnx as ox
>>> filepath = 'data/example_las_palmas_2023_osm_20230221.pbf'
>>> G = ox.pbf.graph_from_pbf(filepath)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\pbf.py", line 189, in graph_from_pbf
    G = graph._create_graph(response_json, bidirectional)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\graph.py", line 663, in _create_graph
    G = distance.add_edge_lengths(G)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\distance.py", line 229, in add_edge_lengths
    raise ValueError(msg)
ValueError: Some edges missing nodes, possibly due to input data clipping issue.
>>>

I also tried an online Las Palmas extract from OpenStreetMap.fr. The example in our study (3.1 MB) is a bespoke clipped version, but I would have expected a more or less official extract to work. However, it failed in a similar way:

>>> filepath = 'data/las_palmas.osm.pbf'
>>> G = ox.pbf.graph_from_pbf(filepath)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\pbf.py", line 189, in graph_from_pbf
    G = graph._create_graph(response_json, bidirectional)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\graph.py", line 663, in _create_graph
    G = distance.add_edge_lengths(G)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\distance.py", line 229, in add_edge_lengths
    raise ValueError(msg)
ValueError: Some edges missing nodes, possibly due to input data clipping issue.
>>> G
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'G' is not defined

So clipped PBFs might cause issues, even when sourced from online re-publishers of OSM extracts. It's good that the problem is reported. However, rather than failing, it could be convenient to still return the graph object for use and warn the user about 'edge cases' where missing nodes on edges near the study region boundary could not be resolved, possibly due to data clipping issues.
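
The "warn instead of fail" behavior suggested above could look roughly like the following stdlib sketch. It is hypothetical, not how OSMnx currently behaves: it assumes ways arrive as `{way_id: [node_id, ...]}` and node coordinates as `{node_id: (lon, lat)}`, splits each way wherever a referenced node is missing from the extract, drops fragments shorter than two nodes, and emits a single warning with the count of unresolved references. Re-cutting the extract so that boundary ways are kept complete may be the cleaner fix where possible; this only illustrates the fallback.

```python
import warnings


def split_ways_at_missing_nodes(ways, node_coords):
    """Split ways at node refs with no coordinates; warn instead of raising."""
    kept = {}
    n_missing = 0
    for way_id, refs in ways.items():
        runs, current = [], []
        for ref in refs:
            if ref in node_coords:
                current.append(ref)
            else:
                # missing node (e.g., outside the clip): close the current run
                n_missing += 1
                if len(current) >= 2:
                    runs.append(current)
                current = []
        if len(current) >= 2:
            runs.append(current)
        if runs:
            kept[way_id] = runs
    if n_missing:
        warnings.warn(
            f"{n_missing} node reference(s) had no coordinates, possibly due to "
            "input data clipping; affected ways were split or dropped.",
            stacklevel=2,
        )
    return kept
```

For example, a way `[1, 2, 3, 4, 5]` whose node 3 lies outside the extract becomes the two fragments `[1, 2]` and `[4, 5]`, each of which can still become edges.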

I'll aim to look into this more tomorrow (e.g., whether it's possible to operationalise a custom pedestrian definition using the current functionality), but wanted to share this early feedback.

Thanks again for exploring this possibility. It will be very useful functionality!

Base automatically changed from v2.1 to main December 12, 2025 21:52