Contributor

@dinse dinse commented Jun 9, 2025

Rationale for this change

As mentioned in #46728, if Arrow C++ was built in debug mode but PyArrow wasn't, test_gdb.py runs tests that fail.

What changes are included in this PR?

The CMAKE_BUILD_TYPE environment variable is propagated from the build into PyArrow, where it is checked to decide whether to skip these unit tests.

Are these changes tested?

Yes. I have built PyArrow in release, debug, and relwithdebinfo modes and observed the new behavior. Because CMakeLists.txt was changed, I built PyArrow both via setup.py and via pip install, and checked the new function.

Are there any user-facing changes?

Devs may skip unit tests that would fail. PyArrow now has build_info() with information about the build type.

@dinse dinse requested review from AlenkaF, raulcd and rok as code owners June 9, 2025 22:31

github-actions bot commented Jun 9, 2025

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

@dinse dinse changed the title from "Arrow 46728 - Skip test_gdb.py tests if PyArrow wasn't built debug" to "GH-46728: [Python] Skip test_gdb.py tests if PyArrow wasn't built debug" Jun 9, 2025

github-actions bot commented Jun 9, 2025

⚠️ GitHub issue #46728 has been automatically assigned in GitHub to PR creator.

Member

@pitrou pitrou left a comment

Thanks for doing this. Here are a bunch of suggestions, but obviously this is a good fix.

Comment on lines 101 to 103
Returns the PyArrow build type (debug, minsizerel, release,
relwithdebinfo). The default build type is release, regardless of if C++
was built in debug mode.
Member

Can we keep using the summary line + longer description convention?

Contributor Author

Ok.

@@ -96,6 +96,15 @@ def is_threading_enabled() -> bool:
    return libarrow_python.IsThreadingEnabled()


def get_pybuild_type() -> str:
Member

The function name is awkward and restrictive. Perhaps we want a function def build_info() -> dict that returns various pieces of information about the build and, more importantly, that we can extend in the future?

Contributor Author

Ok, pa.build_info() now returns a namedtuple.
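
A minimal sketch of the namedtuple approach mentioned in this reply, written in plain Python rather than the PR's actual Cython code. The names `BuildInfoSketch` and `make_build_info` are hypothetical and the single-field layout is an assumption. The point is only that a namedtuple keeps attribute-style access while leaving room to add fields later.

```python
from collections import namedtuple

# Hypothetical stand-in for the structure returned by pa.build_info();
# the real field set in PyArrow may differ.
BuildInfoSketch = namedtuple("BuildInfoSketch", ("build_type",))


def make_build_info(raw_build_type: str) -> BuildInfoSketch:
    """Normalize a CMake-style build type string into the sketch tuple."""
    return BuildInfoSketch(build_type=raw_build_type.lower())


info = make_build_info("RelWithDebInfo")
print(info.build_type)
```

Adding another field later would only mean extending the field tuple, which is the extensibility the review comment asks for.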


# Write out compile-time configuration constants
string(TOLOWER ${CMAKE_BUILD_TYPE} LOWERCASE_PYBUILD_TYPE)
configure_file("${PYARROW_CPP_SOURCE_DIR}/pybuild_type.h.cmake" "${PYARROW_CPP_SOURCE_DIR}/pybuild_type.h" ESCAPE_QUOTES)
Member

Why not make this something like config.h.cmake? We may want to add other constants there in the future.

Contributor Author

Ok, there is now config.h/.cc instead of doing this in helpers, and config_internal.h.cmake.
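
For readers unfamiliar with how a `.h.cmake` template becomes a header, here is a rough Python emulation of the `@VAR@` substitution that CMake's `configure_file()` performs. It is only an illustration, not part of the PR: `configure_template` is a hypothetical helper, the `ESCAPE_QUOTES` option is ignored, and the macro name `PYARROW_BUILD_TYPE` assumes the rename suggested later in this review.

```python
import re


def configure_template(template: str, variables: dict) -> str:
    """Replace each @NAME@ placeholder with its value, roughly as
    CMake's configure_file() does for @VAR@ references."""
    def substitute(match):
        # Undefined variables are replaced with an empty string.
        return variables.get(match.group(1), "")
    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)@", substitute, template)


template = '#define PYARROW_BUILD_TYPE "@LOWERCASE_PYBUILD_TYPE@"\n'
cmake_build_type = "Debug"  # assumed value of CMAKE_BUILD_TYPE
header = configure_template(
    template, {"LOWERCASE_PYBUILD_TYPE": cmake_build_type.lower()})
print(header, end="")
```

Lowercasing before substitution mirrors the `string(TOLOWER ...)` call in the CMake snippet above, so the compiled-in constant matches the lowercase values the Python tests compare against.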

Comment on lines 503 to 509
namespace {
const std::string kPyBuildType = PYARROW_CYTHON_BUILD_TYPE;
}

std::string GetPyBuildType() {
  return kPyBuildType;
}
Member

The separate constant seems a bit pointless to me? This will return a copy anyway.

Suggested change
- namespace {
- const std::string kPyBuildType = PYARROW_CYTHON_BUILD_TYPE;
- }
-
- std::string GetPyBuildType() {
-   return kPyBuildType;
- }
+ std::string GetPyBuildType() {
+   return PYARROW_CYTHON_BUILD_TYPE;
+ }

Contributor Author

Fixed.

// specific language governing permissions and limitations
// under the License.

#define PYARROW_CYTHON_BUILD_TYPE "@LOWERCASE_PYBUILD_TYPE@"
Member

I would simply call it PYARROW_BUILD_TYPE, because this is not related to Cython.

Contributor Author

Fixed.

@@ -25,6 +25,7 @@
import pytest

import pyarrow as pa
from pyarrow.lib import get_pybuild_type
Member

You don't need to import it explicitly, just call pa.get_pybuild_type? (and make sure this is exposed in the top-level pyarrow namespace)

Contributor Author

Fixed. Exposed pa.build_info().

@@ -23,6 +23,7 @@

import pyarrow as pa
from pyarrow.lib import ArrowInvalid
from pyarrow.lib import get_pybuild_type
Member

Same here.

Contributor Author

Fixed.

@github-actions github-actions bot added the awaiting committer review label and removed the awaiting review label Jun 24, 2025
@@ -77,6 +78,27 @@ def runtime_info():
        detected_simd_level=frombytes(c_info.detected_simd_level))


PythonBuildInfo = namedtuple(
Member

Suggestion:

  1. rename the BuildInfo class above to CppBuildInfo
  2. rename this PythonBuildInfo class to BuildInfo
  3. add the CppBuildInfo as an attribute of BuildInfo

Contributor Author

Ok, did that.
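
The nesting suggested here can be sketched with two plain namedtuples. This is an illustrative assumption, not the PR's code: `CppInfo` and `PyInfo` are hypothetical names, and the real PyArrow tuples carry more fields than the single `build_type` shown.

```python
from collections import namedtuple

# Hypothetical, reduced versions of the two structures discussed above.
CppInfo = namedtuple("CppInfo", ("build_type",))
PyInfo = namedtuple("PyInfo", ("build_type", "cpp_build_info"))

# Arrow C++ and PyArrow can be built with different build types,
# which is the situation that made the gdb tests fail.
cpp_info = CppInfo(build_type="debug")
py_info = PyInfo(build_type="release", cpp_build_info=cpp_info)

print(py_info.build_type, py_info.cpp_build_info.build_type)
```

Keeping the C++ information as an attribute of the Python-level tuple lets one object describe both builds without merging their fields.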

@@ -94,6 +94,10 @@ def test_build_info():
    assert pa.cpp_build_info.build_type in (
        'debug', 'release', 'minsizerel', 'relwithdebinfo')

    assert isinstance(pa.build_info(), pa.PythonBuildInfo)
Member

Ah, I realize that pa.cpp_build_info is a simple attribute, not a function, so perhaps pa.build_info should be made an attribute as well?

Contributor Author

Ok, did so.

@@ -195,6 +195,8 @@ def gdb():
def gdb_arrow(gdb):
    if 'deb' not in pa.cpp_build_info.build_type:
        pytest.skip("Arrow C++ debug symbols not available")
    if pa.build_info().build_type != 'debug':
Contributor

Should we also allow relwithdebug as a value?

Member

Someone would have to try it out, but I think relwithdebug might optimize local variables away.

Contributor Author

@dinse dinse Jul 1, 2025

With relwithdebinfo, fewer tests fail than with release, but several still fail.
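
The skip decision discussed in this thread can be written as a small pure function, shown here as a sketch so it runs without PyArrow or pytest. `gdb_skip_reason` is a hypothetical helper, and the second skip message is an assumption since the diff above does not show it.

```python
from typing import Optional


def gdb_skip_reason(cpp_build_type: str, py_build_type: str) -> Optional[str]:
    """Return a skip message, or None when the gdb tests should run."""
    # Same substring check as the existing fixture: it matches both
    # 'debug' and 'relwithdebinfo'.
    if "deb" not in cpp_build_type:
        return "Arrow C++ debug symbols not available"
    # Stricter check for PyArrow: per the discussion, relwithdebinfo
    # still leaves several tests failing, so only 'debug' qualifies.
    if py_build_type != "debug":
        return "PyArrow not built in debug mode"  # assumed wording
    return None


for cpp, py in [("debug", "debug"), ("debug", "release"),
                ("release", "debug"), ("relwithdebinfo", "relwithdebinfo")]:
    print(cpp, py, gdb_skip_reason(cpp, py))
```

In a fixture, a non-None result would be passed to `pytest.skip()`, so the tests only run when both builds are suitable.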

@dinse dinse force-pushed the arrow-46728-gdb-skip branch from 2b22170 to 6990c34 July 2, 2025 23:54
Member

@pitrou pitrou left a comment

Thanks for the update! Still one issue remaining, otherwise I think we're good to go.

@@ -0,0 +1,29 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member

This file would be suitable if we actually had a libarrow_python_config shared library, but we don't, these APIs are part of libarrow_python. Therefore, these declarations should be moved to libarrow_python.pxd.

Contributor Author

I am new to Cython and, with a variety of approaches, kept getting ambiguous-overload compilation errors from the two GetBuildInfo functions. I added this dummy namespace .pxd to work around the naming conflict; the docs say:

"There won’t be any actual [libarrow_python_config] module at run time, but that doesn’t matter; the [libarrow_python_config].pxd file has done its job of providing an additional namespace at compile time."

Anyway, I agree it makes sense to keep things in libarrow_python if possible, and I've since found a way to do that.

Member

@pitrou pitrou left a comment

+1, will merge if CI is green. Thank you @dinse.

@pitrou pitrou force-pushed the arrow-46728-gdb-skip branch from 6ddbb1a to f9aa2da July 15, 2025 15:16
@pitrou pitrou merged commit 6c9e30b into apache:main Jul 15, 2025
12 of 13 checks passed
@pitrou pitrou removed the awaiting committer review label Jul 15, 2025
3 participants