Description
Describe the bug
When a pandas DataFrame is used as the base for a Feature Store dataset, a trailing slash in the output_path parameter causes objects to be uploaded to S3 with a double slash in their path. This conflicts with the definition of the temp table, which uses the correct location without the double slash, so the table does not see the uploaded data.
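One way such a mismatch can arise is sketched below. This is only an illustration, assuming the upload key is built by plain string concatenation; the SDK's actual internals are not shown in this report, and `naive_upload_uri`, the bucket name, and the file name are all hypothetical.

```python
def naive_upload_uri(output_path: str, file_name: str) -> str:
    # Hypothetical helper: joins with "/" without checking whether
    # output_path already ends with a slash.
    return f"{output_path}/{file_name}"


# Placeholder bucket and file name, not taken from the report.
output_path = "s3://my-bucket/fs-output/"
uri = naive_upload_uri(output_path, "base.csv")
print(uri)  # s3://my-bucket/fs-output//base.csv
```

S3 treats `fs-output//base.csv` and `fs-output/base.csv` as different keys, so a table defined over the single-slash location would not find the object.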
To reproduce
Try to create a dataset with code similar to:
df = feature_store.create_dataset(
    base=base_df,
    output_path="s3://sagemaker-us-east-1-<aact num>/fs-output/",  # <-- trailing slash
    event_time_identifier_feature_name="index_dt",
    record_identifier_feature_name="id",
).with_feature_group(
    feature_group, "id", ["my_feature_1"]
).to_dataframe()
The returned dataset is empty, regardless of whether there is a valid inner join target for rows of the base dataframe.
Expected behavior
The returned dataset includes all the values that matched the inner join. Ideally, the S3 upload path will not contain a double slash when output_path is specified with a trailing slash.
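Until a fix lands, a caller-side workaround could be to strip the trailing slash before passing output_path. The sketch below is a minimal illustration, not the SDK's implementation; `normalize_s3_prefix` and `join_s3_uri` are hypothetical helper names, and the bucket name is a placeholder.

```python
def normalize_s3_prefix(output_path: str) -> str:
    # Hypothetical helper: drop any trailing slashes so that a later
    # join with "/" cannot produce "//" in the object key.
    return output_path.rstrip("/")


def join_s3_uri(output_path: str, file_name: str) -> str:
    # Hypothetical helper: join a normalized prefix and a file name.
    return f"{normalize_s3_prefix(output_path)}/{file_name}"


with_slash = join_s3_uri("s3://my-bucket/fs-output/", "base.csv")
without_slash = join_s3_uri("s3://my-bucket/fs-output", "base.csv")
print(with_slash)     # s3://my-bucket/fs-output/base.csv
print(without_slash)  # s3://my-bucket/fs-output/base.csv
```

Both spellings of the prefix resolve to the same URI, which is the behavior the temp table definition appears to expect. Note that this simple version assumes output_path includes at least a bucket name; a bare `s3://` would be mangled by `rstrip`.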
System information
- SageMaker Python SDK version: 2.214.3
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
- Framework version: N/A
- Python version: 3.11.6
- CPU or GPU: Apple M2
- Custom Docker image (Y/N): N/A
Additional context
I'll submit a fix PR; I'm just opening this issue to track it.