Skip to content

Expose Array Metadata functions #155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 4, 2020
Merged

Conversation

pomadchin
Copy link
Contributor

@pomadchin pomadchin commented May 2, 2020

This PR is the first attempt to expose missing Array metadata functions. This piece of functionality is neccesary to allow proper reads of tiledb files written via GDAL.

I hope it aligns with your development plans, and finally this Java bindings docs page would not be empty anymore.

@pomadchin pomadchin force-pushed the feature/metadata branch 2 times, most recently from e05d41a to 8eced24 Compare May 2, 2020 21:48
@pomadchin pomadchin force-pushed the feature/metadata branch from 8eced24 to f5148cd Compare May 2, 2020 21:52
@Shelnutt2 Shelnutt2 requested a review from gsvic May 2, 2020 21:56
@Shelnutt2
Copy link
Member

@pomadchin thank you for the pull request. We appreciate your effort to add the metadata support. @gsvic will review the PR shortly.

A few questions/comments from me, in a more general sense. Most of the TileDB-Java API revolves around NativeArrays (as you've implemented here too). This stems from we don't often directly use the java api, but instead built ontop for spark, prestodb and others. Might I ask how you are using or planning to use the TileDB-Java API? Would it be of help to you if we introduced some functions to that accepted java objects directly instead of requiring you as the user to build NativeArrays?

For instance, In-addition/instead of returning a NativeArray class for the metadata fetch functions, would it be helpful to return an Object that is the native java type? There is a NativeArray.toJavaArray() function which will convert to a Java based array. The next thing to do would be to use the metadata length to return either a single value i.e. Long or if >1 value return an array []long/String/appropriate type. We likely need to return a pair <Object, class> so you can cast the object to the appropriate type.

Similar for putMetadata, would you like an overloaded function which accepts a java object (with restrictions on the type) instead of a Native array?

We'd like your feedback so we can consider the improvements to make to the API. We are in the process of updating TileDB-Java for the just released TileDB 2.0, so we will be able to incorporate your comments as we make additions/changes.

@pomadchin pomadchin force-pushed the feature/metadata branch from 7d5803c to 88b63e9 Compare May 2, 2020 22:40
@pomadchin pomadchin force-pushed the feature/metadata branch from 88b63e9 to caa69fe Compare May 2, 2020 22:41
@pomadchin
Copy link
Contributor Author

pomadchin commented May 2, 2020

Thanks for a really quick response @Shelnutt2!

I've only started to look into the TileDB, and this is my first attemp to work with it.

I'm investigating how well TileDB will work for us as a replacement for the NetCDF format. I'm pretty sure it would be more efficient (:D everything is probably more eficient than NetCDF), however I would like to get some exact relative numbers.

In general, the idea is to compare the "windowed read" performance of JVM GeoTiff / NetCDF readers vs TileDB and vs GDAL (GeoTiff / NetCDF / TileDB) interface (mb it makes not a lot of sense to use TileDB JNI directly and we can just rely on GDAL).

Talking about the Array Metadata API:
I used NativeArray types to keep the API simple and to try to keep the current API design as it is. The NativeArray contains information about the underlying native data type wich is pretty convenient => it allows a bit more elegant work with types. Basing on the NativeArray nativeType it is possible to cast it into the Java Array of an appropriate type in a generic fashion on the user side.

Hm, mb "Java types" overloads can accept as an input some generic type parameter to "hide" the cast from the user? In this case it could indeed simplify and beatify the API.

I use it from Scala, so sorry for my a bit "Scala" vision of the API and I can understand that Objects in the return types is a common thing in Java.

Let me show some examples of how it is possible to work with the API now:

// in Java that would be just a static cast into short[]
// (short[]) query.getBuffer("TDB_VALUES")
query.getBuffer("TDB_VALUES").asInstanceOf[Array[Short]] //> looks a bit ugly

// the cast can be hidden
// in Java it can be a generic method as well
// short buff = query.getBuffer("TDB_VALUES")
query.getBuffer[Short]("TDB_VALUES")

// metadata retrieval
val md = array.getMetadata("_gdal", Datatype.TILEDB_UINT8)
val str = 
  md
    .getSecond()
    .toJavaArray()
    .asInstanceOf[Array[Short]]
    .map(_.toChar)
    .mkString

// which could look like 
val str = 
  array
    .getMetadata[Short]("_gdal", Datatype.TILEDB_UINT8)
    .getSecond()
    .map(_.toChar)
    .mkString

// metadata put
array.putMetadata("key", new NativeArray(
  ctx,
  new int[] {
    0, 1, 2, 3, 4, 5, 6, 7,
    8, 9, 10, 11, 12, 13, 14, 15
  },
  Integer.class
))

// which could be just
array.putMetadata("key", new int[] {
  0, 1, 2, 3, 4, 5, 6, 7,
  8, 9, 10, 11, 12, 13, 14, 15
})

// subarray definition example
val subarray = new NativeArray(ctx, Array(zMin, zMax, rowMin, ySize - 1, colMin, xSize - 1), Datatype.TILEDB_UINT64)

// which could look just as 
val subarray = Array(zMin, zMax, rowMin, ySize - 1, colMin, xSize - 1)

query
  .setLayout(TILEDB_ROW_MAJOR)
  .setSubarray(subarray)
  .setBuffer("TDB_VALUES", data)

Having such kind of overloads can make user API a bit better indeed (even without these exposed generics).
However, It could be really convenient to have all the logic around "casting" incapuslated, so on the user side it would be possible to avoid explicit casts and just rely on type parameters.

The current NativeArray methods can still be useful for some not trivial user logic (as I already mentioned above ~ a user logic to cast NativeArray).

Hopefully my feedback is not too ambigous.

@pomadchin pomadchin force-pushed the feature/metadata branch 7 times, most recently from d0c4e30 to 17fd5d5 Compare May 3, 2020 01:19
@pomadchin pomadchin force-pushed the feature/metadata branch from 17fd5d5 to c02c823 Compare May 3, 2020 01:34
* @param value the metadata to put into the Array metadata
* @throws TileDBError A TileDB exception
*/
public void putMetadata(String key, NativeArray value) throws TileDBError {
Copy link
Contributor Author

@pomadchin pomadchin May 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering, should the following overload be a part of this PR

public void putMetadata(String key, Object buffer, Class javaType) throws TileDBError {
  putMetadata(key, new NativeArray(ctx, buffer, javaType));
}

Or it can / should be discussed in a separate PR / issue in case this one would pass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pomadchin let's look into this as a followup PR. We wanted to initial changes merged, as they are a good foundation to build upon.

For the overloads, is this something you are interested in contributing? If not, @gsvic can add them, we just want to coordinate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! #156

@normanb
Copy link

normanb commented May 3, 2020

Hi @pomadchin. Thank you for your contributions to the Java API. The _gdal metadata item stores the GDAL aux.xml file as you have seen. We are in the process of releasing an update to the GDAL TileDB driver that implements https://gdal.org/user/multidim_raster_data_model.html as well if that is of interest. The direct TileDB apis will be faster and easier to use though. Thank you for your inputs!

@pomadchin
Copy link
Contributor Author

@normanb Wow, thanks for sharing that! It sounds like the GDAL 3.2 feature?

Copy link
Contributor

@gsvic gsvic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @pomadchin for your effort, that's great work!

@gsvic gsvic merged commit dc80fec into TileDB-Inc:master May 4, 2020
@Shelnutt2
Copy link
Member

@pomadchin Thanks again for the PR. If you find anything else you'd like improved or changed in the TileDB-Java API please don't hesitate to open an issue. You can also post on the forums if you have more general questions or drop us an email at [email protected]

@pomadchin
Copy link
Contributor Author

Thanks for merging it in! Didn't know about forums (was actually trying to find smth like that before making this PR).

@pomadchin pomadchin deleted the feature/metadata branch May 4, 2020 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants