Skip to content

implements Deref<Target=str>, Display, AsRef, FromStr ... #16

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 29, 2023

Conversation

chanced
Copy link
Contributor

@chanced chanced commented Jun 28, 2023

This implements a few nice-to-haves for the Url type, such as

  • Deref<Target=str> so that &Url can be passed anywhere that expects &str plus all methods implemented for Deref<Target=str>
  • std::convert::AsRef<str>
  • std::fmt::Display, so that formatting the Url works as expected, also provides to_string.
  • std::str::FromStr for str::parse https://doc.rust-lang.org/std/primitive.str.html#method.parse

It also adds as_str, basically as an alias for href, for convention sake.

@chanced chanced changed the title implements std::ops::Deref<Target=str>, std::fmt::Display, `AsRef… implements Deref<Target=str>, Display, AsRef, FromStr ... Jun 28, 2023
@anonrig anonrig requested review from Boshen and Brooooooklyn June 28, 2023 18:47
Copy link
Member

@anonrig anonrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Should we add any tests?

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

Absolutely. Sorry, I meant to that before sending it over to you!

I'm going to go ahead and implement some more traits, such as Hash, PartialEq, Eq, Clone, PartailOrd, Ord, TryFrom<String>, TryFrom<&str>

I'll impl those along with tests shortly.

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

I'm not sure if you're interested in a TryFrom<&[u8]> implementation but there are two possible paths that I can see:

impl TryFrom<&[u8]> for Url {
    type Error = Error;
    fn try_from(value: &[u8]) -> Result<Self, Self::Error> {
        Self::parse(&String::from_utf8_lossy(value), None)
    }
}
impl<'a> TryFrom<&'a [u8]> for Url {
    type Error = &'a [u8];
    fn try_from(value: &'a [u8]) -> Result<Self, Self::Error> {
        let s = std::str::from_utf8(value).map_err(|_| value)?;
        Self::parse(s, None).map_err(|_| value)
    }
}

@anonrig
Copy link
Member

anonrig commented Jun 28, 2023

I'm not sure if you're interested in a TryFrom<&[u8]> implementation but there are two possible paths that I can see:

I don't have any strong preference. Although, which one is faster?

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

In the event of a failure to convert to utf8, the first as it'll fail fast. Otherwise they should be about the same. In truth, it's probably best to omit the TryFrom<&'a [u8]> and just let folks do the conversion themselves if need be. I was just going to impl it for completeness sake but it'll mean opaque error reporting as the error from parse anticipates a String value.

Should PartialEq take into account an empty fragment / hash? By that I mean should "http://example.com#" equal "http://example.com"

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

Also, what about case folding on equality of the scheme and authority? I'm guessing those should be folded but I'd rather not assume.

@anonrig
Copy link
Member

anonrig commented Jun 28, 2023

In the event of a failure to convert to utf8, the first as it'll fail fast. Otherwise they should be about the same. In truth, it's probably best to omit the TryFrom<&'a [u8]> and just let folks do the conversion themselves if need be. I was just going to impl it for completeness sake but it'll mean opaque error reporting as the error from parse anticipates a String value.

I trust your word on this, and having an opaque error reporting seem like an issue we shouldn't omit.

Should PartialEq take into account an empty fragment / hash? By that I mean should "http://example.com#" equal "http://example.com"

According to WHATWG http://example.com# returns empty hash, but if you look into the href, you'll see that it is: 'http://example.com/#'. This is indeed correct because according to spec, the behavior of having a null hash vs. empty string differs from each other. One method of checking it is to use has_hash() function.

Also, what about case folding on equality of the scheme and authority? I'm guessing those should be folded but I'd rather not assume.

What do you mean? I didn't understand.

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

According to WHATWG http://example.com# returns empty hash, but if you look into the href, you'll see that it is: 'http://example.com/#'. This is indeed correct because according to spec, the behavior of having a null hash vs. empty string differs from each other. One method of checking it is to use has_hash() function.

Ah, thank you! I've read through the spec a while back but I forgot the details on equality of hash vs null hash. My brain has a tendency of dumping specs pretty much instantaneously.

What do you mean? I didn't understand.

I meant case comparison, e.g. "HTTP://EXAMPLE.COM" vs "http://example.com" but I just checked and ya'll already handle that! Sorry, I should have checked first.

@anonrig
Copy link
Member

anonrig commented Jun 28, 2023

I meant case comparison, e.g. "HTTP://EXAMPLE.COM" vs "http://example.com" but I just checked and ya'll already handle that! Sorry, I should have checked first.

No worries at all. It's always good to receive feedback and talk about topics we're all interested in. Yes, according to spec, all characters need to be lowercased.

@anonrig
Copy link
Member

anonrig commented Jun 28, 2023

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

@anonrig awesome, will do. Thanks!

        let components = unsafe { ffi::ada_get_url_components(self.url) };
        f.debug_struct("Url")
            .field("href", &self.href())
            .field("protocol_end", &components.protocol_end)
            .field("username_end", &components.username_end)
            .field("host_start", &components.host_start)
            .field("host_end", &components.host_end)
            .field("port", &components.port)
            .field("pathname_start", &components.pathname_start)
            .field("search_start", &components.search_start)
            .field("hash_start", &components.hash_start)
            .finish()

is producing the following error:

Undefined symbols for architecture arm64:
            "_ada_get_url_components", referenced from:
                _$LT$ada_url..Url$u20$as$u20$core..fmt..Debug$GT$::fmt::h9686f0fe34ebf2b0 in ada_url-2e6ddbe6b40fddcd.5eapx340rf2g18vw.rcgu.o
          ld: symbol(s) not found for architecture arm64
          clang: error: linker command failed with exit code 1 (use -v to see invocation)     

@anonrig
Copy link
Member

anonrig commented Jun 28, 2023

_ada_get_url_components

It seems we never really exposed this in C. We need to fix this in ada-url/ada and release a new version. @lemire can you help?

@chanced chanced marked this pull request as draft June 28, 2023 20:42
The traits include:

- `std::fmt::Debug`
- `std::cmp::PartialEq`
- `std::cmp::Eq`
- `std::cmp::PartialOrd`
- `std::cmp::Ord`
- `std::hash::Hash`
- `std::borrow::Borrow<str>`
- `std::borrow::Borrow<[u8]>`
- `std::convert::AsRef<[u8]>`
- `std::convert::TryFrom<&str>`
- `std::convert::TryFrom<String>`
- `std::convert::TryFrom<&String>`
@lemire
Copy link
Member

lemire commented Jun 28, 2023

ada_get_components should be available, we even have tests for it:

https://github.com/ada-url/ada/blob/c426ef667798b804b1cfab65f857dc8af8804e8d/tests/ada_c.cpp#L124

@anonrig
Copy link
Member

anonrig commented Jun 28, 2023

I suspect that this project is missing .file("./deps/ada_c.cpp").

We have tests in lib.rs file. build.rs file handles the build step: https://github.com/ada-url/rust/blob/main/build.rs

@anonrig
Copy link
Member

anonrig commented Jun 28, 2023

OK @lemire and I found the issue: Line 94 of lib.rs should have been ada_get_components. cc @chanced

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

Awesome, I'll get that fixed and swapped out.

The only thing I don't know how to do is clone. I haven't dealt with FFI in rust and my preliminary search on figuring it out has come up dry.

I'll keep digging but it's definitely one of the more important traits to implement for a value data structure.

@lemire
Copy link
Member

lemire commented Jun 28, 2023

GitHub's search engine is interesting. I assumed it searched all files.

Capture d’écran, le 2023-06-28 à 17 10 51

@anonrig
Copy link
Member

anonrig commented Jun 28, 2023

The only thing I don't know how to do is clone. I haven't dealt with FFI in rust and my preliminary search on figuring it out has come up dry.

Maybe @Boshen or @Brooooooklyn can help with it.

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

So updating the function name worked, however the output is suspect.

Input

"https://www.ada-url.com/playground"

Output

Url {
    href: "https://www.ada-url.com/playground",
    protocol_end: 1863623904,
    username_end: 1,
    host_start: 15513396,
    host_end: 1,
    port: 1013989984,
    pathname_start: 1,
    search_start: 1,
    hash_start: 0,
}

Src

impl std::fmt::Debug for Url {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let components = unsafe { ffi::ada_get_components(self.url) };
        f.debug_struct("Url")
            .field("href", &self.href())
            .field("protocol_end", &components.protocol_end)
            .field("username_end", &components.username_end)
            .field("host_start", &components.host_start)
            .field("host_end", &components.host_end)
            .field("port", &components.port)
            .field("pathname_start", &components.pathname_start)
            .field("search_start", &components.search_start)
            .field("hash_start", &components.hash_start)
            .finish()
    }
}

Test

    #[test]
    fn spike_debug() {
        let tests = [("https://www.ada-url.com/playground")];
        for value in tests {
            let url = Url::parse(value, None).expect("Should have parsed url");
            println!("{:#?}", url);
        }
    }

@anonrig
Copy link
Member

anonrig commented Jun 28, 2023

@chanced ada_get_components returns *mut ada_url_components. The following git diff fixes the issue, but I don't know if it's the best practice in here.

--- a/src/lib.rs
+++ b/src/lib.rs
@@ -91,7 +91,7 @@ pub mod ffi {
             base: *const c_char,
             base_length: usize,
         ) -> bool;
-        pub fn ada_get_components(url: *mut ada_url) -> ada_url_components;
+        pub fn ada_get_components(url: *mut ada_url) -> *mut ada_url_components;

         // Getters
         pub fn ada_get_origin(url: *mut ada_url) -> ada_owned_string;
@@ -502,7 +502,7 @@ impl std::convert::AsRef<[u8]> for Url {

 impl std::fmt::Debug for Url {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        let components = unsafe { ffi::ada_get_components(self.url) };
+        let components = unsafe { ffi::ada_get_components(self.url).as_ref().unwrap() };
         f.debug_struct("Url")
             .field("href", &self.href())
             .field("protocol_end", &components.protocol_end)

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

@anonrig

That's definitely closer! It seems that perhaps null is being swapped out for u32::MAX:

Url {
    href: "https://www.ada-url.com/playground",
    protocol_end: 6,
    username_end: 8,
    host_start: 8,
    host_end: 23,
    port: 4294967295,
    pathname_start: 23,
    search_start: 4294967295,
    hash_start: 4294967295,
}

@lemire
Copy link
Member

lemire commented Jun 28, 2023

That's definitely closer! It seems that perhaps null is being swapped out for u32::MAX:

That's the correct behaviour.

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

@lemire makes sense

I guess I should back up and ask if ya'll even want the positions in the debug output?

If you do, do you want me to replace the u32s with Option<u32> (only in Debug::fmt) and filter out (via None) any value that is 4294967295?

@lemire
Copy link
Member

lemire commented Jun 28, 2023

@chanced For context, here is the type...

  pub struct ada_url_components {
        pub protocol_end: u32,
        pub username_end: u32,
        pub host_start: u32,
        pub host_end: u32,
        pub port: u32,
        pub pathname_start: u32,
        pub search_start: u32,
        pub hash_start: u32,
    }

Here is the C interface...

// This is a reference to ada::url_components::omitted
// It represents "uint32_t(-1)"
#define ada_url_omitted 0xffffffff

There is no "null" value given a u32. We use 0xffffffff to represent a missing element.

Note that the parser will not let you parse a URL that would span 0xffffffff bytes.

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

Yea, I saw the ada_url_components type; I just didn't know if rust was swapping in a value for an otherwise null value coming from c++.

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

@lemire sorry, meant to mention.

@anonrig
Copy link
Member

anonrig commented Jun 28, 2023

This is by default a design choice. We have an omitted value which is equal to Uint32 max value.

@lemire
Copy link
Member

lemire commented Jun 28, 2023

@chanced

I guess I should back up and ask if ya'll even want the positions in the debug output?

I will let @anonrig answer that.

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

I updated Debug to check for 0xFFFFFFFF on fields which I believe can be missing.

Removing the logic and either outputting the string value or u32::MAX is easy enough. Just let me know which way you'd prefer.

Currently, the output from Debug::fmt for "https://www.ada-url.com/playground" is:

Url {
    href: "https://www.ada-url.com/playground",
    protocol_end: 6,
    host_start: 8,
    host_end: 23,
    port: None,
    username_end: Some(
        8,
    ),
    search_start: None,
    hash_start: None,
    pathname_start: Some(
        23,
    ),
}

edit: woops, fixing port.

@chanced
Copy link
Contributor Author

chanced commented Jun 28, 2023

Aside from any changes to Debug, The only thing I have left is Clone.

It's quitting time for me at the moment. I'll probably circle back to this later tonight tho. Hopefully I'll have it wrapped up either tonight or tomorrow.

@anonrig
Copy link
Member

anonrig commented Jun 28, 2023

Perfect. Thanks for doing this!

@chanced
Copy link
Contributor Author

chanced commented Jun 29, 2023

Oh it's no trouble at all. Y'all have done the heavy lifting!

:)

@chanced
Copy link
Contributor Author

chanced commented Jun 29, 2023

I am absolutely not certain about this but there may need to be some sort of copy api endpoint on the C++ side of things to implement clone.

I'll look a bit more; it seems like cxx may be able to pull off cloning so it could be possible.

Anyway, here's a list of traits implemented and a note as to why:

Trait(s) Description
Display Provides to_string and allows for the value to be used in format! macros (e.g. println!).
Debug Allows debugger output in format macros, ({:?} syntax)
PartialEq, Eq Allows for comparison, url1 == url2, url1.eq(url2)
PartialOrd, Ord Allows for ordering url1 < url2, done so alphabetically. This is also allows Url to be used as a key in a BTreeMap
Hash Makes it so that Url can be hashed based on the string representation. This is important so that Url can be used as a key in a HashMap
FromStr Allows for use with str's parse method
TryFrom<String>, TryFrom<&str> Provides try_into methods for String and &str
Borrow<str>, Borrow<[u8]> Used in some crates such as qp-trie so that the Url can be used as a key in a qp trie.
Deref<Target=str> Allows for &Url to dereference as a &str. Also provides a number of string methods
AsRef<[u8]>, AsRef<str> Used to do a cheap reference-to-reference conversion.

@anonrig
Copy link
Member

anonrig commented Jun 29, 2023

Can you update README with this amazing table? It would be beneficial to have a supported traits section.

@anonrig
Copy link
Member

anonrig commented Jun 29, 2023

I am absolutely not certain about this but there may need to be some sort of copy api endpoint on the C++ side of things to implement clone.

We can implement Clone in a different PR, no rush.

@chanced
Copy link
Contributor Author

chanced commented Jun 29, 2023

Can you update README with this amazing table? It would be beneficial to have a supported traits section.

@anonrig absolutely

@chanced chanced marked this pull request as ready for review June 29, 2023 22:07
@anonrig anonrig merged commit 8f14071 into ada-url:main Jun 29, 2023
@anonrig
Copy link
Member

anonrig commented Jun 29, 2023

Thank you. I’ll release a new version soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants