Struct widestring::ustr::U32Str
pub struct U32Str { /* private fields */ }
32-bit wide string slice with undefined encoding.
U32Str is to U32String as OsStr is to OsString.
U32Str values are string slices that do not have a defined encoding. While it is sometimes
assumed that they contain possibly invalid or ill-formed UTF-32 data, they may be used for
any wide encoded string. This is because U32Str is intended to be used with FFI
functions, where proper encoding cannot be guaranteed. If you need string slices that are
always valid UTF-32 strings, use Utf32Str instead.
Because U32Str does not have a defined encoding, no restrictions are placed on mutating
or indexing the slice. This means that even if the string contained properly encoded UTF-32
or other encoding data, mutating or indexing may result in malformed data. Convert to a
Utf32Str if retaining proper UTF-32 encoding is desired.
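To make the risk concrete, here is a minimal std-only sketch that works on a plain `[u32]` buffer rather than on U32Str itself. The helper `is_valid_utf32` is hypothetical and not part of widestring; it only checks that every element is a Unicode scalar value.

```rust
/// Hypothetical helper: true if every element is a Unicode scalar value,
/// which is what well-formed UTF-32 requires.
fn is_valid_utf32(units: &[u32]) -> bool {
    units.iter().all(|&u| char::from_u32(u).is_some())
}

fn main() {
    // "Hi" as UTF-32 code units.
    let mut units = [0x48u32, 0x69];
    assert!(is_valid_utf32(&units));

    // Nothing stops a caller from writing a UTF-16 surrogate into the buffer,
    // which leaves the data ill-formed as UTF-32.
    units[1] = 0xD83D;
    assert!(!is_valid_utf32(&units));
}
```

An unrestricted mutable slice can always be edited into an ill-formed state like this, which is why a validated type is suggested when the encoding must be preserved.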
§FFI considerations
U32Str is not aware of nul values and may or may not be nul-terminated. It is intended
to be used with FFI functions that directly use string length, where the strings are known
to have proper nul-termination already, or where strings are merely being passed through
without modification.
U32CStr should be used instead if nul-aware strings are required.
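Since nul values are ordinary elements here, any nul handling is left to the caller. The following std-only sketch shows one way to cut a raw `[u32]` buffer at its first nul. The function `truncate_at_nul` is a hypothetical helper, not a widestring API.

```rust
/// Hypothetical helper: returns the elements before the first nul,
/// or the whole slice if it contains no nul.
fn truncate_at_nul(units: &[u32]) -> &[u32] {
    match units.iter().position(|&u| u == 0) {
        Some(i) => &units[..i],
        None => units,
    }
}

fn main() {
    // A buffer with an interior nul: a length-based view keeps all four elements,
    // while a nul-terminated reading stops after two.
    let units = [0x48u32, 0x69, 0, 0x21];
    assert_eq!(units.len(), 4);
    assert_eq!(truncate_at_nul(&units), &units[..2]);
}
```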
§Examples
The easiest way to use U32Str outside of FFI is with the u32str! macro to convert string
literals into UTF-32 string slices at compile time:
use widestring::u32str;
let hello = u32str!("Hello, world!");
You can also convert any u32 slice directly:
use widestring::{u32str, U32Str};
let sparkle_heart = [0x1f496];
let sparkle_heart = U32Str::from_slice(&sparkle_heart);
assert_eq!(u32str!("💖"), sparkle_heart);
// This UTF-16 surrogate is invalid UTF-32, but is perfectly valid in U32Str
let malformed_utf32 = [0x0, 0xd83d]; // Note that nul values are also valid and untouched
let s = U32Str::from_slice(&malformed_utf32);
assert_eq!(s.len(), 2);
When working with FFI, it is useful to create a U32Str from a pointer and a length:
use widestring::{u32str, U32Str};
let sparkle_heart = [0x1f496];
let sparkle_heart = unsafe {
    U32Str::from_ptr(sparkle_heart.as_ptr(), sparkle_heart.len())
};
assert_eq!(u32str!("💖"), sparkle_heart);
Implementations§
impl U32Str
pub unsafe fn from_ptr<'a>(p: *const u32, len: usize) -> &'a Self
Constructs a wide string slice from a pointer and a length.
The len argument is the number of elements, not the number of bytes. No copying or
allocation is performed; the resulting value is a direct reference to the pointer bytes.
§Safety
This function is unsafe as there is no guarantee that the given pointer is valid for len
elements.
In addition, the data must meet the safety conditions of std::slice::from_raw_parts.
In particular, the returned string reference must not be mutated for the duration of
lifetime 'a, except inside an UnsafeCell.
§Panics
This function panics if p is null.
§Caveat
The lifetime for the returned string is inferred from its usage. To prevent accidental misuse, it’s suggested to tie the lifetime to whichever source lifetime is safe in the context, such as by providing a helper function taking the lifetime of a host value for the string, or by explicit annotation.
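One way to follow this advice is sketched below using only std. `HostBuffer`, `from_raw`, and `as_units` are hypothetical names, and the sketch uses `std::slice::from_raw_parts` on a plain `[u32]` instead of `U32Str::from_ptr`, but the lifetime reasoning should carry over: the borrow of `&self` bounds the lifetime of the returned slice.

```rust
use std::slice;

/// Hypothetical wrapper around a buffer handed over by foreign code.
struct HostBuffer {
    ptr: *const u32,
    len: usize,
}

impl HostBuffer {
    /// SAFETY (assumed contract): `ptr` must be non-null, aligned, and valid for
    /// reads of `len` elements for as long as the returned value is in use.
    unsafe fn from_raw(ptr: *const u32, len: usize) -> Self {
        HostBuffer { ptr, len }
    }

    /// The returned slice borrows from `&self`, so it cannot outlive the host value.
    fn as_units(&self) -> &[u32] {
        // SAFETY: upheld by the contract of `from_raw`.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

fn main() {
    let data = [0x1f496u32];
    let host = unsafe { HostBuffer::from_raw(data.as_ptr(), data.len()) };
    assert_eq!(host.as_units(), &data[..]);
}
```

Without such a helper, the compiler is free to infer any lifetime for `'a`, including `'static`, which is the accidental misuse the caveat warns about.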
pub unsafe fn from_ptr_mut<'a>(p: *mut u32, len: usize) -> &'a mut Self
Constructs a mutable wide string slice from a mutable pointer and a length.
The len argument is the number of elements, not the number of bytes. No copying or
allocation is performed; the resulting value is a direct reference to the pointer bytes.
§Safety
This function is unsafe as there is no guarantee that the given pointer is valid for len
elements.
In addition, the data must meet the safety conditions of std::slice::from_raw_parts_mut.
§Panics
This function panics if p is null.
§Caveat
The lifetime for the returned string is inferred from its usage. To prevent accidental misuse, it’s suggested to tie the lifetime to whichever source lifetime is safe in the context, such as by providing a helper function taking the lifetime of a host value for the string, or by explicit annotation.
pub const fn from_slice(slice: &[u32]) -> &Self
Constructs a wide string slice from a slice of character data.
No checks are performed on the slice. It may be of any encoding and may contain invalid or malformed data for that encoding.
pub fn from_slice_mut(slice: &mut [u32]) -> &mut Self
Constructs a mutable wide string slice from a mutable slice of character data.
No checks are performed on the slice. It may be of any encoding and may contain invalid or malformed data for that encoding.
pub fn to_ustring(&self) -> U32String
Copies the string reference to a new owned wide string.
pub const fn as_slice(&self) -> &[u32]
Converts to a slice of the underlying elements of the string.
pub fn as_mut_slice(&mut self) -> &mut [u32]
Converts to a mutable slice of the underlying elements of the string.
pub const fn as_ptr(&self) -> *const u32
Returns a raw pointer to the string.
The caller must ensure that the string outlives the pointer this function returns, or else it will end up pointing to garbage.
The caller must also ensure that the memory the pointer (non-transitively) points to
is never written to (except inside an UnsafeCell) using this pointer or any
pointer derived from it. If you need to mutate the contents of the string, use
as_mut_ptr.
Modifying the container referenced by this string may cause its buffer to be reallocated, which would also make any pointers to it invalid.
pub fn as_mut_ptr(&mut self) -> *mut u32
Returns an unsafe mutable raw pointer to the string.
The caller must ensure that the string outlives the pointer this function returns, or else it will end up pointing to garbage.
Modifying the container referenced by this string may cause its buffer to be reallocated, which would also make any pointers to it invalid.
pub fn as_ptr_range(&self) -> Range<*const u32>
Returns the two raw pointers spanning the string slice.
The returned range is half-open, which means that the end pointer points one past the last element of the slice. This way, an empty slice is represented by two equal pointers, and the difference between the two pointers represents the size of the slice.
See as_ptr for warnings on using these pointers. The end pointer
requires extra caution, as it does not point to a valid element in the slice.
This function is useful for interacting with foreign interfaces which use two pointers to refer to a range of elements in memory, as is common in C++.
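The half-open convention can be illustrated with the equivalent method on a plain `[u32]` slice from std. This is a sketch under the assumption that U32Str behaves like the slice it wraps; `count_via_range` is a hypothetical helper.

```rust
/// Hypothetical helper: recovers the element count from the two pointers,
/// the way a C++-style [begin, end) interface would.
fn count_via_range(units: &[u32]) -> usize {
    let range = units.as_ptr_range();
    // SAFETY: both pointers come from the same slice, so the distance between
    // them is well defined and measured in elements, not bytes.
    unsafe { range.end.offset_from(range.start) as usize }
}

fn main() {
    let units = [0x48u32, 0x69, 0x21];
    assert_eq!(count_via_range(&units), 3);

    // An empty slice is represented by two equal pointers.
    let empty: [u32; 0] = [];
    let range = empty.as_ptr_range();
    assert_eq!(range.start, range.end);
}
```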
pub fn as_mut_ptr_range(&mut self) -> Range<*mut u32>
Returns the two unsafe mutable pointers spanning the string slice.
The returned range is half-open, which means that the end pointer points one past the last element of the slice. This way, an empty slice is represented by two equal pointers, and the difference between the two pointers represents the size of the slice.
See as_mut_ptr for warnings on using these pointers. The end
pointer requires extra caution, as it does not point to a valid element in the slice.
This function is useful for interacting with foreign interfaces which use two pointers to refer to a range of elements in memory, as is common in C++.
pub const fn len(&self) -> usize
Returns the length of the string as number of elements (not number of bytes).
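The distinction between elements and bytes matters when handing a length to foreign code. A small std-only sketch on a plain `[u32]` slice (`byte_len` is a hypothetical helper, not a widestring method):

```rust
use std::mem::size_of_val;

/// Hypothetical helper: number of bytes occupied by a slice of 32-bit units.
fn byte_len(units: &[u32]) -> usize {
    size_of_val(units)
}

fn main() {
    let units = [0x48u32, 0x69, 0x1f496];
    assert_eq!(units.len(), 3); // element count, what len() reports
    assert_eq!(byte_len(&units), 12); // 3 elements * 4 bytes each
}
```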
pub fn into_ustring(self: Box<Self>) -> U32String
Converts a boxed wide string slice into an owned wide string without copying or allocating.
pub fn display(&self) -> Display<'_, U32Str>
Returns an object that implements Display for printing strings that may contain
non-Unicode data.
This method assumes this string is intended to be UTF-32 encoded, but handles
ill-formed UTF-32 sequences lossily. The returned struct implements the Display
trait by decoding the string lossily as UTF-32, without the heap allocation that
to_string_lossy performs.
By default, invalid Unicode data is replaced with U+FFFD REPLACEMENT CHARACTER (�).
If you wish to simply skip any invalid Unicode data and forgo the replacement, you may
use the alternate formatting with {:#}.
§Examples
Basic usage:
use widestring::U32Str;
// 𝄞mus<invalid>ic<invalid>
let s = U32Str::from_slice(&[
    0x1d11e, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834,
]);
assert_eq!(format!("{}", s.display()),
    "𝄞mus�ic�"
);
Using alternate formatting style to skip invalid values entirely:
use widestring::U32Str;
// 𝄞mus<invalid>ic<invalid>
let s = U32Str::from_slice(&[
    0x1d11e, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834,
]);
assert_eq!(format!("{:#}", s.display()),
    "𝄞music"
);
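For readers curious how such a formatter can avoid allocating, the following std-only sketch shows one possible shape. It is not the crate's implementation: `LossyUtf32` is a hypothetical wrapper over a plain `[u32]` slice that writes each decoded char straight into the formatter and consults `Formatter::alternate` to decide between replacing and skipping.

```rust
use std::fmt::{self, Write};

/// Hypothetical wrapper that prints a `[u32]` buffer as lossy UTF-32.
struct LossyUtf32<'a>(&'a [u32]);

impl fmt::Display for LossyUtf32<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for &unit in self.0 {
            match char::from_u32(unit) {
                // Valid scalar value: write it directly, no intermediate String.
                Some(c) => f.write_char(c)?,
                // `{:#}` was requested: skip the invalid value.
                None if f.alternate() => {}
                // Default: substitute U+FFFD.
                None => f.write_char(char::REPLACEMENT_CHARACTER)?,
            }
        }
        Ok(())
    }
}

fn main() {
    let units: [u32; 8] = [0x1d11e, 0x6d, 0x75, 0x73, 0xDD1E, 0x69, 0x63, 0xD834];
    assert_eq!(format!("{}", LossyUtf32(&units)), "𝄞mus\u{fffd}ic\u{fffd}");
    assert_eq!(format!("{:#}", LossyUtf32(&units)), "𝄞music");
}
```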
pub fn get<I>(&self, i: I) -> Option<&Self>
Returns a subslice of the string.
This is the non-panicking alternative to indexing the string. Returns None
whenever the equivalent indexing operation would panic.
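The same contract exists on plain slices in std, which makes for an easy illustration. The sketch below uses `<[u32]>::get` rather than `U32Str::get`, and `first_n` is a hypothetical helper.

```rust
/// Hypothetical helper: the first `n` code units, or `None` if the slice is shorter.
fn first_n(units: &[u32], n: usize) -> Option<&[u32]> {
    units.get(..n)
}

fn main() {
    let units = [0x48u32, 0x69, 0x21];
    assert_eq!(first_n(&units, 2), Some(&units[..2]));
    // `&units[..4]` would panic; `get` reports the problem as `None` instead.
    assert_eq!(first_n(&units, 4), None);
}
```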
pub fn get_mut<I>(&mut self, i: I) -> Option<&mut Self>
Returns a mutable subslice of the string.
This is the non-panicking alternative to indexing the string. Returns None
whenever the equivalent indexing operation would panic.
pub unsafe fn get_unchecked<I>(&self, i: I) -> &Self
Returns an unchecked subslice of the string.
This is the unchecked alternative to indexing the string.
§Safety
Callers of this function are responsible for ensuring that these preconditions are satisfied:
- The starting index must not exceed the ending index;
- Indexes must be within bounds of the original slice.
Failing that, the returned string slice may reference invalid memory.
pub unsafe fn get_unchecked_mut<I>(&mut self, i: I) -> &mut Self
Returns a mutable, unchecked subslice of the string.
This is the unchecked alternative to indexing the string.
§Safety
Callers of this function are responsible for ensuring that these preconditions are satisfied:
- The starting index must not exceed the ending index;
- Indexes must be within bounds of the original slice.
Failing that, the returned string slice may reference invalid memory.
pub fn split_at(&self, mid: usize) -> (&Self, &Self)
Divide one string slice into two at an index.
The argument, mid, should be an offset from the start of the string.
The two slices returned go from the start of the string slice to mid, and from
mid to the end of the string slice.
To get mutable string slices instead, see the split_at_mut method.
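The description does not say what happens when mid lies past the end of the string. The standard slice `split_at` panics in that case, so it seems prudent to assume similar behavior here and check first. A std-only sketch on a plain `[u32]` slice, with `checked_split` as a hypothetical helper:

```rust
/// Hypothetical helper: splits at `mid`, returning `None` instead of panicking
/// when `mid` is past the end of the slice.
fn checked_split(units: &[u32], mid: usize) -> Option<(&[u32], &[u32])> {
    if mid <= units.len() {
        Some(units.split_at(mid))
    } else {
        None
    }
}

fn main() {
    let units = [0x48u32, 0x69, 0x21];
    let (head, tail) = checked_split(&units, 1).unwrap();
    assert_eq!(head, &units[..1]); // start of the slice up to mid
    assert_eq!(tail, &units[1..]); // mid to the end of the slice
    assert!(checked_split(&units, 4).is_none());
}
```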
pub fn split_at_mut(&mut self, mid: usize) -> (&mut Self, &mut Self)
Divide one mutable string slice into two at an index.
The argument, mid, should be an offset from the start of the string.
The two slices returned go from the start of the string slice to mid, and from
mid to the end of the string slice.
To get immutable string slices instead, see the split_at method.
impl U32Str
pub unsafe fn from_char_ptr<'a>(p: *const char, len: usize) -> &'a Self
Constructs a U32Str from a char pointer and a length.
The len argument is the number of char elements, not the number of bytes. No copying
or allocation is performed; the resulting value is a direct reference to the pointer bytes.
§Safety
This function is unsafe as there is no guarantee that the given pointer is valid for len
elements.
In addition, the data must meet the safety conditions of std::slice::from_raw_parts.
In particular, the returned string reference must not be mutated for the duration of
lifetime 'a, except inside an UnsafeCell.
§Panics
This function panics if p is null.
§Caveat
The lifetime for the returned string is inferred from its usage. To prevent accidental misuse, it’s suggested to tie the lifetime to whichever source lifetime is safe in the context, such as by providing a helper function taking the lifetime of a host value for the string, or by explicit annotation.
pub unsafe fn from_char_ptr_mut<'a>(p: *mut char, len: usize) -> &'a mut Self
Constructs a mutable U32Str from a mutable char pointer and a length.
The len argument is the number of char elements, not the number of bytes. No copying
or allocation is performed; the resulting value is a direct reference to the pointer bytes.
§Safety
This function is unsafe as there is no guarantee that the given pointer is valid for len
elements.
In addition, the data must meet the safety conditions of std::slice::from_raw_parts_mut.
§Panics
This function panics if p is null.
§Caveat
The lifetime for the returned string is inferred from its usage. To prevent accidental misuse, it’s suggested to tie the lifetime to whichever source lifetime is safe in the context, such as by providing a helper function taking the lifetime of a host value for the string, or by explicit annotation.
pub fn from_char_slice(slice: &[char]) -> &Self
pub fn from_char_slice_mut(slice: &mut [char]) -> &mut Self
pub fn to_os_string(&self) -> OsString
Decodes a string to an owned OsString.
This makes a string copy of the U32Str. Since U32Str makes no guarantees that its
encoding is UTF-32 or that the data is valid UTF-32, there is no guarantee that the resulting
OsString will have a valid underlying encoding either.
Note that the encoding of OsString is platform-dependent, so on some platforms this may
perform an encoding conversion, while on other platforms no changes to the string will be made.
§Examples
use widestring::U32String;
use std::ffi::OsString;
let s = "MyString";
// Create a wide string from the string
let wstr = U32String::from_str(s);
// Create an OsString from the wide string
let osstr = wstr.to_os_string();
assert_eq!(osstr, OsString::from(s));
pub fn to_string(&self) -> Result<String, Utf32Error>
Decodes the string to a String if it contains valid UTF-32 data.
This method assumes this string is encoded as UTF-32 and attempts to decode it as such.
§Failures
Returns an error if the string contains any invalid UTF-32 data.
§Examples
use widestring::U32String;
let s = "MyString";
// Create a wide string from the string
let wstr = U32String::from_str(s);
// Create a regular string from the wide string
let s2 = wstr.to_string().unwrap();
assert_eq!(s2, s);
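The example above only exercises the success path. As a rough std-only model of the failure path, the sketch below decodes a plain `[u32]` slice strictly and reports the first offending value. `decode_strict` and its `u32` error type are assumptions made for illustration; the crate itself reports a Utf32Error.

```rust
/// Hypothetical strict decoder: fails on the first value that is not a
/// Unicode scalar value, returning that value as the error.
fn decode_strict(units: &[u32]) -> Result<String, u32> {
    units
        .iter()
        .map(|&u| char::from_u32(u).ok_or(u))
        .collect()
}

fn main() {
    // "My" decodes cleanly.
    assert_eq!(decode_strict(&[0x4D, 0x79]), Ok("My".to_string()));
    // A lone surrogate is not valid UTF-32, so decoding fails.
    assert_eq!(decode_strict(&[0x4D, 0xD83D]), Err(0xD83D));
}
```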
pub fn to_string_lossy(&self) -> String
Decodes the string reference to a String even if it is invalid UTF-32 data.
This method assumes this string is encoded as UTF-32 and attempts to decode it as such. Any
invalid sequences are replaced with U+FFFD REPLACEMENT CHARACTER, which looks like this: �
§Examples
use widestring::U32String;
let s = "MyString";
// Create a wide string from the string
let wstr = U32String::from_str(s);
// Create a regular string from the wide string
let lossy = wstr.to_string_lossy();
assert_eq!(lossy, s);
pub fn chars(&self) -> CharsUtf32<'_>
Returns an iterator over the chars of a string slice.
As this string has no defined encoding, this method assumes the string is UTF-32. Since it
may consist of invalid UTF-32, the iterator returned by this method
is an iterator over Result<char, DecodeUtf32Error> instead of chars
directly. If you would like a lossy iterator over chars directly, instead
use chars_lossy.
It’s important to remember that char represents a Unicode Scalar Value, and
may not match your idea of what a ‘character’ is. Iteration over grapheme clusters may be
what you actually want. That functionality is not provided by this crate.
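A short std-only illustration of the scalar value versus 'character' distinction: the same visible letter can be one or two chars depending on normalization. `scalar_count` is a hypothetical helper operating on `&str`.

```rust
/// Hypothetical helper: number of Unicode scalar values in a string.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let precomposed = "\u{e9}"; // é as a single scalar value
    let decomposed = "e\u{301}"; // e followed by a combining acute accent
    assert_eq!(scalar_count(precomposed), 1);
    // Rendered as one 'character', but iterated as two chars.
    assert_eq!(scalar_count(decomposed), 2);
}
```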
pub fn chars_lossy(&self) -> CharsLossyUtf32<'_>
Returns a lossy iterator over the chars of a string slice.
As this string has no defined encoding, this method assumes the string is UTF-32. Since it
may consist of invalid UTF-32, the iterator returned by this method will replace invalid
values, such as surrogates, with U+FFFD REPLACEMENT CHARACTER (�). This is a lossy
version of chars.
It’s important to remember that char represents a Unicode Scalar Value, and
may not match your idea of what a ‘character’ is. Iteration over grapheme clusters may be
what you actually want. That functionality is not provided by this crate.
pub fn char_indices(&self) -> CharIndicesUtf32<'_>
Returns an iterator over the chars of a string slice, and their positions.
As this string has no defined encoding, this method assumes the string is UTF-32. Since it
may consist of invalid UTF-32, the iterator returned by this method is an iterator over
Result<char, DecodeUtf32Error> as well as their positions, instead of
chars directly. If you would like a lossy indices iterator over
chars directly, instead use char_indices_lossy.
The iterator yields tuples. The position is first, the char is second.
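The shape of the yielded tuples can be modeled with std alone. In this sketch, `indexed_chars` is a hypothetical function over a plain `[u32]` slice, and the `u32` error carries the offending value in place of the crate's DecodeUtf32Error.

```rust
/// Hypothetical model of an indexed, fallible UTF-32 decode:
/// position first, decode result second.
fn indexed_chars(units: &[u32]) -> Vec<(usize, Result<char, u32>)> {
    units
        .iter()
        .enumerate()
        .map(|(i, &u)| (i, char::from_u32(u).ok_or(u)))
        .collect()
}

fn main() {
    let decoded = indexed_chars(&[0x61, 0xD800, 0x62]);
    assert_eq!(
        decoded,
        vec![(0, Ok('a')), (1, Err(0xD800)), (2, Ok('b'))]
    );
}
```

Because every UTF-32 code unit is one element wide, positions in this model simply count elements. Whether the crate's iterator does the same should be checked against its own documentation.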
pub fn char_indices_lossy(&self) -> CharIndicesLossyUtf32<'_>
Returns a lossy iterator over the chars of a string slice, and their positions.
As this string slice may consist of invalid UTF-32, the iterator returned by this method
will replace invalid values with U+FFFD REPLACEMENT CHARACTER (�), and yields the
positions of all characters. This is a lossy version of char_indices.
The iterator yields tuples. The position is first, the char is second.
Trait Implementations§
impl AddAssign<&U32Str> for U32String
fn add_assign(&mut self, rhs: &U32Str)
Performs the += operation.
impl AsRef<U32Str> for U32CString
impl AsRef<U32Str> for Utf32String
impl BorrowMut<U32Str> for U32String
fn borrow_mut(&mut self) -> &mut U32Str
impl<'a> Extend<&'a U32Str> for U32String
fn extend<T: IntoIterator<Item = &'a U32Str>>(&mut self, iter: T)
fn extend_one(&mut self, item: A)
Nightly-only experimental API (extend_one).
fn extend_reserve(&mut self, additional: usize)
Nightly-only experimental API (extend_one).
impl<'a> FromIterator<&'a U32Str> for U32String
impl PartialEq<&U32CStr> for U32Str
impl PartialEq<&U32Str> for U32CStr
impl<'a> PartialEq<&'a U32Str> for U32CString
impl PartialEq<&U32Str> for U32Str
impl<'a> PartialEq<&'a U32Str> for U32String
impl PartialEq<U32CStr> for &U32Str
impl PartialEq<U32CStr> for U32Str
impl PartialEq<U32CString> for &U32Str
fn eq(&self, other: &U32CString) -> bool
Tests for self and other values to be equal, and is used by ==.
impl PartialEq<U32CString> for U32Str
fn eq(&self, other: &U32CString) -> bool
Tests for self and other values to be equal, and is used by ==.
impl PartialEq<U32Str> for &U32CStr
impl PartialEq<U32Str> for &U32Str
impl PartialEq<U32Str> for U32CStr
impl PartialEq<U32Str> for U32CString
impl PartialEq<U32Str> for U32String
impl PartialEq<U32Str> for Utf32Str
impl PartialEq<U32Str> for Utf32String
impl PartialEq<U32String> for &U32Str
impl PartialEq<U32String> for U32Str
impl PartialEq<Utf32Str> for U32Str
impl PartialEq<Utf32String> for U32Str
fn eq(&self, other: &Utf32String) -> bool
Tests for self and other values to be equal, and is used by ==.
impl PartialEq for U32Str
impl<'a> PartialOrd<&'a U32Str> for U32CString
fn le(&self, other: &Rhs) -> bool
Tests less than or equal to (for self and other) and is used by the <= operator.
impl<'a> PartialOrd<&'a U32Str> for U32String
fn le(&self, other: &Rhs) -> bool
Tests less than or equal to (for self and other) and is used by the <= operator.
impl PartialOrd<U32CStr> for U32Str
fn le(&self, other: &Rhs) -> bool
Tests less than or equal to (for self and other) and is used by the <= operator.
impl PartialOrd<U32Str> for U32CStr
fn le(&self, other: &Rhs) -> bool
Tests less than or equal to (for self and other) and is used by the <= operator.
impl PartialOrd<U32Str> for U32CString
fn le(&self, other: &Rhs) -> bool
Tests less than or equal to (for self and other) and is used by the <= operator.
impl PartialOrd<U32Str> for U32String
fn le(&self, other: &Rhs) -> bool
Tests less than or equal to (for self and other) and is used by the <= operator.
impl PartialOrd for U32Str
fn le(&self, other: &Rhs) -> bool
Tests less than or equal to (for self and other) and is used by the <= operator.