regex_automata::util::alphabet

Struct Unit

pub struct Unit(/* private fields */);

Unit represents a single unit of haystack for DFA based regex engines.

Consumers of this crate are not expected to need this type unless they are implementing their own DFA. Even then, it is not required: implementors may use other techniques to handle haystack units.

Typically, a single unit of haystack for a DFA would be a single byte. However, for the DFAs in this crate, matches are delayed by a single byte in order to handle look-ahead assertions (\b, $ and \z). Thus, once we have consumed the haystack, we must run the DFA through one additional transition using a unit that indicates the haystack has ended.

A u8 cannot represent a sentinel, since every possible u8 value may be a valid haystack unit for a DFA. This type therefore explicitly adds room for a sentinel value.
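As a rough illustration of why a separate sentinel is needed, the sketch below models a haystack unit as a two-variant enum and produces the stream of units a search would feed to a DFA: every byte, then one end-of-input unit. `HaystackUnit` and `units` are hypothetical names invented for this sketch; the crate's `Unit` keeps its fields private, and its actual representation is not shown here.

```rust
/// Hypothetical stand-in for `Unit`: either a byte or an end-of-input sentinel.
/// All 256 byte values are taken, so the sentinel needs its own variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum HaystackUnit {
    Byte(u8),
    Eoi(u16),
}

/// Produce every unit a DFA would consume for `haystack`: one unit per byte,
/// followed by a single end-of-input unit carrying the sentinel value.
fn units(haystack: &[u8], sentinel: u16) -> Vec<HaystackUnit> {
    let mut out: Vec<HaystackUnit> =
        haystack.iter().map(|&b| HaystackUnit::Byte(b)).collect();
    out.push(HaystackUnit::Eoi(sentinel));
    out
}
```

A two-byte haystack therefore yields three units, the last of which is the sentinel.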

The sentinel EOI value is always its own equivalence class and is ultimately represented by adding 1 to the maximum equivalence class value. So for example, the regex ^[a-z]+$ might be split into the following equivalence classes:

0 => [\x00-`]
1 => [a-z]
2 => [{-\xFF]
3 => [EOI]

Where EOI is the special sentinel value that is always in its own singleton equivalence class.
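The partition above can be sketched as a hand-built toy DFA for ^[a-z]+$. This is only an illustration under stated assumptions: `class_of`, `TABLE`, the state constants, and `is_match` are hypothetical names written for this example, and the crate builds and stores its DFAs differently. The point it shows is that the EOI class occupies column 3 (the maximum byte class, 2, plus 1) and that the match is only confirmed by the final EOI transition.

```rust
/// Map a raw byte to its equivalence class for the example partition.
fn class_of(byte: u8) -> u8 {
    match byte {
        0x00..=0x60 => 0, // [\x00-`]
        b'a'..=b'z' => 1, // [a-z]
        _ => 2,           // [{-\xFF]
    }
}

// State identifiers for the toy DFA.
const DEAD: usize = 0;
const START: usize = 1;
const LETTERS: usize = 2;
const MATCH: usize = 3;

/// The EOI class is one more than the maximum byte class (2 + 1).
const EOI_CLASS: usize = 3;

/// Transition table indexed by [state][class]; the last column is EOI.
const TABLE: [[usize; 4]; 4] = [
    [DEAD, DEAD, DEAD, DEAD],     // DEAD
    [DEAD, LETTERS, DEAD, DEAD],  // START
    [DEAD, LETTERS, DEAD, MATCH], // LETTERS
    [DEAD, DEAD, DEAD, DEAD],     // MATCH
];

/// Run the toy DFA over the haystack, then take one extra EOI transition.
fn is_match(haystack: &[u8]) -> bool {
    let mut state = START;
    for &b in haystack {
        state = TABLE[state][usize::from(class_of(b))];
    }
    TABLE[state][EOI_CLASS] == MATCH
}
```

Because the table has only four columns instead of 257, each state is far smaller than it would be with one column per byte value.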

Implementations

impl Unit

pub fn u8(byte: u8) -> Unit

Create a new haystack unit from a byte value.

All possible byte values are legal. However, when creating a haystack unit for a specific DFA, one should be careful to only construct units that are in that DFA’s alphabet. Namely, one way to compact a DFA’s in-memory representation is to collapse its transitions from the set of all possible byte values into a set of equivalence classes. If a DFA uses equivalence classes instead of byte values, then the byte given here should be the equivalence class.

pub fn eoi(num_byte_equiv_classes: usize) -> Unit

Create a new “end of input” haystack unit.

The value given is the sentinel value used by this unit to represent the “end of input.” The value should be the total number of equivalence classes in the corresponding alphabet. Its maximum value is 256, which occurs when every byte is its own equivalence class.

Panics

This panics when num_byte_equiv_classes is greater than 256.
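The bound can be sketched with a standalone model of the constructor. `EoiUnit` and this free function `eoi` are hypothetical and written only for illustration; they mirror the documented contract (values up to and including 256 are accepted, anything larger panics) without claiming to reproduce the crate's code.

```rust
/// Hypothetical model of an end-of-input unit holding its sentinel value.
/// A u16 is needed because the maximum sentinel, 256, does not fit in a u8.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct EoiUnit(u16);

/// Build an end-of-input unit from the number of byte equivalence classes.
/// Panics if the count exceeds 256, the number of distinct byte values.
fn eoi(num_byte_equiv_classes: usize) -> EoiUnit {
    assert!(
        num_byte_equiv_classes <= 256,
        "max number of byte equivalence classes is 256, but got {}",
        num_byte_equiv_classes,
    );
    EoiUnit(num_byte_equiv_classes as u16)
}
```

In this model, `eoi(3)` corresponds to the ^[a-z]+$ example with three byte classes, and `eoi(256)` to a DFA where every byte is its own class.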

pub fn as_u8(self) -> Option<u8>

If this unit is not an “end of input” sentinel, then return its underlying byte value. Otherwise, return None.

pub fn as_eoi(self) -> Option<u16>

If this unit is an “end of input” sentinel, then return the underlying sentinel value that was given to Unit::eoi. Otherwise, return None.

pub fn as_usize(self) -> usize

Return this unit as a usize, regardless of whether it is a byte value or an “end of input” sentinel. In the latter case, the underlying sentinel value given to Unit::eoi is returned.
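The three accessors above can be modeled on a plain enum to show how they relate. `HaystackUnit` is a hypothetical stand-in invented for this sketch, not the crate's type; only the documented behavior of as_u8, as_eoi and as_usize is mirrored.

```rust
/// Hypothetical stand-in for `Unit`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HaystackUnit {
    Byte(u8),
    Eoi(u16),
}

impl HaystackUnit {
    /// The byte value, or None for the end-of-input sentinel.
    fn as_u8(self) -> Option<u8> {
        match self {
            HaystackUnit::Byte(b) => Some(b),
            HaystackUnit::Eoi(_) => None,
        }
    }

    /// The sentinel value, or None for an ordinary byte.
    fn as_eoi(self) -> Option<u16> {
        match self {
            HaystackUnit::Byte(_) => None,
            HaystackUnit::Eoi(n) => Some(n),
        }
    }

    /// Either payload widened to usize, e.g. for indexing a transition row.
    fn as_usize(self) -> usize {
        match self {
            HaystackUnit::Byte(b) => usize::from(b),
            HaystackUnit::Eoi(n) => usize::from(n),
        }
    }
}
```

Exactly one of as_u8 and as_eoi returns Some for any given unit, while as_usize always succeeds, which is what makes it convenient as a column index.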

pub fn is_byte(self, byte: u8) -> bool

Returns true if and only if this unit is a byte value equivalent to the byte given. This always returns false when this is an “end of input” sentinel.

pub fn is_eoi(self) -> bool

Returns true when this unit represents an “end of input” sentinel.

pub fn is_word_byte(self) -> bool

Returns true when this unit corresponds to an ASCII word byte.

This always returns false when this unit represents an “end of input” sentinel.
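A minimal sketch of this check follows, assuming the usual regex definition of an ASCII word byte: a letter, a digit, or an underscore. The function `is_ascii_word_byte` is hypothetical, and it takes an `Option<u8>` in which None stands in for the “end of input” sentinel.

```rust
/// Returns true for [0-9A-Za-z_]; None models the end-of-input sentinel,
/// which is never a word byte.
fn is_ascii_word_byte(unit: Option<u8>) -> bool {
    match unit {
        Some(b) => b.is_ascii_alphanumeric() || b == b'_',
        None => false,
    }
}
```

Treating the sentinel as a non-word unit is what lets an assertion such as \b be resolved at the very end of the haystack.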

Trait Implementations

impl Clone for Unit

fn clone(&self) -> Unit

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Stable since 1.0.0.

impl Debug for Unit

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Ord for Unit

fn cmp(&self, other: &Unit) -> Ordering

This method returns an Ordering between self and other.

fn max(self, other: Self) -> Self
where Self: Sized,

Compares and returns the maximum of two values. Stable since 1.21.0.

fn min(self, other: Self) -> Self
where Self: Sized,

Compares and returns the minimum of two values. Stable since 1.21.0.

fn clamp(self, min: Self, max: Self) -> Self
where Self: Sized,

Restrict a value to a certain interval. Stable since 1.50.0.

impl PartialEq for Unit

fn eq(&self, other: &Unit) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason. Stable since 1.0.0.

impl PartialOrd for Unit

fn partial_cmp(&self, other: &Unit) -> Option<Ordering>

This method returns an ordering between self and other values if one exists.

fn lt(&self, other: &Rhs) -> bool

Tests less than (for self and other) and is used by the < operator. Stable since 1.0.0.

fn le(&self, other: &Rhs) -> bool

Tests less than or equal to (for self and other) and is used by the <= operator. Stable since 1.0.0.

fn gt(&self, other: &Rhs) -> bool

Tests greater than (for self and other) and is used by the > operator. Stable since 1.0.0.

fn ge(&self, other: &Rhs) -> bool

Tests greater than or equal to (for self and other) and is used by the >= operator. Stable since 1.0.0.

impl Copy for Unit

impl Eq for Unit

impl StructuralPartialEq for Unit

Auto Trait Implementations

impl Freeze for Unit

impl RefUnwindSafe for Unit

impl Send for Unit

impl Sync for Unit

impl Unpin for Unit

impl UnwindSafe for Unit

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

This is a nightly-only experimental API (clone_to_uninit).
Performs copy-assignment from self to dst.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

Layout

Note: Most layout information is completely unstable and may even differ between compilations. The only exception is types with certain repr(...) attributes. Please see the Rust Reference's “Type Layout” chapter for details on type layout guarantees.

Size: 4 bytes