regex_automata::util::look

Enum Look

pub enum Look {
    Start = 1,
    End = 2,
    StartLF = 4,
    EndLF = 8,
    StartCRLF = 16,
    EndCRLF = 32,
    WordAscii = 64,
    WordAsciiNegate = 128,
    WordUnicode = 256,
    WordUnicodeNegate = 512,
    WordStartAscii = 1_024,
    WordEndAscii = 2_048,
    WordStartUnicode = 4_096,
    WordEndUnicode = 8_192,
    WordStartHalfAscii = 16_384,
    WordEndHalfAscii = 32_768,
    WordStartHalfUnicode = 65_536,
    WordEndHalfUnicode = 131_072,
}

A look-around assertion.

An assertion matches at a position between characters in a haystack. That is, unlike most parts of a regular expression, it does not actually “consume” any input. Assertions are a way of stating that some property must be true at a particular point during matching.

For example, (?m)^[a-z]+$ is a pattern that:

  • Scans the haystack for a position at which (?m:^) is satisfied. That occurs at either the beginning of the haystack, or immediately following a \n character.
  • Looks for one or more occurrences of [a-z].
  • Once [a-z]+ has matched as much as it can, an overall match is only reported when [a-z]+ stops just before a \n or at the end of the haystack, which is where (?m:$) is satisfied.

So in this case, abc and \nabc\n match, but \nabc1\n does not.
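To make the three steps concrete, here is a hand-rolled, std-only sketch of what (?m)^[a-z]+$ checks. It is not how regex-automata executes the pattern; the function names are hypothetical, and the haystack is assumed to be ASCII bytes.

```rust
// Hypothetical, std-only model of `(?m)^[a-z]+$` over ASCII bytes.
// This is NOT how regex-automata runs the pattern; it only mirrors the steps.

/// `(?m:^)`: start of haystack, or just after a `\n`.
fn start_lf(hay: &[u8], at: usize) -> bool {
    at == 0 || hay[at - 1] == b'\n'
}

/// `(?m:$)`: end of haystack, or just before a `\n`.
fn end_lf(hay: &[u8], at: usize) -> bool {
    at == hay.len() || hay[at] == b'\n'
}

/// Returns true if some line of `hay` consists solely of `[a-z]+`.
fn has_lowercase_line(hay: &[u8]) -> bool {
    for start in 0..=hay.len() {
        // Step 1: only try positions where `(?m:^)` is satisfied.
        if !start_lf(hay, start) {
            continue;
        }
        // Step 2: consume as many `[a-z]` bytes as possible.
        let mut end = start;
        while end < hay.len() && hay[end].is_ascii_lowercase() {
            end += 1;
        }
        // Step 3: require at least one byte, then `(?m:$)` right after it.
        // Backtracking cannot help: `[a-z]` never matches `\n`, so a shorter
        // run would always be followed by another `[a-z]` byte.
        if end > start && end_lf(hay, end) {
            return true;
        }
    }
    false
}
```

With this sketch, has_lowercase_line(b"abc") and has_lowercase_line(b"\nabc\n") are true, while has_lowercase_line(b"\nabc1\n") is false, matching the examples above.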

Assertions are also called “look-around,” “look-behind” and “look-ahead.” Specifically, some assertions are look-behind (like ^), other assertions are look-ahead (like $) and yet other assertions are both look-ahead and look-behind (like \b).

Assertions in an NFA

An assertion in a thompson::NFA can be thought of as a conditional epsilon transition. That is, a matching engine like the PikeVM only moves through a conditional epsilon transition when its condition is satisfied at the PikeVM's current position in the haystack.
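The conditional epsilon transition idea can be sketched with a toy NFA. Everything here (Assertion, State, epsilon_closure, is_match_anchored) is a hypothetical, heavily simplified model, not the crate's thompson::NFA or its PikeVM; the search is anchored at position 0 to keep it short.

```rust
/// Hypothetical subset of assertions used by this sketch.
#[derive(Clone, Copy)]
enum Assertion {
    StartLF,
    EndLF,
}

/// Hypothetical, simplified NFA states (not the crate's `thompson::State`).
enum State {
    /// Consume exactly `byte`, then go to `next`.
    Byte { byte: u8, next: usize },
    /// Conditional epsilon transition: go to `next` without consuming input,
    /// but only if `assertion` holds at the current position.
    Look { assertion: Assertion, next: usize },
    /// Accepting state.
    Match,
}

fn holds(assertion: Assertion, hay: &[u8], at: usize) -> bool {
    match assertion {
        Assertion::StartLF => at == 0 || hay[at - 1] == b'\n',
        Assertion::EndLF => at == hay.len() || hay[at] == b'\n',
    }
}

/// States reachable from `starts` at position `at` via epsilon transitions
/// only; a `Look` transition is followed only if its assertion holds at `at`.
fn epsilon_closure(nfa: &[State], starts: &[usize], hay: &[u8], at: usize) -> Vec<usize> {
    let mut seen = vec![false; nfa.len()];
    let mut stack: Vec<usize> = starts.to_vec();
    let mut closure = Vec::new();
    while let Some(id) = stack.pop() {
        if seen[id] {
            continue;
        }
        seen[id] = true;
        closure.push(id);
        if let State::Look { assertion, next } = nfa[id] {
            if holds(assertion, hay, at) {
                stack.push(next);
            }
        }
    }
    closure
}

/// Simulates all NFA threads in lockstep, starting at position 0, and
/// reports whether any thread reaches `Match`.
fn is_match_anchored(nfa: &[State], hay: &[u8]) -> bool {
    let mut current = epsilon_closure(nfa, &[0], hay, 0);
    for at in 0..=hay.len() {
        if current.iter().any(|&id| matches!(nfa[id], State::Match)) {
            return true;
        }
        if at == hay.len() {
            break;
        }
        // Follow every byte transition that accepts `hay[at]`.
        let stepped: Vec<usize> = current
            .iter()
            .filter_map(|&id| match nfa[id] {
                State::Byte { byte, next } if byte == hay[at] => Some(next),
                _ => None,
            })
            .collect();
        current = epsilon_closure(nfa, &stepped, hay, at + 1);
    }
    false
}
```

In this model an assertion never advances the position; it only gates whether the closure may include the state behind it, which is the sense in which it is a conditional epsilon transition.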

Handling assertions in a DFA is trickier, since a DFA has no epsilon transitions at all. Instead, assertions are compiled into the automaton itself, at the cost of more states than would be needed without them.

Variants

Start = 1

Match the beginning of text. Specifically, this matches at the starting position of the input.

End = 2

Match the end of text. Specifically, this matches at the ending position of the input.

StartLF = 4

Match the beginning of a line or the beginning of text. Specifically, this matches at the starting position of the input, or at the position immediately following a \n character.

EndLF = 8

Match the end of a line or the end of text. Specifically, this matches at the end position of the input, or at the position immediately preceding a \n character.
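The four text and line anchors above reduce to simple checks on the bytes adjacent to a position. The following is a minimal sketch, assuming the haystack is a byte slice and a position at lies between bytes (so it ranges over 0..=hay.len()); the helper names are hypothetical and are not the crate's API.

```rust
// Hypothetical helpers modelling Start, End, StartLF and EndLF.

/// `Start`: only the very beginning of the haystack.
fn is_start(_hay: &[u8], at: usize) -> bool {
    at == 0
}

/// `End`: only the very end of the haystack.
fn is_end(hay: &[u8], at: usize) -> bool {
    at == hay.len()
}

/// `StartLF`: beginning of the haystack, or just after a `\n`.
fn is_start_lf(hay: &[u8], at: usize) -> bool {
    at == 0 || hay[at - 1] == b'\n'
}

/// `EndLF`: end of the haystack, or just before a `\n`.
fn is_end_lf(hay: &[u8], at: usize) -> bool {
    at == hay.len() || hay[at] == b'\n'
}

/// Collects every position in `hay` at which `look` holds.
fn positions(hay: &[u8], look: fn(&[u8], usize) -> bool) -> Vec<usize> {
    (0..=hay.len()).filter(|&at| look(hay, at)).collect()
}
```

For the haystack b"ab\ncd", Start holds only at 0 and End only at 5, while StartLF also holds at 3 (after the \n) and EndLF also holds at 2 (before it).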

StartCRLF = 16

Match the beginning of a line or the beginning of text. Specifically, this matches at the starting position of the input, or at the position immediately following either a \r or \n character, but never after a \r when a \n follows.

EndCRLF = 32

Match the end of a line or the end of text. Specifically, this matches at the end position of the input, or at the position immediately preceding a \r or \n character, but never before a \n when a \r precedes it.
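The CRLF rules treat \r\n as a single line terminator, so neither assertion matches between the \r and the \n. A minimal std-only sketch of that rule, with hypothetical helper names (not the crate's API):

```rust
// Hypothetical helpers modelling StartCRLF and EndCRLF over a byte haystack;
// `at` ranges over 0..=hay.len().

/// `StartCRLF`: beginning of the haystack, just after `\n`, or just after a
/// `\r` that is not immediately followed by `\n`.
fn is_start_crlf(hay: &[u8], at: usize) -> bool {
    at == 0
        || hay[at - 1] == b'\n'
        || (hay[at - 1] == b'\r' && (at == hay.len() || hay[at] != b'\n'))
}

/// `EndCRLF`: end of the haystack, just before `\r`, or just before a `\n`
/// that is not immediately preceded by `\r`.
fn is_end_crlf(hay: &[u8], at: usize) -> bool {
    at == hay.len()
        || hay[at] == b'\r'
        || (hay[at] == b'\n' && (at == 0 || hay[at - 1] != b'\r'))
}
```

In b"a\r\nb", StartCRLF holds at 0 and 3 and EndCRLF holds at 1 and 4; position 2, between \r and \n, satisfies neither. A lone \r, as in b"a\rb", still ends one line and starts the next.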

WordAscii = 64

Match an ASCII-only word boundary. That is, this matches a position where one adjacent character is a word character ([0-9A-Za-z_]) and the other is not. The beginning and end of the haystack count as non-word positions.

WordAsciiNegate = 128

Match an ASCII-only negation of a word boundary. That is, this matches every position where WordAscii does not.
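A word boundary only compares the word-ness of the two neighbouring bytes, and its negation is the complement. A minimal std-only sketch with hypothetical helper names (not the crate's API):

```rust
/// ASCII word byte, i.e. `[0-9A-Za-z_]`.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// `WordAscii`: exactly one side of `at` is a word byte; positions outside
/// the haystack count as non-word.
fn is_word_ascii(hay: &[u8], at: usize) -> bool {
    let word_before = at > 0 && is_word_byte(hay[at - 1]);
    let word_after = at < hay.len() && is_word_byte(hay[at]);
    word_before != word_after
}

/// `WordAsciiNegate`: every position where `WordAscii` does not match.
fn is_word_ascii_negate(hay: &[u8], at: usize) -> bool {
    !is_word_ascii(hay, at)
}
```

For b"ab cd", WordAscii holds at 0, 2, 3 and 5, and WordAsciiNegate at the remaining positions 1 and 4.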

WordUnicode = 256

Match a Unicode-aware word boundary. That is, this matches a position where one adjacent character is a word character and the other is not, using Unicode's definition of a word character. The beginning and end of the haystack count as non-word positions.

WordUnicodeNegate = 512

Match a Unicode-aware negation of a word boundary. That is, this matches every position where WordUnicode does not.
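The ASCII and Unicode variants can disagree on non-ASCII text. The sketch below contrasts them on "café"; it is std-only and its Unicode check is an ASSUMED approximation (alphanumeric characters plus '_'), not Unicode's exact definition of a word character and not the crate's implementation. Helper names are hypothetical.

```rust
/// ASSUMPTION: approximates Unicode word characters as alphanumeric chars
/// plus '_'. The real definition differs for some characters (for example,
/// combining marks are word characters there but not here).
fn is_word_char_approx(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Approximate `WordUnicode` at byte offset `at`, which must lie on a char
/// boundary of `hay` (slicing panics otherwise).
fn is_word_unicode_approx(hay: &str, at: usize) -> bool {
    let word_before = hay[..at].chars().next_back().map_or(false, is_word_char_approx);
    let word_after = hay[at..].chars().next().map_or(false, is_word_char_approx);
    word_before != word_after
}

/// ASCII word byte, i.e. `[0-9A-Za-z_]`.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// `WordAscii`, for comparison: bytes >= 0x80 are never word bytes.
fn is_word_ascii(hay: &[u8], at: usize) -> bool {
    let word_before = at > 0 && is_word_byte(hay[at - 1]);
    let word_after = at < hay.len() && is_word_byte(hay[at]);
    word_before != word_after
}
```

In "caf\u{e9}" (5 bytes), WordAscii sees a boundary between f and é at byte 3 but none at the end, while the Unicode-aware check sees the reverse.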

WordStartAscii = 1_024

Match the start of an ASCII-only word boundary. That is, this matches a position where the following character is a word character and the previous character is not a word character (or there is no previous character, as at the beginning of the haystack).

WordEndAscii = 2_048

Match the end of an ASCII-only word boundary. That is, this matches a position where the previous character is a word character and the following character is not a word character (or there is no following character, as at the end of the haystack).
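A word start needs a word character after the position and a non-word character (or nothing) before it; a word end is the mirror image. A minimal std-only sketch with hypothetical helper names (not the crate's API):

```rust
/// ASCII word byte, i.e. `[0-9A-Za-z_]`.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// `WordStartAscii`: no word byte before `at`, and a word byte after it.
fn is_word_start_ascii(hay: &[u8], at: usize) -> bool {
    let word_before = at > 0 && is_word_byte(hay[at - 1]);
    let word_after = at < hay.len() && is_word_byte(hay[at]);
    !word_before && word_after
}

/// `WordEndAscii`: a word byte before `at`, and no word byte after it.
fn is_word_end_ascii(hay: &[u8], at: usize) -> bool {
    let word_before = at > 0 && is_word_byte(hay[at - 1]);
    let word_after = at < hay.len() && is_word_byte(hay[at]);
    word_before && !word_after
}
```

For b"ab cd", word starts are at 0 and 3 and word ends at 2 and 5. The beginning of the haystack alone is not enough: neither b"" nor b" a" has a word start at position 0.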

WordStartUnicode = 4_096

Match the start of a Unicode word boundary. That is, this matches a position where the following character is a word character and the previous character is not a word character (or there is no previous character, as at the beginning of the haystack).

WordEndUnicode = 8_192

Match the end of a Unicode word boundary. That is, this matches a position where the previous character is a word character and the following character is not a word character (or there is no following character, as at the end of the haystack).

WordStartHalfAscii = 16_384

Match the start half of an ASCII-only word boundary. That is, this matches a position at either the beginning of the haystack or where the previous character is not a word character.

WordEndHalfAscii = 32_768

Match the end half of an ASCII-only word boundary. That is, this matches a position at either the end of the haystack or where the following character is not a word character.
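The half assertions check only one side of the position, so they match in places a full word start or end would not, such as between two spaces or in an empty haystack. A minimal std-only sketch with hypothetical helper names (not the crate's API):

```rust
/// ASCII word byte, i.e. `[0-9A-Za-z_]`.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Full `WordStartAscii`, for comparison.
fn is_word_start_ascii(hay: &[u8], at: usize) -> bool {
    let word_before = at > 0 && is_word_byte(hay[at - 1]);
    let word_after = at < hay.len() && is_word_byte(hay[at]);
    !word_before && word_after
}

/// `WordStartHalfAscii`: only requires that no word byte precedes `at`.
fn is_word_start_half_ascii(hay: &[u8], at: usize) -> bool {
    !(at > 0 && is_word_byte(hay[at - 1]))
}

/// `WordEndHalfAscii`: only requires that no word byte follows `at`.
fn is_word_end_half_ascii(hay: &[u8], at: usize) -> bool {
    !(at < hay.len() && is_word_byte(hay[at]))
}
```

For b"a  b" (two spaces), the full word start holds at 0 and 3, while the start half also holds at 2, between the spaces.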

WordStartHalfUnicode = 65_536

Match the start half of a Unicode word boundary. That is, this matches a position at either the beginning of the haystack or where the previous character is not a word character.

WordEndHalfUnicode = 131_072

Match the end half of a Unicode word boundary. That is, this matches a position at either the end of the haystack or where the following character is not a word character.

Implementations

impl Look

pub const fn reversed(self) -> Look

Flip the look-around assertion to its equivalent for reverse searches. For example, StartLF gets translated to EndLF.

Some assertions, such as WordUnicode, remain the same since they match the same positions regardless of the direction of the search.
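One way to see why StartLF becomes EndLF under reversal while a word boundary stays the same: evaluate an assertion at every position of a haystack and compare it with another assertion at the mirrored position (hay.len() - at) of the reversed haystack. This is a std-only sketch with hypothetical helper names; it does not call Look::reversed itself.

```rust
fn start_lf(hay: &[u8], at: usize) -> bool {
    at == 0 || hay[at - 1] == b'\n'
}

fn end_lf(hay: &[u8], at: usize) -> bool {
    at == hay.len() || hay[at] == b'\n'
}

fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

fn word_ascii(hay: &[u8], at: usize) -> bool {
    let word_before = at > 0 && is_word_byte(hay[at - 1]);
    let word_after = at < hay.len() && is_word_byte(hay[at]);
    word_before != word_after
}

/// True if `fwd` at each position of `hay` agrees with `rev` at the
/// mirrored position of the reversed haystack.
fn agrees_under_reversal(
    hay: &[u8],
    fwd: fn(&[u8], usize) -> bool,
    rev: fn(&[u8], usize) -> bool,
) -> bool {
    let reversed: Vec<u8> = hay.iter().rev().copied().collect();
    (0..=hay.len()).all(|at| fwd(hay, at) == rev(&reversed, hay.len() - at))
}
```

Reversing the haystack turns "just after a \n" into "just before a \n", so StartLF pairs with EndLF, whereas WordAscii compares both neighbours symmetrically and pairs with itself.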

pub const fn as_repr(self) -> u32

Return the underlying representation of this look-around enumeration as an integer. Passing the return value to the Look::from_repr constructor is guaranteed to return the same look-around variant that one started with, within a semver-compatible release of this crate.

pub const fn from_repr(repr: u32) -> Option<Look>

Given the underlying representation of a Look value, return the corresponding Look value if the representation is valid. Otherwise None is returned.
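Each discriminant listed for Look is a distinct power of two, so a single variant's representation has exactly one bit set, and OR-ing several gives a compact bitmask. The round trip can be sketched with a hypothetical stand-in enum, MiniLook, that copies four of the discriminants; it is an illustration, not the crate's type.

```rust
/// Hypothetical stand-in mirroring four `Look` variants and their values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MiniLook {
    Start = 1,
    End = 2,
    StartLF = 4,
    EndLF = 8,
}

impl MiniLook {
    /// The discriminant as an integer.
    const fn as_repr(self) -> u32 {
        self as u32
    }

    /// Inverse of `as_repr`; anything that is not exactly one known bit
    /// (including 0 and combinations like 3) is rejected with `None`.
    const fn from_repr(repr: u32) -> Option<MiniLook> {
        match repr {
            1 => Some(MiniLook::Start),
            2 => Some(MiniLook::End),
            4 => Some(MiniLook::StartLF),
            8 => Some(MiniLook::EndLF),
            _ => None,
        }
    }
}
```

This is why from_repr returns an Option: most u32 values, such as 3 (Start | End), do not name a single variant.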

pub const fn as_char(self) -> char

Returns a convenient single codepoint representation of this look-around assertion. Each assertion is guaranteed to be represented by a distinct character.

This is useful for succinctly representing a look-around assertion in human-friendly output intended for programmers working on regex internals.

Trait Implementations

impl Clone for Look

fn clone(&self) -> Look

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. (Stable since Rust 1.0.0.)

impl Debug for Look

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for Look

fn eq(&self, other: &Look) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason. (Stable since Rust 1.0.0.)

impl Copy for Look

impl Eq for Look

impl StructuralPartialEq for Look

Auto Trait Implementations

impl Freeze for Look

impl RefUnwindSafe for Look

impl Send for Look

impl Sync for Look

impl Unpin for Look

impl UnwindSafe for Look

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

🔬 This is a nightly-only experimental API (clone_to_uninit).
Performs copy-assignment from self to dst.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

Layout

Note: Most layout information is completely unstable and may even differ between compilations. The only exception is types with certain repr(...) attributes. Please see the Rust Reference's “Type Layout” chapter for details on type layout guarantees.

Size: 4 bytes

Size for each variant:

  • Start: 0 bytes
  • End: 0 bytes
  • StartLF: 0 bytes
  • EndLF: 0 bytes
  • StartCRLF: 0 bytes
  • EndCRLF: 0 bytes
  • WordAscii: 0 bytes
  • WordAsciiNegate: 0 bytes
  • WordUnicode: 0 bytes
  • WordUnicodeNegate: 0 bytes
  • WordStartAscii: 0 bytes
  • WordEndAscii: 0 bytes
  • WordStartUnicode: 0 bytes
  • WordEndUnicode: 0 bytes
  • WordStartHalfAscii: 0 bytes
  • WordEndHalfAscii: 0 bytes
  • WordStartHalfUnicode: 0 bytes
  • WordEndHalfUnicode: 0 bytes