regex_automata::util::start

Struct Config

pub struct Config { /* private fields */ }

The configuration used to determine a DFA’s start state for a search.

In the typical textbook description, a DFA has a single start state: the one corresponding to the set of all start states of the NFA it was built from, along with their epsilon closures. In this crate, however, DFAs have many possible start states due to a few factors:

  • DFAs support the ability to run either anchored or unanchored searches. Each type of search needs its own start state. For example, an unanchored search requires starting at a state corresponding to a regex with a (?s-u:.)*? prefix, which will match through anything.
  • DFAs also optionally support starting an anchored search for any one specific pattern. Each such pattern requires its own start state.
  • If a look-behind assertion like ^ or \b is used in the regex, then the DFA will need to inspect a single byte immediately before the start of the search to choose the correct start state.

Indeed, this configuration precisely encapsulates all of the above factors. The Config::anchored method sets which kind of anchored search to perform while the Config::look_behind method provides a way to set the byte that occurs immediately before the start of the search.
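
As a rough illustration of why several start states are needed, the sketch below models the choice as a lookup keyed by the anchored mode and a coarse class of the look-behind byte. Everything in it (Mode, StartKind, classify, start_key) is hypothetical and exists only for this illustration. It is not the crate's implementation, which may distinguish more cases (for example, other line terminators).

```rust
/// Hypothetical stand-in for the anchored mode of a search.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Unanchored,
    Anchored,
    /// Anchored search for one specific pattern, identified by index.
    Pattern(usize),
}

/// Hypothetical coarse class of the position where a search begins.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StartKind {
    /// No look-behind byte: the search begins at the start of the haystack.
    Text,
    /// The preceding byte is `\n`, which matters for a multi-line `^`.
    LineFeed,
    /// The preceding byte is an ASCII word byte, which matters for `\b`.
    WordByte,
    /// Any other preceding byte.
    NonWordByte,
}

/// Classify the look-behind byte (assumes ASCII word semantics, as with `(?-u)`).
pub fn classify(look_behind: Option<u8>) -> StartKind {
    match look_behind {
        None => StartKind::Text,
        Some(b'\n') => StartKind::LineFeed,
        Some(b) if b.is_ascii_alphanumeric() || b == b'_' => StartKind::WordByte,
        Some(_) => StartKind::NonWordByte,
    }
}

/// The pair that, in this model, selects one start state.
pub fn start_key(mode: Mode, look_behind: Option<u8>) -> (Mode, StartKind) {
    (mode, classify(look_behind))
}
```

In this model, start_key(Mode::Anchored, Some(b'q')) and start_key(Mode::Anchored, None) produce different keys, which is exactly the distinction a leading \b depends on.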

Generally speaking, this type is only useful when you want to run searches without using an Input. In particular, an Input wants a haystack slice, but callers may not have a contiguous sequence of bytes as a haystack in all cases. This type provides a lower level of control such that callers can provide their own anchored configuration and look-behind byte explicitly.
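
When the haystack is split across several buffers, the caller has to find the look-behind byte on its own. The following is a minimal sketch of that bookkeeping, assuming the haystack is held as a slice of byte chunks. The helper name look_behind_in_chunks is hypothetical and not part of this crate; its result is the kind of value one would pass to Config::look_behind before resuming a search at the given logical offset.

```rust
/// Return the byte immediately before logical `offset` in a haystack that is
/// split across `chunks`. Returns `None` when `offset` is 0 or greater than
/// the total length of all chunks.
///
/// Hypothetical helper, not part of regex-automata.
pub fn look_behind_in_chunks(chunks: &[&[u8]], offset: usize) -> Option<u8> {
    // The byte of interest sits at logical index `offset - 1`.
    let mut index = offset.checked_sub(1)?;
    for chunk in chunks {
        if index < chunk.len() {
            return Some(chunk[index]);
        }
        // Skip this chunk and make the index relative to the next one.
        index -= chunk.len();
    }
    None
}
```

Empty chunks are skipped naturally, since they never contain the requested index.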

§Example

This shows basic usage that permits running a search with a DFA without using the Input abstraction.

use regex_automata::{
    dfa::{Automaton, dense},
    util::start,
    Anchored,
};

let dfa = dense::DFA::new(r"(?-u)\b\w+\b")?;
let haystack = "quartz";

let config = start::Config::new().anchored(Anchored::Yes);
let mut state = dfa.start_state(&config)?;
for &b in haystack.as_bytes().iter() {
    state = dfa.next_state(state, b);
}
state = dfa.next_eoi_state(state);
assert!(dfa.is_match_state(state));

This example shows how to correctly run a search that doesn’t begin at the start of a haystack. Notice how we set the look-behind byte to q. Since both q and the first searched byte u are word bytes, the leading \b assertion does not match.

use regex_automata::{
    dfa::{Automaton, dense},
    util::start,
    Anchored,
};

let dfa = dense::DFA::new(r"(?-u)\b\w+\b")?;
let haystack = "quartz";

let config = start::Config::new()
    .anchored(Anchored::Yes)
    .look_behind(Some(b'q'));
let mut state = dfa.start_state(&config)?;
for &b in haystack.as_bytes().iter().skip(1) {
    state = dfa.next_state(state, b);
}
state = dfa.next_eoi_state(state);
// No match!
assert!(!dfa.is_match_state(state));

If we had instead not set a look-behind byte, then the DFA would assume that it was starting at the beginning of the haystack, and thus \b should match. This in turn would result in erroneously reporting a match:

use regex_automata::{
    dfa::{Automaton, dense},
    util::start,
    Anchored,
};

let dfa = dense::DFA::new(r"(?-u)\b\w+\b")?;
let haystack = "quartz";

// Whoops, forgot the look-behind byte...
let config = start::Config::new().anchored(Anchored::Yes);
let mut state = dfa.start_state(&config)?;
for &b in haystack.as_bytes().iter().skip(1) {
    state = dfa.next_state(state, b);
}
state = dfa.next_eoi_state(state);
// And now we get a match unexpectedly.
assert!(dfa.is_match_state(state));

Implementations§

impl Config

pub fn new() -> Config

Create a new default start configuration.

The default is an unanchored search that starts at the beginning of the haystack.

pub fn from_input_forward(input: &Input<'_>) -> Config

A convenience routine for building a start configuration from an Input for a forward search.

This automatically sets the look-behind byte to the byte immediately preceding the start of the search. If the start of the search is at offset 0, then no look-behind byte is set.
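
The rule for forward searches can be sketched without an Input as follows. The function forward_look_behind is a hypothetical helper written for this illustration, not an API of this crate; it only mirrors the behavior described above for a haystack slice and a search start offset.

```rust
/// Hypothetical helper: the look-behind byte for a forward search that
/// begins at `start` within `haystack`.
pub fn forward_look_behind(haystack: &[u8], start: usize) -> Option<u8> {
    // `checked_sub` yields `None` at offset 0, where nothing precedes the search.
    start.checked_sub(1).and_then(|i| haystack.get(i).copied())
}
```

For the haystack quartz, a start offset of 1 gives Some(b'q') and a start offset of 0 gives None.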

pub fn from_input_reverse(input: &Input<'_>) -> Config

A convenience routine for building a start configuration from an Input for a reverse search.

This automatically sets the look-behind byte to the byte immediately following the end of the search. If the end of the search is at offset haystack.len(), then no look-behind byte is set.
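
For a reverse search, "behind" is relative to the direction of travel, so the byte of interest is the one that follows the search span. A minimal sketch of that rule, using a hypothetical helper named reverse_look_behind that is not part of this crate:

```rust
/// Hypothetical helper: the look-behind byte for a reverse search whose
/// span ends at `end` within `haystack`.
pub fn reverse_look_behind(haystack: &[u8], end: usize) -> Option<u8> {
    // `get` yields `None` when `end == haystack.len()`, where nothing follows.
    haystack.get(end).copied()
}
```

For the haystack quartz, an end offset of 5 gives Some(b'z') and an end offset of 6 gives None.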

pub fn look_behind(self, byte: Option<u8>) -> Config

Set the look-behind byte at the start of a search.

Unless the search is intended to logically start at the beginning of a haystack, this should always be set to the byte immediately preceding the start of the search. If no look-behind byte is set, then the start configuration will assume it is at the beginning of the haystack, so that, for example, the anchor ^ will match.

The default is that no look-behind byte is set.

pub fn anchored(self, mode: Anchored) -> Config

Set the anchored mode of a search.

The default is an unanchored search.

pub fn get_look_behind(&self) -> Option<u8>

Return the look-behind byte in this configuration, if one exists.

pub fn get_anchored(&self) -> Anchored

Return the anchored mode in this configuration.

Trait Implementations§

impl Clone for Config

fn clone(&self) -> Config

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Config

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

impl Freeze for Config

impl RefUnwindSafe for Config

impl Send for Config

impl Sync for Config

impl Unpin for Config

impl UnwindSafe for Config

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

Layout§

Note: Most layout information is completely unstable and may even differ between compilations. The only exception is types with certain repr(...) attributes. Please see the Rust Reference's “Type Layout” chapter for details on type layout guarantees.

Size: 12 bytes