Last year I built a scanner generator like flex or jflex in Scala for a class. Part of this project is a small library to manage and manipulate finite automatons. It works completely as expected, except for some errors in the conversion to point code, which is of little concern.

I would like to review some aspects of this library, including, among others:

- Quality and usefulness of the documentation of classes and methods.
- Application of functional concepts (tail recursion)
- Appointment of classes, methods and variables.
- Simplicity and, more subjectively, elegance of solutions. Especially in my implementation of the construction of the subset to make the automaton deterministic.
- … and any other comments you may have

**References to codes other than the code provided**

The library uses a custom class for sets called `MySet`

, which is beyond the scope of this question. I can ask for opinions for this class in another question. `MySet`

supports the use of ranges as sets, complementary and differential sets without listing all values. It also provides the mathematical operators for unions, intersections and differences.

**Code**

The library consists mainly of the class. `AutomatonSpecification`

, which encapsulates the list of transitions, as well as the start and acceptance states.

It provides several methods to manipulate (create a manipulated copy) an automaton, especially to eliminate epsilon transitions, make it deterministic by building subsets and calculating a minimum automaton.

AutomatonSpecification.scala:

```
package automata.finiteautomata
/**
* Encapsulates all aspects of a finite automaton.
* This includes the transitions as well as the start and accepting states of the automaton
*
* @param transitions The transitions in this automaton
* @param accepting A list of accepting states along with their result value
* @param start The set of possible start states
* @tparam InputElement The type of the input elements of the automaton
* @tparam StateType The type of the states in this automaton
* @tparam Result The result type of this automaton
*/
case class AutomatonSpecification(InputElement, StateType <: State(StateType), Result)
(transitions: List(Transition(InputElement, StateType)), accepting: List(Acceptance(StateType, Result)), start: Set(StateType)) {
/**
* Finds the epsilon closure of a given state.
* Calculates the epsilon closures of all epsilon reachable states as a byproduct.
*
* @param state The state to find the epsilon closure of
* @param knownClosures A map containing known epsilon closures. Acts as a cache.
* @return A tuple consisting of 1. The set of epsilon reachable states and 2. A map of all known epsilon closures
*/
def epsilonClosureOf(state: StateType, knownClosures: Map(StateType, Set(StateType)) = Map()): (Set(StateType), Map(StateType, Set(StateType))) = {
def helper(currentState: StateType, known: Map(StateType, Set(StateType)), visitedStates: Set(StateType)): (Set(StateType), Map(StateType, Set(StateType))) = {
if (visitedStates.contains(currentState)) return (visitedStates, known)
val visited = visitedStates + currentState
known.get(currentState) match {
case Some(closure) => (closure, known)
case None =>
val (c, k) =
transitions
.filter(t => t.oldState == currentState && t.condition == EpsilonCondition)
.map(_.newState).toSet
.foldLeft((visited, known)) {
case ((newVisited, newKnown), reachablestate) =>
//Get the transitive closure of the epsilon-reachable state
//All known closures (f._2) are passed on
//Already visited states (f._1) are also passed on.
//The last part is needed to correctly handle cycles of epsilon-reachable states
val (closure, newknown) = helper(reachablestate, newKnown, newVisited)
(newVisited ++ closure, newknown)
}
(c, k + (currentState -> c))
}
}
helper(state, knownClosures, Set())
}
/**
* Finds the epsilon closure of all states of this automaton
*
* @return A map containing the epsilon closures of all states.
*/
def allEpsilonClosures: Map(StateType, Set(StateType)) = this.states.foldLeft(Map(StateType, Set(StateType))())(
(k, state) => this.epsilonClosureOf(state, k)._2)
/**
* Tail recursive function to remove all epsilon transitions in a specification of a finite automaton in an iterative manner
*
* @return The specification of an equivalent automaton without any epsilon transitions
*/
@scala.annotation.tailrec
final def withoutEpsilonIterative: AutomatonSpecification(InputElement, StateType, Result) = {
//If there are no epsilon transitions left, return the argument
if (this.transitions.forall(_.condition != EpsilonCondition)) return this
val transformed: List((List(Transition(InputElement, StateType)), Option(Acceptance(StateType, Result)))) =
this.transitions.map {
case Transition(a, EpsilonCondition, b) if a == b =>
(Nil, None) //Reflective epsilon transitions are meaningless and result in infinite loops.
case Transition(oldState, EpsilonCondition, newState) =>
//1. Replace epsilon transition with all possible transitive transitions (potentially including new epsilon-transitions)
//2. If the target state was accepting, add the initial state to the set of accepting states
(this.transitions.filter(_.oldState == newState).map(x => Transition(oldState, x.condition, x.newState)),
this.accepting.find(_.state == newState).map(_.copy(state = oldState)))
case t => (List(t), None) //All non-epsilon transitions are kept
}
//tailrecursive call to eliminate the next set of epsilon transitions
AutomatonSpecification(transformed.flatMap(_._1), transformed.flatMap(_._2) ++ this.accepting,
this.start).withoutEpsilonIterative
}
/**
* Removes all epsilon transitions in the specification.
* Curiously around 4x slower than 'withoutEpsilonIterative' for small automata.
* The resulting specification may contain unreachable states
*
* @return The specification of an equivalent automaton without any epsilon transitions.
*/
def withoutEpsilon: AutomatonSpecification(InputElement, StateType, Result) = {
val closures = this.allEpsilonClosures
val transitions = for (state <- this.states.toList;
closing <- closures(state);
transition <- this.transitions.filter(
_.oldState == closing) if transition.condition != EpsilonCondition)
yield transition.copy(oldState = state)
val accepting = this.accepting.flatMap({
case Acceptance(state, result) => closures.filter({
case (_, value) => value.contains(state)
}).keys.map(s => Acceptance(s, result)).toList
})
AutomatonSpecification(transitions, accepting, this.start.flatMap(closures)).cullAcceptingStates
}
/**
* Makes the automaton deterministic by constructing all reachable state-subsets.
* Requires the specification to not contain any epsilon transition.
* Removes unreachable states as a byproduct.
*
* @return A new specification containing aggregate states and no sources of non-deterministic behaviour
*/
def toDeterministic: AutomatonSpecification(InputElement, AggregateState(StateType), Result) =
this.toDeterministic(AggregateState(this.start))
/**
* Makes the automaton deterministic by constructing all reachable state-subsets.
* Requires the specification to not contain any epsilon transition.
* Removes unreachable states as a byproduct.
*
* @param aggregateStartState An aggregate start state
* @return A new specification containing aggregate states and no sources of non-deterministic behaviour
*/
private def toDeterministic(aggregateStartState: AggregateState(StateType)): AutomatonSpecification(InputElement, AggregateState(StateType), Result) = {
case class StateTransition(condition: TransitionCondition(InputElement), target: AggregateState(StateType))
/**
* Transforms a list of non-deterministic transitions for a given state into a list of usable deterministic transitions
*
* @param current The current aggregate state
* @param transitions A list of all transitions that can be taken from the current state
* @return A usable list of AggregateTransitions with no remaining sources of non-determinism
*/
def solveAll(current: AggregateState(StateType), transitions: List(StateTransition)): List(Transition(InputElement, AggregateState(StateType))) = {
val resolved = resolveAllCollisions(transitions)
resolved.map(t => Transition(InputElement, AggregateState(StateType))(current, t.condition, t.target))
}
/**
* Transforms a list of non-deterministic transitions into a list of deterministic transitions
*
* @param transitions A list of possible transitions
* @return A list of deterministic transitions without the origin state
*/
def resolveAllCollisions(transitions: List(StateTransition)): List(StateTransition) = {
transitions match {
case Nil => Nil
case head :: Nil => head :: Nil
case head :: tail =>
resolveCollisionsForSingleHead(head, tail) match {
case Some(replacement) => resolveAllCollisions(replacement)
case None => head :: resolveAllCollisions(tail)
}
}
}
/**
* Resolves the first collision between a given transition and any of a list of transitions
*
* @param head The transition to resolve collisions with
* @param tail List of all potential collision candidates
* @return None if there was no collision, some list of transitions if there was a collision that was replaced.
* The list is functionally equivalent to all transitions head :: tail
*/
def resolveCollisionsForSingleHead(head: StateTransition, tail: List(StateTransition)): Option(List(StateTransition)) = {
@scala.annotation.tailrec
def helper(tail: List(StateTransition), safe: List(StateTransition)): Option(List(StateTransition)) = {
tail match {
case Nil => None
case next :: ts => resolveOptionalCollision(head, next) match {
case None => helper(ts, next :: safe)
case Some(replacement) => Some(replacement.filterNot(_.condition.isEmpty) ::: ts ::: safe)
}
}
}
helper(tail, Nil)
}
/**
* Resolves the collision between to transitions
*
* @param a The first transition
* @param b The second transition
* @return None if there was no collision, some list of transitions if the transitions collide and need to be replaced.
* The list may contain "empty" transitions, i.e. SetConditions that with empty character sets.
* While not harmful, they should be filtered for efficiency.
*/
def resolveOptionalCollision(a: StateTransition, b: StateTransition): Option(List(StateTransition)) = {
(a, b) match {
case (StateTransition(SetCondition(t1), aNew), StateTransition(SetCondition(t2), bNew)) =>
val intersect = t1 ∩ t2
if (intersect.isEmpty) None
else Some(List(
StateTransition(SetCondition(t1 t2), aNew),
StateTransition(SetCondition(t2 t1), bNew),
StateTransition(SetCondition(intersect), aNew ++ bNew)
))
case _ => None
}
}
@scala.annotation.tailrec
def constructAggregateStatesAndTransitions(queue: List(AggregateState(StateType)), known: Set(AggregateState(StateType)), transes: List(Transition(InputElement, AggregateState(StateType)))): (Set(AggregateState(StateType)), List(Transition(InputElement, AggregateState(StateType)))) = {
queue match {
case Nil => (known, transes)
case head :: tail if known.contains(head) => constructAggregateStatesAndTransitions(tail, known, transes)
case head :: tail =>
val resolved: List(Transition(InputElement, AggregateState(StateType))) = {
//Filter transitions on the old state, and map to an instance of C
val cs = this.transitions.filter(t => head.states.contains(t.oldState)).map(
t => StateTransition(t.condition, AggregateState(Set(t.newState))))
solveAll(head, cs) //Solve the collisions in those transitions
}
val newStates = resolved.map(
_.newState) //Get all aggregate states, are can be reached by the resolved transitions
//Recursive call to get transitions of all remaining aggregate states, adding the newly found states to the queue
constructAggregateStatesAndTransitions(newStates ++ tail, known + head, resolved.map(
t => Transition(InputElement, AggregateState(StateType))(head, t.condition, t.newState)) ::: transes)
}
}
//Compound expression to hide variables from nested functions
{
//Construct all reachable aggregate states and the corresponding transitions
val (states, ts) = constructAggregateStatesAndTransitions(aggregateStartState :: Nil, Set(), Nil)
val accepting = this.accepting.flatMap({
case Acceptance(state, result) => states.filter(value => value.states.contains(state)).map(
s => Acceptance(s, result)).toList
})
AutomatonSpecification(ts, accepting, Set(aggregateStartState)).cullAcceptingStates
}
}
/**
* Maps all states to a numbered state, to make it more convenient to use.
*
* @return An updated specification containing numbered states.
*/
def withNumberedStates: AutomatonSpecification(InputElement, NumberedState, Result) =
this.mapStates(this.getIntegralStateMap)
/**
* Maps all states in this specification.
* Guaranteed to preserve determinism if 'f' is injective.
*
* @param f The function, that maps old states to new states.
* @tparam NewState The type of the new states
* @return An updated specification.
*/
def mapStates(NewState <: State(NewState))(f: StateType => NewState): AutomatonSpecification(InputElement, NewState, Result) = AutomatonSpecification(
this.transitions.map(_.mapStates(f)), this.accepting.map(a => a.copy(state = f(a.state))), this.start.map(f))
/**
* @return A Set of all States in this Automaton
*/
def states: Set(StateType) = this.transitions.flatMap(t => List(t.oldState, t.newState)).toSet ++ this.start
/**
* Removes all redundant and therefore ineffective accepting states
*
* @return A new automaton specification without the redundant acceptances
*/
private def cullAcceptingStates: AutomatonSpecification(InputElement, StateType, Result) = {
def filter(ac: List(Acceptance(StateType, Result))): List(Acceptance(StateType, Result)) = {
ac match {
case Nil => Nil
case head :: tail => head :: filter(tail.filterNot(_.state == head.state))
}
}
this.copy(accepting = filter(this.accepting))
}
/**
* Minimizes the automaton using Brzozowski’s algorithm.
* Limitation: Only works correctly when all results of accepting states are equal.
*
* @return A minimized automaton with aggregate states
*/
def minimized: AutomatonSpecification(InputElement, AggregateState(AggregateState(StateType)), Result) =
this.reversed.toDeterministic.reversed.toDeterministic
/**
* Reverses the automaton.
* The result automaton accepts the reversed language of this automaton.
* Only works when all semantics of accepting states are equal.
*
* @return A potentially non-deterministic finite automaton specification, that accepts the reversed language.
*/
def reversed: AutomatonSpecification(InputElement, StateType, Result) =
AutomatonSpecification(
this.transitions.map(_.reverse),
this.start.map(s => Acceptance(s, this.accepting.head.result)).toList,
this.accepting.map(_.state).toSet
)
/**
* Groups transistion edges by their start- and end-node.
* Merges the transitions that can be merged
*
* @return A new AutomatonSpecification with the combined transitions
*/
def mergeTransitions: AutomatonSpecification(InputElement, StateType, Result) = {
def union(a: TransitionCondition(InputElement), b: TransitionCondition(InputElement)): TransitionCondition(InputElement) = (a, b) match {
case (EpsilonCondition, EpsilonCondition) => EpsilonCondition
case (EpsilonCondition, other) => other
case (other, EpsilonCondition) => other
case (SetCondition(lhs), SetCondition(rhs)) => SetCondition(lhs ∪ rhs)
}
val mergedTransitions = this.transitions.groupBy(t => (t.oldState, t.newState)).map({
case ((old, target), trans) =>
Transition(old, trans.map(_.condition).reduce(union), target)
}).toList
this.copy(transitions = mergedTransitions)
}
/**
* Creates a dot graph-representation of this automaton.
*
* @return A string in valid dot-syntax for graphical representation
*/
def toDot(f: InputElement => String = _.toString): String = {
val regularstates = this.states &~ this.accepting.map(_.state).toSet &~ this.start
val statemap = this.getIntegralStateMap.mapValues(_.state.toString)
s"""digraph G{
| {
| node(shape=circle style=filled fillcolor=white)
| ${
this.accepting.map(
a => s"""${statemap(a.state)} (shape=doublecircle, label="${a.state.toDot}(${a.result})")""").mkString("n ")
}
| ${this.start.map(a => s"""${statemap(a)} (fillcolor=yellow label="${a.toDot}")""").mkString("n ")}
| ${regularstates.map(a => s"""${statemap(a)} (label="${a.toDot}")""").mkString("n ")}
| }
| ${this.transitions.map(t => s"${t.toDot(statemap, f)};").mkString("n ")}
|}
""".stripMargin
}
/**
* Creates a human-readable representation of this automaton
*
* @return A string describing the automaton
*/
override def toString: String = s"Automaton Specification: Start with $start, End with $accepting:n${
this.transitions.mkString("n")
}"
/**
* Maps all states of this automaton to a numbered state
*
* @return A map containing a numbered state for every state this automaton contains
*/
private def getIntegralStateMap: Map(StateType, NumberedState) =
this.states.zipWithIndex.toMap.mapValues(NumberedState)
/**
* Merges this automaton with another automaton.
* The merge is done by mapping the states of the second automaton to remove all clashes and keeping the starting states of both automatons.
* The resulting automaton is by nature non-deterministic.
*
* @param other The second automaton.
* @return A new automaton specification whose accepted language is the union of both automatons.
*/
def mergeWith(other: AutomatonSpecification(InputElement, StateType, Result)): AutomatonSpecification(InputElement, StateType, Result) = {
val highest = this.getHighestStateIndex + 1
val transformed = other.mapStates(_ + highest)
AutomatonSpecification(this.transitions ::: transformed.transitions, this.accepting ::: transformed.accepting,
this.start ++ transformed.start)
}
private def getHighestStateIndex: Int = this.states.map(_.max).max
}
```

Accompanying this class there are some small classes that represent states and transitions located in the object of the package.

package.scale:

```
package automata
package object finiteautomata {
/**
* A state for a finite automaton
*
* @tparam ConcreteState The concrete State class
*/
sealed trait State(ConcreteState) {
/**
* Adds an offset to all relevant statenumbers of this state
*
* @param offset The offset to add
* @return A new ConcreteState including the offset
*/
def +(offset: Int): ConcreteState
/**
* @return A dot-representation of this state
*/
def toDot: String
/**
* @return The maximum state number that is used in this state
*/
def max: Int
}
/**
* A simple numbered state for finite automata
*
* @param state The state number
*/
case class NumberedState(state: Int) extends State(NumberedState) {
override def +(offset: Int): NumberedState = NumberedState(state + offset)
override def toDot: String = state.toString
override def max: Int = this.state
}
/**
* An aggregate state for finite automata. Represents the union of multiple base states.
*
* @param states The base states included in this aggregate state
* @tparam BaseState The type of the base states
*/
case class AggregateState(BaseState <: State(BaseState))(states: Set(BaseState)) extends State(AggregateState(BaseState)) {
def ++(other: AggregateState(BaseState)): AggregateState(BaseState) = AggregateState(this.states ++ other.states)
override def +(offset: Int): AggregateState(BaseState) = AggregateState(this.states.map(_ + offset))
override def toDot: String = s"{${this.states.map(_.toDot).mkString(",")}}"
override def max: Int = states.map(_.max).max
}
/**
* Represents an accepting state by associating a result with it
*
* @param state The accepting state
* @param result The result of the automaton if it halts in the accepting state
* @tparam State The state type of the automaton
* @tparam Result The result type of the automaton
*/
case class Acceptance(State, Result)(state: State, result: Result)
/**
* Abstract base for transition conditions of a finite automaton
*
* @tparam InputElement The type of input elements for the automaton
*/
sealed trait TransitionCondition(+InputElement) {
/**
* @return True if the transition can never be triggered, false otherwise
*/
def isEmpty: Boolean
/**
* Creates a dot-representation for this transition
*
* @param f A mapping function to map InputElements to a dot-representation
* @return The completed dot-representation
*/
def toDot(f: InputElement => String = _.toString): String
}
/**
* A transition condition described by a set of all input elements that can trigger the transition
*
* @param set A set containing all elements that trigger the associated transition
* @tparam InputElement The type of input elements for the automaton
*/
case class SetCondition(InputElement)(set: automata.util.sets.MySet(InputElement)) extends TransitionCondition(InputElement) {
override def isEmpty: Boolean = set.isEmpty
override def toDot(f: InputElement => String = _.toString): String = set.toFormattedString(f)
}
/**
* The epsilon transition condition. Always triggers and consumes no input
*/
case object EpsilonCondition extends TransitionCondition(Nothing) {
override def isEmpty: Boolean = false
override def toDot(f: Nothing => String = _.toString): String = "ϵ"
}
/**
* Represents a complete transition of an automaton
*
* @param oldState The origin state for this transition
* @param condition The condition on which the transition is triggered
* @param newState The target state of this transition.
* @tparam InputElement The type of input elements for the automaton
* @tparam StateType The type of states in the automaton
*/
case class Transition(+InputElement, StateType <: State(StateType))(oldState: StateType,
condition: TransitionCondition(InputElement),
newState: StateType) {
/**
* Applies a function on the origin and target states of this transition
*
* @param f The mapping function
* @tparam NewState The type of the new states
* @return A new Transition describing the reverse direction of this transition
*/
def mapStates(NewState <: State(NewState))(f: StateType => NewState): Transition(InputElement, NewState) =
Transition(InputElement, NewState)(f(this.oldState), condition, f(this.newState))
/**
* Reverses the transition by flipping origin and target states.
*
* @return The new, reversed transition
*/
def reverse: Transition(InputElement, StateType) = Transition(newState, condition, oldState)
/**
* @return A simple string representation of the transistion
*/
override def toString: String = {
s"$oldState with $condition to $newState"
}
/**
* @param statemap A function that maps states to their dot-representation
* @param f A function that maps input elements to their dot-representation
* @return A dot representation for this transition
*/
def toDot(statemap: StateType => String, f: InputElement => String = _.toString): String = {
s"""${statemap(oldState)} -> ${statemap(newState)} (label="${condition.toDot(f)}")"""
}
}
}
```