class documentation

class TokenSet: (source)


A token set is used to store the unique list of all tokens within an index. Token sets are also used to represent an incoming query to the index; this query token set and the index token set are then intersected to find which tokens to look up in the inverted index.

A token set can hold multiple tokens, as in the case of the index token set, or it can hold a single token, as in the case of a simple query token set.

Additionally, token sets are used to perform wildcard matching. Leading, contained and trailing wildcards are supported, and from this edit distance matching can also be provided.

Token sets are implemented as a minimal finite state automaton, where both common prefixes and suffixes are shared between tokens. This helps to reduce the space used for storing the token set.

TODO: consider https://github.com/glyph/automat
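
The sketch below illustrates the typical round trip described above: build an index token set, build a query token set containing a trailing wildcard, intersect the two, and read the surviving tokens back out. The import path and the assumption that from_list expects its input in sorted order are not documented on this page and should be treated as assumptions.

    from lunr.token_set import TokenSet  # assumed import path

    index_set = TokenSet.from_list(["hello", "help", "world"])  # index tokens, assumed sorted
    query_set = TokenSet.from_string("hel*")                    # query with a trailing wildcard

    matched = query_set.intersect(index_set)
    print(matched.to_list())  # expected to contain 'hello' and 'help'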

Class Method from_clause Undocumented
Class Method from_fuzzy_string Creates a token set representing a single string with a specified edit distance.
Class Method from_list Undocumented
Class Method from_string Creates a TokenSet from a string.
Method __init__ Undocumented
Method __repr__ Undocumented
Method __str__ Undocumented
Method intersect Returns a new TokenSet that is the intersection of this TokenSet and the passed TokenSet.
Method to_list Undocumented
Instance Variable edges Undocumented
Instance Variable final Undocumented
Instance Variable id Undocumented
Class Variable _next_id Undocumented
@classmethod
def from_clause(cls, clause): (source)

Undocumented

@classmethod
def from_fuzzy_string(cls, string, edit_distance): (source)

Creates a token set representing a single string with a specified edit distance. Insertions, deletions, substitutions and transpositions are each treated as an edit distance of 1. Increasing the allowed edit distance will have a dramatic impact on the performance of both creating and intersecting these TokenSets. It is advised to keep the edit distance less than 3.
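
As a hedged illustration (same assumed import path and sorted-input assumption as above), matching with an edit distance of 1:

    from lunr.token_set import TokenSet  # assumed import path

    index_set = TokenSet.from_list(["cart", "cat", "dog"])  # index tokens, assumed sorted
    fuzzy_set = TokenSet.from_fuzzy_string("cat", 1)        # allow a single edit

    print(fuzzy_set.intersect(index_set).to_list())  # expected to contain 'cat' and 'cart'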

@classmethod
def from_list(cls, list_of_words): (source)

Undocumented

@classmethod
def from_string(cls, string): (source)

Creates a TokenSet from a string. The string may contain one or more wildcard characters (*) that will allow wildcard matching when intersecting with another TokenSet.
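
A hedged sketch of a contained wildcard, under the same assumptions as the earlier examples:

    from lunr.token_set import TokenSet  # assumed import path

    index_set = TokenSet.from_list(["black", "block", "blue"])  # index tokens, assumed sorted
    query_set = TokenSet.from_string("bl*ck")                   # contained wildcard

    print(query_set.intersect(index_set).to_list())  # expected to contain 'black' and 'block'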

def __init__(self): (source)

Undocumented

def __repr__(self): (source)

Undocumented

def __str__(self): (source)

Undocumented

def intersect(self, other): (source)

Returns a new TokenSet that is the intersection of this TokenSet and the passed TokenSet. This intersection will take into account any wildcards contained within the TokenSet.

def to_list(self): (source)

Undocumented

edges: dict = (source)

Undocumented

final: bool = (source)

Undocumented

id = (source)

Undocumented

_next_id: int = (source)

Undocumented