class Builder:

Performs indexing on a set of documents and returns instances of `lunr.Index` ready for querying. All configuration of the index is done via the builder: the fields to index, the document reference, the text processing pipeline and the document scoring parameters are all set on the builder before indexing.
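The configure-then-index order described above can be sketched with a small stand-in class. `ToyBuilder` is hypothetical and only mimics the shape of the workflow (`ref`, `field`, `add`, `build`); the whitespace tokenisation and the dict-of-sets index it returns are simplifications, not what a `lunr.Index` contains.

```python
class ToyBuilder:
    """Hypothetical stand-in illustrating the Builder workflow only."""

    def __init__(self):
        self._ref = "id"
        self._fields = []
        self._documents = {}

    def ref(self, ref):
        self._ref = ref

    def field(self, field_name):
        self._fields.append(field_name)

    def add(self, doc):
        # Store lowercased whitespace tokens for every configured field.
        key = str(doc[self._ref])
        tokens = {}
        for f in self._fields:
            value = doc.get(f)
            tokens[f] = [] if value is None else str(value).lower().split()
        self._documents[key] = tokens

    def build(self):
        # Invert the stored tokens into {term: {field: set_of_refs}}.
        index = {}
        for key, fields in self._documents.items():
            for field_name, tokens in fields.items():
                for token in tokens:
                    index.setdefault(token, {}).setdefault(field_name, set()).add(key)
        return index


builder = ToyBuilder()
builder.ref("id")          # configuration first...
builder.field("title")
builder.field("body")
builder.add({"id": 1, "title": "Green Plant", "body": "a plant that is green"})
builder.add({"id": 2, "title": "Red Rose", "body": None})  # None is tolerated
index = builder.build()    # ...then build once all documents are added
print(index["plant"])      # {'title': {'1'}, 'body': {'1'}}
```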

Method __init__ Undocumented
Method add Adds a document to the index.
Method b A parameter to tune the amount of field length normalisation that is applied when calculating relevance scores.
Method build Builds the index, creating an instance of `lunr.Index`.
Method field Adds a field to the list of document fields that will be indexed.
Method k1 A parameter that controls the speed at which a rise in term frequency results in term frequency saturation.
Method ref Sets the document field used as the document reference.
Method use Applies a plugin to the index builder.
Instance Variable average_field_length Undocumented
Instance Variable document_count Undocumented
Instance Variable field_lengths Undocumented
Instance Variable field_term_frequencies Undocumented
Instance Variable field_vectors Undocumented
Instance Variable inverted_index Undocumented
Instance Variable metadata_whitelist Undocumented
Instance Variable pipeline Undocumented
Instance Variable search_pipeline Undocumented
Instance Variable term_index Undocumented
Instance Variable token_set Undocumented
Method _calculate_average_field_lengths Calculates the average document length for this index.
Method _create_field_vectors Builds a vector space model of every document using lunr.Vector.
Method _create_token_set Creates a token set of all tokens in the index using `lunr.TokenSet`.
Instance Variable _b Undocumented
Instance Variable _documents Undocumented
Instance Variable _fields Undocumented
Instance Variable _k1 Undocumented
Instance Variable _ref Undocumented
def __init__(self):

Undocumented

def add(self, doc, attributes=None):

Adds a document to the index.

Before documents are added, the builder should be fully set up, with the document ref and all fields to index already specified. The document must have a field with the name specified by the ref (by default 'id'), and it should have all fields defined for indexing, although None values will not cause errors.

Args:
- doc (dict): The document to be added to the index.
- attributes (dict, optional): A set of attributes corresponding to the document. Currently only a single `boost` (int) attribute is taken into account.
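A minimal sketch of the per-field bookkeeping an `add` call might perform, assuming that term frequencies and field lengths are keyed by a hypothetical "field/ref" string and that the `attributes` dict (including any `boost`) is stored per document. The names echo the `field_term_frequencies`, `field_lengths` and `_documents` instance variables, but this is an illustration, not lunr's implementation.

```python
from collections import Counter

# Hypothetical bookkeeping for add(); names echo Builder's instance variables.
fields = {"title": 1, "body": 1}     # field name -> field boost
ref = "id"
documents = {}                       # doc ref -> attributes, e.g. {"boost": 10}
field_term_frequencies = {}          # "field/ref" -> Counter of terms
field_lengths = {}                   # "field/ref" -> number of tokens


def add(doc, attributes=None):
    doc_ref = str(doc[ref])          # refs are coerced to str
    documents[doc_ref] = attributes or {}
    for field_name in fields:
        value = doc.get(field_name)
        # None values are tolerated and simply contribute no tokens.
        tokens = [] if value is None else str(value).lower().split()
        key = f"{field_name}/{doc_ref}"  # assumed key format
        field_term_frequencies[key] = Counter(tokens)
        field_lengths[key] = len(tokens)


add({"id": 7, "title": "Boosted doc", "body": "doc doc text"}, {"boost": 10})
print(field_term_frequencies["body/7"])  # Counter({'doc': 2, 'text': 1})
print(documents["7"].get("boost", 1))    # 10
```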

def b(self, number):

A parameter to tune the amount of field length normalisation that is applied when calculating relevance scores. A value of 0 completely disables normalisation and a value of 1 fully normalises field lengths. The default is 0.75. Values of b are clamped to the range 0 to 1.
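Assuming lunr's relevance score follows the BM25 form, b enters through a length-normalisation factor `1 - b + b * (field_length / average_field_length)`. The sketch below (the helper names are hypothetical) shows the clamping and how b = 0 ignores length while b = 1 normalises fully:

```python
def clamp_b(number):
    # b is clamped into [0, 1], as described above.
    return max(0.0, min(1.0, number))


def length_norm(field_length, avg_field_length, b):
    # Assumed BM25-style normalisation: 1 - b + b * (len / avg_len).
    return 1 - b + b * (field_length / avg_field_length)


# A field twice the average length, under three values of b.
for b in (clamp_b(-0.5), 0.75, clamp_b(3)):
    print(b, length_norm(20, 10, b))
# 0.0 1.0
# 0.75 1.75
# 1.0 2.0
```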

def build(self):

Builds the index, creating an instance of `lunr.Index`. This completes the indexing process and should only be called once all documents have been added to the index.
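The three private helpers listed for this class suggest what `build` does when it completes indexing. The stand-in below assumes they run in the order shown (averages first, since the field vectors depend on them); it records each step instead of doing real work and returns a tuple rather than a `lunr.Index`.

```python
class BuildOrderSketch:
    """Hypothetical stand-in showing an assumed order of build() steps."""

    def __init__(self):
        self.steps = []

    def _calculate_average_field_lengths(self):
        self.steps.append("average field lengths")

    def _create_field_vectors(self):
        # Scoring needs the averages, so this runs after the step above.
        self.steps.append("field vectors")

    def _create_token_set(self):
        self.steps.append("token set")

    def build(self):
        self._calculate_average_field_lengths()
        self._create_field_vectors()
        self._create_token_set()
        return tuple(self.steps)  # the real build() returns a lunr.Index


print(BuildOrderSketch().build())
# ('average field lengths', 'field vectors', 'token set')
```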

def field(self, field_name, boost=1, extractor=None):

Adds a field to the list of document fields that will be indexed.

Every document being indexed should have this field. None values for this field in indexed documents will not cause errors, but will limit the chance of that document being retrieved by searches.

All fields should be added before adding documents to the index. Adding a field after a document has been indexed has no effect on documents that are already indexed.

Fields can be boosted at build time, which gives terms within that field more importance in search results. Use a field boost to specify that matches within one field are more important than matches in other fields.

Args:
- field_name (str): Name of the field to be added; must not include a forward slash '/'.
- boost (int): Optional boost factor to apply to the field.
- extractor (callable): Optional function to extract a field from the document.

Raises:
- ValueError: If the field name contains a `/`.
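A hypothetical sketch of the documented `field` contract: names containing `/` are rejected with `ValueError`, and an optional `extractor` callable replaces plain key lookup (useful for nested documents). `FieldSketch`, `extract` and the error message are illustrative assumptions, not lunr's code.

```python
class FieldSketch:
    """Hypothetical sketch of field registration; not lunr's own code."""

    def __init__(self):
        self._fields = {}

    def field(self, field_name, boost=1, extractor=None):
        # Field names may not contain "/", as documented above.
        if "/" in field_name:
            raise ValueError(f"Field {field_name!r} contains illegal character '/'")
        self._fields[field_name] = {"boost": boost, "extractor": extractor}

    def extract(self, field_name, doc):
        # Use the extractor if one was given, otherwise look the field up by name.
        extractor = self._fields[field_name]["extractor"]
        return extractor(doc) if extractor else doc.get(field_name)


fb = FieldSketch()
fb.field("title", boost=10)
fb.field("author", extractor=lambda doc: doc["meta"]["author"])
print(fb.extract("author", {"meta": {"author": "Ada"}}))  # Ada
try:
    fb.field("a/b")
except ValueError as exc:
    print("rejected:", exc)
```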

def k1(self, number):

A parameter that controls the speed at which a rise in term frequency results in term frequency saturation. The default value is 1.2. A higher value results in slower saturation, and a lower value results in quicker saturation.
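Assuming the same BM25-style scoring, the term-frequency component is `(k1 + 1) * tf / (k1 * norm + tf)`, which approaches the ceiling `k1 + 1` as tf grows. The hypothetical helper below shows that a lower k1 reaches that ceiling sooner, i.e. saturates more quickly:

```python
def tf_component(tf, k1, norm=1.0):
    # Assumed BM25-style term-frequency component; norm is the
    # length-normalisation factor (1.0 for a field of exactly average length).
    return (k1 + 1) * tf / (k1 * norm + tf)


for k1 in (0.5, 1.2, 3.0):
    # Fraction of the k1 + 1 ceiling reached at tf = 1, 2, 5 and 20.
    print(k1, [round(tf_component(tf, k1) / (k1 + 1), 2) for tf in (1, 2, 5, 20)])
```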

def ref(self, ref):

Sets the document field used as the document reference. Every document must have this field. The value of this field should be a string; if it is not, it will be coerced into one by calling `str`. The default ref is 'id'.

The ref should _not_ be changed during indexing; set it before any documents are added to the index. Changing it during indexing can lead to inconsistent results.
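Because refs are coerced with `str`, documents whose refs differ only in type end up under the same key. A toy dict keyed the same way (the `add` function here is hypothetical) illustrates the collision:

```python
documents = {}


def add(doc, ref="id"):
    # Hypothetical: key documents by str() of the ref value.
    documents[str(doc[ref])] = doc


add({"id": 1, "title": "first"})
add({"id": "1", "title": "second"})   # same key "1" after coercion
print(len(documents), documents["1"]["title"])  # 1 second
```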

def use(self, fn, *args, **kwargs):

Applies a plugin to the index builder. Plugins can be used to customise or extend the behaviour of the index in some way. A plugin is just a function that encapsulates the custom behaviour to apply when building the index. It is called with the index builder as its first argument, followed by any additional arguments passed when calling `use`.
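A minimal sketch of the plugin contract described above. `PluginHost` and its `stop_words` attribute are hypothetical stand-ins for the builder; a real plugin might instead adjust something like the builder's `pipeline`. The point is only how `use` forwards the builder plus any extra positional and keyword arguments:

```python
class PluginHost:
    """Hypothetical minimal host mirroring the documented use() contract."""

    def __init__(self):
        self.stop_words = set()

    def use(self, fn, *args, **kwargs):
        # The plugin receives the builder first, then any extra arguments.
        fn(self, *args, **kwargs)


def add_stop_words(builder, *words, lowercase=True):
    # A plugin is just a function that customises the builder it is given.
    builder.stop_words.update(w.lower() if lowercase else w for w in words)


host = PluginHost()
host.use(add_stop_words, "The", "And", lowercase=True)
print(sorted(host.stop_words))  # ['and', 'the']
```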

average_field_length =

Undocumented

document_count: int =

Undocumented

field_lengths: dict =

Undocumented

field_term_frequencies: dict =

Undocumented

field_vectors =

Undocumented

inverted_index: dict =

Undocumented

metadata_whitelist: list =

Undocumented

pipeline =

Undocumented

search_pipeline =

Undocumented

term_index: int =

Undocumented

token_set =

Undocumented

def _calculate_average_field_lengths(self):

Calculates the average document length for this index.
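A sketch of how per-field averages could be derived from the lengths recorded during `add`, assuming `field_lengths` maps hypothetical "field/ref" string keys to token counts. Splitting on the first `/` is safe here only because field names may not contain one:

```python
def average_field_lengths(field_lengths):
    # field_lengths maps "field/ref" keys to token counts (assumed format).
    totals, counts = {}, {}
    for key, length in field_lengths.items():
        field_name = key.split("/", 1)[0]
        totals[field_name] = totals.get(field_name, 0) + length
        counts[field_name] = counts.get(field_name, 0) + 1
    # Average length per field across the documents that have it.
    return {name: totals[name] / counts[name] for name in totals}


print(average_field_lengths({"title/1": 2, "title/2": 4, "body/1": 10, "body/2": 0}))
# {'title': 3.0, 'body': 5.0}
```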

def _create_field_vectors(self):

Builds a vector space model of every document using lunr.Vector.
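A rough sketch of building one field's sparse vector, assuming BM25-style scoring with a smoothed idf of `log(1 + |(N - df + 0.5) / (df + 0.5)|)` and scores rounded to three decimals. The function names, parameters and the exact idf form are assumptions for illustration; lunr's own vectors are `lunr.Vector` instances:

```python
import math


def idf(documents_with_term, document_count):
    # Assumed smoothed BM25 idf; log(1 + |x|) keeps it positive.
    x = (document_count - documents_with_term + 0.5) / (documents_with_term + 0.5)
    return math.log(1 + abs(x))


def field_vector(term_freqs, field_length, avg_length, doc_freqs,
                 document_count, term_index, k1=1.2, b=0.75, boost=1.0):
    # Length normalisation for this field relative to the field's average.
    norm = 1 - b + b * (field_length / avg_length)
    vector = []
    for term, tf in term_freqs.items():
        score = idf(doc_freqs[term], document_count) * (k1 + 1) * tf / (k1 * norm + tf)
        # Apply the boost, then round to three decimal places.
        vector.append((term_index[term], round(score * boost, 3)))
    # A sparse vector: (term index, score) pairs ordered by term index.
    return sorted(vector)


vec = field_vector({"plant": 2, "green": 1}, field_length=3, avg_length=3.0,
                   doc_freqs={"plant": 1, "green": 2}, document_count=2,
                   term_index={"green": 0, "plant": 1})
print(vec)  # [(0, 0.182), (1, 0.953)]
```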

def _create_token_set(self):

Creates a token set of all tokens in the index using `lunr.TokenSet`.
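This sketch assumes that `lunr.TokenSet` is an automaton over all indexed terms that supports prefix and wildcard matching. The code below swaps in a plain, unminimised trie built from nested dicts to show the idea; `build_trie` and `with_prefix` are hypothetical helpers.

```python
def build_trie(tokens):
    # Plain (unminimised) trie: nested dicts, with "$" marking a complete token.
    root = {}
    for token in sorted(tokens):
        node = root
        for ch in token:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def expand(node, prefix=""):
    # Yield every complete token beneath node, in sorted order.
    for key, child in sorted(node.items()):
        if key == "$":
            yield prefix
        else:
            yield from expand(child, prefix + key)


def with_prefix(trie, prefix):
    # Walk down the prefix, then enumerate all tokens below it.
    node = trie
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    return list(expand(node, prefix))


trie = build_trie(["plant", "plants", "planet", "green"])
print(with_prefix(trie, "plan"))  # ['planet', 'plant', 'plants']
```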

_b =

Undocumented

_documents: dict =

Undocumented

_fields: dict =

Undocumented

_k1 =

Undocumented

_ref =

Undocumented