module documentation

This is the ``docutils.parsers.rst.states`` module, the core of
the reStructuredText parser.  It defines the following:

:Classes:
    - `RSTStateMachine`: reStructuredText parser's entry point.
    - `NestedStateMachine`: recursive StateMachine.
    - `RSTState`: reStructuredText State superclass.
    - `Inliner`: For parsing inline markup.
    - `Body`: Generic classifier of the first line of a block.
    - `SpecializedBody`: Superclass for compound element members.
    - `BulletList`: Second and subsequent bullet_list list_items.
    - `DefinitionList`: Second+ definition_list_items.
    - `EnumeratedList`: Second+ enumerated_list list_items.
    - `FieldList`: Second+ fields.
    - `OptionList`: Second+ option_list_items.
    - `RFC2822List`: Second+ RFC2822-style fields.
    - `ExtensionOptions`: Parses directive option fields.
    - `Explicit`: Second+ explicit markup constructs.
    - `SubstitutionDef`: For embedded directives in substitution definitions.
    - `Text`: Classifier of second line of a text block.
    - `SpecializedText`: Superclass for continuation lines of Text-variants.
    - `Definition`: Second line of potential definition_list_item.
    - `Line`: Second line of overlined section title or transition marker.
    - `Struct`: An auxiliary collection class.

:Exception classes:
    - `MarkupError`
    - `ParserError`
    - `MarkupMismatch`

:Functions:
    - `escape2null()`: Return a string, escape-backslashes converted to nulls.
    - `unescape()`: Return a string, nulls removed or restored to backslashes.

:Attributes:
    - `state_classes`: set of State classes used with `RSTStateMachine`.

Parser Overview
===============

The reStructuredText parser is implemented as a recursive state machine,
examining its input one line at a time.  To understand how the parser works,
please first become familiar with the `docutils.statemachine` module.  In the
description below, references are made to classes defined in this module;
please see the individual classes for details.

Parsing proceeds as follows:

1. The state machine examines each line of input, checking each of the
   transition patterns of the state `Body`, in order, looking for a match.
   The implicit transitions (blank lines and indentation) are checked before
   any others.  The 'text' transition is a catch-all (matches anything).

2. The method associated with the matched transition pattern is called.

   A. Some transition methods are self-contained, appending elements to the
      document tree (`Body.doctest` parses a doctest block).  The parser's
      current line index is advanced to the end of the element, and parsing
      continues with step 1.

   B. Other transition methods trigger the creation of a nested state machine,
      whose job is to parse a compound construct ('indent' does a block quote,
      'bullet' does a bullet list, 'overline' does a section [first checking
      for a valid section header], etc.).

      - In the case of lists and explicit markup, a one-off state machine is
        created and run to parse contents of the first item.

      - A new state machine is created and its initial state is set to the
        appropriate specialized state (`BulletList` in the case of the
        'bullet' transition; see `SpecializedBody` for more detail).  This
        state machine is run to parse the compound element (or series of
        explicit markup elements), and returns as soon as a non-member element
        is encountered.  For example, the `BulletList` state machine ends as
        soon as it encounters an element which is not a list item of that
        bullet list.  The optional omission of inter-element blank lines is
        enabled by this nested state machine.

      - The current line index is advanced to the end of the elements parsed,
        and parsing continues with step 1.

   C. The result of the 'text' transition depends on the next line of text.
      The current state is changed to `Text`, under which the second line is
      examined.  If the second line is:

      - Indented: The element is a definition list item, and parsing proceeds
        similarly to step 2.B, using the `DefinitionList` state.

      - A line of uniform punctuation characters: The element is a section
        header; again, parsing proceeds as in step 2.B, and `Body` is still
        used.

      - Anything else: The element is a paragraph, which is examined for
        inline markup and appended to the parent element.  Processing
        continues with step 1.
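Steps 1 and 2.C of the overview can be illustrated with a toy classifier. This is a minimal sketch, not the module's code: the `TRANSITIONS` table, its simplified patterns, and the helper names `first_transition()` and `classify_text_block()` are all hypothetical. It shows only the ordered pattern check with 'text' as the catch-all, and the second-line test made under the `Text` state.

```python
import re

# Hypothetical, heavily simplified transition table (illustration only).
# Order matters: implicit transitions come first, the catch-all 'text' last.
TRANSITIONS = [
    ("blank", re.compile(r"\s*$")),
    ("indent", re.compile(r"\s+\S")),
    ("bullet", re.compile(r"[-+*]( +|$)")),
    ("doctest", re.compile(r">>>( +|$)")),
    ("line", re.compile(r"([!-/:-@\[-`{-~])\1* *$")),
    ("text", re.compile(r"")),  # matches anything
]
PATTERNS = dict(TRANSITIONS)


def first_transition(line):
    """Step 1: return the name of the first transition whose pattern matches."""
    for name, pattern in TRANSITIONS:
        if pattern.match(line):
            return name
    raise AssertionError("unreachable: 'text' matches every line")


def classify_text_block(second_line):
    """Step 2.C: decide what a 'text' line starts, based on the line after it."""
    if PATTERNS["indent"].match(second_line):
        return "definition_list_item"
    if PATTERNS["line"].match(second_line):
        return "section_title"
    return "paragraph"


if __name__ == "__main__":
    for sample in ["", "    quoted", "- item", ">>> 1 + 1", "=====", "Plain text"]:
        print(repr(sample), "->", first_transition(sample))
```

The real parser does considerably more at each step (nested state machines, section-title validation, inline markup); the sketch only mirrors the order in which candidates are tried.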

========  ==================================  =======
Kind      Name                                Summary
========  ==================================  =======
Class     Body                                Generic classifier of the first line of a block.
Class     BulletList                          Second and subsequent bullet_list list_items.
Class     Definition                          Second line of potential definition_list_item.
Class     DefinitionList                      Second and subsequent definition_list_items.
Class     EnumeratedList                      Second and subsequent enumerated_list list_items.
Class     Explicit                            Second and subsequent explicit markup constructs.
Class     ExtensionOptions                    Parse field_list fields for extension options.
Class     FieldList                           Second and subsequent field_list fields.
Class     Inliner                             Parse inline markup; call the `parse()` method.
Class     InterpretedRoleNotImplementedError  Undocumented
Class     Line                                Second line of over- & underlined section title or transition marker.
Class     LineBlock                           Second and subsequent lines of a line_block.
Class     MarkupError                         Undocumented
Class     MarkupMismatch                      Undocumented
Class     NestedStateMachine                  StateMachine run from within other StateMachine runs, to parse nested document structures.
Class     OptionList                          Second and subsequent option_list option_list_items.
Class     ParserError                         Undocumented
Class     QuotedLiteralBlock                  Nested parse handler for quoted (unindented) literal blocks.
Class     RFC2822Body                         RFC2822 headers are only valid as the first constructs in documents. As soon as anything else appears, the `Body` state should take over.
Class     RFC2822List                         Second and subsequent RFC2822-style field_list fields.
Class     RSTState                            reStructuredText State superclass.
Class     RSTStateMachine                     reStructuredText's master StateMachine.
Class     SpecializedBody                     Superclass for second and subsequent compound element members. Compound elements are lists and list-like constructs.
Class     SpecializedText                     Superclass for second and subsequent lines of Text-variants.
Class     Struct                              Stores data attributes for dotted-attribute access.
Class     SubstitutionDef                     Parser for the contents of a substitution_definition element.
Class     Text                                Classifier of second line of a text block.
Class     UnknownInterpretedRoleError         Undocumented
Function  build_regexp                        Build, compile and return a regular expression based on `definition`.
Variable  state_classes                       Standard set of State classes used to start `RSTStateMachine`.
Function  _loweralpha_to_int                  Undocumented
Function  _lowerroman_to_int                  Undocumented
Function  _upperalpha_to_int                  Undocumented
========  ==================================  =======

def build_regexp(definition, compile=True):

Build, compile and return a regular expression based on `definition`.

:Parameter: `definition`: a 4-tuple (group name, prefix, suffix, parts),
    where "parts" is a list of regular expressions and/or regular
    expression definitions to be joined into an or-group.
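The description above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the name `build_regexp_sketch` is hypothetical, and wrapping the or-group in a named group placed between the prefix and the suffix is one plausible reading of the 4-tuple. Nested definitions (tuples inside "parts") are expanded recursively as uncompiled strings.

```python
import re


def build_regexp_sketch(definition, compile=True):
    """Hypothetical sketch: build a regexp from (name, prefix, suffix, parts)."""
    name, prefix, suffix, parts = definition
    part_strings = []
    for part in parts:
        if isinstance(part, tuple):
            # A nested definition: expand it to a string, do not compile yet.
            part_strings.append(build_regexp_sketch(part, compile=False))
        else:
            part_strings.append(part)
    or_group = "|".join(part_strings)
    regexp = f"{prefix}(?P<{name}>{or_group}){suffix}"
    return re.compile(regexp) if compile else regexp


# Example definition (made up for illustration): match "foo", "bar",
# or a run of digits, each delimited by word boundaries.
definition = ("start", r"\b", r"\b", ["foo", "bar", ("digits", "", "", [r"\d+"])])
pattern = build_regexp_sketch(definition)
```

With this definition, `pattern.search("see bar here").group("start")` yields the matched alternative, and the nested `digits` group is populated only when the numeric alternative matches.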

state_classes

Standard set of State classes used to start `RSTStateMachine`.

def _loweralpha_to_int(s, _zero=ord('a')-1):

Undocumented

def _lowerroman_to_int(s):

Undocumented

def _upperalpha_to_int(s, _zero=ord('A')-1):

Undocumented
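The alphabetic helpers are undocumented, but their signatures hint that a single-letter enumerator is mapped to its ordinal by subtracting `_zero` from the character code. The sketch below shows that reading only; the function bodies and the `_sketch` names are assumptions, not the module's code.

```python
def loweralpha_to_int_sketch(s, _zero=ord("a") - 1):
    # Assumed behavior: 'a' maps to 1, 'b' to 2, and so on.
    return ord(s) - _zero


def upperalpha_to_int_sketch(s, _zero=ord("A") - 1):
    # Assumed behavior: 'A' maps to 1, 'B' to 2, and so on.
    return ord(s) - _zero
```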