docutils.parsers.rst.states

module documentation


This is the ``docutils.parsers.rst.states`` module, the core of
the reStructuredText parser. It defines the following:

:Classes:
    - `RSTStateMachine`: reStructuredText parser's entry point.
    - `NestedStateMachine`: recursive StateMachine.
    - `RSTState`: reStructuredText State superclass.
    - `Inliner`: For parsing inline markup.
    - `Body`: Generic classifier of the first line of a block.
    - `SpecializedBody`: Superclass for compound element members.
    - `BulletList`: Second and subsequent bullet_list list_items.
    - `DefinitionList`: Second+ definition_list_items.
    - `EnumeratedList`: Second+ enumerated_list list_items.
    - `FieldList`: Second+ fields.
    - `OptionList`: Second+ option_list_items.
    - `RFC2822List`: Second+ RFC2822-style fields.
    - `ExtensionOptions`: Parses directive option fields.
    - `Explicit`: Second+ explicit markup constructs.
    - `SubstitutionDef`: For embedded directives in substitution definitions.
    - `Text`: Classifier of second line of a text block.
    - `SpecializedText`: Superclass for continuation lines of Text-variants.
    - `Definition`: Second line of potential definition_list_item.
    - `Line`: Second line of overlined section title or transition marker.
    - `Struct`: An auxiliary collection class.

:Exception classes:
    - `MarkupError`
    - `ParserError`
    - `MarkupMismatch`

:Functions:
    - `escape2null()`: Return a string, escape-backslashes converted to nulls.
    - `unescape()`: Return a string, nulls removed or restored to backslashes.

:Attributes:
    - `state_classes`: set of State classes used with `RSTStateMachine`.

Parser Overview
===============

The reStructuredText parser is implemented as a recursive state machine,
examining its input one line at a time. To understand how the parser works,
please first become familiar with the `docutils.statemachine` module. In the
description below, references are made to classes defined in this module;
please see the individual classes for details.

Parsing proceeds as follows:

1. The state machine examines each line of input, checking each of the
   transition patterns of the state `Body`, in order, looking for a match.
   The implicit transitions (blank lines and indentation) are checked before
   any others. The 'text' transition is a catch-all (matches anything).

2. The method associated with the matched transition pattern is called.

   A. Some transition methods are self-contained, appending elements to the
      document tree (`Body.doctest` parses a doctest block). The parser's
      current line index is advanced to the end of the element, and parsing
      continues with step 1.

   B. Other transition methods trigger the creation of a nested state machine,
      whose job is to parse a compound construct ('indent' does a block quote,
      'bullet' does a bullet list, 'overline' does a section [first checking
      for a valid section header], etc.).

      - In the case of lists and explicit markup, a one-off state machine is
        created and run to parse contents of the first item.

      - A new state machine is created and its initial state is set to the
        appropriate specialized state (`BulletList` in the case of the
        'bullet' transition; see `SpecializedBody` for more detail). This
        state machine is run to parse the compound element (or series of
        explicit markup elements), and returns as soon as a non-member element
        is encountered. For example, the `BulletList` state machine ends as
        soon as it encounters an element which is not a list item of that
        bullet list. The optional omission of inter-element blank lines is
        enabled by this nested state machine.

      - The current line index is advanced to the end of the elements parsed,
        and parsing continues with step 1.

   C. The result of the 'text' transition depends on the next line of text.
      The current state is changed to `Text`, under which the second line is
      examined. If the second line is:

      - Indented: The element is a definition list item, and parsing proceeds
        similarly to step 2.B, using the `DefinitionList` state.

      - A line of uniform punctuation characters: The element is a section
        header; again, parsing proceeds as in step 2.B, and `Body` is still
        used.

      - Anything else: The element is a paragraph, which is examined for
        inline markup and appended to the parent element. Processing
        continues with step 1.
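The ordered transition check in step 1 and the second-line lookahead in step 2.C can be sketched in a few lines of standalone Python. This is a toy model under stated assumptions, not the docutils implementation: the transition table, the `UNDERLINE` pattern, and the function names `first_transition`, `classify_text`, and `classify` are all hypothetical, and only four of the real `Body` transitions are modelled.

```python
import re

# Hypothetical, heavily simplified transition table (not docutils' real patterns).
# Order matters: the implicit transitions (blank, indent) come first and the
# 'text' transition is the catch-all at the end.
TRANSITIONS = [
    ("blank", re.compile(r"\s*$")),
    ("indent", re.compile(r"\s+\S")),
    ("bullet", re.compile(r"[-+*] +\S")),
    ("text", re.compile(r"")),
]

# Assumed stand-in for "a line of uniform punctuation characters":
# one ASCII punctuation character repeated at least three times.
UNDERLINE = re.compile(r"([!-/:-@\[-`{-~])\1{2,}\s*$")


def first_transition(line):
    """Step 1: return the name of the first transition whose pattern matches."""
    for name, pattern in TRANSITIONS:
        if pattern.match(line):
            return name
    raise AssertionError("unreachable: 'text' matches anything")


def classify_text(lines, index):
    """Step 2.C: decide what a 'text' line starts by examining the second line."""
    following = lines[index + 1] if index + 1 < len(lines) else ""
    if first_transition(following) == "indent":
        return "definition_list_item"
    if UNDERLINE.match(following):
        return "section_title"
    return "paragraph"


def classify(lines, index):
    """Classify the block starting at lines[index]."""
    name = first_transition(lines[index])
    return classify_text(lines, index) if name == "text" else name


if __name__ == "__main__":
    sample = ["Title", "=====", "", "- item one", "term", "    definition"]
    for i in (0, 3, 4):
        print(repr(sample[i]), "->", classify(sample, i))
```

The sketch only labels a block; the real parser goes on to build document-tree nodes and, for compound constructs, hands the block to a nested state machine as described in step 2.B.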

Class	`Body`	Generic classifier of the first line of a block.
Class	`BulletList`	Second and subsequent bullet_list list_items.
Class	`Definition`	Second line of potential definition_list_item.
Class	`DefinitionList`	Second and subsequent definition_list_items.
Class	`EnumeratedList`	Second and subsequent enumerated_list list_items.
Class	`Explicit`	Second and subsequent explicit markup constructs.
Class	`ExtensionOptions`	Parse field_list fields for extension options.
Class	`FieldList`	Second and subsequent field_list fields.
Class	`Inliner`	Parse inline markup; call the `parse()` method.
Class	`InterpretedRoleNotImplementedError`	Undocumented
Class	`Line`	Second line of over- & underlined section title or transition marker.
Class	`LineBlock`	Second and subsequent lines of a line_block.
Class	`MarkupError`	Undocumented
Class	`MarkupMismatch`	Undocumented
Class	`NestedStateMachine`	StateMachine run from within other StateMachine runs, to parse nested document structures.
Class	`OptionList`	Second and subsequent option_list option_list_items.
Class	`ParserError`	Undocumented
Class	`QuotedLiteralBlock`	Nested parse handler for quoted (unindented) literal blocks.
Class	`RFC2822Body`	RFC2822 headers are only valid as the first constructs in documents. As soon as anything else appears, the `Body` state should take over.
Class	`RFC2822List`	Second and subsequent RFC2822-style field_list fields.
Class	`RSTState`	reStructuredText State superclass.
Class	`RSTStateMachine`	reStructuredText's master StateMachine.
Class	`SpecializedBody`	Superclass for second and subsequent compound element members. Compound elements are lists and list-like constructs.
Class	`SpecializedText`	Superclass for second and subsequent lines of Text-variants.
Class	`Struct`	Stores data attributes for dotted-attribute access.
Class	`SubstitutionDef`	Parser for the contents of a substitution_definition element.
Class	`Text`	Classifier of second line of a text block.
Class	`UnknownInterpretedRoleError`	Undocumented
Function	`build_regexp`	Build, compile and return a regular expression based on `definition`.
Variable	`state_classes`	Standard set of State classes used to start `RSTStateMachine`.
Function	`_loweralpha_to_int`	Undocumented
Function	`_lowerroman_to_int`	Undocumented
Function	`_upperalpha_to_int`	Undocumented

def build_regexp(definition, compile=True):

Build, compile and return a regular expression based on `definition`.

:Parameter: `definition`: a 4-tuple (group name, prefix, suffix, parts),
    where "parts" is a list of regular expressions and/or regular
    expression definitions to be joined into an or-group.
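To make the 4-tuple format concrete, here is a minimal standalone sketch of how such a definition could be expanded into a named or-group. It is an illustration written from the description above, not the docutils source: the name `build_regexp_sketch`, the sample `definition`, and the assumption that `compile=False` returns the pattern string are all hypothetical.

```python
import re


def build_regexp_sketch(definition, compile=True):
    """Illustrative stand-in for build_regexp; not the docutils source."""
    name, prefix, suffix, parts = definition
    alternatives = []
    for part in parts:
        if isinstance(part, tuple):
            # A nested definition: expand it recursively to a pattern string.
            alternatives.append(build_regexp_sketch(part, compile=False))
        else:
            # A plain regular expression string is used as-is.
            alternatives.append(part)
    or_group = "|".join(alternatives)
    pattern = f"{prefix}(?P<{name}>{or_group}){suffix}"
    # Assumption: with compile=False the uncompiled pattern string is returned.
    return re.compile(pattern) if compile else pattern


# Hypothetical definition, invented for this example only.
definition = (
    "marker",                 # group name
    r"(?<!\w)",               # prefix: not preceded by a word character
    r"(?!\s)",                # suffix: not followed by whitespace
    [r"\*\*", r"\*", ("literal", "", "", ["``"])],  # parts, one of them nested
)

if __name__ == "__main__":
    print(build_regexp_sketch(definition, compile=False))
    print(build_regexp_sketch(definition).search("some **bold** text").groupdict())
```

Nesting definitions this way gives each alternative its own named group, so a caller can tell from `groupdict()` which branch of the or-group matched.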

state_classes

Standard set of State classes used to start `RSTStateMachine`.

def _loweralpha_to_int(s, _zero=ord('a')-1):

Undocumented

def _lowerroman_to_int(s):

Undocumented

def _upperalpha_to_int(s, _zero=ord('A')-1):

Undocumented
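The three private helpers above are undocumented, so the following is only a guess at their intent, inferred from their names and default arguments: they appear to convert an enumerator such as ``c``, ``C`` or ``iv`` to its ordinal. The function bodies and the `_sketch` names below are assumptions made for illustration, not the docutils code, and the Roman-numeral version does no input validation.

```python
def loweralpha_to_int_sketch(s, _zero=ord("a") - 1):
    # Assumed body: offset from the code point just before 'a', so 'a' -> 1.
    return ord(s) - _zero


def upperalpha_to_int_sketch(s, _zero=ord("A") - 1):
    # Same idea for uppercase letters, so 'A' -> 1.
    return ord(s) - _zero


ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def lowerroman_to_int_sketch(s):
    # Assumed behaviour: plain subtractive Roman-numeral conversion.
    total = 0
    for current, following in zip(s, s[1:] + " "):
        value = ROMAN_VALUES[current]
        if value < ROMAN_VALUES.get(following, 0):
            # A smaller numeral before a larger one is subtracted (e.g. 'iv').
            total -= value
        else:
            total += value
    return total


if __name__ == "__main__":
    print(loweralpha_to_int_sketch("c"))
    print(upperalpha_to_int_sketch("Z"))
    print(lowerroman_to_int_sketch("xiv"))
```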