Class documentation

Finds copy-pasted lines of code in a project.

Method __init__ Initialize the checker with the minimal similarity length and the ignore_* filtering flags.
Method append_stream Append a file to search for similarities.
Method combine_mapreduce_data Reduces and recombines data into a format that we can report on.
Method get_map_data Returns the data we can use for a map/reduce process.
Method run Start looking for similarities and display results on stdout.
Instance Variable linesets The LineSet objects built from the appended streams.
Instance Variable namespace Undocumented
Method _compute_sims Compute similarities in appended files.
Method _display_sims Display computed similarities on stdout.
Method _find_common Find similarities in the two given linesets.
Method _get_similarity_report Create a report from similarities.
Method _iter_sims Iterate over similarities among all files by taking a Cartesian product of the linesets.
def __init__(self, min_lines: int = DEFAULT_MIN_SIMILARITY_LINE, ignore_comments: bool = False, ignore_docstrings: bool = False, ignore_imports: bool = False, ignore_signatures: bool = False): (source)

Initialize the checker with the minimal similarity length and the ignore_* filtering flags. min_lines sets how many successive identical lines are needed before a chunk is reported; ignore_comments, ignore_docstrings, ignore_imports and ignore_signatures each strip the corresponding construct from the comparison.
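A minimal construction sketch; the import path is an assumption based on pylint's source layout (pylint/checkers/similar.py):

    # Construction sketch; the import path is assumed from pylint's
    # source tree.
    from pylint.checkers.similar import Similar

    sim = Similar(
        min_lines=4,             # report only chunks of 4+ similar lines
        ignore_comments=True,    # strip comments before comparing
        ignore_docstrings=True,  # strip docstrings as well
        ignore_imports=False,
        ignore_signatures=False,
    )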

def append_stream(self, streamid: str, stream: STREAM_TYPES, encoding: str|None = None): (source)

Append a file to search for similarities.
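Continuing the construction sketch above, with hypothetical file names; the streams are opened in binary mode so the explicit encoding applies:

    # Each stream gets a streamid (here the path) used to label results.
    for path in ("module_a.py", "module_b.py"):  # hypothetical files
        with open(path, "rb") as stream:
            sim.append_stream(path, stream, encoding="utf-8")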

def combine_mapreduce_data(self, linesets_collection: list[list[LineSet]]): (source)

Reduces and recombines data into a format that we can report on. This is the partner function of get_map_data(); see the sketch after get_map_data() below.

def get_map_data(self) -> list[LineSet]: (source)

Returns the data we can use for a map/reduce process: this instance's linesets, that is, all the file information that will later be used for vectorisation.
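A sketch of how get_map_data() and combine_mapreduce_data() pair up in a parallel run; the map_worker helper and the file names are hypothetical, and the two workers are simulated sequentially:

    from pylint.checkers.similar import LineSet, Similar

    def map_worker(paths: list[str]) -> list[LineSet]:
        # Hypothetical map step: one Similar instance per worker.
        worker = Similar(min_lines=4)
        for path in paths:
            with open(path, "rb") as stream:
                worker.append_stream(path, stream, encoding="utf-8")
        return worker.get_map_data()

    # Simulate two workers, then reduce their linesets in one instance.
    collected = [map_worker(["pkg/a.py"]), map_worker(["pkg/b.py", "pkg/c.py"])]
    reducer = Similar(min_lines=4)
    reducer.combine_mapreduce_data(linesets_collection=collected)
    reducer.run()  # report similarities across every worker's files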

def run(self): (source)

Start looking for similarities and display results on stdout.
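Continuing the construction sketch above, running the analysis is a single call; the report goes to stdout:

    # With the streams appended, compute the similarities and print
    # each duplicated chunk together with the files it appears in.
    sim.run()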

linesets = (source)

The LineSet objects built from the appended streams; this is the file information later used for vectorisation (see get_map_data).

namespace = (source)

Undocumented

def _compute_sims(self) -> list[tuple[int, set[LinesChunkLimits_T]]]: (source)

Compute similarities in appended files.

def _display_sims(self, similarities: list[tuple[int, set[LinesChunkLimits_T]]]): (source)

Display computed similarities on stdout.

def _find_common(self, lineset1: LineSet, lineset2: LineSet) -> Generator[Commonality, None, None]: (source)

Find similarities in the two given linesets. This is the core of the algorithm. The idea is to compute the hashes of a minimal number of successive lines of each lineset and then compare the hashes. Every match of such a comparison is stored in a dict that links the pair of starting indices in both linesets to the pair of corresponding starting and ending lines in both files. Finally, all successive pairs are regrouped into larger chunks, so that common chunks longer than the minimal number of successive lines are also taken into account.
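An illustrative sketch of the hashing idea described above, not pylint's actual implementation: hash every window of min_lines successive lines in each file, then treat any hash shared by both files as a candidate duplicated chunk (all names here are hypothetical):

    from collections import defaultdict
    from typing import Iterator

    def window_hashes(lines: list[str], min_lines: int) -> dict[int, int]:
        # Hash each window of min_lines successive lines, keyed by its
        # starting index in the file.
        return {
            start: hash(tuple(lines[start : start + min_lines]))
            for start in range(len(lines) - min_lines + 1)
        }

    def common_windows(
        lines1: list[str], lines2: list[str], min_lines: int = 4
    ) -> Iterator[tuple[int, int]]:
        # Yield (start1, start2) pairs whose windows hash identically.
        by_digest: defaultdict[int, list[int]] = defaultdict(list)
        for start2, digest in window_hashes(lines2, min_lines).items():
            by_digest[digest].append(start2)
        for start1, digest in window_hashes(lines1, min_lines).items():
            for start2 in by_digest.get(digest, []):
                yield start1, start2

In the regrouping step, successive matches such as (i, j) and (i + 1, j + 1) would then be merged into a single chunk, which is how common chunks longer than min_lines end up reported once.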

def _get_similarity_report(self, similarities: list[tuple[int, set[LinesChunkLimits_T]]]) -> str: (source)

Create a report from similarities.

def _iter_sims(self) -> Generator[Commonality, None, None]: (source)

Iterate over similarities among all files by taking a Cartesian product of the linesets.
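A sketch of the pairwise iteration: itertools.combinations yields each unordered pair of linesets exactly once, which matches the Cartesian-product idea while skipping self-pairs (common_windows is the hypothetical helper from the _find_common sketch above):

    from itertools import combinations

    def iter_sims(linesets):
        # Compare every distinct pair of linesets exactly once.
        for lineset1, lineset2 in combinations(linesets, 2):
            yield from common_windows(lineset1, lineset2)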