Finds copy-pasted lines of code in a project.
Method | __init__ | Undocumented
Method | append | Append a file to search for similarities.
Method | combine | Reduces and recombines data into a format that we can report on.
Method | get | Returns the data we can use for a map/reduce process.
Method | run | Start looking for similarities and display results on stdout.
Instance Variable | linesets | Undocumented
Instance Variable | namespace | Undocumented
Method | _compute | Compute similarities in appended files.
Method | _display | Display computed similarities on stdout.
Method | _find | Find similarities in the two given linesets.
Method | _get | Create a report from similarities.
Method | _iter | Iterate on similarities among all files, by making a Cartesian product.
def __init__(min_lines: int = DEFAULT_MIN_SIMILARITY_LINE, ignore_comments: bool = False, ignore_docstrings: bool = False, ignore_imports: bool = False, ignore_signatures: bool = False):
Overridden in pylint.checkers.similar.SimilarChecker. Undocumented.
combine: Reduces and recombines data into a format that we can report on. The partner function of get_map_data().
get: Returns the data we can use for a map/reduce process. In this case we are returning this instance's linesets, that is, all file information that will later be used for vectorisation.
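The get/combine pair above follows the usual map/reduce split: each worker "maps" by returning its own collected line data, and a single reducer "combines" the partial results before reporting. A minimal illustrative sketch of that pattern (not pylint's actual implementation; the function bodies here are assumptions for illustration):

```python
def get_map_data(linesets):
    """Map step: return this worker's collected linesets."""
    return list(linesets)

def combine(chunks):
    """Reduce step: merge the linesets gathered by every worker."""
    merged = []
    for chunk in chunks:
        merged.extend(chunk)
    return merged

# Two workers each collect data for their share of the files,
# then the reducer merges everything for the final report.
worker_a = get_map_data(["file1.py data", "file2.py data"])
worker_b = get_map_data(["file3.py data"])
all_data = combine([worker_a, worker_b])  # 3 entries, one per file
```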
def _find(lineset1: LineSet, lineset2: LineSet) -> Generator[Commonality, None, None]:
Find similarities in the two given linesets. This is the core of the algorithm: it computes the hashes of a minimal number of successive lines of each lineset and then compares the hashes. Every match is stored in a dict that maps the couple of starting indices in both linesets to the couple of corresponding starting and ending lines in both files. Finally, successive couples are regrouped into bigger ones, which makes it possible to account for common chunks of lines longer than the minimal number of successive lines required.
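The hash-and-regroup idea described above can be sketched as follows. This is an assumed simplification, not pylint's real _find: it hashes every window of min_lines successive lines, matches windows across both files, then merges successive matching couples into maximal common chunks.

```python
from collections import defaultdict

def find_common(lines1, lines2, min_lines=4):
    """Return (start1, start2, length) triples for common chunks of at
    least `min_lines` successive identical lines (illustrative sketch)."""
    # Hash every window of `min_lines` successive lines in the first file.
    index1 = defaultdict(list)
    for i in range(len(lines1) - min_lines + 1):
        index1[hash(tuple(lines1[i:i + min_lines]))].append(i)

    # Every matching window yields a couple of starting indices.
    matches = set()
    for j in range(len(lines2) - min_lines + 1):
        for i in index1.get(hash(tuple(lines2[j:j + min_lines])), []):
            matches.add((i, j))

    # Regroup successive couples into bigger chunks: skip couples that
    # merely continue an earlier match, and grow each remaining one for
    # as long as the next couple is also a match.
    chunks = []
    for i, j in sorted(matches):
        if (i - 1, j - 1) in matches:
            continue  # already covered by a longer chunk
        length = min_lines
        while (i + length - min_lines + 1, j + length - min_lines + 1) in matches:
            length += 1
        chunks.append((i, j, length))
    return chunks
```

For example, `find_common(["a", "b", "c", "d", "e"], ["x", "a", "b", "c", "d", "y"], min_lines=2)` reports one chunk of four common lines starting at index 0 in the first file and index 1 in the second, even though the minimal window is only two lines.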