How to efficiently check and mark whether each line of A contains an entry of B, across two Excel files

I mainly want to do tagging for named entity recognition.
Specifically, there are two Excel files, A (n rows, 1 column) and B (n rows, 1 column).
For example, each line in A is a descriptive sentence, and each line in B is an entity name.
How can I efficiently implement something like the following:
for index, row in A.iterrows():
    # if row[""] contains any entity name from B, mark this row of A
    ...

Any advice would be appreciated.

Mar.06,2021

If both A and B are large, you can load the entries of B into an Aho-Corasick (AC) automaton and then run each row of A through it to find the matches.
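A minimal pure-Python sketch of the automaton this answer describes, offered as an illustration rather than a production implementation. The class name AhoCorasick and its find_all method are hypothetical. It builds a trie from the names in B, adds failure links breadth-first, and then scans each row of A once, reporting every (start, end, name) match, including overlapping ones.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton over a fixed set of patterns (a sketch)."""

    def __init__(self, patterns):
        # goto[state] maps a character to the next state; state 0 is the root
        self.goto = [{}]
        self.fail = [0]
        # out[state] lists the patterns recognised when the scan reaches state
        self.out = [[]]
        for pattern in patterns:
            if pattern:  # skip empty entries, they would match everywhere
                self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        if pattern not in self.out[state]:  # ignore duplicate names in B
            self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first: depth-1 states keep fail = 0 (the root),
        # deeper states are linked while their parents are dequeued.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # also report patterns that end as a suffix of this state
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find_all(self, text):
        """Return (start, end, pattern) for every occurrence in text."""
        matches = []
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                matches.append((i - len(pattern) + 1, i + 1, pattern))
        return matches
```

For example, AhoCorasick(["he", "she", "his", "hers"]).find_all("ushers") reports "she", "he" and "hers". Once the automaton is built, each row is scanned in a single pass, so the cost per row grows with the row length plus the number of matches rather than with the number of entries in B.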


This feels similar to tokenization.
You can merge all lines of B into a single regular expression (joined with "|").
Then match each row of A against that expression to get all matching tokens.
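A sketch of the merged-regex idea using Python's standard re module, assuming the entity names should match as literal strings. The helper names build_entity_regex and find_entities are hypothetical.

```python
import re


def build_entity_regex(entities):
    """Merge all entity names into one alternation pattern.

    Names are escaped so characters like '.' or '(' match literally, and
    sorted longest-first so a longer name wins over a shorter one that
    starts at the same position.
    """
    names = sorted({e for e in entities if e}, key=len, reverse=True)
    if not names:
        # an empty alternation would match the empty string everywhere
        return re.compile(r"(?!)")
    return re.compile("|".join(re.escape(name) for name in names))


def find_entities(pattern, sentence):
    """Return (start, end, name) for each non-overlapping match in sentence."""
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(sentence)]
```

Python's re alternation takes the first alternative that succeeds at a position rather than the longest one, which is why the names are sorted longest-first; also, re.finditer returns only non-overlapping matches, unlike the automaton. With pandas, the per-row call could be applied as A["sentence"].apply(lambda s: find_entities(pattern, s)), where the column name "sentence" is a placeholder.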


An AC automaton, or re.finditer run over the current row, is the best choice; either is efficient enough. You can take a look at the https://github.com/vi3k6i5/fl. library, which contains an AC automaton implementation for finding and replacing keywords.
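Since the stated goal is NER tagging, the spans found by either approach still have to be turned into labels. Below is a minimal sketch, assuming character-level BIO tags and a single hypothetical label ENT. The function name spans_to_bio and the longest-span-first overlap rule are illustrative choices; the answers above do not specify them. Spans are (start, end, name) tuples, which can be built from re.finditer match objects via m.start() and m.end().

```python
def spans_to_bio(text, spans, label="ENT"):
    """Convert (start, end, name) spans into per-character BIO tags.

    Overlaps are resolved greedily: spans are visited longest-first
    (earliest start breaks ties), and any span touching an already
    tagged character is skipped. "ENT" is a hypothetical label.
    """
    tags = ["O"] * len(text)
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    for start, end, _name in ordered:
        if start >= end or any(tag != "O" for tag in tags[start:end]):
            continue
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags
```

For overlapping matches such as "she" and "hers" inside "ushers", this keeps the longer "hers" and tags the other characters O.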
