Python natural language processing: how to match sentences by meaning?

My question: I have three sentences,

case1 = u"to deal with the management in front of the shop in Li Village"
case2 = u"Licun River Patrol"
case3 = u"I am doing river management work by the Licun River"

How can I compare these three sentences against sentences whose content relates to "Licun River Governance"? Which technologies are relevant here? I hope someone can suggest a solution.

The first approach:

import jieba
from gensim import corpora, models, similarities

# calculate the similarity between str1 (the input keyword) and str2 (the text to compare), return the similarity
def simicos(str1, str2):
    # segment str2
    word_list = [word for word in jieba.cut(str2)]
    # include an empty second document; with a single document every TF-IDF weight would be zero
    all_word_list = [word_list, []]
    # segment str1
    word_test_list = [word for word in jieba.cut(str1)]
    # build the dictionary
    dictionary = corpora.Dictionary(all_word_list)
    # bag-of-words (BOW) vectors
    corpus = [dictionary.doc2bow(words) for words in all_word_list]
    word_test_vec = dictionary.doc2bow(word_test_list)
    # train the TF-IDF model on the corpus
    tfidf = models.TfidfModel(corpus)
    # print(tfidf[corpus])
    similar = similarities.SparseMatrixSimilarity(
        tfidf[corpus], num_features=len(dictionary.keys()))
    sim = similar[tfidf[word_test_vec]]
    # print(sim)
    return sim[0]
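As a quick check, the function can be called against the three cases from the question; the query keyword "Licun River Governance" and the case1/case2/case3 variable names are simply the ones defined above, used here for illustration:

query = u"Licun River Governance"
for case in (case1, case2, case3):
    # similarity of the query against each candidate sentence
    print(case, simicos(query, case))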


The second approach:

import math
import jieba.posseg as posseg

# cosine similarity between str1 and str2, using only nouns and verbs
def simicos(str1, str2):
    # keep only the nouns and verbs from each sentence
    cut_str1 = [w for w, t in posseg.lcut(str1) if "n" in t or "v" in t]
    cut_str2 = [w for w, t in posseg.lcut(str2) if "n" in t or "v" in t]
    # term-frequency vectors over the shared vocabulary
    all_words = set(cut_str1 + cut_str2)
    freq_str1 = [cut_str1.count(x) for x in all_words]
    freq_str2 = [cut_str2.count(x) for x in all_words]
    # cosine similarity: dot product divided by the product of the norms
    sum_all = sum(map(lambda z, y: z * y, freq_str1, freq_str2))
    sqrt_str1 = math.sqrt(sum(x ** 2 for x in freq_str1))
    sqrt_str2 = math.sqrt(sum(x ** 2 for x in freq_str2))
    return sum_all / (sqrt_str1 * sqrt_str2)
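One caveat: if either sentence contains no nouns or verbs, its norm is zero and the final division raises ZeroDivisionError. A minimal guard (my addition, not part of the original code) could replace the return statement:

    # guard: if either vector is all zeros, treat the similarity as 0
    if sqrt_str1 == 0 or sqrt_str2 == 0:
        return 0.0
    return sum_all / (sqrt_str1 * sqrt_str2)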

Neither of these two approaches matches by word meaning. How can I do better when matching a sentence against a published task description, given that the text may be very long and describes what the task does every day?


For semantic matching, don't rely only on the TF-IDF model; try the LSI and LDA models as well.
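For example, a minimal LSI-based sketch with gensim might look like the following; the number of topics, the reference corpus (the three case sentences from the question), and the query string are assumptions for illustration, not a definitive solution:

import jieba
from gensim import corpora, models, similarities

# assumed reference corpus: the three case sentences from the question
docs = [case1, case2, case3]
texts = [list(jieba.cut(d)) for d in docs]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# TF-IDF first, then LSI on top of it to capture latent semantic structure
tfidf = models.TfidfModel(corpus)
lsi = models.LsiModel(tfidf[corpus], id2word=dictionary, num_topics=2)
index = similarities.MatrixSimilarity(lsi[tfidf[corpus]])

# score the query against every reference sentence in the LSI space
query = u"Licun River Governance"
query_vec = lsi[tfidf[dictionary.doc2bow(list(jieba.cut(query)))]]
print(list(index[query_vec]))

LDA follows the same workflow with models.LdaModel; for long task descriptions it is usually trained on the whole document collection rather than a handful of sentences.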
