Python crawler regular expression problem

<tr>
                                    <td>8</td>
                                    <td>
                                        
                                            
                                                
                                            
                                            
                                        
                                    </td>
                                    <td>
                                        
                                            
                                            3
                                        
                                    </td>
                                    <td>
                                        
                                            
                                            
                                                300
                                            
                                        
                                    </td>
                                    <td>
                                        <a href="javascript:;" onclick="plan_edit("76261");"></a>
                                    </td>
                                </tr>
                            
                                <tr>
                                    <td>7</td>
                                    <td>
                                        
                                            
                                                
                                            
                                            
                                        
                                    </td>
                                    <td>
                                        
                                            
                                            1
                                        
                                    </td>
                                    <td>
                                        
                                            
                                            
                                                300
                                            
                                        
                                    </td>
                                    <td>
                                        <a href="javascript:;" onclick="plan_edit("76246");"></a>
                                    </td>
                                </tr>
                            
                                <tr>
                                    <td>5</td>
                                    <td>
                                        
                                            
                                                
                                            
                                            
                                        
                                    </td>
                                    <td>
                                        
                                            
                                            1
                                        
                                    </td>
                                    <td>
                                        
                                            
                                            
                                                300
                                            
                                        
                                    </td>
                                    <td>
                                        <a href="javascript:;" onclick="plan_edit("76181");"></a>
                                    </td>
                                </tr>
                            
                                <tr>
                                    <td>4</td>
                                    <td>
                                        
                                            
                                                
                                            
                                            
                                        
                                    </td>
                                    <td>
                                        
                                            
                                            1
                                        
                                    </td>
                                    <td>
                                        
                                            
                                            
                                                300
                                            
                                        
                                    </td>
                                    <td>
                                        <a href="javascript:;" onclick="plan_edit("76179");"></a>
                                    </td>
                                </tr>
                            
                                <tr>
                                    <td>3</td>
                                    <td>
                                        
                                            
                                                
                                            
                                            
                                        
                                    </td>
                                    <td>
                                        
                                            
                                            0
                                        
                                    </td>
                                    <td>
                                        
                                            
                                            
                                                300
                                            
                                        
                                    </td>
                                    <td>
                                        <a href="javascript:;" onclick="plan_edit("76176");"></a>
                                    </td>
                                </tr>
                                

I want to match the first cell of each row, from test 8 to test 3. I wrote this:

    feeds_plan_campaign_name = re.findall("""<tr>[.\S\s]*<td>(.*?)</td>[.\S\s]*<td>[.\S\s]*</td>[.\S\s]*<td>[.\S\s]*</td>[.\S\s]*<td>[.\S\s]*</td>[.\S\s]*<td>[.\S\s]*</td>[.\S\s]*</tr>""", feeds_plan_page_data.text, re.S)
    print(len(feeds_plan_campaign_name))
    for name in feeds_plan_campaign_name:
        print("name1" + name)

The only result printed is test 3. Why is that?

Feb.28,2021

For structured content like this (complete HTML tags), I recommend against using regular expressions. A better way is XPath. If you don't know what XPath is, the w3school tutorial is a good place to learn it.

In just three minutes you will love this tool as much as I do. (runs away)
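
To make the XPath suggestion concrete, here is a minimal sketch using only the standard library. The sample markup and the function name first_cells are made up for illustration, and xml.etree.ElementTree only accepts well-formed markup and a subset of XPath, so for a real page a dedicated HTML library such as lxml would probably be the better fit.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample shaped like the rows in the question.
SAMPLE = """
<table>
  <tr><td>8</td><td></td><td>3</td><td>300</td></tr>
  <tr><td>7</td><td></td><td>1</td><td>300</td></tr>
  <tr><td>5</td><td></td><td>1</td><td>300</td></tr>
</table>
"""

def first_cells(markup):
    """Return the stripped text of the first <td> of every <tr>."""
    root = ET.fromstring(markup.strip())
    # ".//tr/td[1]" selects, for each row, its first td child.
    return [(td.text or "").strip() for td in root.findall(".//tr/td[1]")]

print(first_cells(SAMPLE))
```

The whole extraction is one path expression, and it does not care how much whitespace sits inside or between the cells.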

If you insist on using a regex, I wrote one, but it is really cumbersome and not portable. Here it is, for reference:

    import re
    # One capture group per cell: first cell, second cell (empty here), then the two numeric cells.
    regex = re.compile(r"<tr>\s*<td>([\u4e00-\u9fa50-9]*)</td>\s*<td>\s*([\u4e00-\u9fa5]*)"
                       r"\s*</td>\s*<td>\s*([0-9]*)\s*</td>\s*<td>\s*([0-9]*)")

Calling findall with it returns the list you need: one tuple per row (a two-dimensional array).
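
A short usage sketch of that pattern, assuming Python 3. ROW and SAMPLE are hypothetical stand-ins for the page text, laid out with the same kind of whitespace as the table in the question.

```python
import re

# The pattern from the answer above, written as raw strings.
regex = re.compile(r"<tr>\s*<td>([\u4e00-\u9fa50-9]*)</td>\s*<td>\s*([\u4e00-\u9fa5]*)"
                   r"\s*</td>\s*<td>\s*([0-9]*)\s*</td>\s*<td>\s*([0-9]*)")

# Hypothetical sample: two rows laid out like the question's table.
ROW = "<tr>\n  <td>{}</td>\n  <td>\n  </td>\n  <td>\n    {}\n  </td>\n  <td>\n    {}\n  </td>\n</tr>\n"
SAMPLE = ROW.format(8, 3, 300) + ROW.format(7, 1, 300)

# findall returns one tuple per row, one item per capture group.
rows = regex.findall(SAMPLE)
for name, second, count, amount in rows:
    print(name, count, amount)
```

Because the pattern has four capture groups, each element of rows is a 4-tuple; the second item is an empty string whenever the second cell holds only whitespace.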


I admire how the OP understands regex. With <td>[.\S\s]*</td>[.\S\s]* repeated that many times, why not wrap it in parentheses and repeat the group instead?
If you don't want to capture it, use a non-capturing group: (?:<td>[.\S\s]*</td>[.\S\s]*)*
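
A sketch of the non-capturing group idea. The sample rows are hypothetical, and the lazy quantifiers and the fixed count {4} are my own additions, there so that each match stays inside a single row.

```python
import re

# Hypothetical sample rows with five cells each, as in the question.
SAMPLE = (
    "<tr><td>8</td><td></td><td>3</td><td>300</td><td><a></a></td></tr>\n"
    "<tr><td>7</td><td></td><td>1</td><td>300</td><td><a></a></td></tr>\n"
)

# Capture the first cell, then skip exactly four more cells with a
# non-capturing group; lazy quantifiers keep each cell inside its own row.
pattern = re.compile(r"<tr>\s*<td>(.*?)</td>\s*(?:<td>[\s\S]*?</td>\s*){4}</tr>")

names = pattern.findall(SAMPLE)
print(names)
```

Since (?:...) does not capture, findall still returns plain strings, one per row.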

Your pattern above has only one capture group. What else do you expect it to return?
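
As for why only the last row comes back: [.\S\s]* is greedy, so the one right after <tr> swallows as much text as it can while still letting the rest of the pattern match. The whole table becomes a single match, and the capture lands on the first cell of the last row. A small demonstration on a hypothetical three-row sample (two cells per row, to keep it short):

```python
import re

# Hypothetical three-row sample.
SAMPLE = (
    "<tr><td>8</td><td>3</td></tr>\n"
    "<tr><td>7</td><td>1</td></tr>\n"
    "<tr><td>3</td><td>0</td></tr>\n"
)

# Greedy, shaped like the question's pattern: one match spanning all rows.
greedy = re.findall(
    r"<tr>[.\S\s]*<td>(.*?)</td>[.\S\s]*<td>[.\S\s]*</td>[.\S\s]*</tr>",
    SAMPLE,
    re.S,
)

# Lazy: each match stops at the nearest </tr>, so every row is found.
lazy = re.findall(r"<tr>\s*<td>(.*?)</td>[\s\S]*?</tr>", SAMPLE)

print(greedy)  # ['3']
print(lazy)    # ['8', '7', '3']
```

Note that the dot inside [.\S\s] is a literal dot; the class already matches every character because of \S and \s.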


Why not try beautifulsoup4? (laughs)
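
beautifulsoup4 is a third-party package, so as a dependency-free sketch of the same idea (let an HTML parser find the cells instead of a regex), here is a version built on the standard library's html.parser. This is not BeautifulSoup's API; the class name FirstCellParser and the sample are invented for illustration.

```python
from html.parser import HTMLParser

class FirstCellParser(HTMLParser):
    """Collect the text of the first <td> in every <tr>."""

    def __init__(self):
        super().__init__()
        self.cells = []       # first-cell text, one entry per row
        self._td_index = -1   # position of the current td within its row
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._td_index = -1
        elif tag == "td":
            self._td_index += 1
            self._in_td = True
            if self._td_index == 0:
                self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Only text inside the first cell of a row is kept.
        if self._in_td and self._td_index == 0:
            self.cells[-1] += data.strip()

# Hypothetical sample with the question's style of whitespace.
SAMPLE = """
<tr>
  <td>8</td>
  <td>
    3
  </td>
</tr>
<tr>
  <td>7</td>
  <td>
    1
  </td>
</tr>
"""

parser = FirstCellParser()
parser.feed(SAMPLE)
parser.close()
print(parser.cells)
```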


>>> from pyquery import PyQuery
>>> print([i.text for i in PyQuery(s)('tr > td:first')])
['8', '7', '5', '4', '3']