Python regex: matching the content between a specified HTML start tag and end tag

 <div class="pbm mbm bbda cl">
         <h2 class="mbn"></h2>
         <a href="http://bbs.aa.cc/home.php?mod=spacecp&ac=profile&op=verify&vid=1" target="_blank"><img src="http://bbs.aa.cc/data/attachment/common/c4/common_1_verify_icon.png" class="vm" alt="" title="" /></a>
        </div>

       <div class="pbm mbm bbda cl">
         <h2 class="mbn"></h2>
         <ul>
          <li> <em class="xg1"></em><span style="color:#FF0000"><a href="http://bbs.aa.cc/home.php?mod=spacecp&ac=usergroup&gid=21" target="_blank"><font color="#FF0000"></font></a></span> </li>
          <li><em class="xg1"></em></li>
         </ul>
         <ul id="pbbs" class="pf_l">
          <li><em></em>28 </li>
          <li><em></em>2017-7-16 18:00</li>
          <li><em></em>2018-6-26 18:00</li>
          <li><em></em>2018-6-15 19:06</li>
          <li><em></em>2018-5-26 18:11</li>
          <li><em></em></li>
         </ul>
        </div>

How can a Python regular expression match the second div?
This is the piece I want:

<div class="pbm mbm bbda cl">
         <h2 class="mbn"></h2>
         <ul>
          <li> <em class="xg1"></em><span style="color:#FF0000"><a href="http://bbs.aa.cc/home.php?mod=spacecp&ac=usergroup&gid=21" target="_blank"><font color="#FF0000"></font></a></span> </li>
          <li><em class="xg1"></em></li>
         </ul>
         <ul id="pbbs" class="pf_l">
          <li><em></em>28 </li>
          <li><em></em>2017-7-16 18:00</li>
          <li><em></em>2018-6-26 18:00</li>
          <li><em></em>2018-6-15 19:06</li>
          <li><em></em>2018-5-26 18:11</li>
          <li><em></em></li>
         </ul>
        </div>

It's best to use DOM operations for this, not regular expressions.
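
The answer above gives no code, so here is a minimal sketch of the parser-based idea. It uses only the standard library's html.parser as a stand-in for a real DOM library, and it is not a full DOM. The class name DivCollector and the shortened sample markup are made up for illustration. Comments and doctype declarations inside a div are not copied.

```python
from html.parser import HTMLParser


class DivCollector(HTMLParser):
    """Collect the source text of every top-level <div>, nested divs included."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.depth = 0    # number of <div> elements currently open
        self.parts = []   # pieces of the div being collected
        self.divs = []    # finished top-level divs

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are copied as written.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        self.parts.append(f"</{tag}>")
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.divs.append("".join(self.parts))
                self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


# Shortened stand-in for the page source in the question.
html = """
<div class="pbm mbm bbda cl">
  <h2 class="mbn"></h2>
  <a href="#"><img src="icon.png" class="vm" alt="" /></a>
</div>
<div class="pbm mbm bbda cl">
  <h2 class="mbn"></h2>
  <ul id="pbbs" class="pf_l">
    <li><em></em>28 </li>
    <li><em></em>2017-7-16 18:00</li>
  </ul>
</div>
"""

collector = DivCollector()
collector.feed(html)
collector.close()
print(collector.divs[1])  # the second top-level div
```

Because the collector counts open div tags, a div nested inside another div stays part of its parent instead of ending the match early.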


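import re
html = 'html'  # placeholder: replace with the page source from the question
# re.S lets "." match newlines; the non-greedy group stops at the first </div>
result = re.findall('<div(.*?)</div>', html, re.S)
print(result[1])  # index 1 is the second match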

result is a list that contains the content of the two <div> ... </div> blocks. Take the second element (index 1), which is the part you need.
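
The pattern above captures only the text between "<div" and "</div>", so the tags themselves are dropped. As a small variation (a sketch, not the answerer's code), the group can wrap the whole element so that the result includes the opening and closing tags, as in the piece the question asks for. The sample markup is a shortened stand-in for the real page.

```python
import re

# Shortened stand-in for the page source in the question.
html = """
<div class="pbm mbm bbda cl">
  <h2 class="mbn"></h2>
  <a href="#"><img src="icon.png" class="vm" alt="" /></a>
</div>
<div class="pbm mbm bbda cl">
  <h2 class="mbn"></h2>
  <ul id="pbbs" class="pf_l">
    <li><em></em>28 </li>
    <li><em></em>2017-7-16 18:00</li>
  </ul>
</div>
"""

# The group wraps the whole element, so both tags are kept in each result.
# re.S makes "." match newlines; ".*?" stops at the first "</div>".
blocks = re.findall(r'(<div\b[^>]*>.*?</div>)', html, re.S)

second = blocks[1]  # the second div, tags included
print(second)
```

This only works while the divs are not nested inside each other, which is the case in the question's snippet.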


For content at this level of nesting, use lxml or bs4; it is difficult to match with regular expressions.
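
One concrete reason regular expressions struggle here is nesting. The sketch below uses made-up markup with one div inside another to show a non-greedy pattern stopping at the inner closing tag.

```python
import re

# Hypothetical markup: an outer div that contains an inner div.
html = '<div id="outer"><div id="inner">a</div>b</div>'

# ".*?" stops at the first "</div>", which closes the INNER div,
# so the match is cut off before the outer element ends.
matches = re.findall(r'<div\b[^>]*>.*?</div>', html, re.S)
print(matches)  # ['<div id="outer"><div id="inner">a</div>']
```

The trailing "b</div>" is lost, which is why a parser that tracks open and close tags is the safer choice once divs can nest.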
