Python regex: matching the content between a specified HTML start tag and end tag

 <div class="pbm mbm bbda cl">
         <h2 class="mbn"></h2>
         <a href="" target="_blank"><img src="" class="vm" alt="" title="" /></a>

       <div class="pbm mbm bbda cl">
         <h2 class="mbn"></h2>
          <li> <em class="xg1"></em><span style="color:-sharpFF0000"><a href="" target="_blank"><font color="-sharpFF0000"></font></a></span> </li>
          <li><em class="xg1"></em></li>
         <ul id="pbbs" class="pf_l">
          <li><em></em>28 </li>
          <li><em></em>2017-7-16 18:00</li>
          <li><em></em>2018-6-26 18:00</li>
          <li><em></em>2018-6-15 19:06</li>
          <li><em></em>2018-5-26 18:11</li>

How can I use a Python regular expression to match the second div?
This is the piece I want:

<div class="pbm mbm bbda cl">
         <h2 class="mbn"></h2>
          <li> <em class="xg1"></em><span style="color:-sharpFF0000"><a href="" target="_blank"><font color="-sharpFF0000"></font></a></span> </li>
          <li><em class="xg1"></em></li>
         <ul id="pbbs" class="pf_l">
          <li><em></em>28 </li>
          <li><em></em>2017-7-16 18:00</li>
          <li><em></em>2018-6-26 18:00</li>
          <li><em></em>2018-6-15 19:06</li>
          <li><em></em>2018-5-26 18:11</li>

It's best to use DOM operations (an HTML parser) for this, not regular expressions.
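
A minimal sketch of the parser-based approach, using only the standard library's html.parser (an event-based parser, not a full DOM). The class name DivCollector, the sample_html string, its closing tags, and the "first"/"second" heading text are all assumptions made for illustration; the snippet in the question is truncated and shows no closing tags.

```python
from html.parser import HTMLParser


class DivCollector(HTMLParser):
    """Collect the text found inside each <div> whose class attribute matches."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0    # nesting depth inside a matching div (0 = outside)
        self.blocks = []  # one list of text fragments per matching div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # A nested div inside a block we are already collecting.
            self.depth += 1
        elif dict(attrs).get("class") == self.wanted_class:
            self.depth = 1
            self.blocks.append([])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that sits inside a matching div.
        if self.depth and data.strip():
            self.blocks[-1].append(data.strip())


# Hypothetical sample: two sibling divs, each properly closed.
sample_html = """
<div class="pbm mbm bbda cl">
  <h2 class="mbn">first</h2>
</div>
<div class="pbm mbm bbda cl">
  <h2 class="mbn">second</h2>
  <ul id="pbbs" class="pf_l">
    <li><em></em>28 </li>
    <li><em></em>2017-7-16 18:00</li>
  </ul>
</div>
"""

collector = DivCollector("pbm mbm bbda cl")
collector.feed(sample_html)
collector.close()

# Index 1 is the second matching div.
second_div_text = collector.blocks[1]
print(second_div_text)
```

Because the collector tracks nesting depth, a div nested inside a matching div does not end the block early, which is the case a simple non-greedy regex gets wrong.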

import re
result = re.findall('<div(.*?)</div>', html, re.S)

result is a list containing the text between each <div and </div>, two entries in this case. Take the last one (result[-1]), which is the second part you need.
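
A short sketch of how that call behaves, run against a hypothetical sample_html in which both divs are siblings and properly closed (the closing tags and the "first"/"second" heading text are assumptions, since the question's snippet is truncated). Note that the group starts right after "<div", so each match also includes the tag's attributes; the second pattern below is one possible variant that skips them.

```python
import re

# Hypothetical sample: two sibling divs, each properly closed.
sample_html = """
<div class="pbm mbm bbda cl">
  <h2 class="mbn">first</h2>
</div>
<div class="pbm mbm bbda cl">
  <h2 class="mbn">second</h2>
  <ul id="pbbs" class="pf_l">
    <li><em></em>28 </li>
    <li><em></em>2017-7-16 18:00</li>
  </ul>
</div>
"""

# re.S lets "." match newlines, so one match can span several lines.
result = re.findall(r'<div(.*?)</div>', sample_html, re.S)
second = result[-1]  # the last match is the second div

# Variant: consume the rest of the opening tag so only the inner HTML is captured.
inner = re.findall(r'<div[^>]*>(.*?)</div>', sample_html, re.S)
second_inner = inner[-1].strip()

print(second_inner)
```

The non-greedy (.*?) stops at the first </div> it meets, so if one div is nested inside another, the outer match is cut short at the inner closing tag. In that situation a parser is the safer choice.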

It is difficult to match content at this level with lxml or bs4; use regular expressions instead.
