Fixing the "illegal multibyte sequence" error when scraping GBK pages

      Posted in: Python Techniques
<p>While scraping data from GBK-encoded web pages recently, I found that decoding the response failed with an "illegal multibyte sequence" error. Decoding with GB18030, the newest national standard encoding, fixed it: GB18030 is a superset of GBK, so it can decode characters that the gbk codec rejects. I am recording the fix here for future reference.</p><pre style="background-color:#262e37;color:#ffffff;font-family:&#39;Consolas&#39;;font-size:11.3pt;"><span style="color:#75715e;"># d = pq(getres.content.decode(&#39;gbk&#39;)) <br/></span>d <span style="color:#f92672;">= </span><span style="color:#a6e22e;">pq</span>(getres.content.<span style="color:#a6e22e;">decode</span>(<span style="color:#008080;font-weight:bold;">&#39;gb18030&#39;</span>))</pre><p>Reference: <a href="https://blog.csdn.net/mingyuli/article/details/80972575">https://blog.csdn.net/mingyuli/article/details/80972575</a></p><p>For the differences between GB2312, GBK, and GB18030, see: <a href="https://blog.csdn.net/dataastron/article/details/79148574">https://blog.csdn.net/dataastron/article/details/79148574</a></p>
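<p>The behaviour described above can be reproduced without a network request. The following is only a minimal sketch under stated assumptions: the helper name <code>decode_chinese_page</code> and the sample bytes are hypothetical and stand in for <code>getres.content</code>, and the fallback to <code>errors="replace"</code> is an addition of this sketch, not part of the original fix.</p>

```python
def decode_chinese_page(raw: bytes) -> str:
    """Decode bytes from a page declared as GBK using the GB18030 superset.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode("gb18030")
    except UnicodeDecodeError:
        # Assumption: for bytes that are invalid even in GB18030 (for example
        # a corrupted response), keep the readable text and mark the bad
        # bytes with U+FFFD instead of raising.
        return raw.decode("gb18030", errors="replace")


# Simulated response body: U+4E2D and U+6587 exist in GBK, while U+3400
# (CJK Extension A) is outside GBK and needs a four-byte GB18030 sequence.
sample_text = "\u4e2d\u6587\u3400"
raw = sample_text.encode("gb18030")

try:
    raw.decode("gbk")
    gbk_failed = False
except UnicodeDecodeError as exc:
    gbk_failed = True
    print("gbk decode failed:", exc.reason)

text = decode_chinese_page(raw)
print(text == sample_text)
```

<p>In a scraper, the same helper would be applied to the raw response bytes before the result is handed to the HTML parser.</p>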