- 
								How to clean up some unwanted HTML attributes in crawler data
								
 For example, for the following data: 
<p id="a">data
 I just want to keep: 
data
 Is there a quick way to do this? 
... 
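One quick way, sketched with the standard library's `html.parser` (the tag and `id` here are just the sample from the question; for larger documents a real parser such as lxml would be the usual choice):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text content, dropping all tags and attributes."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(html):
    p = TextExtractor()
    p.feed(html)
    return "".join(p.parts)

print(strip_tags('<p id="a">data</p>'))  # -> data
```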
- 
								Pyspider cross-task send_message has no effect
								
 In the first project: 
self.send_message("DETAIL", {"url": href}, url="msg %s" % href)
 In the second project, named "DETAIL": 
@every(minutes=7 * 60)
def on_start(self):
    pass
        
@config(priority=3)
def on_message(self, project, msg):
    self.... 
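A minimal sketch of the cross-project message flow. `BaseHandler` is stubbed out below so the snippet is self-contained; in a real script you would subclass `pyspider.libs.base_handler.BaseHandler`. One common pitfall: each `send_message` needs a distinct `url`, otherwise the generated `on_message` tasks share a taskid and get de-duplicated.

```python
# Stand-in for pyspider's BaseHandler so the sketch runs standalone;
# in pyspider use: from pyspider.libs.base_handler import *
class BaseHandler:
    def __init__(self):
        self.sent = []

    def send_message(self, project, msg, url='data:,on_message'):
        # pyspider delivers msg to `project`'s on_message(); recorded here
        self.sent.append((project, msg, url))

class FirstProject(BaseHandler):
    def detail_page(self, href):
        # a distinct url per message, otherwise pyspider de-duplicates
        # the resulting on_message tasks by taskid (derived from the url)
        self.send_message("DETAIL", {"url": href}, url="msg:%s" % href)

class DetailProject(BaseHandler):  # project name must be exactly "DETAIL"
    def on_message(self, project, msg):
        # returning a value makes pyspider treat it as a result
        # (it flows on to on_result / the result backend)
        return {"url": msg["url"]}
```

If `on_message` never fires, also check that the receiving project's status is RUNNING and its name matches the first argument of `send_message` exactly (no stray whitespace).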
- 
								Pyspider reports an error after running detail page
								
 The index page can be displayed after the first run, but an error is reported as soon as the detail page runs. 
 
 
... 
- 
								Pyspider pkg_resources.DistributionNotFound: wsgidav
								
 pyspider reported a successful installation, but at run time there is a pkg_resources.DistributionNotFound: wsgidav problem. 
[root@localhost ~]# pip install pyspider
Collecting pyspider
  Downloading https://files.pythonhosted.org/packages/df/... 
- 
								Pyspider crawler result gets data-bgimage attribute value
								
 <a href="testtese" target="_blank" data-bgimage="testtese"></a>
 the a tag acquired by the crawler contains href, target, data-bgimage and other attributes, which can be obtained with this.attr.href and this.at... 
- 
								How can pyspider crawl web pages with regular URLs whose content is in JSON format?
								
 for example, there are 10 URLs: http://www.baidu.com?userid=1, http://www.baidu.com?userid=2, http://www.baidu.com?userid=3, ..., http://www.baidu.com?userid=10
 the content of the web page is 
{
    "data": {
        "1": {
            &q... 
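A sketch of both halves: generating the numbered URLs (in pyspider's `on_start` you would `self.crawl()` each one) and reading the JSON body (pyspider exposes the parsed body as `response.json`). The payload keys below past `"data"` are guesses based on the truncated excerpt; adjust to the real structure.

```python
import json

# the ten numbered URLs from the question
urls = ["http://www.baidu.com?userid=%d" % i for i in range(1, 11)]

# sample payload in roughly the shape the question hints at
body = '{"data": {"1": {"name": "first"}, "2": {"name": "second"}}}'
data = json.loads(body)["data"]

for userid, record in sorted(data.items()):
    # in a pyspider callback this loop would build the result dict(s)
    print(userid, record["name"])
```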
- 
								After pyspider run, log prompts the tornado_fetcher.py file to report an error with the encoding problem.
								   There is no problem when starting with the default taskdb/projectdb, but after switching to MySQL storage this exception is thrown ... 
- 
								Pyspider debugging is correct, but automatic running has no result.
								
 1. I wrote a pyspider script that debugs and runs without error and can insert into the database, but after the first successful automatic run it never runs successfully again: the prompt message is all success, yet no data is inserted.  the cod... 
- 
								Problems encountered deploying pyspider with MySQL under Docker, following the tutorial.
								
 execute the command:  docker run --name scheduler -d --link mysql:mysql --link rabbitmq:rabbitmq binux/pyspider:latest scheduler 
 finally, there was a problem with the webui deployment. I checked the scheduler log with docker logs scheduler:  the ... 
- 
								How does pyspider running on a CentOS 7.2 server open webui through the public network IP?
								
 pyspider is running on a CentOS 7.2 server; how can webui be opened through the public network IP? The config is written like this: 
{
    "scheduler" : {
        "xmlrpc-host": "0.0.0.0",
        "delete-time&qu... 
- 
								How does pyspider judge the end of a task?
								
 I now set the crawl to run automatically every 30 minutes.   
 Because the data has to be processed before it can be saved to the database, I need to process it after one round of the task finishes.  Before I set up automatic execution, I used  "on_finished... 
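pyspider does provide an `on_finished(response, task)` hook that fires once all tasks of a run have completed. A stubbed sketch of buffering per-task results and flushing them there (`BaseHandler` and the database are stand-ins so the snippet is self-contained):

```python
class BaseHandler:  # stand-in for pyspider.libs.base_handler.BaseHandler
    pass

class Handler(BaseHandler):
    def __init__(self):
        self.buffer = []   # results collected during one round
        self.saved = []    # stand-in for the real database

    def on_result(self, result):
        if result:
            self.buffer.append(result)

    def on_finished(self, response, task):
        # called by pyspider when every task of this round is done;
        # post-process the whole round here, then persist it
        self.saved.extend(self.buffer)
        self.buffer = []
```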
- 
								Pyspider crawler page contains lazy load lazy-load, to get no data
								 Using pyspider to fetch the  Mango TV  page's  popular variety  column content (  div.mg-main ul > li.v-item  ): because the page uses lazy loading, the specific information cannot be obtained. How can the page be made to load this part of the content first, and then fetch the ... 
- 
								The pyspider task restarts, but the result shows None
								
 Asking for advice: I don't quite understand why the terminal reports None as an error, and I don't know what it has to do with on_result. 
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2018-05-22 15:22:51
-s... 
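The None usually comes from pyspider calling `on_result` with the return value of *every* callback, and callbacks like `on_start` return nothing; guarding against empty results (as the default implementation does) is enough. A stubbed sketch, with `BaseHandler` as a stand-in so the snippet runs standalone:

```python
class BaseHandler:  # stand-in for pyspider.libs.base_handler.BaseHandler
    pass

class Handler(BaseHandler):
    def __init__(self):
        self.rows = []

    def on_result(self, result):
        # pyspider passes the return value of every callback here;
        # on_start and friends return None, hence the guard
        if not result:
            return
        self.rows.append(result)   # stand-in for the real DB insert
```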
- 
								Pyspider uses the on_message method and does not return result
								
 I use the send_message and on_message methods to handle the case where a single page returns multiple task results, and plan to override the on_result method for further processing. However, the msg returned by the on_message method is not ... 
- 
								Using pyspider to call phantomjs to render the page reports an error: "no response from phantomjs", status code 599
								 Using pyspider to call phantomjs to render the page reports the error  "no response from phantomjs", status code 599. Phantomjs works from the terminal, but the error appears as soon as it is called from pyspider; searching both pyspider and phantomjs for the late... 
- 
								How does pyspider kill duplicate queues in scheduler
								
Clicking RUN on the console reports this:  [E 180704 09:49:46 scheduler:1223] 1062 (23000): Duplicate entry 'on_start' for key 'PRIMARY'
 mysql.connector.errors.IntegrityError: 1062 (23000): Duplicate entry 'on_start' for key 'PRIMARY'
 norm... 
- 
								Pyspider can't handle Tmall International at all.
header / requests / pyspider / fetch_type="js" / URL > 1024
phantomjs / restart / fetch_error
... 
- 
								Does pyspider support mongodb clusters as taskdb?
								
 problem description 
 Capturing answers similar to Zhihu: because there are so many answers on Zhihu, response.save is used to carry forward the results crawled so far. 
 Because the Zhihu site cannot be crawled too fast, the task may not be completed in time. 
 so ... 
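The `save=` round-trip the question relies on: whatever dict is passed to `self.crawl(..., save=...)` comes back as `response.save` in the callback. A stand-in sketch of carrying partial answers across pages (the fetch itself is faked so the snippet is self-contained):

```python
class FakeResponse:
    """Stand-in for pyspider's response: save= round-trips untouched."""
    def __init__(self, save):
        self.save = save or {}

def fake_crawl(url, callback, save=None):
    # stand-in for self.crawl(); pyspider would fetch url first,
    # then invoke the callback with the response
    return callback(FakeResponse(save))

def answers_page(response):
    # append this page's answers to the ones carried in from earlier pages
    return response.save.get("answers", []) + ["answer-3"]

result = fake_crawl("http://example.com/page/2", answers_page,
                    save={"answers": ["answer-1", "answer-2"]})
print(result)  # -> ['answer-1', 'answer-2', 'answer-3']
```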
- 
								What to do when pyspider on the server keeps dying and projects disappear?
								 CentOS 7, pyspider.  1. Running in the background with nohup pyspider all > pyspider.log 2>&1 & it occasionally dies;  2. pyspider.log gives no reason for the exit;  3. after restarting pyspider, the previously written projects have disappeared. ... 
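A likely cause of the vanishing projects: by default pyspider keeps its sqlite taskdb/projectdb/resultdb under a `./data` directory relative to the current working directory, so restarting from a different directory starts with an empty projectdb. Pinning the working directory and letting systemd handle restarts addresses both symptoms; a minimal unit sketch (paths and install location are assumptions, adjust to your setup):

```ini
# /etc/systemd/system/pyspider.service -- a sketch, not a drop-in file
[Unit]
Description=pyspider
After=network.target

[Service]
# fix the working directory so the default sqlite ./data directory
# (taskdb/projectdb/resultdb) is always the same one
WorkingDirectory=/opt/pyspider
ExecStart=/usr/local/bin/pyspider all
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Unlike `nohup`, this restarts the process after a crash and records the exit reason in `journalctl -u pyspider`.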
- 
								What causes the abnormal errors at pyspider processor:202 and tornado_fetcher:212? What should be done?
								
 problem description 
 when there are many pyspider projects, it always gets stuck and cannot run tasks automatically 
 the environmental background of the problems and what methods you have tried 
 it is not possible to add more than one processor f...