How can Python quickly check the validity of 500,000+ URLs and resolve their IP addresses and locations?

The URLs are stored in a text (CSV) file. I need to test each URL's validity, resolve its IP address and the corresponding physical location, and then append the results to that URL's line.

sample data

1,www.qq.com,
2,www.baidu.com,
.
.
.

expected results

1,www.qq.com,,,61.129.7.47,
2,www.baidu.com,,,14.215.177.39, 
.
.
.

I'm new to Python. My own idea is to read the URLs in blocks and then check validity and resolve the IP one URL at a time, but this sequential approach is bound to be painfully slow for 500,000+ tasks. I also don't have a thorough understanding of multiprocessing and coroutines, so I hope to learn both through this example.
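The sequential approach described above can be sketched with just the standard library. This is a minimal baseline (no geolocation step), where a failed DNS lookup doubles as the validity check; the file paths are placeholders:

```python
import csv
import socket

def resolve(url):
    """Resolve a hostname to an IPv4 address; return '' on failure.

    A lookup failure is treated as "invalid URL" here.
    """
    try:
        return socket.gethostbyname(url)
    except socket.gaierror:
        return ''

def process_file(in_path, out_path):
    # Read each CSV row, resolve the hostname in the second column,
    # and append the IP (or '') to the row.
    with open(in_path, newline='') as fin, open(out_path, 'w', newline='') as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        for row in reader:
            row.append(resolve(row[1]))
            writer.writerow(row)
```

Each `gethostbyname` call blocks until DNS answers, so 500,000 of them in a row is exactly the slowness the question describes; the point of the concurrent versions below is to overlap those waits.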

If you answer, please share your ideas and comment your code!

Feb.06,2022

This answer may be helpful to you:

https://stackoverflow.com/a/1.

and this:

https://github.com/lorenzog/d.


One caveat first: the URLs of big companies like these often map to many IP addresses. If you don't mind that, you can do this:
build a function that takes a URL and returns its city information;
parse the file and extract the URLs into a list;
take the return values and write them to a file.
Assuming you have completed the second and third steps, then:

import aiohttp
import asyncio
import aiofiles  # useful for async file writes in step 3

async def foo(url):
    # Query a lookup service for this URL. The httpbin endpoint is only
    # a placeholder; substitute a real IP/geolocation API here.
    params = {'url': url}
    async with aiohttp.ClientSession() as session:
        async with session.get('http://httpbin.org/get', params=params) as resp:
            return await resp.json()

if __name__ == "__main__":
    urls = list()  # filled in by step 2 (parsing the file)
    loop = asyncio.get_event_loop()
    tasks = [foo(url=url) for url in urls]
    # gather preserves the order of results, matching the order of urls
    results = loop.run_until_complete(asyncio.gather(*tasks))
    loop.close()
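The second and third steps (parsing the file and writing the results back) can be sketched together with the resolution step using only `asyncio` and `csv`. This is one possible shape, not the only one; the file names are assumptions, and `loop.getaddrinfo` runs the blocking DNS lookup in a thread pool so many lookups overlap:

```python
import asyncio
import csv

async def resolve(url):
    # loop.getaddrinfo offloads the blocking DNS query to the default
    # executor, so many lookups can be in flight at once.
    loop = asyncio.get_running_loop()
    try:
        infos = await loop.getaddrinfo(url, 80)
        return infos[0][4][0]   # first resolved address
    except OSError:
        return ''               # unresolvable, i.e. invalid URL

async def main(in_path, out_path):
    # Step 2: parse the file and extract the rows/URLs.
    with open(in_path, newline='') as f:
        rows = list(csv.reader(f))
    # Resolve all URLs concurrently; gather keeps result order.
    ips = await asyncio.gather(*(resolve(row[1]) for row in rows))
    # Step 3: append each IP to its row and write the results out.
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        for row, ip in zip(rows, ips):
            writer.writerow(row + [ip])

# usage (file names are assumptions):
#   asyncio.run(main('urls.csv', 'urls_with_ip.csv'))
```

For 500,000+ URLs you would also want to cap concurrency (e.g. with an `asyncio.Semaphore`) so you don't flood your resolver, and batch the writes rather than holding every row in memory.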