Continuing with the subject Development Tools, one of the tasks to be performed is a small development in Python: to be precise, a simple web crawler (or web spider) that receives a URL parameter and returns all the links found on that website.
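The original source isn't reproduced here, but the core of such a crawler, extracting the links from a page's HTML, can be sketched with just the standard library. This is a minimal illustration, not the actual assignment code; the class and function names are my own:

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return a list of all link targets found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    sample = '<p><a href="https://example.com">home</a> <a href="/about">about</a></p>'
    print(extract_links(sample))
```

Fetching the page itself is one more step (e.g. `urllib.request.urlopen(url).read()`), and a full spider would then repeat this on each discovered link up to the requested depth.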
As we discussed in a previous post on this subject, Python is easy to learn, and in just a few minutes we got these results:
> python ecarrerasg_crawler -n 2 https://ecarrerasg.wordpress.com
This will be the main script of ecarrerasg_crawler.
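A command line like the one above (`-n` for the crawl depth, followed by the starting URL) is easy to handle with the standard `argparse` module. This is a hypothetical sketch of how that entry point might parse its arguments, not the script's actual code:

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI matching the invocation shown above:
    # ecarrerasg_crawler -n 2 https://ecarrerasg.wordpress.com
    parser = argparse.ArgumentParser(prog="ecarrerasg_crawler")
    parser.add_argument("-n", "--depth", type=int, default=1,
                        help="how many levels of links to follow")
    parser.add_argument("url", help="starting URL to crawl")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Crawling {args.url} to depth {args.depth}")
```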
I have added the URL of the source of my version of the crawler, in case anyone wants to see the code.
The class was divided into two parts. The first was a set of small examples of the libraries used to achieve what we saw a few lines above, showing how few lines of code are needed to build such an application in Python, quickly and easily. The second was a simple way to publish our Python programs, allowing others to use them, evolve them, or simply consult the code.
I will focus on the second part, because I find it interesting that there is a repository for storing Python applications, with the aim of providing a database of programs that gives visibility and accessibility to the libraries and applications created by other users of the language, easily and quickly. This repository is called PyPI, the "Python Package Index", and it is easy to upload a project and its related information.
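Publishing to PyPI starts from a small packaging file describing the project. As a minimal sketch (all metadata values here are placeholders, not the actual project's configuration), a classic `setup.py` looks like this:

```python
# setup.py -- minimal packaging sketch; name, version and description
# are illustrative placeholders, not the real project's metadata.
from setuptools import setup

setup(
    name="ecarrerasg_crawler",
    version="0.1.0",
    description="A simple web crawler that lists the links found on a page",
    py_modules=["ecarrerasg_crawler"],
)
```

With this in place, `python setup.py sdist` builds a source archive, which can then be uploaded to PyPI (nowadays typically with the `twine` tool) so anyone can install it with `pip`.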
Thus, we can see a number of tools working together: on one hand, GitHub as a source repository with version control, which gives us a simple and secure place to store our programs without losing anything along the way; and on the other, a repository that provides accessibility and visibility for our Python projects.
What’s next? We’ll see along the way…