For a long time, I had wanted to immerse myself in Spatial Data Science and play with it. Combining GIS (Geographic Information Systems), DBMS (Database Management Systems), Data Analytics and Big Data appealed to me, but I could not find an excuse or a specific problem that motivated me enough to solve it.
Ernesto Talvi, the Colorado Party's candidate for the Presidency of the Republic, has been repeating his proposal to build 136 public secondary schools across the country to anyone who will listen. In that proposal I found a problem that interested me: where would Ernesto Talvi build these secondary schools?
In his speeches he says he would build them in “the most vulnerable neighborhoods of the country”, where “only 13 out of 100 students finish secondary school”. This is reasonable since, according to INEEd (Instituto Nacional de Evaluación Educativa, Uruguay's National Institute of Educational Evaluation), the largest differences in upper secondary graduation rates appear when the socioeconomic context is taken into account: the graduation rate is 64.1% for teenagers from favorable socioeconomic contexts and 12.5% for those from unfavorable ones.
I thought of another way to decide where to locate the 136 secondary schools. My idea was to place them so that the whole country is covered as well as possible; that is, to minimize the distance from anywhere in Uruguay to the nearest secondary school.
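One way to state this goal precisely, assuming "covered as well as possible" means minimizing the worst-case distance, is the classic k-center (minimax facility location) objective. The notation is introduced here and is not from the original: $U$ is the territory of Uruguay, $E$ the set of existing secondary schools, $N$ the set of 136 new sites and $d$ a distance between two points.

$$
\min_{N \subset U,\; |N| = 136} \;\; \max_{p \in U} \;\; \min_{s \in E \cup N} d(p, s)
$$

The inner minimum is the distance from a point to its nearest school; the outer maximum picks the worst-served point of the country, and the new sites are chosen to make that worst case as small as possible.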
The first intuitive solution that occurred to me was to mark all the existing secondary schools on the map of Uruguay and draw circles around them, gradually increasing their diameter. As the diameter grows, it reaches a point where the circles cover the whole country except for one spot. That spot, the last one left outside the circles, is the farthest from any existing secondary school, and it is where we should build the first new one. The process is then repeated, each time taking into account the schools already added, until all 136 have been placed. And that is it!
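Growing all the circles at the same rate until a single spot remains uncovered is the same as picking the point whose distance to its nearest school is largest, so the procedure can be sketched on a discretized map as a greedy "farthest-first" loop. This is a minimal sketch under assumptions not in the original: the map is replaced by a list of candidate points (for example a regular grid), coordinates are projected and in meters (such as UTM) so that Euclidean distance is meaningful, and the function name is hypothetical.

```python
import math

def greedy_farthest_point(candidates, existing, k):
    """Greedy placement of k new schools on a discretized map.

    candidates: list of (x, y) points covering the territory (e.g. a grid)
    existing:   non-empty list of (x, y) locations of existing schools
    Each step picks the candidate whose distance to its nearest school is
    largest, i.e. the last spot the growing circles would cover.
    """
    # Distance from every candidate to its nearest school so far.
    nearest = [min(math.dist(c, s) for s in existing) for c in candidates]
    placed = []
    for _ in range(k):
        # Index of the worst-served candidate (first one wins on ties).
        i = max(range(len(candidates)), key=nearest.__getitem__)
        new_school = candidates[i]
        placed.append(new_school)
        # A new school can only bring each candidate's nearest school closer.
        nearest = [min(d, math.dist(c, new_school))
                   for c, d in zip(candidates, nearest)]
    return placed
```

For example, on an 11 × 11 grid with a single school at the origin, the first new school lands in the opposite corner, (10, 10). Updating the nearest distances incrementally avoids recomputing every distance to every school at each of the 136 steps.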
The first problem was to geographically locate every existing secondary school, and I did not know whether that information existed. Luckily, at a meeting I ran into Daniel Carranza, better known as “El chino”, co-founder of DATA Uruguay. He is involved in everything related to Open Government, Open Data and Civic Tech, and he told me that the information I was looking for did exist. I emailed him asking how to get it, and he replied with a link to the open data catalog.
The problem I then faced was that the coordinates were in the EPSG:32721 (WGS 84 / UTM zone 21S) system. This system's area of use lies between the 60°W and 54°W meridians and between the 80°S parallel and the Equator; in other words, it is used in Argentina, Bolivia, Brazil, Paraguay and Uruguay.
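The zone arithmetic behind that range can be sketched in a few lines: UTM zones are 6° of longitude wide and numbered eastward starting at 180°W, and the WGS 84 / UTM codes follow the pattern EPSG:326xx for northern zones and EPSG:327xx for southern ones. The helper names below are hypothetical, and the sketch deliberately ignores the special zones around Norway and Svalbard.

```python
import math

def utm_zone(lon):
    """UTM zone number (1-60) for a longitude in degrees.
    Ignores the Norway/Svalbard exceptions."""
    return int(math.floor((lon + 180.0) / 6.0)) % 60 + 1

def utm_epsg_wgs84(lon, lat):
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat):
    326xx north of the Equator, 327xx south of it."""
    return (32600 if lat >= 0 else 32700) + utm_zone(lon)
```

A point in Montevideo, around longitude -56.16 and latitude -34.90, falls in zone 21 south, giving EPSG:32721; zone 21 starts at 60°W (inclusive) and zone 22 begins at 54°W.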
I was not very fond of that system; I preferred EPSG:4326 (WGS 84), the latitude/longitude system used by GPS and Google Maps. It covers the whole world rather than a specific region, and it is also the one I am used to working with.
I tried hard to write a function that converted a point from one coordinate system to the other and searched for how to do it, but I could not manage. My friend Andres Aguiar (from Quanam), whom I trust completely, came to the rescue: he found (I am not sure where) an “approximate” formula (he knows about these things because he sails). The problem with that formula is that it assumes the Earth is a perfect sphere, and we know it is not.
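The post does not reproduce Andres's formula, so what follows is not it. It is a sketch of one standard spherical approximation, the inverse transverse Mercator projection on a sphere as given by Snyder, under assumptions added here: a spherical Earth with a mean radius of 6,371 km, the UTM scale factor 0.9996, the zone 21 central meridian at 57°W and the southern-hemisphere false northing of 10,000,000 m. The function names are hypothetical. Because UTM is really defined on the WGS 84 ellipsoid, results drift from the true coordinates, which is the weakness of the spherical assumption mentioned above.

```python
import math

# Assumptions: spherical Earth, standard UTM parameters, zone 21S.
R = 6_371_000.0                 # mean Earth radius in meters (sphere, not ellipsoid)
K0 = 0.9996                     # UTM scale factor on the central meridian
LON0 = math.radians(-57.0)      # central meridian of UTM zone 21
FALSE_EASTING = 500_000.0
FALSE_NORTHING = 10_000_000.0   # used by southern-hemisphere UTM zones

def utm21s_to_lonlat_sphere(easting, northing):
    """Approximate inverse transverse Mercator on a sphere (degrees out)."""
    x = (easting - FALSE_EASTING) / (R * K0)
    d = (northing - FALSE_NORTHING) / (R * K0)
    lat = math.asin(math.sin(d) / math.cosh(x))
    lon = LON0 + math.atan2(math.sinh(x), math.cos(d))
    return math.degrees(lon), math.degrees(lat)

def lonlat_to_utm21s_sphere(lon, lat):
    """Forward spherical transverse Mercator, useful to check the round trip."""
    phi, dlam = math.radians(lat), math.radians(lon) - LON0
    b = math.cos(phi) * math.sin(dlam)
    x = R * K0 * math.atanh(b)
    y = R * K0 * math.atan2(math.tan(phi), math.cos(dlam))
    return x + FALSE_EASTING, y + FALSE_NORTHING
```

The forward function is included only so that a point can be converted and converted back; the round trip is exact up to floating-point error, even though both directions share the same spherical approximation.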
I found web pages that do the conversion and tried to call them as web services (with Python libraries such as http.client and urllib), but they took me for a scraper and blocked me.
I tried to do it with Selenium (a tool for automating tasks that one can do in a browser, usually used to automate tests of web applications), but I could not make that work either: all the steps were recorded and the script was generated, but I could not read the result from the output field; none of the store value, store text or store title commands worked.
I had no luck with BadBoy either, an application used for purposes similar to Selenium's.
I tried several APIs, but since I did not want to pay, the number of requests I could make was limited. Messages like the ones below were very frequent, and pausing (a thread sleep) until I could query again was not an option, because the wait never seemed to end:
urllib.error.HTTPError: HTTP Error 429: Too Many Requests
<geonames><status message="the hourly limit of 1000 credits for hectorcotelo has been exceeded. Please throttle your requests or use the commercial service." value="19"/></geonames>
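Since the error itself asks the client to throttle its requests, a polite client waits and retries with growing delays, honors the `Retry-After` header when the server sends one, and gives up after a fixed number of attempts instead of waiting forever. A minimal sketch with the standard library follows; `fetch_with_backoff` and its parameters are hypothetical, and `fetch` defaults to `urllib.request.urlopen` but can be any callable. With an hourly quota such as the one above, backoff only spreads the requests out; it does not raise the limit.

```python
import time
import urllib.error
import urllib.request

def fetch_with_backoff(url, fetch=urllib.request.urlopen, sleep=time.sleep,
                       max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call fetch(url), retrying on HTTP 429 (Too Many Requests).

    Waits Retry-After seconds when the server sends a numeric value,
    otherwise base_delay * 2**attempt (capped at max_delay).
    Re-raises the error after max_retries retries.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            # Only the delay-in-seconds form of Retry-After is handled here.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after is not None and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = min(base_delay * 2 ** attempt, max_delay)
            sleep(delay)
```

Passing `sleep` and `fetch` as parameters keeps the retry logic testable without touching the network.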
Finally, we took the easier path, one I had not thought of before: a library called pyproj! This library performs cartographic transformations between geographic coordinates (lat/lon) and map projections (x/y). It can also transform coordinates directly from one map projection to another. The coordinates can be given as NumPy arrays, Python arrays, lists, etc.
There follows an example of the code:
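A minimal sketch of such a conversion with pyproj's `Transformer`, assuming a recent pyproj release; the function names are illustrative, not the author's original code. `always_xy=True` keeps the (x, y), that is (longitude, latitude), order on both sides; without it, pyproj follows the EPSG:4326 axis order and returns latitude first.

```python
from pyproj import Transformer

# UTM zone 21S (meters) <-> WGS 84 geographic coordinates (degrees).
# always_xy=True: (easting, northing) in, (lon, lat) out, and vice versa.
UTM21S_TO_WGS84 = Transformer.from_crs("EPSG:32721", "EPSG:4326", always_xy=True)
WGS84_TO_UTM21S = Transformer.from_crs("EPSG:4326", "EPSG:32721", always_xy=True)

def utm21s_to_lonlat(easting, northing):
    """Convert UTM 21S coordinates (scalars or NumPy arrays) to (lon, lat)."""
    return UTM21S_TO_WGS84.transform(easting, northing)

def lonlat_to_utm21s(lon, lat):
    """Inverse conversion, handy for checking a round trip."""
    return WGS84_TO_UTM21S.transform(lon, lat)
```

Building each `Transformer` once and reusing it avoids recreating the transformation for every school; passing NumPy arrays of eastings and northings converts the whole list in a single call.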
With that issue solved, I could start working on the solution.
We enlarge the circles…
A little more…
A little bit more…
In the picture above we can see that, with circles of that size, three areas are still not blue. Those spots (latitude, longitude) are -31.798, -57.049 (in Paysandú, close to Paso del Parque Daymán), -30.886, -56.430 (in Salto, close to the border with Artigas) and -32.220, -53.717 (in Cerro Largo, close to the Yaguarón river).
In principle, that solution would be easy to apply. For a circle with center $C = (a, b)$ and radius $r$, the standard equation is $(x - a)^2 + (y - b)^2 = r^2$. If we substitute each point $(x, y)$ of Uruguay into $(x - a)^2 + (y - b)^2 - r^2$ for the circle of every secondary school and all the results are greater than 0, the point lies outside all the circles. If, on the other hand, any of them is negative, the point is inside the “radius of influence” of that secondary school (a result of exactly 0 means the point lies on the circle itself).
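The sign test just described translates almost directly into code. A minimal sketch, assuming every school uses the same radius $r$ (as in the pictures) and that points are in projected coordinates such as UTM meters, since at Uruguay's latitudes a degree of longitude is shorter than a degree of latitude and "circles" drawn in degrees would be stretched. The function names are hypothetical.

```python
def circle_value(point, center, r):
    """(x - a)^2 + (y - b)^2 - r^2: negative inside, zero on, positive outside."""
    (x, y), (a, b) = point, center
    return (x - a) ** 2 + (y - b) ** 2 - r ** 2

def uncovered(points, schools, r):
    """Points that lie strictly outside the circle of radius r of every school."""
    return [p for p in points
            if all(circle_value(p, s, r) > 0 for s in schools)]
```

Working with the squared distance avoids a square root per comparison; the cost is still one evaluation per point and per school, which is what makes the brute-force version heavy on a fine grid of the whole country.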
As you can see, although it can be done, this approach does not seem easy to implement on a computer. It is intuitively correct, and it also works if the goal is a visual solution; but it was conceived from the visual side, the way a human would solve the problem, rather than the way a computer would solve it best.
In upcoming posts, I will explore other ways to reach a solution. I won't give up.
Business Analytics & Information Management consultant