Earlier this week, I was talking with a friend who just moved from Massachusetts to Cleveland, they were a bit surprised about the use of middle school and junior high were both used to describe schools consisting of Grades 6-8.
I was curious about this myself and wondered which was used more often in Ohio and this general outline that I use for counting the use of middle school and junior high in ohio can be used applied if you want to see which tag is used more often in a specific state, country, or other place.
Of course, there’s multiple ways to do this and I wish there an easier way but here’s what a I did….
Because I was comparing 2 tags within an entire state, I couldn’t use the USA implementation of taginfo which uses the entire USA, and I couldn’t open an OSM file containing the entire state of ohio in an osm editor like josm..
So, for my state of ohio, I figured this answer out by:
1. downloading an extract of my state from geofabrik.
- Create the postgis database (I have ubuntu 12.04, postgis 2.0, postgresql 9.1, I assume you already have this installed; commands are different if you’re using postgis 1.5… I should explain this step out more clearly in a different post since I never found good introductory documentation for postgis/postgresql/osm when I first starting learning this back in ’11.)
2a. createdb nameofyourdatabase
2.b psql -d nameofyourdatabase -c “CREATE EXTENSION postgis;”
3. Fill your database with the OSM data through the osm2pgsql software by the following command: osm2pgsql -s -d nameofyourdb ~/path/to/data.osm.pbf
- Using pgadmin, with the gui, I connected to my database and I clicked on magnifying glass with SQL within it and entered the following SQL. input the following SQL:
SELECT name from planet_osm_polygon WHERE lower(name) ~ ‘junior high’ UNION ALL select name from planet_osm_point WHERE lower(name) ~ ‘junior high’ ORDER BY name
(thanks to Paul Norman to assist me with the proper SQL query syntax).
SELECT name from planet_osm_polygon WHERE lower(name) ~ ‘junior high’
this means is that I selected all closed ways that has the name ‘junior high’ in it (case insensitive).
Now here’s what the finer points of what syntax means:
* name – This is the column ‘name’. Now, osm2pgsql creates columns based on the first half, also called the ‘key’, of an osm tag. Osm tags are written out as key=value
Other columns generated by your osm2pgsql include highway, amenity, leisure, and many more. Because it’s highly unlikely that there’s a store named junior high and the sake of simplicity, I didn’t need to specify that a tag must have amenity=school and have junior high in its name.
- planet_osm_polygon – This is the name of a table in osm2pgsql that contains all closed ways. Here’s the names of the other tables in osm2pgsql.
- WHERE – specifies the condition in which I want to select the name table. If I wanted to query a simple tag that has a standard key and value, like amenity=parking ; I could simplify do WHERE amenity IN (‘parking’). But since ‘junior high’ occurs in the middle of a text phrase, the tilde (known as ~) will search for the pattern ‘junior high’ within the tag value.
So, this will return results for: name=Mooney Junior High School; name=Junior High School; name=wilkens junior high ; regardless of case sensitivity. (I admit I don’t fully understand this aspect of the syntax, so someone clarify if I’m incorrect !)
- UNION ALL – this allows you to do multiple queries within one and include the results of both queries at once .
select name from planet_osm_point WHERE lower(name) ~ ‘junior high’ ORDER BY name
Because OSM objects can be tagged as either nodes or as ways, I need to also search for any nodes that have junior high in its name ! The name of the nodes’ table is planet_osm_point and the structure of the syntax is nearly the same.
- ORDER BY – this is simple, it merely sorts the results by a column. In this instance, I want to sort them in alphabetical order, so I did ORDER BY name.
Now, we can execute our query by clicking “Execute query” (its icon looks like a play button on a DVD/VCR),
Now in pgadmin’s lower-right hand corner, will be the number of results that are returned and the names of all of the schools with junior high in it…
So, we see: 219 of Junior High in Ohio ! There’s a few duplicate ones, which is interesting. Some may be the same name in 2 different places, some may be duplicate nodes of the same school.
And we repeat the process again for middle school, and… 389 !
Middle school is used more often in Ohio than in Junior High… :)
As you continue using osm2pgsql and working with OSM data, you’ll realize that if you are interested in generated statistics of tags or creating maps using postgresql in OSM data, most of your interaction with postgresql will be creating queries and using SELECT statements. You’ll want to guide your learning on that.