A while ago I came across shape files, a common format for GIS data such as district borders. I had to update a database with such shapes, which were used to overlay a map. A shape file with the extension ".shp" contains the shape polygons themselves and usually comes with a ".dbf" file (dBase 3) containing the shapes' metadata, in our case addresses. Furthermore there can be a ".prj" and/or ".qpj" file describing the projection of the geo coordinates used.
Inspect the shapes metadata
There is a small command line tool called dbview that can open and query dBase 3 database files.
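A quick look at the metadata could be done like this (a minimal sketch; the file name is a placeholder, and the exact flags may differ slightly between dbview versions):

```shell
# Print every record of the dBase table, one "Field : value" line per field.
# "my-shapes.dbf" is a placeholder for your own file.
dbview my-shapes.dbf

# Browse mode prints records as delimiter-separated rows instead,
# which is handier for piping into grep, cut, etc.
dbview -b my-shapes.dbf
```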
See the PROJ.4 parameters for a given projection file
The GDAL package contains a nice command line tool called gdalsrsinfo. It translates the projection definition in ".prj" or ".qpj" files into parameters usable by proj4j.
```shell
$ gdalsrsinfo my-shapes.prj

PROJ.4 : '+proj=tmerc +lat_0=0 +lon_0=12 +k=1 +x_0=4500000 +y_0=0 +datum=potsdam +units=m +no_defs '

OGC WKT :
PROJCS["DHDN_3_degree_Gauss_Kruger_zone_4",
    GEOGCS["GCS_DHDN",
        DATUM["Deutsches_Hauptdreiecksnetz",
            SPHEROID["Bessel_1841",6377397.155,299.1528128]],
        PRIMEM["Greenwich",0],
        UNIT["Degree",0.017453292519943295]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",12],
    PARAMETER["scale_factor",1],
    PARAMETER["false_easting",4500000],
    PARAMETER["false_northing",0],
    UNIT["Meter",1]]
```
Transform shape files at scale
We use AWS EMR with Spark to crunch what one could call "big data". We also used it to merge shape files with other data and export the result as CSV. The magellan library is a nice solution for working with shape files in Spark. As of now it is only compatible with Spark 1.4.1, but with AWS EMR it is pretty simple to run whatever Spark version you want. The library is built on spark-sql, which massively simplifies working with such files.
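Getting started could look like this (a sketch under assumptions: the package coordinates below are my guess for the Spark 1.4.1 era, so check spark-packages.org for the release matching your Spark/Scala build, and the path is a placeholder):

```shell
# Launch a Spark shell with the magellan package pulled from spark-packages.
# The coordinates "harsha2010:magellan:1.0.3-s_2.10" are an assumption;
# verify them against the magellan releases for your Spark/Scala version.
spark-shell --packages harsha2010:magellan:1.0.3-s_2.10

# Inside the shell, a directory containing the .shp/.dbf pair can then be
# loaded as a DataFrame via spark-sql, e.g.:
#   val df = sqlContext.read.format("magellan").load("path/to/shapes/")
```

The resulting DataFrame exposes the geometry in dedicated columns and, if my reading of the library is right, the ".dbf" attributes as a metadata map, so joining the shapes with other data becomes plain DataFrame work.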