Uncategorized

Pyncf – NetCDF files in pure Python

pynetcdf

Ever wanted to handle NetCDF in pure Python? So have I.

Inspired by the pyshp library, which provides simple pythonic and dependency free data access to vector data, I wanted to create a library for an increasingly popular file format in the raster part of the GIS world, namely, NetCDF. From landuse to climate data, data sought after by GIS practioners are increasingly often found only in the NetCDF format. So I started looking into the format specification to see if it would be easy to set up some basic reading, and to my surprise, after only a few days I ended up with a basic working version that I named Pyncf, available on GitHub. 

My problem was that existing NetCDF libraries for python all rely on interfacing with underlying C based implementations and can be hard to setup outside the context of a full GDAL or SciPy stack.

But most of the complexity of the format is in reading the metadata in the header, which makes it easy to implement in python and should not have to suffer from the slowness of python. Reading the actual data, which NetCDF can store a lot of, is where one might argue that a C implementation is needed for reasons of speed. But given that the main purpose of the format data model is to provide efficient access to any part of its vast data without having to read all of it via byte offset pointers, this too can be easily and relatively efficiently implemented in python without significant slowdowns. Besides, in many cases, the main use of NetCDF is not for storing enormously vast raster arrays, but rather for storing multiple relatively small raster arrays on different themes, and of providing variations of these across some dimension, such as time.

All of this makes it feasible and desirable with a pure python implementation for reading and writing NetCDF files, expanding access to the various data sources now using this format to a much broader set of users and applications, especially in portable environments.

As of right now, basic metadata and data extraction is functional, but has not been tested very extensively, so likely to contain some issues. No file writing implemented yet. Only Classic and 64-bit formats supported so far, though NetCDF-4 should be easy to implement.

In addition to possible API changes I am contemplating changing the name, some of the obvious names were already taken, though I think I will likely stick to Pyncf. Open to ideas and suggestions either way, as well as contributions and issues raised on GitHub.

Documentation is so far a little sparse, so how about some basic examples.

Basically, you load some data file which allows access to its meta data in the “header” attribute, a dictionary structure based exactly on the format specification, which you will just have to explore for now:

import pynetcdf
ncfile = pynetcdf.NetCDF(filepath="somefile.nc")
headerdict = ncfile.header

For more intuitive access to metadata there are also some more specific methods for that, all retrieving dictionaries:

ncfile.get_dimensions()
nc.get_diminfo("time")

ncfile.get_nonrecord_variables()
ncfile.get_record_variables()
nc.get_varinfo("temperature")

When it comes to actual data retrieval, there are two main methods. One for reading a dimension’s index values if defined in a variable, and another for retrieving a 2d list of lists of a multidimensional variable’s data values, by specifying which two dimensions to get your data for and fixing all remaining dimensions at a certain value:

timelabels = ncfile.read_dimension_values("time")
datamatrix = ncfile.read_2d_data(ydim="latitude", xdim="longitude", time=43)
Advertisement

PyCRS – A package for reading and formatting CRS definitions

pycrslogo

Python should have a standalone GIS library focused solely on coordinate reference system metadata. That is, a library focused on the various formats used to store and represent crs definitions, including OGC WKT, ESRI WKT, Proj4, and various short-codes defined by organizations like EPSG, ESRI, and SR-ORG. Correctly parsing and converting between these formats is essential in many types of GIS work. For instance when trying to use PyProj to transform coordinates from a non-proj4 crs format. Or when wanting to convert the crs from a GeoJSON file to a .prj file. Or when simply adding a crs definition to a file that was previously missing one.

Currently, the only way to read and convert between crs formats was to use the extensive Python GDAL suite and its srs submodule, but the requirements of some applications might exclude the use of GDAL. There have also been some online websites/services, but these only allow partial lookups or one-way conversion from one format to another.

I therefore created a new package called PyCRS which I hope will make it easier for lightweight applications to read a broader range of data files and correctly interpret and possibly transform their crs definitions. Written entirely in Python I also hope it will help clarify the differences between the various formats, and make it easier for more people to help keep it up-to-date and bug-free.

You can pip install it with “pip install pycrs”, or check it out on GitHub.

Currently, the supported formats are OGC WKT (v1), ESRI WKT, Proj4, and any EPSG, ESRI, or SR-ORG code available from spatialreference.org. In the future I hope to add support for OGC URN identifier strings, and GeoTIFF file tags.

The package is still in alpha version, so it will not perfectly parse or convert between all crs, and it is likely to have several (hopefully minor) differences from the results of other parsers like GDAL. In the source repository there is a tester.py script, which uses a barrage of commonly used crs as listed on http://www.remotesensing.org/geotiff/proj_list/. Currently, the overall success rate for loading as well as converting between the three main formats is 70-90%, and visual inspections of rendering the world with each crs generally look correct. However, whether the converted crs strings are logically equivalent to each other from a mathematical standpoint is something that needs a more detailed quality check.

Pytess: A Convenient GIS Tesselator package

pytess

Bill Simons and Carston Farmer made a great Python script that created Delauney triangulations and Voronoi diagrams from a set of input points. In my experience however, it was not entirely intuitive or straight forward how to send over your data points, and the results were not immediately usable and required a bit of processing before you could treat them as regular GIS polygons. Also, their script was not up on PyPi and did not have a very Pythonic API syntax either.

So with a few touch ups I built on their script and released a new Pytess tesselator package on PyPi! Go ahead and pip install pytess or check it out on GitHub!

Python GIS Challenge, the Book

B04085_MockupCover_Normal

The Python GIS Challenge is going to be culminating in a book I am writing for Packt Publishing. Short and concise, the book contains descriptions of how to implement a wide range of essential GIS features in Python, including embedding all of it in an interactive GUI application. All in the spirit of as lightweight and as few dependencies as possible, and tailored especially for beginners while also useful for advanced users. Due out this month or next, check out the book description where you can also preorder it:

https://www.packtpub.com/application-development/python-geospatial-development-essentials

PyGeoj – A simple Python Geojson file reader and writer.

geojson

Just wrapped up a new library called PyGeoj that makes it a breeze to read and write geojson as files. By treating geojson as an actual fileformat instead of just as a set of formatting rules, the aim was that PyGeoj should make it just as easy to deal with .geojson files as with .shp files — sort of like the PyShp for Geojson data.

Since the arrival of the Geojson format, there has been an increasing ability to send and receive geometry types between different libraries when doing Python GIS programming. As useful as this is, it seems to me that this has made geojson more of an inter-library communication language than an actual file-format (like the shapefile). Using the built-in json library and playing with the geojson dictionary directly is one option, using the python-geojson library is another, but both require a fairly intimate knowledge of the format specification (ie more suitable for developers than actual end-users).

PyGeoj focuses on only a few basic classes, and hopefully intuitive attributes and methods. As such, loading a file can be done with:

       testfile = pygeoj.load(“testfile.geojson”)

Resulting in a file instance whose file-information like “crs” or “bbox” and features can be accessed, edited, and eventually saved:

      print testfile.bbox
      for feature in testfile:
             print feature.properties
      testfile.save(“testfile_copy.geojson”)

Although geojson is generally slower and more memory consuming than the shapefile format, PyGeoj will hopefully encourage use of the geojson format for everyday-tasks or sending data to other libraries across “long distances” (eg online).

Shapy – A new pure-Python Shapely in the works

ImageOver the matter of the last few days, I was working on understanding the pure-Python version of Angus Johnson’s polygon Clipper library (that would be me to the right about now). As I got it working and wanted an easy way to test it I began adding more and more functionality, and before I knew it turned out to be quite solid and useful.

The package I ended up writing consiImagests of a bunch of geometrical
shape objects that the user can
create, measure, and manipulate with operations like intersect, union, distance, etc, which uses Angus’ Clipper and some of my own code in the background. This might sound a lot like the Sean Gillies’ Shapely library, and that’s fine because I modeled it directly on its geometry
types, attribute names, and methods. And hence the name Shapy
– a “lighter” pure-Python version of Shapely.

Some additional cool features are:

  • All shape types support the __geo_interface__ attribute so they can easily be sent to and used with other packages.
  • Shapes can be automatically created from any Python object that has the Image__geo_interface__ attribute.
  • And most but not least, a .view() method will visualize the shape for you, without having to think about imaging library dependencies, since Shapy comes packaged along with my other pure-Python PyDraw module. This is especially cool since you can for instance use the pure-Python PyShp shapefile reader to loop through shapes and instantly view each shape up close, since the shapes have the __geo_interface__ protocol.

Note though that Shapy is still in the works and lacks some features, but it can be used for very basic playing around with. See the docs on the Github page for more info on what’s currently supported, and available commands.

I would love contributions or suggestions on this one, since we’re dealing with quite advanced geometrical operations and vector math. Eventually it will probably need some optimization as well, but for now I just want to get it up and running with the basics.

Introducing a Pure-Python Quadtree Spatial Index for GIS-use

Image

I just finished writing a quadtree spatial index package in pure python which is aimed for GIS use.

Spatial index packages for Python already exist, a popular one being the Rtree package, which comes in both a C and pure-Python variety. In comparison to Rtree indexes however, QuadTree indexes are supposedly preferable over Rtree indexes when the index has to be updated often. I only know of one QuadTree index package for Python, but that one was last updated in 2004, so I’m not sure if it is compatible with newer versions of Python, and it has to be compiled.

And so I threw together the pure-Python PyQuadTree package; for portable and easy-install uses, and when heavy updates to the index is expected. In reality I didn’t really make the quadtree index code, that credit goes to Matt Rasmussen’s original Quadtree code. What I did was I added support for irregularly shaped quad trees instead of only squares, and simplified the front-end user functions for GIS users. The API syntax is purposely made very similar to the Rtree package and so should be familiar to users of that package. Exact guidelines for how to use it is found on the Github package page linked to below.

Click here to go the Github page where you can download and try the package.

GeoVis v0.2.0 – Tons of New Features

A major new update of the GeoVis map-making library is now available.

Image

New features in v0.2.0 include:

  • Attribute classification with legend (categorical, equal classes, equal interval, natural breaks)
  • Render points as circle, square, or pyramid
  • Map zooming
  • Customize map with basic shapes and text
  • And more . . .

Check out all new features, full documentation, and examples from the code repository at GitHub.

First Milestone Reached: The Visualizing Library is Now Complete!

readme_topbanner

The first milestone in our journey has been reached! I just completed the visualizing library for viewing your shapefiles. It is very easy to install, simple to use, and can be used in many flexible ways. I have written more about it elsewhere, and besides the project GitHub frontpage describes and documents it quite well, so I will not go into depth about it right now. So you can either check it out from the Downloads page, or go directly to its Github repository.

And the Challenge Begins!

Hilly Road

The Python-GIS Challenge is a series of open-source projects to connect and integrate many of Python’s existing spatial extensions into a single easy-to-use library. Each project will focus on a particular GIS-related field and will be made available for free download once it is completed. This website will document the progress, challenges, and setbacks experienced along this journey.