
Documentation

Writing Fault-Resistant Code

Writing programs to access USGS web services is usually straightforward. However, it can be easy to write a program that may later stop working if the USGS changes the service. You can reduce the likelihood of this happening to you by adopting some or all of our suggestions for writing fault-resistant code.

Join the Water Data System Notification Service

Most changes to services are expected to be minor, so well-written programs should not break when new versions are introduced. Occasionally, a major change that is likely to break programs has to be introduced. The USGS tries to give advance notice and, when possible, to provide examples of data in the new format so developers can keep their applications from breaking. However, this is only possible if we can communicate with you. To stay informed, join the USGS Water Data for the Nation Notification List. We won't spam you; you should receive relatively few, but important, emails about significant system events or upgrades.

Check HTTP error codes

Since this system uses Hypertext Transfer Protocol (HTTP), any application errors are reported in the HTTP headers. When writing applications, first examine the HTTP status code that is returned in the HTTP response. If there is a problem, the application server returns the error code along with a message describing the error. Programmers should always check the HTTP status code and, if it is not 200, handle the response as an exception. Among the status codes you may see:
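The status check described in this section can be sketched in Python using only the standard library. This is a minimal illustration, not USGS-provided code: the names fetch and ServiceError are hypothetical, and the opener parameter exists only so the function can be exercised without a network connection.

```python
import urllib.error
import urllib.request


class ServiceError(Exception):
    """Raised when the service answers with anything other than HTTP 200."""

    def __init__(self, status, reason):
        super().__init__(f"HTTP {status}: {reason}")
        self.status = status
        self.reason = reason


def fetch(url, opener=urllib.request.urlopen, timeout=30):
    """Return the response body as bytes, or raise ServiceError.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    that a stand-in can be supplied for testing (an assumption of this sketch).
    """
    try:
        with opener(url, timeout=timeout) as response:
            # Treat every status other than 200 as an exception.
            if response.status != 200:
                raise ServiceError(response.status, response.reason)
            return response.read()
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses; convert it so
        # callers only have to handle one exception type.
        raise ServiceError(err.code, err.reason) from err
```

A caller would wrap fetch(url) in a try/except ServiceError block and inspect the status attribute to decide whether to retry later, fix the request, or report the problem.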

If your application is server-based, acquire data in XML if the format exists

Most services offer data in Extensible Markup Language (XML). XML was designed specifically to minimize issues associated with changes to data formats (which is why it is called extensible). If your application is written correctly, new tags and attributes added to the XML data should not cause it to fail.

We recommend that you acquire data in XML for all server-based applications if it is available. For example, if you have programs written in PHP, Perl, Python, Java, JSP or ASP that collect, then process and/or re-serve USGS data, you are strongly encouraged to retrieve data in an XML format. You should avoid processing data in tab-delimited (RDB) or Excel formats, because your application is likely to break eventually when the content of the data changes.

If your application is browser-based or client-based (such as a native app), acquire data in the JSON format if available

JSON (JavaScript Object Notation) is a compact data format optimized for consumption by the asynchronous JavaScript supported by modern browsers. All services support the Cross-Origin Resource Sharing (CORS) specification. If your browser supports CORS, the data can be acquired directly by the browser without introducing a browser security extension.

If a service offers a version number, request that version of the service

If you request a specific version of a data format, its structure is unlikely to change for as long as that version is supported. As an example, the USGS Instantaneous Values REST Web Service offers a WaterML 1.1 version of the service. You can specify either the most current version of WaterML (the default) or a specific version that is still supported. So rather than create a URL like this:

http://waterservices.usgs.gov/nwis/iv?format=waterml&sites=01646500&parameterCd=00060,00065

why not specify the version as well, like this?

http://waterservices.usgs.gov/nwis/iv?format=waterml,1.1&sites=01646500&parameterCd=00060,00065

Check the service format syntax to see if it supports a version.
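If you build request URLs in code, the version can be made an explicit argument so it is never left out by accident. The following is a minimal Python sketch; build_iv_url is a hypothetical helper name, and the query parameters are the ones shown in the URLs above.

```python
from urllib.parse import urlencode

BASE_URL = "http://waterservices.usgs.gov/nwis/iv"


def build_iv_url(sites, parameter_codes, waterml_version=None):
    """Build an Instantaneous Values URL, optionally pinning a WaterML version."""
    fmt = "waterml" if waterml_version is None else f"waterml,{waterml_version}"
    query = urlencode(
        {
            "format": fmt,
            "sites": ",".join(sites),
            "parameterCd": ",".join(parameter_codes),
        },
        safe=",",  # keep commas literal, matching the URLs shown above
    )
    return f"{BASE_URL}?{query}"


# Request WaterML 1.1 explicitly instead of relying on the default version.
url = build_iv_url(["01646500"], ["00060", "00065"], waterml_version="1.1")
```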

Write your queries efficiently

Here are some easy ways to get the data you are interested in efficiently:

Warning: if the USGS determines that your usage is excessive, your IP address(es) may be blocked. If you get an HTTP 403 (Access Forbidden) error, you have likely been blocked. We may require you to rewrite your queries or poll less frequently before you can use the service again. If this happens to you, contact us.

Why you should avoid tab-delimited (RDB) files

It's not always possible to avoid retrieving data in a tab-delimited format, since it is a legacy format and still useful. However, relying on it can be risky. For example, in November 2009, the USGS Water Data for the Nation site introduced a time zone column into many of its tab-delimited (RDB) files. In the sample below, the new column is tz_cd, the fourth column:

agency_cd	site_no	datetime	tz_cd	07_00065	07_00065_cd	02_00060	02_00060_cd
5s	15s	16d	6s	14n	10s	14n	10s
USGS	06130500	2009-10-29 00:00	 MDT	2.35		163
USGS	06130500	2009-10-29 00:15	 MDT	2.35		163

This change allowed measurement times to be reported more accurately. However, some users who were not prepared for the change had their programs stop working. Previously, the fourth column held a number. Now it contains characters indicating the time zone.

It is relatively simple to write a program that parses tab-delimited text using hard-coded assumptions, such as that the fourth column will always contain a number. When a change like this one is introduced, however, such a program can fail. This is why the USGS recommends acquiring data in XML format instead of tab-delimited (RDB) format.
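When tab-delimited data cannot be avoided, one way to reduce this risk is to look columns up by name from the header row rather than by position. The following is a minimal Python sketch, not USGS-provided code. It assumes that comment lines begin with "#", that the first non-comment line holds the column names, and that the second holds the column formats, as in the sample above. The name parse_rdb and the comment line in the sample text are illustrative.

```python
SAMPLE_RDB = (
    "# illustrative comment line (assumed RDB convention)\n"
    "agency_cd\tsite_no\tdatetime\ttz_cd\t07_00065\t07_00065_cd\t02_00060\t02_00060_cd\n"
    "5s\t15s\t16d\t6s\t14n\t10s\t14n\t10s\n"
    "USGS\t06130500\t2009-10-29 00:00\t MDT\t2.35\t\t163\n"
    "USGS\t06130500\t2009-10-29 00:15\t MDT\t2.35\t\t163\n"
)


def parse_rdb(text):
    """Return each data row as a dict keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    header = lines[0].split("\t")
    # lines[1] is the column-format row (5s, 15s, ...), so data starts at lines[2].
    rows = []
    for line in lines[2:]:
        fields = [field.strip() for field in line.split("\t")]
        # zip stops at the shorter sequence, so trailing columns that are
        # missing from a row are simply absent from its dict.
        rows.append(dict(zip(header, fields)))
    return rows


rows = parse_rdb(SAMPLE_RDB)
# Access by name keeps working if columns are inserted or reordered.
gage_height = rows[0]["07_00065"]
time_zone = rows[0]["tz_cd"]
```

Code that reads rows[0]["07_00065"] would have been unaffected by the November 2009 change, whereas code that read the fourth field by position would not.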

Parse XML using an XML parser

Almost all programming languages offer an XML parser. An XML parser is a tool that allows data inside an XML structure to be more easily manipulated. However, some programmers may prefer to use simple logic, such as reading data into an array, rather than use a parser. This is not recommended and could cause your application to break. While it does take a little effort to learn to use a parser, the effort invested pays many dividends, including making your program less likely to break if the schema changes.

Consider this simple XML document:

<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

One way to get the data would be to read each line into an array. If you needed to capture the information in the <body> tag, you could just read the sixth element of the array and strip off the tags. This would work fine until one day someone changed the schema to this:

<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <priority>Normal</priority>
  <body>Don't forget me this weekend!</body>
</note>

At this point your script would return incorrect information. While you could certainly examine all elements of your array for the <body> tag, parsers will typically put the data into a hierarchy automatically. Parsers also often have methods allowing an XML hierarchy to be searched.
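As a minimal sketch of the parser-based approach, the following Python code uses the standard library's xml.etree.ElementTree module to look up the <body> element by name in both versions of the document. The constant and function names are illustrative.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

REVISED = """<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <priority>Normal</priority>
  <body>Don't forget me this weekend!</body>
</note>"""


def body_of(xml_text):
    """Find <body> by name, wherever it sits among the children of <note>."""
    root = ET.fromstring(xml_text)
    return root.findtext("body")


# The same lookup works before and after the schema change.
original_body = body_of(ORIGINAL)
revised_body = body_of(REVISED)
```

By contrast, the sixth line of the revised document is the <priority> element, so a script that reads by line number would silently return the wrong data.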

Use standard libraries

Avoid reinventing the wheel. Regardless of your programming language, there is probably an off-the-shelf library available if the data you are trying to manipulate is in a standardized format. For example, WaterML expresses dates and times in the ISO 8601 standard. If you are a Java developer, the Joda Time API makes it easy to parse these date/time values reliably. Perl and PHP have similar libraries. For browser-based applications, jQuery and Dojo are two of a number of popular JavaScript frameworks that offer rich sets of features, are easy to integrate, and can make your application look more professional.
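In Python, for instance, the standard library's datetime module can parse ISO 8601 values such as these (Python 3.7 or later), so there is no need to split the string by hand. The timestamp below is a hypothetical example of the format, not a value copied from the service.

```python
from datetime import datetime, timezone

# Hypothetical ISO 8601 timestamp with a UTC offset.
sample = "2009-10-29T00:15:00.000-06:00"

# fromisoformat keeps the offset, producing a timezone-aware datetime.
measured_at = datetime.fromisoformat(sample)

# Converting to UTC makes values from different time zones comparable.
measured_at_utc = measured_at.astimezone(timezone.utc)
```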

Other tips

Consider using curl or wget to acquire data

On UNIX or Linux-based systems, curl or wget may be available. wget is also available for Windows. These utilities offer sophisticated methods for acquiring data over the internet. Some programming languages have integrations with curl and wget. For example, PHP has an integrated curl library.

Use scheduled tasks to automate data collection

There are numerous ways to automate the downloading of data. Most operating systems can perform scheduled tasks automatically at regular intervals. If your computer runs Linux or some variant of Unix, the cron utility will be of interest. Windows has a task scheduler. You can use the appropriate utility to run your program.

Guidance on how often you should fetch data

Most USGS water data changes infrequently. Consequently, there is little need to fetch the same information repeatedly. The USGS monitors usage of its servers, and if it detects users who are excessively re-acquiring infrequently changing data, it may block their access so that other users are not impacted.
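One way to avoid fetching the same information repeatedly is to keep the last response for each URL and reuse it until a minimum interval has passed. The following is a minimal Python sketch. The class name CachedFetcher is hypothetical, and the 900-second default is an illustrative assumption, not USGS guidance; choose an interval that matches how often your data actually changes.

```python
import time


class CachedFetcher:
    """Reuse the last response for a URL until a minimum interval has passed."""

    def __init__(self, fetch, min_interval_seconds=900, clock=time.monotonic):
        self._fetch = fetch                  # callable that takes a URL and returns data
        self._min_interval = min_interval_seconds
        self._clock = clock                  # injectable so the class can be tested
        self._cache = {}                     # url -> (time fetched, data)

    def get(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._min_interval:
            return cached[1]                 # still fresh: no request is sent
        data = self._fetch(url)
        self._cache[url] = (now, data)
        return data
```

The fetch argument can be any function that downloads a URL, so the cache stays independent of how the request itself is made.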

Warning on provisional data

Much of the time-series data provided by these web services is provisional. In most cases, these data are accurate, but some measurements or calculations may not be. This is particularly true of recently acquired data. Errant values, such as "spikes", are often corrected later during a formal review. When time-series data are acquired in WaterML (an XML format), each measurement is marked with an attribute indicating whether the data are provisional or not.

You should be careful to qualify provisional data as provisional and treat provisional data as potentially erroneous. See the guidance on USGS provisional data.
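A program can carry the provisional marker along with each measurement so it is never lost downstream. The following Python sketch shows the idea on a simplified, hypothetical XML fragment. The element name value, the attribute name qualifiers, and the codes "P" (provisional) and "A" (approved) are assumptions made for illustration; real WaterML uses XML namespaces, and the actual names should be checked against the schema for the version you request.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment; not copied from the service.
SAMPLE = """<values>
  <value qualifiers="P" dateTime="2009-10-29T00:00:00.000-06:00">163</value>
  <value qualifiers="A" dateTime="2009-10-28T23:45:00.000-06:00">161</value>
</values>"""


def read_values(xml_text, provisional_code="P"):
    """Return measurements, each flagged as provisional or not."""
    readings = []
    for element in ET.fromstring(xml_text).iter("value"):
        # Assumed: the attribute holds one or more space-separated codes.
        qualifiers = element.get("qualifiers", "").split()
        readings.append(
            {
                "dateTime": element.get("dateTime"),
                "value": float(element.text),
                "provisional": provisional_code in qualifiers,
            }
        )
    return readings


readings = read_values(SAMPLE)
provisional_only = [r for r in readings if r["provisional"]]
```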

Questions?

If you have any questions, use this form.