Writing Fault Resistant Code


Writing programs to access USGS web services is usually straightforward. However, it can be easy to write a program that may later stop working if the USGS changes the service. You can reduce the likelihood of this happening to you by adopting some or all of our suggestions for writing fault-resistant code.

Join the Water Data System Notification Service

Most changes to services are expected to be minor, so well-written programs should not break when new versions are introduced. Occasionally, a major change that is likely to break a program may have to be introduced. The USGS tries to give advance notification and, when possible, to provide examples of data in the new format so developers can keep their applications from breaking. However, this is only possible if we can communicate with you. To stay informed, join the USGS Water Data for the Nation Notification List. We won’t spam you; you should receive relatively few but important emails about significant system events or upgrades.

Check HTTP error codes

Since this system uses Hypertext Transfer Protocol (HTTP), any application errors are reported in the HTTP headers. When writing applications, first examine the HTTP status code returned in the HTTP response. If there is a problem, the application server returns the error code along with a message describing the error. Always check the HTTP status code and, if it is not 200, handle the response as an exception. Among the status codes you may see:

  • 200 OK - The request was successfully executed.
  • 400 Bad Request - This often occurs if the URL arguments are inconsistent, for example when the instantaneous values service is called with startDT and endDT together with the period argument. An accompanying error message should describe why the request was bad.
  • 403 Access Forbidden - This should only occur if the USGS has blocked your Internet Protocol (IP) address from using the service. This can happen if we believe that your use of the service is so excessive that it is seriously impacting others using the service. To get unblocked, send us the URL you are using along with your IP address using this form. We may require changes to your query and frequency of use before giving you access to the service again.
  • 404 Not Found - Returned if and only if the query expresses a combination of elements for which no data exist. For multi-site queries, if any data are found, they are returned for those sites, parameters, and date ranges where there are data.
  • 500 Internal Server Error - If you see this, it means there is a problem with the web service itself. It usually means the application server is down unexpectedly. This could be caused by a host of conditions but changing your query will not solve this problem. The application support team has to fix it. Most of these errors are quickly detected and the support team is notified if they occur.
  • 503 Service Unavailable - The application server is working but this application is down at the moment. When something causes this to happen, the support team should be quickly notified. Hopefully the service will be available shortly.
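
The check described above can be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not an official USGS client: the names ServiceError, check_status, and fetch are hypothetical, and the hint text simply paraphrases the list above.

```python
import urllib.error
import urllib.request

# Status codes described in the list above, with a short hint for each.
KNOWN_ERRORS = {
    400: "bad request: check for inconsistent URL arguments",
    403: "access forbidden: your IP address may have been blocked",
    404: "not found: no data match the query",
    500: "internal server error: changing the query will not help",
    503: "service unavailable: try again later",
}


class ServiceError(Exception):
    """Hypothetical exception raised for any status other than 200."""

    def __init__(self, status, detail=""):
        hint = KNOWN_ERRORS.get(status, "unexpected status")
        super().__init__(f"HTTP {status} ({hint}) {detail}".strip())
        self.status = status


def check_status(status, detail=""):
    """Return normally for 200; raise ServiceError for anything else."""
    if status != 200:
        raise ServiceError(status, detail)


def fetch(url, timeout=30):
    """Fetch url and return the body as text, or raise ServiceError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            check_status(response.status)
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx and 5xx responses; the body
        # usually carries the server's explanation of the problem.
        detail = err.read().decode("utf-8", errors="replace")[:200]
        raise ServiceError(err.code, detail) from err
```

A caller would wrap fetch in a try/except for ServiceError and decide per status what to do, for example retrying later on 503 but never retrying unchanged on 400.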

If your application is server-based, acquire data in XML if the format exists

Most services offer data in Extensible Markup Language (XML). XML was designed specifically to minimize issues associated with changes to data formats (which is why it is called extensible). If your application is correctly written, new tags and attributes in the XML data should not cause it to fail.

We recommend you acquire data in XML for all server-based applications if it is available. For example, if you have programs written in PHP, Perl, Python, Java, JSP or ASP that collect and then process or re-serve USGS data, you are strongly encouraged to retrieve data in an XML format. You should avoid processing data in tab-delimited (RDB) or Excel formats, because the application is likely to break eventually when the content of the data changes.

If your application is browser-based or client-based (such as a native app), acquire data in the JSON format if available

JSON (JavaScript Object Notation) is a compact data format that is well suited to consumption by the asynchronous JavaScript supported by modern browsers. All services support the Cross-Origin Resource Sharing (CORS) specification. If your browser supports CORS, it can acquire the data directly without a browser security extension.

If a service offers a version number, request that version of the service

If you specify a version number for a data format, its structure is unlikely to change for as long as that version is supported. As an example, the USGS Instantaneous Values REST Web Service offers a WaterML 1.1 version of the service. You can specify either the most current version of WaterML (the default) or a specific version that is still supported. So rather than create a URL like this:

https://waterservices.usgs.gov/nwis/iv?format=waterml&sites=01646500&parameterCd=00060,00065

why not specify the version as well, like this?

https://waterservices.usgs.gov/nwis/iv?format=waterml,1.1&sites=01646500&parameterCd=00060,00065

Check the service format syntax to see if it supports a version.
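
If your program assembles its URLs in code, the version can be pinned in one place. The following is a minimal Python sketch that reproduces the second URL above; the function name build_iv_url and its arguments are hypothetical, and only the base URL and query arguments come from the examples in this section.

```python
from urllib.parse import urlencode

BASE_URL = "https://waterservices.usgs.gov/nwis/iv"


def build_iv_url(sites, parameter_codes, waterml_version=None):
    """Build an instantaneous values URL, optionally pinning the WaterML version."""
    if waterml_version is None:
        fmt = "waterml"  # the service default (most current version)
    else:
        fmt = f"waterml,{waterml_version}"
    query = {
        "format": fmt,
        "sites": ",".join(sites),
        "parameterCd": ",".join(parameter_codes),
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return f"{BASE_URL}?{urlencode(query, safe=',')}"


print(build_iv_url(["01646500"], ["00060", "00065"], waterml_version="1.1"))
```

Keeping the version in a single argument means an upgrade to a newer format is a deliberate one-line change rather than something that happens when the service default moves.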

Write your queries efficiently

Here are some easy ways to get the data you are interested in efficiently:

  • If you regularly poll for certain site numbers, retrieve them in one query by including them all in the sites parameter. Separate site numbers with commas. For example:

    https://waterservices.usgs.gov/nwis/iv/?format=waterml,2.0&sites=01646500,01638500&parameterCd=00060,00065

  • Similarly, if you need certain parameters from each site of interest, separate the parameter codes with commas. If the site does not serve a particular parameter, no error will occur. The parameter simply will not appear in the output for that site. See the previous example, which will look for 00060 (discharge) and 00065 (gage height) at both sites, even if not present.

  • If you keep a local cache of data, don’t poll for the same data over and over again. Instead, poll for new or changed data using the modifiedSince argument. See each service description for examples of how this is used.

  • For instantaneous values, polling hourly is usually sufficient, since a site typically transmits only once an hour, when it is permitted to broadcast to the satellite.

  • For daily values, polling once a day is sufficient as daily values are typically computed once a day.

  • When using the statistics service, bear in mind that historical statistics rarely if ever change. The same query should never need to be rerun.

  • Site information rarely changes, so if you poll the site service to see whether site information has changed, once a month is more than adequate. If you need that detail, note that certain period-of-record information changes daily for sites that are actively collecting time-series data.

Warning: if the USGS determines that your usage is excessive, your IP address(es) may be blocked. If you get an HTTP 403 (Access Forbidden) error, you have likely been blocked. We may require you to rewrite your queries or poll less frequently to use the service. If this happens to you, contact us.

Why you should avoid tab-delimited (RDB) files

It’s not always possible to avoid retrieving data in a tab-delimited format, since it is a legacy format and still useful. However, using it can be dangerous. For example, in November 2009, the USGS Water Data for the Nation site introduced a time zone column into many of its tab-delimited (RDB) files. Some of the changes introduced are shown below in bold:

agency_cd site_no datetime **tz_cd** 07_00065 07_00065_cd 02_00060 02_00060_cd
5s 15s 16d **6s** 14n 10s 14n 10s
USGS 06130500 2009-10-29 00:00 **MDT** 2.35 163
USGS 06130500 2009-10-29 00:15 **MDT** 2.35 163

This change allowed measurement times to be reported more accurately. However, some users who were not prepared for the change had their programs stop working. Previously, the fourth column held a number. Now it contains characters indicating the time zone.

It is relatively simple to write a program that parses tab-delimited text and hard-codes assumptions, such as that the fourth column will always contain a number. When the format changes, however, those assumptions can cause the application to fail. This is why the USGS recommends acquiring data in XML format instead of tab-delimited (RDB) format.
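
If you do have to read RDB, looking columns up by name rather than by position removes the most fragile assumption. The Python sketch below is one way to do that, under two assumptions that should be checked against real files: comment lines begin with "#", and the row immediately after the header is the column-format row (such as "5s 15s 16d") shown in the example above. The function name parse_rdb is hypothetical.

```python
def parse_rdb(text):
    """Yield one dict per data row, keyed by column name."""
    # Assumption: blank lines and lines starting with "#" carry no data.
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    if len(lines) < 2:
        return
    header = lines[0].split("\t")
    # lines[1] is the column-format row (for example "5s 15s 16d"); skip it.
    for line in lines[2:]:
        values = line.split("\t")
        # Pad short rows so every column name is present in the dict.
        values += [""] * (len(header) - len(values))
        yield dict(zip(header, values))
```

With this approach a program asks for row["07_00065"] instead of the fifth field, so inserting a column such as tz_cd does not shift the values it reads. It still fails if a column is renamed or removed, which is one reason XML remains the recommended format.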

Parse XML using an XML parser

Almost all programming languages offer an XML parser. An XML parser is a tool that allows data inside an XML structure to be more easily manipulated. However, some programmers may prefer to use simple logic, such as reading data into an array, rather than use a parser. This is not recommended and could cause your application to break. While it does take a little effort to learn to use a parser, the effort invested pays many dividends, including making your program less likely to break if the schema changes.

Consider this simple XML document:

<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

One way to get the data would be to read each line into an array. If you needed to capture the information in the <body> tag, you could just read the sixth element of the array and strip off the tags. This would work fine until one day someone changed the schema to this:

<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <priority>Normal</priority>
  <body>Don't forget me this weekend!</body>
</note>

At this point your script would return incorrect information. While you could certainly examine all elements of your array for the <body> tag, parsers will typically put the data into a hierarchy automatically. Parsers also often have methods allowing an XML hierarchy to be searched.
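
For comparison, here is a minimal Python sketch that reads the <body> element with the standard library's ElementTree parser. It is only an illustration on the sample note above; the function name get_body is hypothetical, and real USGS responses are larger and may use XML namespaces, which would need to be included in the element names.

```python
import xml.etree.ElementTree as ET

NOTE_WITH_PRIORITY = """<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <priority>Normal</priority>
  <body>Don't forget me this weekend!</body>
</note>"""


def get_body(xml_text):
    """Return the text of the <body> child of the root element, or None."""
    root = ET.fromstring(xml_text)
    # Look the element up by name, not by its position in the file.
    return root.findtext("body")


print(get_body(NOTE_WITH_PRIORITY))
```

Because the lookup is by element name, the same function returns the same text for both versions of the document, with or without the <priority> element.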

Use standard libraries

Avoid reinventing the wheel. Regardless of your programming language, there is probably an off-the-shelf library available if the data you are trying to manipulate is in a standardized format. For example, WaterML expresses dates and times in the ISO-8601 standard. If you are a Java developer, the Joda Time API makes it easy to reliably parse these date/time values. Perl and PHP have similar libraries. For browser-based applications, jQuery and Dojo are two of a number of popular JavaScript frameworks that offer rich sets of features that are easy to integrate and will make your application look more professional.
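
As an illustration in Python rather than Java, the standard library's datetime module can parse ISO-8601 timestamps that carry a numeric UTC offset. This is a sketch only: the function name parse_timestamp and the sample timestamp are made up for the example, and you should check the exact timestamp form in the responses you receive.

```python
from datetime import datetime, timezone


def parse_timestamp(text):
    """Parse an ISO-8601 timestamp with a UTC offset and return it in UTC."""
    moment = datetime.fromisoformat(text)
    if moment.tzinfo is None:
        # Refuse to guess a time zone for a timestamp that lacks one.
        raise ValueError(f"timestamp has no UTC offset: {text!r}")
    return moment.astimezone(timezone.utc)


# Hypothetical value in the style of an ISO-8601 date/time with an offset.
print(parse_timestamp("2009-10-29T00:15:00-06:00").isoformat())
```

Letting the library handle offsets avoids hand-written string slicing, which tends to break on details such as fractional seconds or a different offset.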

Other tips

Consider using curl or wget to acquire data

On UNIX- or Linux-based systems, curl or wget may be available. wget is also available for Windows. These utilities support sophisticated methods for acquiring data over the internet. Some programming languages integrate with curl and wget. For example, PHP has an integrated curl library.

Use scheduled tasks to automate data collection

There are numerous ways to automate the downloading of data. Most operating systems come with the ability to automatically perform scheduled tasks at regular intervals. If your computer runs either Linux or some variant of Unix, the cron utility will be of interest. Windows has a task scheduler. You can use the appropriate utility to run your program.

Guidance on how often you should fetch data

Most USGS water data change infrequently. Consequently, there is little need to fetch the same information repeatedly. The USGS monitors usage of its servers, and if it detects users who are excessively re-acquiring infrequently changing data, it may block their access so that other users are not impacted.

  • Instantaneous values are generally updated once an hour (sometimes less often) for a real-time site. Typically, new instantaneous values become available while older values, although usually marked as provisional, are not manually corrected. Do not fetch the same data more than hourly. We encourage use of the modifiedSince feature of the service, which works as a “send me the data only if something has changed” request, and use of a local cache otherwise.
  • Daily values change rarely. Daily values are computations based on time-series measurements for a particular day and for a particular parameter. A daily value is only recomputed if a USGS water science center has corrected one or more of the time-series values for a given day. We recommend checking to see if old daily values have changed for a particular site no more than monthly.
  • Site data changes very rarely. We recommend acquiring a range of sites no more than monthly.
  • Statistics are based on daily values. They should change only if a USGS water science center approves provisional data. This is done irregularly, so follow the guidance for daily values.
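
The guidance above can be encoded as a simple check before each fetch. The Python sketch below is an illustration only: the category names, the function name is_refetch_due, and the choice of 30 days to stand in for "monthly" are assumptions made for the example, not USGS-defined values.

```python
from datetime import datetime, timedelta, timezone

# Minimum time between fetches of the same data, following the list above.
# Assumption: "monthly" is approximated as 30 days.
MIN_INTERVAL = {
    "instantaneous": timedelta(hours=1),
    "daily": timedelta(days=30),       # re-checking old daily values
    "site": timedelta(days=30),
    "statistics": timedelta(days=30),  # follows the daily values guidance
}


def is_refetch_due(kind, last_fetched, now=None):
    """Return True if enough time has passed to fetch this kind of data again."""
    now = now or datetime.now(timezone.utc)
    return now - last_fetched >= MIN_INTERVAL[kind]
```

A program would record the time of each successful fetch alongside its local cache and call is_refetch_due before contacting the service, serving cached data when it returns False.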

Warning on provisional data

Much of the time-series data provided by these web services are provisional. In most cases, these data are accurate, but some measurements or calculations may not be accurate. This is particularly true of recently acquired data. These errant “spikes” are often later corrected during a formal review. When time-series data are acquired in WaterML (an XML format) each measurement is marked with an attribute indicating whether the data are provisional or not.

You should be careful to label provisional data as provisional and to treat provisional data as potentially erroneous. See the USGS guidance on provisional data.

Questions?

If you have any questions, use this form.