Writing programs to access USGS web services is usually straightforward. However, it can be easy to write a program that may later stop working if the USGS changes the service. You can reduce the likelihood of this happening to you by adopting some or all of our suggestions for writing fault-resistant code.
Most changes to services are anticipated to be minor, which means that well written programs should not break when new versions are introduced. Occasionally, a major change that is likely to break a program may have to be introduced. The USGS tries to give advance notification and when possible provide examples of data in a new format so developers can avoid having their applications break. However, this is only possible if we can communicate with you. To stay informed, you are advised to join the USGS Water Data for the Nation Notification List. We won't spam you and you should receive relatively few but important emails of significant system events or upgrades.
Since this system uses Hypertext Transfer Protocol (HTTP), any application errors are reported in the HTTP headers. This means that when writing applications, it is important to first examine the HTTP status code that is returned in the HTTP response. The application server will return the error code along with a message describing the error in the event there is a problem. Programmers should always check the HTTP response code and if not a 200 handle the response as an exception. Among the status codes you may see:
Most services offer data in a Extensible Markup Language (XML). XML was written specifically to minimize issues associated with changes to data formats (which is why it is called extensible). If your application is correctly written, new tags and attributes to XML data should not cause your application to fail.
We recommend you acquire data in XML for all server-based applications if it is available. For example, if you have programs written in PHP, Perl, Python, Java, JSP or ASP that collect then process and/or re-serve USGS data, you are strongly encouraged to retrieve data in a XML format. You should avoid processing data in tab-delimited (RDB) or Excel formats specifically because you are likely to eventually have the application break when the content of the data changes.
If you specify a version number for a data format, as long as it is supported it is unlikely to have its structure changed. As an example, the USGS Instantaneous Values REST Web Service offers a WaterML 1.1 version of the service. You can specify either the most current version of WaterML (the default) or a specific version that is still supported. So rather than create a URL like this:
why not specify the version as well, like this?
Check the service format syntax to see if it supports a version.
Here are some easy ways to get the data you are interested in efficiently:
Warning: if the USGS determines that your usage is excessive, your IP(s) may be blocked. If you get a HTTP_403 (Access Forbidden) error, you have likely been blocked. We may require you rewrite your queries or poll less frequently to use the service. If this happens to you, contact us.
It's not always possible to avoid retrieving data in a tab-delimited format, since it is a legacy format and still useful. However, using it can be dangerous. For example, in November 2009, the USGS Water Data for the Nation site introduced a time zone column into many of its tab-delimited (RDB) files. Some of the changes introduced are shown below in bold:
agency_cd site_no datetime tz_cd 07_00065 07_00065_cd 02_00060 02_00060_cd 5s 15s 16d 6s 14n 10s 14n 10s USGS 06130500 2009-10-29 00:00 MDT 2.35 163 USGS 06130500 2009-10-29 00:15 MDT 2.35 163
This change allowed measurement times to be reported more accurately. However, some users who were not prepared for the change had their programs stop working. Previously, the fourth column held a number. Now it contains characters indicating the time zone.
While it is relatively simple to write a program that parses tab-delimited text and hard-code assumptions like the fourth column will always contain a number, when changes are introduced it can cause an application to fail. This is why the USGS recommends acquiring data in XML format instead of tab-delimited (RDB) format.
Almost all programming languages offer an XML parser. An XML parser is a tool that allows data inside an XML structure to be more easily manipulated. However, some programmers may prefer to use simple logic, such as reading data into an array, rather than use a parser. This is not recommended and could cause your application to break. While it does take a little effort to learn to use a parser, the effort invested pays many dividends, including making your program less likely to break if the schema changes.
Consider this simple XML document:
<?xml version="1.0"?> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
One way to get the data would be to read each line into an array. If you needed to capture the information in the <body> tag, you could just read the sixth element of the array and strip off the tags. This would work fine until one day someone changed the schema to this:
<?xml version="1.0"?> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <priority>Normal</priority> <body>Don't forget me this weekend!</body> </note>
At this point your script would return incorrect information. While you could certainly examine all elements of your array for the <body> tag, parsers will typically put the data into a hierarchy automatically. Parsers also often have methods allowing an XML hierarchy to be searched.
For UNIX or Linux based systems, curl or wget may be available. wget is also available for Windows. These utilities allow sophisticated methods for acquiring data over the internet. Some programming languages have integrations with curl and wget. For example, PHP has an integrated curl library.
There are numerous ways to automate the downloading of data. Most operating systems come with the ability to automatically perform scheduled tasks at regular intervals. If your computer runs either Linux or some variant of Unix, the cron utility will be of interest. Windows has a task scheduler . You can use the appropriate utility to run your program.
Most USGS water data changes infrequently. Consequently, there is little need to fetch the same information repeatedly. The USGS monitors usage of its servers and if it detects users who are egregiously acquiring infrequently changing data, it may block service so others users are not impacted.
Much of the time-series data provided by these web services are provisional. In most cases, these data are accurate, but some measurements or calculations may not be accurate. This is particularly true of recently acquired data. These errant "spikes" are often later corrected during a formal review. When time-series data are acquired in WaterML (an XML format) each measurement is marked with an attribute indicating whether the data are provisional or not.
You should be careful to qualify provisional data as provisional and treat provisional data as potentially erroneous. Guidance on USGS provisional data.
If you have any questions, use this form.