Reading and writing time series data#
The TimeSeries object includes read() and
write() methods to enable reading from and writing to files
respectively.
For example, to read from an ASCII file containing time and amplitude columns:
>>> from gwpy.timeseries import TimeSeries
>>> data = TimeSeries.read('my-data.txt')
TimeSeries.read() will attempt to automatically identify the file
format based on the file extension and/or the contents of the file. However,
the format keyword argument can be used to manually identify the input
file format.
The read() and write() methods take
different arguments and keywords based on the input/output file format.
See Built-in file formats for details on reading/writing for
each of the built-in formats.
Reading remote data#
TimeSeries.read supports reading data directly from remote sources,
where the input filename is a http://, https://, or ftp:// URL,
as supported by Astropy’s Downloadable Data Management (astropy.utils.data).
Additionally, GWpy supports reading data directly from Pelican or OSDF
URLs, where the input filename is a pelican:// or osdf:// URL,
if the additional requests-pelican
package is installed.
>>> from gwpy.timeseries import TimeSeries
>>> data = TimeSeries.read(
... "osdf:///gwdata/O3b/strain.4k/hdf.v1/H1/1268776960/"
... "H-H1_GWOSC_O3b_4KHZ_R1-1269358592-4096.hdf5",
... format="hdf5.gwosc",
... )
>>> print(data)
TimeSeries([2.69325914e-20, 2.98416387e-20, 2.56382655e-20, ...,
3.49428204e-20, 3.19402495e-20, 2.73257992e-20],
unit: dimensionless,
t0: 1269358592.0 s,
dt: 0.000244140625 s,
name: H1:GWOSC-4KHZ_R1_STRAIN,
channel: None)
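As a rough cross-check of the output above, the file name and the printed dt can be related with a little arithmetic. The sketch below assumes the file name follows an `<observatory>-<description>-<GPS start>-<duration>.<ext>` layout; the `parse_name` helper is hypothetical and is not part of GWpy.

```python
import re

# Assumed layout: <observatory>-<description>-<GPS start>-<duration>.<ext>
FILENAME = re.compile(
    r"^(?P<obs>[A-Z]+)-(?P<desc>\w+)-(?P<start>\d+)-(?P<duration>\d+)\.(?P<ext>\w+)$"
)


def parse_name(name):
    """Split a data file name into labelled parts (hypothetical helper)."""
    match = FILENAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    parts = match.groupdict()
    parts["start"] = int(parts["start"])
    parts["duration"] = int(parts["duration"])
    return parts


info = parse_name("H-H1_GWOSC_O3b_4KHZ_R1-1269358592-4096.hdf5")

dt = 0.000244140625      # sample spacing printed above, in seconds
sample_rate = 1 / dt     # samples per second
n_samples = int(info["duration"] * sample_rate)
```

Under that assumption, the GPS start in the file name matches the printed t0, and a 4096-second file sampled every dt seconds holds 4096 × 4096 samples.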
Tip: Installing GWpy with Pelican support
To install GWpy with Pelican support, use:
pip install gwpy[pelican]
Automatic discovery of GW data#
For full details on automatic data discovery, including data from GWOSC or GWDataFind, see Data Discovery.
Built-in file formats#
ASCII#
GWpy supports reading and writing TimeSeries (and FrequencySeries) data in ASCII, using a two-column time and amplitude format.
Reading#
To read a TimeSeries from ASCII:
>>> t = TimeSeries.read('data.txt')
See numpy.loadtxt() for keyword argument options.
Writing#
To write a TimeSeries to ASCII:
>>> t.write('data.txt')
See numpy.savetxt() for keyword argument options.
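To make the two-column layout concrete, here is a minimal sketch that writes and re-reads such a file using only the Python standard library. It does not call GWpy or NumPy, the sample values are made up, and the exact number formatting produced by numpy.savetxt() will differ.

```python
import os
import tempfile

t0, dt = 0.0, 0.5               # hypothetical start time and sample spacing
amplitudes = [1.0, 2.0, 3.0]    # hypothetical sample values

path = os.path.join(tempfile.mkdtemp(), "data.txt")

# Write one "time amplitude" pair per line.
with open(path, "w") as f:
    for i, value in enumerate(amplitudes):
        f.write(f"{t0 + i * dt} {value}\n")

# Read the two columns back.
times, values = [], []
with open(path) as f:
    for line in f:
        t, v = line.split()
        times.append(float(t))
        values.append(float(v))

# For regularly sampled data, the time column carries the start time and spacing.
recovered_t0 = times[0]
recovered_dt = times[1] - times[0]
```

The point of the sketch is that the time column is the only place the start time and sample spacing are recorded in this layout.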
GWF#
Additional dependencies: FrameCPP or FrameL or LALFrame
The raw observatory data are archived in .gwf files, a custom binary
format that efficiently stores the time streams and all necessary metadata.
For more details about this data format,
see the specification document LIGO-T970130.
GWF library availability#
GWpy can use any of the three named GWF input/output libraries, and will try to find them in the order they are listed (FrameCPP first, then FrameL, then LALFrame). If you need to read/write GWF files, any of them will work, but we recommend trying to install the libraries in that order, since FrameCPP provides a more complete Python API than the others.
However, not all libraries may be available on all platforms. The linked pages for each library include an up-to-date listing of the supported platforms.
Reading#
To read data from a GWF file, pass the input file path (or paths) and the name of the data channel to read:
>>> data = TimeSeries.read('HLV-HW100916-968654552-1.gwf', 'L1:LDAS-STRAIN')
Note
The HLV-HW100916-968654552-1.gwf file is included with the GWpy source under /gwpy/testing/data/.
Reading a StateVector uses the same syntax:
>>> data = StateVector.read('my-state-data.gwf', 'L1:GWO-STATE_VECTOR')
For instance, to read injection flags from a GWOSC GWF file,
you can use the LOSC-INJMASK channel:
>>> injections = StateVector.read('my-state-data.gwf', 'L1:LOSC-INJMASK')
Multiple files can be read by passing a list of files:
>>> data = TimeSeries.read([file1, file2], 'L1:LDAS-STRAIN')
When reading multiple files, the nproc keyword argument can be used to distribute the reading over multiple CPUs, which can reduce the total read time:
>>> data = TimeSeries.read([file1, file2, file3, file4], 'L1:LDAS-STRAIN', nproc=2)
The above command will separate the input list of 4 file paths into two sets of 2 files, combining the results into a single TimeSeries before returning.
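The splitting step described above can be sketched in plain Python. This is only an illustration of dividing a file list into contiguous groups; the `split_into_chunks` helper and the file names are hypothetical, and GWpy's internal partitioning may differ.

```python
def split_into_chunks(paths, nproc):
    """Split `paths` into at most `nproc` contiguous, near-equal chunks."""
    nproc = max(1, min(nproc, len(paths)))
    size, extra = divmod(len(paths), nproc)
    chunks, start = [], 0
    for i in range(nproc):
        # The first `extra` chunks take one additional item each.
        stop = start + size + (1 if i < extra else 0)
        chunks.append(paths[start:stop])
        start = stop
    return chunks


files = ["file1.gwf", "file2.gwf", "file3.gwf", "file4.gwf"]  # placeholder names
chunks = split_into_chunks(files, nproc=2)
```

Keeping each group contiguous means the per-group results can be joined in order to rebuild a single series.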
The start and end keyword arguments can be used to downselect data to a specific [start, end) time segment when reading:
>>> data = TimeSeries.read('HLV-HW100916-968654552-1.gwf', 'L1:LDAS-STRAIN', start=968654552.5, end=968654553)
Additionally, the following keyword arguments can be used:
Warning
These keyword arguments are only supported when using the
LDAStools.frameCPP GWF API.
Reading multiple channels#
To read multiple channels from one or more GWF files (rather than opening and closing the files multiple times), use the TimeSeriesDict or StateVectorDict classes, and pass a list of data channel names:
>>> data = TimeSeriesDict.read('HLV-HW100916-968654552-1.gwf', ['H1:LDAS-STRAIN', 'L1:LDAS-STRAIN'])
Note
A mix of TimeSeries and StateVector objects can be read using only the TimeSeriesDict class, and then casting the relevant returned data to a StateVector using view().
Writing#
To write data held in any of the gwpy.timeseries classes to a GWF file, simply use:
>>> data.write('output.gwf')
If the output file already exists it will be overwritten; use overwrite=False to
prevent this (an OSError will be raised instead).
Note
When writing a timeseries to a GWF, the TimeSeries.name
property is used for the name variable of the GWF data structures
(FrProcData and FrVect).
So, if you want to write a file and then read it back in, you must ensure
that the name property is correctly assigned, e.g.:
>>> channel = "L1:CHANNEL_NAME"
>>> output_file = "output.gwf"
>>> data = TimeSeries([1, 2, 3])
>>> data.name = channel
>>> data.write(output_file)
>>> data = TimeSeries.read(output_file, channel)
HDF5#
GWpy allows storing data in HDF5 format files, using a custom specification for storage of metadata.
Warning
To read GWOSC data from HDF5, please see HDF5 (GWOSC).
Reading#
To read TimeSeries or StateVector data held in HDF5 files, pass the filename (or filenames) of the source and the path of the data inside the HDF5 file:
>>> data = TimeSeries.read('HLV-HW100916-968654552-1.hdf', 'L1:LDAS-STRAIN')
As with GWF, the start and end keyword arguments can be used to downselect data to a specific [start, end) time segment when reading:
>>> data = TimeSeries.read('HLV-HW100916-968654552-1.hdf', 'L1:LDAS-STRAIN', start=968654552.5, end=968654553)
Analogously to GWF, you can read multiple TimeSeries from an HDF5 file via TimeSeriesDict.read():
>>> data = TimeSeriesDict.read('HLV-HW100916-968654552-1.hdf')
By default, all matching datasets in the file will be read. To restrict the output, specify the names of the datasets you want:
>>> data = TimeSeriesDict.read('HLV-HW100916-968654552-1.hdf', ['H1:LDAS-STRAIN', 'L1:LDAS-STRAIN'])
Writing#
Data held in a TimeSeries, TimeSeriesDict, StateVector, or StateVectorDict can be written to an HDF5 file via:
>>> data.write('output.hdf')
The output argument ('output.hdf') can be a file path, an open h5py.File object, or an h5py.Group object, to append data to an existing file.
If the target file already exists, an IOError will be raised. Use overwrite=True to force a new file to be written.
To write a TimeSeries to an existing file, use append=True:
>>> data.write('output.hdf', append=True)
To replace an existing dataset in an existing file, while preserving other data, use both append=True and overwrite=True:
>>> data.write('output.hdf', append=True, overwrite=True)
HDF5 (GWOSC)#
GWOSC writes data in HDF5 using a custom schema that is incompatible
with format='hdf5'.
Reading#
GWpy can read data from GWOSC HDF5 files using the format='hdf5.gwosc'
keyword:
>>> data = TimeSeries.read(
... "H-H1_GWOSC_16KHZ_R1-1187056280-4096.hdf5",
... format="hdf5.gwosc",
... )
By default, TimeSeries.read() will return the contents of the
/strain/Strain dataset, while StateVector.read() will return those
of /quality/simple.
It’s possible to change which datasets are read with the path, value_dataset
and bits_dataset keywords.
For instance, to read injection flags:
>>> injections = StateVector.read(
... "H-H1_GWOSC_16KHZ_R1-1187056280-4096.hdf5",
... format="hdf5.gwosc",
... path="quality/injections",
... value_dataset="Injmask",
... bits_dataset="InjDescriptions",
... )
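Each sample of a state vector is an integer whose individual bits act as boolean flags. The sketch below shows how one such sample could be unpacked by hand; the `decode_bits` helper and the bit names are hypothetical and are not the real injection mask definitions.

```python
def decode_bits(value, bit_names):
    """Map each named bit to True/False for a single integer sample."""
    return {name: bool((value >> i) & 1) for i, name in enumerate(bit_names)}


# Hypothetical bit names, listed from the least significant bit upwards.
names = ["flag_a", "flag_b", "flag_c"]
state = decode_bits(0b101, names)
```

Here bits 0 and 2 are set, so `flag_a` and `flag_c` are active while `flag_b` is not.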
As with regular HDF5, the start and end keyword arguments can be used
to downselect data to a specific [start, end) time segment when reading.
WAV#
Any TimeSeries can be written to or read from a WAV file using TimeSeries.write() or TimeSeries.read().
Warning
No metadata are stored in the WAV file except the sampling rate, so any units or GPS timing information are lost when converting to/from WAV.
Reading#
To read a TimeSeries from WAV:
>>> t = TimeSeries.read('data.wav')
See scipy.io.wavfile.read() for any keyword argument options.
Writing#
To write a TimeSeries to WAV:
>>> t.write('data.wav')
See scipy.io.wavfile.write() for keyword argument options.
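The metadata warning in this section can be illustrated with Python's standard-library wave module, used here in place of scipy.io.wavfile so that the sketch has no third-party dependencies. The sample rate and sample values are made up.

```python
import os
import struct
import tempfile
import wave

sample_rate = 4096               # hypothetical sampling rate in Hz
samples = [0, 1000, -1000, 0]    # hypothetical 16-bit integer samples

path = os.path.join(tempfile.mkdtemp(), "data.wav")

with wave.open(path, "wb") as w:
    w.setnchannels(1)            # mono
    w.setsampwidth(2)            # 2 bytes per sample (16-bit)
    w.setframerate(sample_rate)
    w.writeframes(struct.pack(f"<{len(samples)}h", *samples))

with wave.open(path, "rb") as r:
    params = r.getparams()
    raw = r.readframes(params.nframes)
    recovered = list(struct.unpack(f"<{params.nframes}h", raw))
```

The parameters read back describe only the channel count, sample width, frame rate, frame count, and compression type, so there is nowhere in them to record a unit or a GPS start time.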