Reading and writing time series data

The TimeSeries class provides read() and write() methods for reading from and writing to files, respectively. For example, to read from an ASCII file containing time and amplitude columns:

>>> from gwpy.timeseries import TimeSeries
>>> data = TimeSeries.read('my-data.txt')

TimeSeries.read() will attempt to identify the file format automatically, based on the file extension and/or the contents of the file; however, the format keyword argument can be used to specify the input file format manually.
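A rough sketch of extension-based identification follows. The `guess_format` helper and its suffix table are hypothetical and for illustration only: GWpy registers its readers through Astropy's unified I/O registry and can also inspect file contents, so its real rules differ. The sketch also shows why GWOSC HDF5 files need an explicit format: judged by suffix alone, they look like any other HDF5 file.

```python
from pathlib import Path

# Hypothetical suffix-to-format table, for illustration only.
# This is not GWpy's format registry.
_SUFFIX_FORMATS = {
    ".txt": "txt",
    ".gwf": "gwf",
    ".hdf": "hdf5",
    ".hdf5": "hdf5",
    ".h5": "hdf5",
    ".wav": "wav",
}


def guess_format(path, format=None):
    """Return ``format`` if given, otherwise guess it from the file suffix."""
    if format is not None:
        # An explicit format always wins, like TimeSeries.read(..., format=...).
        return format
    suffix = Path(path).suffix.lower()
    try:
        return _SUFFIX_FORMATS[suffix]
    except KeyError:
        raise ValueError(
            f"cannot identify format for {path!r}; pass format=..."
        ) from None
```

With this sketch, `guess_format("H-H1_GWOSC_16KHZ_R1-1187056280-4096.hdf5")` yields the generic `"hdf5"`, which is why the GWOSC examples on this page pass `format="hdf5.gwosc"` explicitly.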

The read() and write() methods take different arguments and keywords depending on the input/output file format; see Built-in file formats for details on reading and writing each of the built-in formats.

Reading remote data

TimeSeries.read supports reading data directly from remote sources, where the input filename is a http://, https://, or ftp:// URL, as supported by Astropy’s Downloadable Data Management (astropy.utils.data).

Additionally, GWpy supports reading data directly from Pelican or OSDF URLs, where the input filename is a pelican:// or osdf:// URL, if the additional requests-pelican package is installed.

Reading data from Pelican

>>> from gwpy.timeseries import TimeSeries
>>> data = TimeSeries.read(
...     "osdf:///gwdata/O3b/strain.4k/hdf.v1/H1/1268776960/"
...     "H-H1_GWOSC_O3b_4KHZ_R1-1269358592-4096.hdf5",
...     format="hdf5.gwosc",
... )
>>> print(data)
TimeSeries([2.69325914e-20, 2.98416387e-20, 2.56382655e-20, ...,
            3.49428204e-20, 3.19402495e-20, 2.73257992e-20],
           unit: dimensionless,
           t0: 1269358592.0 s,
           dt: 0.000244140625 s,
           name: H1:GWOSC-4KHZ_R1_STRAIN,
           channel: None)

Tip: Installing GWpy with Pelican support

To install GWpy with Pelican support, use:

pip install gwpy[pelican]
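The split between the two kinds of remote input can be sketched as a dispatch on the URL scheme. The `classify_source` helper and its return labels are hypothetical, written only to summarise the rules stated above (HTTP(S)/FTP via Astropy, `pelican://`/`osdf://` via requests-pelican); GWpy's internal dispatch may be structured differently.

```python
from urllib.parse import urlparse

# Schemes handled through Astropy's download machinery, per the text above.
ASTROPY_SCHEMES = {"http", "https", "ftp"}
# Schemes that additionally require the requests-pelican package.
PELICAN_SCHEMES = {"pelican", "osdf"}


def classify_source(source):
    """Return 'remote', 'pelican', or 'local' for a read() input (hypothetical)."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ASTROPY_SCHEMES:
        return "remote"
    if scheme in PELICAN_SCHEMES:
        return "pelican"
    # Anything else (including plain paths with no scheme) is treated as local.
    return "local"
```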

Automatic discovery of GW data

For full details on automatic data discovery, including data from GWOSC or GWDataFind, see Data Discovery.

Built-in file formats

ASCII

GWpy supports writing TimeSeries (and FrequencySeries) data to ASCII in a two-column time and amplitude format.

Reading

To read a TimeSeries from ASCII:

>>> t = TimeSeries.read('data.txt')

See numpy.loadtxt() for keyword argument options.

Writing

To write a TimeSeries to ASCII:

>>> t.write('data.txt')

See numpy.savetxt() for keyword argument options.
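To make the two-column layout concrete, the sketch below writes and re-reads a tiny series using only the standard library. It assumes one whitespace-separated `time amplitude` pair per line with no header; the helper names are hypothetical, and GWpy itself delegates to numpy.savetxt() and numpy.loadtxt(), whose default number formatting differs.

```python
import io


def write_two_column(stream, t0, dt, values):
    """Write one 'time amplitude' pair per line (hypothetical helper)."""
    for i, value in enumerate(values):
        # Sample i sits at time t0 + i * dt; repr() keeps full float precision.
        stream.write(f"{t0 + i * dt!r} {value!r}\n")


def read_two_column(stream):
    """Parse 'time amplitude' lines back into two lists of floats."""
    times, values = [], []
    for line in stream:
        if not line.strip():
            continue  # skip blank lines
        t, v = line.split()
        times.append(float(t))
        values.append(float(v))
    return times, values


buf = io.StringIO()
write_two_column(buf, t0=100.0, dt=0.25, values=[1.0, 2.0, 3.0])
buf.seek(0)
times, values = read_two_column(buf)
```

Because the time column is stored explicitly, the sample spacing can be recovered from the file itself (here `times[1] - times[0]` is `0.25`).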

GWF

Additional dependencies: FrameCPP or FrameL or LALFrame

The raw observatory data are archived in .gwf files, a custom binary format that efficiently stores the time streams and all necessary metadata. For more details about this data format, see the specification document LIGO-T970130.

GWF library availability

GWpy can use any of the three named GWF input/output libraries, and will try to find them in the order they are listed (FrameCPP first, then FrameL, then LALFrame). If you need to read or write GWF files, any of them will work, but we recommend trying to install them in that order, as FrameCPP provides a more complete Python API than the others.

However, not all libraries are available on all platforms; the linked pages for each library include an up-to-date list of supported platforms.
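The documented preference order amounts to a "first importable library wins" search, sketched below. The Python import names (`LDAStools.frameCPP`, `framel`, `lalframe`) are assumptions about how each library is exposed, and `find_gwf_backend` is a hypothetical helper, not GWpy's actual selection code.

```python
import importlib.util

# Candidate GWF libraries in the documented preference order.
# The module names are assumptions for illustration.
GWF_BACKENDS = [
    ("FrameCPP", "LDAStools.frameCPP"),
    ("FrameL", "framel"),
    ("LALFrame", "lalframe"),
]


def find_gwf_backend(candidates=GWF_BACKENDS):
    """Return the label of the first importable backend, or None."""
    for label, module in candidates:
        try:
            if importlib.util.find_spec(module) is not None:
                return label
        except ImportError:
            # find_spec raises if a parent package (e.g. LDAStools) is missing.
            continue
    return None
```

Passing a custom `candidates` list makes the search order easy to test independently of which GWF libraries happen to be installed.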

Reading

To read data from a GWF file, pass the input file path (or paths) and the name of the data channel to read:

>>> data = TimeSeries.read('HLV-HW100916-968654552-1.gwf', 'L1:LDAS-STRAIN')

Note

The HLV-HW100916-968654552-1.gwf file is included with the GWpy source under /gwpy/testing/data/.

Reading a StateVector uses the same syntax:

>>> from gwpy.timeseries import StateVector
>>> data = StateVector.read('my-state-data.gwf', 'L1:GWO-STATE_VECTOR')

For instance, to read injection flags from a GWOSC GWF file, you can use the LOSC-INJMASK channel:

>>> injections = StateVector.read('my-state-data.gwf', 'L1:LOSC-INJMASK')
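Each sample of a StateVector is an integer bitmask in which bit n being set means that the n-th named condition holds. The underlying arithmetic can be sketched with plain integers. The bit names below are hypothetical placeholders, not the real GWOSC injection-mask definitions, and GWpy's StateVector has its own bit-handling methods; this only shows what they decode.

```python
# Hypothetical bit definitions for illustration; real channels define their own.
BITS = ["BIT0_FLAG", "BIT1_FLAG", "BIT2_FLAG", "BIT3_FLAG"]


def active_bits(value, bits=BITS):
    """Return the names of the bits that are set in an integer sample."""
    return [name for n, name in enumerate(bits) if (value >> n) & 1]


samples = [0b0000, 0b0101, 0b1111]
decoded = [active_bits(s) for s in samples]
```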

Multiple files can be read by passing a list of files:

>>> data = TimeSeries.read([file1, file2], 'L1:LDAS-STRAIN')

When reading multiple files, the nproc keyword argument can be used to distribute the reading over multiple CPUs, which can reduce the total reading time:

>>> data = TimeSeries.read([file1, file2, file3, file4], 'L1:LDAS-STRAIN', nproc=2)

The above command will separate the input list of 4 file paths into two sets of 2 files, combining the results into a single TimeSeries before returning.
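The partitioning described above can be sketched as splitting the file list into contiguous, near-equal groups. `split_files` is a hypothetical helper, and GWpy's actual partitioning may differ; keeping each group contiguous makes it simple to join the partial results back in their original order, assuming the input list is sorted in time.

```python
def split_files(paths, nproc):
    """Split paths into at most nproc contiguous, near-equal groups (hypothetical)."""
    nproc = max(1, min(nproc, len(paths)))
    size, extra = divmod(len(paths), nproc)
    groups, start = [], 0
    for i in range(nproc):
        # The first `extra` groups each take one additional file.
        stop = start + size + (1 if i < extra else 0)
        groups.append(paths[start:stop])
        start = stop
    return groups
```

For the example above, `split_files([file1, file2, file3, file4], 2)` gives `[[file1, file2], [file3, file4]]`.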

The start and end keyword arguments can be used to downselect data to a specific [start, end) time segment when reading:

>>> data = TimeSeries.read('HLV-HW100916-968654552-1.gwf', 'L1:LDAS-STRAIN', start=968654552.5, end=968654553)
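A half-open [start, end) window maps onto sample indices as sketched below, given the start time `t0` and sample spacing `dt`, where sample i sits at t0 + i*dt. The `crop_indices` helper is hypothetical, GWpy's exact floating-point handling may differ, and the 16384 Hz sample rate in the usage note is an assumption for illustration, not a statement about the test file.

```python
import math


def crop_indices(t0, dt, n, start, end):
    """Return (i0, i1) such that samples[i0:i1] lie in [start, end) (hypothetical).

    Sample i is at time t0 + i * dt and n is the total number of samples.
    """
    # First index with t0 + i*dt >= start.
    i0 = max(0, math.ceil((start - t0) / dt))
    # One past the last index with t0 + i*dt < end (end itself is excluded).
    i1 = min(n, math.ceil((end - t0) / dt))
    return i0, max(i0, i1)
```

With the assumed 16384 Hz rate, a one-second series starting at 968654552 cropped to [968654552.5, 968654553) keeps samples 8192 to 16383, i.e. `crop_indices(968654552, 1/16384, 16384, 968654552.5, 968654553)` gives `(8192, 16384)`.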

Additionally, the following keyword arguments can be used:

Warning

These keyword arguments are only supported when using the LDAStools.frameCPP GWF API.

Keyword arguments for `TimeSeries.read`
Keyword	Type	Default	Usage
`scaled`	`bool`	`True`	Apply ADC calibration when reading
`type`	`str` or `dict`	`None`	Channel type (`'ADC'`, `'Proc'`, or `'Sim'`), or a `dict` of channel types for each channel to be read. This option optimises the reading operation.

Reading multiple channels

To read multiple channels from one or more GWF files (rather than opening and closing the files multiple times), use the TimeSeriesDict or StateVectorDict classes, and pass a list of data channel names:

>>> from gwpy.timeseries import TimeSeriesDict
>>> data = TimeSeriesDict.read('HLV-HW100916-968654552-1.gwf', ['H1:LDAS-STRAIN', 'L1:LDAS-STRAIN'])

Note

A mix of TimeSeries and StateVector objects can be read by using the TimeSeriesDict class for all channels, then casting the relevant entries of the returned dict to StateVector using view().

Writing

To write data held in any of the gwpy.timeseries classes to a GWF file, simply use:

>>> data.write('output.gwf')

If the output file already exists, it will be overwritten; use overwrite=False to prevent this, in which case an OSError will be raised.

Note

When writing a timeseries to GWF, the TimeSeries.name property is used as the name of the GWF data structures (FrProcData and FrVect). So, if you want to write a file and then read it back in, you must ensure that the name property is set correctly, e.g.:

>>> channel = "L1:CHANNEL_NAME"
>>> output_file = "output.gwf"
>>> data = TimeSeries([1, 2, 3])
>>> data.name = channel
>>> data.write(output_file)
>>> data = TimeSeries.read(output_file, channel)

HDF5

GWpy allows storing data in HDF5 format files, using a custom specification for storage of metadata.

Warning

To read GWOSC data from HDF5, please see HDF5 (GWOSC).

Reading

To read TimeSeries or StateVector data held in HDF5 files, pass the source filename (or filenames) and the path of the data inside the HDF5 file:

>>> data = TimeSeries.read('HLV-HW100916-968654552-1.hdf', 'L1:LDAS-STRAIN')

As with GWF, the start and end keyword arguments can be used to downselect data to a specific [start, end) time segment when reading:

>>> data = TimeSeries.read('HLV-HW100916-968654552-1.hdf', 'L1:LDAS-STRAIN', start=968654552.5, end=968654553)

Analogously to GWF, you can read multiple TimeSeries from an HDF5 file via TimeSeriesDict.read():

>>> data = TimeSeriesDict.read('HLV-HW100916-968654552-1.hdf')

By default, all matching datasets in the file will be read; to restrict the output, specify the names of the datasets you want:

>>> data = TimeSeriesDict.read('HLV-HW100916-968654552-1.hdf', ['H1:LDAS-STRAIN', 'L1:LDAS-STRAIN'])

Writing

Data held in a TimeSeries, TimeSeriesDict, StateVector, or StateVectorDict can be written to an HDF5 file via:

>>> data.write('output.hdf')

The output argument ('output.hdf') can be a file path, an open h5py.File object, or an h5py.Group object, to append data to an existing file.

If the target file already exists, an IOError will be raised; use overwrite=True to force a new file to be written.

To write a TimeSeries to an existing file, use append=True:

>>> data.write('output.hdf', append=True)

To replace an existing dataset in an existing file, while preserving other data, use both append=True and overwrite=True:

>>> data.write('output.hdf', append=True, overwrite=True)
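The combinations of append and overwrite described above can be summarised as a small decision function. `hdf5_write_action` and its return strings are hypothetical, modelling only the documented behaviour; the case of append=True with an existing dataset and overwrite=False is not described above, so raising an error there is an assumption (marked in the code).

```python
def hdf5_write_action(file_exists, dataset_exists, append=False, overwrite=False):
    """Return the action implied by the rules above (hypothetical model, not GWpy code)."""
    if not file_exists:
        return "create file"
    if not append:
        if not overwrite:
            # Documented: writing to an existing file raises IOError by default.
            raise IOError("file exists; use overwrite=True or append=True")
        return "replace file"
    if not dataset_exists:
        return "add dataset"
    if overwrite:
        return "replace dataset"
    # Assumption: behaviour here is not documented above; we model it as an error.
    raise IOError("dataset exists; use overwrite=True to replace it")
```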

HDF5 (GWOSC)

GWOSC write data in HDF5 using a custom schema that is incompatible with format='hdf5'.

Reading

GWpy can read data from GWOSC HDF5 files using the format='hdf5.gwosc' keyword:

>>> data = TimeSeries.read(
...     "H-H1_GWOSC_16KHZ_R1-1187056280-4096.hdf5",
...     format="hdf5.gwosc",
... )

By default, TimeSeries.read() will return the contents of the /strain/Strain dataset, while StateVector.read() will return those of /quality/simple.

It’s possible to change which datasets are read with the path, value_dataset, and bits_dataset keywords. For instance, to read injection flags:

>>> injections = StateVector.read(
...     "H-H1_GWOSC_16KHZ_R1-1187056280-4096.hdf5",
...     format="hdf5.gwosc",
...     path="quality/injections",
...     value_dataset="Injmask",
...     bits_dataset="InjDescriptions",
... )

As with regular HDF5, the start and end keyword arguments can be used to downselect data to a specific [start, end) time segment when reading.

WAV

Any TimeSeries can be written to and read from a WAV file using TimeSeries.write() and TimeSeries.read().

Warning

No metadata are stored in the WAV file except the sampling rate, so any units or GPS timing information are lost when converting to/from WAV.

Reading

To read a TimeSeries from WAV:

>>> t = TimeSeries.read('data.wav')

See scipy.io.wavfile.read() for any keyword argument options.

Writing

To write a TimeSeries to WAV:

>>> t.write('data.wav')

See scipy.io.wavfile.write() for keyword argument options.
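The standard-library wave module can illustrate why metadata are lost: a WAV header records the frame rate (plus channel count and sample width) but has no field for a GPS start time or a physical unit. This sketch writes 16-bit integer PCM for simplicity, whereas GWpy uses scipy.io.wavfile, which can also store floating-point samples, so the encoding differs from what GWpy produces.

```python
import io
import struct
import wave

rate = 4096  # sampling rate in Hz (arbitrary for this example)
samples = [0, 1000, -1000, 32767, -32768]

# Write 16-bit mono PCM into an in-memory WAV file.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # bytes per sample
    wav.setframerate(rate)
    wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))

# Read it back: the frame rate is the only timing information available.
buf.seek(0)
with wave.open(buf, "rb") as wav:
    read_rate = wav.getframerate()
    nframes = wav.getnframes()
    read_samples = list(struct.unpack(f"<{nframes}h", wav.readframes(nframes)))
```

Any start time or unit has to be re-attached by hand after reading WAV data back.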