Reading and writing time series data#

The TimeSeries object includes read() and write() methods to enable reading from and writing to files respectively. For example, to read from an ASCII file containing time and amplitude columns:

>>> from gwpy.timeseries import TimeSeries
>>> data = TimeSeries.read('my-data.txt')

TimeSeries.read() will attempt to automatically identify the file format based on the file extension and/or the contents of the file. The format keyword argument can be used to specify the input file format manually.

The read() and write() methods take different arguments and keywords depending on the input/output file format. See Built-in file formats for details on reading/writing each of the built-in formats.

Reading remote data#

TimeSeries.read() supports reading data directly from remote sources, where the input filename is an http://, https://, or ftp:// URL, as supported by Astropy’s Downloadable Data Management (astropy.utils.data).

Additionally, GWpy supports reading data directly from Pelican or OSDF URLs, where the input filename is a pelican:// or osdf:// URL, if the additional requests-pelican package is installed.
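
The two groups of URL schemes described above can be told apart from the input string alone. The following is a minimal sketch using only the Python standard library; the classify_source function and its return labels are hypothetical names for illustration, and this is not how GWpy itself dispatches remote reads.

```python
from urllib.parse import urlparse

# Schemes named in the text above; the grouping and labels are illustrative only
ASTROPY_SCHEMES = {"http", "https", "ftp"}
PELICAN_SCHEMES = {"pelican", "osdf"}


def classify_source(source):
    """Return 'astropy', 'pelican', or 'local' for an input path or URL."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ASTROPY_SCHEMES:
        return "astropy"
    if scheme in PELICAN_SCHEMES:
        return "pelican"
    # anything else is treated as a local file path in this sketch
    return "local"


print(classify_source("https://example.org/my-data.txt"))  # hypothetical URL
print(classify_source(
    "osdf:///gwdata/O3b/strain.4k/hdf.v1/H1/1268776960/"
    "H-H1_GWOSC_O3b_4KHZ_R1-1269358592-4096.hdf5"
))
print(classify_source("my-data.txt"))
```

A check like this can be useful before calling TimeSeries.read() to decide whether the optional requests-pelican package needs to be installed.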

Reading data from Pelican#
>>> from gwpy.timeseries import TimeSeries
>>> data = TimeSeries.read(
...     "osdf:///gwdata/O3b/strain.4k/hdf.v1/H1/1268776960/"
...     "H-H1_GWOSC_O3b_4KHZ_R1-1269358592-4096.hdf5",
...     format="hdf5.gwosc",
... )
>>> print(data)
TimeSeries([2.69325914e-20, 2.98416387e-20, 2.56382655e-20, ...,
            3.49428204e-20, 3.19402495e-20, 2.73257992e-20],
           unit: dimensionless,
           t0: 1269358592.0 s,
           dt: 0.000244140625 s,
           name: H1:GWOSC-4KHZ_R1_STRAIN,
           channel: None)

Tip: Installing GWpy with Pelican support

To install GWpy with Pelican support, use:

pip install gwpy[pelican]

Automatic discovery of GW data#

For full details on automatic data discovery, including data from GWOSC or GWDataFind, see Data Discovery.

Built-in file formats#

ASCII#

GWpy supports writing TimeSeries (and FrequencySeries) data to ASCII in a two-column time and amplitude format.

Reading#

To read a TimeSeries from ASCII:

>>> t = TimeSeries.read('data.txt')

See numpy.loadtxt() for keyword argument options.

Writing#

To write a TimeSeries to ASCII:

>>> t.write('data.txt')

See numpy.savetxt() for keyword argument options.
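
To make the two-column layout concrete, here is a minimal sketch that writes and re-reads a time and amplitude file using only the Python standard library. The start time, sample spacing, amplitudes, and whitespace delimiter are assumptions for illustration; the exact number formatting that GWpy produces is controlled by numpy.savetxt() and is not reproduced here.

```python
import os
import tempfile

t0, dt = 0.0, 0.25  # hypothetical start time and sample spacing (seconds)
amplitudes = [1.0, 2.0, 3.0, 4.0]  # hypothetical data

path = os.path.join(tempfile.mkdtemp(), "data.txt")

# write one "time amplitude" pair per line, separated by whitespace
with open(path, "w") as f:
    for i, value in enumerate(amplitudes):
        f.write(f"{t0 + i * dt!r} {value!r}\n")

# read the two columns back into separate lists
times, values = [], []
with open(path) as f:
    for line in f:
        t, a = line.split()
        times.append(float(t))
        values.append(float(a))

print(times)
print(values)
```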

GWF#

Additional dependencies: FrameCPP or FrameL or LALFrame

The raw observatory data are archived in .gwf files, a custom binary format that efficiently stores the time streams and all necessary metadata. For more details about this data format, see the specification document LIGO-T970130.

GWF library availability#

GWpy can use any of the three named GWF input/output libraries, and will try to find them in the order they are listed (FrameCPP first, then FrameL, then LALFrame). If you need to read/write GWF files, any of them will work, but we recommend trying to install the libraries in that order; FrameCPP provides a more complete Python API than the others.

However, not all libraries may be available on all platforms. The linked pages for each library include an up-to-date listing of the supported platforms.

Reading#

To read data from a GWF file, pass the input file path (or paths) and the name of the data channel to read:

>>> data = TimeSeries.read('HLV-HW100916-968654552-1.gwf', 'L1:LDAS-STRAIN')

Note

The HLV-HW100916-968654552-1.gwf file is included with the GWpy source under /gwpy/testing/data/.

Reading a StateVector uses the same syntax:

>>> from gwpy.timeseries import StateVector
>>> data = StateVector.read('my-state-data.gwf', 'L1:GWO-STATE_VECTOR')

For instance, to read injection flags from a GWOSC GWF file, you can use the LOSC-INJMASK channel:

>>> injections = StateVector.read('my-state-data.gwf', 'L1:LOSC-INJMASK')

Multiple files can be read by passing a list of files:

>>> data = TimeSeries.read([file1, file2], 'L1:LDAS-STRAIN')

When reading multiple files, the nproc keyword argument can be used to distribute the reading over multiple CPUs, which should make it faster:

>>> data = TimeSeries.read([file1, file2, file3, file4], 'L1:LDAS-STRAIN', nproc=2)

The above command will separate the input list of 4 file paths into two sets of 2 files, combining the results into a single TimeSeries before returning.
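
The partitioning described above can be sketched in plain Python. This is an illustration only: the split_into_sets helper is a hypothetical name, and the assumption that files are grouped into contiguous, near-equal sets may not match how GWpy actually distributes work between processes.

```python
def split_into_sets(paths, nproc):
    """Split ``paths`` into at most ``nproc`` contiguous sets of near-equal size."""
    # never use more sets than there are files, and always at least one
    nproc = max(1, min(nproc, len(paths)))
    size, extra = divmod(len(paths), nproc)
    sets = []
    start = 0
    for i in range(nproc):
        # the first ``extra`` sets each take one additional file
        stop = start + size + (1 if i < extra else 0)
        sets.append(paths[start:stop])
        start = stop
    return sets


files = ["file1.gwf", "file2.gwf", "file3.gwf", "file4.gwf"]  # hypothetical paths
print(split_into_sets(files, 2))
```

Each set would then be read independently, and the per-set results joined in time order to form the single returned TimeSeries.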

The start and end keyword arguments can be used to downselect data to a specific [start, end) time segment when reading:

>>> data = TimeSeries.read('HLV-HW100916-968654552-1.gwf', 'L1:LDAS-STRAIN', start=968654552.5, end=968654553)

Additionally, the following keyword arguments can be used:

Warning

These keyword arguments are only supported when using the LDAStools.frameCPP GWF API.

Keyword arguments for TimeSeries.read#

Keyword   Type   Default   Usage
scaled    bool   True      Apply ADC calibration when reading
type      str    None      dict of channel types ('ADC', 'Proc', or 'Sim') for each channel to be read. This option optimises the reading operation.

Reading multiple channels#

To read multiple channels from one or more GWF files (rather than opening and closing the files multiple times), use the TimeSeriesDict or StateVectorDict classes, and pass a list of data channel names:

>>> from gwpy.timeseries import TimeSeriesDict
>>> data = TimeSeriesDict.read('HLV-HW100916-968654552-1.gwf', ['H1:LDAS-STRAIN', 'L1:LDAS-STRAIN'])

Note

A mix of TimeSeries and StateVector objects can be read using only the TimeSeriesDict class, then casting the relevant returned data to a StateVector using view().

Writing#

To write data held in any of the gwpy.timeseries classes to a GWF file, simply use:

>>> data.write('output.gwf')

If the output file already exists, it will be overwritten. Use overwrite=False to prevent this, in which case an OSError will be raised instead.

Note

When writing a TimeSeries to a GWF file, the TimeSeries.name property is used for the name variable of the GWF data structures (FrProcData and FrVect). So, if you want to write a file and then read it back in, you must ensure that the name property is correctly assigned, for example:

>>> channel = "L1:CHANNEL_NAME"
>>> output_file = "output.gwf"
>>> data = TimeSeries([1, 2, 3])
>>> data.name = channel
>>> data.write(output_file)
>>> data = TimeSeries.read(output_file, channel)

HDF5#

GWpy allows storing data in HDF5 format files, using a custom specification for storage of metadata.

Warning

To read GWOSC data from HDF5, please see HDF5 (GWOSC).

Reading#

To read TimeSeries or StateVector data held in HDF5 files, pass the filename (or filenames) of the source and the path of the data inside the HDF5 file:

>>> data = TimeSeries.read('HLV-HW100916-968654552-1.hdf', 'L1:LDAS-STRAIN')

As with GWF, the start and end keyword arguments can be used to downselect data to a specific [start, end) time segment when reading:

>>> data = TimeSeries.read('HLV-HW100916-968654552-1.hdf', 'L1:LDAS-STRAIN', start=968654552.5, end=968654553)

Analogously to GWF, you can read multiple TimeSeries from an HDF5 file via TimeSeriesDict.read():

>>> data = TimeSeriesDict.read('HLV-HW100916-968654552-1.hdf')

By default, all matching datasets in the file will be read. To restrict the output, specify the names of the datasets you want:

>>> data = TimeSeriesDict.read('HLV-HW100916-968654552-1.hdf', ['H1:LDAS-STRAIN', 'L1:LDAS-STRAIN'])

Writing#

Data held in a TimeSeries, TimeSeriesDict, StateVector, or StateVectorDict can be written to an HDF5 file via:

>>> data.write('output.hdf')

The output argument ('output.hdf') can be a file path, an open h5py.File object, or a h5py.Group object, to append data to an existing file.

If the target file already exists, an IOError will be raised. Use overwrite=True to force a new file to be written.

To write a TimeSeries to an existing file, use append=True:

>>> data.write('output.hdf', append=True)

To replace an existing dataset in an existing file, while preserving other data, use both append=True and overwrite=True:

>>> data.write('output.hdf', append=True, overwrite=True)

HDF5 (GWOSC)#

GWOSC write data in HDF5 using a custom schema that is incompatible with format='hdf5'.

Reading#

GWpy can read data from GWOSC HDF5 files using the format='hdf5.gwosc' keyword:

>>> data = TimeSeries.read(
...     "H-H1_GWOSC_16KHZ_R1-1187056280-4096.hdf5",
...     format="hdf5.gwosc",
... )

By default, TimeSeries.read() will return the contents of the /strain/Strain dataset, while StateVector.read() will return those of /quality/simple.

It’s possible to change which datasets are read with the path, value_dataset, and bits_dataset keywords. For instance, to read injection flags:

>>> injections = StateVector.read(
...     "H-H1_GWOSC_16KHZ_R1-1187056280-4096.hdf5",
...     format="hdf5.gwosc",
...     path="quality/injections",
...     value_dataset="Injmask",
...     bits_dataset="InjDescriptions",
... )

As with regular HDF5, the start and end keyword arguments can be used to downselect data to a specific [start, end) time segment when reading.

WAV#

Any TimeSeries can be written to a WAV file using TimeSeries.write(), and read back using TimeSeries.read().

Warning

No metadata are stored in the WAV file except the sampling rate, so any units or GPS timing information are lost when converting to/from WAV.
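
To see why units and GPS times cannot survive the conversion, it helps to look at what a basic WAV header actually records. The sketch below uses Python's standard wave module rather than GWpy or SciPy; the sample rate and sample values are hypothetical.

```python
import os
import struct
import tempfile
import wave

path = os.path.join(tempfile.mkdtemp(), "data.wav")
sample_rate = 4096  # Hz, hypothetical
samples = [0, 1000, -1000, 0]  # hypothetical 16-bit integer amplitudes

# write a single-channel, 16-bit WAV file
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
    wav.setframerate(sample_rate)
    wav.writeframes(struct.pack("<4h", *samples))

# read the header back: these fields are all the metadata it exposes
with wave.open(path, "rb") as wav:
    params = wav.getparams()

print(params._fields)
print(params.framerate, params.nframes)
```

The header fields cover the channel count, sample width, frame rate, frame count, and compression type. None of them can hold a unit or a start time, so a TimeSeries read back from WAV has no way to recover either.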

Reading#

To read a TimeSeries from WAV:

>>> t = TimeSeries.read('data.wav')

See scipy.io.wavfile.read() for any keyword argument options.

Writing#

To write a TimeSeries to WAV:

>>> t.write('data.wav')

See scipy.io.wavfile.write() for keyword argument options.