TeaFiles

Millions of Values Per Second

 time series persistence  high performance  C++  C#  Python  Open Source  No-SQL  Free

The bare speed of binary files. With sugar.

  • TeaFile is a file format
  • to store time series
  • in binary flat files.
  • An optional header holds a description of file contents
  • including a description of the item type layout (schema).
  • The file format is designed to be simple so that APIs are created easily.
  • DiscreteLogics publishes the format and
  • releases APIs for C#, C++ and Python under the GPL

TeaFiles provide fast read/write access to time series data from any software package on any platform. Time series are considered homogeneous collections of items, ordered by their timestamp. Items are stored in raw binary format, so that data can be memory mapped for fast read/write access. To ensure correct data interpretation when data is exchanged between multiple applications, TeaFiles optionally embed a description of the data layout in the file header, along with other optional descriptions of the file's contents.

Design

Performant

The most performant way to write time series to a persistent medium is a flat file. The most efficient way to read time series data from a persistent medium is to memory map a file, possibly enhanced by a read-ahead mechanism.
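The flat-file idea can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the TeaFile format: it writes no header, and the item layout (`ITEM`), the helper names and the file name `ticks.bin` are assumptions made for this example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical item layout: int64 time, float64 price, int64 volume
# (little-endian, no padding). Not the TeaFile header or format.
ITEM = struct.Struct("<qdq")

def write_items(path, items):
    """Write each item as ITEM.size raw bytes, one after another."""
    with open(path, "wb") as f:
        for item in items:
            f.write(ITEM.pack(*item))

def read_items_mapped(path):
    """Memory map the file and decode every fixed-size item from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [ITEM.unpack_from(mm, offset)
                    for offset in range(0, len(mm), ITEM.size)]

path = os.path.join(tempfile.mkdtemp(), "ticks.bin")
ticks = [(1, 45.11, 4500), (2, 46.33, 1100)]
write_items(path, ticks)
result = read_items_mapped(path)
print(result)
```

Because every item has the same size, the position of item i is simply i times the item size, which is what makes memory mapping and random access straightforward.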

Simple & Solid

To us, simple time series persistence means easy-to-use APIs, a simple file layout and well-understood technologies. It happens that the file system is just that: simple and rock solid on every operating system.

Self contained

The drawback of binary files is their opaqueness: reading them requires knowledge about the structure of their content. TeaFiles overcome this by embedding metadata that describes the items in the file, which makes each file self-contained and self-describing. Every TeaFile can therefore be opened without any further knowledge about its content and structure.

Versatile

Analysing time series data often involves more than a single tool or piece of software, such as R, Octave/Matlab, or custom C++, Java or C# programs. TeaFiles provide a simple, very loosely coupled way to make these programs work together - the file is the interface. Number and time formats have been carefully examined to provide such universal accessibility.

Open for all programs and operating systems

To allow data exchange between arbitrary programs, the file format was designed to be as simple as possible, so that writing access libraries (APIs) for new targets remains as simple as possible.

File Format Spec

The file format specification is freely available at www.discretelogics.com/resources/teafilespec.

TeaFile APIs

TeaFiles can be read and written using the raw file I/O methods available in every programming environment. APIs make access to TeaFiles more convenient. We provide several open source APIs, introduced below, all licensed under the GPL. Find more detailed information about them in the corresponding repositories.

TeaFiles.Net

TeaFiles.Net is a .Net assembly, published on GitHub: https://github.com/discretelogics/TeaFiles.Net.

Create a TeaFile and write values

// the time series item type
struct Tick
{
    public Time Time;
    public double Price;
    public int Volume;
}

// create file and write some values
using (var tf = TeaFile<Tick>.Create("silver.tea"))
{
    tf.Write(new Tick { Price = 5, Time = DateTime.Now, Volume = 700 });
    tf.Write(new Tick { Price = 15, Time = DateTime.Now.AddHours(1), Volume = 1700 });
    // ...
}

The call to TeaFile<Tick>.Create() does all the work provided by the C# API for TeaFiles: it analyzes the Tick struct to find field names, types and field offsets, and writes these values into the file header. We just wrote our first TeaFile, so let's read it.
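To illustrate what "analyzing a struct" amounts to, the following Python sketch uses the standard ctypes module to derive field names, offsets and sizes from a structure definition. It is not how TeaFiles.Net works internally; the `describe` helper is hypothetical, and the sketch assumes that Time is stored as a 64-bit integer and that the platform aligns 64-bit fields on 8-byte boundaries, as common 64-bit platforms do.

```python
import ctypes

class Tick(ctypes.Structure):
    # Mirrors the C# struct above; Time is assumed to be a 64-bit integer.
    _fields_ = [
        ("Time", ctypes.c_int64),
        ("Price", ctypes.c_double),
        ("Volume", ctypes.c_int32),
    ]

def describe(item_type):
    """Return (name, offset, size) for every field of a ctypes structure."""
    return [
        (name, getattr(item_type, name).offset, getattr(item_type, name).size)
        for name, _ in item_type._fields_
    ]

layout = describe(Tick)
for name, offset, size in layout:
    print(name, offset, size)
print("item size:", ctypes.sizeof(Tick))
```

Under these assumptions the item occupies 24 bytes although its fields add up to only 20, because the struct is padded to a multiple of 8 bytes.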

Read the file - typed

// read typed
using (var tf = TeaFile<Tick>.OpenRead("silver.tea"))
{
    Tick value = tf.Read();
    Console.WriteLine(value);
}

Note that the type expected to be stored in the file was provided up front when calling TeaFile<Tick>.OpenRead(). This is perfectly fine if we have this knowledge. But what if we do not? "Untyped reading" allows opening a file without knowing the type inside:

Read the file - untyped

// read untyped - we know nothing about the type of item in the file
using (var tf = TeaFile.OpenRead("silver.tea"))
{                
    foreach(Item item in tf.Items)
    {
        Console.WriteLine(tf.Description.ItemDescription.GetNameValueString(item));
    }
}
output:
Price=5 Time=20.8.2011 23:50
Price=15 Time=21.8.2011 00:50

This time the call to TeaFile.OpenRead() returns the untyped version of a TeaFile, which exposes a description of the items stored in the file. So TeaFile is the anonymous sister of TeaFile<T>. The two classes are unrelated from the C# point of view but logically related: both serve as an interface to the contents of a TeaFile, untyped or typed.

The item values are returned as a collection of Item instances, each holding a collection of values, one for each field in the item struct. The ItemDescription instance in turn offers the GetNameValueString method, which formats an item as a pretty-printed string. Such anonymous file reading can be used in two ways: either you access the data by iterating the collection of Item values, which is much slower than accessing the file data the typed way, or you simply open the file, check its ItemDescription, which gives all information about the items stored, and use this information to create a matching struct in C# with which to instantiate a typed TeaFile<T>.
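The principle behind untyped reading can be sketched in Python: given only a list of field descriptions (name, offset, type), raw item bytes can be decoded without a compiled item type. The `FIELDS` list, the `CODES` mapping and the `decode_item` helper are hypothetical names for this sketch and do not reflect the TeaFiles.Net implementation or the header encoding.

```python
import struct

# Hypothetical field descriptions: (name, byte offset, type name),
# in the spirit of what an item description lists.
FIELDS = [("Time", 0, "Int64"), ("Price", 8, "Double"), ("Volume", 16, "Int32")]

# Mapping from type names to struct format codes (an assumption of this sketch).
CODES = {"Int32": "i", "Int64": "q", "Double": "d"}

def decode_item(raw, fields):
    """Decode one raw item into a name -> value dict using only the descriptions."""
    return {
        name: struct.unpack_from("<" + CODES[type_name], raw, offset)[0]
        for name, offset, type_name in fields
    }

# One 24-byte item: 4 bytes of padding follow the 32-bit Volume field.
raw = struct.pack("<qdi4x", 1000, 5.0, 700)
item = decode_item(raw, FIELDS)
print(" ".join(f"{name}={value}" for name, value in item.items()))
```

Decoding field by field through a description is flexible but does more work per item than reading a known struct, which is in line with the remark above that untyped access is slower.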

Description

Finally, let's see the description that was written into the file:

using (var tf = TeaFile<Tick>.OpenRead("silver.tea"))
{
    Console.WriteLine(tf.Description);
    Console.WriteLine("ItemAreaStart={0}", tf.ItemAreaStart);
    Console.WriteLine("ItemAreaEnd={0}", tf.ItemAreaEnd);
    Console.WriteLine("ItemAreaSize={0}", tf.ItemAreaSize);
    foreach (Tick tick in tf.Items.Take(5))
    {
        Console.WriteLine(tick);
    }
}
output:
... TeaFile Description ...
#Item
Tick 24 3 fields:
Time, 0, Int64
Price, 8, Double
Volume, 16, Int32
#Content
empty
#NameValues
empty
... TeaFile Description End ...

ItemAreaStart=144
ItemAreaEnd=312
ItemAreaSize=168
Time=13.01.2012 21:00:57 Price=5 Volume=700
Time=14.01.2012 21:00:57 Price=15 Volume=1700
Time=15.01.2012 21:00:57 Price=25 Volume=3000
Time=16.01.2012 21:00:57 Price=35 Volume=4000
Time=17.01.2012 21:00:57 Price=505 Volume=5000
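The numbers in this output are related by simple arithmetic, sketched below in Python. The helper names are hypothetical. The values are taken from the output above: an item size of 24 bytes and an item area from byte 144 to byte 312.

```python
ITEM_SIZE = 24         # "Tick 24 3 fields" in the description
ITEM_AREA_START = 144  # first byte of the first item
ITEM_AREA_END = 312    # first byte after the last item

def item_count(start, end, item_size):
    """Number of whole items stored between two byte positions."""
    count, remainder = divmod(end - start, item_size)
    if remainder:
        raise ValueError("item area is not a multiple of the item size")
    return count

def item_offset(index, start, item_size):
    """Byte position of the item with the given zero-based index."""
    return start + index * item_size

count = item_count(ITEM_AREA_START, ITEM_AREA_END, ITEM_SIZE)
print(count)                                          # 7
print(item_offset(4, ITEM_AREA_START, ITEM_SIZE))     # 240
```

The item area of 168 bytes therefore holds 7 items, of which the sample prints only the first 5 because of Take(5).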

Installation

Using NuGet Package Manager: Add the package "TeaFiles.Net" to your project.
Alternatively, download the source code from GitHub and reference it from your project.

Source

github.com/discretelogics/TeaFiles.Net

Documentation

discretelogics.com/doc/teafiles.net

TeaFiles C++

TeaFiles C++ is a C++ library that compiles under Windows and Linux, using MSVC or g++.

struct Tick
{    
  teatime::Time Time;
  double Price;
  int Volume;
};

// create file and write ticks
{
    auto tf = TeaFile<Tick>::Create(filename);
    tf->Write(GetRandomTick());
}

// read the file memory mapped
auto tf = TeaFile<Tick>::OpenRead(filename);
auto items = tf->OpenReadableMapping();
for(Tick *t = items->begin(); t != items->end(); ++t)
{
    cout << t->Price << endl;
    ...
}

This looks quite similar to the .Net version. The read code now uses memory mapping, which is considerably faster than normal file reading. There is a big difference, however, that is not yet visible here: the file holds only a rudimentary description of the item; in particular, the layout of the item is not included in the header of the TeaFile. This makes it impossible to read this file untyped or even to inspect its content. In other words, the file we wrote is not really self-describing (only a little information is included: the name of the item type, "Tick", and its size). We will improve this:

Reflection in C++

In order to make our file self describing, we give our API code more knowledge about the type. Since C++ still lacks serious reflection capabilities, we help out a bit as follows:

template<>
struct Description<Tick> : public DefaultDescription<Tick>
{
  Description()
  {
    this->AddField(&Tick::Time, "Time");
    this->AddField(&Tick::Price, "Price");
    this->AddField(&Tick::Volume, "Volume");
  }
};

This allows the C++ API to analyze the current struct and do the same checks as the .Net API when reading a TeaFile. (This Description class could easily be created by tools or even the C Preprocessor.)

Source

github.com/discretelogics/TeaFiles.Cpp

Documentation

Not available yet. The source code is readable and includes examples.

TeaFiles Python

The Python API provides access to TeaFiles from a scripting environment that is available on nearly every platform.

>>> tf = TeaFile.create("acme.tea", "Time Price Volume", "qdq", "ACME at NYSE",
    {"decimals": 2, "url": "www.acme.com" })
>>> tf.write(DateTime(2011, 3, 4,  9, 0), 45.11, 4500)
>>> tf.write(DateTime(2011, 3, 4, 10, 0), 46.33, 1100)
>>> tf.close()

>>> tf = TeaFile.openread("acme.tea")
>>> tf.read()
    TPV(Time=2011-03-04 09:00:00:000, Price=45.11, Volume=4500)
>>> tf.read()
    TPV(Time=2011-03-04 10:00:00:000, Price=46.33, Volume=1100)
>>> tf.read()
>>> tf.close()

Also quite useful is the simple examination of file contents via the printsnapshot function:

>>> TeaFile.printsnapshot('acme.tea')
TeaFile('acme.tea') 53 items

ItemDescription
Name:       TPV
Size:       24
Fields:
[Time         Type:  Int64   Offset: 0   IsTime:1   IsEventTime:1,
 Price        Type: Double   Offset: 8   IsTime:0   IsEventTime:0,
 Volume       Type:  Int64   Offset:16   IsTime:0   IsEventTime:0]

ContentDescription
acme prices

NameValues
{u'decimals': 2, u'exchange': u'nyse'}

TimeScale
Epoch:           719162
Ticks per Day: 86400000
Wellknown Scale:   Java

Items
TPV(Time=2000-01-01 00:00:00:000, Price=37.579128977028674, Volume=8047)
TPV(Time=2000-01-08 00:00:00:000, Price=10.618929589509186, Volume=232)
TPV(Time=2000-01-15 00:00:00:000, Price=73.08506970525428, Volume=1711)
TPV(Time=2000-01-22 00:00:00:000, Price=73.7749103916519, Volume=4397)
TPV(Time=2000-01-29 00:00:00:000, Price=10.323610234110403, Volume=3376)
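The TimeScale values in the snapshot can be read as a recipe for converting stored tick counts into calendar time. The Python sketch below assumes that Epoch counts the days from 0001-01-01 to the start of the scale (719162 days lead to 1970-01-01) and that a tick is a fixed fraction of a day, here a millisecond. The function name `ticks_to_datetime` is hypothetical and not part of the teafiles package.

```python
from datetime import datetime, timedelta

MICROSECONDS_PER_DAY = 86_400_000_000

def ticks_to_datetime(ticks, epoch_days=719162, ticks_per_day=86_400_000):
    """Convert a tick count to a datetime.

    Assumes epoch_days is the number of days between 0001-01-01 and the
    scale's epoch, and that ticks_per_day divides a day evenly.
    """
    days, rest = divmod(ticks, ticks_per_day)
    microseconds = rest * MICROSECONDS_PER_DAY // ticks_per_day
    return datetime(1, 1, 1) + timedelta(days=epoch_days + days,
                                         microseconds=microseconds)

print(ticks_to_datetime(0))             # 1970-01-01 00:00:00
print(ticks_to_datetime(946684800000))  # 2000-01-01 00:00:00
```

With these defaults the scale matches milliseconds since 1970-01-01, which is presumably why the snapshot labels it as the well-known "Java" scale.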

Installation

$ pip install teafiles
The package is at pypi.python.org/pypi/teafiles

Source

bitbucket.org/discretelogics/teafiles.py

Documentation

Full documentation is at discretelogics.com/doc/teafiles.py

License

The TeaFile APIs available at these code repositories are licensed under the GNU General Public License v3. In addition to the terms of this license, use and distribution of this code shall be attributed to discretelogics, referencing "discretelogics.com".

Details about GPLv3: www.gnu.org/copyleft/gpl.html.

APIs governed by this license: TeaFiles.Net, TeaFiles C++ and TeaFiles Python.


If your usage is not covered by GPLv3 please contact us to negotiate license conditions.