TeaFiles provide fast read/write access to time series data from any software package on any platform. Time series are treated as homogeneous collections of items, ordered by their timestamp. Items are stored in raw binary format, so that data can be memory mapped for fast read/write access. To ensure correct interpretation when data is exchanged between multiple applications, TeaFiles optionally embed a description of the data layout in the file header, along with other optional descriptions of the file's contents.
The most performant way to write time series to a persistent medium is a flat file. The most efficient way to read time series data from a persistent medium is to memory map a file, possibly enhanced by a read-ahead mechanism.
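The idea can be sketched in Python with the standard struct and mmap modules. The record layout (a 64-bit integer time, a double price and a 64-bit integer volume, little-endian), the helper names and the file name are assumptions made for illustration; the file written here is a headerless flat file, not a TeaFile.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: int64 time, float64 price, int64 volume.
RECORD = struct.Struct("<qdq")

def write_records(path, records):
    """Write raw binary records one after another into a flat file."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))

def read_records_mapped(path):
    """Read all records back through a read-only memory mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [RECORD.unpack_from(mm, offset)
                    for offset in range(0, len(mm), RECORD.size)]

path = os.path.join(tempfile.mkdtemp(), "ticks.bin")
ticks = [(1299229200000, 45.11, 4500), (1299232800000, 46.33, 1100)]  # (time, price, volume)
write_records(path, ticks)
loaded = read_records_mapped(path)
print(loaded)
```

Because every record has the same size, the n-th item sits at byte n * RECORD.size, so any item can be reached without parsing the ones before it.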
For us, simple time series persistence means easy-to-use APIs, a simple file layout and well-understood technologies. The file system happens to be just that: simple and rock solid on every operating system.
The drawback of binary files is their opaqueness: reading them requires knowledge of the structure of their content. TeaFiles overcome this by packing metadata into the file that describes the items it holds, making the file self-contained and self-describing. Every TeaFile can therefore be opened without any further knowledge about its content and structure.
Analysing time series data often involves more than one program or tool, such as R, Octave/Matlab, or custom C++, Java or C# programs. TeaFiles provide a simple, very loosely coupled way to make these programs work together: the file is the interface. Number and time formats have been carefully examined to provide such universal accessibility.
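A minimal sketch of the self-describing idea, in Python. The header layout used here (a 4-byte length followed by a JSON description) and the function names are hypothetical and chosen only for brevity; the actual TeaFile header is defined by the file format specification and is different.

```python
import io
import json
import struct

def write_described(stream, name, fields, items):
    """Write a length-prefixed JSON description, then the raw items.

    `fields` is a list of (field_name, struct_format_char) pairs.
    This header layout is hypothetical, not the TeaFile layout.
    """
    description = json.dumps({"name": name, "fields": fields}).encode("utf-8")
    stream.write(struct.pack("<I", len(description)))
    stream.write(description)
    item = struct.Struct("<" + "".join(fmt for _, fmt in fields))
    for values in items:
        stream.write(item.pack(*values))

def read_described(stream):
    """Read the items using only the description stored in the stream."""
    (length,) = struct.unpack("<I", stream.read(4))
    description = json.loads(stream.read(length).decode("utf-8"))
    names = [field_name for field_name, _ in description["fields"]]
    item = struct.Struct("<" + "".join(fmt for _, fmt in description["fields"]))
    body = stream.read()
    return [dict(zip(names, values)) for values in item.iter_unpack(body)]

buffer = io.BytesIO()
write_described(buffer, "Tick",
                [("Time", "q"), ("Price", "d"), ("Volume", "i")],
                [(1, 5.0, 700), (2, 15.0, 1700)])
buffer.seek(0)
items = read_described(buffer)
print(items)
```

The reader never mentions Time, Price or Volume: everything it needs to decode the items comes from the description stored ahead of them.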
To allow data exchange between arbitrary programs, the file format was designed to be as simple as possible, so that writing access libraries (APIs) for new targets remains easy.
The file format specification is freely available at www.discretelogics.com/resources/teafilespec.
TeaFiles can be read and written using the raw file I/O methods available in every programming environment. APIs encapsulate access to TeaFiles more conveniently. We provide several open source APIs, introduced below, all licensed under the GPL. More detailed information about them can be found in the corresponding repositories.
TeaFiles.Net is a .Net assembly, published on GitHub: https://github.com/discretelogics/TeaFiles.Net.
// the time series item type
struct Tick
{
    public Time Time;
    public double Price;
    public int Volume;
}

// create file and write some values
using (var tf = TeaFile<Tick>.Create("silver.tea"))
{
    tf.Write(new Tick { Price = 5, Time = DateTime.Now, Volume = 700 });
    tf.Write(new Tick { Price = 15, Time = DateTime.Now.AddHours(1), Volume = 1700 });
    // ...
}
The call to TeaFile<Tick>.Create() performs all the work required of the C# API: it analyzes the Tick struct to find field names, types and offsets and writes these values into the file header. We just wrote our first TeaFile, so let's read it.
// read typed
using (var tf = TeaFile<Tick>.OpenRead("silver.tea"))
{
    Tick value = tf.Read();
    Console.WriteLine(value);
}
Note that the type expected to be stored in the file was provided up front by calling TeaFile<Tick>.OpenRead(). This is perfectly fine if we have this knowledge. But what if we do not? "Untyped reading" allows opening a file without knowing the type inside:
// read untyped - we know nothing about the type of item in the file
using (var tf = TeaFile.OpenRead("silver.tea"))
{
    foreach (Item item in tf.Items)
    {
        Console.WriteLine(tf.Description.ItemDescription.GetNameValueString(item));
    }
}
output:
Price=5 Time=20.8.2011 23:50
Price=15 Time=21.8.2011 00:50
This time the call to TeaFile.OpenRead() returns the untyped version of a TeaFile, which exposes a description of the items stored in the file. TeaFile is thus the anonymous sibling of TeaFile<T>: from the C# point of view they are unrelated classes, but logically they are related, since both serve as an interface to the contents of a TeaFile, untyped or typed.
The item values are returned as a collection of Item instances, each holding a collection of values, one for each field in the item struct. The ItemDescription instance in turn offers the GetNameValueString method, which transforms an item into a pretty-printed string. Such anonymous file reading can be used in two ways: either you access the data by iterating the collection of Item values, which is much slower than accessing the file data the typed way, or you simply open the file, check its ItemDescription, which gives all information about the items stored, and use this information to create a matching struct in C# with which to instantiate a typed TeaFile<T>.
Finally, let's see the description that was written into the file:
using (var tf = TeaFile<Tick>.OpenRead("silver.tea"))
{
    Console.WriteLine(tf.Description);
    Console.WriteLine("ItemAreaStart={0}", tf.ItemAreaStart);
    Console.WriteLine("ItemAreaEnd={0}", tf.ItemAreaEnd);
    Console.WriteLine("ItemAreaSize={0}", tf.ItemAreaSize);
    foreach (Tick tick in tf.Items.Take(5))
    {
        Console.WriteLine(tick);
    }
}
output:
... TeaFile Description ...
#Item
Tick 24
3 fields:
Time, 0, Int64
Price, 8, Double
Volume, 16, Int32
#Content
empty
#NameValues
empty
... TeaFile Description End ...
ItemAreaStart=144
ItemAreaEnd=312
ItemAreaSize=168
Time=13.01.2012 21:00:57 Price=5 Volume=700
Time=14.01.2012 21:00:57 Price=15 Volume=1700
Time=15.01.2012 21:00:57 Price=25 Volume=3000
Time=16.01.2012 21:00:57 Price=35 Volume=4000
Time=17.01.2012 21:00:57 Price=505 Volume=5000
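The numbers in this output are related by simple arithmetic, worked through below in Python. The sketch assumes that ItemAreaStart and ItemAreaEnd are byte offsets into the file and that the three numbers in the field list are byte offsets within an item; the variable and function names are ours, not part of the API.

```python
# Values copied from the output above.
item_area_start = 144
item_area_end = 312
item_size = 24
# (name, offset within the item, width in bytes of Int64 / Double / Int32)
fields = [("Time", 0, 8), ("Price", 8, 8), ("Volume", 16, 4)]

# The item area spans from its start to its end.
item_area_size = item_area_end - item_area_start

# A whole number of fixed-size items must fit into the item area.
item_count, remainder = divmod(item_area_size, item_size)

# Bytes left after the last field are padding.
last_name, last_offset, last_width = fields[-1]
padding = item_size - (last_offset + last_width)

def item_offset(index):
    """Byte position in the file of the item with the given index (assumed layout)."""
    return item_area_start + index * item_size

print(item_area_size, item_count, remainder, padding, item_offset(2))
```

Under these assumptions the item area holds 7 items of 24 bytes each, and every item ends with 4 bytes of padding after the 4-byte Volume field.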
Using NuGet Package Manager: Add the package "TeaFiles.Net" to your project.
Alternatively, download the source code from GitHub and reference it from your project.
TeaFiles C++ is a C++ library that compiles under Windows and Linux, using MSVC or g++.
struct Tick
{
    teatime::Time Time;
    double Price;
    int Volume;
};

// create file and write ticks
{
    auto tf = TeaFile<Tick>::Create(filename);
    tf->Write(GetRandomTick());
}

// read the file memory mapped
auto tf = TeaFile<Tick>::OpenRead(filename);
auto items = tf->OpenReadableMapping();
for (Tick *t = items->begin(); t != items->end(); ++t)
{
    cout << t->Price << endl;
    // ...
}
This looks quite similar to the .Net version. The read code now uses memory mapping, which is considerably faster than normal file reading. One difference, however, is not yet visible here: the file holds only a rudimentary description of the item; in particular, the layout of the item is not included in the header of the TeaFile. This makes it impossible to read the file untyped or even to inspect its content. In other words, the file we wrote is not really self-describing (only a little information is included: the name of the item type, "Tick", and its size). We will improve this:
In order to make our file self describing, we give our API code more knowledge about the type. Since C++ still lacks serious reflection capabilities, we help out a bit as follows:
template<>
struct Description<Tick> : public DefaultDescription<Tick>
{
    Description()
    {
        // register each field with its name so the layout can be written to the header
        this->AddField(&Tick::Time, "Time");
        this->AddField(&Tick::Price, "Price");
        this->AddField(&Tick::Volume, "Volume");
    }
};
This allows the C++ API to analyze the current struct and do the same checks as the .Net API when reading a TeaFile. (This Description class could easily be created by tools or even the C Preprocessor.)
github.com/discretelogics/TeaFiles.Cpp
Documentation for the C++ API is not available yet. The source code is readable, and examples are included in the source.
The Python API provides access to TeaFiles from a scripting environment that is available on almost every platform.
>>> tf = TeaFile.create("acme.tea", "Time Price Volume", "qdq", "ACME at NYSE", {"decimals": 2, "url": "www.acme.com" })
>>> tf.write(DateTime(2011, 3, 4, 9, 0), 45.11, 4500)
>>> tf.write(DateTime(2011, 3, 4, 10, 0), 46.33, 1100)
>>> tf.close()
>>> tf = TeaFile.openread("acme.tea")
>>> tf.read()
TPV(Time=2011-03-04 09:00:00:000, Price=45.11, Volume=4500)
>>> tf.read()
TPV(Time=2011-03-04 10:00:00:000, Price=46.33, Volume=1100)
>>> tf.read()
>>> tf.close()
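The field format string "qdq" passed to create() can be read with Python's own struct module, assuming it follows that module's format characters (q for a 64-bit integer, d for a 64-bit float). The sketch below does not use the teafiles package; it only illustrates, under that assumption and with little-endian packing chosen for the example, how such a string determines field offsets and item size.

```python
import struct

field_names = "Time Price Volume".split()
field_format = "qdq"  # q: 64-bit integer, d: 64-bit float

# Walk the format string to derive each field's byte offset.
# "<" selects little-endian packing without padding (an assumption here).
offsets = {}
position = 0
for name, code in zip(field_names, field_format):
    offsets[name] = position
    position += struct.calcsize("<" + code)
item_size = position

# Pack one item into its raw binary form, then unpack it again.
item = struct.Struct("<" + field_format)
raw = item.pack(1299229200000, 45.11, 4500)
print(offsets, item_size, len(raw))
print(item.unpack(raw))
```

The derived offsets (0, 8 and 16) and the item size of 24 bytes agree with the Offset and Size values that printsnapshot reports for this file layout.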
The printsnapshot function is also quite useful for a quick examination of file contents:
>>> TeaFile.printsnapshot('acme.tea')
TeaFile('acme.tea') 53 items
ItemDescription
Name: TPV
Size: 24
Fields:
[Time Type: Int64 Offset: 0 IsTime:1 IsEventTime:1,
 Price Type: Double Offset: 8 IsTime:0 IsEventTime:0,
 Volume Type: Int64 Offset:16 IsTime:0 IsEventTime:0]
ContentDescription
acme prices
NameValues
{u'decimals': 2, u'exchange': u'nyse'}
TimeScale
Epoch: 719162
Ticks per Day: 86400000
Wellknown Scale: Java
Items
TPV(Time=2000-01-01 00:00:00:000, Price=37.579128977028674, Volume=8047)
TPV(Time=2000-01-08 00:00:00:000, Price=10.618929589509186, Volume=232)
TPV(Time=2000-01-15 00:00:00:000, Price=73.08506970525428, Volume=1711)
TPV(Time=2000-01-22 00:00:00:000, Price=73.7749103916519, Volume=4397)
TPV(Time=2000-01-29 00:00:00:000, Price=10.323610234110403, Volume=3376)
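The TimeScale values in the snapshot can be turned into calendar dates with a little arithmetic. The sketch below assumes that Epoch counts the days between 0001-01-01 and the origin of the scale, and that a stored time is a tick count since that origin; with 719162 days and 86400000 ticks per day this yields milliseconds since 1970-01-01, which matches the "Java" scale named in the output. The function names are ours and not part of the teafiles package.

```python
from datetime import datetime, timedelta

# Values copied from the TimeScale section of the snapshot above.
EPOCH_DAYS = 719162        # assumed: days from 0001-01-01 to the scale's origin
TICKS_PER_DAY = 86400000   # one tick per millisecond

MICROSECONDS_PER_DAY = 86_400_000_000
ORIGIN = datetime(1, 1, 1) + timedelta(days=EPOCH_DAYS)

def ticks_to_datetime(ticks):
    """Convert a raw tick count into a naive datetime, using integer arithmetic."""
    days, rest = divmod(ticks, TICKS_PER_DAY)
    microseconds = rest * MICROSECONDS_PER_DAY // TICKS_PER_DAY
    return ORIGIN + timedelta(days=days, microseconds=microseconds)

def datetime_to_ticks(moment):
    """Convert a naive datetime into a raw tick count on the same scale."""
    delta = moment - ORIGIN
    within_day = delta.seconds * 1_000_000 + delta.microseconds
    return delta.days * TICKS_PER_DAY + within_day * TICKS_PER_DAY // MICROSECONDS_PER_DAY

ticks = datetime_to_ticks(datetime(2011, 3, 4, 9, 0))
print(ORIGIN, ticks, ticks_to_datetime(ticks))
```

Keeping the epoch and the tick resolution as data rather than hard-coding them lets the same two functions serve other time scales that differ only in these two values.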
$ pip install teafiles
The package is at pypi.python.org/pypi/teafiles
bitbucket.org/discretelogics/teafiles.py
Full documentation is at discretelogics.com/doc/teafiles.py
The TeaFile APIs available at these code repositories are licensed under the GNU General Public License v3. In addition to the terms of this license, use and distribution of this code shall be attributed to discretelogics, referencing "discretelogics.com".
Details about GPLv3: www.gnu.org/copyleft/gpl.html.
APIs governed by this license:
If your usage is not covered by GPLv3, please contact us to negotiate license conditions.