teafile Module

class teafiles.teafile.TeaFile(filename)

Create, write and read or just inspect a file in the TeaFile format.

1. Create and write a teafile holding Time/Price/Volume items.

>>> tf = TeaFile.create("acme.tea", "Time Price Volume", "qdq", "prices of acme at NYSE", {"decimals": 2, "url": "www.acme.com" })
>>> tf.write(DateTime(2011, 3, 4,  9, 0), 45.11, 4500)
>>> tf.write(DateTime(2011, 3, 4, 10, 0), 46.33, 1100)
>>> tf.close()

Note: Arguments of tf.write show up in intellisense with their names “Time”, “Price” and “Volume”.

2. Read a teafile. TeaFiles are self-describing, so a filename is sufficient: even if we have no clue what is inside the file, we can open it and read its items.

>>> tf = TeaFile.openread("acme.tea")
>>> tf.read()
TPV(Time=2011-03-04 09:00:00:000, Price=45.11, Volume=4500)
>>> tf.read()
TPV(Time=2011-03-04 10:00:00:000, Price=46.33, Volume=1100)
>>> tf.read()
>>> tf.close()

Since the item structure is described in the file, we can always open the data items in the file. We can even do so on many platforms and with many applications, like from R on Linux, Mac OS or Windows, or using proprietary C++ or C# code.

3. Describe: See the description property about accessing the values passed to create. As a teaser, let's access the content description and namevalues collection for the file above:

>>> tf.description.contentdescription
u'prices of acme at NYSE'
>>> tf.description.namevalues
{u'url': u'www.acme.com', u'decimals': 2}
static create(filename, fieldnames, fieldformat=None, contentdescription=None, namevalues=None)

Creates a new file and writes its header based on the description passed. Leaves the file open, so that items can be added immediately. The caller must close the file when done.

args:
  • filename: The filename, which will internally be passed to io.open, so the same rules apply.
  • fieldnames: The names of the fields, passed either as a string that separates the field names by whitespace or as a list of strings.
  • fieldformat: A string composed of the format character for each field. The format characters are those used by the struct module. Example: “qdd” means that items stored in the file have 3 fields; the first is of type int64, the second and third are double values. If omitted, all fields are considered to be of type int64.
  • contentdescription: A teafile can store one contentdescription, a string that describes what the content of the file is about. Examples: “Weather NYC”, “Network load”, “ACME stock”. Applications can use this string as the “title” of the time series, for instance in a chart.
  • namevalues: A collection of name-value pairs used to store descriptions about the file, often additional properties like the “data provider”, “feed”, “feed id” or “ticker”. By convention, the name “decimals” is used to store an integer describing how many decimals should be used to format floating point values. This API, for instance, makes use of this convention. Besides formatting, an application might also treat this number as the accuracy of floating point values.
>>> from teafiles import *
>>> tf = TeaFile.create("lab.tea", "Time Temperature Humidity", "qdd") # create a file with 3 fields of types int64, double, double
>>> tf.write(DateTime(2011, 3, 1), 44.2, 33.7)
>>> tf.write(DateTime(2011, 3, 2), 45.1, 31.8)
>>> tf.close()
>>> tf.itemcount
2L

Note that itemcount is still accessible, even after the file is closed.
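To make the role of fieldformat more concrete, here is a minimal sketch that uses only the standard struct module, not the teafiles package. It assumes little-endian fields without padding, and it mimics the item names seen in the examples above (TPV, TTH, AB) by joining the initials of the field names. Both points are assumptions of this sketch, not statements about the library's internals.

```python
import struct
from collections import namedtuple

fieldnames = "Time Temperature Humidity".split()
fieldformat = "qdd"  # int64, double, double

# Assumption: "<" selects little-endian standard sizes without padding.
layout = struct.Struct("<" + fieldformat)

# Assumption: item names are built from the field initials, as in TTH above.
Item = namedtuple("".join(name[0] for name in fieldnames), fieldnames)

# 1298937600000 is 2011-03-01 00:00:00 in milliseconds since 1970-01-01.
raw = layout.pack(1298937600000, 44.2, 33.7)
item = Item(*layout.unpack(raw))

print(layout.size)  # 24 bytes per item: 8 + 8 + 8
print(item)
```

Each item occupies a fixed number of bytes determined by the format string, which is what allows items to be located by index.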

static openread(filename)

Open a TeaFile for read only.

>>> from teafiles import *
>>> with TeaFile.create("lab.tea", "Time Temperature Humidity", "qdd") as tf:
...     tf.write(DateTime(2011, 3, 1), 44.2, 33.7)
...     tf.write(DateTime(2011, 3, 2), 45.1, 31.8)
...
>>> from pprint import pprint
>>> with TeaFile.openread("lab.tea") as tf:
...         pprint(list(tf.items()))
...
[TTH(Time=2011-03-01 00:00:00:000, Temperature=44.2, Humidity=33.7),
 TTH(Time=2011-03-02 00:00:00:000, Temperature=45.1, Humidity=31.8)]

An instance opened this way is not writable; it does not have a write method at all:

>>> tf.write
Traceback (most recent call last):
  ...
AttributeError: TeaFile instance has no attribute 'write'
static openwrite(filename)

Open a TeaFile for read and write.

The file returned will have its filepointer set to the end of the file, as this function calls seekend() before returning the TeaFile instance.

>>> with TeaFile.create('lab.tea', 'A B') as tf:
...     for i in range(3):
...         tf.write(i, 10*i)
...
>>> TeaFile.printitems("lab.tea")
[AB(A=0, B=0), AB(A=1, B=10), AB(A=2, B=20)]
>>>
>>> with TeaFile.openwrite('lab.tea') as tf:  # open the file to add more items
...     tf.write(7, 77)
...
>>> TeaFile.printitems("lab.tea")
[AB(A=0, B=0), AB(A=1, B=10), AB(A=2, B=20), AB(A=7, B=77)]
read()

Read the next item at the position of the file pointer. If no more items exist, None is returned.

>>> with TeaFile.create('lab.tea', 'A B') as tf:
...     for i in range(3):
...         tf.write(i, 10*i)
...
>>> tf = TeaFile.openread('lab.tea')
>>> tf.read()
AB(A=0, B=0)
>>> tf.read()
AB(A=1, B=10)
>>> tf.read()
AB(A=2, B=20)
>>> tf.read()
>>>
_write(*itemValues)

Internal item write method accepting a value for each field.

A typed write method will be created inside the create and openwrite methods, available as write(field1, field2, ....).

>>> tf = TeaFile.create("acme.tea", "Time Price Volume", "qdq", "prices of acme at NYSE", {"decimals": 2, "url": "www.acme.com" })
>>> tf.write(DateTime(2011, 3, 4,  9, 0), 45.11, 4500)
>>> tf.write(DateTime(2011, 3, 4, 10, 0), 46.33, 1100)
>>> tf.close()

Note: Arguments of tf.write show up in intellisense with their names “Time”, “Price” and “Volume”. However, this usually works only in interactive shells, not in script editors, since editors do not instantiate the class.

flush()

Flush buffered bytes to disk.

When items are written via write, they do not land directly in the file, but are buffered in memory. flush persists them on disk. Since the number of items in a TeaFile is computed from the size of the file, the itemcount property is accurate only after items have been flushed.

>>> with TeaFile.create('lab.tea', 'A') as tf:
...     for i in range(3):
...         tf.write(i)
...
>>> tf = TeaFile.openwrite('lab.tea')
>>> tf.itemcount
3L
>>> tf.write(71)
>>> tf.itemcount
3L
>>> tf.flush()
>>> tf.itemcount
4L
>>> tf.close()
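
The relation between buffering and itemcount can be reproduced with plain file objects. The following sketch does not use the teafiles package; the 32-byte header and the helper count_items are hypothetical stand-ins chosen for illustration only.

```python
import os
import struct
import tempfile

HEADER_SIZE = 32            # hypothetical header size, for illustration only
ITEM = struct.Struct("<q")  # one int64 field per item

def count_items(path):
    """Derive the item count from the file size, as described above."""
    return (os.path.getsize(path) - HEADER_SIZE) // ITEM.size

path = os.path.join(tempfile.mkdtemp(), "sketch.bin")
with open(path, "wb") as f:
    f.write(b"\0" * HEADER_SIZE)  # placeholder header
    for i in range(3):
        f.write(ITEM.pack(i))

f = open(path, "ab")              # buffered append, like adding more items
before_write = count_items(path)
f.write(ITEM.pack(71))            # stays in the in-memory buffer
before_flush = count_items(path)
f.flush()                         # now the bytes reach the file
after_flush = count_items(path)
f.close()

print(before_write, before_flush, after_flush)
```

The count changes only after flush, mirroring the itemcount values in the example above.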
seekitem(itemindex)

Sets the file pointer to the item at index itemindex.

>>> with TeaFile.create("lab.tea", "A") as tf:
...     for i in range(20):
...         tf.write(i)
...
>>> tf = TeaFile.openread('lab.tea')
>>> tf.read()
A(A=0)
>>> tf.read()
A(A=1)
>>> tf.read()
A(A=2)
>>> tf.seekitem(7)
>>> tf.read()
A(A=7)
>>> tf.seekitem(2)
>>> tf.read()
A(A=2)
>>> tf.close()
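
Because all items have the same size, seeking to an item reduces to simple offset arithmetic. This sketch illustrates the idea on an in-memory stream; ITEM_AREA_START and both helper functions are hypothetical and are not taken from the library's implementation.

```python
import io
import struct

ITEM = struct.Struct("<q")  # one int64 field per item
ITEM_AREA_START = 32        # hypothetical offset of the first item

# Build a stream holding a placeholder header followed by 20 items.
stream = io.BytesIO(
    b"\0" * ITEM_AREA_START + b"".join(ITEM.pack(i) for i in range(20))
)

def seekitem_sketch(stream, itemindex):
    """Position the stream at the first byte of the item at itemindex."""
    stream.seek(ITEM_AREA_START + itemindex * ITEM.size)

def read_sketch(stream):
    """Read one item, or return None if no complete item is left."""
    raw = stream.read(ITEM.size)
    if len(raw) < ITEM.size:
        return None
    return ITEM.unpack(raw)[0]

seekitem_sketch(stream, 7)
print(read_sketch(stream))  # 7
seekitem_sketch(stream, 2)
print(read_sketch(stream))  # 2
```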
seekend()

Sets the file pointer past the last item.

>>> with TeaFile.create('lab.tea', 'A') as tf:
...     for i in range(10):
...         tf.write(i)
...
>>> tf = TeaFile.openread('lab.tea')
>>> tf.read()
A(A=0)
>>> tf.seekend()
>>> tf.read()
>>> # nothing returned, we are at the end of file
>>> tf.close()
items(start=0, end=None)

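Returns an iterator over the items in the file. start and end can optionally be passed as item indexes to restrict the range of the iterator. Calling this method will modify the filepointer.

In the examples below, items(2, 4) yields four items starting at index 2, so the second argument acts as a count rather than as an exclusive end index.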

>>> with TeaFile.create('lab.tea', 'A') as tf:
...     for i in range(10):
...         tf.write(i)
...
>>> tf = TeaFile.openread('lab.tea')
>>> tf.items()
<generator object items at 0x...>
>>> list(tf.items())
[A(A=0), A(A=1), A(A=2), A(A=3), A(A=4), A(A=5), A(A=6), A(A=7), A(A=8), A(A=9)]
>>> list(tf.items(2, 4))
[A(A=2), A(A=3), A(A=4), A(A=5)]
>>> list(tf.items(1, 5))
[A(A=1), A(A=2), A(A=3), A(A=4), A(A=5)]
>>>
itemcount

The number of items in the file.

close()

Closes the file.

TeaFile implements the context manager protocol and using this protocol is preferred, so manually closing the file should be required primarily in interactive mode.

description

Returns the description of the file.

TeaFiles describe the structure of their items and annotations about their content in their header. This property returns that description, which in turn (optionally) holds:
  • the itemdescription, describing field names, types and offsets
  • a contentdescription, describing the content
  • a namevalue collection holding name-value pairs
  • a timescale, describing how times stored as numbers shall be interpreted.

Example:

tf = TeaFile.create('lab.tea', 'Time Price Volume', 'qdq', 'ACME stock', {'exchange': 'nyse', 'decimals': 2, 'feed': 'bluum'})
tf.description
# returns:

ItemDescription
Name:       TPV
Size:       24
Fields:
[Time         Type:  Int64   Offset: 0   IsTime:0   IsEventTime:0,
 Price        Type: Double   Offset: 8   IsTime:0   IsEventTime:0,
 Volume       Type:  Int64   Offset:16   IsTime:0   IsEventTime:0]

ContentDescription
ACME stock

NameValues
{'feed': 'bluum', 'decimals': 2, 'exchange': 'nyse'}

TimeScale
Epoch:           719162
Ticks per Day: 86400000
Wellknown Scale:   Java

Note that the description object remains valid even after the file is closed.

getvaluestring(field, item)

Returns the string representation of a field's value in an item, considering the number of decimals if available.

>>> tf = TeaFile.create('lab.tea', 'Time Price', 'qd', 'ACME stock', {'exchange': 'nyse', 'decimals': 2})
>>> tf.write(DateTime(2010, 2, 3), 44.444444)
>>> tf.write(DateTime(2010, 2, 3), 44.333333)
>>> tf.close()
>>> tf = TeaFile.openread('lab.tea')
>>> item = tf.read()
>>> item                                    # decimals=2 is not considered
TP(Time=2010-02-03 00:00:00:000, Price=44.444444)
>>> pricefield = tf.description.itemdescription.fields[1]
>>> pricefield
Price        Type: Double   Offset: 8   IsTime:0   IsEventTime:0
>>> tf.getvaluestring(pricefield, item)     # decimals=2 is considered
44.44
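
The effect of the “decimals” convention can be sketched in a few lines. The helper valuestring_sketch below is hypothetical; it shows one plausible way to apply the convention and is not the library's implementation.

```python
def valuestring_sketch(value, decimals=None):
    """Format floats with a fixed number of decimals; leave other values as is."""
    if decimals is not None and isinstance(value, float):
        return "{:.{}f}".format(value, decimals)
    return str(value)

namevalues = {"exchange": "nyse", "decimals": 2}
decimals = namevalues.get("decimals")  # None if the file has no such entry

print(valuestring_sketch(44.444444, decimals))  # 44.44
print(valuestring_sketch(4500, decimals))       # 4500
```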
static printitems(filename, maxnumberofitems=10)

Prints all items in the file. By default at most 10 items are printed.

>>> with TeaFile.create("lab.tea", "A B") as tf:
...     for i in range(40):
...         tf.write(i, 10 * i)
...
>>> TeaFile.printitems("lab.tea")
[AB(A=0, B=0), AB(A=1, B=10), AB(A=2, B=20), AB(A=3, B=30), AB(A=4, B=40), AB(A=5, B=50), AB(A=6, B=60), AB(A=7, B=70), AB(A=8, B=80), AB(A=9, B=90)]
10 of 40 items
>>>
>>> TeaFile.printitems("lab.tea", 5)
[AB(A=0, B=0), AB(A=1, B=10), AB(A=2, B=20), AB(A=3, B=30), AB(A=4, B=40)]
5 of 40 items
>>>
static printsnapshot(filename)

Prints a snapshot of an existing file, that is its complete description and the first 5 items.

Example output:

>>> TeaFile.printsnapshot('lab.tea')
TeaFile('lab.tea') 40 items

ItemDescription
Name:       AB
Size:       16
Fields:
[A            Type:  Int64   Offset: 0   IsTime:0   IsEventTime:0,
 B            Type:  Int64   Offset: 8   IsTime:0   IsEventTime:0]

ContentDescription
None

NameValues
None

TimeScale
Epoch:           719162
Ticks per Day: 86400000
Wellknown Scale:   Java

Items
AB(A=0, B=0)
AB(A=1, B=10)
AB(A=2, B=20)
AB(A=3, B=30)
AB(A=4, B=40)
class teafiles.teafile.TimeScale(epoch, ticksperday)

The TeaFile format is time format agnostic. Times in such a file can be integral or float values counting, for example, seconds or milliseconds from an epoch like 0001-01-01 or 1970-01-01. The epoch together with the tick size defines the time scale modeled by this class. These values are stored in the file.

In order to support many platforms, the epoch value of 1970-01-01 and a tick size of Milliseconds is recommended. Moreover, APIs for TeaFiles should primarily support this time scale before any other, to allow exchange between applications and operating systems. In this spirit, the clockwise module in this package uses this 1970 / millisecond time scale.

static Java()

Returns a TimeScale instance with the epoch 1970-01-01 and millisecond resolution. This time scale is that used by Java, so we call this the Java TimeScale.

wellknownname

Returns ‘Java’ if epoch == 719162 (1970-01-01) and ticksperday == 86400 * 1000. Returns ‘Net’ if epoch == 0 (0001-01-01) and ticksperday == 86400 * 1000 * 1000 * 10. Returns None otherwise.
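
The rules above can be written down directly. In this sketch the epoch is taken as the number of days since 0001-01-01, which is consistent with 719162 denoting 1970-01-01. The function names are hypothetical, and ticks_to_datetime is an illustrative helper that is not part of the documented API.

```python
from datetime import datetime, timedelta

def wellknownname_sketch(epoch, ticksperday):
    """Map (epoch, ticksperday) to 'Java', 'Net' or None, as described above."""
    if epoch == 719162 and ticksperday == 86400 * 1000:
        return "Java"
    if epoch == 0 and ticksperday == 86400 * 1000 * 1000 * 10:
        return "Net"
    return None

def ticks_to_datetime(ticks, epoch, ticksperday):
    """Interpret an integer tick count on the given time scale."""
    days, rest = divmod(ticks, ticksperday)
    microseconds = rest * 86400 * 1000000 // ticksperday
    return datetime(1, 1, 1) + timedelta(days=epoch + days, microseconds=microseconds)

print(wellknownname_sketch(719162, 86400000))
print(ticks_to_datetime(0, 719162, 86400000))  # 1970-01-01 00:00:00
```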

class teafiles.teafile.TeaFileDescription
Holds the description of a time series. Its attributes are:
  • itemdescription, describing the item’s fields and layout
  • contentdescription, a simple string describing what the time series is about
  • namevalues, a collection of name-value pairs holding int32, double, text or uuid values
  • timescale, describing the format of times inside the file
class teafiles.teafile.ItemDescription

The item description describes the item type. Each teafile is a homogeneous collection of items and an instance of this class describes the fields of this item, that is:

  • the name of each field
  • the field’s offset inside the item
  • its type.
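
As an illustration of how names, offsets and types fit together, the following sketch derives field offsets from a format string. It assumes that fields are laid out back to back without padding, which matches the offsets 0, 8 and 16 shown for “qdq” in the description example but may differ from the real layout for mixed field sizes. FieldSketch and describe_fields are hypothetical names.

```python
import struct
from collections import namedtuple

FieldSketch = namedtuple("FieldSketch", "name formatchar offset size")

def describe_fields(fieldnames, fieldformat):
    """Return the per-field layout and the total item size (no padding assumed)."""
    fields, offset = [], 0
    for name, char in zip(fieldnames.split(), fieldformat):
        size = struct.calcsize("<" + char)  # "<" means standard sizes
        fields.append(FieldSketch(name, char, offset, size))
        offset += size
    return fields, offset

fields, itemsize = describe_fields("Time Price Volume", "qdq")
for field in fields:
    print(field)
print(itemsize)  # 24
```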
static create(itemname, fieldnames, fieldformat)

Creates an ItemDescription instance to be used for the creation of a new TeaFile.

itemname is the name for the items in the file (e.g. “Tick”).

fieldnames is a list of the field names (e.g. [“Time”, “Price”, “Volume”]). Alternatively, fieldnames is a string that holds the field names separated by whitespace (“Time Price Volume”).

fieldformat specifies the layout of the item as used by struct.pack(fmt, ...). However the following restrictions apply:

1. The repeat operator is not allowed. So while “4h” means the same as “hhhh” for struct.pack/unpack, this method allows only the latter, without a repeat number.
2. Padding bytes (format character ‘x’) are not available.
3. Only these formats are allowed: “b”, “h”, “i”, “q”, “B”, “H”, “I”, “Q”, “f”, “d”.
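
The three restrictions can be checked with a small validator. This is a sketch under the rules listed above; check_fieldformat is a hypothetical helper, and the exceptions the library itself raises may differ.

```python
ALLOWED_FORMATS = "bhiqBHIQfd"

def check_fieldformat(fieldformat):
    """Return the format characters as a list, or raise ValueError."""
    for position, char in enumerate(fieldformat):
        if char.isdigit():
            raise ValueError("repeat counts are not allowed (position %d)" % position)
        if char == "x":
            raise ValueError("padding bytes are not allowed (position %d)" % position)
        if char not in ALLOWED_FORMATS:
            raise ValueError("unsupported format character %r" % char)
    return list(fieldformat)

print(check_fieldformat("qdq"))   # ['q', 'd', 'q']
print(check_fieldformat("hhhh"))  # the spelled-out form of "4h" is accepted
```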
getfieldbyoffset(offset)

Returns a field given its offset

class teafiles.teafile.FieldType

An enumeration of field types and utility functions related to them.

Double = 10
Float = 9
Int16 = 2
Int32 = 3
Int64 = 4
Int8 = 1
UInt16 = 6
UInt32 = 7
UInt64 = 8
UInt8 = 5
static getformatcharacter(fieldtype)

Gets the formatting character of a field type, as used by the struct module.

static getfromformatcharacter(c)

Gets the field type given its formatting character, as used by the struct module.

static getmagicvalue(fieldtype)

Given a fieldtype, gets a magic value. This is used for analyzing the item layout.

static getname(fieldtype)

Gets the string representation of fieldtype.

static getsize(fieldtype)

Gets the size of a field type.
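
The utility functions can be pictured as lookups in a small table. The sketch below pairs the enumeration values listed above with the struct format characters in their conventional meaning. That pairing, the helper names and the use of struct.calcsize for sizes are assumptions of this sketch rather than a copy of the library's code.

```python
import struct

# (value, name, format character); values and names as listed above.
_FIELD_TYPES = [
    (1, "Int8", "b"), (2, "Int16", "h"), (3, "Int32", "i"), (4, "Int64", "q"),
    (5, "UInt8", "B"), (6, "UInt16", "H"), (7, "UInt32", "I"), (8, "UInt64", "Q"),
    (9, "Float", "f"), (10, "Double", "d"),
]
_BY_VALUE = {value: (name, char) for value, name, char in _FIELD_TYPES}
_BY_CHAR = {char: value for value, _, char in _FIELD_TYPES}

def formatcharacter_sketch(fieldtype):
    return _BY_VALUE[fieldtype][1]

def fromformatcharacter_sketch(char):
    return _BY_CHAR[char]

def name_sketch(fieldtype):
    return _BY_VALUE[fieldtype][0]

def size_sketch(fieldtype):
    # Standard struct sizes in bytes, independent of the platform.
    return struct.calcsize("<" + formatcharacter_sketch(fieldtype))

print(name_sketch(fromformatcharacter_sketch("q")))  # Int64
print(size_sketch(10))                               # 8
```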

class teafiles.teafile.Field

Describes a field inside an item.

Attributes are:
  • name
  • offset
  • istime
  • iseventtime
  • index
  • formatchar
getvalue(item)

Given a field and an item, returns the value of this field.

If the field is a time field, the value is packed into a Time, unless configured otherwise by setting use_time_decoration to False.
