Create, write and read or just inspect a file in the TeaFile format.
1. Create and write a teafile holding Time/Price/Volume items.
>>> tf = TeaFile.create("acme.tea", "Time Price Volume", "qdq", "prices of acme at NYSE", {"decimals": 2, "url": "www.acme.com" })
>>> tf.write(DateTime(2011, 3, 4, 9, 0), 45.11, 4500)
>>> tf.write(DateTime(2011, 3, 4, 10, 0), 46.33, 1100)
>>> tf.close()
Note: Arguments of tf.write show up in intellisense with their names “Time”, “Price” and “Volume”.
2. Read a teafile. TeaFiles are self-describing, so a filename is sufficient to open one, even when we have no clue what is inside the file.
>>> tf = TeaFile.openread("acme.tea")
>>> tf.read()
TPV(Time=2011-03-04 09:00:00:000, Price=45.11, Volume=4500)
>>> tf.read()
TPV(Time=2011-03-04 10:00:00:000, Price=46.33, Volume=1100)
>>> tf.read()
>>> tf.close()
Since the item structure is described in the file, we can always open the data items in the file. We can even do so on many platforms and with many applications, like from R on Linux, Mac OS or Windows, or using proprietary C++ or C# code.
3. Describe: see the description property for accessing the values passed to create. As a teaser, let's access the content description and the namevalues collection of the file above:
>>> tf.description.contentdescription
u'prices of acme at NYSE'
>>> tf.description.namevalues
{u'url': u'www.acme.com', u'decimals': 2}
Creates a new file and writes its header based on the description passed. The file is left open so that items can be added immediately; the caller must close it when done.
>>> from teafiles import *
>>> tf = TeaFile.create("lab.tea", "Time Temperature Humidity", "qdd") # create a file with 3 fields of types int64, double, double
>>> tf.write(DateTime(2011, 3, 1), 44.2, 33.7)
>>> tf.write(DateTime(2011, 3, 2), 45.1, 31.8)
>>> tf.close()
>>> tf.itemcount
2L
Note that itemcount is still accessible, even after the file is closed.
Open a TeaFile for read only.
>>> from teafiles import *
>>> with TeaFile.create("lab.tea", "Time Temperature Humidity", "qdd") as tf:
...     tf.write(DateTime(2011, 3, 1), 44.2, 33.7)
...     tf.write(DateTime(2011, 3, 2), 45.1, 31.8)
...
>>> from pprint import pprint
>>> with TeaFile.openread("lab.tea") as tf:
...     pprint(list(tf.items()))
...
[TTH(Time=2011-03-01 00:00:00:000, Temperature=44.2, Humidity=33.7),
TTH(Time=2011-03-02 00:00:00:000, Temperature=45.1, Humidity=31.8)]
The instance demonstrates that it is not writable by not having a write method at all:
>>> tf.write
Traceback (most recent call last):
...
AttributeError: TeaFile instance has no attribute 'write'
Open a TeaFile for read and write.
The file returned will have its filepointer set to the end of the file, as this function calls seekend() before returning the TeaFile instance.
>>> with TeaFile.create('lab.tea', 'A B') as tf:
...     for i in range(3):
...         tf.write(i, 10*i)
...
>>> TeaFile.printitems("lab.tea")
[AB(A=0, B=0), AB(A=1, B=10), AB(A=2, B=20)]
>>>
>>> with TeaFile.openwrite('lab.tea') as tf: # open the file to add more items
...     tf.write(7, 77)
...
>>> TeaFile.printitems("lab.tea")
[AB(A=0, B=0), AB(A=1, B=10), AB(A=2, B=20), AB(A=7, B=77)]
Reads the next item at the position of the file pointer. If no more items exist, None is returned.
>>> with TeaFile.create('lab.tea', 'A B') as tf:
...     for i in range(3):
...         tf.write(i, 10*i)
...
>>> tf = TeaFile.openread('lab.tea')
>>> tf.read()
AB(A=0, B=0)
>>> tf.read()
AB(A=1, B=10)
>>> tf.read()
AB(A=2, B=20)
>>> tf.read()
>>>
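The "return None at the end" behaviour of read can be illustrated without the library. The following is only a minimal sketch using the standard struct module; the names ITEM, AB and read_item are hypothetical and the two-int64 layout is an assumption, not the library's actual implementation.

```python
import io
import struct
from collections import namedtuple

# Hypothetical item layout: two int64 fields, packed without padding.
ITEM = struct.Struct("<qq")
AB = namedtuple("AB", "A B")


def read_item(stream):
    """Return the next item from stream, or None if no complete item is left."""
    raw = stream.read(ITEM.size)
    if len(raw) < ITEM.size:
        return None
    return AB(*ITEM.unpack(raw))


# An in-memory stream standing in for the item area of a file.
stream = io.BytesIO(b"".join(ITEM.pack(i, 10 * i) for i in range(3)))

first = read_item(stream)
rest = [read_item(stream), read_item(stream)]
after_end = read_item(stream)  # the stream is exhausted here
```

Returning None instead of raising keeps a simple "read until None" loop possible, which mirrors the interactive session above.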
Internal item write method accepting a value for each field.
A typed write method is created inside the create and openwrite methods and is available as write(field1, field2, ...).
>>> tf = TeaFile.create("acme.tea", "Time Price Volume", "qdq", "prices of acme at NYSE", {"decimals": 2, "url": "www.acme.com" })
>>> tf.write(DateTime(2011, 3, 4, 9, 0), 45.11, 4500)
>>> tf.write(DateTime(2011, 3, 4, 10, 0), 46.33, 1100)
>>> tf.close()
Note: Arguments of tf.write show up in intellisense with their names “Time”, “Price” and “Volume”. This, however, usually works only in interactive shells, not in script editors, since the latter do not instantiate the class.
Flush buffered bytes to disk.
When items are written via write, they do not land directly in the file but are buffered in memory; flush persists them to disk. Since the number of items in a TeaFile is computed from the size of the file, the itemcount property is accurate only after items have been flushed.
>>> with TeaFile.create('lab.tea', 'A') as tf:
...     for i in range(3):
...         tf.write(i)
...
>>> tf = TeaFile.openwrite('lab.tea')
>>> tf.itemcount
3L
>>> tf.write(71)
>>> tf.itemcount
3L
>>> tf.flush()
>>> tf.itemcount
4L
>>> tf.close()
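Why the count lags behind until flush is called can be reproduced with an ordinary buffered file. This is a rough sketch, not the library's code: ITEM, HEADER_SIZE and item_count are hypothetical names, the file has no header at all, and it relies on Python's default write buffer being larger than one 8-byte item.

```python
import os
import struct
import tempfile

ITEM = struct.Struct("<q")  # hypothetical layout: one int64 field per item
HEADER_SIZE = 0             # assumption: this sketch writes no header


def item_count(path):
    """Derive the item count purely from the size of the file on disk."""
    return (os.path.getsize(path) - HEADER_SIZE) // ITEM.size


path = os.path.join(tempfile.mkdtemp(), "lab.bin")
with open(path, "wb") as f:
    for i in range(3):
        f.write(ITEM.pack(i))

f = open(path, "ab")                    # buffered by default
count_at_open = item_count(path)
f.write(ITEM.pack(71))                  # stays in the in-memory buffer
count_before_flush = item_count(path)   # the file size has not changed yet
f.flush()                               # hands the buffered bytes to the OS
count_after_flush = item_count(path)
f.close()
```

Deriving the count from the file size avoids storing a counter in the header, at the price of the count being stale while bytes are still buffered.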
Sets the file pointer to the item at index itemindex.
>>> with TeaFile.create("lab.tea", "A") as tf:
...     for i in range(20):
...         tf.write(i)
...
>>> tf = TeaFile.openread('lab.tea')
>>> tf.read()
A(A=0)
>>> tf.read()
A(A=1)
>>> tf.read()
A(A=2)
>>> tf.seekitem(7)
>>> tf.read()
A(A=7)
>>> tf.seekitem(2)
>>> tf.read()
A(A=2)
>>> tf.close()
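Because all items have the same size, seeking to an item reduces to simple offset arithmetic. The sketch below illustrates that idea under stated assumptions: HEADER_SIZE, seek_item and read_value are hypothetical, and the fixed 32-byte header is invented for the example.

```python
import io
import struct

ITEM = struct.Struct("<q")  # hypothetical layout: one int64 field per item
HEADER_SIZE = 32            # hypothetical header length, an assumption of this sketch


def seek_item(stream, item_index):
    """Position stream at the first byte of the item with the given index."""
    stream.seek(HEADER_SIZE + item_index * ITEM.size)


def read_value(stream):
    """Read one item and return its single field value."""
    (value,) = ITEM.unpack(stream.read(ITEM.size))
    return value


# A fake file: 32 header bytes followed by the items 0..19.
stream = io.BytesIO(bytes(HEADER_SIZE) + b"".join(ITEM.pack(i) for i in range(20)))

seek_item(stream, 7)
seventh = read_value(stream)
following = read_value(stream)  # reading advances the pointer to the next item
seek_item(stream, 2)
second = read_value(stream)
```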
Sets the file pointer past the last item.
>>> with TeaFile.create('lab.tea', 'A') as tf:
...     for i in range(10):
...         tf.write(i)
...
>>> tf = TeaFile.openread('lab.tea')
>>> tf.read()
A(A=0)
>>> tf.seekend()
>>> tf.read()
>>> # nothing returned, we are at the end of file
>>> tf.close()
Returns an iterator over the items in the file. Calling this method modifies the file pointer.
Optionally, the range of the iterator can be restricted by passing start and end as item indexes.
>>> with TeaFile.create('lab.tea', 'A') as tf:
...     for i in range(10):
...         tf.write(i)
...
>>> tf = TeaFile.openread('lab.tea')
>>> tf.items()
<generator object items at 0x...>
>>> list(tf.items())
[A(A=0), A(A=1), A(A=2), A(A=3), A(A=4), A(A=5), A(A=6), A(A=7), A(A=8), A(A=9)]
>>> list(tf.items(2, 4))
[A(A=2), A(A=3), A(A=4), A(A=5)]
>>> list(tf.items(1, 5))
[A(A=1), A(A=2), A(A=3), A(A=4), A(A=5)]
>>>
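A generator over fixed-size items, and its side effect on the stream position, might look roughly like the following. This is a sketch only: iter_items is a hypothetical name, and the half-open range [start, end) is an assumption of the sketch that may differ from how the library interprets its end argument.

```python
import io
import struct

ITEM = struct.Struct("<q")  # hypothetical layout: one int64 field per item


def iter_items(stream, start=0, end=None):
    """Yield the values of the items with indexes in [start, end).

    The half-open convention is an assumption of this sketch.
    """
    stream.seek(0, io.SEEK_END)
    total = stream.tell() // ITEM.size
    if end is None or end > total:
        end = total
    stream.seek(start * ITEM.size)  # iterating moves the stream position
    for _ in range(start, end):
        (value,) = ITEM.unpack(stream.read(ITEM.size))
        yield value


stream = io.BytesIO(b"".join(ITEM.pack(i) for i in range(10)))

all_items = list(iter_items(stream))
some_items = list(iter_items(stream, 2, 4))
position_after = stream.tell()  # left just behind the last item that was read
```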
The number of items in the file.
Closes the file.
TeaFile implements the context manager protocol, and using it is preferred, so manually closing the file should be required primarily in interactive mode.
Returns the description of the file.
TeaFiles describe the structure of their items and annotations about their content in their header. This property returns this description, which in turn (optionally) holds:
* the itemdescription describing field names, types and offsets
* a contentdescription describing the content
* a namevalue collection holding name-value pairs
* a timescale describing how times stored as numbers shall be interpreted.
Example:
tf = TeaFile.create('lab.tea', 'Time Price Volume', 'qdq', 'ACME stock', {'exchange': 'nyse', 'decimals': 2})
tf.description
# returns:
ItemDescription
Name: TPV
Size: 24
Fields:
[Time Type: Int64 Offset: 0 IsTime:0 IsEventTime:0,
Price Type: Double Offset: 8 IsTime:0 IsEventTime:0,
Volume Type: Int64 Offset:16 IsTime:0 IsEventTime:0]
ContentDescription
ACME stock
NameValues
{'decimals': 2, 'exchange': 'nyse'}
TimeScale
Epoch: 719162
Ticks per Day: 86400000
Wellknown Scale: Java
Note that the description object remains valid even after the file is closed.
Returns the string representation of an item, considering the number of decimals if available.
>>> tf = TeaFile.create('lab.tea', 'Time Price', 'qd', 'ACME stock', {'exchange': 'nyse', 'decimals': 2})
>>> tf.write(DateTime(2010, 2, 3), 44.444444)
>>> tf.write(DateTime(2010, 2, 3), 44.333333)
>>> tf.close()
>>> tf = TeaFile.openread('lab.tea')
>>> item = tf.read()
>>> item # decimals=2 is not considered
TP(Time=2010-02-03 00:00:00:000, Price=44.444444)
>>> pricefield = tf.description.itemdescription.fields[1]
>>> pricefield
Price Type: Double Offset: 8 IsTime:0 IsEventTime:0
>>> tf.getvaluestring(pricefield, item) # decimals=2 is considered
44.44
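The effect of a decimals entry on formatting can be sketched in a few lines. value_string is a hypothetical helper written for illustration; it only assumes that the name-value collection behaves like a dict and does not reproduce the library's getvaluestring.

```python
def value_string(value, namevalues):
    """Format value using the 'decimals' entry of namevalues, if present."""
    decimals = namevalues.get("decimals")
    if decimals is None or not isinstance(value, float):
        return str(value)
    # Nested replacement field: the precision is taken from the second argument.
    return "{:.{}f}".format(value, decimals)


formatted = value_string(44.444444, {"exchange": "nyse", "decimals": 2})
unformatted = value_string(44.444444, {"exchange": "nyse"})
```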
Prints all items in the file. By default at most 10 items are printed.
>>> with TeaFile.create("lab.tea", "A B") as tf:
...     for i in range(40):
...         tf.write(i, 10 * i)
...
>>> TeaFile.printitems("lab.tea")
[AB(A=0, B=0), AB(A=1, B=10), AB(A=2, B=20), AB(A=3, B=30), AB(A=4, B=40), AB(A=5, B=50), AB(A=6, B=60), AB(A=7, B=70), AB(A=8, B=80), AB(A=9, B=90)]
10 of 40 items
>>>
>>> TeaFile.printitems("lab.tea", 5)
[AB(A=0, B=0), AB(A=1, B=10), AB(A=2, B=20), AB(A=3, B=30), AB(A=4, B=40)]
5 of 40 items
>>>
Prints a snapshot of an existing file, that is, its complete description and the first 5 items.
Example output:
>>> TeaFile.printsnapshot('lab.tea')
TeaFile('lab.tea') 40 items
ItemDescription
Name: AB
Size: 16
Fields:
[A Type: Int64 Offset: 0 IsTime:0 IsEventTime:0,
B Type: Int64 Offset: 8 IsTime:0 IsEventTime:0]
ContentDescription
None
NameValues
None
TimeScale
Epoch: 719162
Ticks per Day: 86400000
Wellknown Scale: Java
Items
AB(A=0, B=0)
AB(A=1, B=10)
AB(A=2, B=20)
AB(A=3, B=30)
AB(A=4, B=40)
The TeaFile format is time format agnostic. Times in such a file can be integral or float values counting, for example, seconds or milliseconds from an epoch like 0001-01-01 or 1970-01-01. The epoch together with the tick size defines the time scale modeled by this class. These values are stored in the file.
In order to support many platforms, an epoch of 1970-01-01 and a tick size of milliseconds are recommended. Moreover, APIs for TeaFiles should primarily support this time scale before any other, to allow exchange between applications and operating systems. In this spirit, the clockwise module in this package uses this 1970 / millisecond time scale.
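How an epoch and a tick size together turn a stored number into a point in time can be shown with the standard datetime module. The helpers ticks_to_datetime and datetime_to_ticks are hypothetical names for this sketch; it assumes the epoch is expressed as a day count from 0001-01-01, as the value 719162 for 1970-01-01 suggests.

```python
from datetime import datetime, timedelta

EPOCH_DAYS = 719162            # days from 0001-01-01 to 1970-01-01
TICKS_PER_DAY = 86400 * 1000   # millisecond resolution
MICROS_PER_DAY = 86400 * 1000 * 1000


def ticks_to_datetime(ticks, epoch=EPOCH_DAYS, ticks_per_day=TICKS_PER_DAY):
    """Convert a tick count on the given time scale to a datetime."""
    days, remainder = divmod(ticks, ticks_per_day)
    microseconds = remainder * MICROS_PER_DAY // ticks_per_day
    return datetime(1, 1, 1) + timedelta(days=epoch + days, microseconds=microseconds)


def datetime_to_ticks(moment, epoch=EPOCH_DAYS, ticks_per_day=TICKS_PER_DAY):
    """Convert a datetime to a whole number of ticks on the given time scale."""
    delta = moment - (datetime(1, 1, 1) + timedelta(days=epoch))
    total_microseconds = delta // timedelta(microseconds=1)
    return total_microseconds * ticks_per_day // MICROS_PER_DAY


start_of_scale = ticks_to_datetime(0)
ticks = datetime_to_ticks(datetime(2011, 3, 4, 9, 0))
round_trip = ticks_to_datetime(ticks)
```

Passing a different epoch and ticks_per_day to the same two functions covers other time scales, which is the point of storing both values in the file.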
Returns a TimeScale instance with the epoch 1970-01-01 and millisecond resolution. This time scale is that used by Java, so we call this the Java TimeScale.
Returns ‘Java’ if epoch == 719162 (1970-01-01) and ticksperday == 86400 * 1000. Returns ‘Net’ if epoch == 0 (0001-01-01) and ticksperday == 86400 * 1000 * 1000 * 10. Returns None otherwise.
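The rule above translates directly into a small function. The name wellknown_name is chosen for this sketch and is not necessarily the name of the property in the library.

```python
def wellknown_name(epoch, ticksperday):
    """Return 'Java', 'Net' or None according to the rule described above."""
    if epoch == 719162 and ticksperday == 86400 * 1000:
        return "Java"
    if epoch == 0 and ticksperday == 86400 * 1000 * 1000 * 10:
        return "Net"
    return None


java_scale = wellknown_name(719162, 86400000)
net_scale = wellknown_name(0, 864000000000)
other_scale = wellknown_name(719162, 86400)  # 1970 epoch with second resolution
```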
The item description describes the item type. Each teafile is a homogeneous collection of items, and an instance of this class describes the fields of this item, that is:
* the name of each field
* the field’s offset inside the item
* its type.
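To make the relation between names, types and offsets concrete, here is a minimal sketch that derives the offsets from a struct-style format string. describe_fields is a hypothetical helper, and it assumes fields are packed back to back with standard sizes and no padding.

```python
import struct


def describe_fields(fieldnames, fieldformat):
    """Return ([(name, format character, offset), ...], item size)."""
    names = fieldnames.split()
    if len(names) != len(fieldformat):
        raise ValueError("need exactly one format character per field name")
    fields = []
    offset = 0
    for name, code in zip(names, fieldformat):
        fields.append((name, code, offset))
        # "<" selects standard sizes without alignment padding.
        offset += struct.calcsize("<" + code)
    return fields, offset


fields, item_size = describe_fields("Time Price Volume", "qdq")
```

For the "Time Price Volume" / "qdq" layout this yields three 8-byte fields and an item size of 24 bytes.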
Creates an ItemDescription instance to be used for the creation of a new TeaFile.
itemname is the name for the items in the file (e.g. “Tick”).
fieldnames is a list of the field names (e.g. [“Time”, “Price”, “Volume”]). Alternatively, fieldnames is a string that holds the field names separated by whitespace (“Time Price Volume”).
fieldformat specifies the layout of the item as used by struct.pack(fmt, ...). However, the following restrictions apply:
1. The repeat operator is not allowed. So while “4h” means the same as “hhhh” for struct.pack/unpack, this method allows only the latter, without a repeat number.
2. Padding bytes (format character ‘x’) are not available.
3. Only these formats are allowed: “b”, “h”, “i”, “q”, “B”, “H”, “I”, “Q”, “f”, “d”.
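The three restrictions can be checked before a format string is handed on. The following is a small sketch of such a check; ALLOWED_FORMATS, check_fieldformat and is_valid are hypothetical names, and the library may validate its input differently.

```python
ALLOWED_FORMATS = set("bhiqBHIQfd")


def check_fieldformat(fieldformat):
    """Raise ValueError unless fieldformat obeys the three restrictions."""
    for ch in fieldformat:
        if ch.isdigit():
            raise ValueError("repeat counts are not allowed: %r" % fieldformat)
        if ch == "x":
            raise ValueError("padding bytes are not allowed: %r" % fieldformat)
        if ch not in ALLOWED_FORMATS:
            raise ValueError("unsupported format character: %r" % ch)
    return fieldformat


def is_valid(fieldformat):
    """Return True if fieldformat passes check_fieldformat."""
    try:
        check_fieldformat(fieldformat)
    except ValueError:
        return False
    return True


results = {fmt: is_valid(fmt) for fmt in ("qdq", "hhhh", "4h", "qxd", "qs")}
```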
Returns a field given its offset
An enumeration of field types and utility functions related to them.
Gets the formatting character of a field type, as used by the struct module.
Gets the field type given its formatting character, as used by the struct module.
Given a fieldtype, gets a magic value. This is used for analyzing the item layout.
Gets the string representation of fieldtype.
Gets the size of a field type.
Describes a field inside an item.
Given a field and an item, returns the value of this field.
If the field is a time field, the value is packed into a Time, unless configured otherwise by setting use_time_decoration to False.