We’ve been working for a while on a serialisation scheme that doesn’t duplicate serialised data, even if it’s referenced multiple times on the wire. @parkri can talk about it a bit more, but we’re very close to turning the new scheme on. The current implementation uses generic Kryo serialisation, which isn’t well optimised. I don’t think we’ve done space comparisons yet, but optimising the new format is something that’s going to be a focus soon.
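To make the deduplication idea concrete, here’s a minimal sketch using stock Kryo with reference tracking switched on, so a repeated object is written once and back-referenced afterwards. This is just generic Kryo, not our new scheme, and the `Attachment`/`Tx` classes are made up for illustration:

```kotlin
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.Output
import java.io.ByteArrayOutputStream

// Illustrative classes only - not real Corda types.
class Attachment(val bytes: ByteArray)
class Tx(val a: Attachment, val b: Attachment)

fun main() {
    val kryo = Kryo()
    kryo.setRegistrationRequired(false)
    // With reference tracking on, the second occurrence of `shared` is
    // written as a small back-reference rather than a second full copy.
    kryo.setReferences(true)
    val shared = Attachment(ByteArray(100_000))
    val baos = ByteArrayOutputStream()
    Output(baos).use { kryo.writeClassAndObject(it, Tx(shared, shared)) }
    println("serialised size: ${baos.size()} bytes")  // ~100KB, not ~200KB
}
```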
Although we’re heading towards a Corda 1.0 release quite quickly, this initial release won’t stabilise the wire protocol. Wire stability is the next target and will follow soon after 1.0. We’ll look at tuning the transaction size as part of that. I think there’s going to be a lot of low-hanging fruit - simply gzipping can remove a lot of unnecessary redundancy at minimal CPU cost.
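For instance, here’s a quick way to see what gzip buys us - this is just `java.util.zip` applied to an arbitrary redundant blob, not a measurement on real transactions, which will sit somewhere in between:

```kotlin
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

// Gzip a serialised blob before it goes on the wire.
fun gzip(wireBytes: ByteArray): ByteArray {
    val baos = ByteArrayOutputStream()
    GZIPOutputStream(baos).use { it.write(wireBytes) }
    return baos.toByteArray()
}

fun main() {
    // Highly redundant input compresses extremely well.
    val blob = "composite-key-and-metadata ".repeat(1000).toByteArray()
    println("raw=${blob.size} bytes, gzipped=${gzip(blob).size} bytes")
}
```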
Let’s reason about storage costs in dollar terms. 350KB of data per second works out to about 30 gigabytes of data per day, or about 11 terabytes of transaction data per year. According to this helpful article from Backblaze, that would cost about $480 / year in hard disk cost, if we ignore replication and backup. Let’s say we duplicate data three times for backup and redundancy: that’s about $1500 / year in hard disk costs to keep up with this data rate. Of course there’ll be some indexing overhead and the like, so $1500 is something of a lower bound.
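For anyone who wants to poke at the numbers, here’s the arithmetic as a runnable snippet. The ~$0.044/GB/year figure is my reading of the Backblaze article, so adjust to taste:

```kotlin
// Back-of-the-envelope storage cost. The $0.044/GB/year HDD price is an
// assumption derived from the Backblaze article cited above.
fun main() {
    val bytesPerSecond = 350_000.0                  // 350KB/sec
    val gbPerDay = bytesPerSecond * 86_400 / 1e9    // ~30 GB/day
    val tbPerYear = gbPerDay * 365 / 1_000          // ~11 TB/year
    val rawCost = tbPerYear * 1_000 * 0.044         // ~$480/year, single copy
    val replicated = rawCost * 3                    // ~$1500/year with 3 copies
    println("%.1f GB/day, %.1f TB/year, \$%.0f raw, \$%.0f replicated"
        .format(gbPerDay, tbPerYear, rawCost, replicated))
}
```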
Is this too expensive? I’m not sure - that depends on the context. For an individual hobbyist, yes, probably. For a bank? I suspect it’s not a big cost in that context.
This is before we consider non-HDD storage, like commercial cloud cold-storage services such as Amazon Glacier (reportedly backed by BluRay). Those are supposedly much cheaper than HDDs.
None of this is an excuse not to optimise the wire protocol, of course. There are many simple tricks we can use, like the one Bitcoin uses where keys are represented as short hashes. It’d be easy to resolve an unknown hash to the full composite key on demand with an extra flow round-trip, if we want to eliminate the overhead of including the full composite key in every transaction.
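Here’s a rough sketch of how that trick could look. Everything here is hypothetical - `shortKeyId`, `KeyResolver` and the fetch callback are made-up names for illustration, not real Corda APIs:

```kotlin
import java.security.KeyPairGenerator
import java.security.MessageDigest
import java.security.PublicKey

// Put a short hash of a key on the wire instead of the full key.
fun shortKeyId(key: PublicKey): List<Byte> =
    MessageDigest.getInstance("SHA-256").digest(key.encoded).take(8)

class KeyResolver {
    private val known = mutableMapOf<List<Byte>, PublicKey>()

    fun record(key: PublicKey) { known[shortKeyId(key)] = key }

    // In a node the fallback would be an extra flow round-trip to the
    // counterparty; here it's just a placeholder lambda.
    fun resolve(id: List<Byte>, fetchFromPeer: (List<Byte>) -> PublicKey): PublicKey =
        known.getOrElse(id) { fetchFromPeer(id) }
}

fun main() {
    val key = KeyPairGenerator.getInstance("EC").generateKeyPair().public
    val resolver = KeyResolver()
    resolver.record(key)
    val resolved = resolver.resolve(shortKeyId(key)) { error("would fetch from peer") }
    println("resolved ${resolved.algorithm} key from an ${shortKeyId(key).size}-byte id")
}
```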
I’ll file some tasks in JIRA to study the size of a SignedTransaction and ensure we optimise it before we commit to the wire protocol. Hope that helps!