Modeling a multiple ownership scenario

Hello all!

I have a set of programs that perform profit & loss, and other types of calculations of a broker's F&O trades for a settlement period. These process between 100 TB and 150 TB of data for each type of analysis.

The production versions are written in Go. I have ported the P&L calculator to Rust for learning purposes.

The pre-processing steps of the pipeline sort the input data chronologically (down to jiffies), and group it by the broker's clients. The current program then employs the standard structure of:

  • one queue for buys,
  • one queue for sells,
  • a running totals structure,
  • control break logic when the date changes (intraday vs. carryover), and
  • control break logic when the broker's client changes.

As each trade is read, it is parsed into an instance of Trade. Depending upon whether it is a buy or a sell, we try to offset it against available sells or buys, respectively, performing several calculations and accumulating the results. If some quantity is left over, the trade is stored in the appropriate queue, and last_trade points to the corresponding queue entry; otherwise, the trade is owned by last_trade itself.

So, a given trade may or may not get into a queue. But, in case it does, it outlives last_trade, which always points to the most recent input entry.
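To make the offsetting step concrete, here is a minimal sketch of the queue logic described above. The names `Side`, `Book`, `apply` and `realized` are hypothetical, the fields are reduced to the bare minimum, and `VecDeque` is only one possible queue type; the real program tracks many more totals and the control breaks.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Side {
    Buy,
    Sell,
}

// Hypothetical, stripped-down trade: only what the matching step needs.
#[derive(Debug)]
struct Trade {
    side: Side,
    qty: f64,
    price: f64,
}

#[derive(Debug, Default)]
struct Book {
    buys: VecDeque<Trade>,
    sells: VecDeque<Trade>,
    realized: f64, // running profit from matched quantities
}

impl Book {
    // Offset the incoming trade against the oldest opposite trades;
    // queue whatever quantity is left over.
    fn apply(&mut self, mut t: Trade) {
        let side = t.side;
        let opposite = match side {
            Side::Buy => &mut self.sells,
            Side::Sell => &mut self.buys,
        };
        while t.qty > 0.0 {
            let Some(front) = opposite.front_mut() else { break };
            let matched = t.qty.min(front.qty);
            let (buy_px, sell_px) = match side {
                Side::Buy => (t.price, front.price),
                Side::Sell => (front.price, t.price),
            };
            self.realized += (sell_px - buy_px) * matched;
            t.qty -= matched;
            front.qty -= matched;
            if front.qty <= 0.0 {
                opposite.pop_front(); // fully consumed
            }
        }
        if t.qty > 0.0 {
            match side {
                Side::Buy => self.buys.push_back(t),
                Side::Sell => self.sells.push_back(t),
            }
        }
    }
}
```

In this shape the incoming trade is moved into exactly one place (a queue, or dropped once fully matched), so no shared ownership is needed for the matching itself.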

Since I am new to Rust, I could not figure out how to model this in a way that satisfies the borrow checker. After a few trials, I ended up cloning each trade as needed. I quickly achieved accuracy parity with the Go version; but, the Rust version took 11x the running time of the Go one.

After some thinking, I removed cloning, and changed the queues and last_trade to have types Vec<Rc<RefCell<Trade>>> and Rc<RefCell<Trade>>, respectively. I changed the rest of the program accordingly. This brought the running time of the Rust version down to 3x that of the Go one.

I understand that at least one of the reasons for the lower performance of the Rust version is the millions/billions of field accesses that require either borrow() or borrow_mut() at runtime (rather than compile-time).

What is a good way to model multiple ownership that doesn't so adversely impact performance?

Thanks!

                                    -- O|O

Instead of using Rc for your references, I recommend storing an id. Then, you can have a single object with all of the data and look up there using the id. The id can be either an index into a Vec or a key in a HashMap or similar. This would completely remove the need for Rc and RefCell.
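A minimal sketch of the id approach, assuming a `Vec` as the single owner and plain `usize` indices as ids. `TradeStore`, `record`, `queue_buy` and `last_mut` are made-up names for illustration, not anything from the original program.

```rust
#[derive(Debug)]
struct Trade {
    qty: f64,
    price: f64,
}

// One object owns every trade; everything else refers to trades by index.
#[derive(Debug, Default)]
struct TradeStore {
    trades: Vec<Trade>,        // the single owner
    buys: Vec<usize>,          // queue of ids, not of trades
    last_trade: Option<usize>, // id of the most recent input entry
}

impl TradeStore {
    // Store a trade and remember it as the most recent one.
    fn record(&mut self, trade: Trade) -> usize {
        let id = self.trades.len();
        self.trades.push(trade);
        self.last_trade = Some(id);
        id
    }

    // Queuing a trade only copies its id.
    fn queue_buy(&mut self, id: usize) {
        self.buys.push(id);
    }

    // Mutable access goes through the owner, checked at compile time.
    fn last_mut(&mut self) -> Option<&mut Trade> {
        let id = self.last_trade?;
        self.trades.get_mut(id)
    }
}
```

Because the queue and `last_trade` hold only indices, both can refer to the same trade without `Rc` or `RefCell`; the cost is a bounds-checked lookup per access.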


Thanks, @alice, for the response!

I do use this approach in several other places. I have to think a little about handling the conditional inclusion of the Trade instance in the queues, and the corresponding ownership transfer.

I shall try, and report back.

Thanks!

I have removed Rc<RefCell<Trade>> completely, and introduced the following enum to track the Trade instance.

#[derive(Debug)]
enum LastTrade {
    Trade(Trade, f64), // Trade instance itself.
    Buy(usize),        // Index into the vector of buys.
    Sell(usize),       // Index into the vector of sells.
    None,
}

The f64 holds the mutable quantity part that was originally a part of Trade, but was creating problems saying the argument (which was being 'move'd) was not declared mutable.

The queues for buys and sells now are of type Vec<(Trade, f64)>, correspondingly.
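For readers following along, this is roughly how such an enum can be resolved back to a trade and its remaining quantity. The `resolve` function and the one-field `Trade` are hypothetical stand-ins; the repository code may do this differently.

```rust
// Hypothetical, one-field stand-in for the real Trade.
#[derive(Debug)]
struct Trade {
    uid: u64,
}

#[derive(Debug)]
enum LastTrade {
    Trade(Trade, f64), // Trade instance itself.
    Buy(usize),        // Index into the vector of buys.
    Sell(usize),       // Index into the vector of sells.
    None,
}

// Look up the most recent trade wherever it currently lives.
fn resolve<'a>(
    last: &'a LastTrade,
    buys: &'a [(Trade, f64)],
    sells: &'a [(Trade, f64)],
) -> Option<(&'a Trade, f64)> {
    match last {
        LastTrade::Trade(t, q) => Some((t, *q)),
        LastTrade::Buy(i) => buys.get(*i).map(|(t, q)| (t, *q)),
        LastTrade::Sell(i) => sells.get(*i).map(|(t, q)| (t, *q)),
        LastTrade::None => None,
    }
}
```

One caveat with plain indices: they go stale if elements are removed from the front of the vector, so they need adjusting (or a stable key) whenever the queue shrinks.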

The above change has saved about 10% time, but the Rust version's running time is still at around 2.7x of the Go one.

Are the tuples killing the performance now?

Thanks.

Not meant to be rude at all, but did you use the release profile to compile your code?

Yes, @ifsheldon, I did cargo build --release.

Thanks.

I have set up a repository with the code at: GitHub - js-ojus/nsecalc.

I hope that looking at the code can help point out obvious inefficiencies in my code.

Thanks!

You can try defining types for each of the elements of this struct, and thus avoid so many Strings.

Once the types are defined you can impl the trait std::str::FromStr for when you have to do the parse.
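As a rough illustration, assuming the buy/sell column holds "B" or "S" (as the output code later in this thread suggests) and that client ids happen to be numeric (a pure assumption; they may well not be), the impls could look like this. `ClientId` is a made-up newtype.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuySell {
    Buy,
    Sell,
}

impl FromStr for BuySell {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "B" => Ok(BuySell::Buy),
            "S" => Ok(BuySell::Sell),
            other => Err(format!("unknown buy/sell marker: {other:?}")),
        }
    }
}

// Hypothetical newtype; only valid if client ids are numeric.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ClientId(u64);

impl FromStr for ClientId {
    type Err = std::num::ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.trim().parse().map(ClientId)
    }
}
```

With these in place, a field parses with `field.parse::<BuySell>()`, and the parsed value is a small `Copy` type instead of a heap-allocated String.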

Also, it is not good practice to compare floating-point numbers for equality.

Thanks, @elsuizo, for your response.

  1. Assuming that I wish to define types for date, broker_id, etc., what should the underlying type be? I ask this, since it is string data, after all!

  2. Regarding comparing floating point numbers for equality, I agree. I usually use a |x-y| < ∆ comparison, where ∆ ≈ 0.0.
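The tolerance comparison mentioned in point 2 can be sketched as follows. `approx_eq` is a hypothetical helper, and the right tolerance depends on the magnitudes involved; an absolute epsilon like this is only reasonable for values of similar, modest size.

```rust
// True when a and b differ by less than the absolute tolerance eps.
fn approx_eq(a: f64, b: f64, eps: f64) -> bool {
    (a - b).abs() < eps
}
```

For example, `0.1 + 0.2 == 0.3` is false in f64 arithmetic, while `approx_eq(0.1 + 0.2, 0.3, 1e-9)` is true.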

Thanks!

I think Vec::remove(0) is an obvious inefficiency since it moves all elements left, sometimes leading to inadvertent quadratic complexity. You'd need to use VecDeque for efficient front removal.
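A small sketch of the difference, with made-up function names. Both functions return the elements in the same order; the first shifts the remaining elements on every removal, the second does not.

```rust
use std::collections::VecDeque;

// Front removal with Vec: every remove(0) moves all remaining elements left.
fn drain_front_vec(mut v: Vec<u32>) -> Vec<u32> {
    let mut out = Vec::with_capacity(v.len());
    while !v.is_empty() {
        out.push(v.remove(0));
    }
    out
}

// Front removal with VecDeque: pop_front only advances the head position.
fn drain_front_deque(v: Vec<u32>) -> Vec<u32> {
    let mut q = VecDeque::from(v);
    let mut out = Vec::with_capacity(q.len());
    while let Some(x) = q.pop_front() {
        out.push(x);
    }
    out
}
```

Whether this matters in practice depends on how long the queues get; for queues of a handful of entries the shifting cost is small.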


Another thing you can try is to use Rc<str> for your strings. That makes cloning them much cheaper. You can create an Rc<str> with Rc::from(the_string) or Rc::from(the_string.as_str()).
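For instance, with a hypothetical `Key` struct standing in for the string fields of a trade:

```rust
use std::rc::Rc;

// Hypothetical subset of the string fields on a trade.
#[derive(Debug, Clone)]
struct Key {
    broker_id: Rc<str>,
    client_id: Rc<str>,
}

fn make_key(broker: String, client: &str) -> Key {
    Key {
        broker_id: Rc::from(broker), // from an owned String
        client_id: Rc::from(client), // from a &str
    }
}
```

Cloning a `Key` now only bumps two reference counts; the string bytes themselves are shared, not copied.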

Thanks, @godmar, for your response.

Changing Vec to VecDeque has increased the running time by about 10%!

Perhaps, one of the reasons is the discontiguous memory layout of VecDeque.

Thanks!

VecDeque adds some overhead, of course. You could try tuning it with the with_capacity constructor.
And if your vectors from which you remove are short, remove(0) has little overhead. (If this were a competitive programming problem though there would be a secret test case that exercises exactly the worst case of repeatedly removing the first element of a very long vector, and for all you know there could be real-world inputs in your case as well.)

Without knowing the details of your data, my initial thoughts would be something like this:

struct Trade {
    date: chrono::NaiveDate,
    broker_id: u64,
    client_id: u64,
    symbol: smallstr::SmallString<[u8;6]>,
    trade_type: TradeType, // custom enum
    expiry_date: chrono::NaiveDate,
    strike_price: f64,
    time: chrono::DateTime<Utc>,
    price: f64,
    value: f64,
    b_or_s: String, // Don't know what this is; perhaps a custom enum
    uid: u64,
}

Also, financial data is usually better served by fixed-point arithmetic instead of floating-point, as it better matches standard accounting practice. See the fixed crate for a possible implementation (NB: I've never used it, so can't vouch for quality)
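To show the idea without pulling in a crate, here is a sketch that stores amounts as an integer count of hundredths (paise, cents). Note that this is plain integer arithmetic in minor units, not the API of the fixed crate, and `parse_minor_units` is a made-up helper that accepts only non-negative amounts with at most two decimal places.

```rust
// Parse "123.45" into 12345 hundredths. Returns None for negative input,
// more than two decimals, or anything that is not a plain decimal number.
fn parse_minor_units(s: &str) -> Option<i64> {
    if s.starts_with('-') {
        return None;
    }
    let (whole, frac) = s.split_once('.').unwrap_or((s, ""));
    if frac.len() > 2 || !frac.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let whole: i64 = whole.parse().ok()?;
    let mut frac_val: i64 = if frac.is_empty() { 0 } else { frac.parse().ok()? };
    if frac.len() == 1 {
        frac_val *= 10; // "7.5" means 50 hundredths, not 5
    }
    whole.checked_mul(100)?.checked_add(frac_val)
}
```

Sums of such values are exact: 0.10 + 0.20 becomes 10 + 20 = 30, with none of the rounding error that makes `0.1 + 0.2 != 0.3` in f64.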

Sure, @godmar, noted.


@2e71828: You may observe that I do not particularly utilise most of those string fields, except to construct a unique ID for control break logic. That is the primary reason why I have left them as strings.

On the other hand, having an enum for b_or_s (buy or sell) is a good idea, and I have done that.

Thanks!

I am trying this, @alice.

When trying to print profits, the compiler complains that str does not have a known size. How do we get &str from Rc<str>?

On the other hand, converting things to Rc<&str> mandates introduction of explicit lifetimes; and, when I do that, I am running into a lifetime issue with the construction of uid in parse_record ("returning a struct that references a local variable").

Any pointers?

Thanks!

If you get errors about not having a known size, you're doing something wrong somewhere. The type you want is Rc<str>, not Rc<&str>. To get a &str, you can often just take a reference to the Rc<str>.
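A few ways this looks in practice (the function names here are made up for the example):

```rust
use std::rc::Rc;

// Any function that takes &str accepts a reference to an Rc<str>.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn views(symbol: &Rc<str>) -> (String, String, usize) {
    let via_coercion = shout(symbol); // &Rc<str> deref-coerces to &str
    let explicit: &str = &**symbol;   // spelled out: &Rc<str> -> Rc<str> -> str -> &str
    let printed = format!("{symbol}/{explicit}"); // Rc<str> implements Display
    (via_coercion, printed, symbol.len()) // str methods work through auto-deref
}
```

The "no known size" error usually appears when a bare `str` ends up by value somewhere, for example from writing `*rc` where `&*rc` was intended.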

@alice:

I have pushed the changes to a branch: nsecalc/profit_calc.rs at js · js-ojus/nsecalc · GitHub.

The problems are in constructing the vector for csv::Writer::write_record in the methods write_profit and write_remaining (they are both at the end of the file).

Where are the mistakes?

Thanks.

Just remove the .as_str() calls and add an .as_bytes() on each String in the vector. For the string constants, use b"the string" to get a byte array.

diff --git a/profit/src/calc/profit_calc.rs b/profit/src/calc/profit_calc.rs
index 4985849..ea19b70 100644
--- a/profit/src/calc/profit_calc.rs
+++ b/profit/src/calc/profit_calc.rs
@@ -424,12 +424,12 @@ impl ProfitCalculator {
             }
             LastTrade::None => return Err(CalcError::LastRecord),
         };
-        let _id = format!("{:.1}", self.tots.intra_day).as_str();
-        let _co = format!("{:.2}", self.tots.carry_over).as_str();
-        let _ibvs = format!("{:.2}", self.tots.ib_val_sum).as_str();
-        let _isvs = format!("{:.2}", self.tots.is_val_sum).as_str();
-        let _cbvs = format!("{:.2}", self.tots.cb_val_sum).as_str();
-        let _csvs = format!("{:.2}", self.tots.cs_val_sum).as_str();
+        let _id = format!("{:.1}", self.tots.intra_day);
+        let _co = format!("{:.2}", self.tots.carry_over);
+        let _ibvs = format!("{:.2}", self.tots.ib_val_sum);
+        let _isvs = format!("{:.2}", self.tots.is_val_sum);
+        let _cbvs = format!("{:.2}", self.tots.cb_val_sum);
+        let _csvs = format!("{:.2}", self.tots.cs_val_sum);
         let sv = vec![
             trd.date.as_bytes(),
             trd.broker_id.as_bytes(),
@@ -438,12 +438,12 @@ impl ProfitCalculator {
             trd.trade_type.as_bytes(),
             trd.expiry_date.as_bytes(),
             trd.strike_price.as_bytes(),
-            _id,
-            _co,
-            _isvs,
-            _ibvs,
-            _csvs,
-            _cbvs,
+            _id.as_bytes(),
+            _co.as_bytes(),
+            _isvs.as_bytes(),
+            _ibvs.as_bytes(),
+            _csvs.as_bytes(),
+            _cbvs.as_bytes(),
         ];
         out_pro.write_record(sv)?;
 
@@ -463,12 +463,12 @@ impl ProfitCalculator {
             &_v
         };
         for (trd, tqty) in trades {
-            let _tq = format!("{:.1}", tqty).as_str();
-            let _tp = format!("{:.2}", trd.price).as_str();
-            let _tv = format!("{:.2}", trd.value).as_str();
+            let _tq = format!("{:.1}", tqty);
+            let _tp = format!("{:.2}", trd.price);
+            let _tv = format!("{:.2}", trd.value);
             let _b_or_s = match trd.b_or_s {
-                BuySell::Buy => "B",
-                BuySell::Sell => "S",
+                BuySell::Buy => b"B",
+                BuySell::Sell => b"S",
             };
             let sv = vec![
                 trd.date.as_bytes(),
@@ -478,9 +478,9 @@ impl ProfitCalculator {
                 trd.trade_type.as_bytes(),
                 trd.expiry_date.as_bytes(),
                 trd.strike_price.as_bytes(),
-                _tq,
-                _tp,
-                _tv,
+                _tq.as_bytes(),
+                _tp.as_bytes(),
+                _tv.as_bytes(),
                 _b_or_s,
             ];
             out_rem.write_record(sv)?
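
The reason the original lines failed: format!(...) creates a temporary String, and calling .as_str() on it in the same let statement borrows from a value that is dropped at the end of that statement. Binding the String to a variable first keeps it alive for as long as the borrowed bytes are used. A reduced sketch of the pattern, without the csv crate and with a hypothetical `to_record` helper:

```rust
// The owned Strings live in the caller, so the borrowed byte slices
// stay valid for as long as the record is in use.
fn to_record<'a>(
    date: &'a str,
    intra_day: &'a str,
    carry_over: &'a str,
    is_buy: bool,
) -> Vec<&'a [u8]> {
    // Byte-string literals have type &[u8; 1]; the annotation turns them into slices.
    let side: &'static [u8] = if is_buy { b"B" } else { b"S" };
    vec![
        date.as_bytes(),
        intra_day.as_bytes(),
        carry_over.as_bytes(),
        side,
    ]
}
```

The caller would write `let id = format!("{:.1}", total);` on its own line and then pass `&id`, which mirrors what the diff above does with `_id`, `_co` and the other formatted totals.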