Open-sourcing our code for getting data from banks and brokers

We recently open-sourced SFTP Wrangler, a software system we developed to make it easier to get data about our investments from banks and brokers.

screenshot of sftpwrangler github repo

Why did we build this?

Because:

  1. We obviously need to know what’s in our portfolio, significant parts of which are held via banks and brokers.

  2. We want the data to always be up to date and to arrive free of any human toil. Waiting on monthly PDF statements or manual CSV downloads is not an option for us because that’s too much of a delay, and we wouldn’t want to employ humans to do daily toil.

  3. For many banks and brokers, the only way to get the data is using a technology called SFTP. One could achieve the same thing more conveniently with other technologies (various types of APIs); however, many organisations only support SFTP, so we must adapt.

  4. Using SFTP well is time-intensive. Keeping an SFTP server reliably receiving data and free of hackers over years of use requires significant skill and toil. I used to operate VPN servers, and running SFTP servers has many parallels. I’d rather not do that in-house.

  5. Luckily, AWS provides SFTP as a managed service called AWS Transfer Family, so we built our solution around that. SFTP Wrangler is about stringing together various AWS services such that any firm that needs to receive data via SFTP can do so with minimal ongoing effort by humans.

  6. We figured that by sharing we might do something useful for other firms.

How we use SFTP Wrangler at TB

Here’s how SFTP Wrangler fits into the bigger picture.

flowchart showing how data flows

Notice some of the pieces we’ve mentioned in other posts:


We’re also happy to see a growing number of banks and fintechs use APIs to make it easy to get data, including Wise, where a growing number of talented ExpressVPN alumni are now building.

We’ve been running on AWS Transfer Family for about two years now, and we’ve had zero issues with it. Our time spent on SFTP ops has been literally zero. The only source of occasional toil has been outages on the servers of banks that do SFTP Pull, which suggests they might benefit from using AWS themselves.

How others might use it

Other investment firms and family offices: probably all of your private banks and brokerage firms have some method to automatically pass you T+1 data daily, and most likely they do so only via SFTP. Even if you already have a vendor that aggregates your accounts, I suspect there’s still value in having the source data yourself. It gives you optionality for doing useful things with your data down the road without it being siloed in a vendor.

To anyone else in the many other industries still dealing with SFTP: we’ve been very satisfied with AWS Transfer Family and we’d recommend using it. The open-source SFTP Wrangler code on GitHub shows how we configure and connect it with other AWS services, and maybe it provides some inspiration that can save you time as well.
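For readers who have not connected Transfer Family to other AWS services before, here is a minimal sketch of one common pattern. It is not taken from the SFTP Wrangler repository. It assumes Transfer Family is configured to store uploads in an S3 bucket and that an S3 event notification triggers a small Python function. The function names, the bucket name, and the folder layout (first path segment = sending institution) are all hypothetical.

```python
from urllib.parse import unquote_plus


def uploaded_files(event):
    """Yield (bucket, key, sender) for each object in an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications, so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Assumption: the first path segment names the sending institution.
        sender = key.split("/", 1)[0]
        yield bucket, key, sender


def handler(event, context=None):
    """Hypothetical entry point: list what arrived, ready for downstream parsing."""
    return [
        {"bucket": bucket, "key": key, "sender": sender}
        for bucket, key, sender in uploaded_files(event)
    ]
```

A function like this only identifies what arrived and from whom. Parsing each institution's file format would be a separate step downstream.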

Why we’re open-sourcing the SFTP Wrangler code

We sometimes do knowledge exchange sessions with friends of the firm, and we’re usually surprised how much both sides learn from each other. What seems obvious to one side might end up being a useful pointer to the other. That’s one motivation for open-sourcing our code. It might not be a turnkey solution for others to use, but it can be a conversation starter on engineering methods.

Potentially noteworthy is how SFTP Wrangler:

  • Manages infrastructure as code using Terraform.

  • Tests itself automatically.

  • Runs in containers only, even on workstations, one of several methods to reduce security risks. We’ve included a threat model on GitHub and look forward to seeing feedback from security-minded people.

Lastly, some thoughts on why I’d much rather trust Amazon to run SFTP servers than run them in-house

It’s difficult to keep SFTP servers up and running and reasonably free of hackers 24/7/365. For example, as recently as 2024, there was this bug in the software underpinning SFTP that allowed hackers direct access to some types of servers. Sysadmins needed to race to patch their servers, and there have been many more such cases over the years. I think it’s impractical for a small firm to operate such servers securely, so I’m very glad that Amazon has this service. Compared to “doing it ourselves”, Amazon has saved us weeks of work per year and probably averted several hacking events.
