Key takeaways
- Qraft needed a streaming solution with high throughput and low latency to power its AI-powered products.
- Apache Kafka could not provide the latency Qraft required for its products, where milliseconds matter.
- Qraft chose Pulsar for its ability to manage distributed transactions within a microservice architecture and its feature flexibility.
Background
Founded in 2016, Qraft Technologies is a Seoul-based fintech company that provides innovative AI solutions to various institutional clients. Currently, Qraft operates four AI-driven ETFs with high AUM (assets under management) that are listed on the New York Stock Exchange.
Qraft’s flagship product is AI eXEcution (AXE), a deep-reinforcement-learning-based execution system that aims to maximize returns by minimizing the transaction costs of trading large volumes of any financial product. AXE identifies the optimal strategy by learning market microstructure from historical tick data, including not only the price and transaction volume of individual stocks but also the transaction details and limit order book data. In 2018, in a competition sponsored by NVIDIA, Shinhan Bank, KOSCOM, and PwC, professional traders from well-known securities firms competed against AXE to buy stocks at the lowest price. The competition lasted for five days. In the end, AXE outperformed the human traders by a wide margin, winning the $100,000 grand prize.
The South Korean market generates several gigabytes of tick data every day, and that data must be processed under strict latency constraints. After researching streaming platforms, Qraft chose Apache Pulsar over Apache Kafka to power its event streaming machine learning model. Pulsar provides Qraft with a distributed, cloud-native, open-source messaging and streaming platform for real-time workloads.
An intelligent execution system
Before diving into the design details, here is a quick refresher on two financial terms: order execution and market impact. Simply put, order execution describes the process of buying or selling a fixed number of shares of certain securities. Trading a large number of shares will itself impact the market, causing price fluctuations. For example, if Warren Buffett wants to buy a million dollars’ worth of stocks, the act of buying will almost certainly move the market, which will generate high transaction costs.
The best practice is to slice large orders into many smaller ones. This is where AXE comes in. The AI-powered order execution system learns patterns of tick data from individual stocks and searches for an optimal strategy to place orders at different bid/ask prices.
AXE learns how to lower the trading cost compared to rule-based algorithms
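To make the idea of order slicing concrete, here is a minimal sketch of the kind of rule-based baseline that AXE is compared against: it splits a parent order into near-equal child orders. This is not Qraft's algorithm. The function name `slice_order` and the equal-size rule are assumptions chosen for illustration, whereas AXE learns when and at what price to place each slice.

```python
def slice_order(total_shares: int, num_slices: int) -> list[int]:
    """Split a parent order into child orders of near-equal size.

    Illustrative rule-based baseline only (hypothetical helper, not AXE).
    """
    if total_shares <= 0 or num_slices <= 0:
        raise ValueError("total_shares and num_slices must be positive")
    base, remainder = divmod(total_shares, num_slices)
    # The first `remainder` slices carry one extra share so sizes sum to the total.
    sizes = [base + 1 if i < remainder else base for i in range(num_slices)]
    # Drop empty slices when there are more slices than shares.
    return [size for size in sizes if size > 0]


# Example: a 1000-share parent order split into 6 child orders.
child_orders = slice_order(1000, 6)
print(child_orders, sum(child_orders))
```

A fixed rule like this ignores the state of the limit order book, which is the gap a learned execution strategy tries to close.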
Given the nature of order execution, Qraft designed the model as an event streaming model.
Qraft’s main requirements for an event streaming technology were as follows:
- High throughput. The system must be able to handle the large amounts of tick data generated by the exchanges and must scale to place multiple orders in real time with the same resiliency.
- Low latency. While searching for an efficient strategy to reduce transaction costs, the system must adapt to fast-changing market conditions. The system is also designed to reduce the number of network communications between microservices.
- High availability. It is crucial for the order execution system to be able to manage failover scenarios. There is potential for major revenue loss if the ability to recover after certain failures is compromised.
A quest for an event streaming platform
The first technology Qraft investigated was Apache Kafka. Because it’s a distributed system, Kafka is scalable and supports failover scenarios. But its latency fell short of expectations for Qraft’s use case, where the number of topics had to scale out considerably. Ideally, latency should stay below 10 ms even when the message queue is under high throughput.
The next technology Qraft tested was Apache Pulsar. According to the benchmark report, it appeared Pulsar might be the right choice for Qraft. The report puts Pulsar’s 99th percentile latency between 5 and 15 ms, whereas Kafka’s can reach seconds and is heavily influenced by the number of topics, the number of subscriptions, and the durability guarantees in use.
The next step was to reproduce the metrics reported for Apache Pulsar. After a month of testing, latency stayed below 10 ms, and then below 5 ms after some performance tuning. Qraft’s future goal is a latency of less than 1 ms, which the team hopes to achieve with Pulsar.
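For readers who want to check a latency target like the ones above against their own measurements, here is a minimal sketch of a 99th percentile calculation using the nearest-rank method. The sample values and the helper name `percentile` are made up for illustration; they are not Qraft's measurements or tooling.

```python
import math


def percentile(samples_ms: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical end-to-end publish latencies in milliseconds.
samples = [2.1, 3.4, 2.8, 4.9, 3.0, 12.5, 2.7, 3.3, 2.9, 3.1]
p99 = percentile(samples, 99)
print(f"p99 = {p99} ms, under 10 ms target: {p99 < 10}")
```

Tail percentiles matter more than averages here: a single slow message, like the outlier in the sample data, is enough to miss the target even when most messages are fast.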
Other than low latency, three other Pulsar features impressed the Qraft team:
- Features that offer flexibility. For example, users sometimes want to trade consistency for lower latency. The ability to turn off journaling in BookKeeper makes this trade-off possible.
- Failover capability. Pulsar handled failover tests successfully.
- Multi-language support. Pulsar officially supports Java, Go, Python, C++, and C#, with many custom clients for other programming languages provided by the community.
Implementing Pulsar with AXE
Qraft uses a Saga design pattern as a way to manage distributed transactions in a microservice architecture for AXE. As shown in the diagram below, Pulsar plays an essential role in all of this.
A diagram shows how Pulsar interacts with other microservices
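For readers unfamiliar with the Saga pattern, here is a minimal in-process sketch. Each step has a compensating action, and when a step fails, the steps that already completed are undone in reverse order. The step names and the `run_saga` helper are hypothetical and are not taken from AXE. In a real deployment each step would run in a separate microservice and be triggered by messages on the streaming platform rather than by direct function calls.

```python
from typing import Callable

# A saga step: (name, action, compensating action). All names are hypothetical.
Step = tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: list[Step], state: dict) -> bool:
    """Run actions in order; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        _name, action, _compensate = step
        try:
            action(state)
            completed.append(step)
        except Exception:
            for _, _, compensate in reversed(completed):
                compensate(state)
            return False
    return True


def register_job(state): state["log"].append("job registered")
def unregister_job(state): state["log"].append("job unregistered")
def reserve_worker(state): state["log"].append("worker reserved")
def release_worker(state): state["log"].append("worker released")
def place_order(state): raise RuntimeError("exchange rejected the order")
def cancel_order(state): state["log"].append("order cancelled")


steps = [
    ("register", register_job, unregister_job),
    ("reserve", reserve_worker, release_worker),
    ("order", place_order, cancel_order),  # fails, so earlier steps are undone
]
state = {"log": []}
succeeded = run_saga(steps, state)
print(succeeded, state["log"])
```

The point of the pattern is that no distributed lock or two-phase commit is needed: consistency is restored by compensation rather than by rollback.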
AXE includes four different microservices with the following roles:
- Receiver receives live tick data from the exchange, pre-processes it, and sends it to Pulsar, with each topic representing a single equity. In this sense, Apple is one topic and Amazon is a separate topic.
- Worker (manager) receives data sent from Endpoint via Pulsar, on topics identified by a unique ID for each job. After the AXE algorithm generates actions, such as “buy or sell 10 shares of Apple at this price right now”, the Worker sends them to Pulsar. This is the actual service that powers the ML execution algorithms.
- Health-checker takes charge of registering jobs from the clients and distributing them to a pool of workers. A job could be, for example, executing 1000 shares of Apple stock from 9:30 to 16:00.
- Endpoint connects with the client via FIX (the Financial Information eXchange protocol) or the AXE native communications protocol. All data received from the clients is sent to the Workers or the Health-checker as appropriate. It also pulls data from the Workers and the Health-checker to send to clients.
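The topic layout described above, one topic per equity for tick data and one topic per job ID, can be sketched as follows. The topic naming scheme, the JSON payload, and the `InMemoryBroker` stand-in are assumptions made so the example runs without a cluster; they are not Qraft's actual conventions. The `persistent://tenant/namespace/topic` form is Pulsar's standard topic name format, and `public/default` is the tenant and namespace a fresh Pulsar installation ships with.

```python
import json
from collections import defaultdict


def tick_topic(symbol: str) -> str:
    """One topic per equity (hypothetical naming scheme)."""
    return f"persistent://public/default/tick-{symbol.upper()}"


def job_topic(job_id: str) -> str:
    """One topic per execution job, keyed by its unique ID (hypothetical)."""
    return f"persistent://public/default/job-{job_id}"


class InMemoryBroker:
    """Stand-in for a message broker so the sketch runs without a cluster."""

    def __init__(self) -> None:
        self._topics: dict[str, list[bytes]] = defaultdict(list)

    def send(self, topic: str, payload: bytes) -> None:
        self._topics[topic].append(payload)

    def messages(self, topic: str) -> list[bytes]:
        return list(self._topics.get(topic, []))


def publish_tick(broker: InMemoryBroker, symbol: str, price: float, volume: int) -> None:
    """What a Receiver-like service might do after pre-processing a tick."""
    payload = json.dumps({"symbol": symbol.upper(), "price": price, "volume": volume})
    broker.send(tick_topic(symbol), payload.encode("utf-8"))


broker = InMemoryBroker()
publish_tick(broker, "aapl", 189.5, 100)
publish_tick(broker, "amzn", 178.2, 50)
print(broker.messages(tick_topic("AAPL")))
```

With a real client, `broker.send` would be replaced by a producer created for each topic. Keeping one topic per equity lets each consumer subscribe only to the symbols its job needs.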
Lessons learned implementing Pulsar
Pulsar is not as well known or as widely supported in Korea as Kafka, and this affected the Qraft team as they were getting up to speed with Pulsar. But in the process of working with Pulsar, the team not only learned a lot directly from the source code but also took advantage of Pulsar being open source by modifying a Pulsar client to meet their needs.
- Learning from the source code. Looking into the source code and figuring out things on their own was definitely challenging for the team, but they feel they learned a lot and they were able to successfully implement Pulsar in the end.
- Developing a custom-modified client. Because the team needed a Rust client, they first tried the pulsar-rs client, but it didn’t fully meet their needs. The team was able to fork the repository, modify the client, and maintain a custom version. The Qraft team has since merged its changes upstream and continues to contribute.
The Qraft team looks forward to continuing to modify and develop this custom client and hopes it can be of use to other developers as well.
Conclusion
Apache Pulsar is a distributed, cloud-native messaging and streaming platform featuring high throughput, low latency, and high availability. It now plays an essential part in helping Qraft’s AI-powered order execution system to find the optimal strategy in real time. Going forward, Qraft hopes to strengthen its relationship with the Pulsar community and work together to help take Pulsar to the next level.
References
- https://github.com/apache/pulsar
- https://github.com/wyyerd/pulsar-rs
- https://streamnative.io/blog/benchmarking-pulsar-and-kafka-report-2020
- https://medium.com/qraft/ai-execution-system-2cdcdb9728fc
- https://www.qraftec.com/s/Qraft_AI-Asset-Management-Report_Outlook_eng.pdf
- https://www.slideshare.net/ssuserf8ed47/qraft-optimized-order-execution-with-reinforcement-learning-seongminkim
- https://news.mtn.co.kr/v/2018112316363121644
- https://www.entrepreneur.com/article/36779