# How to Save 70% Gas Fee on Aptos PancakeSwap with Only ~40 Lines of Code?

# Overview

TL;DR:

Since the deployment of PancakeSwap on Aptos, there has been an influx of activity and users on the Aptos Ecosystem. Hence, we noticed the opportunity to assist the PancakeSwap team to reduce gas consumption. The gas usage of PancakeSwap is relatively high with ~0.0105 APT per call to swap (see the original gas cost on Aptos Explorer, using Testnet Mode).

After collaborating with MoveBit, the new version which will be released soon will be around ~0.0034 APT per call (see the optimized gas cost on Aptos Explorer, using Testnet Mode).

Furthermore, after U128 overflow detection is added, the fee can be further reduced to 0.001478 which will account for a total reduction of 58% in the normal situation.

What’s more important is that all the change is within~40 lines of code.

Can you believe that?

**Problem Analysis**

Community shared their feedback that stablecoin swap on the Pancakeswap Move Ecosystem takes relatively higher gas fee.

We took a thorough look of the whole codebase and main swap logic is straightforward.

Then our focus switched to its dependencies. It basically depends on two things:

- Aptos standard library & Aptos framework
- pancake’s implementation of Math lib and U256 lib

We did a bit of review of libraries written by Aptos, and it is not very likely to cause high gas usage. Then we found that the U256 lib is the suspicious part:

`swap()`

makes use of 8 `u256::mul()`

and 2 `u256::sub()`

. Those are, however, expensive. For example, `sub()`

requires at most 16 calls to `overflowing_sub()`

. And `mul()`

takes 16 iterations, each of them calls 3 `overflowing_add()`

and 5 other functions, which makes a total of more than 128 function invocations. Seems overly luxurious! Besides, although `U256`

has fixed fields `v3..v0`

, the `U256`

module uses `get()`

and `put()`

as getter/setter to manipulate the fields. They use a lot of redundant `if-else`

, which also results in more gas usage.

# Expensive Code Identification

We first need to ensure that our previous inference is correct. That is, we need to confirm it is unnecessary usage of `U256`

causing the problem.

Let’s verify our guess.

Initially, we deployed the original PancakeSwap on the Aptos Testnet. As we’ve mentioned earlier, it is relatively costly: It takes ~0.0105 APT per swap action (see the original gas cost on Aptos Explorer in the test mode).

Then, we remove the usage of `U256`

completely and replace all of them with regular built-in primitive types. The result is that this version only takes ~0.0033 APT (see no-U256 gas cost in the test mode). Of course, even though we can run it on Testnet with a few mocked transactions without any problem, this implementation is inherently wrong. Because it ignored possible overflow and might abort when a large number gets involved.

# Optimization Step1

Before diving into any further refactoring. We should take a look at the computation on L552 of `swap.move`

.

To simplify our discussion, we’ll use shorter symbols for variables in the program. Let *a** *denote amount, ** r **denote reserve, and a variable with a prime symbol denotes an adjusted value. For example：

Without consideration of any limit of bounded integer size, mathematically, the computation is equivalent to

Check whether

holds

Then we can analyze the range of them.

For ** b’_x**, we see that,

and*b_x*are*P*`u64`

, thus their product can always be represented by a`u128`

is a*a_ {x,in}*`u64`

. Multiplying it with 25 results in a product within the limit of`u128`

- Ignoring the underflow, the difference between two
`u128`

can be represented by a`u128`

.

Similarly, we can also use a `u128`

to store *b'_y.*

Now ** p**, which is the product of

**and**

*b’_x***might overflow if we were to define it as a**

*b’_y*`u128`

. We have to use a 256-bit unsigned integer to store it.Lastly, we can see ** k** as the product of

**and**

*r_x P***Similar to our previous analysis,**

*r_y P.***and**

*r_x P***can both be**

*r_y P*`u128`

because all factors of them are `u64`

. Therefore, **should also be a**

*k*`u256`

.Now, we can eliminate all unnecessary `u256`

variables, so the code becomes:

`let prec = (PRECISION as u128);`

let balance_x_adjusted = (balance_x as u128) * prec - (amount_x_in as u128) * 25u128;

let balance_y_adjusted = (balance_y as u128) * prec - (amount_y_in as u128) * 25u128;

let reserve_x_adjusted = (reserves.reserve_x as u128) * prec;

let reserve_y_adjusted = (reserves.reserve_y as u128) * prec;

let p = u256::mul(u256::from_u128(balance_x_adjusted), u256::from_u128(balance_y_adjusted));

let k = u256::mul(u256::from_u128(reserve_x_adjusted), u256::from_u128(reserve_y_adjusted));

let compare = u256::compare(&u256::mul(balance_x_adjusted, balance_y_adjusted), &k);

assert!(compare == u256::get_greater_than() || compare == u256::get_equal(), ERROR_K);

With only 2 `u256`

multiplications, the gas fee becomes ~0.0052 APT, which is 50% of the original gas cost (see it here).

## Optimization Step2

Do you remember how we learned multiplication when we were kids? At first, we are taught to memorize the multiplication table of digits no greater than 9. Then for multiplication involving numbers larger than 10, we do it digit by digit.

For example, to calculate 472 × 32, we will have a table like this:

`471 <-- 471 interpreted as 4*100 + 7*10 + 1`

* 32 <-- 32 interpreted as 3*10 + 2

-------

2 <-- 1*2, 0 left shift

14 <-- 7*2, 1 left shift

8 <-- 4*2, 2 left shift

3 <-- 1*3, 1 left shift

21 <-- 7*3, 2 left shift

+ 12 <-- 4*3, 3 left shift

-------

15072

In the decimal world, each digit ranges from 0 to 9.

Now, let’s imagine each `u64`

is a single digit, ranging from 0 to 2⁶⁴ — 1 (namely, `MAX_U64`

). Then, each `u128`

is essentially two `u64`

digits, which can `U256`

be four `u64`

digits. Digits with smaller indices mean *insignificant parts*, while those with larger indices mean *significant parts*.

To create a lossless multiplication of `a`

and `b`

of type `u128`

, we can arrange the computation in a similar manner :

` a1 a0 <--- a = (a1<<64) + a0`

* b1 b0 <--- b = (b1<<64) + b0

-------------

w1 w0 <--- [Step 1] a0*b0, 0 left shift

x1 x0 <--- [Step 2] a1*b0, 1 left shift

y1 y0 <--- [Step 3] a0*b1, 1 left shift

+ z1 z0 <--- [Step 4] a1*b1, 2 left shift

-------------

v3 v2 v1 v0 <--- interpreted as (v3<<192) + (v2<<128) + (v1<<64) + v0

## Step 0. Digits extraction

So we first create those “digits” of `u64`

:

`let a1 = (a >> 64);`

let a0 = (a & U64_MAX);

let b1 = (b >> 64);

let b0 = (b & U64_MAX);

Note that they are actually of type `u128`

, as we want to do lossless multiplication of them later. Defining them as `u128`

directly saves us from doing repetitive type narrowing and widening, which charges more gas.

## Step 1: a0 and b0

Then we calculate `a0 * b0`

and save them to `v1`

and `v0`

:

`let p = a0 * b0;`

let v1 = (p >> 64);

let v0 = (p & U64_MAX);

Also, note that we don’t actually create `w1`

and `w0`

. We will add values in later steps to `v3..v0`

on-the-fly to use fewer local variables and instructions.

## Step 2: a1 and b0

The calculation of `a0 * b1`

takes more steps than the last step. The code looks like this:

`let p = a1 * b0;`

v1 = v1 + (p & U64_MAX) ;

let v2 = (p >> 64);

if (v1 > U64_MAX) {

v2 = v2 + (v1 >> 64);

v1 = (v1 & U64_MAX);

};

Since we have written value to `v1`

and `v0`

, now we have to handle the addition carefully.

After adding `x0`

(defined as `(a1*b0) & U64_MAX`

, namely `p & U64_MAX`

) to `v1`

on line 2, we now have to save the possible carry number to `v2`

(line 3).

Furthermore, `v1`

might overflow after its increment. We have to give it a check on line 4. If this is the case, move the overflowed part to `v2`

(from line 5)

## Step 3: a0 and b1

Multiplying `a0`

and `b1`

is similar to the last step, but we have more edge cases to consider. Be careful!

`let p = a0 * b1;`

v1 = v1 + (p & U64_MAX);

if (v1 > U64_MAX) {

v2 = v2 + (v1 >> 64);

v1 = (v1 & U64_MAX);

};

v2 = v2 + (p >> 64);

let v3 = (v2 >> 64);

if (v2 > U64_MAX) {

v2 = (v2 & U64_MAX);

};

Like step 2, we add `y0`

to `v1`

on line 2. Since `v2`

might contain a number, we cannot assign value to it directly like before. So we should handle the possible overflow `v1`

from line 3 to line 6. Then we add the significant part of `a0*b1`

to `v2`

on line 7. Now, `v2`

might be a value that doesn't fit in `u64`

, we define `v3`

to be the possible overflowed part and let `v2`

be the insignificant part.

## Step 4: a1 and b1

Yay, just one more step to get it done! And it has nothing new:

`let p = a1 * b1;`

v2 = v2 + (p & U64_MAX);

if (v2 > U64_MAX) {

v3 = v3 + (v2 >> 64);

v2 = (v2 & U64_MAX);

};

v3 = v3 + (p >> 64);

Everything is straightforward: get `z1 z0`

(line 1); add `z0`

to `v2`

(line 2); handle the possible overflow of `v2`

(line 3-line 6); add a significant part `z1`

to `v3`

.

But, did we just forget to check whether `v3`

would overflow if it were a `u64`

?The answer is no. It is possible for `v3 as u64`

to overflow because it is ensured by math that the product of two `u128`

will always fit in a `U256`

.

## Final Touch

As the last step, we simply construct a `U256`

from `v3..v0`

:

`U256 { `

v0: (v0 as u64),

v1: (v1 as u64),

v2: (v2 as u64),

v3: (v3 as u64),

}

Remember that `v3..v0`

are actually `u128`

s, so don't forget the type narrowing to `u64`

.

We also want to make sure that this implementation is correct. We can write some unit testing:

`#[test]`

fun test_mul_u128() {

let a = U128_MAX - 3;

let b = U128_MAX - 9;

let p1 = mul_u128(a, b);

let p2 = mul(from_u128(a), from_u128(b)); assert!(p1 == p2, 0);

}

Then we can run `aptos move test`

. You will see that all the tests are passed.

Lastly, we may try the Move Prover. Create a file called `u256.spec.move`

, and add specs:

`spec pancake::u256 {`

spec module { pragma verify = false; }

spec fun value_of_U256(a: U256): num {

a.v0 +

a.v1 * P64 +

a.v2 * P64 * P64 +

a.v3 * P64 * P64 * P64

}

spec mul_u128 {

pragma verify = true;

pragma timeout = 3600;

ensures value_of_U256(result) == a * b;

}

}

When you run `aptos move prove`

, you will see error information complaining that Move Prover doesn't support bitwise-and operations. One workaround is to replace all `p & U64_MAX`

with `p % (1 + U64_MAX)`

. However, unfortunately, this would cause SMT solver timeout (both Z3 and CVC5). Hopefully, it would get verified with a newer version of SMT solvers.

## Complete Code

In file `pancake-swap/sources/utils/u256.move`

, add

`public fun mul_u128(a: u128, b: u128): U256 {`

let a1 = (a >> 64);

let a0 = (a & U64_MAX);

let b1 = (b >> 64);

let b0 = (b & U64_MAX);

// Step 1, v0 v1 created

let p = a0 * b0;

let v1 = (p >> 64);

let v0 = (p & U64_MAX);

// Step 2: v2 created, v1 modified

let p = a1 * b0;

v1 = v1 + (p & U64_MAX) ;

let v2 = (p >> 64);

if (v1 > U64_MAX) {

v2 = v2 + (v1 >> 64);

v1 = (v1 & U64_MAX);

};

// Step 3: v3 created, v2 v1 modified

let p = a0 * b1;

v1 = v1 + (p & U64_MAX);

if (v1 > U64_MAX) {

v2 = v2 + (v1 >> 64);

v1 = (v1 & U64_MAX);

};

v2 = v2 + (p >> 64);

let v3 = (v2 >> 64);

if (v2 > U64_MAX) {

v2 = (v2 & U64_MAX);

};

// Step 4: v3 v2 modified

let p = a1 * b1;

v2 = v2 + (p & U64_MAX);

if (v2 > U64_MAX) {

v3 = v3 + (v2 >> 64);

v2 = (v2 & U64_MAX);

};

v3 = v3 + (p >> 64);

// Result

U256 { v0: (v0 as u64), v1: (v1 as u64), v2: (v2 as u64), v3: (v3 as u64) }

}

`/// Greater or equal to`

public fun ge(a: &U256, b: &U256): bool {

if (a.v3 != b.v3) { return a.v3 >= b.v3 };

if (a.v2 != b.v2) { return a.v2 >= b.v2 };

if (a.v1 != b.v1) { return a.v1 >= b.v1 };

(a.v0 >= b.v0)

}

Replace Line 522 with Line 560 in `swap.move`

with:

`let prec = (PRECISION as u128);`

let balance_x_adjusted = (balance_x as u128) * prec - (amount_x_in as u128) * 25u128;

let balance_y_adjusted = (balance_y as u128) * prec - (amount_y_in as u128) * 25u128;

let reserve_x_adjusted = (reserves.reserve_x as u128) * prec;

let reserve_y_adjusted = (reserves.reserve_y as u128) * prec;

let p = u256::mul_u128(balance_x_adjusted, balance_y_adjusted);

let k = u256::mul_u128(reserve_x_adjusted, reserve_y_adjusted);

assert!(u256::ge(&p, &k), ERROR_K);

## Conclusion

When writing code, always keep gas costs in mind! Yes, we ain’t no longer 1980s programmers who have to save every bit of resource. However, we still must not waste too much of them.

Currently, the Move compiler doesn’t do many optimizations. So you should have a basic impression of how your code will be compiled into bytecodes.

Then form some idioms. For example, `loop {}`

is slightly better than `while (true) {}`

.

Happy Moving!

**About MoveBit**

MoveBit is a security company for the Move ecosystem with a vision to make the Move ecosystem the most secure Web3 destination. The MoveBit team is composed of security leaders from academia and enterprise with 10 years of security experience. The team was one of the earliest contributors to the Move ecosystem, working with Move developers to set the standard for secure Move applications.

MoveBit Social Media Platforms:

Official Website | Twitter | Medium | GitHub | Discord