Fix array assignment and deletion at storage slot overflow boundary #15984


Draft
r0qs wants to merge 12 commits into develop from storageOverflowDeleteBug

Conversation

@r0qs r0qs (Member) commented Apr 7, 2025

Fixes #15587

@r0qs r0qs marked this pull request as draft April 7, 2025 09:37
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch 2 times, most recently from cbd97cc to b14ee0a on April 7, 2025 10:02
@r0qs r0qs added this to the 0.8.30 milestone Apr 7, 2025
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch 10 times, most recently from 550b207 to 1614491 on April 11, 2025 14:37
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch from 1614491 to 0d84691 on April 14, 2025 15:34
@cameel cameel (Member) left a comment

First batch of review comments. Mostly about making sure the test coverage is adequate.

@cameel

This comment was marked as resolved.

@r0qs r0qs changed the title from "Storage overflow delete bug" to "Fix array assignment and deletion at storage slot overflow boundary" Apr 23, 2025
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch 3 times, most recently from b57d7a2 to 47471c6 on April 23, 2025 20:47
Comment on lines 670 to +654
 if (_type.storageStride() < 32)
-	clearStorageLoop(TypeProvider::uint256());
+	clearStorageLoop(TypeProvider::uint256(), /* _canOverflow */ false);
 else
-	clearStorageLoop(_type.baseType());
+	clearStorageLoop(_type.baseType(), /* _canOverflow */ false);
Member

Looking at the docs for delete, it does not even say that the elements of dynamic arrays are necessarily zeroed. It only says this:

delete a assigns the initial value for the type to a. I.e. for integers it is equivalent to a = 0, but it can also be used on arrays, where it assigns a dynamic array of length zero or a static array of the same length with all elements set to their initial value.

The way it's written, it rather sounds like it's equivalent to an assignment and does not reset the elements. It's an important detail, because in storage this cleanup is expensive and the user should be aware of it.

It's not strictly related to the bug (other than that one could argue any change related to dynamic arrays is not a bug, since cleanup is not a documented feature), but we should take the opportunity to correct that and state that the elements get cleared. Either here or in a separate PR.
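
For reference, a minimal Solidity sketch of the behaviour under discussion (a hypothetical contract, not taken from the PR or its tests): delete on a dynamic storage array assigns an empty array, and the current implementation additionally zeroes the old element slots, which is what makes it expensive in storage.

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0;

contract DeleteSketch {
    uint256[] data;

    function fillAndDelete() external {
        data.push(1);
        data.push(2);
        data.push(3);
        // Sets the length to zero; the old element slots are cleared as
        // well, so the cost grows with the previous length.
        delete data;
        assert(data.length == 0);
    }
}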

@clonker clonker force-pushed the storageOverflowDeleteBug branch from 7b9d321 to 459e0a0 on May 6, 2025 15:29
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch from 459e0a0 to c724157 on May 6, 2025 15:33
Comment on lines 291 to +298
 // stack: target_ref target_data_end target_data_pos_updated
 if (targetBaseType->storageBytes() < 32)
-	utils.clearStorageLoop(TypeProvider::uint256());
+{
+	if (!targetType->isDynamicallySized() && !sourceType->isDynamicallySized())
+		if (auto const* targetArrayType = dynamic_cast<ArrayType const*>(targetType))
+		{
+			auto const* sourceArrayType = dynamic_cast<ArrayType const*>(sourceType);
+			solAssert(sourceArrayType);
Member

Is this bit added here because it is not guaranteed that when there's nothing to clear the position is equal to the end? If so, it needs a comment providing that context. Otherwise it looks just like an optimization (which could be done just as well in a separate PR).

Member

The whole bit is added so that we don't end up with a target position that is bigger than the target end position, which can happen for packed arrays. If that were to happen, we'd run out of gas because the EQ check in the clearing loop would never be satisfied. It is also an optimization in that sense, because we can skip the clearing in such cases...

Member

You should put that in a comment.

@r0qs r0qs force-pushed the storageOverflowDeleteBug branch from c724157 to 935a52f on May 6, 2025 16:06
@clonker clonker force-pushed the storageOverflowDeleteBug branch from 935a52f to ba4853a on May 6, 2025 16:54
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch 4 times, most recently from 63243bd to ee9196c on June 4, 2025 13:37
@cameel cameel (Member) left a comment

This bug is problematic enough that I decided to crawl through the codegen and inspect all the storage loops: Storage loop audit in solc 0.8.30.

It took a while but was worth it. I think I found an extra buggy case the PR missed. It also puts the gas cost of the change in perspective (we do not seem to care as much about the minuscule overhead of the comparison in other storage loops) and reveals which top-level language features are potentially affected (and should have test coverage).

Comment on lines +815 to +817
if (_canOverflow)
	_context << Instruction::EQ;
else
	_context << Instruction::GT << Instruction::ISZERO;
@cameel cameel (Member) Jul 12, 2025

After looking at all the storage loops we have, my opinion is that we should actually do this a bit differently. EQ (with or without the extra check) may be good enough, but there's a better way to go about it. I also think the gas differences are not that significant in the cases we have here.

Safety

There are in total 11 functions that contain loops over storage:

  1. YulUtilFunctions::copyArrayToStorageFunction()
  2. YulUtilFunctions::copyValueArrayToStorageFunction()
  3. YulUtilFunctions::copyByteArrayToStorageFunction()
  4. YulUtilFunctions::clearStorageRangeFunction()
  5. ArrayUtils::copyArrayToStorage()
  6. ArrayUtils::resizeDynamicArray()
  7. ArrayUtils::clearStorageLoop()
  8. YulUtilFunctions::copyArrayFromStorageToMemoryFunction()
  9. ABIFunctions::abiEncodingFunctionSimpleArray()
  10. ABIFunctions::abiEncodingFunctionCompactStorageArray()
  11. ArrayUtils::copyArrayToMemory()

The 3 that the PR fixes (clearStorageRangeFunction(), copyArrayToStorage(), clearStorageLoop()) are the only ones that do not handle wrap-around correctly, due to the comparison of storage pointers. 7 out of the remaining 8 avoid it by using array indices and comparing them against the array length, which will never wrap (arrays cannot be bigger than storage). The last one (copyArrayToMemory()) uses memory pointers, which is a valid approach as well - memory pointers cannot wrap around in practice, because the amount of memory in use is strongly limited by gas.

I think that iteration against array length is the best approach. It's as safe as EQ with the extra check, but better, because there's no risk we will forget or misapply the check - it's an integral part of the loop. It is also more consistent with what we're doing in other functions.
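
For illustration, a rough inline-assembly sketch of the loop shapes discussed here (hypothetical; these are not the compiler's actual loops and all names are made up):

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0;

contract LoopShapes {
    // Exit on pointer equality (eq): tolerates the end pointer wrapping past
    // 2**256 - 1, but effectively never exits (runs out of gas) if the
    // position starts past the end or can step over it without hitting it
    // exactly.
    function clearUntilEqual(uint256 start, uint256 endPos) internal {
        assembly {
            for { let p := start } iszero(eq(p, endPos)) { p := add(p, 1) } {
                sstore(p, 0)
            }
        }
    }

    // Exit on p >= endPos (the lt condition below): cannot loop forever, but
    // exits immediately and skips the work if endPos has wrapped around and
    // is numerically smaller than start.
    function clearWhileLess(uint256 start, uint256 endPos) internal {
        assembly {
            for { let p := start } lt(p, endPos) { p := add(p, 1) } {
                sstore(p, 0)
            }
        }
    }

    // Iterate an index against the slot count: the counter cannot wrap
    // (arrays cannot be larger than storage), so termination is independent
    // of where start lies; the slot pointer is kept in a separate variable
    // and simply wraps past the boundary as intended.
    function clearByIndex(uint256 start, uint256 slotCount) internal {
        assembly {
            let p := start
            for { let i := 0 } lt(i, slotCount) { i := add(i, 1) } {
                sstore(p, 0)
                p := add(p, 1)
            }
        }
    }
}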

Cost

When it comes to the cost:

  • The EQ check is optimal, because it does not add any overhead compared to the buggy implementation.
  • Switching to comparing memory/calldata pointers would be optimal as well, but possible only in copyArrayToStorage() and only when we're not copying from storage to storage.
  • EQ with the extra check adds a per-loop cost: comparison + jump.
  • Iteration against array length adds a per-iteration cost: conversion from an index to a storage address, i.e. a few ADDs and MULs, dependent on how many times the storage address is used.
    • The cost can be reduced by maintaining separate variables for the array index and the storage pointer; then it's just one extra per-loop initialization and one extra per-iteration increment.
    • In places where we do not have array length at hand, iteration against the number of slots (end - start) is another safe alternative (though it needs the extra check, just like EQ).

Ultimately, I don't think that the cost difference between all these solutions is significant. It's just a micro-optimization that's dwarfed by other costs. Every loop iteration performs at least one SSTORE, which even for a warm write is an order of magnitude costlier (100 gas). A cold one can cost up to 20000 gas.

This optimization also may not even have been deliberate. If it were, why wasn't it done consistently for all of those functions? At least in the IR codegen this goes against the philosophy of keeping things simple. It may be more justifiable in the legacy codegen, but still why do it only in some cases?

Finally, at least in the IR codegen, it will affect only storage clearing. Storage copies and reads already bear the extra cost and they seem like something much more worthy of being fine-tuned if we do think it's necessary.

auto const* sourceArrayType = dynamic_cast<ArrayType const*>(sourceType);
solAssert(sourceArrayType);

auto const numSourceItemsPerSlot = 32 / sourceArrayType->baseType()->storageBytes();
Member

In storage we only use pointers for dynamic arrays. Static ones are stored contiguously. So something like uint[3][4] consists of 4 uint[3] arrays, placed one after another.

/**
* The type of an array. The flavours are byte array (bytes), statically- (<type>[<length>])
* and dynamically-sized array (<type>[]).
* In storage, all arrays are packed tightly (as long as more than one elementary type fits in
* one slot). Dynamically sized arrays (including byte arrays) start with their size as a uint and
* thus start on their own slot.
*/
class ArrayType: public ReferenceType
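
As a concrete (hypothetical) illustration of that layout:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0;

contract StaticLayout {
    // uint256[3][4]: four uint256[3] arrays stored contiguously, occupying
    // slots 0 through 11 here; a[i][j] lives in slot i * 3 + j.
    uint256[3][4] a;

    function slotOf(uint256 i, uint256 j) external pure returns (uint256) {
        // Mirrors the storage layout rule for statically-sized arrays of
        // full-slot value types; no hash-based indirection is involved.
        return i * 3 + j;
    }
}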

Member

There is one case the PR missed: the copy loop in $copyArrayToStorage_<type>_to_<type>().

// stack: target_ref target_data_end source_data_pos target_data_pos source_data_end [target_byte_offset] [source_byte_offset]
evmasm::AssemblyItem copyLoopStart = _context.newTag();
_context << copyLoopStart;
// check for loop condition
_context
	<< dupInstruction(3 + byteOffsetSize) << dupInstruction(2 + byteOffsetSize)
	<< Instruction::GT << Instruction::ISZERO;
evmasm::AssemblyItem copyLoopEnd = _context.appendConditionalJump();

The comparison is source_data_pos >= source_data_end, where both values are storage addresses and can overflow. According to my analysis, it should be possible to trigger this with push(), assignment or a storage variable initializer when the source of the assignment is at the storage boundary and the target is anywhere in storage.

It seems that the PR only has tests that write to an array at the storage boundary. It does not test copying values from such an array.
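
To make the missing case concrete, here is a rough sketch of the kind of test this would need (hypothetical and untested; the contract, names and slot choice are made up, and whether the generated code reaches exactly this loop depends on the codegen path): the source array is retargeted to the last slots of storage via a local storage pointer, so its data wraps past 2**256 - 1, and it is then copied into an ordinary target.

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0;

contract CopyFromBoundary {
    uint256[3] source;
    uint256[3] target;

    function copyFromBoundary() external returns (uint256) {
        uint256[3] storage src = source;
        assembly {
            // Retarget the local storage pointer so that the array occupies
            // the last two slots of storage and wraps around to slot 0.
            src.slot := sub(0, 2)
        }
        src[0] = 1;
        src[1] = 2;
        src[2] = 3;
        // Storage-to-storage copy: the source data pointers used by the copy
        // loop wrap past 2**256 - 1 here, while the target is ordinary.
        target = src;
        return target[2];
    }
}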


auto const numSourceItemsPerSlot = 32 / sourceArrayType->baseType()->storageBytes();
auto const paddedLength = numSourceItemsPerSlot * sourceArrayType->storageSize();
auto const paddedTargetType = TypeProvider::array(targetArrayType->location(), targetArrayType->baseType(), paddedLength);
@cameel cameel (Member) Jul 12, 2025

ok, I see it now. The "padding" here is not the padding I was thinking of (the remaining part of a slot that's too small to fit any array items in), but rather the extra space beyond the end of the target, corresponding to unoccupied item spots in the final slot of source.

Even with your description it took me a while to wrap my head around it, though. This really should be profusely commented in the source. I think the most confusing thing to understand was that the copying loop iterates over array items, but the condition is checked against full slots and iterates too far. Now that I got it, I see that you mentioned that and there's even a small comment in the source, but the code is barely readable and I was just making incorrect assumptions along the way. It made no sense to me for the function to work like this and I only understood it after carefully reading the code a few times over and trying to write a step-by-step reply on how target_data_pos cannot possibly exceed target_data_end here :)

Actually, it still makes no sense to me. It means that in the bytes1/bytes31 example you mention we'll end up overwriting the space after the target array. That space may be occupied by another variable. How does it not end up corrupting storage?

One situation where this might not be unsafe would be if the array were guaranteed to be small and allocated at a hash-based address in storage. That does not seem to be the case here, though: we only do that for dynamic arrays, and as far as I can tell this function is supposed to handle everything except byte arrays, i.e. including static arrays.
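
To make the arithmetic in the quoted lines concrete (hypothetical numbers, assuming storageSize() counts slots): for a source of type bytes1[40], storageBytes() of the base type is 1, so numSourceItemsPerSlot is 32; the source occupies ceil(40 / 32) = 2 slots, so paddedLength is 32 * 2 = 64. The padded target type thus covers whole source slots (64 item positions) rather than the actual 40 elements, and those 24 unoccupied item spots in the final slot of the source are exactly what ends up beyond the end of the target.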

-// gas legacy: 199887
-// gas legacyOptimized: 196845
+// gas legacy: 199827
+// gas legacyOptimized: 203694
@cameel cameel (Member) Jul 12, 2025

This one is odd. The only thing that changed in IR codegen is the replacement of eq() with sub() and both have the same cost (3 gas).

My guess would be that sub() prevents some optimizations from kicking in where they would for eq() (please verify this).

If this is the case then this fix could be more costly than it appears, unless we also tweak the optimizer. This might be another argument for switching to iteration against length instead of just changing the comparison.

@cameel cameel (Member) Jul 12, 2025

The coverage we have here is not good enough. At the very least it does not cover the one buggy loop I mentioned above. It could also be missing other relevant cases.

Check out the diagrams I included in Storage loop audit in solc 0.8.30. They show which language features can ultimately lead to execution of those loops. For each path we should have at least one test with data placed at the storage boundary. Possibly more, to account for things like static/dynamic, byte/non-byte arrays, nested arrays, partial assignments, etc.

The most important cases are the writes, since only those seem to be buggy and the fix affects only them.

IMO it would be best to have reads covered as well, to make sure future implementation changes don't break them. That could be done in a separate PR though.
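
For the read side, a sketch of what such a test might look like (hypothetical and untested; names and slot choice are made up): return a copy of an array whose last element sits in the final storage slot, which exercises the storage-to-memory copy and the ABI encoding of the return value.

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.0;

contract ReadFromBoundary {
    uint256[3] source;

    function readAtBoundary() external returns (uint256[3] memory) {
        uint256[3] storage src = source;
        assembly {
            // Place the array so that its last element sits in the final
            // storage slot (2**256 - 1).
            src.slot := sub(0, 3)
        }
        src[0] = 1;
        src[1] = 2;
        src[2] = 3;
        // Copying to memory and ABI-encoding the return value read the array
        // back from the boundary slots.
        return src;
    }
}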

@r0qs r0qs modified the milestones: 0.8.30, 0.8.31 Jul 14, 2025
@github-actions github-actions bot added the "stale" label (The issue/PR was marked as stale because it has been open for too long.) Jul 28, 2025
@github-actions github-actions bot closed this Aug 4, 2025
@r0qs r0qs reopened this Aug 4, 2025
@ethereum ethereum deleted a comment from github-actions bot Aug 4, 2025
@ethereum ethereum deleted a comment from github-actions bot Aug 4, 2025
@github-actions github-actions bot removed the "closed-due-inactivity" and "stale" labels Aug 5, 2025
@r0qs r0qs added the "must have" label (Something we consider an essential part of Solidity 1.0.) Aug 7, 2025
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch from ee9196c to 6e8db55 on August 11, 2025 18:15
Labels: must have (Something we consider an essential part of Solidity 1.0.)
Projects: None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent Treatment of Storage Arrays on the Slot Overflow Boundary
4 participants