Thoughts, Questions and Confusions about the Sum trait [RUST]
Intro
I'm not on top of traits or generics but found myself looking some of them up anyhow, and came across the Sum trait.
Here is the Std Lib documentation on Sum (I believe).
And I guess all of the generics and/or type logic and how they interoperate has thrown me for a bit of a spin ... so I thought I'd put my thoughts here. Maybe I'll work things out in writing it or maybe someone here can help me/us out?
A bit long ... sorry
Trait Definition
From the docs and source, here is the trait's signature:
```rust
// core::iter::Sum
pub trait Sum<A = Self>: Sized {
    // Required method
    fn sum<I: Iterator<Item = A>>(iter: I) -> Self;
}
```
First thoughts: Defined on elements not iterators?
- The part that confused me at first was what `Self` is actually. Naively, I imagined it was referring to the iterator (or type that'd implemented `Iterator`) ... but that clearly can't be true because the return type is `Self`.
- So ... `Sum` is implemented not on any collection but on the element type?!
- If so, why not rely on the `Add` Trait at the element level, which is responsible for the addition operator (see docs here)?
Kinda seems so?
- So, in trying to understand this, I thought I'd look at the source of `Iterator::sum()` first, figuring that it'd be the main implementation.
  - See docs on sum() here and source code here
- This is the `sum` you'd be calling in something like `vec![1, 2, 3].into_iter().sum()` to get `6`.
```rust
// core::iter::Iterator::sum
fn sum<S>(self) -> S
where
    Self: Sized,
    S: Sum<Self::Item>,
{
    Sum::sum(self)
}
```
- Ok, so the call of `Sum::sum(self)` clearly indicates that this is not where `Sum` is defined (instead it must be in `Sum::sum()` somehow).
- Moreover, `self` is being passed into `Sum::sum()`, with `self` being the `Iterator` here ... which means there's no method being called on `Iterator` itself but something from another module.
- Additionally, this method is bound by the generic `<S>` which is defined in the `where` clause as `Sum<Self::Item>` ... which ... wait WTF is going on?
  - So this method (`Iterator::sum()`) must return a type that has implemented the trait `Sum`??
  - If that's correct, then that confirms my suspicion that `Sum` is implemented on the elements of an iterator (where I'm sure those comfortable with the generics syntax of the definition above are yelling YES!! OF course!!)
  - That's because the return type of `sum()` would generally have to be the same type as the summed elements, so `S` is both the type of the elements in the iterator and the return type of `sum`. All good.
  - And indeed, in the definition of the `type` alias `S` we've got `Sum<Self::Item>` which binds the return type of `Iterator::sum()` to the type of the iterator's elements (ie `Self::Item`)
    - `Self::Item` is technically the `Item` type of the `Iterator` which can, AFAIU, be defined as distinct from the type of the elements of the collection from which the iterator is derived but that's another story.
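To make the forwarding concrete, here is a small sketch showing that the method call and the spelled-out trait call end up in the same place. The helper names `via_method` and `via_trait` are made up for this example; only `Iterator::sum` and `Sum::sum` come from the standard library.

```rust
use std::iter::Sum;

// Hypothetical helper: the ordinary method call.
// The return type `i32` is what tells the compiler which `S` to use.
fn via_method(v: &[i32]) -> i32 {
    v.iter().copied().sum()
}

// Hypothetical helper: the same call written out in full,
// i.e. "use the `Sum<i32>` implementation that lives on `i32`".
fn via_trait(v: &[i32]) -> i32 {
    <i32 as Sum<i32>>::sum(v.iter().copied())
}

fn main() {
    let v = [1, 2, 3];
    assert_eq!(via_method(&v), via_trait(&v));
    println!("{}", via_trait(&v)); // prints 6
}
```

So `Iterator::sum` is really just a convenience wrapper that hands the iterator over to whichever type was picked as `S`.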
Back to the beginning
- So back to trying to understand the definition of `core::iter::Sum` (which I believe is the definition of the trait):
```rust
// core::iter::Sum
pub trait Sum<A = Self>: Sized {
    // Required method
    fn sum<I: Iterator<Item = A>>(iter: I) -> Self;
}
```
- The trait itself is bound to `Sized`. I don't know the details around `Sized` (see docs here and The book, ch 19.4 here) but it seems fundamental, and it likely applies to vectors and the like.
- The generic `A = Self` and its occurrences in the generics for the `sum()` function and its return type ... are a lot:
  - AFAIU, `Self`, ie the type `Sum` is implemented for, must be the `Item` type for the `Iterator` that will be passed into the `sum` method.
  - But it must also be the return type of `sum()` ... which makes sense.
- So the confusing part here then is the generic type of the `sum()` method: `<I: Iterator<Item = A>>`.
  - Remember, `A = Self`, so it's really `<I: Iterator<Item = Self>>` (right?)
  - This generic type is any `Iterator` whose `Item` (ie, the type that is returned each iteration) is the same type as `Self`.
- Which means that if I want to sum a vector of `i32` numbers, I'd have to make sure I've implemented `Sum` not on `Vec` but on `i32` and defined it as a method that takes any iterator of `i32` (ie `Self`) elements to then return an `i32` element.
- Ok ....
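To check that reading, here is a minimal sketch of what implementing `Sum` on an element type of your own might look like. `Meters` and `total` are hypothetical names invented for this example; nothing here is taken from the standard library source.

```rust
use std::iter::Sum;

// Hypothetical element type, used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(u32);

// `Sum` is implemented on the element type, not on `Vec<Meters>`.
// `impl Sum for Meters` is shorthand for `impl Sum<Meters> for Meters`.
impl Sum for Meters {
    fn sum<I: Iterator<Item = Self>>(iter: I) -> Self {
        // Start from the "zero" of this type and add each element in turn.
        iter.fold(Meters(0), |acc, m| Meters(acc.0 + m.0))
    }
}

// Any iterator of `Meters` can now be summed, whatever collection it came from.
fn total(legs: Vec<Meters>) -> Meters {
    legs.into_iter().sum()
}

fn main() {
    let legs = vec![Meters(100), Meters(250), Meters(50)];
    println!("{:?}", total(legs)); // prints Meters(400)
}
```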
Confirmation
- We can look at the implementors of `core::iter::Sum` (see docs here) and check the source for the `i32` implementation ...
- Which gives us this source code:

```rust
integer_sum_product! { i8 i16 i32 i64 i128 isize u8 u16 u32 u64 u128 usize }
```

- Which is using this macro defined in the same file:
```rust
macro_rules! integer_sum_product {
    (@impls $zero:expr, $one:expr, #[$attr:meta], $($a:ty)*) => ($(
        #[$attr]
        impl Sum for $a {
            fn sum<I: Iterator<Item=Self>>(iter: I) -> Self {
                iter.fold(
                    $zero,
                    #[rustc_inherit_overflow_checks]
                    |a, b| a + b,
                )
            }
        }
        // ... (rest of the macro omitted)
```
- which ... uses `fold()` (basically `reduce` but with an initial value) and plain addition in the anonymous/closure function `|a, b| a + b`. What!?
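Stripped of the macro machinery, the `i32` case presumably boils down to something like the following. `sum_i32` is a hypothetical stand-in written for this post, not the generated code itself, and it leaves out the `#[rustc_inherit_overflow_checks]` attribute, which is internal to the standard library.

```rust
// Hypothetical stand-in for what the macro arm produces when `$a` is `i32`
// and `$zero` is `0`.
fn sum_i32<I: Iterator<Item = i32>>(iter: I) -> i32 {
    iter.fold(0, |a, b| a + b)
}

fn main() {
    let by_hand = sum_i32(vec![1, 2, 3].into_iter());
    let by_std: i32 = vec![1, 2, 3].into_iter().sum();
    // Both routes end in the same fold with plain `+`.
    assert_eq!(by_hand, by_std);
    println!("{by_hand}"); // prints 6
}
```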
Why? How?
- Ok that was a long way to go to find the addition operator at the bottom of the heap of traits!
- Hopefully I've grasped the mechanics?!
- I'm not quite clear on why it's built this way. I'm guessing there's some flexibility baked into the way that the relevant implementation of `Sum` depends on the element type, which can be flexibly defined as the `Item` type of an `Iterator` independently of the type of the collection's elements. That is, an iterator can utilise a type different from the actual elements of a collection and then rely on its particular implementation of sum. And then this can be independent from `Add`.
- But that feels like a lot of obscure flexibility for a pretty basic operation, no?
- For example, this code doesn't compile because a type needs to be specified; presumably type inference gets lost amongst all the generics?
```rust
// doesn't compile
let x = vec![1i32, 2, 3].into_iter().sum();
// These do compile
let x2 = vec![1i32, 2, 3].into_iter().sum::<i32>(); // turbofish!!
let x3: i32 = vec![1i32, 2, 3].into_iter().sum();
```
- Design choices aside ...
- I'm still unclear as to how `Iterator::sum()` works
```rust
fn sum<S>(self) -> S
where
    Self: Sized,
    S: Sum<Self::Item>,
{
    Sum::sum(self)
}
```
- How does `Sum::sum(self)` work!?
- `self` is the `Iterator` (so `vec![1i32, 2, 3].iter()`).
- And `Sum::sum()` is the essential trait addressed above.
- How does rust go from a call of a trait's method to using the actual implementation on the specific type? I guess I hadn't really thought about it, but it makes sense and it's what all those `Self`s are for.
- In this case though, it's rather confusing that the relevant implementation isn't on the type of `self`, but because of the definition of `Sum`, the implementation is on the type of the elements (or `Item` specifically) of `self`. Sighs
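A smaller case of the same mechanism, as far as I can tell, is `Default`: the call site names only the trait method, and the type the result is expected to have decides which implementation runs. The `make` helper below is a hypothetical wrapper written for this example.

```rust
// Hypothetical helper: nothing in the arguments mentions `T`,
// so the expected return type alone selects the implementation.
fn make<T: Default>() -> T {
    Default::default()
}

fn main() {
    let n: i32 = make(); // uses `impl Default for i32`
    let s: String = make(); // uses `impl Default for String`
    let v = make::<Vec<u8>>(); // turbofish works here too
    println!("{n:?} {s:?} {v:?}");
}
```

`Sum::sum(self)` inside `Iterator::sum` seems to work the same way: the implementation is looked up on `S`, the requested result type, rather than on the iterator being passed in.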
Thoughts??
You made me curious and I found some discussion on the subject: https://github.com/rust-lang/rust/issues/27739. Short version is that you could do that if you had some other trait that would tell you what the zero value of the type is, so you know what is the sum of `vec![]`. Originally the standard library did just that, the trait was literally called `Zero`. But there were some issues with it and it has been removed in favor of the current design.

Unfortunately with this design of the `Sum` trait it is impossible to guess the result type from the iterator type. For example, see https://godbolt.org/z/c8M7eshaM.

Cheers! And thanks for digging up the github issue!
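To picture the trade-off, here is a rough sketch of what an `Add`-plus-zero design could look like. The `Zero` trait and the `add_sum` function are made up for illustration; they are not the API that used to be in the standard library.

```rust
use std::ops::Add;

// Hypothetical trait: tells us what the sum of nothing is.
trait Zero {
    fn zero() -> Self;
}

impl Zero for i32 {
    fn zero() -> Self {
        0
    }
}

// With a zero and `Add`, the result type is always the item type,
// so no annotation is needed at the call site.
fn add_sum<I>(iter: I) -> I::Item
where
    I: Iterator,
    I::Item: Zero + Add<Output = I::Item>,
{
    iter.fold(<I::Item as Zero>::zero(), |acc, x| acc + x)
}

fn main() {
    let total = add_sum(vec![1i32, 2, 3].into_iter());
    let empty = add_sum(Vec::<i32>::new().into_iter());
    println!("{total} {empty}"); // prints 6 0
}
```

The price of this sketch is that the result can never be anything other than the item type, so an iterator of `&i32` would have to be copied first.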
I'd also caught mentions of the whole zero thing being behind the design. Which is funny because once you get down to the implementation for the numeric types, zero seems (I'm not on top of macro syntax) to be just a parameter of the macro, which then gets undefined in the call of the macro, so I have to presume it defaults to `0` somehow?? In short, the zero has to be provided in the implementation of `sum` for a specific type. Which I suppose is flexible. Though in this case I can't discern what the zero is for the integer types (it's explicitly `0.0` for floats).

Guessing Result Types
This bit, I don't understand!
```rust
// core::iter::iterator
fn sum<S>(self) -> S
where
    Self: Sized,
    S: Sum<Self::Item>,
{
    Sum::sum(self)
}
```

The definition is pretty clear here right? The generic here is `Sum<Self::Item>`, abbreviated to `S` ... which AFAIU ... means that the element type of the iterator --- here `Self::Item` --- is the type that has implemented `Sum` ... and the type that will be returned.

IE, whatever the type of the iterator's elements ... that is the type of the result of `sum` ... right?

What's more, rust knows when you've provided the wrong type.
EG ... this doesn't compile:
```rust
let x4 = vec![1i32, 2, 3].into_iter().sum::<i64>();
```

The error message here is: "the trait `Sum<i32>` is not implemented for `i64`".

So the compiler knows the relevant types and can check whether the appropriate trait is implemented ... but doesn't do so for type inference, which AFAICT, would be fine if there is only one valid trait defined (as is the case for all of the basic numerics).

So it seems that it isn't impossible but rather it's just not done for some reason of efficiency?
Otherwise, it is interesting to consider the flexibility provided here (at least to me) ... that one can apparently implement `Sum<i32>` for another different numeric type.

However, I don't actually understand how that could be done (I probably don't understand traits and generics well enough here).

As the definition of the `Sum` trait is:

```rust
pub trait Sum<A = Self>: Sized {
    fn sum<I: Iterator<Item = A>>(iter: I) -> Self;
}
```

So I'd presume the `A = Self` followed by `I: Iterator<Item = A>` for the iterator binds the implementation pretty clearly to the type of the iterator's elements.

So how could a `Sum<i32>` possibly be implemented for `i64`??

In the end though I think it's informative to see that this is a pretty valid alternative to using `sum` and disambiguating the result type:

```rust
vec![1i32, 2, 3].into_iter().reduce(|a,x| a+x);
// or if you want to provide the 0
vec![1i32, 2, 3].into_iter().fold(0, |a,x| a+x);
```

It uses hardly any more characters too!
Thanks again for the response!!
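One difference between those alternatives that is worth keeping in mind: `reduce` has no starting value, so it returns an `Option` and yields `None` for an empty iterator, while `fold` and `sum` both fall back to the zero. A small sketch, with helper names invented for this example:

```rust
// Hypothetical helpers wrapping the three spellings.
fn with_reduce(v: Vec<i32>) -> Option<i32> {
    v.into_iter().reduce(|a, x| a + x)
}

fn with_fold(v: Vec<i32>) -> i32 {
    v.into_iter().fold(0, |a, x| a + x)
}

fn with_sum(v: Vec<i32>) -> i32 {
    v.into_iter().sum()
}

fn main() {
    println!("{:?}", with_reduce(vec![1, 2, 3])); // prints Some(6)
    println!("{:?}", with_reduce(vec![])); // prints None
    println!("{}", with_fold(vec![])); // prints 0
    println!("{}", with_sum(vec![])); // prints 0
}
```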
Quite confusingly, the two `=`s have very different meanings here. The `Item = A` syntax just says that the iterator's item type, which is set as the trait's associated type, should be `A`. So, you could read this as "`I` should implement the `Iterator` trait, and the `Item` associated type of this implementation should be `A`".

However, `A = Self` does not actually mean any requirement of `A`. Instead, it means that `Self` is the default value of `A`: that is, you can do `impl Sum<i64> for i32` and then you will have `Self` equal to `i32` and `A` equal to `i64`, but you can also do `impl Sum for i32` and it will essentially be a shorthand for `impl Sum<i32> for i32`, giving you both `Self` and `A` equal to `i32`.

In the end, we have the relationship that the iterator item should be the same as `A`, but we do not have the relationship that `Self` should be the same as `A`. So, given this trait, the iterator item can actually be different to `Self`.

Note that the standard library does actually have implementations where these two differ. For instance, it has `impl<'a> Sum<&'a i32> for i32`, giving you a possibility to sum the iterator of `&i32` into `i32`. This is useful when you think about this: you might want to sum such an iterator without `.copied()` for some extra ergonomics, but you can't just return `&i32`, there is nowhere to store the referenced `i32`. So, you need to return the `i32` itself.

In `Sum<Self::Item>`, `Self::Item` is the `A` parameter, and `Sum<Self::Item>`, or `S`, is the type that implements the trait (which is called `Self` in the definition of the `Sum` trait, but is different to the `Self` in the `sum` method definition). As above, `A` and `S` can be different.

It might be helpful to contrast this definition with a more usual one, where the trait does not have parameters:

```rust
fn some_function<S>(…) -> …
where
    S: SomeTrait,
{…}

fn sum<S>(…) -> …
where
    S: Sum<Self::Item>,
{…}
```

Note that you might have an intuition from some other languages that in case of polymorphism, the chosen function either depends on the type of one special parameter (like in many OOP languages, where everything is decided by the class of the called object), or of the parameter list as a whole (like in C++, where the compiler won't let you define `int f()` and `float f()` at the same time, but will be fine with `int f(int)` and `float f(float)`). As you can see, in Rust, the return type also matters. A simpler example of this is the `Default` trait.

Regarding inference, some examples (Compiler Explorer link):
```rust
vec![1i32].into_iter().sum();
// or: <_ as Sum<_>>::sum(vec![1i32].into_iter());
// error[E0283]: type annotations needed
// note: cannot satisfy `_: Sum<i32>`
```

Compiler knows that the iterator contains `i32`s, so it looks for something that implements `Sum<i32>`. But we don't tell the compiler what to choose, and the compiler does not want to guess by itself.

```rust
vec![1i32].into_iter().sum::<i32>();
// or: <i32 as Sum<_>>::sum(vec![1i32].into_iter());
```

As above the compiler knows that it wants to call something that implements `Sum<i32>`, but now it only has to check that `i32` is such a type. It is, so the code compiles.

```rust
vec![1i32].iter().sum::<i32>();
// or: <i32 as Sum<_>>::sum(vec![1i32].iter());
```

Now we actually have an iterator of references, as we used `.iter()` instead of `.into_iter()`. But the code still compiles, since `i32` also implements `Sum<&i32>`.

```rust
vec![1i64].into_iter().sum::<i32>();
// or: <i32 as Sum<_>>::sum(vec![1i64].into_iter());
// error[E0277]: a value of type `i32` cannot be made by summing an iterator over elements of type `i64`
// help: the trait `Sum<i64>` is not implemented for `i32`
```

Now the compiler can calculate itself that it wants to call something that implements `Sum<i64>`. However, `i32` does not actually implement it, hence the error. If it did, the code would compile correctly.

```rust
vec![].into_iter().sum::<i32>();
// or: <i32 as Sum<_>>::sum(vec![].into_iter());
// error[E0283]: type annotations needed
// (in the second case) note: multiple `impl`s satisfying `i32: Sum<_>` found in the `core` crate:
//     impl Sum for i32;
//     impl<'a> Sum<&'a i32> for i32;
```

Now the situation is reversed. The compiler knows the return type, so it knows that `i32` should implement some `Sum<_>`. But it doesn't know the iterator element type, and so it doesn't know if it should choose the owned value, or the reference version. Note that the wording is different: the compiler wants to guess, but it can't, as there are multiple possible choices. But if there is only one choice, the compiler does guess it:

```rust
struct X {}
impl Sum for X {
    fn sum<I: Iterator<Item = X>>(_: I) -> Self { Self {} }
}
vec![].into_iter().sum::<X>();
// or: <X as Sum<_>>::sum(vec![].into_iter());
```

builds correctly. I am not sure about the reason for the difference (I feel like it's related to forward compatibility and the fact that outside the standard library I can do `impl Sum<i32> for MyType` but not `impl Sum<MyType> for i32`, but I don't really know).

Hope that helps :3
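For a concrete case where `Self` and `A` differ, here is a sketch of the `impl Sum<i32> for MyType` direction. `Total`, `widen_sum` and `sum_refs` are hypothetical names made up for this example; the only standard-library pieces assumed are `Sum`, `i64::from` and the existing `impl<'a> Sum<&'a i32> for i32`.

```rust
use std::iter::Sum;

// Hypothetical result type: a wider accumulator for `i32` items.
#[derive(Debug, PartialEq)]
struct Total(i64);

// Here `Self` is `Total` and `A` is `i32`.
impl Sum<i32> for Total {
    fn sum<I: Iterator<Item = i32>>(iter: I) -> Self {
        // Widen each item first so the addition cannot overflow `i32`.
        Total(iter.map(i64::from).sum())
    }
}

// The return type picks `Total`'s implementation for an `i32` iterator.
fn widen_sum(v: &[i32]) -> Total {
    v.iter().copied().sum()
}

// The standard library's own mismatch: items are `&i32`, result is `i32`.
fn sum_refs(v: &[i32]) -> i32 {
    v.iter().sum()
}

fn main() {
    println!("{:?}", widen_sum(&[i32::MAX, 1])); // prints Total(2147483648)
    println!("{}", sum_refs(&[1, 2, 3])); // prints 6
}
```

This is also one way to see why the result type cannot be guessed from the iterator alone: the same `i32` iterator can legitimately be summed into `i32` or into `Total`.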
EDIT:
Ah, I read this, thought about this, and forgot about this almost immediately. I know almost nothing about macros, but if I understand correctly, the zero is in line 92, here:

```rust
($($a:ty)*) => (
    integer_sum_product!(@impls 0, 1,
        #[stable(feature = "iter_arith_traits", since = "1.12.0")],
        $($a)*);
    integer_sum_product!(@impls Wrapping(0), Wrapping(1),
        #[stable(feature = "wrapping_iter_arith", since = "1.14.0")],
        $(Wrapping<$a>)*);
);
```

The intention seems to be to take a list of types (`i8 i16 i32 i64 i128 isize u8 u16 u32 u64 u128 usize`), and then for each type to generate both the regular and `Wrapping` version, each time calling into the path you have seen before. For floats there is no `Wrapping` version, so this time `0.0` is really the only kind of zero that can appear.
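As a quick check of what the `Wrapping` variant buys you, here is a small sketch that sums `u8` values through `Wrapping<u8>`. The helper name `wrapping_total` is made up for this example.

```rust
use std::num::Wrapping;

// Hypothetical helper: wrap each byte, then let the
// `Sum` implementation for `Wrapping<u8>` do the addition.
fn wrapping_total(xs: &[u8]) -> Wrapping<u8> {
    xs.iter().map(|&x| Wrapping(x)).sum()
}

fn main() {
    // 200 + 100 = 300, which wraps around modulo 256.
    println!("{}", wrapping_total(&[200, 100])); // prints 44
    // The empty sum is the `Wrapping(0)` passed as the macro's zero.
    println!("{}", wrapping_total(&[])); // prints 0
}
```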