I have a fast radix sort. Perhaps 1.2–2x faster than SortingAlgorithms’ radix sort according to my very limited benchmarks. SortingAlgorithms’ is also quite a bit faster than Base’s current defaults, putting the total speedup around 2–7x, again according to very limited benchmarks.

**This only works for a small subset of types**

Specifically, mine works only for `AbstractVector{UIntN}`, and with a thin, two-pass pre- and one-pass post-processor also works for `AbstractVector{IntN}` and `AbstractVector{FloatN}`.
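For context, the usual pre/post-processing for signed integers and floats is a pair of order-preserving bit transforms into an unsigned type. A minimal sketch for 64-bit types (the function names here are my own, not part of any package):

```julia
# Signed integers: flipping the sign bit makes unsigned comparison
# order agree with signed `isless`.
to_uint(x::Int64) = reinterpret(UInt64, x) ⊻ 0x8000_0000_0000_0000
from_uint_int(u::UInt64) = reinterpret(Int64, u ⊻ 0x8000_0000_0000_0000)

# Floats: flip all bits of negatives, only the sign bit of non-negatives.
# This makes unsigned comparison agree with the total order on Float64.
function to_uint(x::Float64)
    u = reinterpret(UInt64, x)
    u ⊻ ifelse(signbit(x), typemax(UInt64), 0x8000_0000_0000_0000)
end
function from_uint_float(u::UInt64)
    u = u ⊻ ifelse(u & 0x8000_0000_0000_0000 == 0,
                   typemax(UInt64), 0x8000_0000_0000_0000)
    reinterpret(Float64, u)
end
```

Both maps are bijective, so the post-processing pass is just the inverse transform.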

My plan is to implement a trait `serializable(::Type)`\* and functions `serialize()`/`deserialize()` that convert things uniquely to and back from an `Unsigned` serialization that preserves `isless`. On `sort!(v::AbstractVector{T})`, check if `serializable(T)`, and if so, dispatch to a serialized radix sort which works as follows:

- First pass serializing each element in the input vector (`serialize(x)` is guaranteed to return a serialization that fits anywhere an `x` could fit, i.e. is smaller in memory) & at the same time calculating the maximum and minimum serialization (these are both `Unsigned`).
- Report the minimum and maximum to a heuristic which
  a) Decides whether to dispatch to counting sort
  b) Counts how many bits are in `max` and in `max - min`
  c) Decides whether it is worth it to conduct a second pass (see 3)
  d) Decides how big `chunk_size` should be
- Conduct a second compression pass
  a) Maybe subtract out the minimum (an optimization for the common case of sorting `Signed` which also helps with sorting arrays of positive, floating-point values)
  b) Maybe also copy the vector into a smaller datatype
- Sort using the radix sort algorithm (this takes about 80% of runtime in my profiling)
- Decompress & deserialize in one pass
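To make the steps concrete, here is a minimal, self-contained LSD radix sort over unsigned keys that folds in the bit counting on `max - min` and the subtract-the-minimum optimization from the steps above. It is a sketch, not the actual implementation: the real version would pick `chunk_size` from the heuristic rather than defaulting it, and would fuse these passes with serialization.

```julia
function radix_sort!(v::AbstractVector{T}; chunk_size::Int=8) where T<:Unsigned
    length(v) <= 1 && return v
    lo, hi = extrema(v)
    v .-= lo                                       # subtract out the minimum
    bits = 8 * sizeof(T) - leading_zeros(hi - lo)  # bits needed for max - min
    src, dst = v, similar(v)
    mask = (one(T) << chunk_size) - one(T)
    counts = Vector{Int}(undef, 1 << chunk_size)
    for shift in 0:chunk_size:bits-1
        fill!(counts, 0)
        for x in src                               # histogram of the current digit
            counts[Int((x >> shift) & mask) + 1] += 1
        end
        acc = 0                                    # exclusive prefix sum -> offsets
        for i in eachindex(counts)
            c = counts[i]; counts[i] = acc; acc += c
        end
        for x in src                               # stable scatter into the buffer
            d = Int((x >> shift) & mask) + 1
            counts[d] += 1
            dst[counts[d]] = x
        end
        src, dst = dst, src
    end
    src === v || copyto!(v, src)                   # result back in the caller's array
    v .+= lo                                       # undo the subtraction
    return v
end
```

Note how subtracting the minimum shrinks `bits` and can eliminate whole passes, which is where the second-pass heuristic pays off.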

\*`serialize` functions also take an `order::Base.Order.Ordering` as an input argument, and, at least at first, only `DirectOrdering`s would be serializable. Here is a tested draft of serialization.
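As an illustration only (this is not the tested draft; the names and the exact trait signature are my own guesses), an `Ordering`-aware interface restricted to `ForwardOrdering` and 64-bit integers might look like:

```julia
using Base.Order: Ordering, ForwardOrdering, Forward

# Hypothetical trait: does this (order, type) pair admit an
# order-preserving serialization to Unsigned?
serializable(::Ordering, ::Type) = false
serializable(::ForwardOrdering, ::Type{UInt64}) = true
serializable(::ForwardOrdering, ::Type{Int64}) = true

serialize(::ForwardOrdering, x::UInt64) = x
deserialize(::Type{UInt64}, ::ForwardOrdering, u::UInt64) = u
# Sign-bit flip makes unsigned comparison agree with signed `isless`.
serialize(::ForwardOrdering, x::Int64) =
    reinterpret(UInt64, x) ⊻ 0x8000_0000_0000_0000
deserialize(::Type{Int64}, ::ForwardOrdering, u::UInt64) =
    reinterpret(Int64, u ⊻ 0x8000_0000_0000_0000)

# Sort hook: serialize, sort the unsigned keys, deserialize back in place.
function serial_sort!(v::AbstractVector{T}, order::Ordering=Forward) where T
    serializable(order, T) || return sort!(v, order=order)  # fall back
    u = serialize.(Ref(order), v)   # pass 1: serialize
    sort!(u)                        # the radix sort would run here
    map!(s -> deserialize(T, order, s), v, u)  # final pass: deserialize
    return v
end
```

Dispatching the trait on both the ordering and the element type is one way to express "only `DirectOrdering`s are serializable" without a separate check.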

Questions & Suggestions welcome!