AxisIndices
Introduction
The goals of this package are:
- Facilitate multidimensional indexing (e.g.,
instance_of_an_array[indices]
) that supports semantic user facing indices (e.g.,indices = Second(1)
). - Accomplishing the first goal should not interfere in the ability to perform the vast majority of array related methods (e.g,
vcat
,append!
, etc.). - It should be easy to implement new subtypes of
AbstractAxis
that accommodate novel behavior and needs.
These goals are accomplished predominantly through the AbstractAxis
type. It is a subtype of AbstractUnitRange{<:Integer}
with an additional interface for creating keys and interacting with them. This additional interface is intended to be easily extended to new types that may be needed for a variety of different situations. An additional AxisArray
type is provided that uses any subtype of AbstractAxis
for each axis. However, many methods are provided and documented internally so that it's easy for users to alter the behavior of an AxisArray
with a new AbstractAxis
subtype or create an entirely unique multidimensional structure.
Quick Start
Custom indexing only requires specifying a tuple of keys[1] for the indices of each dimension.
julia> using AxisIndices
julia> A = AxisArray(reshape(1:9, 3,3),
(2:4, # first dimension has keys 2:4
3.0:5.0)); # second dimension has keys 3.0:5.0
Most code should work just the same for an AxisArray
...
julia> A[2, 1]
1
julia> A[2:4, 1:2] == parent(A)[1:3, 1:2]
true
julia> sum(A) == sum(parent(A))
true
But now the indices of each dimension have keys that we can filter through.
julia> A[==(2), ==(3.0)] == parent(A)[findfirst(==(2), 2:4), findfirst(==(3.0), 3.0:5.0)] == 1
true
julia> A[<(4), <(5.0)] == parent(A)[findall(<(4), 2:4), findall(<(5.0), 3.0:5.0)] == [1 4; 2 5]
true
Any value that is not a CartesianIndex
or subtype of Real
is considered a dedicated key type. In other words, it could never be used for default indexing and will be treated the same as the ==
filtering syntax above.
julia> AxisArray([1, 2, 3], (["one", "two", "three"],))["one"]
1
Note that the first call only returns a single element, where the last call returns an array. This is because all keys must be unique so there can only be one value that returns true
if filtering by ==
, which is the same as indexing by 1
(e.g., only one index can equal 1
). The last call uses operators that can produce any number of true
values and the resulting output is an array. This is the same as indexing an array by any vector (i.e., returns another array).
The Axis Interface
The supertype to all axis types herein is the AbstractAxis
, which is a subtype of AbstractUnitRange{<:Integer}
.
If we have a set of keys a b c
and a set of indices 1 2 3
then the key a
maps to the index 1
. Given these definitions, the AbstractAxis
differs from the classic dictionary in the following two ways:
- The
valtype
ofAbstractAxis
is always an integer. - The
values
are always unique and continuous.
The two main axis types defined here are Axis
and SimpleAxis
. The standard syntax for indexing doesn't change at all for these types.
julia> using AxisIndices
julia> using Dates
julia> using ChainedFixes # provides `and`, `or`, `⩓`, `⩔` methods
julia> sa = SimpleAxis(1:10)
SimpleAxis(1:10)
julia> sa[2]
2
julia> sa[>(2)]
SimpleAxis(3:10)
julia> a = Axis(1:10)
Axis(1:10 => SimpleAxis(1:10))
julia> a[2]
2
julia> a[2:3]
Axis(2:3 => SimpleAxis(2:3))
But now we can also use functions to index by the keys of an AbstractAxis
.
julia> a = Axis(2.0:11.0)
Axis(2.0:1.0:11.0 => SimpleAxis(1:10))
julia> a[1]
1
julia> a[isequal(2.0)]
1
julia> a[>(2)]
Axis(3.0:1.0:11.0 => SimpleAxis(2:10))
julia> a[>(2.0)]
Axis(3.0:1.0:11.0 => SimpleAxis(2:10))
julia> a[and(>(2.0), <(8.0))]
Axis(3.0:1.0:7.0 => SimpleAxis(2:6))
julia> sa[in(3:5)]
SimpleAxis(3:5)
This also allows certain syntax special treatment because they are obviously not referring to traditional integer based indexing.
julia> x, y, z = Axis([:one, :two, :three]), Axis(["one", "two", "three"]), Axis(Second(1):Second(1):Second(3));
julia> x[:one]
1
julia> x[:one] == y["one"] == z[Second(1)]
true
julia> x[[:one, :two]]
2-element AxisArray(::Array{Int64,1}
• axes:
1 = [:one, :two]
)
1
:one 1
:two 2
Note in the last example that a vector was returned instead of an AbstractAxis
. An AbstractAxis
is a subtype of AbstractUnitRange
and therefore cannot be reformed after any operation that does not guarantee the return of another unit range. This is similar to the behavior of UnitRange
in base.
Indexing an Axis
Setup for running axis examples.
julia> using AxisIndices, Unitful, ChainedFixes
julia> using Unitful: s
julia> time1 = Axis((1.5:.5:10)s)
Axis((1.5:0.5:10.0) s => SimpleAxis(1:18))
julia> time2 = Axis((1.5:.5:10)s, SimpleAxis(2:19))
Axis((1.5:0.5:10.0) s => SimpleAxis(2:19))
Indexing With Integers
Integers will map directly to the indices of an axis.
julia> time1[1]
1
julia> time1[2]
2
julia> time2[2]
2
julia> time2[1]
ERROR: BoundsError: attempt to access Axis((1.5:0.5:10.0) s => SimpleAxis(2:19)) at index [1]
[...]
Notice that time2[1]
throws an error. This is because the indices of the time2
axis don't contain a 1 and begins at 2. This allows an axis to map to any single dimensional memory mapping, even if it doesn't start at 1.
Indexing an axis with a collection of integers works similarly to indexing any other AbstractUnitRange
. That is, using other subtypes of AbstractUnitRange
preserves the structure...
julia> time1[1:2]
Axis((1.5:0.5:2.0) s => SimpleAxis(1:2))
julia> time2[2:3]
Axis((1.5:0.5:2.0) s => SimpleAxis(2:3))
However, we can't ensure that the resulting range will have a step of one in other cases so only the indices are returned.
julia> time1[1:2:3]
2-element AxisArray(::StepRange{Int64,Int64}
• axes:
1 = (1.5:1.0:2.5) s
)
1
1.5 s 1
2.5 s 3
julia> time1[[1, 2, 3]]
3-element AxisArray(::Array{Int64,1}
• axes:
1 = Unitful.Quantity{Float64,𝐓,Unitful.FreeUnits{(s,),𝐓,nothing}}[1.5 s, 2.0 s, 2.5 s]
)
1
1.5 s 1
2.0 s 2
2.5 s 3
julia> time1[firstindex(time1):end]
Axis((1.5:0.5:10.0) s => SimpleAxis(1:18))
Indexing With Keys
julia> time1[1.5s]
1
julia> time2[1.5s]
2
julia> time1[1.5s..3s]
Axis((1.5:0.5:3.0) s => SimpleAxis(1:4))
julia> time1[3s..4.5s]
Axis((3.0:0.5:4.5) s => SimpleAxis(4:7))
Approximate Indexing
julia> axis = Axis([pi + 0, pi + 1]);
julia> axis[3.141592653589793]
1
julia> axis[3.14159265358979]
ERROR: BoundsError: attempt to access Axis([3.141592653589793, 4.141592653589793] => SimpleAxis(1:2)) at index [3.14159265358979]
[...]
julia> axis[isapprox(3.14159265358979)]
1
julia> axis[isapprox(3.14, atol=1e-2)]
1
Indexing With Functions
Operators that typically return true
or false
can often
julia> time1[<(3.0s)]
Axis((1.5:0.5:2.5) s => SimpleAxis(1:3))
julia> time1[>(3.0s)]
Axis((3.5:0.5:10.0) s => SimpleAxis(5:18))
julia> time1[==(6.0s)]
10
julia> time1[!=(6.0s)] == vcat(1:9, 11:18)
true
These operators can also be combined to get more specific regions of an axis.
julia> time1[and(>(2.5s), <(10.0s))]
Axis((3.0:0.5:9.5) s => SimpleAxis(4:17))
julia> time1[>(2.5s) ⩓ <(10.0s)] # equivalent to `and` you can use \And<TAB>
Axis((3.0:0.5:9.5) s => SimpleAxis(4:17))
julia> time1[or(<(2.5s), >(9.0s))] == vcat(1:2, 17:18)
true
julia> time1[<(2.5s) ⩔ >(9.0s)] == vcat(1:2, 17:18) # equivalent to `or` you can use \Or<TAB>
true
The Array Interface
Construction
Take a standard array and attach custom keys along the indices of each dimension.
julia> using AxisIndices
julia> A_base = [1 2; 3 4];
julia> A_axis = AxisArray(A_base, ["a", "b"], [:one, :two])
2×2 AxisArray(::Array{Int64,2}
• axes:
1 = ["a", "b"]
2 = [:one, :two]
)
:one :two
"a" 1 2
"b" 3 4
Note that the keys provided are converted to a subtype of AbstractAxis
.
julia> axes(A_axis, 1)
Axis(["a", "b"] => SimpleAxis(1:2))
An AxisArray
may also be initialized using similar syntax as Array{T}(undef, dims)
.
julia> A_axis = AxisArray{Int}(undef, ["a", "b"], [:one, :two]);
julia> A_axis[:,:] = A_base;
julia> A_axis
2×2 AxisArray(::Array{Int64,2}
• axes:
1 = ["a", "b"]
2 = [:one, :two]
)
:one :two
"a" 1 2
"b" 3 4
We can also attach metadata to an array.
julia> using Metadata
julia> attach_metadata(AxisArray(A_base, (["a", "b"], [:one, :two])), (m1 = 1, m2 = 2))
2×2 attach_metadata(AxisArray(::Array{Int64,2}
• axes:
1 = ["a", "b"]
2 = [:one, :two]
), ::NamedTuple{(:m1, :m2),Tuple{Int64,Int64}}
• metadata:
m1 = 1
m2 = 2
)
:one :two
"a" 1 2
"b" 3 4
julia> attach_metadata(NamedAxisArray{(:xdim, :ydim)}(A_base, ["a", "b"], [:one, :two]), (m1 = 1, m2 = 2))
2×2 NamedDimsArray(attach_metadata(AxisArray(::Array{Int64,2}
• axes:
xdim = ["a", "b"]
ydim = [:one, :two]
), ::NamedTuple{(:m1, :m2),Tuple{Int64,Int64}}
• metadata:
m1 = 1
m2 = 2
))
:one :two
"a" 1 2
"b" 3 4
Combining Different Axes
One of benefits of AxisIndices using a unified backend for multiple axis types is that they can be arbitrarily mixed together. For example, here's an example the first indices are offset by 4 and the last indices are centered.
julia> AxisArray(ones(3,3), offset(4), center)
3×3 AxisArray(::Array{Float64,2}
• axes:
1 = 5:7
2 = -1:1
)
-1 0 1
5 1.0 1.0 1.0
6 1.0 1.0 1.0
7 1.0 1.0 1.0
Although this example isn't particularly useful, being able to arbitrarily mix axes with static characteristics, metadata, offset indices, semantic keys, etc. lends itself to easy custom designs and algorithms.
Performance
Indexing CartesianAxes
is comparable to that of CartesianIndices
.
julia> using AxisIndices, BenchmarkTools
julia> cartaxes = CartesianAxes((Axis(2.0:5.0), Axis(1:4)));
julia> cartinds = CartesianIndices((1:4, 1:4));
julia> @btime getindex(cartaxes, 2, 2)
20.848 ns (1 allocation: 32 bytes)
CartesianIndex(2, 2)
julia> @btime getindex(cartinds, 2, 2)
22.317 ns (1 allocation: 32 bytes)
CartesianIndex(2, 2)
julia> @btime getindex(cartaxes, ==(3.0), 2)
444.374 ns (7 allocations: 416 bytes)
CartesianIndex(2, 2)
Indexing LinearAxes
is comparable to that of LinearIndices
julia> using AxisIndices, BenchmarkTools
julia> linaxes = LinearAxes((Axis(1.0:4.0), Axis(1:4)));
julia> lininds = LinearIndices((1:4, 1:4));
julia> @btime getindex(linaxes, 2, 2)
18.275 ns (0 allocations: 0 bytes)
6
julia> @btime getindex(lininds, 2, 2)
18.849 ns (0 allocations: 0 bytes)
6
julia> @btime getindex(linaxes, ==(3.0), 2)
381.098 ns (6 allocations: 384 bytes)
7
You may notice there's significant overhead for using the filtering syntax. However, the filtering syntax takes advantage of a special type in base, Fix2
. This means that we can take advantage of filtering methods that have been optimized for specific types of keys. Here we do the same thing as above but we create a function that knows it's going to perform filtering.
julia> getindex_filter(a, i1, i2) = a[==(i1), ==(i2)]
getindex_filter (generic function with 1 method)
julia> @btime getindex_filter(linaxes, 3.0, 2)
57.216 ns (0 allocations: 0 bytes)
7
julia> linaxes2 = LinearAxes((Axis(Base.OneTo(4)), Axis(Base.OneTo(4))));
julia> @btime getindex_filter(linaxes2, 3, 2)
22.070 ns (0 allocations: 0 bytes)
7
Indexing linaxes
is much faster now that it can be optimized inside of a function call. However, it's still a little over twice as slow as normal indexing. That's largely because of the cost of searching 1.0:4.0
(which is a StepRangeLen
type in this case). The second benchmark demonstrates how close we really are to standard indexing given similar range types.
Notes
- axis: maps a set of keys to a set of indices.
- indices: a set of integers (e.g.,
<:Integer
) that locate the in memory locations of elements. - keys: maps a set of any type to a set of indices
- indexing: anytime one calls
getindex
or uses square brackets to navigate the elements of a collection
Also note the use of argument (abbreviated arg
in code) and arguments (abbreviated args
in code). These terms specifically refer to what users pass to an indexing method. Therefore, an argument may be a key (:a
), index (1
), or something else that maps to one of the two (==(1)
).
- 1Terminology here is important here. Keys, indices, and axes each have a specific meaning. Throughout the documentation the following functional definitions apply: