Lang/dataStructures

Learning a new programming language usually consists of learning two basic things first: 1. Data structures and their APIs 2. Control logic (if/else, loops - for, while, do-while etc)

In a similar fashion, we take a look at how to construct familiar data structures first.

Data Structures

module Lang.dataStructures where

open import Agda.Builtin.Nat public using (Nat)

Types

Agda is an implementation of type theory, a branch of mathematics which deals with Type as an object and various theorems (APIs) for working with Types. A Type presents a notion of an object of a certain kind, i.e. it assigns some extra information to an object.

Programmers might already be well familiar with the notion of Type as it’s use is widespread in programming.

val x:Int = 0

int x;

variable
  x : Nat

The Set type is the root of all types, somewhat akin to java’s Object or scala’s Any. Every other Type is a Set, whereas Set itself if of type Set : Set₁. We will look deeper into this at a later stage.

Data keyword

Programming languages often come bundled with some primitive data types like int, float, string etc and some that combine these primitive types into more complex structures, e.g. map or dict (python) can be used to construct say map<string, array<string>> or the Either datatype in haskell, the Option datatype in scala, tuple in python and so on.

Most languages allow creating new data types using either cartesian product or cartesian sum (disjoint union) on a bunch of pre-existing data types and enclosing them inside a container.

Some examples of cartesian product of String, Int and another String would be:

val a : Tuple2[String, Tuple2[Int, String]] = ...

val b : Tuple2[Tuple2[String, Int], String] = ...

val c : Map[String, (Int, String)] = ...

val a : Option[A] = ... // Either A or null

val a : Either[A, B] = ... // Either type A or type B

Some languages also provide the mechanism to define new data types, sometimes by alias-ing a data type with a name like in scala:

type newData = Map[String, List[Float]]

Others provide the facility to define completely new data types, like haskell does with the data keyword:

-- this states that the type `Bool` can have two values False and True
data Bool = False | True

A haskell datatype can also have constructors. For e.g. if we were to define a shape type which can either be a circle or a rectangle:

data Shape = Circle Float Float Float | Rectangle Float Float Float Float

-- uses the Circle constructor to create an object of type Shape
Circle 1.2 12.1 123.1

-- uses the Rectangle constructor to create an object of type Shape
Rectangle 1.2 12.1 123.1 1234.5

It is important to appreciate that Shape is a new Type, one that did not previously exist before in the language.

-- lets assume the type ℝ, or real numbers is already defined
module _ {ℝ : Set} where
  data Shape : Set where
    Circle : ℝ → ℝ → ℝ → Shape
    Rectangle : ℝ → ℝ → ℝ → ℝ → Shape

Functions

In Type theory, a function f that operates on values of type 𝔸, also called domain of the function and produces values of type 𝔹, also called co-domain of the function, is represented as:

not : Bool → Bool
not true  = false
not false = true

This pattern matching way if typically found in functional programming language and the reader is cautioned that this will be used heavily throughout this work. As Agda is implemented in haskell and its syntax shares high degrees of similarity, it is encouraged to be a bit well versed with basic haskell.

Trivial Types

Empty

An empty type cannot be created cause it has no constructor. This is the most barebones of a data definition.

data ⊥ : Set where

Singleton

data ⊤ : Set where
  singleton : ⊤

Boolean type

data Bool : Set where
  true : Bool
  false : Bool

Natural numbers

Natural numbers are the integral numbers used for counting, e.g. 0,1,2,3… Natural numbers support an operation called succ for succession, which can be used to create new natural numbers from known ones, e.g. 3 = 2 + 1 = 1 + 1 + 1. Thus, all natural numbers can be created from zero and n successive numbers after zero. All we need to know are:

data ℕ : Set where
  zero : ℕ
  succ : ℕ → ℕ

The operations for natural numbers, addition, subtraction, multiplication and powers can be defined as functions in Agda:

_+_ : ℕ → ℕ → ℕ
x + zero = x
x + (succ y) = succ (x + y)

_−_ : ℕ → ℕ → ℕ
zero  − m     = zero
succ n − zero  = succ n
succ n − succ m = n − m

_×_ : ℕ → ℕ → ℕ
x × zero     = zero
x × (succ y) = x + (x × y)

_² : ℕ → ℕ
x ² = x × x

_^_ : ℕ → ℕ → ℕ
x ^ zero = succ zero
x ^ (succ y) = x × (x ^ y)

one   = succ zero
two   = succ one
three = succ two
four  = succ three
five  = succ four
six   = succ five
seven = succ six
eight = succ seven
nine  = succ eight
ten   = succ nine

and so on. Here, each member of the integer family can be derived from a smaller member by successively applying succ to it. Such a type is called an Inductive type.

x + (succ y) = succ (x + y)

Here, we mean that for all inputs of the type x + (succ y) where succ y corresponds to a natural number that has been constructed using a succ i.e. it is (succ y) > 0, return succ (x + y). This pattern is called “pattern matching”. Pattern matching is heavily used all over the functional programming world, e.g. in scala:

(1 to 100).map{
  // pattern match against all values of type integer
  case(y:Int) if y > 5 =>
    y+1
  // if above pattern does not match
  case _ =>
    0
}

Binary Trees

We define a binary tree using the following definition. Note that this merely creates an empty structure of a tree, the nodes or leaves contain no information in them, except for their position in the tree:

data BinTree : Set where
  leaf : BinTree
  node : BinTree → BinTree → BinTree

Now let us augment the binary trees with leaves containing natural numbers in leaf nodes:

data ℕBinTree : Set where
  leaf : ℕ → ℕBinTree
  node : ℕBinTree → ℕBinTree → ℕBinTree

data ℕNodeBinTree : Set where
  leaf : ℕ → ℕNodeBinTree
  node : ℕ → ℕNodeBinTree → ℕNodeBinTree → ℕNodeBinTree

Binary trees with each node containing a natural number and each leaf containing a boolean:

data ℕMixedBinTree : Set where
  leaf : Bool → ℕMixedBinTree
  node : ℕ → ℕMixedBinTree → ℕMixedBinTree → ℕMixedBinTree

Graph

data Vertex : Set
data Edge   : Set
data Graph  : Set

data Vertex where
  vertex : ℕ → Vertex

data Edge where
  edge : Vertex → Vertex → Edge

data Graph where
  idGraph : Edge → Graph
  _+|+_ : Graph → Edge → Graph

infixl 3 _+|+_

graph : Graph
graph = idGraph (edge (vertex zero)   (vertex seven))     +|+
                edge  (vertex one)    (vertex three)      +|+
                edge  (vertex seven)  (vertex four)       +|+
                edge  (vertex nine)   (vertex (succ six))

List

data List (A : Set) : Set where
  [] : List A
  _::_ : A → List A → List A

infixr 5 _::_

bunch : List Bool
bunch = false :: false :: true :: false :: true :: []

Finite set

A finite set can be considered as a finite bunch of numbered objects, such that each object can be uniquely identified by an integer, thus making the set countable. This count is called the Cardinality of the set.

Formally, a finite set Fin is a set for which there exists a bijection (one-to-one and onto correspondence)

where \(n ∈ ℕ\) and [n] is the set of all natural numbers from 0 to n.

data Fin : ℕ → Set where
  id : (n : ℕ) → Fin (succ n)
  succ : (n : ℕ) → Fin n → Fin (succ n)

Fin n represents the set of first n natural numbers, i.e., the set of all numbers smaller than n. We create a finite set of four elements:

fourFin : Fin four
fourFin = succ three (succ two (succ one (id zero)))

Indexed sequences or Vectors

We now define a finite sized indexed list, also called a vector Vec. The constructor consists of:

data Vec (A : Set) : ℕ → Set where
  [] : Vec A zero
  cons : (n : ℕ) → A → Vec A n → Vec A (succ n)

vec1 : Vec Bool one
vec1 = cons zero true []

vec2 : Vec Bool two
vec2 = cons one false vec1

vec3 : Vec Bool three
vec3 = cons two true vec2

Note that each vector has its size encoded into it’s type. This is not to be confused with set theory based lists, where any two list of different number of elements have the same type.

val x : List[Int] = List(1,2,3,4,5)
val y : List[Int] = List(1,2,3,4,5,6,7,8,9,0)

data ⟂ : Set where

data BoolVec(A B : Set) : Bool → Set where
  id₁ : B → BoolVec A B false
  id₂ : A → BoolVec A B true

containsB : BoolVec ⟂ ℕ false
containsB = id₁ three

containsA : BoolVec ℕ ⟂ true
containsA = id₂ four

Data Structures

Types

`Data` keyword

Functions

Trivial Types

Empty

Singleton

Boolean type

`Natu`ral numbers

Binary Trees

Graph

List

Finite set

Indexed sequences or Vectors

Data Structures

Types

Data keyword

Functions

Trivial Types

Empty

Singleton

Boolean type

Natural numbers

Binary Trees

Graph

List

Finite set

Indexed sequences or Vectors

`Data` keyword

`Natu`ral numbers