Julia has a rich standard library that ships with every Julia installation. Unlike everything that we have seen so far (types, data structures, and the filesystem), standard library modules must be loaded into your environment before you can use their functions and types.
This is done via `using` or `import`. In this book, we will load code via `using`:
using ModuleName
After doing this, you can access all exported functions and types of `ModuleName` directly by name.
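To make the difference between the two loading styles concrete, here is a minimal sketch that uses the `Dates` module from the standard library (covered next). The particular names imported here were picked only for illustration.

```julia
# `using` with a list brings only the selected names into scope.
using Dates: Date, year

d = Date(2021, 1, 1)   # `Date` is available without a prefix
y = year(d)            # so is `year`

# `import` of a module brings only the module name into scope,
# so everything else must be qualified with `Dates.`.
import Dates

m = Dates.month(d)
```

With a plain `using Dates`, all exported names become available at once, which is the style used in the rest of this section.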
Knowing how to handle dates and timestamps is important in data science. As we said in the Why Julia? section (Section 2), Python’s `pandas` uses its own `datetime` type to handle dates. The same is true of the `lubridate` package in R’s tidyverse, which also defines its own `datetime` type. In Julia, packages don’t need to write their own date logic, because Julia has a dates module in its standard library called `Dates`.
To begin, let’s load the `Dates` module:
using Dates
`Date` and `DateTime` Types
The `Dates` standard library module has two types for working with dates:
- `Date`: represents time with day precision.
- `DateTime`: represents time with millisecond precision.
We can construct `Date` and `DateTime` with the default constructor by passing integers that represent the year, month, day, hour, and so on:
Date(1987) # year
1987-01-01
Date(1987, 9) # year, month
1987-09-01
Date(1987, 9, 13) # year, month, day
1987-09-13
DateTime(1987, 9, 13, 21) # year, month, day, hour
1987-09-13T21:00:00
DateTime(1987, 9, 13, 21, 21) # year, month, day, hour, minute
1987-09-13T21:21:00
For the curious, September 13th 1987, 21:21 is the official time of birth of the first author, Jose.
We can also pass `Period` types to the default constructor. `Period` types are the human-equivalent representation of time for the computer. Julia’s `Dates` module has the following abstract `Period` subtypes:
subtypes(Period)
DatePeriod
TimePeriod
These divide into the following concrete types, which are pretty much self-explanatory:
subtypes(DatePeriod)
Day
Month
Quarter
Week
Year
subtypes(TimePeriod)
Hour
Microsecond
Millisecond
Minute
Nanosecond
Second
So, we could alternatively construct Jose’s official time of birth as:
DateTime(Year(1987), Month(9), Day(13), Hour(21), Minute(21))
1987-09-13T21:21:00
Most of the time, we won’t be constructing `Date` or `DateTime` instances from scratch. Instead, we will probably be parsing strings as `Date` or `DateTime` types.
The `Date` and `DateTime` constructors accept a string together with a format string. For example, the string `"19870913"`, representing September 13th 1987, can be parsed with:
Date("19870913", "yyyymmdd")
1987-09-13
Notice that the second argument is a string representation of the format: the first four digits represent the year (`y`), followed by two digits for the month (`m`) and finally two digits for the day (`d`).
It also works for timestamps with `DateTime`:
DateTime("1987-09-13T21:21:00", "yyyy-mm-ddTHH:MM:SS")
1987-09-13T21:21:00
You can find more on how to specify different date formats in the Julia `Dates` documentation. Don’t worry if you have to revisit it all the time; we do that ourselves when working with dates and timestamps.
According to the Julia `Dates` documentation, using the `Date(date_string, format_string)` method is fine if it’s only called a few times. If there are many similarly formatted date strings to parse, however, it is much more efficient to first create a `DateFormat` object and pass it instead of a raw format string. Then, our previous example becomes:
format = DateFormat("yyyymmdd")
Date("19870913", format)
1987-09-13
Alternatively, without loss of performance, you can use the string literal prefix `dateformat"..."`:
Date("19870913", dateformat"yyyymmdd")
1987-09-13
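As a sketch of the "many similarly formatted strings" case, the format can be built once and reused for every element. The input vector `raw_dates` is made up for illustration; `tryparse` is shown as one way to handle malformed input without an exception.

```julia
using Dates

# Hypothetical raw input: many strings that share one format.
raw_dates = ["19870913", "20210101", "20211231"]

# Build the format once and reuse it for every string.
fmt = dateformat"yyyymmdd"
parsed = [Date(s, fmt) for s in raw_dates]

# `tryparse` returns `nothing` instead of throwing on malformed input.
bad = tryparse(Date, "not-a-date", fmt)
```

Here `parsed` is a `Vector{Date}`, and `bad` can be checked with `isnothing(bad)` before use.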
It is easy to extract desired information from `Date` and `DateTime` objects. First, let’s create an instance of a very special date:
my_birthday = Date("1987-09-13")
1987-09-13
We can extract anything we want from `my_birthday`:
year(my_birthday)
1987
month(my_birthday)
9
day(my_birthday)
13
Julia’s `Dates` module also has compound functions that return a tuple of values:
yearmonth(my_birthday)
(1987, 9)
monthday(my_birthday)
(9, 13)
yearmonthday(my_birthday)
(1987, 9, 13)
We can also see the day of the week and other handy stuff:
dayofweek(my_birthday)
7
dayname(my_birthday)
Sunday
dayofweekofmonth(my_birthday)
2
Yep, Jose was born on the second Sunday of September.
NOTE: Here’s a handy tip to keep only the weekdays from a collection of `Date` instances: use `filter` with the condition `dayofweek(your_date) <= 5`. For business days, you can check out the `BusinessDays.jl` package.
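The weekday tip can be sketched as follows. The week used here is an arbitrary example; `dayofweek` returns 1 for Monday through 7 for Sunday, so values up to 5 are weekdays.

```julia
using Dates

# An arbitrary example week: 2021-01-01 (a Friday) through 2021-01-07.
week = Date(2021, 1, 1):Day(1):Date(2021, 1, 7)

# Keep Monday (1) through Friday (5); drops Saturday the 2nd and Sunday the 3rd.
weekdays = filter(d -> dayofweek(d) <= 5, week)
```

Note that this only removes weekends; public holidays are not considered, which is where a package like `BusinessDays.jl` comes in.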
We can perform operations on `Dates` instances. For example, we can add days to a `Date` or `DateTime` instance. Notice that Julia’s `Dates` will automatically perform the adjustments necessary for leap years and for months with 30 or 31 days (this is known as calendrical arithmetic).
my_birthday + Day(90)
1987-12-12
We can add as many as we like:
my_birthday + Day(90) + Month(2) + Year(1)
1989-02-11
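One subtlety is worth a quick sketch: when several periods are chained in a single expression, `Dates` does not add them strictly left to right. As far as we understand the `Dates` documentation, chained periods are grouped and the larger units (years, then months) are applied before days, which is why the result above ends on the 11th. Parentheses force a step-by-step evaluation:

```julia
using Dates

my_birthday = Date(1987, 9, 13)

# Chained: Month(2) is applied before Day(90).
chained = my_birthday + Day(90) + Month(2)       # 1988-02-11

# Parenthesized: Day(90) is applied first, then Month(2).
stepwise = (my_birthday + Day(90)) + Month(2)    # 1988-02-12
```

If the order of operations matters for your analysis, adding parentheses makes the intent explicit.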
In case you’re ever wondering: “What can I do with dates again? What is available?”, then you can use `methodswith` to check it out. We show only the first 20 results here:
first(methodswith(Date), 20)
[1] +(x::Date, y::Day) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:81
[2] +(x::Date, y::Week) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:79
[3] +(x::Date, y::Quarter) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:75
[4] +(dt::Date, z::Month) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:56
[5] +(dt::Date, y::Year) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:29
[6] +(dt::Date, t::Time) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:21
[7] +(x::Date, y::Unitful.Quantity) @ Unitful ~/.julia/packages/Unitful/dHMk1/src/dates.jl:197
[8] +(t::Time, dt::Date) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:22
[9] +(y::Unitful.Quantity, x::Date) @ Unitful ~/.julia/packages/Unitful/dHMk1/src/dates.jl:201
[10] -(x::Date, y::Day) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:82
[11] -(x::Date, y::Week) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:80
[12] -(x::Date, y::Quarter) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:76
[13] -(dt::Date, z::Month) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:68
[14] -(dt::Date, y::Year) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/arithmetic.jl:37
[15] -(x::Date, y::Unitful.Quantity) @ Unitful ~/.julia/packages/Unitful/dHMk1/src/dates.jl:197
[16] (::Colon)(a::T, b::T) where T<:Date @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/ranges.jl:7
[17] convert(::Type{DateTime}, dt::Date) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/conversions.jl:30
[18] convert(::Type{Day}, dt::Date) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/conversions.jl:37
[19] floor(dt::Date, p::Day) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/rounding.jl:73
[20] floor(dt::Date, p::Week) @ Dates /opt/hostedtoolcache/julia/1.10.7/x64/share/julia/stdlib/v1.10/Dates/src/rounding.jl:66
From this, we can conclude that we can also use the plus `+` and minus `-` operators. Let’s see how old Jose is, in days:
today() - my_birthday
13625 days
The default duration for `Date` types is a `Day` instance. For `DateTime`, the default duration is a `Millisecond` instance:
DateTime(today()) - DateTime(my_birthday)
1177200000000 milliseconds
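When a plain number is needed instead of a `Period`, `Dates.value` returns the underlying integer. The following sketch uses fixed dates (rather than `today()`) so that the result does not change over time:

```julia
using Dates

span_days = Date(2021, 1, 1) - Date(2020, 1, 1)          # a `Day` period
span_ms = DateTime(2021, 1, 1) - DateTime(2020, 1, 1)    # a `Millisecond` period

# `Dates.value` strips the unit and returns the underlying integer.
n_days = Dates.value(span_days)                       # 366, since 2020 is a leap year
n_days_from_ms = Dates.value(span_ms) ÷ 86_400_000    # divide by the milliseconds in one day
```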
One nice thing about the `Dates` module is that we can also easily construct date and time intervals. Julia is clever enough to not have to redefine the interval types and operations that we covered in Section 3.3.6. It just extends the functions and operations defined for ranges to the `Dates` types. This is made possible by multiple dispatch, which we already covered in Why Julia? (Section 2).
For example, suppose that you want to create a `Day` interval. This is easily done with the colon `:` operator:
Date("2021-01-01"):Day(1):Date("2021-01-07")
2021-01-01
2021-01-02
2021-01-03
2021-01-04
2021-01-05
2021-01-06
2021-01-07
There is nothing special about using `Day(1)` as the step; we can use any `Period` type as the step. For example, using 3 days:
Date("2021-01-01"):Day(3):Date("2021-01-07")
2021-01-01
2021-01-04
2021-01-07
Or even months:
Date("2021-01-01"):Month(1):Date("2021-03-01")
2021-01-01
2021-02-01
2021-03-01
Note that the type of this interval is a `StepRange` parameterized by `Date` and the concrete `Period` type that we used as the step inside the colon `:` operator:
date_interval = Date("2021-01-01"):Month(1):Date("2021-03-01")
typeof(date_interval)
StepRange{Date, Month}
We can convert this to a vector with the `collect` function:
collected_date_interval = collect(date_interval)
2021-01-01
2021-02-01
2021-03-01
And have all the array functionalities available, like, for example, indexing:
collected_date_interval[end]
2021-03-01
We can also broadcast date operations to our vector of `Date`s:
collected_date_interval .+ Day(10)
2021-01-11
2021-02-11
2021-03-11
Similarly, these examples work for `DateTime` types too.
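For instance, a `DateTime` interval with an hourly step can be sketched like this (the dates and the six-hour step are arbitrary choices for illustration):

```julia
using Dates

# Every six hours over one day, both endpoints included.
stamps = DateTime(2021, 1, 1, 0):Hour(6):DateTime(2021, 1, 2, 0)

# Collect into a vector and broadcast, just like with `Date`s.
collected_stamps = collect(stamps)
shifted = collected_stamps .+ Minute(30)
```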
Another important module in Julia’s standard library is the `Random` module, which deals with random number generation. `Random` is a rich library and, if you’re interested, you should consult Julia’s `Random` documentation. We will cover only three functions: `rand`, `randn` and `seed!`.
To begin, we first load the `Random` module. Since we know exactly what we want to load (`rand` and `randn` are available without loading anything), we can just as well do that explicitly:
using Random: seed!
We have two main functions that generate random numbers:
- `rand`: samples a random element of a data structure or type.
- `randn`: samples a random number from a standard normal distribution (mean 0 and standard deviation 1).
`rand`
By default, if you call `rand` without arguments it will return a `Float64` in the interval \([0, 1)\), which means between 0 inclusive and 1 exclusive:
rand()
0.38818840951942324
You can modify `rand`’s arguments in several ways. For example, suppose you want more than 1 random number:
rand(3)
[0.2196970622875195, 0.05200058919635486, 0.8525404465165708]
Or, you want to sample from a different set of values, such as the elements of a range:
rand(1.0:10.0)
3.0
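Note that passing a range to `rand` samples one of the range’s elements, so the result here is always one of `1.0, 2.0, ..., 10.0`. If a continuous draw from \([1, 10)\) is wanted instead, one common approach (an assumption on our part, not something `rand` does for ranges) is to rescale the default uniform samples:

```julia
# Sampling from a range picks one of its elements.
grid_samples = rand(1.0:10.0, 5)      # each value is one of 1.0, 2.0, ..., 10.0

# Assumed alternative for a continuous draw from [1, 10):
# rescale the default uniform samples from [0, 1).
lo, hi = 1.0, 10.0
continuous_samples = lo .+ (hi - lo) .* rand(5)
```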
You can also specify a different step size inside the interval and a different type. Here we are using numbers without the dot `.`, so Julia will interpret them as `Int64` and not as `Float64`:
rand(2:2:20)
14
You can also mix and match arguments:
rand(2:2:20, 3)
[2, 12, 12]
It also supports a collection of elements as a tuple:
rand((42, "Julia", 3.14))
3.14
And also arrays:
rand([1, 2, 3])
3
And `Dict`s:
rand(Dict(:one => 1, :two => 2))
:one => 1
For all of these `rand` argument options, you can specify the desired dimensions in a tuple. If you do this, the returned value will be an array. For example, here’s a 2x2 matrix of `Float64` numbers drawn from the values 1.0, 2.0, and 3.0:
rand(1.0:3.0, (2, 2))
2×2 Matrix{Float64}:
2.0 3.0
3.0 3.0
`randn`
`randn` follows the same general principle as `rand`, but it only returns numbers drawn from the standard normal distribution, that is, the normal distribution with mean 0 and standard deviation 1. The default type is `Float64`, and only subtypes of `AbstractFloat` or `Complex` are allowed:
randn()
-0.6682606025554321
We can only specify the size:
randn((2, 2))
2×2 Matrix{Float64}:
-0.848072 0.340978
1.16349 1.02915
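Since `randn` always samples from the standard normal distribution, a normal distribution with another mean and standard deviation can be obtained by shifting and scaling. A minimal sketch, where `mu` and `sigma` are hypothetical target values:

```julia
mu, sigma = 10.0, 2.0          # hypothetical target mean and standard deviation
samples = mu .+ sigma .* randn(1_000)

# The element type can be chosen among the supported float types.
single_precision = randn(Float32, 3)
```

For anything beyond this simple case, a dedicated package such as `Distributions.jl` is probably the better tool.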
`seed!`
To finish off the `Random` overview, let’s talk about reproducibility. Often, we want to make something replicable, meaning that we want the random number generator to produce the same sequence of random numbers. We can do so with the `seed!` function:
seed!(123)
rand(3)
[0.521213795535383, 0.5868067574533484, 0.8908786980927811]
seed!(123)
rand(3)
[0.521213795535383, 0.5868067574533484, 0.8908786980927811]
In some cases, calling `seed!` at the beginning of your script is not good enough. To keep `rand` or `randn` from depending on implicit global state, we can instead keep the random number generator object returned by `seed!` and pass it explicitly as the first argument of either `rand` or `randn`.
my_seed = seed!(123)
Random.TaskLocalRNG()
rand(my_seed, 3)
[0.521213795535383, 0.5868067574533484, 0.8908786980927811]
rand(my_seed, 3)
[0.19090669902576285, 0.5256623915420473, 0.3905882754313441]
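Notice that `seed!(123)` returned `Random.TaskLocalRNG()`, which is the default generator of the current task, so other code that calls `rand()` still advances the same stream. If a fully separate stream is needed, one option is to create a dedicated generator object. The sketch below assumes Julia 1.7 or later, where `Xoshiro` is available in `Random`:

```julia
using Random: Xoshiro

# A dedicated generator with its own state, separate from the default one.
rng = Xoshiro(123)
a = rand(rng, 3)

# Re-creating the generator with the same seed reproduces the stream.
rng_again = Xoshiro(123)
b = rand(rng_again, 3)

a == b   # true
```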
NOTE: These numbers might differ between Julia versions. To have stable streams across Julia versions, use the `StableRNGs.jl` package.
We’ll also cover the standard library’s `Downloads` module. It will be really brief because we will only be covering a single function named `download`.
Suppose you want to download a file from the internet to your local storage. You can accomplish this with the `download` function. The first and only required argument is the file’s URL. You can also specify as a second argument the desired output path for the downloaded file (don’t forget the filesystem best practices!). If you don’t specify a second argument, Julia will, by default, download to a temporary file path generated with the `tempname` function.
Let’s load the `Downloads` module:
using Downloads
For example, let’s download the `Project.toml` file from our `JuliaDataScience` GitHub repository. Note that the `download` function is not exported by the `Downloads` module, so we have to use the `Module.function` syntax. By default, it returns a string that holds the file path of the downloaded file:
url = "https://raw.githubusercontent.com/JuliaDataScience/JuliaDataScience/main/Project.toml"
my_file = Downloads.download(url) # downloaded to a temporary path from tempname()
/tmp/jl_wroPa1QviH
With `readlines`, we can look at the first 4 lines of our downloaded file:
readlines(my_file)[1:4]
4-element Vector{String}:
"name = \"JDS\""
"uuid = \"6c596d62-2771-44f8-8373-3ec4b616ee9d\""
"authors = [\"Jose Storopoli\", \"Rik Huijzer\", \"Lazaro Alonso\"]"
""
NOTE: For more complex HTTP interactions, such as interacting with web APIs, see the `HTTP.jl` package.
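The second argument of `download` can be sketched as follows. The destination (a fresh temporary directory and the file name `Project.toml`) is a hypothetical choice, and the `try`/`catch` is only there because network access can fail:

```julia
using Downloads

url = "https://raw.githubusercontent.com/JuliaDataScience/JuliaDataScience/main/Project.toml"

# Hypothetical destination: a fresh temporary directory with a readable file name.
target = joinpath(mktempdir(), "Project.toml")

try
    Downloads.download(url, target)   # returns `target` on success
    println(first(readlines(target)))
catch err
    # Network access can fail; report it instead of crashing the script.
    @warn "Download failed" exception = err
end
```

Choosing the output path yourself keeps the file under a meaningful name instead of a randomly generated one.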
One last thing from Julia’s standard library for us to cover is the `Pkg` module. As described in Section 2.2, Julia offers a built-in package manager, with dependencies and version control tightly controlled, manageable, and replicable.
Unlike traditional package managers, which install and manage a single global set of packages, Julia’s package manager is designed around “environments”: independent sets of packages that can be local to an individual project or shared between projects. Each project maintains its own independent set of package versions.
`Project.toml` and `Manifest.toml`
Inside every project environment there is a simple setup involving `.toml` files in a folder. The folder, in this context, can be perceived as a “project” folder. The project environment is derived from two `.toml` files:
- `Project.toml`: higher-level description of the project environment with the top-level package list.
- `Manifest.toml`: lower-level description of the project environment with the full dependencies list and their versions. This file is machine-generated, which means that users are not encouraged to edit it.
In order to create a new project environment, you can enter the `Pkg` REPL mode by typing `]` (right-bracket) in the Julia REPL:
julia>]
The REPL then switches to the `Pkg` REPL mode:
(@v1.8) pkg>
Here we can see that the REPL prompt changes from `julia>` to `pkg>`. There’s also additional information inside the parentheses regarding which project environment is currently active, `(@v1.8)`. The `v1.8` project environment is the default environment for your current Julia installation (which in our case is Julia version `1.8.X`).
NOTE: You can see a list of available commands in the `Pkg` REPL mode with the `help` command.
Julia has a separate default environment for each minor release (the `X` in the `1.X` Julia version). Anything that we do in this default environment will affect any fresh Julia session on that version. Hence, we create a new environment with the `activate` command:
(@v1.8) pkg> activate .
Activating project at `~/user/folder`
(folder) pkg>
This activates a project environment in the directory in which your Julia REPL is running. In my case this is located at `~/user/folder`. Now we can start adding packages to our project environment with the `add` command in the `Pkg` REPL mode:
(folder) pkg> add DataFrames
Updating registry at `~/.julia/registries/General.toml`
Resolving package versions...
Updating `~/user/folder/Project.toml`
[a93c6f00] + DataFrames v1.4.3
Updating `~/user/folder/Manifest.toml`
[34da2185] + Compat v4.4.0
[a8cc5b0e] + Crayons v4.1.1
[9a962f9c] + DataAPI v1.13.0
[a93c6f00] + DataFrames v1.4.3
[864edb3b] + DataStructures v0.18.13
[e2d170a0] + DataValueInterfaces v1.0.0
[59287772] + Formatting v0.4.2
[41ab1584] + InvertedIndices v1.1.0
[82899510] + IteratorInterfaceExtensions v1.0.0
[b964fa9f] + LaTeXStrings v1.3.0
[e1d29d7a] + Missings v1.0.2
[bac558e1] + OrderedCollections v1.4.1
[2dfb63ee] + PooledArrays v1.4.2
[08abe8d2] + PrettyTables v2.2.1
[189a3867] + Reexport v1.2.2
[66db9d55] + SnoopPrecompile v1.0.1
[a2af1166] + SortingAlgorithms v1.1.0
[892a3eda] + StringManipulation v0.3.0
[3783bdb8] + TableTraits v1.0.1
[bd369af6] + Tables v1.10.0
[56f22d72] + Artifacts
[2a0f44e3] + Base64
[ade2ca70] + Dates
[9fa8497b] + Future
[b77e0a4c] + InteractiveUtils
[8f399da3] + Libdl
[37e2e46d] + LinearAlgebra
[56ddb016] + Logging
[d6f4376e] + Markdown
[de0858da] + Printf
[3fa0cd96] + REPL
[9a3f8284] + Random
[ea8e919c] + SHA v0.7.0
[9e88b42a] + Serialization
[6462fe0b] + Sockets
[2f01184e] + SparseArrays
[10745b16] + Statistics
[8dfed614] + Test
[cf7118a7] + UUIDs
[4ec0a83e] + Unicode
[e66e0078] + CompilerSupportLibraries_jll v0.5.2+0
[4536629a] + OpenBLAS_jll v0.3.20+0
[8e850b90] + libblastrampoline_jll v5.1.1+0
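The same steps can also be run from a script through the functions of the `Pkg` module instead of the `Pkg` REPL mode. The sketch below uses a temporary directory as a hypothetical project folder, and the `Pkg.add` call is commented out because it needs network access:

```julia
using Pkg

# Hypothetical project folder; a temporary directory keeps the sketch harmless.
project_dir = mktempdir()

Pkg.activate(project_dir)      # same as `activate <path>` in Pkg REPL mode
# Pkg.add("DataFrames")        # same as `add DataFrames`; needs network access
Pkg.status()                   # same as `status`; lists the packages in the environment

# Path of the Project.toml that the active environment reads and writes.
active = Base.active_project()
```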
From the `add` output, we can see that Julia automatically creates both the `Project.toml` and `Manifest.toml` files. In the `Project.toml`, it adds the new package to the project environment’s package list. Here are the contents of the `Project.toml`:
[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
This is a `.toml` file where:
- `[deps]`: a TOML table (also known as a hash table or dictionary)
- `DataFrames`: a key in the TOML `deps` table; this is the name of the package
- `"a93c6f00-e57d-5684-b7b6-d8193f3e46c0"`: the value for the `DataFrames` key; this is the universally unique identifier (UUID) of the package.
Let’s also take a peek into the `Manifest.toml`. Here we will truncate the output since it is a big machine-generated file:
# This file is machine-generated - editing it directly is not advised
julia_version = "1.8.3"
manifest_format = "2.0"
project_hash = "376d427149ea94494cc22001edd58d53c9b2bee1"
[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"
...
[[deps.DataFrames]]
deps = ["Compat", "DataAPI", "Future", "InvertedIndices", "IteratorInterfaceExtensions", "LinearAlgebra", "Markdown", "Missings", "PooledArrays", "PrettyTables", "Printf", "REPL", "Random", "Reexport", "SnoopPrecompile", "SortingAlgorithms", "Statistics", "TableTraits", "Tables", "Unicode"]
git-tree-sha1 = "0f44494fe4271cc966ac4fea524111bef63ba86c"
uuid = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
version = "1.4.3"
...
[[deps.libblastrampoline_jll]]
deps = ["Artifacts", "Libdl", "OpenBLAS_jll"]
uuid = "8e850b90-86db-534c-a0d3-1478176c7d93"
version = "5.1.1+0"
The three dots above (`...`) represent truncated output. First, the `Manifest.toml` presents us with a comment saying that it is machine-generated and discouraging direct edits. Then, there are entries for the Julia version (`julia_version`), the `Manifest.toml` format version (`manifest_format`), and the project environment hash (`project_hash`). Finally, it proceeds with a TOML array of tables, which are the double-bracket entries (`[[...]]`). These entries stand for the dependencies of all packages necessary to create the environment described in the `Project.toml`. Therefore, all of `DataFrames.jl`’s dependencies and its dependencies’ dependencies (and so on…) are listed here with their name, UUID, and version.
NOTE: Julia’s standard library modules do not have a `version` key in the `Manifest.toml` because their versions are already determined by the Julia version (`julia_version`). This is the case for the `Artifacts` entry in the truncated `Manifest.toml` output above, since it is a module in Julia’s standard library.
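To see how these TOML structures map onto Julia data, here is a small sketch using the `TOML` standard library module (assumed available, as in Julia 1.6 and later). The file contents are inlined as strings, copied from the listings above, instead of being read from disk:

```julia
using TOML

# The Project.toml contents shown earlier, inlined as a string for the sketch.
project_text = """
[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
"""

project = TOML.parse(project_text)
deps = project["deps"]        # the `[deps]` table becomes a Dict
uuid = deps["DataFrames"]     # the value stored under the `DataFrames` key

# A fragment of the Manifest.toml: `[[deps.Artifacts]]` is an array of tables.
manifest_text = """
julia_version = "1.8.3"

[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"
"""

manifest = TOML.parse(manifest_text)
artifacts_entry = manifest["deps"]["Artifacts"][1]
has_version = haskey(artifacts_entry, "version")   # false: no `version` key for this entry
```

This is only meant for inspection; the files themselves should still be managed through `Pkg`.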
We can keep adding as many packages as we like with the `add` command. To remove a package, you can use the `rm` command in the `Pkg` REPL mode:
(folder) pkg> rm DataFrames
Updating `~/user/folder/Project.toml`
[a93c6f00] - DataFrames v1.4.3
Updating `~/user/folder/Manifest.toml`
[34da2185] - Compat v4.4.0
[a8cc5b0e] - Crayons v4.1.1
[9a962f9c] - DataAPI v1.13.0
[a93c6f00] - DataFrames v1.4.3
[864edb3b] - DataStructures v0.18.13
[e2d170a0] - DataValueInterfaces v1.0.0
[59287772] - Formatting v0.4.2
[41ab1584] - InvertedIndices v1.1.0
[82899510] - IteratorInterfaceExtensions v1.0.0
[b964fa9f] - LaTeXStrings v1.3.0
[e1d29d7a] - Missings v1.0.2
[bac558e1] - OrderedCollections v1.4.1
[2dfb63ee] - PooledArrays v1.4.2
[08abe8d2] - PrettyTables v2.2.1
[189a3867] - Reexport v1.2.2
[66db9d55] - SnoopPrecompile v1.0.1
[a2af1166] - SortingAlgorithms v1.1.0
[892a3eda] - StringManipulation v0.3.0
[3783bdb8] - TableTraits v1.0.1
[bd369af6] - Tables v1.10.0
[56f22d72] - Artifacts
[2a0f44e3] - Base64
[ade2ca70] - Dates
[9fa8497b] - Future
[b77e0a4c] - InteractiveUtils
[8f399da3] - Libdl
[37e2e46d] - LinearAlgebra
[56ddb016] - Logging
[d6f4376e] - Markdown
[de0858da] - Printf
[3fa0cd96] - REPL
[9a3f8284] - Random
[ea8e919c] - SHA v0.7.0
[9e88b42a] - Serialization
[6462fe0b] - Sockets
[2f01184e] - SparseArrays
[10745b16] - Statistics
[8dfed614] - Test
[cf7118a7] - UUIDs
[4ec0a83e] - Unicode
[e66e0078] - CompilerSupportLibraries_jll v0.5.2+0
[4536629a] - OpenBLAS_jll v0.3.20+0
[8e850b90] - libblastrampoline_jll v5.1.1+0
NOTE: Julia’s `Pkg` REPL mode supports autocompletion with `<TAB>`. You can, for example, in the above command start typing `rm DataF<TAB>` and it will autocomplete to `rm DataFrames`.
We can see that `rm DataFrames` undoes `add DataFrames` by removing entries in both `Project.toml` and `Manifest.toml`.
Once you have a project environment with both the `Project.toml` and `Manifest.toml` files, you can share it with any user to have a perfectly reproducible project environment.
Now let’s cover the other end of the process. Suppose you received a `Project.toml` and a `Manifest.toml` from someone.
How would you proceed to instantiate a Julia project environment?
It is a simple process:
1. Put the `Project.toml` and `Manifest.toml` files into a folder.
2. Open a Julia REPL in that folder and activate it as a project environment with `]activate .`
3. Instantiate the environment with the `]instantiate` command.
That’s it! Once the project environment has finished downloading and instantiating the dependencies listed in the `Project.toml` and `Manifest.toml` files, you’ll have an exact copy of the project environment sent to you.
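For scripted setups, the activate and instantiate steps can be sketched with the `Pkg` functions. The helper name `setup_environment` and the folder name in the usage comment are hypothetical:

```julia
using Pkg

# Hypothetical helper: `dir` is the folder that holds the received
# Project.toml and Manifest.toml.
function setup_environment(dir::AbstractString)
    Pkg.activate(dir)     # same as `activate <dir>` in Pkg REPL mode
    Pkg.instantiate()     # same as `instantiate`; downloads the recorded versions
    return Base.active_project()
end

# Usage, assuming the two files were saved under "received_project":
# setup_environment("received_project")
```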
NOTE: You can also add `[compat]` bounds in the `Project.toml` to specify which package versions your project environment is compatible with. This is an advanced-user functionality which we will not cover. Take a look at the `Pkg.jl` standard library module documentation on compatibility. For people new to Julia, we recommend sharing both `Project.toml` and `Manifest.toml` for a fully reproducible environment.