{
"cells": [
{
"cell_type": "markdown",
"id": "b53a7b12-538d-4459-b82a-a35c8c417849",
"metadata": {},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"id": "ae497b71-bc43-471e-8970-88a1878e7cf9",
"metadata": {},
"source": [
"# Fundamentals of Accelerated Data Science # "
]
},
{
"cell_type": "markdown",
"id": "a149b6d1-1880-4a5d-9d71-f963d3097aa4",
"metadata": {},
"source": [
"## 06 - Data Visualization ##\n",
"\n",
"**Table of Contents**\n",
"
\n",
"This notebook demonstrates the basics of data visualization for large datasets. This notebook covers the below sections: \n",
"1. [Data Visualization](#Data-Visualization)\n",
"2. [Bar Chart](#Bar-Chart)\n",
" * [Histogram](#Histogram)\n",
" * [Exercise #1 - Bar Chart](#Exercise-#1---Bar-Chart)\n",
"3. [Scatter Plot](#Scatter-Plot)\n",
"4. [Line Chart](#Line-Chart)\n",
"5. [Datashader](#Datashader)\n",
" * [Datashader Accelerated by GPU](#Datashader-Accelerated-by-GPU)\n",
"6. [Interactive Visualization](#Interactive-Visualization)\n",
" * [cuxfilter and Dashboard](#cuxfilter-and-Dashboard)\n",
"6. [Other Libraries](#Other-Libraries)"
]
},
{
"cell_type": "markdown",
"id": "39f0f08f-92a2-4bfc-b8bc-5904aa70b5fc",
"metadata": {},
"source": [
"## Data Visualization ##\n",
"Data visualization is an important part of data science for several reasons: \n",
"* **Data exploration**: enables data scientists to explore data and quickly identify patterns, trends, and outliers that may not be apparent when looking at raw data in tabular format\n",
"* **Interpretation**: transforms large and complex datasets into more digestible visual formats, making it easier to comprehend vast amounts of information\n",
"* **Communication**: helps data scientists communicate complex insights to stakeholders in an easy-to-understand visual format, making data more accessible to non-technical audiences\n",
"\n",
"Below is the simple dashboard we will create in this notebook: \n",
"\n",
"

|   | age | sex | county | lat | long | name |
|---|---|---|---|---|---|---|
| 0 | 0 | m | DARLINGTON | 54.533638 | -1.524400 | FRANCIS |
| 1 | 0 | m | DARLINGTON | 54.426254 | -1.465314 | EDWARD |
| 2 | 0 | m | DARLINGTON | 54.555199 | -1.496417 | TEDDY |
| 3 | 0 | m | DARLINGTON | 54.547909 | -1.572342 | ANGUS |
| 4 | 0 | m | DARLINGTON | 54.477638 | -1.605995 | CHARLIE |

|   | age | sex | county | lat | long | name | age_bucket |
|---|---|---|---|---|---|---|---|
| 43400796 | 40 | f | NOTTINGHAMSHIRE | 53.054501 | -0.943481 | ZOYA | 3 |
| 32505888 | 10 | f | NOTTINGHAMSHIRE | 52.995071 | -0.938463 | LINA | 0 |
| 33512940 | 13 | f | LEICESTERSHIRE | 52.967442 | -1.849579 | TEYANA | 1 |
| 675372 | 1 | m | SOMERSET | 51.138855 | -2.919084 | HARRISON | 0 |
| 21024733 | 56 | m | BURY | 53.628117 | -2.325484 | JACK | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 13618682 | 36 | m | WALTHAM FOREST | 51.599720 | -0.043070 | ETHAN | 3 |
| 11029315 | 30 | m | SALFORD | 53.466831 | -2.422284 | WILLIAM | 2 |
| 56802229 | 80 | f | NOTTINGHAM | 52.972649 | -1.233015 | ERIN | 7 |
| 29592752 | 2 | f | SALFORD | 53.518764 | -2.454344 | SIRAT | 0 |
| 33585530 | 13 | f | HERTFORDSHIRE | 51.604103 | -0.386183 | MATILDA | 1 |

1000 rows × 7 columns
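
The `age_bucket` values in the sample above are consistent with ten-year bands that close on the upper edge (ages 1 to 10 map to 0, ages 11 to 20 map to 1, and so on). The notebook's own bucketing code does not appear in this section, so the rule below is an assumption inferred from the ten displayed rows, and `age_to_bucket` is a hypothetical helper name. The handling of age 0 is also assumed, since no row with age 0 and a bucket is shown.

```python
import numpy as np

def age_to_bucket(age):
    """Hypothetical rule: ages 1-10 -> 0, 11-20 -> 1, ... (age 0 assumed -> 0)."""
    age = np.asarray(age)
    # Shift by one so the upper edge of each band (10, 20, ...) stays in that band.
    return np.maximum(age - 1, 0) // 10

# Ages and age_bucket values copied from the ten sample rows displayed above.
ages = np.array([40, 10, 13, 1, 56, 36, 30, 80, 2, 13])
shown = np.array([3, 0, 1, 0, 5, 3, 2, 7, 0, 1])

buckets = age_to_bucket(ages)
print(buckets.tolist())
print(bool((buckets == shown).all()))
```

Bucketing a numeric column like this gives a small number of categories, which is what a bar chart or a dashboard filter typically needs.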

|   | age | sex | county | lat | long | name |
|---|---|---|---|---|---|---|
| 0 | 0 | m | DARLINGTON | 54.533646 | -1.524401 | FRANCIS |
| 1 | 0 | m | DARLINGTON | 54.426254 | -1.465314 | EDWARD |
| 2 | 0 | m | DARLINGTON | 54.555199 | -1.496417 | TEDDY |
| 3 | 0 | m | DARLINGTON | 54.547905 | -1.572341 | ANGUS |
| 4 | 0 | m | DARLINGTON | 54.477638 | -1.605994 | CHARLIE |

<xarray.DataArray (lat: 600, long: 600)> Size: 1MB\n",
"array([[0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" ...,\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0]], dtype=uint32)\n",
"Coordinates:\n",
" * long (long) float64 5kB -6.361 -6.346 -6.331 ... 2.662 2.677 2.693\n",
" * lat (lat) float64 5kB 49.52 49.54 49.55 49.56 ... 56.23 56.24 56.26\n",
"Attributes:\n",
" x_range: (-6.368374347686768, 2.7000913619995117)\n",
" y_range: (49.519046783447266, 56.261409759521484)| \n", " | age | \n", "sex | \n", "county | \n", "lat | \n", "long | \n", "name | \n", "
|---|---|---|---|---|---|---|
| 0 | \n", "0 | \n", "m | \n", "DARLINGTON | \n", "54.533638 | \n", "-1.524400 | \n", "FRANCIS | \n", "
| 1 | \n", "0 | \n", "m | \n", "DARLINGTON | \n", "54.426254 | \n", "-1.465314 | \n", "EDWARD | \n", "
| 2 | \n", "0 | \n", "m | \n", "DARLINGTON | \n", "54.555199 | \n", "-1.496417 | \n", "TEDDY | \n", "
| 3 | \n", "0 | \n", "m | \n", "DARLINGTON | \n", "54.547909 | \n", "-1.572342 | \n", "ANGUS | \n", "
| 4 | \n", "0 | \n", "m | \n", "DARLINGTON | \n", "54.477638 | \n", "-1.605995 | \n", "CHARLIE | \n", "
<xarray.DataArray (lat: 600, long: 600)> Size: 1MB\n",
"array([[0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" ...,\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0]], dtype=uint32)\n",
"Coordinates:\n",
" * long (long) float64 5kB -6.361 -6.346 -6.331 ... 2.662 2.677 2.693\n",
" * lat (lat) float64 5kB 49.52 49.54 49.55 49.56 ... 56.23 56.24 56.26\n",
"Attributes:\n",
" x_range: (-6.368374, 2.7000911)\n",
" y_range: (49.51904, 56.26141)