Next Generation Data Science Education

pyOpenSci Fall Festival 2024

James Balamuta

November 1, 2024

> Who are you?_

Classic hello greeting sticker containing the text "Hello, my name is James"

Who am I?

Photo of Dr. James Balamuta flying a drone next to the Alma mater

Dr. James Balamuta (he/him)

Learning the Unknown

Learning Laboratory

“It shouldn’t be a static thing; it should be one where people learn what’s happening. And the only way to learn what’s happening is to change what’s happening.”

Frank Oppenheimer in “Exploratorium” by Jon Boorstin (1974) at ~13:22

Main entrance to the Exploratorium at Pier 15

Exploratorium at Pier 15 in San Francisco
  1. Direct manipulation of concepts
  2. Immediate visual feedback
  3. Personal discovery
  4. Enhanced retention

Active Learning

  • What is Active Learning?
    • A pedagogical approach that engages students in the learning process through activities and/or discussion.
  • Why Active Learning?
    • Increases student engagement
    • Improves retention
    • Enhances understanding
  • How to Implement Active Learning?
    • Explorable Explanations

Explorable Explanations

“People currently think of text as information to be consumed. I want text to be used as an environment to think in.”

– Bret Victor in Explorable Explanations, 2011

  • 🔄 Reactive Documents: Play with authors’ assumptions and see consequences
  • 🎮 Interactive Examples: Make abstract concepts concrete through direct manipulation
  • 🔍 Contextual Information: Verify claims and explore related ideas in real-time

Attempts, I’ve had a few …

Tweet from James Balamuta on the challenges of using learnr with Shiny Server

Me in ~2018 thinking learnr + Shiny Server + 100 students = 😱

Why? Unstable at scale and required a dedicated + licensed server.

Explorable Environments

Three key pieces of technology:

  1. Pyodide: Python in the Browser without a server
  2. Observable: Interactive JavaScript for Data Exploration
  3. Quarto Live: Official Quarto Extension for Interactivity in Notebooks

A bonus piece of technology is Quarto Drop, an in-slide IDE: press the backtick (`) key, which shares a key with the tilde (~) on US keyboards, to open it.

🐍 Python in the Browser

Say hello to Pyodide through Quarto Live.

```{pyodide}
#| exercise: ex_basic
# Try different values in the list comprehensions
squares = [x**2 for x in range(____)]
squares
```
  1. Defines a Quarto Live cell attribute specifying this as an exercise named ex_basic
  2. Provides a comment guiding the user to experiment with different values
  3. Creates a list comprehension that squares numbers from 0 to a value the user needs to fill in
  4. Displays the resulting squares list
```{pyodide}
#| exercise: ex_basic
#| check: true
feedback = None
if len(result) > 5:
 feedback = {"correct": True,
 "message": "Great! You created a longer sequence!"}
else:
 feedback = {"correct": False,
 "message": "Try using a larger number in range()"}
feedback
```
  1. Defines a Quarto Live cell attribute specifying this as part of the ex_basic exercise
  2. Specifies that this cell checks or “grades” the exercise result
  3. Initializes the feedback variable as None
  4. Checks whether the result has more than 5 elements; if so, sets feedback to indicate a correct answer
  5. Otherwise sets feedback to indicate an incorrect answer and provides a hint
  6. Displays the feedback dictionary
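
The grading logic in the check cell can also be tried outside the browser as an ordinary function. This is a minimal sketch, assuming the learner's final expression arrives as `result` (as in the check cell); the function name `check_squares` is hypothetical and not part of Quarto Live:

```python
def check_squares(result):
    """Return a feedback dictionary shaped like the one in the check cell."""
    if len(result) > 5:
        return {"correct": True,
                "message": "Great! You created a longer sequence!"}
    return {"correct": False,
            "message": "Try using a larger number in range()"}


# A learner who fills the blank with 10 passes; one who uses 3 gets the hint.
passing = check_squares([x**2 for x in range(10)])
failing = check_squares([x**2 for x in range(3)])
print(passing["correct"], failing["correct"])
```

Keeping the rule in a plain function makes it easy to test the boundary (exactly 5 elements is still "incorrect") before putting it in front of students.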

Explorable Mathematics

Exploring the classic equation of a line: \(y = ax + b\)

\(a =\) and \(b =\) ,

```{md}
$a =$ `{{ojs}} aParam` and $b =$ `{{ojs}} bParam`,
```
  1. Displays the current values of parameters a and b using Observable JS variables in a math context
```{ojs}
//| echo: false
import {Tangle} from "@mbostock/tangle"
// Setup Tangle reactive inputs
viewof a = Inputs.input(1);
viewof b = Inputs.input(0);
aParam = Inputs.bind(Tangle({min: -5, max: 5, minWidth: "1em", step: 0.1}), viewof a);
bParam = Inputs.bind(Tangle({min: -10, max: 10, minWidth: "1em", step: 0.2}), viewof b);
```
  1. Hides the code cell from output
  2. Imports the Tangle library for interactive inputs
  3. Comment indicating setup of reactive inputs
  4. Creates an input view for parameter ‘a’ with initial value 1
  5. Creates an input view for parameter ‘b’ with initial value 0
  6. Binds parameter ‘a’ to a Tangle input with range [-5,5] and step size 0.1
  7. Binds parameter ‘b’ to a Tangle input with range [-10,10] and step size 0.2
```{pyodide}
#| echo: false
#| autorun: true
#| input:
#| - a
#| - b
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
y = a * x + b
plt.figure(figsize=(8, 4))
plt.plot(x, y)
plt.grid(True)
plt.title(f'Linear Function: y = {a}x + {b}')
plt.show()
```
  1. Hides the code cell from output
  2. Automatically runs the cell when inputs change
  3. Specifies input parameters
  4. First input parameter ‘a’
  5. Second input parameter ‘b’
  6. Imports NumPy for numerical operations
  7. Imports Matplotlib for plotting
  8. Creates x-values array from 0 to 10 with 100 points
  9. Calculates y-values using linear function ax + b
  10. Creates a new figure with specified size
  11. Plots x versus y values
  12. Adds grid to the plot
  13. Sets the plot title showing the current function
  14. Displays the plot
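
To see what the plot recomputes on each drag, the same relationship can be evaluated at a few points without Matplotlib. This is a small sketch; `line_values` is a hypothetical helper and the sample values of a and b are arbitrary:

```python
def line_values(a, b, xs):
    """Evaluate y = a*x + b at each x."""
    return [a * x + b for x in xs]


xs = [0, 5, 10]
# Changing a rescales the rise between points; changing b shifts every point equally.
print(line_values(1, 0, xs))
print(line_values(2, 0, xs))
print(line_values(1, 3, xs))
```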

Reactive Programming

Whoa? What’s this?

Explorables

Explorable Graphs: Histogram Bins

```{ojs}
//| echo: false
viewof n_bins = Inputs.range([5, 1000], {
step: 1,
value: 20,
label: "Number of bins:"
})
```
  1. Hides the code cell from output
  2. Creates an interactive range slider for number of bins
  3. Sets the increment step size to 1
  4. Sets the initial value to 20 bins
  5. Adds a label to describe the input control
```{pyodide}
#| autorun: true
#| echo: false
#| input:
#| - n_bins
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Generate random data
rng = np.random.RandomState(2021)
data = rng.normal(0, 1, 1000)
plt.figure(figsize=(8, 4))
plt.hist(data, bins=n_bins)
plt.title(f'Normal Distribution with {n_bins} bins')
plt.show()
```
  1. Automatically runs the cell when the input changes
  2. Prevents the code cell from being shown
  3. Specifies input parameters section
  4. Declares n_bins as an input parameter
  5. Imports NumPy for numerical operations
  6. Imports Matplotlib for plotting
  7. Imports Seaborn for statistical visualization
  8. Comments the data generation step
  9. Generates 1000 random numbers from a normal distribution
  10. Creates a new figure with specified size
  11. Creates a histogram with user-specified number of bins
  12. Sets the plot title showing current number of bins
  13. Displays the plot
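
The slider runs from 5 to 1000 bins for 1000 data points, so the extremes are worth inspecting numerically. The sketch below uses only the standard library to count values per equal-width bin; `bin_counts` is a hypothetical helper, and the stdlib generator does not reproduce NumPy's random stream, so the exact counts differ from the plotted data:

```python
import random


def bin_counts(data, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in data:
        # The maximum value belongs to the last bin rather than a new one.
        index = min(int((value - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


rng = random.Random(2021)  # stdlib generator; not the same stream as NumPy's
data = [rng.gauss(0, 1) for _ in range(1000)]

for n_bins in (5, 20, 1000):
    counts = bin_counts(data, n_bins)
    empty = sum(1 for c in counts if c == 0)
    print(f"{n_bins:>4} bins: tallest bin = {max(counts)}, empty bins = {empty}")
```

Too few bins hide the shape of the distribution, while too many leave most bins nearly empty; the printed summary makes that trade-off visible without a plot.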

Explorable Biostatistics: Contingency

Drag the numbers to adjust the cell values and see how they affect the odds ratio:

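| | Exposed | Not Exposed | Total |
|---|---|---|---|
| Cases | a11 | a12 | row1_total |
| Controls | a21 | a22 | row2_total |
| Total | col1_total | col2_total | table_total |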

Calculations:

```{ojs}
//| echo: false
// Initialize the 2x2 table cells with Tangle inputs
viewof a11 = Inputs.input(20);
viewof a12 = Inputs.input(15);
viewof a21 = Inputs.input(10);
viewof a22 = Inputs.input(25);
// Create Tangle bindings for each cell
cell11 = Inputs.bind(Tangle({min: 0, max: 100, minWidth: "3em", step: 1}), viewof a11);
cell12 = Inputs.bind(Tangle({min: 0, max: 100, minWidth: "3em", step: 1}), viewof a12);
cell21 = Inputs.bind(Tangle({min: 0, max: 100, minWidth: "3em", step: 1}), viewof a21);
cell22 = Inputs.bind(Tangle({min: 0, max: 100, minWidth: "3em", step: 1}), viewof a22);
// Calculate row and column totals
row1_total = a11 + a12
row2_total = a21 + a22
col1_total = a11 + a21
col2_total = a12 + a22
table_total = row1_total + row2_total
```
  1. Hides the code cell from output
  2. Comment indicating initialization of table cell inputs
  3. Creates input view for cell (1,1) with initial value 20
  4. Creates input view for cell (1,2) with initial value 15
  5. Creates input view for cell (2,1) with initial value 10
  6. Creates input view for cell (2,2) with initial value 25
  7. Comment indicating creation of Tangle bindings
  8. Binds cell (1,1) to Tangle input with range [0,100]
  9. Binds cell (1,2) to Tangle input with range [0,100]
  10. Binds cell (2,1) to Tangle input with range [0,100]
  11. Binds cell (2,2) to Tangle input with range [0,100]
  12. Comment indicating calculation of totals
  13. Calculates first row total
  14. Calculates second row total
  15. Calculates first column total
  16. Calculates second column total
  17. Calculates overall table total
```{pyodide}
#| autorun: true
#| echo: false
#| input:
#| - a11
#| - a12
#| - a21
#| - a22
import numpy as np
from scipy import stats
def analyze_contingency(a11, a12, a21, a22):
 # Create contingency table
 table = np.array([[a11, a12], [a21, a22]])
 # Calculate odds ratio
 odds_ratio = (a11 * a22) / (a12 * a21)
 # Calculate 95% CI for odds ratio
 log_or = np.log(odds_ratio)
 se_log_or = np.sqrt(1/a11 + 1/a12 + 1/a21 + 1/a22)
 ci_lower = np.exp(log_or - 1.96 * se_log_or)
 ci_upper = np.exp(log_or + 1.96 * se_log_or)
 # Perform chi-square test
 chi2, p_value = stats.chi2_contingency(table)[0:2]
 return {
 'odds_ratio': odds_ratio,
 'ci_lower': ci_lower,
 'ci_upper': ci_upper,
 'chi2': chi2,
 'p_value': p_value
 }
results = analyze_contingency(a11, a12, a21, a22)
print(f"Odds Ratio: {results['odds_ratio']:.2f}")
print(f"95% CI: ({results['ci_lower']:.2f}, {results['ci_upper']:.2f})")
print(f"Chi-square statistic: {results['chi2']:.2f}")
print(f"P-value: {results['p_value']:.4f}")
```
  1. Automatically runs the cell when inputs change
  2. Hides the code cell from output
  3. Specifies input parameters section
  4. First input parameter a11
  5. Second input parameter a12
  6. Third input parameter a21
  7. Fourth input parameter a22
  8. Imports NumPy for numerical operations
  9. Imports scipy.stats for statistical tests
  10. Defines function to analyze contingency table
  11. Comments contingency table creation
  12. Creates 2x2 contingency table as NumPy array
  13. Comments odds ratio calculation
  14. Calculates odds ratio
  15. Comments confidence interval calculation
  16. Calculates log odds ratio
  17. Calculates standard error of log odds ratio
  18. Calculates lower confidence interval bound
  19. Calculates upper confidence interval bound
  20. Comments chi-square test
  21. Performs chi-square test and extracts statistics
  22. Begins return dictionary
  23. Includes odds ratio in results
  24. Includes lower CI bound in results
  25. Includes upper CI bound in results
  26. Includes chi-square statistic in results
  27. Includes p-value in results
  28. Calls analysis function with current values
  29. Prints formatted odds ratio
  30. Prints formatted confidence interval
  31. Prints formatted chi-square statistic
  32. Prints formatted p-value
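
The Tangle inputs allow a cell to reach 0, at which point the odds ratio or its standard error divides by zero. One common remedy is the Haldane-Anscombe correction, which adds 0.5 to every cell when any cell is zero. The sketch below is an assumption about how such a guard could look, not part of the original slide; `odds_ratio_ci` is a hypothetical name:

```python
import math


def odds_ratio_ci(a11, a12, a21, a22, z=1.96):
    """Odds ratio and Wald 95% interval for a 2x2 table.

    Adds 0.5 to every cell when any cell is zero (Haldane-Anscombe
    correction) so neither the ratio nor its standard error divides by zero.
    """
    cells = [a11, a12, a21, a22]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Starting table from the slide: (20 * 25) / (15 * 10) = 500 / 150
print(odds_ratio_ci(20, 15, 10, 25))
# A zero cell no longer raises ZeroDivisionError
print(odds_ratio_ci(20, 0, 10, 25))
```

An alternative that matches the "prevent invalid states" idea is to keep the inputs strictly positive; the correction is shown here because it leaves the interactive range unchanged.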

Explorable Physics: Pendulum motion

```{ojs}
//| echo: false
viewof length = Inputs.range([1, 10], {
step: 0.1,
value: 5,
label: "Pendulum Length (m)"
})
viewof gravity = Inputs.range([1, 15], {
step: 0.1,
value: 9.8,
label: "Gravity (m/s²)"
})
```
  1. Creates a range slider for pendulum length between 1 and 10 meters
  2. Sets length increment step size to 0.1 meters
  3. Sets initial length value to 5 meters
  4. Adds label specifying length units
  5. Creates a range slider for gravity between 1 and 15 m/s²
  6. Sets gravity increment step size to 0.1 m/s²
  7. Sets initial gravity value to 9.8 m/s² (Earth’s gravity)
  8. Adds label specifying gravity units
```{pyodide}
#| echo: false
#| autorun: true
#| input:
#| - length
#| - gravity
import numpy as np
import matplotlib.pyplot as plt
# Calculate period
T = 2 * np.pi * np.sqrt(length/gravity)
t = np.linspace(0, T*2, 1000)
theta = 0.5 * np.sin(np.sqrt(gravity/length) * t)
plt.figure(figsize=(10, 4))
plt.plot(t, theta)
plt.title(f'Pendulum Motion (Period: {T:.2f} s)')
plt.xlabel('Time (s)')
plt.ylabel('Angle (rad)')
plt.grid(True)
plt.show()
```
  1. Hides the code cell from output
  2. Automatically runs the cell when inputs change
  3. Specifies input parameters section
  4. Declares length as first input parameter
  5. Declares gravity as second input parameter
  6. Imports NumPy for mathematical operations
  7. Imports Matplotlib for plotting
  8. Comments period calculation
  9. Calculates pendulum period using formula T = 2π√(L/g)
  10. Creates time array for two periods of motion
  11. Calculates angle using small-angle approximation
  12. Creates new figure with specified size
  13. Plots time vs angle
  14. Sets title showing current period
  15. Labels x-axis with units
  16. Labels y-axis with units
  17. Adds grid to plot
  18. Displays the plot
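
The two formulas behind the plot can be checked by hand with the slider defaults. This sketch uses only the standard library; the function names are hypothetical, and the amplitude of 0.5 rad is taken from the cell above:

```python
import math


def pendulum_period(length, gravity):
    """Small-angle period T = 2*pi*sqrt(L/g), in seconds."""
    return 2 * math.pi * math.sqrt(length / gravity)


def pendulum_angle(t, length, gravity, amplitude=0.5):
    """Angle in radians at time t, with theta(0) = 0."""
    return amplitude * math.sin(math.sqrt(gravity / length) * t)


T = pendulum_period(5, 9.8)  # slider defaults from the slide
print(f"Period: {T:.2f} s")
# Quadrupling the length doubles the period; gravity works the other way.
print(pendulum_period(20, 9.8) / T)
# The angle peaks a quarter of a period in and returns to zero after a full period.
print(pendulum_angle(T / 4, 5, 9.8), pendulum_angle(T, 5, 9.8))
```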

Explorable ML: k-means

```{ojs}
//| echo: false
viewof n_clusters = Inputs.range([2, 8], {
step: 1,
value: 3,
label: "Number of clusters:"
})
```
  1. Creates an interactive range slider for the number of clusters between 2 and 8
  2. Sets the increment step size to 1
  3. Sets the initial value to 3 clusters
  4. Adds a descriptive label for the input control
```{pyodide}
#| autorun: true
#| echo: false
#| input:
#| - n_clusters
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Generate sample data
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
# Perform k-means clustering
kmeans = KMeans(n_clusters=n_clusters, random_state=42)
labels = kmeans.fit_predict(X)
# Plot results
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0],
 kmeans.cluster_centers_[:, 1],
 marker='x', s=200, linewidths=3,
 color='r', label='Centroids')
plt.title(f'K-means Clustering with {n_clusters} clusters')
plt.legend()
plt.show()
```
  1. Automatically runs the cell when input changes
  2. Hides the code cell from output
  3. Specifies input parameters section
  4. Declares n_clusters as an input parameter
  5. Imports make_blobs for generating synthetic data
  6. Imports KMeans clustering algorithm
  7. Imports Matplotlib for plotting
  8. Comments data generation step
  9. Creates synthetic dataset with 300 samples and 4 centers
  10. Comments clustering step
  11. Initializes KMeans with user-specified number of clusters
  12. Fits model and predicts cluster labels
  13. Comments plotting section
  14. Creates new figure with specified size
  15. Plots data points colored by cluster assignment
  16. Begins centroid plotting - x coordinates
  17. Continues centroid plotting - y coordinates
  18. Sets centroid marker style and size
  19. Sets centroid color and label
  20. Sets plot title showing current number of clusters
  21. Adds legend to plot
  22. Displays the plot
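
What a k-means fit does on each slider change can be sketched in a few lines of plain Python. This is a simplified version of Lloyd's algorithm with fixed starting centroids, not scikit-learn's implementation (which chooses its own initialization); the function names and the six sample points are made up for illustration:

```python
def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    return [
        min(range(len(centroids)),
            key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
        for point in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # keep an empty cluster's centroid in place
            continue
        new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = update(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

The number of starting centroids plays the role of the n_clusters slider: asking for more centroids than there are natural groups forces the algorithm to split a group.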

Explorable Data Sets

```{ojs}
//| echo: false
viewof selected_species = Inputs.select(
 ["Adelie", "Gentoo", "Chinstrap", "All"],
 { value: "All", label: "Highlight Species:" }
)
```
  1. Hides the code cell from output
  2. Creates a dropdown select input for penguin species
  3. Lists available species options including “All”
  4. Sets default value to “All” and adds descriptive label
```{pyodide}
#| fig-height: 5
#| echo: false
#| autorun: true
#| input:
#| - selected_species
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load penguins data
penguins = pd.read_csv('https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv')
penguins = penguins.dropna()
# Create plot
plt.figure(figsize=(10, 6))
# Set alpha (transparency) based on selection
penguins['alpha'] = 0.2 # default: low opacity, so unselected species fade
if selected_species == "All":
 penguins['alpha'] = 0.7 # all visible
else:
 penguins.loc[penguins['species'] == selected_species, 'alpha'] = 0.9 # highlight selected
# Create scatter plot
for species in penguins['species'].unique():
 mask = penguins['species'] == species
 plt.scatter(
 penguins.loc[mask, 'bill_length_mm'],
 penguins.loc[mask, 'bill_depth_mm'],
 label=species,
 alpha=penguins.loc[mask, 'alpha'],
 s=100
 )
plt.title('Palmer Penguins: Bill Dimensions by Species')
plt.xlabel('Bill Length (mm)')
plt.ylabel('Bill Depth (mm)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```
  1. Sets figure height to 5 units
  2. Hides code from output
  3. Automatically runs when input changes
  4. Specifies input parameters section
  5. Declares selected_species as input
  6. Imports pandas for data handling
  7. Imports seaborn for statistical visualization
  8. Imports matplotlib for plotting
  9. Comments data loading section
  10. Loads penguins dataset
  11. Removes rows with missing values
  12. Comments plot creation section
  13. Creates new figure with specified size
  14. Comments transparency section
  15. Sets a low default opacity (alpha 0.2) for all points
  16. Checks if all species are selected
  17. Sets medium opacity (alpha 0.7) for all points if “All” is selected
  18. Starts else block for specific species selection
  19. Sets high opacity (alpha 0.9) for the selected species only
  20. Comments scatter plot section
  21. Iterates through unique species
  22. Creates boolean mask for current species
  23. Begins scatter plot for current species
  24. Sets x-coordinates (bill length)
  25. Sets y-coordinates (bill depth)
  26. Labels points by species
  27. Sets opacity based on selection
  28. Sets point size
  29. Sets plot title
  30. Labels x-axis with units
  31. Labels y-axis with units
  32. Adds legend
  33. Adds light grid
  34. Displays the plot

Explorable Data Clusters

```{ojs}
//| echo: false
viewof feature1 = Inputs.select(
 ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g'],
 { value: 'bill_length_mm', label: 'X-axis feature:' }
)
viewof feature2 = Inputs.select(
 ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g'],
 { value: 'bill_depth_mm', label: 'Y-axis feature:' }
)
viewof n_clusters_penguins = Inputs.range([2, 5], {
step: 1,
value: 3,
label: "Number of clusters:"
})
```
  1. Hides the code cell from output
  2. Creates first dropdown for X-axis feature selection
  3. Lists available penguin measurements for X-axis
  4. Sets default X-axis to bill length and adds label
  5. Creates second dropdown for Y-axis feature selection
  6. Lists available penguin measurements for Y-axis
  7. Sets default Y-axis to bill depth and adds label
  8. Creates range slider for number of clusters between 2 and 5
  9. Sets increment step size to 1
  10. Sets initial number of clusters to 3
  11. Adds label for cluster selection
```{pyodide}
#| autorun: true
#| echo: false
#| input:
#| - feature1
#| - feature2
#| - n_clusters_penguins
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Load and prepare data
penguins = pd.read_csv('https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv')
penguins = penguins.dropna()
# Select and scale features
X = penguins[[feature1, feature2]]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Perform clustering
kmeans = KMeans(n_clusters=n_clusters_penguins, random_state=42)
clusters = kmeans.fit_predict(X_scaled)
# Create plot
plt.figure(figsize=(10, 6))
scatter = plt.scatter(penguins[feature1], penguins[feature2],
 c=clusters, cmap='viridis',
 alpha=0.6, s=100)
# Add cluster centers
centers_orig = scaler.inverse_transform(kmeans.cluster_centers_)
plt.scatter(centers_orig[:, 0], centers_orig[:, 1],
 c='red', marker='x', s=200, linewidths=3,
 label='Cluster Centers')
plt.title(f'Palmer Penguins Clusters\n{feature1} vs {feature2}')
plt.xlabel(feature1.replace('_', ' ').title())
plt.ylabel(feature2.replace('_', ' ').title())
plt.legend(*scatter.legend_elements(), title="Clusters")
plt.grid(True, alpha=0.3)
plt.show()
# Print cluster sizes
cluster_sizes = pd.Series(clusters).value_counts().sort_index()
print("\nCluster sizes:")
for i, size in enumerate(cluster_sizes):
 print(f"Cluster {i}: {size} penguins")
```
  1. Automatically runs when inputs change
  2. Hides code from output
  3. Specifies input parameters section
  4. Declares first feature as input
  5. Declares second feature as input
  6. Declares number of clusters as input
  7. Imports pandas for data handling
  8. Imports matplotlib for plotting
  9. Imports KMeans clustering algorithm
  10. Imports StandardScaler for feature scaling
  11. Comments data loading section
  12. Loads penguins dataset
  13. Removes rows with missing values
  14. Comments feature preparation section
  15. Selects user-specified features
  16. Initializes scaler object
  17. Scales selected features
  18. Comments clustering section
  19. Initializes KMeans with user-specified clusters
  20. Performs clustering and gets labels
  21. Comments plot creation section
  22. Creates new figure with specified size
  23. Creates scatter plot of selected features
  24. Colors points by cluster and sets colormap
  25. Sets transparency and point size
  26. Comments center plotting section
  27. Transforms cluster centers back to original scale
  28. Plots cluster centers
  29. Sets center marker style and size
  30. Adds label for centers
  31. Sets plot title with feature names
  32. Sets x-axis label
  33. Sets y-axis label
  34. Adds cluster legend
  35. Adds light grid
  36. Displays plot
  37. Comments cluster size section
  38. Calculates size of each cluster
  39. Prints header for cluster sizes
  40. Iterates through clusters
  41. Prints size of each cluster
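
Scaling matters here because body mass is recorded in grams (thousands) while bill depth is in millimeters (tens), so unscaled distances would be dominated by mass. The sketch below reproduces the standardize and inverse-transform steps with the standard library; it mirrors the idea behind StandardScaler (which uses the population standard deviation) rather than calling it, and the four sample values are made up:

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores plus the mean and standard deviation used to compute them."""
    mean = fmean(values)
    std = pstdev(values, mu=mean)
    return [(v - mean) / std for v in values], mean, std


def inverse(z_scores, mean, std):
    """Map z-scores back to the original units, as done for the cluster centers."""
    return [z * std + mean for z in z_scores]


body_mass_g = [3500.0, 4000.0, 5000.0, 5500.0]  # illustrative values only
z, mean, std = standardize(body_mass_g)
print(z)
print(inverse(z, mean, std))
```

Clustering happens on the z-scores, and the inverse step is what lets the centers be drawn on axes labeled in the original units.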

Explorable Structured Programming

```{ojs}
viewof operation = Inputs.select(
 ['square', 'cube', 'double', 'half'],
 {value: 'square', label: "Operation: "}
)
viewof range_end = Inputs.range([1, 20], {
step: 1,
value: 5,
label: "Range end: "
})
```
  1. Creates a dropdown select input for mathematical operations
  2. Defines the list of available operations
  3. Sets initial value to ‘square’ and adds a label
  4. Creates a range slider for the end value between 1 and 20
  5. Sets the increment step size to 1
  6. Sets the initial value to 5
  7. Adds a descriptive label for the range input
```{pyodide}
#| autorun: true
#| edit: false
#| echo: false
#|
#| input:
#| - operation
#| - range_end
operations = {
 'square': lambda x: x**2,
 'cube': lambda x: x**3,
 'double': lambda x: x*2,
 'half': lambda x: x/2
}
result = [operations[operation](x) for x in range(range_end)]
print(f"Python code: [operations['{operation}'](x) for x in range({range_end})]")
print(f"Result: {result}")
```
  1. Automatically runs the cell when inputs change
  2. Prevents editing of the code cell
  3. Hides the code cell from output
  4. Empty line in YAML header
  5. Specifies input parameters section
  6. Declares operation as first input parameter
  7. Declares range_end as second input parameter
  8. Creates dictionary of mathematical operations
  9. Defines square operation as lambda function
  10. Defines cube operation as lambda function
  11. Defines double operation as lambda function
  12. Defines half operation as lambda function
  13. Applies selected operation to range using list comprehension
  14. Prints the Python code being executed
  15. Prints the resulting list after applying the operation
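
The dictionary-of-functions pattern in this cell also works as a standalone function, which is handy when the operation name comes from somewhere less controlled than a dropdown. This is a minimal sketch; `apply_operation` and the explicit error for unknown names are additions for illustration:

```python
OPERATIONS = {
    "square": lambda x: x**2,
    "cube": lambda x: x**3,
    "double": lambda x: x * 2,
    "half": lambda x: x / 2,
}


def apply_operation(operation, range_end):
    """Apply the named operation to 0, 1, ..., range_end - 1."""
    if operation not in OPERATIONS:
        raise ValueError(f"Unknown operation: {operation!r}")
    return [OPERATIONS[operation](x) for x in range(range_end)]


print(apply_operation("square", 5))
print(apply_operation("half", 3))
```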

Explorable Unstructured Programming

```{pyodide}
# Let's classify the weather
temperature = 101
if temperature > 90:
 print('It is hot outside')
elif temperature > 70:
 print('It is warm outside')
elif temperature > 50:
 print('It is cool outside')
else:
 print('Wear a parka')
```
  1. Comments describing the purpose of the code
  2. Initializes temperature variable with value 101
  3. First condition: checks if temperature is above 90
  4. Prints hot weather message if first condition is true
  5. Second condition: checks if temperature is above 70 (if first condition was false)
  6. Prints warm weather message if second condition is true
  7. Third condition: checks if temperature is above 50 (if first and second conditions were false)
  8. Prints cool weather message if third condition is true
  9. Else block for when all conditions are false (temperature <= 50)
  10. Prints cold weather message if all conditions are false
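
Because only the first matching branch runs, a single value such as 101 exercises just one path through the chain. Wrapping the chain in a function makes it easy to try one value per branch. This is a sketch, with `classify_weather` as a hypothetical name and every condition written against `temperature`:

```python
def classify_weather(temperature):
    """Return the message chosen by the if/elif chain for a temperature."""
    if temperature > 90:
        return "It is hot outside"
    elif temperature > 70:
        return "It is warm outside"
    elif temperature > 50:
        return "It is cool outside"
    else:
        return "Wear a parka"


# One value per branch, plus the boundary at exactly 90 (not "hot": the test is > 90).
for temperature in (101, 90, 71, 51, 50):
    print(temperature, "->", classify_weather(temperature))
```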

Exercising the Mind

Filter the penguins dataset to show only Gentoo penguins with body mass greater than 5000g:

Hint: use boolean indexing with & to combine conditions:

df[(condition1) & (condition2)]

Solution: apply both conditions to the penguins data frame:

penguins[(penguins['species'] == 'Gentoo') & (penguins['body_mass_g'] > 5000)]
```{pyodide}
#| setup: true
#| exercise: ex_penguins
#| echo: false
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
# Load and prepare data
penguins = pd.read_csv('https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv')
penguins = penguins.dropna()
```
  1. Marks this as a setup cell to run before the exercise
  2. Associates this cell with the ‘ex_penguins’ exercise
  3. Hides the code cell from output
  4. Imports pandas for data manipulation
  5. Imports numpy for numerical operations
  6. Imports StandardScaler from scikit-learn
  7. Comments data loading step
  8. Loads penguins dataset from URL
  9. Removes rows with missing values
```{pyodide}
#| exercise: ex_penguins
filtered_df = penguins[______]
filtered_df
```
  1. Associates this cell with the ‘ex_penguins’ exercise
  2. Creates placeholder for filtering conditions
  3. Displays the filtered dataframe
```{pyodide}
#| exercise: ex_penguins
#| check: true
correct_answer = penguins[(penguins['species'] == 'Gentoo') & (penguins['body_mass_g'] > 5000)]
if len(result) == len(correct_answer) and all(result.index == correct_answer.index):
 feedback = {
 "correct": True,
 "message": "Perfect! You correctly filtered for Gentoo penguins over 5000g."
 }
else:
 feedback = {
 "correct": False,
 "message": "Not quite. Make sure you're using both conditions: species=='Gentoo' and body_mass_g>5000"
 }
feedback
```
  1. Associates this cell with the ‘ex_penguins’ exercise
  2. Marks this as a check cell for verification
  3. Creates correct answer using both filtering conditions
  4. Checks if result matches correct answer in length and indices
  5. Creates feedback dictionary for correct answer
  6. Sets correct status to True
  7. Provides success message
  8. Starts else block for incorrect answers
  9. Creates feedback dictionary for incorrect answer
  10. Sets correct status to False
  11. Provides hint message for incorrect answer
  12. Returns feedback dictionary

Explorable Design Patterns

  1. Progressive Disclosure
    • Start simple
    • Add complexity gradually
    • Layer concepts
  2. Direct Manipulation
    • Interactive variables
    • Real-time updates
    • Visual feedback
  3. Multiple Representations
    • Code
    • Visualizations
    • Numerical output

Best Practices

  1. Clear Relationships
    • Show how variables affect outcomes
    • Highlight connections
    • Demonstrate causality
  2. Bounded Exploration
    • Set meaningful limits
    • Prevent invalid states
    • Guide discovery
  3. Multiple Entry Points
    • Different learning styles
    • Various complexity levels
    • Multiple paths to understanding

Demos

Implementation Tips

Installation

  1. Install Quarto: https://quarto.org

  2. Install Quarto Live & Drop Extensions

    # Install Quarto Extensions
    quarto add r-wasm/quarto-live
    quarto add r-wasm/quarto-drop

Quarto extensions are installed only in the current project scope. For each new project, you will need to install the extensions again.

Document Setup

The pyodide section is used to specify Python packages to load in the WebAssembly environment. This ensures that the required packages are available for Python code execution before the code cells are unlocked for students to explore.

---
format: 
    live-revealjs:
        scrollable: true
        smaller: true
        pyodide:
            packages: ['numpy', 'matplotlib']
---

Code Cells

Tangle Integration

```{ojs}
import {Tangle} from "@mbostock/tangle"
// Replace min, max, and initial with concrete numbers.
// `param` is a placeholder name; `var` is a reserved word in JavaScript.
viewof param = Inputs.range([min, max], {
    step: 0.1,
    value: initial,
    label: "Variable:"
})
paramTangle = Inputs.bind(Tangle({min: min, max: max, step: 0.1}), viewof param)
```

Python Integration

```{pyodide}
#| input:
#|   - param
# Your Python code using param
```

And that in-slide IDE …

  • The quarto-drop extension uses the drop key to configure the in-slide IDE that students use to interact with code.
  • The quarto-live extension uses the pyodide section to configure the WebAssembly environment.

Document Header Setup

---
format: 
    live-revealjs:
        scrollable: true
        smaller: true
        drop: 
            engine: pyodide
            packages: ['matplotlib', 'numpy', 'pandas', 'seaborn']
        pyodide:
            packages: ['matplotlib', 'numpy', 'pandas', 'seaborn']
revealjs-plugins:
- drop
---

Concluding

Some Limitations

  • Technical Constraints
    • Not all Python/R packages work in WebAssembly
    • Browser memory limits affect large computations
    • Initial load times can be significant
    • File system access is restricted
  • Development
    • Debugging is more challenging
    • Error messages may be less helpful
    • Package versions must be WebAssembly-compatible
  • Interactive Elements
    • Limited widget support compared to Jupyter

Resources

Thank You!

Questions?

Keep in touch: