Skip to contents

Run this setup first. It loads the packages used in the article and sets figure options.

library(windcut)
library(dplyr)
library(ggplot2)
library(tidyr)
library(cowplot)

Why window-pane?

Plant disease symptoms are often influenced by weather near a biological event. The event may be planting, flowering, inoculation, disease assessment, or another reference date. The hard part is that the most informative period is usually unknown. A window-pane workflow solves this by generating many candidate relative-time windows, summarizing weather inside each one, and comparing the resulting features.

Main idea. Choose a reference date, decide whether candidate windows should be before, after, or around that date, and summarize weather inside each candidate window.

The reference date does not have to be the disease assessment date. It can also be planting, flowering, inoculation, harvest, or any timestamp that makes sense for the biology of the system.

Step 1: Load bundled site-level weather and disease data

This tutorial starts from a bundled demo dataset with 10 sites. Each site has one daily weather time series and one disease assessment. That structure mirrors the modeling unit used by windcut: one row of disease data is paired with the weather history from the same site.

data(window_pane_demo_data)

weather <- window_pane_demo_data$weather
assessments <- window_pane_demo_data$assessments

nrow(weather)
#> [1] 1800
knitr::kable(assessments)
site_id assessment_id assessment_time response_type disease_intensity planting_time
S01 S01 2024-05-18 percent 75.2 2024-02-14
S02 S02 2024-05-07 percent 59.2 2024-02-03
S03 S03 2024-05-20 percent 53.9 2024-02-19
S04 S04 2024-04-12 percent 71.7 2024-01-16
S05 S05 2024-04-29 percent 80.9 2024-02-02
S06 S06 2024-04-15 percent 87.2 2024-01-14
S07 S07 2024-04-25 percent 84.0 2024-01-31
S08 S08 2024-04-23 percent 83.2 2024-01-22
S09 S09 2024-04-13 percent 59.4 2024-01-12
S10 S10 2024-05-05 percent 72.6 2024-02-10

The weather object is already daily. If your own data are hourly, use aggregate_weather_daily() before this step and choose the daily statistics that match each weather variable. Daily column names keep the statistic in the name, such as daily_mean_temp and daily_sum_rain.

The site_id column is important because it lets window_pane() match each assessment to the correct weather series. The assessment_time and planting_time columns are both valid reference dates, but they answer different questions.

Step 2: Visualize the possible reference dates

Before defining windows, it helps to see what the reference dates mean on the weather timeline. The plot below uses one site and draws two vertical reference lines: planting and assessment. Later, the same relative-time grid will be placed relative to either one of these columns by changing reference_col.

example_site <- assessments$site_id[1]
example_assessment <- assessments %>% filter(site_id == example_site)
example_weather <- weather %>% filter(site_id == example_site)

plot_start <- example_assessment$planting_time - 7 * 86400
plot_end <- example_assessment$assessment_time + 1 * 86400
plot_weather <- example_weather %>%
  filter(time >= plot_start, time <= plot_end)

reference_dates <- data.frame(
  reference_col = c("planting_time", "assessment_time"),
  reference_time = c(
    example_assessment$planting_time,
    example_assessment$assessment_time
  )
)

ggplot(plot_weather, aes(time, daily_mean_temp)) +
  geom_line(color = "#3f7d58", linewidth = 0.7) +
  geom_vline(
    data = reference_dates,
    aes(xintercept = reference_time, color = reference_col),
    linetype = "dashed",
    linewidth = 0.8
  ) +
  scale_color_manual(values = c(planting_time = "#c47f2c", assessment_time = "#20262e")) +
  labs(
    title = "Two possible reference dates for one site",
    x = "Calendar time",
    y = "Daily mean temperature (deg C)",
    color = "Reference column"
  ) +
  cowplot::theme_half_open()

ggplot2 chart showing reference dates, candidate windows, or generated weather features.

In make_windows(), offsets are measured relative to the selected reference date. Negative offsets are before the reference date, 0 is the reference date, and positive offsets are after the reference date. Window labels describe interval boundaries. For example, window_m05_z00 means a 5-day interval that ends at the selected reference date, while window_p01_p06 means a 5-day interval that begins 1 day after the reference date.

Step 3: Create a fixed-width window pane

A fixed-width window pane uses one duration and slides it through the relative-time range. Here every candidate window is 7 days long and occurs before or on the selected reference date. To match the 49-day span used later for planting-centered windows, this grid runs from 49 days before assessment to the assessment date itself. The first window is 49 to 43 days before the reference date, the next is 48 to 42 days before the reference date, then 47 to 41 days before the reference date, and so on. That default behavior is controlled by slide_by = 1.

fixed_windows <- make_windows(
  min_offset = -49,
  max_offset = 0,
  width = 7
)

knitr::kable(head(fixed_windows, 10))
relative_start relative_end width label
-49 -42 7 window_m49_m42
-48 -41 7 window_m48_m41
-47 -40 7 window_m47_m40
-46 -39 7 window_m46_m39
-45 -38 7 window_m45_m38
-44 -37 7 window_m44_m37
-43 -36 7 window_m43_m36
-42 -35 7 window_m42_m35
-41 -34 7 window_m41_m34
-40 -33 7 window_m40_m33
nrow(fixed_windows)
#> [1] 43

When a dense one-day slide creates more candidate windows than needed, increase slide_by. In the next grid, the same 7-day window moves two relative-time units at a time: 49 to 43 days before the reference date, then 47 to 41 days before the reference date, then 45 to 39 days before the reference date.

coarser_windows <- make_windows(
  min_offset = -49,
  max_offset = 0,
  width = 7,
  slide_by = 2
)

knitr::kable(head(coarser_windows, 10))
relative_start relative_end width label
-49 -42 7 window_m49_m42
-47 -40 7 window_m47_m40
-45 -38 7 window_m45_m38
-43 -36 7 window_m43_m36
-41 -34 7 window_m41_m34
-39 -32 7 window_m39_m32
-37 -30 7 window_m37_m30
-35 -28 7 window_m35_m28
-33 -26 7 window_m33_m26
-31 -24 7 window_m31_m24
nrow(coarser_windows)
#> [1] 22

The comparison below shows the practical effect of slide_by. The window width is still 7 days in both grids, but a larger slide produces fewer, more widely spaced candidate windows.

slide_comparison <- data.frame(
  slide_by = c("slide_by = 1", "slide_by = 2"),
  candidate_windows = c(nrow(fixed_windows), nrow(coarser_windows))
)

knitr::kable(slide_comparison)
slide_by candidate_windows
slide_by = 1 43
slide_by = 2 22

The first plot shows the dense fixed-width grid. The dashed line at relative-time 0 is the reference date. Segments to the left of that line are before the reference event.

plot_window_pane(
  fixed_windows,
  max_windows = Inf,
  color_by = "none",
  title = "Dense fixed-width window-pane grid",
  subtitle = "slide_by = 1 moves the 7-day window one relative-time unit at a time",
  xlab = "Time relative to reference date (days)"
)

ggplot2 chart showing reference dates, candidate windows, or generated weather features.

The next plot shows the same 7-day duration with a larger sliding step. The grid is smaller because the starting relative-time position jumps by two units instead of one.

plot_window_pane(
  coarser_windows,
  max_windows = Inf,
  color_by = "none",
  title = "Coarser fixed-width window-pane grid",
  subtitle = "slide_by = 2 moves the 7-day window two relative-time units at a time",
  xlab = "Time relative to reference date (days)"
)

ggplot2 chart showing reference dates, candidate windows, or generated weather features.

Use fixed-width windows when the exposure duration is already biologically defendable. For example, a 7-day period may be meaningful if infection requires several consecutive days of favorable moisture and temperature.

Step 4: Create a variable-width window pane

A variable-width window pane scans both timing and duration. Instead of using a single duration, give width several values. Here the workflow tries 3-, 5-, and 7-day windows within the same 49-day assessment-centered range. The window starts move two relative-time units at a time because slide_by = 2.

variable_windows <- make_windows(
  min_offset = -49,
  max_offset = 0,
  width = c(3, 5, 7),
  slide_by = 2
)

knitr::kable(head(variable_windows, 10))
relative_start relative_end width label
-49 -46 3 window_m49_m46
-49 -44 5 window_m49_m44
-49 -42 7 window_m49_m42
-47 -44 3 window_m47_m44
-47 -42 5 window_m47_m42
-47 -40 7 window_m47_m40
-45 -42 3 window_m45_m42
-45 -40 5 window_m45_m40
-45 -38 7 window_m45_m38
-43 -40 3 window_m43_m40
nrow(variable_windows)
#> [1] 69

The variable-width plot uses color to show duration. Shorter and longer windows can start at the same relative-time position, but they cover different portions of the weather history before the same reference date. The gap between adjacent starting positions shows the effect of slide_by.

plot_window_pane(
  variable_windows,
  max_windows = Inf,
  color_by = "width",
  title = "Variable-width window-pane grid",
  subtitle = "The dashed line at relative-time 0 marks the reference date",
  xlab = "Time relative to reference date (days)"
)

ggplot2 chart showing reference dates, candidate windows, or generated weather features.

The choice between fixed and variable windows should come from the disease biology. Fixed windows are simpler and easier to interpret. Variable windows are broader and may be better for discovery, especially early in a study.

Step 5: Generate features using assessment dates

window_pane() applies a window grid to every site. For each assessment row, it finds the matching weather series, places each candidate window relative to the chosen reference_col, and returns one wide feature table. In this first call, the windows are relative to assessment_time.

The statistics argument controls how weather is summarized inside each candidate window. A named list is useful when different variables need different summaries. In this example, the weather columns keep the names already present in the daily table, such as daily_mean_temp, daily_mean_rh, and daily_sum_rain. The same names are used in statistics, so the output feature names make the full calculation explicit.

daily_weather_cols <- c(
  "daily_mean_temp",
  "daily_mean_rh",
  "daily_sum_rain",
  "daily_sum_leaf_wetness"
)

summary_statistics <- list(
  daily_mean_temp = c("mean", "min", "max"),
  daily_mean_rh = list(mean = "mean", median = "median", days_at_or_above_90 = count_at_or_above(90)),
  daily_sum_rain = c("sum", "max"),
  daily_sum_leaf_wetness = "sum"
)

assessment_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = fixed_windows,
  reference_col = "assessment_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

assessment_feature_overview <- data.frame(
  rows = nrow(assessment_features),
  columns = ncol(assessment_features)
)
assessment_feature_names <- data.frame(
  feature = names(assessment_features %>% select(1:12))
)

knitr::kable(assessment_feature_overview)
rows columns
10 433
knitr::kable(assessment_feature_names)
feature
site_id
assessment_time
disease_intensity
n_obs_window_m49_m42
daily_mean_temp_mean_window_m49_m42
daily_mean_temp_min_window_m49_m42
daily_mean_temp_max_window_m49_m42
daily_mean_rh_mean_window_m49_m42
daily_mean_rh_median_window_m49_m42
daily_mean_rh_days_at_or_above_90_window_m49_m42
daily_sum_rain_sum_window_m49_m42
daily_sum_rain_max_window_m49_m42

The first columns identify the site, the reference date, and the response. The remaining columns are weather summaries. For example, daily_mean_temp_mean_window_m07_z00 is the mean of daily_mean_temp over the 7-day interval ending at the assessment date.

The table below shows the first rows and first feature columns of assessment_features. The full object has more weather-summary columns than shown here, but this view is enough to see the shape of the output: one row per site and one column per candidate weather predictor.

assessment_features %>%
  select(1:12) %>%
  slice_head(n = 6) %>%
  knitr::kable()
site_id assessment_time disease_intensity n_obs_window_m49_m42 daily_mean_temp_mean_window_m49_m42 daily_mean_temp_min_window_m49_m42 daily_mean_temp_max_window_m49_m42 daily_mean_rh_mean_window_m49_m42 daily_mean_rh_median_window_m49_m42 daily_mean_rh_days_at_or_above_90_window_m49_m42 daily_sum_rain_sum_window_m49_m42 daily_sum_rain_max_window_m49_m42
S01 2024-05-18 75.2 7 21.00065 20.29792 22.37833 80.53167 80.66500 0 19.59 10.33
S02 2024-05-07 59.2 7 22.23077 20.87000 23.49083 79.87208 79.93833 0 16.86 6.96
S03 2024-05-20 53.9 7 21.89202 20.77125 23.45917 79.92673 80.18958 0 10.62 8.22
S04 2024-04-12 71.7 7 22.92827 22.51333 23.30167 79.90863 80.13875 0 8.90 4.77
S05 2024-04-29 80.9 7 22.26113 20.56792 23.83250 79.65089 79.84375 0 5.99 1.63
S06 2024-04-15 87.2 7 22.29268 21.09167 23.70542 79.99827 80.13750 0 13.36 7.83

Step 6: Generate features using planting dates

Planting is positioned much earlier in the weather series than assessment, so it deserves its own window grid. Around assessment, the usual question is often “what happened before disease was measured?” Around planting, the question may include weather before planting and weather after planting, because both can be biologically meaningful.

The next grid uses 7-day windows and moves one week at a time. Negative windows summarize weather before planting, the window that starts at 0 includes the planting date, and positive windows summarize early-season weather after planting.

planting_windows <- make_windows(
  min_offset = -14,
  max_offset = 35,
  width = 7,
  slide_by = 7,
  reference_col = "planting_time"
)

knitr::kable(planting_windows)
relative_start relative_end width label
-14 -7 7 window_m14_m07
-7 0 7 window_m07_z00
0 7 7 window_z00_p07
7 14 7 window_p07_p14
14 21 7 window_p14_p21
21 28 7 window_p21_p28
28 35 7 window_p28_p35

The plot below makes the reference-date logic visible. The dashed line is planting. Segments to the left are pre-planting windows; segments to the right are post-planting windows.

plot_window_pane(
  planting_windows,
  max_windows = Inf,
  color_by = "timing",
  title = "Planting-centered windows can cross both sides of the reference date",
  subtitle = "Negative offsets are before planting; positive offsets are after planting",
  xlab = "Time relative to planting date (days)"
)

ggplot2 chart showing reference dates, candidate windows, or generated weather features.

Now window_pane() uses planting_windows with reference_col = "planting_time". The weather table and summary statistics stay the same; only the biological clock and the candidate windows change.

planting_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = planting_windows,
  reference_col = "planting_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

knitr::kable(
  planting_features %>%
    select(site_id, planting_time, disease_intensity) %>%
    slice_head(n = 5)
)
site_id planting_time disease_intensity
S01 2024-02-14 75.2
S02 2024-02-03 59.2
S03 2024-02-19 53.9
S04 2024-01-16 71.7
S05 2024-02-02 80.9

This is useful when the relevant biological clock starts at planting rather than assessment. For example, early-season weather may affect plant establishment, host susceptibility, or the amount of inoculum available later.

Step 7: Compare feature-table size across window strategies

Different window strategies create different numbers of candidate predictors. For a fair comparison, the four grids below use the same 49-day assessment-centered range introduced in Step 3: -49 to 0. This also matches the total length of the planting-centered grid above, where 35 - (-14) = 49. The only differences are the window durations and the sliding step. This lets the user see how a short sliding step creates a dense feature table, while a longer sliding step creates a smaller, more widely spaced one.

comparison_sparse_windows <- make_windows(
  min_offset = -49,
  max_offset = 0,
  width = 7,
  slide_by = 7
)

comparison_dense_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = fixed_windows,
  reference_col = "assessment_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

comparison_coarser_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = coarser_windows,
  reference_col = "assessment_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

comparison_variable_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = variable_windows,
  reference_col = "assessment_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

comparison_sparse_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = comparison_sparse_windows,
  reference_col = "assessment_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

feature_table_sizes <- data.frame(
  window_strategy = c(
    "7-day windows, slide_by = 1",
    "7-day windows, slide_by = 2",
    "3-, 5-, and 7-day windows, slide_by = 2",
    "7-day windows, slide_by = 7"
  ),
  relative_range = rep("-49 to 0 days", 4),
  total_length_days = rep(49, 4),
  candidate_windows = c(
    nrow(fixed_windows),
    nrow(coarser_windows),
    nrow(variable_windows),
    nrow(comparison_sparse_windows)
  ),
  generated_features = c(
    sum(grepl("_window_", names(comparison_dense_features))),
    sum(grepl("_window_", names(comparison_coarser_features))),
    sum(grepl("_window_", names(comparison_variable_features))),
    sum(grepl("_window_", names(comparison_sparse_features)))
  )
)

knitr::kable(feature_table_sizes)
window_strategy relative_range total_length_days candidate_windows generated_features
7-day windows, slide_by = 1 -49 to 0 days 49 43 430
7-day windows, slide_by = 2 -49 to 0 days 49 22 220
3-, 5-, and 7-day windows, slide_by = 2 -49 to 0 days 49 69 690
7-day windows, slide_by = 7 -49 to 0 days 49 7 70

The plot below emphasizes the practical trade-off. The label above each bar is the number of generated feature columns. Dense sliding and variable-width windows search a larger feature space. A longer sliding step keeps the same total relative-time range but creates fewer candidate predictors, which can be useful when the goal is a smaller, more interpretable modeling table.

feature_table_sizes <- feature_table_sizes %>%
  mutate(window_strategy = factor(window_strategy, levels = window_strategy))

ggplot(feature_table_sizes, aes(generated_features, window_strategy, fill = window_strategy)) +
  geom_col(show.legend = FALSE, width = 0.65) +
  geom_text(aes(label = generated_features), hjust = -0.25, size = 3.8) +
  scale_fill_manual(values = c(
    "7-day windows, slide_by = 1" = "#3f7d58",
    "7-day windows, slide_by = 2" = "#6ea87d",
    "3-, 5-, and 7-day windows, slide_by = 2" = "#c47f2c",
    "7-day windows, slide_by = 7" = "#7f5539"
  )) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.12))) +
  labs(
    title = "Feature-table size by window strategy",
    subtitle = "Same 49-day range; different duration and slide_by",
    x = "Generated feature columns",
    y = NULL
  ) +
  cowplot::theme_half_open()

ggplot2 chart showing reference dates, candidate windows, or generated weather features.

Step 8: Inspect one metric family

After feature generation, inspect groups of related variables before modeling. Here we focus on temperature means from the fixed-width, assessment-referenced table. The object temp_cols contains only feature names that start with daily_mean_temp_mean_.

features <- assessment_features
temp_cols <- features %>%
  select(starts_with("daily_mean_temp_mean_")) %>%
  names()
knitr::kable(data.frame(feature = head(temp_cols)))
feature
daily_mean_temp_mean_window_m49_m42
daily_mean_temp_mean_window_m48_m41
daily_mean_temp_mean_window_m47_m40
daily_mean_temp_mean_window_m46_m39
daily_mean_temp_mean_window_m45_m38
daily_mean_temp_mean_window_m44_m37

The plot below converts a small set of temperature features from wide to long format using tidyr::pivot_longer(). Each line is a site, and each point is a candidate temperature window. This helps reveal whether sites differ consistently across windows or only in specific timing periods.

plot_ids <- features %>%
  slice_head(n = min(4, nrow(features))) %>%
  pull(site_id)
plot_cols <- temp_cols[1:min(8, length(temp_cols))]
temp_long <- features %>%
  filter(site_id %in% plot_ids) %>%
  select(site_id, all_of(plot_cols)) %>%
  pivot_longer(
    cols = -site_id,
    names_to = "feature",
    values_to = "value"
  ) %>%
  mutate(feature_index = match(feature, plot_cols))

ggplot(temp_long, aes(feature_index, value, color = site_id, group = site_id)) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 1.8) +
  scale_color_manual(values = c("#2b6c4f", "#8aa05a", "#c47f2c", "#7f5539")) +
  scale_x_continuous(
    breaks = seq_along(plot_cols),
    labels = sub("daily_mean_temp_mean_", "", plot_cols)
  ) +
  labs(
    title = "Temperature summaries across candidate windows",
    x = "Relative-time window",
    y = "Feature value",
    color = "Site"
  ) +
  cowplot::theme_half_open() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot2 chart showing reference dates, candidate windows, or generated weather features.

The metric family often matters as much as the window timing:

  • temperature may relate to infection efficiency
  • leaf wetness may relate to germination and penetration
  • rainfall may relate to dispersal or splash
  • relative humidity may act as a proxy for microclimatic favorability

Step 9: What to do next

Window-pane analysis can generate many predictors. A careful workflow usually screens candidates, removes redundant variables, and then fits models using a small set of biologically defendable features. Use screen_window_features() to rank features by association with disease intensity, and use screen_feature_correlations() to identify highly correlated predictors before modeling.