Window-Pane Workflow • windcut

Run this setup first. It loads the packages used in the article.

library(windcut)
library(dplyr)
library(ggplot2)
library(tidyr)
library(cowplot)

Why window-pane?

Plant disease symptoms are often influenced by weather near a biological event. The event may be planting, flowering, inoculation, disease assessment, or another reference date. The hard part is that the most informative period is usually unknown. A window-pane workflow solves this by generating many candidate relative-time windows, summarizing weather inside each one, and comparing the resulting features.

Main idea. Choose a reference date, decide whether candidate windows should be before, after, or around that date, and summarize weather inside each candidate window.

The reference date does not have to be the disease assessment date. It can also be planting, flowering, inoculation, harvest, or any timestamp that makes sense for the biology of the system.

Step 1: Load bundled site-level weather and disease data

This tutorial starts from a bundled demo dataset with 10 sites. Each site has one daily weather time series and one disease assessment. That structure mirrors the modeling unit used by windcut: one row of disease data is paired with the weather history from the same site.

data(window_pane_demo_data)

weather <- window_pane_demo_data$weather
assessments <- window_pane_demo_data$assessments

nrow(weather)
#> [1] 1800
knitr::kable(assessments)

site_id	assessment_id	assessment_time	response_type	disease_intensity	planting_time
S01	S01	2024-05-18	percent	75.2	2024-02-14
S02	S02	2024-05-07	percent	59.2	2024-02-03
S03	S03	2024-05-20	percent	53.9	2024-02-19
S04	S04	2024-04-12	percent	71.7	2024-01-16
S05	S05	2024-04-29	percent	80.9	2024-02-02
S06	S06	2024-04-15	percent	87.2	2024-01-14
S07	S07	2024-04-25	percent	84.0	2024-01-31
S08	S08	2024-04-23	percent	83.2	2024-01-22
S09	S09	2024-04-13	percent	59.4	2024-01-12
S10	S10	2024-05-05	percent	72.6	2024-02-10

The weather object is already daily. If your own data are hourly, use aggregate_weather_daily() before this step and choose the daily statistics that match each weather variable. Daily column names keep the statistic in the name, such as daily_mean_temp and daily_sum_rain.
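For readers starting from hourly data, the aggregation idea can be sketched in base R. This is only an illustration of the daily-statistic naming convention on a made-up two-day series. It does not call aggregate_weather_daily(), whose arguments are not shown in this article, and the hourly column names temp and rain are hypothetical.

```r
# Made-up hourly series: 48 hours, two calendar days (UTC).
hourly <- data.frame(
  time = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 3600 * (0:47),
  temp = rep(c(18, 22), each = 24),   # hypothetical column name
  rain = rep(c(0, 0.5), each = 24)    # hypothetical column name
)

# Derive the calendar day, then pick one statistic per variable:
# mean for temperature, sum for rainfall.
hourly$date <- as.Date(hourly$time, tz = "UTC")
daily_temp <- aggregate(temp ~ date, data = hourly, FUN = mean)
daily_rain <- aggregate(rain ~ date, data = hourly, FUN = sum)

# Keep the statistic in the column name, as the daily table in this article does.
daily <- merge(daily_temp, daily_rain, by = "date")
names(daily) <- c("date", "daily_mean_temp", "daily_sum_rain")
print(daily)
```

The point of the naming is that a later feature such as daily_sum_rain_max_window_m07_z00 stays readable: it is the maximum, inside one window, of a daily rainfall sum.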

The site_id column is important because it lets window_pane() match each assessment to the correct weather series. The assessment_time and planting_time columns are both valid reference dates, but they answer different questions.

Step 2: Visualize the possible reference dates

Before defining windows, it helps to see what the reference dates mean on the weather timeline. The plot below uses one site and draws two vertical reference lines: planting and assessment. Later, the same relative-time grid will be placed relative to either one of these columns by changing reference_col.

example_site <- assessments$site_id[1]
example_assessment <- assessments %>% filter(site_id == example_site)
example_weather <- weather %>% filter(site_id == example_site)

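# Pad the plotting range: 7 days before planting, 1 day after assessment.
# The arithmetic is in seconds (86400 per day), which assumes POSIXct date-times.
plot_start <- example_assessment$planting_time - 7 * 86400
plot_end <- example_assessment$assessment_time + 1 * 86400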
plot_weather <- example_weather %>%
  filter(time >= plot_start, time <= plot_end)

reference_dates <- data.frame(
  reference_col = c("planting_time", "assessment_time"),
  reference_time = c(
    example_assessment$planting_time,
    example_assessment$assessment_time
  )
)

ggplot(plot_weather, aes(time, daily_mean_temp)) +
  geom_line(color = "#3f7d58", linewidth = 0.7) +
  geom_vline(
    data = reference_dates,
    aes(xintercept = reference_time, color = reference_col),
    linetype = "dashed",
    linewidth = 0.8
  ) +
  scale_color_manual(values = c(planting_time = "#c47f2c", assessment_time = "#20262e")) +
  labs(
    title = "Two possible reference dates for one site",
    x = "Calendar time",
    y = "Daily mean temperature (deg C)",
    color = "Reference column"
  ) +
  cowplot::theme_half_open()

Figure: Two possible reference dates for one site. Daily mean temperature (deg C) over calendar time, with dashed vertical lines at planting and assessment.

In make_windows(), offsets are measured relative to the selected reference date. Negative offsets are before the reference date, 0 is the reference date, and positive offsets are after the reference date. Window labels describe interval boundaries. For example, window_m05_z00 means a 5-day interval that ends at the selected reference date, while window_p01_p06 means a 5-day interval that begins 1 day after the reference date.
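The labeling pattern can be reproduced with a few lines of base R. The helpers offset_label() and window_label() below are hypothetical. They only mimic the labels printed in this article (m for a negative offset, z for zero, p for a positive offset, followed by a two-digit number) and are not windcut functions.

```r
# Hypothetical helper: turn a signed offset into a label fragment such as "m05".
offset_label <- function(offset) {
  prefix <- ifelse(offset < 0, "m", ifelse(offset > 0, "p", "z"))
  paste0(prefix, sprintf("%02d", abs(as.integer(offset))))
}

# Hypothetical helper: combine both interval boundaries into one window label.
window_label <- function(relative_start, relative_end) {
  paste0("window_", offset_label(relative_start), "_", offset_label(relative_end))
}

window_label(-5, 0)   # "window_m05_z00"
window_label(1, 6)    # "window_p01_p06"
window_label(-49, -42) # "window_m49_m42"
```

Reading a label backwards works the same way: the two fragments are the interval boundaries, and their difference is the window width.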

Step 3: Create a fixed-width window pane

A fixed-width window pane uses one duration and slides it through the relative-time range. Here every candidate window is 7 days long and lies on or before the selected reference date. To match the 49-day span used later for planting-centered windows, this grid runs from 49 days before assessment to the assessment date itself. The first window has its boundaries at 49 and 42 days before the reference date (window_m49_m42), the next at 48 and 41 days before, then 47 and 40 days before, and so on. This one-day step is the default, slide_by = 1.

fixed_windows <- make_windows(
  min_offset = -49,
  max_offset = 0,
  width = 7
)

knitr::kable(head(fixed_windows, 10))

relative_start	relative_end	width	label
-49	-42	7	window_m49_m42
-48	-41	7	window_m48_m41
-47	-40	7	window_m47_m40
-46	-39	7	window_m46_m39
-45	-38	7	window_m45_m38
-44	-37	7	window_m44_m37
-43	-36	7	window_m43_m36
-42	-35	7	window_m42_m35
-41	-34	7	window_m41_m34
-40	-33	7	window_m40_m33

nrow(fixed_windows)
#> [1] 43

When a dense one-day slide creates more candidate windows than needed, increase slide_by. In the next grid, the same 7-day window moves two relative-time units at a time: its boundaries sit at 49 and 42 days before the reference date, then 47 and 40 days before, then 45 and 38 days before.

coarser_windows <- make_windows(
  min_offset = -49,
  max_offset = 0,
  width = 7,
  slide_by = 2
)

knitr::kable(head(coarser_windows, 10))

relative_start	relative_end	width	label
-49	-42	7	window_m49_m42
-47	-40	7	window_m47_m40
-45	-38	7	window_m45_m38
-43	-36	7	window_m43_m36
-41	-34	7	window_m41_m34
-39	-32	7	window_m39_m32
-37	-30	7	window_m37_m30
-35	-28	7	window_m35_m28
-33	-26	7	window_m33_m26
-31	-24	7	window_m31_m24

nrow(coarser_windows)
#> [1] 22

The comparison below shows the practical effect of slide_by. The window width is still 7 days in both grids, but a larger slide produces fewer, more widely spaced candidate windows.

slide_comparison <- data.frame(
  slide_by = c("slide_by = 1", "slide_by = 2"),
  candidate_windows = c(nrow(fixed_windows), nrow(coarser_windows))
)

knitr::kable(slide_comparison)

slide_by	candidate_windows
slide_by = 1	43
slide_by = 2	22
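The window counts in the table above follow from simple arithmetic on the window start positions. The sketch below uses a hypothetical helper, count_windows(), and assumes that a window is kept only when it fits entirely inside the min_offset to max_offset range. That assumption reproduces the counts printed in this article, but it is not taken from the make_windows() source.

```r
# Hypothetical helper: count fixed-width windows that fit inside the range.
# ASSUMPTION: the last valid start is max_offset - width.
count_windows <- function(min_offset, max_offset, width, slide_by = 1) {
  starts <- seq(min_offset, max_offset - width, by = slide_by)
  length(starts)
}

count_windows(-49, 0, width = 7, slide_by = 1)  # 43
count_windows(-49, 0, width = 7, slide_by = 2)  # 22
```

Because the starts run from -49 to -7, the count is (42 / slide_by) + 1 whenever slide_by divides 42 evenly.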

The first plot shows the dense fixed-width grid. The dashed line at relative-time 0 is the reference date. Segments to the left of that line are before the reference event.

plot_window_pane(
  fixed_windows,
  max_windows = Inf,
  color_by = "none",
  title = "Dense fixed-width window-pane grid",
  subtitle = "slide_by = 1 moves the 7-day window one relative-time unit at a time",
  xlab = "Time relative to reference date (days)"
)

Figure: Dense fixed-width window-pane grid. Candidate 7-day windows against time relative to the reference date (days), slide_by = 1.

The next plot shows the same 7-day duration with a larger sliding step. The grid is smaller because the starting relative-time position jumps by two units instead of one.

plot_window_pane(
  coarser_windows,
  max_windows = Inf,
  color_by = "none",
  title = "Coarser fixed-width window-pane grid",
  subtitle = "slide_by = 2 moves the 7-day window two relative-time units at a time",
  xlab = "Time relative to reference date (days)"
)

Figure: Coarser fixed-width window-pane grid. Candidate 7-day windows against time relative to the reference date (days), slide_by = 2.

Use fixed-width windows when the exposure duration is already biologically defensible. For example, a 7-day period may be meaningful if infection requires several consecutive days of favorable moisture and temperature.

Step 4: Create a variable-width window pane

A variable-width window pane scans both timing and duration. Instead of using a single duration, give width several values. Here the workflow tries 3-, 5-, and 7-day windows within the same 49-day assessment-centered range. The window starts move two relative-time units at a time because slide_by = 2.

variable_windows <- make_windows(
  min_offset = -49,
  max_offset = 0,
  width = c(3, 5, 7),
  slide_by = 2
)

knitr::kable(head(variable_windows, 10))

relative_start	relative_end	width	label
-49	-46	3	window_m49_m46
-49	-44	5	window_m49_m44
-49	-42	7	window_m49_m42
-47	-44	3	window_m47_m44
-47	-42	5	window_m47_m42
-47	-40	7	window_m47_m40
-45	-42	3	window_m45_m42
-45	-40	5	window_m45_m40
-45	-38	7	window_m45_m38
-43	-40	3	window_m43_m40

nrow(variable_windows)
#> [1] 69
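The structure of this grid can be rebuilt in base R as a cross of start positions and widths. The function sketch_windows() below is a hypothetical stand-in written for illustration, not the make_windows() implementation. It assumes that candidate starts step from min_offset by slide_by and that a window is dropped when its end would pass max_offset.

```r
# Hypothetical stand-in for a variable-width grid builder.
sketch_windows <- function(min_offset, max_offset, width, slide_by = 1) {
  starts <- seq(min_offset, max_offset, by = slide_by)
  # expand.grid varies its first argument fastest, so widths cycle within each start.
  grid <- expand.grid(width = width, relative_start = starts)
  grid$relative_end <- grid$relative_start + grid$width
  # ASSUMPTION: keep only windows that end at or before max_offset.
  grid <- grid[grid$relative_end <= max_offset, ]
  grid[, c("relative_start", "relative_end", "width")]
}

sketch <- sketch_windows(-49, 0, width = c(3, 5, 7), slide_by = 2)
head(sketch, 4)
nrow(sketch)  # 69
```

The total splits into 24 three-day, 23 five-day, and 22 seven-day windows, because shorter windows can start closer to the reference date before running past offset 0.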

The variable-width plot uses color to show duration. Shorter and longer windows can start at the same relative-time position, but they cover different portions of the weather history before the same reference date. The gap between adjacent starting positions shows the effect of slide_by.

plot_window_pane(
  variable_windows,
  max_windows = Inf,
  color_by = "width",
  title = "Variable-width window-pane grid",
  subtitle = "The dashed line at relative-time 0 marks the reference date",
  xlab = "Time relative to reference date (days)"
)

Figure: Variable-width window-pane grid. Candidate 3-, 5-, and 7-day windows colored by width, against time relative to the reference date (days).

The choice between fixed and variable windows should come from the disease biology. Fixed windows are simpler and easier to interpret. Variable windows are broader and may be better for discovery, especially early in a study.

Step 5: Generate features using assessment dates

window_pane() applies a window grid to every site. For each assessment row, it finds the matching weather series, places each candidate window relative to the chosen reference_col, and returns one wide feature table. In this first call, the windows are relative to assessment_time.

The statistics argument controls how weather is summarized inside each candidate window. A named list is useful when different variables need different summaries. In this example, the weather columns keep the names already present in the daily table, such as daily_mean_temp, daily_mean_rh, and daily_sum_rain. The same names are used in statistics, so the output feature names make the full calculation explicit.

daily_weather_cols <- c(
  "daily_mean_temp",
  "daily_mean_rh",
  "daily_sum_rain",
  "daily_sum_leaf_wetness"
)

summary_statistics <- list(
  daily_mean_temp = c("mean", "min", "max"),
  daily_mean_rh = list(mean = "mean", median = "median", days_at_or_above_90 = count_at_or_above(90)),
  daily_sum_rain = c("sum", "max"),
  daily_sum_leaf_wetness = "sum"
)

assessment_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = fixed_windows,
  reference_col = "assessment_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

assessment_feature_overview <- data.frame(
  rows = nrow(assessment_features),
  columns = ncol(assessment_features)
)
assessment_feature_names <- data.frame(
  feature = names(assessment_features %>% select(1:12))
)

knitr::kable(assessment_feature_overview)

rows	columns
10	433

knitr::kable(assessment_feature_names)

feature
site_id
assessment_time
disease_intensity
n_obs_window_m49_m42
daily_mean_temp_mean_window_m49_m42
daily_mean_temp_min_window_m49_m42
daily_mean_temp_max_window_m49_m42
daily_mean_rh_mean_window_m49_m42
daily_mean_rh_median_window_m49_m42
daily_mean_rh_days_at_or_above_90_window_m49_m42
daily_sum_rain_sum_window_m49_m42
daily_sum_rain_max_window_m49_m42

The first columns identify the site, the reference date, and the response. The remaining columns are weather summaries. For example, daily_mean_temp_mean_window_m07_z00 is the mean of daily_mean_temp over the 7-day interval ending at the assessment date.
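To make the feature names concrete, the sketch below computes three window_m07_z00 features by hand for a single made-up site. All weather values are invented. The interval rule used here (start offset included, end offset excluded) is an assumption chosen because it yields 7 daily observations for a 7-day window, matching the n_obs values shown in this article; the rule windcut actually applies may differ. The count of days at or above 90 is written as a plain sum() and is only a guess at what count_at_or_above(90) summarizes.

```r
# Made-up daily series for one hypothetical site.
toy_weather <- data.frame(
  time = as.Date("2024-05-01") + 0:19,
  daily_mean_temp = 20 + (0:19) %% 5,
  daily_mean_rh = rep(c(85, 92), times = 10)
)
reference_date <- as.Date("2024-05-18")

# Offset of every weather day relative to the reference date, in days.
offset <- as.integer(toy_weather$time - reference_date)

# ASSUMPTION: window [-7, 0) includes offset -7 and excludes offset 0.
in_window <- offset >= -7 & offset < 0
window_rows <- toy_weather[in_window, ]

features <- c(
  n_obs_window_m07_z00 = nrow(window_rows),
  daily_mean_temp_mean_window_m07_z00 = mean(window_rows$daily_mean_temp),
  daily_mean_rh_days_at_or_above_90_window_m07_z00 =
    sum(window_rows$daily_mean_rh >= 90)
)
print(features)
```

Repeating this for every window and every site is what produces one wide row per site.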

The table below shows the first rows and first feature columns of assessment_features. The full object has more weather-summary columns than shown here, but this view is enough to see the shape of the output: one row per site and one column per candidate weather predictor.

assessment_features %>%
  select(1:12) %>%
  slice_head(n = 6) %>%
  knitr::kable()

site_id	assessment_time	disease_intensity	n_obs_window_m49_m42	daily_mean_temp_mean_window_m49_m42	daily_mean_temp_min_window_m49_m42	daily_mean_temp_max_window_m49_m42	daily_mean_rh_mean_window_m49_m42	daily_mean_rh_median_window_m49_m42	daily_sum_rain_sum_window_m49_m42	daily_sum_rain_max_window_m49_m42
S01	2024-05-18	75.2	7	21.00065	20.29792	22.37833	80.53167	80.66500	19.59	10.33
S02	2024-05-07	59.2	7	22.23077	20.87000	23.49083	79.87208	79.93833	16.86	6.96
S03	2024-05-20	53.9	7	21.89202	20.77125	23.45917	79.92673	80.18958	10.62	8.22
S04	2024-04-12	71.7	7	22.92827	22.51333	23.30167	79.90863	80.13875	8.90	4.77
S05	2024-04-29	80.9	7	22.26113	20.56792	23.83250	79.65089	79.84375	5.99	1.63
S06	2024-04-15	87.2	7	22.29268	21.09167	23.70542	79.99827	80.13750	13.36	7.83

Step 6: Generate features using planting dates

Planting falls much earlier in the weather series than assessment, so it deserves its own window grid. Around assessment, the question is usually “what happened before disease was measured?” Around planting, the question may cover weather both before and after planting, because both can be biologically meaningful.

The next grid uses 7-day windows and moves one week at a time. Negative windows summarize weather before planting, the window that starts at 0 includes the planting date, and positive windows summarize early-season weather after planting.

planting_windows <- make_windows(
  min_offset = -14,
  max_offset = 35,
  width = 7,
  slide_by = 7,
  reference_col = "planting_time"
)

knitr::kable(planting_windows)

relative_start	relative_end	width	label
-14	-7	7	window_m14_m07
-7	0	7	window_m07_z00
0	7	7	window_z00_p07
7	14	7	window_p07_p14
14	21	7	window_p14_p21
21	28	7	window_p21_p28
28	35	7	window_p28_p35

The plot below makes the reference-date logic visible. The dashed line is planting. Segments to the left are pre-planting windows; segments to the right are post-planting windows.

plot_window_pane(
  planting_windows,
  max_windows = Inf,
  color_by = "timing",
  title = "Planting-centered windows can cross both sides of the reference date",
  subtitle = "Negative offsets are before planting; positive offsets are after planting",
  xlab = "Time relative to planting date (days)"
)

Figure: Planting-centered windows can cross both sides of the reference date. Candidate 7-day windows colored by timing, against time relative to planting date (days).

Now window_pane() uses planting_windows with reference_col = "planting_time". The weather table and summary statistics stay the same; only the biological clock and the candidate windows change.

planting_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = planting_windows,
  reference_col = "planting_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

knitr::kable(
  planting_features %>%
    select(site_id, planting_time, disease_intensity) %>%
    slice_head(n = 5)
)

site_id	planting_time	disease_intensity
S01	2024-02-14	75.2
S02	2024-02-03	59.2
S03	2024-02-19	53.9
S04	2024-01-16	71.7
S05	2024-02-02	80.9

This is useful when the relevant biological clock starts at planting rather than assessment. For example, early-season weather may affect plant establishment, host susceptibility, or the amount of inoculum available later.
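It can help to translate the relative grid back into calendar dates for one site. The sketch below does this in base R for site S01, using the planting date from the Step 1 table and the window boundaries from the planting_windows table. It is plain date arithmetic and does not call any windcut function.

```r
# S01 planting_time, copied from the assessments table in Step 1.
planting_date <- as.Date("2024-02-14")

# Window boundaries copied from the planting_windows table.
relative_start <- seq(-14, 28, by = 7)
relative_end <- relative_start + 7

# Adding an integer to a Date shifts it by that many days.
calendar <- data.frame(
  relative_start = relative_start,
  relative_end = relative_end,
  calendar_start = planting_date + relative_start,
  calendar_end = planting_date + relative_end
)
print(calendar)
```

For S01 the grid reaches from 2024-01-31 to 2024-03-20. Another site with a different planting date gets the same relative windows on different calendar days, which is what makes the features comparable across sites.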

Step 7: Compare feature-table size across window strategies

Different window strategies create different numbers of candidate predictors. For a fair comparison, the four grids below share the 49-day range before assessment introduced in Step 3: -49 to 0. This also matches the total length of the planting-centered grid above, where 35 - (-14) = 49. Three of the grids were built in Steps 3 and 4; only the 7-day grid with slide_by = 7 is new. The grids differ only in window duration and sliding step, so the comparison shows how a short sliding step creates a dense feature table, while a longer sliding step creates a smaller, more widely spaced one.

comparison_sparse_windows <- make_windows(
  min_offset = -49,
  max_offset = 0,
  width = 7,
  slide_by = 7
)

comparison_dense_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = fixed_windows,
  reference_col = "assessment_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

comparison_coarser_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = coarser_windows,
  reference_col = "assessment_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

comparison_variable_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = variable_windows,
  reference_col = "assessment_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

comparison_sparse_features <- window_pane(
  weather = weather,
  assessments = assessments,
  windows = comparison_sparse_windows,
  reference_col = "assessment_time",
  id_col = "site_id",
  response_col = "disease_intensity",
  weather_cols = daily_weather_cols,
  statistics = summary_statistics
)

feature_table_sizes <- data.frame(
  window_strategy = c(
    "7-day windows, slide_by = 1",
    "7-day windows, slide_by = 2",
    "3-, 5-, and 7-day windows, slide_by = 2",
    "7-day windows, slide_by = 7"
  ),
  relative_range = rep("-49 to 0 days", 4),
  total_length_days = rep(49, 4),
  candidate_windows = c(
    nrow(fixed_windows),
    nrow(coarser_windows),
    nrow(variable_windows),
    nrow(comparison_sparse_windows)
  ),
  generated_features = c(
    sum(grepl("_window_", names(comparison_dense_features))),
    sum(grepl("_window_", names(comparison_coarser_features))),
    sum(grepl("_window_", names(comparison_variable_features))),
    sum(grepl("_window_", names(comparison_sparse_features)))
  )
)

knitr::kable(feature_table_sizes)

window_strategy	relative_range	total_length_days	candidate_windows	generated_features
7-day windows, slide_by = 1	-49 to 0 days	49	43	430
7-day windows, slide_by = 2	-49 to 0 days	49	22	220
3-, 5-, and 7-day windows, slide_by = 2	-49 to 0 days	49	69	690
7-day windows, slide_by = 7	-49 to 0 days	49	7	70
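The generated_features column in the table above can be checked by hand. Each window contributes one n_obs column plus one column per requested summary, and summary_statistics asks for nine summaries (three for temperature, three for relative humidity, two for rainfall, one for leaf wetness). The sketch below only redoes that arithmetic; the variable names are made up for the illustration.

```r
# Summaries requested per window in summary_statistics:
# temperature (mean, min, max), RH (mean, median, days >= 90),
# rain (sum, max), leaf wetness (sum).
summaries_per_window <- 3 + 3 + 2 + 1

# Each window also gets an n_obs_* column, which the "_window_" pattern matches too.
columns_per_window <- summaries_per_window + 1

# Candidate window counts from the table above.
candidate_windows <- c(dense = 43, coarser = 22, variable = 69, sparse = 7)

generated <- candidate_windows * columns_per_window
print(generated)
```

The same arithmetic explains the 433 columns of assessment_features in Step 5: 430 window columns plus site_id, assessment_time, and disease_intensity.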

The plot below emphasizes the practical trade-off. The label above each bar is the number of generated feature columns. Dense sliding and variable-width windows search a larger feature space. A longer sliding step keeps the same total relative-time range but creates fewer candidate predictors, which can be useful when the goal is a smaller, more interpretable modeling table.

feature_table_sizes <- feature_table_sizes %>%
  mutate(window_strategy = factor(window_strategy, levels = window_strategy))

ggplot(feature_table_sizes, aes(generated_features, window_strategy, fill = window_strategy)) +
  geom_col(show.legend = FALSE, width = 0.65) +
  geom_text(aes(label = generated_features), hjust = -0.25, size = 3.8) +
  scale_fill_manual(values = c(
    "7-day windows, slide_by = 1" = "#3f7d58",
    "7-day windows, slide_by = 2" = "#6ea87d",
    "3-, 5-, and 7-day windows, slide_by = 2" = "#c47f2c",
    "7-day windows, slide_by = 7" = "#7f5539"
  )) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.12))) +
  labs(
    title = "Feature-table size by window strategy",
    subtitle = "Same 49-day range; different duration and slide_by",
    x = "Generated feature columns",
    y = NULL
  ) +
  cowplot::theme_half_open()

Figure: Feature-table size by window strategy. Horizontal bars of generated feature columns for the four window strategies, each labeled with its count.

Step 8: Inspect one metric family

After feature generation, inspect groups of related variables before modeling. Here we focus on temperature means from the fixed-width, assessment-referenced table. The object temp_cols contains only feature names that start with daily_mean_temp_mean_.

features <- assessment_features
temp_cols <- features %>%
  select(starts_with("daily_mean_temp_mean_")) %>%
  names()
knitr::kable(data.frame(feature = head(temp_cols)))

feature
daily_mean_temp_mean_window_m49_m42
daily_mean_temp_mean_window_m48_m41
daily_mean_temp_mean_window_m47_m40
daily_mean_temp_mean_window_m46_m39
daily_mean_temp_mean_window_m45_m38
daily_mean_temp_mean_window_m44_m37

The plot below converts a small set of temperature features from wide to long format using tidyr::pivot_longer(). Each line is a site, and each point is a candidate temperature window. This helps reveal whether sites differ consistently across windows or only in specific timing periods.

plot_ids <- features %>%
  slice_head(n = min(4, nrow(features))) %>%
  pull(site_id)
plot_cols <- temp_cols[1:min(8, length(temp_cols))]
temp_long <- features %>%
  filter(site_id %in% plot_ids) %>%
  select(site_id, all_of(plot_cols)) %>%
  pivot_longer(
    cols = -site_id,
    names_to = "feature",
    values_to = "value"
  ) %>%
  mutate(feature_index = match(feature, plot_cols))

ggplot(temp_long, aes(feature_index, value, color = site_id, group = site_id)) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 1.8) +
  scale_color_manual(values = c("#2b6c4f", "#8aa05a", "#c47f2c", "#7f5539")) +
  scale_x_continuous(
    breaks = seq_along(plot_cols),
    labels = sub("daily_mean_temp_mean_", "", plot_cols)
  ) +
  labs(
    title = "Temperature summaries across candidate windows",
    x = "Relative-time window",
    y = "Feature value",
    color = "Site"
  ) +
  cowplot::theme_half_open() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Figure: Temperature summaries across candidate windows. Feature value by relative-time window, one line per site.

The metric family often matters as much as the window timing:

temperature may relate to infection efficiency
leaf wetness may relate to germination and penetration
rainfall may relate to dispersal or splash
relative humidity may act as a proxy for microclimatic favorability

Step 9: What to do next

Window-pane analysis can generate many predictors. A careful workflow usually screens candidates, removes redundant variables, and then fits models using a small set of biologically defensible features. Use screen_window_features() to rank features by association with disease intensity, and use screen_feature_correlations() to identify highly correlated predictors before modeling.
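The two screening ideas can be illustrated in base R on a tiny made-up table. This sketch does not call screen_window_features() or screen_feature_correlations(), whose arguments are not shown in this article. The feature names feat_a, feat_b, and feat_c, the Spearman correlation, and the 0.9 cutoff are all choices made for the illustration.

```r
# Made-up table: 10 sites, one response, three candidate features.
toy <- data.frame(
  disease_intensity = c(53, 59, 60, 71, 72, 75, 80, 83, 84, 87),
  feat_a = 1:10,                                # rises with the response
  feat_b = 2 * (1:10) + 0.1,                    # near-duplicate of feat_a
  feat_c = c(5, 1, 4, 2, 5, 3, 1, 4, 2, 3)      # weakly related
)
feature_names <- setdiff(names(toy), "disease_intensity")

# 1. Rank features by absolute rank correlation with the response.
assoc <- sapply(
  toy[feature_names],
  function(x) cor(x, toy$disease_intensity, method = "spearman")
)
ranked <- sort(abs(assoc), decreasing = TRUE)
print(ranked)

# 2. Flag feature pairs that are highly correlated with each other.
feature_cor <- cor(toy[feature_names])
high <- which(abs(feature_cor) > 0.9 & upper.tri(feature_cor), arr.ind = TRUE)
redundant_pairs <- data.frame(
  feature_1 = rownames(feature_cor)[high[, "row"]],
  feature_2 = colnames(feature_cor)[high[, "col"]]
)
print(redundant_pairs)
```

Here feat_a and feat_b rank equally well but carry the same information, so only one of them would normally go forward. Neighboring windows from a dense grid behave the same way, because consecutive 7-day windows share most of their days.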