AgStats, Experimental Designs, SASsyFridays

AIs and Statistical Coding – SAS & RCBD

July 15, 2025

Alrighty, you’ve already read about my feelings on the approach that ChatGPT took when I asked it to create SAS code to analyze a Randomized Complete Block Design (RCBD). I was not happy, to say the least. Now for my conundrum – of course I tried the sample dataset and code – compared it to the updated and more appropriate SAS coding – and guess what?? Yup, the results were the same!

BUT

This still does NOT mean it is correct! Here’s why I disagree with the proposed SAS coding from ChatGPT:

Statistical Models

For better or worse, when I teach statistics and experimental designs it’s about controlling that experimental or residual error. The best way to think about this is to create or write the statistical model for your design. We can then take that statistical model and “translate” it to SAS, R, or whatever statistical program you are working with.

This is the standard statistical model for an RCBD:

Y_ij = mu + Block_i + Treatment_j + e_ij

Where:

Y_ij is the outcome measure (Yield as example)
mu is the overall mean
Block_i is the RANDOM effect of the block
Treatment_j is the FIXED effect of the treatment
e_ij is the experimental (or residual) error
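
For readers who like the assumptions spelled out, here is the same model in LaTeX notation. The distributional assumptions (independent, normally distributed block effects and errors) and the symbols b, t, \sigma^2_{Block} and \sigma^2_e are the usual textbook ones and are my addition; they are not written out in the model above.

```latex
\begin{align*}
Y_{ij} &= \mu + \text{Block}_i + \text{Treatment}_j + e_{ij},
  \qquad i = 1,\dots,b, \quad j = 1,\dots,t \\
\text{Block}_i &\sim N\!\left(0,\ \sigma^2_{Block}\right)
  \quad \text{(random)} \\
e_{ij} &\sim N\!\left(0,\ \sigma^2_e\right)
  \quad \text{(experimental or residual error)}
\end{align*}
```

Treatment_j has no distribution attached because it is FIXED: its levels are the specific treatments we want to compare.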

There is CLEARLY a RANDOM effect in this statistical model. However, in the SAS code created by ChatGPT:

* Step 2: Run the RCBD analysis using PROC GLM;
proc glm data=rcbd;
    class Block Trt;
    model Yield = Block Trt;
    means Trt / lsd;
    lsmeans Trt / pdiff adjust=tukey;
    title 'RCBD Analysis - Treatment Effects Adjusted for Block';
run;

There is NO RANDOM statement!!! Therefore the random effect of the block is NOT being accounted for.

FIXED Block Effect

Yes, the Block effect in the SAS code provided by ChatGPT is indeed being treated as a FIXED effect. Is this wrong? We can debate this! There are situations where you, the researcher, may be interested in the effect (and differences between the Block levels) – and in this case the Block effect would indeed be considered as a FIXED effect.

As a supporter and true believer in setting out experiments based on research questions – I believe that if you create an RCBD – you know there may be some factor that is adding variation to your outcome measures, but you are not interested in discussing the differences between the levels of that effect – so it is RANDOM and should be treated as such. Oish – Michelle – spill it and talk English!

Let’s look at an example:

Field trial
- You are testing 3 varieties of soybeans in a field that is next to a road. We know that in the winter, the county roads are salted, leading to salt leaching into the field and resulting in a salt gradient. We know this may affect the growth and yield of the crops – so we want to design the trial to account for this – but are not interested in testing the effect of the salt gradient. By creating an RCBD design – we will create blocks in our field that run parallel to our road. In each block we will randomly assign 1 plot to each variety of soybeans for a total of 3 plots in each block.
- The block or salt gradient is the RANDOM effect and should be added as such in our coding
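
The randomization step in the field trial above can be sketched in a few lines of SAS. This is a minimal illustration, not a full trial layout: the dataset name layout, the variety labels V1 to V3, the helper variable u, and the seed are all placeholders I made up.

```sas
/* Hypothetical sketch: randomize 3 varieties to 3 plots within each of 3 blocks */
data layout;
    call streaminit(20250715);          /* arbitrary seed so the layout is repeatable */
    do Block = 1 to 3;                  /* blocks run parallel to the road */
        do Variety = 'V1', 'V2', 'V3';  /* placeholder variety labels */
            u = rand('uniform');        /* random sort key for this plot */
            output;
        end;
    end;
run;

/* Sorting by the random key within Block shuffles the varieties in each block */
proc sort data=layout;
    by Block u;
run;

/* Number the plots 1 to 3 within each block, in the shuffled order */
data layout;
    set layout;
    by Block;
    if first.Block then Plot = 0;
    Plot + 1;
    drop u;
run;

proc print data=layout noobs;
run;
```

Every block ends up with each variety exactly once, in its own random order, which is what makes the design a randomized COMPLETE block design.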

RANDOM Block Effect

So folks out there are saying – Michelle – add the RANDOM statement to the above code generated by ChatGPT. Yes, we can definitely do this:

proc glm data=rcbd;
    class Block Trt;
    model Yield = Block Trt;
    random Block;
    lsmeans Trt / pdiff adjust=tukey;
    title 'RCBD Analysis - Treatment Effects Adjusted for Block';
run;

Here is the output from that one statement:

The GLM Procedure

Source	Type III Expected Mean Square
Block	Var(Error) + 4 Var(Block)
Trt	Var(Error) + Q(Trt)

Most people will skip over this – heck I will suggest that most of our students will not even understand this. The assumption here is SAS will take care of everything – no worries!!!

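As a trained animal geneticist and someone who calculated variance components for her MSc thesis – these are very fun! They list the expected mean squares (EMS) for both the fixed effect of Trt and the random effect of Block. Keep in mind that, unless you add the TEST option, the RANDOM statement in PROC GLM only prints these expected mean squares; the rest of the analysis still treats Block as fixed. Again, most of us are going to skip this and ignore that there is any random variation accounted for by the Block effect!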
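
As a sketch of how these expected mean squares get used: setting each observed mean square equal to its expectation gives a method-of-moments estimate of the block variance. The symbols MS_Block, MS_Error, \sigma^2_e and \sigma^2_{Block} are my own labels, and the numbers are my hand calculation from the 12 observations in the dataset listed later in this post, not SAS output, so please check them against your own run.

```latex
\begin{align*}
E[MS_{Error}] &= \sigma^2_e \\
E[MS_{Block}] &= \sigma^2_e + 4\,\sigma^2_{Block} \\
\hat{\sigma}^2_{Block} &= \frac{MS_{Block} - MS_{Error}}{4}
  \approx \frac{2.3725 - 0.1381}{4} \approx 0.559
\end{align*}
```

The 4 is the number of plots (one per treatment) in each block. That 0.559 is the random Block variation that gets ignored when we skip over this table.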

Proper SAS code for the example

data rcbd;
input Block $ Trt $ Yield;
datalines;
1 A 45.2
1 B 48.7
1 C 50.1
1 D 46.3
2 A 44.9
2 B 49.5
2 C 51.0
2 D 47.2
3 A 46.0
3 B 50.0
3 C 52.3
3 D 48.1
;
run;

proc glimmix data=rcbd;
    class Block Trt;
    model Yield = Trt;       /* FIXED effects only */
    random Block;            /* RANDOM effect of the block */
    lsmeans Trt / pdiff adjust=tukey lines;
    title "Updated and appropriate GLMM model - MEdwards";
    title2 'RCBD Analysis';
run;

This is based on the statistical model listed above. The MODEL statement will only include the FIXED effects and you will add a separate statement for the RANDOM effects. The MODEL and RANDOM statements should reflect the statistical model for your trial and experiment.

Why are the results the same?

I started this post by admitting that as much as I dislike the SAS coding from ChatGPT, when I ran it and the more appropriate code – the end results were the same.

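We have a very clean, small, and balanced dataset – every treatment appears exactly once in every block and nothing is missing – so the block differences cancel out of the treatment comparisons and yes, the results are the same. When we start moving into larger datasets with messier data (missing plots, for example) – you will see different results with these 2 approaches.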
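
To see the two approaches part ways, one option is to make the data a little messier on purpose. This is only a sketch: the dataset name rcbd_miss and the choice of which plot to drop (Block 3, Trt D) are hypothetical, picked purely for illustration.

```sas
/* Hypothetical variation: copy the post's dataset and set one plot to missing */
data rcbd_miss;
    set rcbd;
    if Block = '3' and Trt = 'D' then Yield = .;   /* Block and Trt are character */
run;

/* Approach 1: Block treated as a FIXED effect */
proc glm data=rcbd_miss;
    class Block Trt;
    model Yield = Block Trt;
    lsmeans Trt / pdiff adjust=tukey stderr;
    title 'Missing plot - Block as FIXED (PROC GLM)';
run;
quit;

/* Approach 2: Block treated as a RANDOM effect */
proc glimmix data=rcbd_miss;
    class Block Trt;
    model Yield = Trt;
    random Block;
    lsmeans Trt / pdiff adjust=tukey;
    title 'Missing plot - Block as RANDOM (PROC GLIMMIX)';
run;
```

With one plot missing the design is no longer balanced, so I would expect the Trt lsmeans and their standard errors from the two procedures to stop matching exactly. Run both and compare the tables side by side rather than taking my word for it.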

Too much information?

When I ran the Proc GLM coding presented – I had pages and pages of output and results. Really cool looking plots and tables.

BUT do we need it all?

When you are uncertain or not confident with your analyses – please do NOT give your supervisor ALL the information you have in these outputs. Understand what it is saying! I will always say that SAS thinks it’s smarter than you and will try to anticipate ALL your needs and give you oodles and oodles of output pages. Most of it – irrelevant to your research question! I’ve found that with tools such as ChatGPT – you are now getting even more nonsensical and irrelevant output.

Remember – always go back to your research question! Don’t get side-tracked by all those rabbit holes and extra output – stick to your research questions.

SO…. What do we do?

- Make statistics more accessible to students and researchers
  - The more I play with this code and work with today’s students – this point is becoming more and more important and KEY!
- Be aware of how AI is being used to create SAS and R code so we can teach our students and researchers how to build on what they are provided
- As an instructor, keep playing with AI to be more aware of the results

Upcoming Blog post

This post concentrated on the SAS code, but ChatGPT created some interesting R code as well. I’ll present it and my thoughts next time!