Saturday, January 13, 2007

Professional Development Rubrics

There seems to be some debate about the right term for these (rubrics, scoring guides, continuums, etc.), but I'm sure we are all picturing the same table of headings describing a scale from not-good to great.

In the business world, and somewhat in education, they are also called Behaviorally Anchored Rating Scales (BARS). I'm adding one word to that, making it Data-based Behaviorally Anchored Rating Scales (D-BARS). If you ever see that somewhere else, you can say you know where it started.

If you're adopting, amending, or writing your own D-BARS, there are some errors to avoid, lest the outcome be less than helpful to the observer and the observee.

A very common error involves the continuum of behavior indicators. Since the scale across the top of these documents progresses from one extreme to the other, conceptually with no gaps or overlaps between the divisions, the D-BARS (the physically observable behavior that exemplifies each division) should also form a continuum. Yet in documents I've seen from across the country and the world, the behavior used as the indicator often changes from one division/cell to the next. It shouldn't. What should be described is one behavior across the continuum, from poor to great. An example:

The target standard/behavior is "Teachers involve and guide all students in assessing their own learning." The category headings are Unsatisfactory, Emerging, Basic, Proficient, and Distinguished.

The behavior indicator for the Unsatisfactory level is "Students do not assess their own learning." That's a clear statement, but there's more to the unsatisfactory level than no assessment at all. Students might be assessing themselves once a year, unguided, inaccurately, using the wrong criteria, etc. The descriptor for Unsatisfactory should describe the range of indicators, all of which are unsatisfactory.

The next level, Emerging, has this as a behavior indicator: "Teacher checks student work and communicates progress through the report card." This indicator is unrelated to students assessing their own learning, and it gives whoever uses the D-BARS no clear way to distinguish between Unsatisfactory and Emerging. The statement might fit well under a target standard related to 'communicating progress to students', and within that standard it might well belong in the 'Emerging' category.

Perhaps (and this is brainstorming - collaborative discussion needed)...

Emerging would be "Students are asked to state/guess what their grade on an assignment will be." or "Students are asked to grade each other's papers without the use of a scoring guide." [Students are assessing their work related to grades, and with little guidance]

The Basic category could be something akin to "Students are assessed by the teacher according to a scoring guide and asked to describe why they agree/disagree with the grade." [Students are asked to apply the scoring guide in their reflection, but do not actually self-assess]

A descriptor for the Proficient level might be "Using a teacher-provided scoring guide, students are asked to assess their work before they hand it in to the teacher." [Students assess their work according to a scoring guide]

And finally, the Distinguished level could read "Using collaboratively developed (teacher and students) scoring guides, students are engaged in self- and peer assessment of progress toward meeting the standards." [Students are engaged and guided in the process of creating the criteria, and then applying those criteria to themselves and others.]

I would hope that there would be discussion about my choices and wording, as this is only to illustrate the need for a continuum in the described behavior indicators.

The next question might be: what are the keystone observable behaviors that should be tracked to gather data on "Involving and guiding students in assessing their own learning"? Is it the amount of time students are engaged in assessing learning? The number of references to standards made by the teacher? The number and/or level of questions students ask related to assessment and standards? Every observer who makes a determination of level is doing so on the basis of something they see. We need to come to consensus about what's valid and reliable.

Tuesday, January 9, 2007

Thoughts after a presentation on Observation Reliability

This is an email I sent to someone who requested a copy of the PowerPoint presentation I gave on Observation Reliability, in which I presented my new idea about the sequence from research to standards to indicators to data collection to teacher support and evaluation. If you'd like to see the PowerPoint, send me an email.
------------
When I first wrote eCOVE I was focused on giving helpful feedback to student teachers. From years of working with student teachers and new teachers I knew that they needed help thinking through the problems that came up in their classrooms. Providing them with 'my' answers and ideas was of much less benefit than getting them to think things through and devise their own solutions.

I also knew, again from personal experience working with them, that giving them data (pencil and paper before eCOVE) helped them honestly reflect on their own actions and outcomes, and it also greatly diminished the fear factor that came with a supervisor's 'evaluator' role.

When I first started working with administrators and eCOVE I was totally focused on changing their role from judge to support and staff development. I preached hard that working collaboratively would have great effects and would/could create a staff of self-directed professionals. I still strongly believe that, and have enough feedback to feel confirmed.

However, a recent conversation with a former student, now an administrator, has added to my perspective. He likes eCOVE and would love to use it, except that his district has a 20-page (gulp!) evaluation system that he must complete while observing, so he doesn't have time to work with teachers. We agree that it's a waste of time and corrupts the opportunity for collaborative professionalism.

As I thought about his situation and the hours of development time that went into the creation and adoption of that 'evaluation guide', I realized that my approach to observation as staff development had ignored the reality of the required and necessary role of administrator as evaluator. The guide that he's stuck with seems to me to be the main flaw in the process, and what I believe is wrong with it (and the thousands in use across the country) is that they ask the observer to make a series of poorly defined judgments based on a vaguely defined set of 'standards'. It's an impossible task and is functionally a terrible and ineffectual burden on both administrators and teachers.

When I thought about how a standards-based system might be improved, I developed the basis for the idea in the PowerPoint: standards should be based on research; the implementation of each standard should be in some way observable, if not directly then through keystone indicators; and the criteria for an acceptable level of performance should be concrete and collaboratively determined. I say collaboratively since I believe that administrators, teachers, parents, and the general public all have value to add to the process of educating our youth. Setting those criterion levels in terms of observable behavior data should, again, be based on research and confirmed by localized action research efforts. That's not as difficult as it sounds when the systematic process already includes data collection.

For the last couple of years, whenever I presented eCOVE I made a big point of saying that I was against set data targets for all teachers, that context played such a big part that only the teacher could interpret the data. I now think I was wrong about that, at least partially. A simple example is wait time: the time between asking a question and calling on a student for an answer. There's a lot of research showing that a wait time of 3 seconds has consistent positive benefits. While I'm sure the exact figure of 3 seconds isn't what's critical, the research-based recommendation is a useful concrete measure. If a teacher waits less than one second (the figure research reports for new teachers), the children are robbed of the opportunity to think, and that's not OK.

An important facet of the process I'm proposing has to do with how the data is presented and used. My experience has been that the first approach to a teacher should be "Is this what you thought was happening?" This question, honestly asked, will empower the teacher and engage him or her in the process of reflection, interpretation, and problem solving. During the ensuing professional discussion, the criterion for an acceptable level, a 3-second wait time, should be included, and that's the measure to be used in the final evaluation. For, in the end, a judgment does have to be made, but it should be based not on the observer's opinion or value system but on set, measurable criteria: criteria set and confirmed by sound research.
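As a rough illustration of how a timer-based criterion like this could be checked against observation data, here is a minimal Python sketch. It assumes the observer has recorded, for each question, the seconds between asking it and calling on a student. The 3-second target and the one-second mark come from the discussion above; the function name, the returned fields, and the sample timings are hypothetical and are not eCOVE's actual output or method.

```python
from statistics import mean

# Research-based target wait time in seconds (from the discussion above).
TARGET_WAIT_SECONDS = 3.0


def summarize_wait_times(wait_times, target=TARGET_WAIT_SECONDS):
    """Summarize observed wait times for one lesson.

    wait_times: seconds between asking each question and calling on a student.
    Returns counts and shares the teacher and observer can discuss together.
    """
    if not wait_times:
        raise ValueError("no questions were recorded")
    met = sum(1 for w in wait_times if w >= target)
    under_one = sum(1 for w in wait_times if w < 1.0)
    return {
        "questions": len(wait_times),
        "average_wait": mean(wait_times),
        "met_target": met,
        "share_meeting_target": met / len(wait_times),
        "under_one_second": under_one,
    }


# Hypothetical timings from one observed lesson (illustrative only).
observed = [0.5, 1.0, 3.0, 4.0, 0.5, 3.5, 2.0, 1.5]
print(summarize_wait_times(observed))
```

The summary is meant as the starting point for the "Is this what you thought was happening?" conversation, with the share meeting the target serving as the agreed-upon measure.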

A more complex example: class learning time. The standard illustrated in the PowerPoint stated that 'students should be engaged in learning', a standard included in most systems. There is extensive research indicating that the more time a student is engaged in learning activities, the greater the learning will be. While the research does not propose a specific percentage of learning time as a recommended criterion, I believe we as a profession can at least identify the ranges for unsatisfactory, satisfactory, and exceptional. I think we'd all agree that if only 25% of a class period were organized for teaching and/or student engagement in learning activities, it would be absolutely unsatisfactory. Or is that number 35%? 45%? 60%? What educator would be comfortable with a class where 40% of the time offered students no opportunity to learn? I don't know what the right number is, but I am confident that it is possible to reach consensus on a minimum level. Class learning time is a good example of a keystone data set, something that underlies the basic concept in the standard 'engaged students'. I know there are others.

But then my personal experience as a teacher comes into focus, and the objection "How can you evaluate me on something I don't have full control over?" pops up. I remember lesson plans not working out when the principal took 10 minutes with a PA announcement and there were 4 interruptions from people with important messages or requests for information or for students. How could it be fair to be concerned about my 50% learning time when there were all these outside influences?

That would be a valid concern where the evaluation system is based on the observer's perception and judgment, but less so when it is based on data collection. It is easy to set up the data collection to identify non-learning time by subcategory: time under the teacher's control and time when an outside event took control away from the teacher. The time under the teacher's control should meet the criteria for acceptable performance; the total time should be examined for systematic changes needed to give the teacher the full allotment of teaching/learning time. Basing the inspection of school functioning on observable behavior data will reveal possible solutions for many problems that currently get folded into the observer's impression of teaching effectiveness.
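To make the subcategory idea concrete, here is a minimal Python sketch. It assumes each observed segment of a class period is logged with its length in minutes and one of three hypothetical labels: learning time, non-learning time under the teacher's control, and outside interruptions. The labels, the 50-minute period, and the segment values are illustrative assumptions, not eCOVE's categories. The sketch reports learning time both as a share of the whole period and as a share of the time the teacher actually controlled.

```python
from collections import defaultdict

# Hypothetical category labels for observed time segments.
LEARNING = "learning"
TEACHER_NONLEARNING = "teacher_nonlearning"    # non-learning time under the teacher's control
OUTSIDE_INTERRUPTION = "outside_interruption"  # PA announcements, visitors, messages, etc.
CATEGORIES = (LEARNING, TEACHER_NONLEARNING, OUTSIDE_INTERRUPTION)


def learning_time_report(segments):
    """segments: list of (category, minutes) tuples covering one class period."""
    totals = defaultdict(float)
    for category, minutes in segments:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        totals[category] += minutes

    period = sum(totals.values())
    if period == 0:
        raise ValueError("no time was recorded")
    # Time the teacher actually controlled excludes outside interruptions.
    teacher_controlled = totals[LEARNING] + totals[TEACHER_NONLEARNING]
    return {
        "period_minutes": period,
        "learning_share_of_period": totals[LEARNING] / period,
        "learning_share_of_teacher_time": (
            totals[LEARNING] / teacher_controlled if teacher_controlled else 0.0
        ),
        "outside_minutes": totals[OUTSIDE_INTERRUPTION],
    }


# Hypothetical 50-minute period: a 10-minute PA announcement plus 4 short interruptions.
observed = [
    (OUTSIDE_INTERRUPTION, 10), (LEARNING, 12), (OUTSIDE_INTERRUPTION, 1),
    (LEARNING, 8), (OUTSIDE_INTERRUPTION, 1), (TEACHER_NONLEARNING, 6),
    (LEARNING, 5), (OUTSIDE_INTERRUPTION, 1), (OUTSIDE_INTERRUPTION, 1),
    (TEACHER_NONLEARNING, 5),
]
print(learning_time_report(observed))
```

In this made-up period, learning time is 50% of the whole class but about 69% of the teacher-controlled time. The teacher-controlled share is what would be held to the agreed criterion, while the outside-interruption minutes point to a school-level change rather than a teacher deficiency.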

It's reasonable to be suspicious of data collected and used as an external weapon, and for that reason I believe it to be critical that the identification of the keystone research and indicators, and the setting of the target level be a collaborative process. Add to that the realization that good research continues to give us new knowledge about teaching and learning, and with that the process should be in a constant state of discussion and revision. That's my vision of how a profession works - critical self-examination and improvement.

So now my thinking has come to a point where I believe (tentatively, at least) that we have sufficient research to develop standards, or to better focus the standards we do have; that we can identify keystone indicators for those standards; that we can use our collective wisdom to determine concrete levels for acceptability in those keystone indicators; that we can train observers to accurately observe and gather data; and that that data can be used to both further the teacher's self-directed professional growth and to ensure that the levels of effective performance as indicated by sound research are met.

I'm hoping that my colleagues in the education field (and beyond) will join in this discussion and thinking. What is your reaction? Can you give me "Yes, but what if.....?" instances? Do we really have the credible research to provide us with keystone indicators? How could a system like this be abused? How could we guard against the abuse?

Observations, standards, teachers, administrators

I'm sure I'll stumble around for a while, but I hope to both consolidate my thoughts and lure others into a discussion of data-based observations, teacher support and evaluation, standards, and how all this can be improved.

My disclaimer: I wrote a piece of software for collecting timer and counter data while observing classrooms. It's called eCOVE Classroom Observation Toolkit, and you can download a trial version at www.ecove.net.

However, that's my failed retirement activity. Here at least I'm more interested in exploring how supporting teacher professional development can be enhanced through objective feedback. Most current administrator observations are a process of impressions and judgments, sometimes based on a district set of standards, sometimes not. This isn't working to the benefit of teachers who want to further their skills, but is generally just a task for both administrator and teacher to get past. That's a shame and I think there is a better way.

The central idea is that if an observation gathers data on teacher and/or student observable behaviors, and that data is presented to the teacher for reflection and interpretation, those intelligent, educated, professional individuals are fully capable of self-directed professional development. The data has to be meaningful and should reflect what teachers want to know about their own classrooms and students.

So please join in, lengthy or brief. Agree or disagree. Post questions or thoughts or solutions.