After the Capture: The Care and Treatment of Data

Erik van Widenfelt

By Martha Henry

“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay!”
~ Sherlock Holmes

Science depends on data. A large clinical trial like the Botswana Combination Prevention Project (BCPP) depends on lots of data. When the multi-year trial in 30 Botswana villages concludes, researchers hope their data will provide a better understanding of how to prevent HIV infections.

To ensure that the data is as reliable as possible, every bit of BCPP information is captured electronically. Though your doctor’s office may still be transitioning from pencil to keyboard, the BCPP field team uses laptops exclusively. “Going to paper is not an option,” said Erik van Widenfelt, Director of Data Operations and Information Technology at the Botswana Harvard AIDS Institute Partnership (BHP), who designed a system to capture, store, and access the enormous amount of data being generated.

Electronic & Local

In the planning stages, BCPP leaders looked into purchasing an off-the-shelf data management system, but couldn’t find an affordable system that met their needs. Erik took on the task. He recruited six local computer-science graduates to help with the software design. Using open-source code, the team developed a flexible data management system for complex clinical trials.

Research Assistants (RAs) used to gather information on paper forms. That information would be keyed into a computer by a data entry person, usually weeks later at an entirely different location. The process was vulnerable to error. In a pilot study, Erik developed a data management system that allowed RAs to enter information directly into a laptop, eliminating both the paper forms and the errors that accompany them.

The BCPP protocol is complex, with many rules to follow. Most study interviews take place inside people’s homes, not in the controlled environment of a clinic. Babies may be crying, dogs barking, music playing. Chaos or not, RAs are tasked with following a strict protocol. Erik’s team designed an easy-to-use tool that guides RAs through the interview process.

“Our RAs have to be good at dealing with people—talking to them about the importance of knowing their HIV status or participating in the study,” said Erik. “If we find someone who’s good at counseling and dealing with people, we’ll make sure that person—as long as they’ve got enough fingers to hit the keyboard—can operate the system.”

The software itself drives the process. As an RA enters data into a laptop, the system directs next steps. If a participant answers a question one way, required information fields and follow-up questions appear. Answers must be entered before continuing to the next step. The system guides RAs through the process, allowing them to concentrate on the participant and keeping the conversation as comfortable as possible.

Info for Geeks
The BCPP DMS is a suite of Python modules that extend the Django framework for clinical trials research. All code is open-source, updated daily, and available on GitHub. Researchers designing complex clinical trials are welcome to use and customize the code for their studies.
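
For the curious, here is a rough idea of how software can drive an interview, with follow-up questions appearing only when an earlier answer calls for them. This is a minimal sketch in plain Python, not the BCPP code: the field names, the rules, and the `next_required_field` helper are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: field names and rules are hypothetical,
# not taken from the BCPP data management system.

# Each entry pairs a field name with a rule that decides, from the
# answers collected so far, whether that field must be filled in.
FORM = [
    ("ever_tested", lambda a: True),
    ("last_test_result", lambda a: a.get("ever_tested") == "yes"),
    ("on_treatment", lambda a: a.get("last_test_result") == "positive"),
]


def next_required_field(form, answers):
    """Return the first required field that has no answer yet.

    Returns None when every currently required field is answered,
    meaning the interviewer may move on to the next step.
    """
    for name, is_required in form:
        if is_required(answers) and not answers.get(name):
            return name
    return None


# A "yes" opens a follow-up question; a "no" skips it entirely.
print(next_required_field(FORM, {"ever_tested": "yes"}))  # last_test_result
print(next_required_field(FORM, {"ever_tested": "no"}))   # None
```

Keeping the rules as data, separate from the loop that checks them, is one way to let a protocol change without rewriting the interview logic.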

Fast & Accurate

After completing household visits for the day, RAs head back to two trailers parked outside a local clinic. The trailers serve as the field team’s base of operation in a village. When the RA comes within 15 meters of the IT trailer, the laptop automatically connects to the trailer’s secure Wi-Fi network. Data from the laptop then synchronizes to a server in the trailer.

At the mobile lab, the RA hands over the small tubes of blood from study participants. The Lab Assistant, viewing a screen with the data just downloaded from the RA’s laptop, reconciles the tubes with the data and prints a barcode label for each tube.
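
For readers who want to picture the reconciliation step, it amounts to comparing two lists: the specimens the synchronized data says were collected, and the tubes actually handed over. The following is a minimal sketch under that assumption, not the BCPP implementation; the `reconcile` function and the tube identifiers are hypothetical.

```python
# Illustrative sketch only: identifiers and function name are hypothetical.

def reconcile(expected_ids, received_ids):
    """Compare tube IDs recorded in the data with tubes physically received."""
    expected, received = set(expected_ids), set(received_ids)
    return {
        "matched": sorted(expected & received),     # in the data and on the bench
        "missing": sorted(expected - received),     # recorded but not handed over
        "unexpected": sorted(received - expected),  # handed over but not recorded
    }


result = reconcile(["T001", "T002", "T003"], ["T002", "T003", "T009"])
print(result["matched"])     # ['T002', 'T003']
print(result["missing"])     # ['T001']
print(result["unexpected"])  # ['T009']
```

Only the matched tubes would go on to receive a barcode label; anything missing or unexpected would be flagged for a person to resolve.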

The lab is equipped with a centrifuge to separate whole blood into buffy coat and plasma. Erik’s system indicates which tubes need processing and generates a packing list for loading tubes into a cool box. Several times a week, a driver picks up the cool box and delivers it to BHP headquarters in Gaborone.

In the middle of the night, when RAs are hopefully sound asleep, data on the server in the IT trailer is transmitted to a server at BHP headquarters. The next morning, Drs. Unoda Chakalisa and Etienne Kadima, the BCPP Study Coordinators in Gaborone, review the data to see what’s going on in the field and make any necessary adjustments.

Blood & Data Flow

When the driver delivers the cool box to BHP, samples are again reconciled. HIV+ samples are tested for viral load. Most samples are stored in BHP freezers. Some are sent to the Essex Lab in Boston for genetic analysis.

The time difference between Gaborone and Boston is six hours. When she arrives at her Harvard office, Nealia Khan, the BCPP Data Manager, accesses new data on the BHP server in Gaborone. Data is encrypted at this and every other step of the process. Nealia cleans the data and creates reports for conference calls.

“That means that when we have weekly calls, we have up-to-date information,” said Erik. “On a Wednesday, we have Tuesday night information. That’s how good it is.”

The BCPP data management system demonstrates that high-quality research data can be collected quickly across a large geographic area. In the past, it took months or years for a study of this magnitude to have data ready for publication. The BCPP does it in days. “As we shift from paper-based to electronic collection, that interval just keeps shrinking,” said Erik.

“It’s been extremely important to modernize data collection, efficiency, and quality,” said Max Essex, Senior Principal Investigator of the BCPP and Chair of the Harvard AIDS Initiative and the Botswana Harvard Partnership.

Title photo by Dominic Chavez