Data Science and Applied Technologies (RC4) Facilities

The Data Science and Applied Technology Core has 800 sq ft of dedicated research space.  This space is outfitted with 8 workstations with large monitors (>20 inches).  It also has an 8 x 4 ft conference table that serves as a collaboration space, and an office is adjacent to the space where clinical assessments are performed. The space is designed for programmers and trainees to analyze data.  It is situated directly within the office and work areas of clinical research staff, including study coordinators, student assistants, and collaborating investigators.

DSAT Computing and software. Computers have software critical for inter-investigator communication and collaborative writing, including Microsoft Office, WinEdt for LaTeX editing, Adobe Acrobat Professional, Adobe Creative Suite (e.g., Photoshop), Skype, and FaceTime (iPad).  They also have a large suite of software for analysis and visualization, including Stata, SAS, SPSS, Stat/Transfer, MATLAB, LabVIEW 16.0, R v3.4, Java, and Enthought Canopy (Python). Data are visualized using GraphPad Prism 5.0, Tableau, Visual Studio, Adobe Photoshop, Adobe Illustrator, and custom visualization packages in R.

The Core has full access to the Department of Computer and Information Science and Engineering (CISE) computer cluster, which consists of a head node with dual Opterons, 16GB of memory, and 3.5TB of storage, and 20 worker nodes with dual Opterons and 32GB of memory running Linux (Ubuntu Server 10.04). These will be used for prototype software development. All graduate students have access to a workstation that can be used to access this cluster. All faculty offices are equipped with a Windows or Linux workstation with standard software installations. Wireless access is available throughout the CSE Building and across campus.

DSAT Wearable sensors. The Core houses several types of wearable sensors for temporal measurement of movement patterns, community mobility via global positioning systems, and bio-sensing (e.g., heart rate, galvanic skin response, skin temperature). The Core has 40 ActiGraph GT1M, GT3X, and LINK accelerometers (ActiGraph Inc., Pensacola, FL). The monitors are small (3.8 x 3.7 x 1.8 cm), lightweight (27 g), and include uniaxial and triaxial accelerometers. The accelerometers measure accelerations in the range of 0.05-2 G with a band-limited frequency of 0.25-2.5 Hz. The monitors are initialized and data are downloaded with the ActiLife software (Version 3.3.0). The Core also uses multi-sensor technology through a portable armband (HealthWear BodyMedia, Pittsburgh, PA). The SenseWear armband uses a dual-axis accelerometer, a heat flux sensor, a galvanic skin response sensor, a skin temperature sensor, and a near-body ambient temperature sensor to capture data. Energy expenditure estimates from multi-sensor technologies are comparable to those measured with doubly labeled water.  The Core possesses 4 Empatica E4 wrist-worn multi-sensor wristbands.  The E4 measures blood volume pulse through a photoplethysmography sensor, from which heart rate, heart rate variability (HRV), and other cardiovascular features are derived. An electrodermal activity sensor measures sympathetic nervous system arousal and derives features related to stress, engagement, and excitement. The E4 also has a tri-axial accelerometer, an event mark button, and an infrared thermopile for peripheral skin temperature.  Lastly, the Core owns 15 Samsung Gear S smartwatches equipped with customized software for programming “apps”.  Programming is done in the Tizen and Android operating systems. The applications loaded on the Gear S device run in a WebKit-based browser environment. Tizen provides API libraries to interface with its sensors as well as other system-level functionality and notifications.

The UF Health Integrated Data Repository (IDR) and i2b2. The IDR was created to serve as a common source of information to be used by clinicians, executives, researchers, and educators. The IDR enables new research discoveries, as well as patient care quality and safety improvements, through a continuous cycle of information flows between our clinical enterprise and research community. In its simplest form, a data repository is a collection of disparate data organized in a manner that lends itself to understanding relationships between data elements to answer questions.

The UF Health IDR currently consists of a Clinical Data Warehouse (CDW) that aggregates data from the various clinical and administrative information systems, including the Epic EMR. The CDW contains demographics, inpatient and outpatient clinical encounter data, diagnoses, procedures, lab results, medications, select nursing assessments, co-morbidity measures, and select perioperative anesthesia information system data. The CDW contains “Fully Identified Data”, is fundamental to institutional business processes, and is secured per UF & Shands policies. The UF Health IDR Team is a multi-disciplinary group composed of members from Shands Decision and Support Services (DSS), AHC IT, AHC faculty, IRB, and the Clinical and Translational Science Institute (CTSI). Access to IDR data is provided through the NIH-funded i2b2 tool, which gives researchers access to a HIPAA-compliant and IRB-approved “Limited Data Set.” Faculty researchers can query the i2b2 Limited Data Set to identify cohort counts as they prepare grant proposals, plan clinical trials, and write IRB protocols.

RC4 Data repository.   A data repository of de-identified data has been created for investigators to address age-related questions.  A brief description of data available in the repository is listed in Table RC4.

Department of Biomedical Engineering resources are directly available to RC4.  Dr. Rashidi will contribute resources available in her laboratory, the intelligent HEAlth Lab (i-HEAL). The i-HEAL lab is located in the New Engineering Building (NEB). It includes desk space for up to 10 students. It contains a Dell Precision T5610 server with 64GB of memory and dual Intel® Xeon processors, and five networked workstations, each equipped with four microprocessors. The software licensed to Dr. Rashidi’s lab for advanced data analysis and programming includes MATLAB, Visual Studio, Enthought Canopy (Python), WinEdt for LaTeX editing, and the Microsoft suite for text and graphics processing.

UF Research Computing.  Researchers in this core will also have access to the resources of the HPC Center at the University of Florida. These resources will be used for our scaling experiments. The HPC Center runs several clusters with about 23,000 cores in multi-core servers. Further details on UF Research Computing can be found in the section below on High-Performance Computing Center (HPC).

Computing resources in the Department of Computer and Information Science and Engineering (CISE).  RC4 has access to and utilizes the resources in CISE under the supervision of Dr. Ranka.  The CISE data science cluster has twelve subsystems (8 AMD-based and 4 Intel-based systems) connected with InfiniBand (40Gbit/s) and Gigabit Ethernet. Each AMD subsystem has 64 2.3 GHz cores, 512 GB of main memory, and 24 4TB hard drives. Each Intel-based subsystem has 16 2.1 GHz cores, 128 GB of main memory, and 10 2TB hard drives. The total capacity of the cluster is 576 cores, 4.6TB of main memory, and 848 TB of disk space with 30GB/s cross-network bandwidth. The data science cluster also has a number of GPGPUs/video cards: 2 NVIDIA Tesla K20, 5GB RAM each; 2 NVIDIA Tesla K40, 12GB RAM each; 1 ATI Sapphire, 16GB RAM; 3 Intel Phi 3120a, 6GB RAM each; 1 Intel Phi 5120; 2 NVIDIA GTX 790, 3GB RAM each. These are connected to a subset of the Intel subsystems. This system will be used for conducting the big data research described in this proposal.
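The aggregate figures quoted for the CISE data science cluster can be checked from the per-subsystem numbers given above. The following is a minimal sketch of that arithmetic; the dictionary layout and the function name cluster_totals are illustrative assumptions, and the memory conversion assumes 1 TB = 1000 GB.

```python
# Per-subsystem figures as stated in the text (layout of these dicts is illustrative).
AMD = {"count": 8, "cores": 64, "mem_gb": 512, "disks": 24, "disk_tb": 4}
INTEL = {"count": 4, "cores": 16, "mem_gb": 128, "disks": 10, "disk_tb": 2}


def cluster_totals(*groups):
    """Sum cores, memory (GB), and raw disk (TB) across subsystem groups."""
    cores = sum(g["count"] * g["cores"] for g in groups)
    mem_gb = sum(g["count"] * g["mem_gb"] for g in groups)
    disk_tb = sum(g["count"] * g["disks"] * g["disk_tb"] for g in groups)
    return cores, mem_gb, disk_tb


cores, mem_gb, disk_tb = cluster_totals(AMD, INTEL)
# Assumes decimal units (1 TB = 1000 GB) when reporting memory in TB.
print(f"{cores} cores, {mem_gb / 1000:.1f} TB memory, {disk_tb} TB disk")
```

Under these assumptions the totals agree with the stated 576 cores, roughly 4.6 TB of main memory, and 848 TB of disk space.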

Florida cyberinfrastructure. Eleven universities in the state of Florida joined forces in the Sunshine State Education & Research Computing Alliance (SSERCA) to build a robust cyberinfrastructure to share expertise and resources. The current members are Florida Atlantic University (FAU), Florida International University (FIU), Florida State University (FSU), University of Central Florida (UCF), University of Florida (UF), University of Miami (UM), and University of South Florida (USF). The affiliate institutions are Florida Agricultural and Mechanical University (FAMU), University of North Florida (UNF), and University of West Florida (UWF).  The Florida Lambda Rail (FLR) provides the underlying fiber-optic network and network connectivity between these institutions and many others. The FLR backbone will complete the upgrade to 100 Gbps by June 2015. The University of Florida is connected to this backbone at the full speed of 100 Gbps and has been connected at that rate to the Internet2 backbone since January 2013.

Artificial Intelligence Center. In June 2020, the University of Florida announced a new partnership with NVIDIA to create an AI Center, an AI-centric data center that houses the world’s fastest AI supercomputer in higher education.  This partnership includes commitments of $25 million from a UF alumnus donor, $25 million in hardware, software, training, and services from NVIDIA, and a $20 million investment from UF.  This resource will give faculty and students within and beyond UF the tools to apply AI across a multitude of areas to improve lives, bolster industry, and create economic growth across the state.

OneFlorida Clinical Data Research Network.  RC4 will actively utilize the resources provided by the OneFlorida Data Trust.  Following an investment of $100 million, in 2011 UF Health opened a new electronic medical record system and a clinical data warehouse that was the foundation for the development of an integrated data repository.  Over the past 4 years, the IDR system expanded to the OneFlorida Network, a statewide Clinical Data Research Network (CDRN) that will join PCORnet to optimize opportunities for conducting comparative effectiveness research (CER).  In 2012, OneFlorida cared for 7,506,370 unique patients, or 39% of all Floridians, through a network of 22 hospitals, 416 practices, and 3,250 physician providers. The centerpiece of the OneFlorida CDRN is the OneFlorida Data Trust, a secure, de-identified data repository in which UF Health, Orlando Health, Florida Medicaid/CHIP, and the Florida Department of Health currently participate. To date, the OneFlorida Data Trust houses data on 5 million patients, including demographic information, diagnoses, procedures, lab results, personalized medicine genotyping data, health care visit details, nurse assessments, bio-specimen availability, and vital statistics records.