Healthcare relies heavily on data, which comes from a variety of sources. For example, medical imaging, demographic records, Electronic Health Records and wearable devices all supply data that researchers and medical practitioners use to gain insights about patients. Looking at a patient's data as a whole gives doctors a fuller picture of who the patient is and helps them with their treatment plans and decisions.
Structured data is data that is neatly arranged and separated into predefined categories, much like a spreadsheet whose rows and columns separate the categories. For example, vital signs, medical history and medication records are all structured data. Structured data is easy to read and search, which makes it very useful in patient care and in research.
Unstructured data is data that doesn't fit neatly into a spreadsheet or into predefined categories. Examples include patient stories, radiology reports, pathology reports and clinical notes.
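To make the contrast between structured and unstructured data concrete, here is a minimal Python sketch. The records, field names and note text are all made up for illustration. It shows that structured rows can be filtered directly by field name, while free text has to be searched or parsed.

```python
# Hypothetical structured records: every row has the same named fields.
structured_rows = [
    {"patient_id": "P001", "heart_rate_bpm": 72, "medication": "metformin"},
    {"patient_id": "P002", "heart_rate_bpm": 110, "medication": "lisinopril"},
]

# Hypothetical unstructured record: free text with no fixed fields.
clinical_note = (
    "Patient reports feeling dizzy after meals. "
    "Heart rate was elevated during the visit."
)


def patients_with_high_heart_rate(rows, threshold_bpm=100):
    """Structured data can be queried directly by field name."""
    return [row["patient_id"] for row in rows if row["heart_rate_bpm"] > threshold_bpm]


def note_mentions(note, keyword):
    """Unstructured text must be searched or parsed; this is a naive keyword check."""
    return keyword.lower() in note.lower()


high_hr = patients_with_high_heart_rate(structured_rows)
mentions_dizzy = note_mentions(clinical_note, "dizzy")
```

The keyword check is deliberately simple. Real clinical text usually needs more careful processing, which is part of why unstructured data is harder to work with.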
Big data refers to datasets that are too large, or that arrive too quickly, to handle with ordinary tools. For example, a wearable device produces a continuous stream of data. Because of its volume, big data requires careful data management, robust infrastructure and scalable analysis. Big data is very powerful and helps drive innovations such as personalized medicine.
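One common way to cope with a continuous stream is to summarize it one reading at a time instead of loading everything into memory. The sketch below illustrates that idea in Python. The generator is a hypothetical stand-in for a wearable feed, and its values are synthetic.

```python
from typing import Iterable, Iterator


def simulated_glucose_stream(n_readings: int) -> Iterator[float]:
    """Hypothetical stand-in for a wearable feed; the values are synthetic."""
    for i in range(n_readings):
        yield 90.0 + (i % 5) * 10.0  # cycles through 90, 100, 110, 120, 130


def running_summary(readings: Iterable[float]) -> dict:
    """Summarize a stream reading by reading, keeping only a few numbers in memory."""
    count = 0
    total = 0.0
    highest = float("-inf")
    for value in readings:
        count += 1
        total += value
        highest = max(highest, value)
    if count == 0:
        return {"count": 0, "mean": None, "max": None}
    return {"count": count, "mean": total / count, "max": highest}


summary = running_summary(simulated_glucose_stream(1000))
```

Because the function only tracks a count, a total and a maximum, memory use stays constant no matter how long the stream runs.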
Electronic Health Records (EHRs) contain patient data such as medical history, diagnoses, prescriptions and treatment plans.
Medical imaging data comes from CT scans, MRIs, X-rays and other radiology images.
Patient demographic data is about the patient and their lifestyle. For example, a patient's race, gender, age and socioeconomic status are all things that doctors look at when making treatment plans and decisions.
Wearable technology provides continuous data about a patient. For example, some wearable devices report a patient's blood sugar levels or sleep habits.
Genomic data is data about the patient's genes, and is collected to find diseases and mutations. Personalized medicine relies on genomic data.
Population data describes the health and health outcomes of a population. It is most useful in public health, epidemiology and disease surveillance.
The first type of data storage is file storage. This is the most common type: it stores data in files and then further sorts the files into folders.
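As a small illustration of the files-in-folders idea, the Python sketch below writes a note into a per-patient folder inside a temporary directory. The folder layout, patient ID and file name are hypothetical.

```python
import tempfile
from pathlib import Path


def save_record(root: Path, patient_id: str, filename: str, text: str) -> Path:
    """Store a file under a per-patient folder (hypothetical layout)."""
    folder = root / "patients" / patient_id
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / filename
    path.write_text(text, encoding="utf-8")
    return path


# Use a temporary directory so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    saved = save_record(root, "P001", "visit_notes.txt", "Follow-up in two weeks.")
    relative_path = saved.relative_to(root).as_posix()
    contents = saved.read_text(encoding="utf-8")
```

The file is located by its path, here `patients/P001/visit_notes.txt`, which is what makes file storage familiar and easy to browse.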
The second type is block storage, which organizes data into blocks, each with a unique identifier. Each block can be accessed reliably and efficiently. This is similar to the pages in a book: each page contains different words, and you can turn directly to the page you want in order to read those words.
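The page analogy can be sketched in a few lines of Python. This is a toy model, not how a real block device is implemented: the block size is tiny on purpose and the sample data is invented.

```python
from typing import Dict

BLOCK_SIZE = 8  # bytes per block; tiny on purpose so the example is easy to follow


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Cut data into fixed-size blocks and give each block a numeric identifier."""
    return {
        block_id: data[offset:offset + block_size]
        for block_id, offset in enumerate(range(0, len(data), block_size))
    }


def read_block(blocks: Dict[int, bytes], block_id: int) -> bytes:
    """Fetch a single block by its identifier, like turning to one page of a book."""
    return blocks[block_id]


def reassemble(blocks: Dict[int, bytes]) -> bytes:
    """Join the blocks back together in identifier order."""
    return b"".join(blocks[block_id] for block_id in sorted(blocks))


data = b"blood pressure 120/80"
blocks = split_into_blocks(data)
second_block = read_block(blocks, 1)
```

Reading block 1 does not require touching blocks 0 or 2, which is the property that makes block access efficient.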
The third type is object storage, which is used to store unstructured data. This could be video, audio, sensor data, web pages and more.
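Object stores typically keep each item together with descriptive metadata under a unique identifier, rather than in a folder hierarchy. The Python sketch below is a toy in-memory model of that idea. The class names, metadata keys and sample bytes are hypothetical, and it is not the API of any real object storage service.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """In-memory illustration: a flat mapping from object ID to data plus metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store the bytes with their metadata and return a generated unique ID."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def find(self, **criteria) -> list:
        """Return IDs of objects whose metadata matches every given key/value."""
        return [
            object_id
            for object_id, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ToyObjectStore()
scan_id = store.put(b"<fake image bytes>", content_type="image/png", patient_id="P001")
audio_id = store.put(b"<fake audio bytes>", content_type="audio/wav", patient_id="P002")
matches = store.find(patient_id="P001")
```

The metadata is what makes otherwise opaque items such as images or recordings searchable.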
There are different ways that these storage methods can be implemented, and each comes with different drawbacks and benefits.
One way is on-premise data storage, where data is stored locally at the hospital. Hospitals appreciate this because it gives them more control over their data. They can build their own databases, and they don't need to rely on an internet connection to access the data, which makes on-premise storage less vulnerable to network outages. However, on-premise storage takes up a lot of space and requires maintenance. Physical space is needed for the servers, and maintaining the datasets and cooling the servers are both expensive.
Another type is cloud storage, where data is stored on remote servers run by a third-party provider rather than locally. It is generally less costly than on-premise storage, and it is also more flexible, scalable and efficient. 89% of healthcare organizations use the cloud to store their data. It can be used for back-office applications, backup and disaster recovery, revenue management and patient engagement. However, by using the cloud, hospitals have to rely on another company's products and services, so healthcare professionals have less control over their data storage. They also have concerns about security and about whether the cloud vendors they use comply with HIPAA.
Most organizations use a hybrid storage solution, in which they combine several storage solutions and decide which types of data to keep on premise and which to keep in the cloud. For example, if some data is very private, hospitals might prefer to store it on premise.
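A hybrid setup needs some rule for deciding where each kind of data goes. The Python sketch below shows one possible shape for such a rule. The policy itself, the data type names and the list of sensitive types are assumptions made for illustration, not a recommendation.

```python
from enum import Enum


class Location(Enum):
    ON_PREMISE = "on-premise"
    CLOUD = "cloud"


# Hypothetical policy: data types treated as too private for the cloud.
SENSITIVE_TYPES = {"genomic", "clinical_notes"}


def choose_location(data_type: str, contains_identifiers: bool) -> Location:
    """Keep very private data on premise and send everything else to the cloud."""
    if contains_identifiers or data_type in SENSITIVE_TYPES:
        return Location.ON_PREMISE
    return Location.CLOUD


genomic_location = choose_location("genomic", contains_identifiers=False)
backup_location = choose_location("deidentified_backup", contains_identifiers=False)
```

In practice the rule would come from the organization's own privacy and compliance requirements.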
My project for this week was to read in patient data from a clinical trial and encrypt the data.
Kaggle data set: https://www.kaggle.com/datasets/thedevastator/cancer-patients-and-air-pollution-a-new-link
Encryption Code
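Here is a minimal sketch of a read-then-encrypt step of this kind. It is not necessarily the approach used in the project. It assumes the third-party `cryptography` package is installed and uses its Fernet symmetric encryption. The sample CSV text, column names and values are invented and do not come from the Kaggle dataset.

```python
import csv
import io
import json

from cryptography.fernet import Fernet  # third-party: pip install cryptography

# Hypothetical stand-in for a downloaded CSV; columns and values are invented.
SAMPLE_CSV = "patient_id,age,level\nP1,33,Low\nP2,17,Medium\n"


def encrypt_rows(csv_text: str, fernet: Fernet) -> list:
    """Read CSV rows and encrypt each row as one Fernet token."""
    reader = csv.DictReader(io.StringIO(csv_text))
    tokens = []
    for row in reader:
        plaintext = json.dumps(row).encode("utf-8")
        tokens.append(fernet.encrypt(plaintext))
    return tokens


def decrypt_rows(tokens: list, fernet: Fernet) -> list:
    """Decrypt the tokens back into row dictionaries."""
    return [json.loads(fernet.decrypt(token).decode("utf-8")) for token in tokens]


# The key must be kept secret and stored separately from the encrypted data.
key = Fernet.generate_key()
fernet = Fernet(key)

tokens = encrypt_rows(SAMPLE_CSV, fernet)
restored = decrypt_rows(tokens, fernet)
```

To run this against a real file, the sample string could be replaced with the text of the downloaded CSV. Anyone who holds the key can decrypt the rows, so how the key is stored matters as much as the encryption itself.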