CASE STUDY:
A large Transformational Medical Technologies and Services Organization.
Business Problem:
The client offered a hosted Electronic Medical Record (EMR) solution and wanted to build a portal allowing healthcare and pharmaceutical companies to access the data for medical care quality measures and research needs.
Key Issues:
• How do we collect data from over 100 different physical sites into a central location for warehousing?
• How do we combine EMR data from numerous organizations with different data collection standards?
• What is the best method to properly cleanse the data such that personal identifying information is removed?
* Electronic Medical Records (EMR)
EMR stands for Electronic Medical Record, the digital equivalent of the paper records, or charts, at a clinician's office. EMRs typically contain general information about a patient, such as treatment and medical history, as it is collected by the individual medical practice.
* How do we collect data from different physical sites into a central location for warehousing?
Health care involves a diverse set of public and private data collection systems, including health surveys, administrative enrollment and billing records, and medical records, used by various entities, including hospitals, community health centers (CHCs), physicians, and health plans. Data on race, ethnicity, and language are collected, to some extent, by all of these entities, suggesting that each has the potential to contribute information on patients or enrollees.
Medical data is sensitive by nature and cannot be handled carelessly, so data collection must be carried out with the utmost care.
Developing an online portal will play a key role in gathering data from multiple locations quickly and accurately. A dedicated server then needs to be set up for the portal; it will act as the warehouse where all collected data is stored.
Each healthcare institution and pharmaceutical company will be given a login ID through which it can upload data; the uploaded data will automatically be transferred to and stored on the central warehouse server, which can be accessed from anywhere.
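As a rough sketch of how such a portal upload endpoint might look (the framework choice, the token-per-site login scheme, and the storage paths are all assumptions for illustration, not the client's actual design):

```python
# Minimal sketch of the portal's upload endpoint, assuming Flask and a
# token-per-site login scheme; all names and paths here are illustrative.
from pathlib import Path

from flask import Flask, abort, request
from werkzeug.utils import secure_filename

app = Flask(__name__)

# Hypothetical mapping of login IDs to site names; a real deployment
# would use a proper identity provider and credential store.
SITE_TOKENS = {"token-clinic-001": "clinic_001", "token-pharma-042": "pharma_042"}
WAREHOUSE_DIR = Path("warehouse/incoming")  # assumed storage location

@app.route("/upload", methods=["POST"])
def upload():
    site = SITE_TOKENS.get(request.headers.get("X-Auth-Token", ""))
    if site is None:
        abort(401)  # unknown login ID
    uploaded = request.files.get("file")
    if uploaded is None or uploaded.filename == "":
        abort(400)  # no file attached to the request
    # Store each site's uploads under its own directory in the warehouse.
    target_dir = WAREHOUSE_DIR / site
    target_dir.mkdir(parents=True, exist_ok=True)
    uploaded.save(target_dir / secure_filename(uploaded.filename))
    return {"status": "stored", "site": site}, 201

if __name__ == "__main__":
    app.run()
```

In practice each site would upload over HTTPS with its own credentials, so data from over 100 physical sites lands automatically in one central warehouse.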
* How do we combine EMR data from numerous organizations with different data collection standards?
The unstructured data of an EMR are present in clinical notes, surgical records, discharge records, radiology reports, and pathology reports. Clinical notes are free-text documents written by the doctors, nurses, and staff providing care to a patient, and they offer increased detail beyond what may be inferred from a patient's diagnosis codes. The information contained in clinical notes may concern a patient's medical history (diseases, interventions, and so on), family history of diseases, environmental exposures, and lifestyle data. Applying an automatic way of interpreting these clinical notes and records is therefore of the utmost importance.
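As a toy illustration of pulling structure out of free-text notes (the note text and the regex patterns below are invented for illustration; real systems use full clinical NLP pipelines rather than hand-written patterns):

```python
import re

# Invented example note; real clinical notes are far less regular.
note = "Pt is a 64 y/o male. Hx: type 2 diabetes, hypertension. Current meds: metformin 500mg."

# Simple regex patterns for a few fields; a sketch only, not a
# substitute for a proper clinical NLP pipeline.
age = re.search(r"(\d+)\s*y/o", note)
history = re.search(r"Hx:\s*([^.]+)\.", note)
meds = re.search(r"meds:\s*([^.]+)\.", note)

record = {
    "age": int(age.group(1)) if age else None,
    "history": [h.strip() for h in history.group(1).split(",")] if history else [],
    "medications": meds.group(1).strip() if meds else None,
}
print(record)  # {'age': 64, 'history': ['type 2 diabetes', 'hypertension'], ...}
```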
We propose to model the data preprocessing phase as an extract-transform-load (ETL) process. ETL derives from data warehousing and covers how data are loaded from the source system into the data warehouse. A typical data preprocessing phase is thus composed of three steps: (i) extract available data (extract), (ii) transform and clean data (transform), and (iii) store the output data in an output repository (load). These phases are preceded by the selection of the data source, typically a set of files or a database. A more detailed look at each of the three phases follows:
The extract process is responsible for extracting data from the source system or database and making it accessible for further processing. In health care, this process needs to deal with data privacy, and most extract processes have an anonymization step associated with them. At this point, the researcher decides which data make sense to use.
The transform process applies a set of rules to transform the data from the source to the target; semantics-based approaches have also been proposed for this phase. It can be complex, as different dimensionalities have to be taken into account, and it needs to ensure that all variables are in the same units so that they can later be joined and a cleaning process conducted. The transformation step may also require joining data from several sources, generating aggregations, creating surrogate keys, sorting, deriving newly calculated values, and applying advanced validation rules.
The load process merges all data into a target database (output data).
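A minimal end-to-end sketch of these three phases (the file name, column names, and unit rule are assumptions for illustration, and hashing the patient ID stands in for a fuller de-identification step):

```python
import csv
import hashlib
import sqlite3

def extract(path):
    """Extract: read raw rows from a source CSV export (assumed layout)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: anonymize identifiers and harmonize units before loading."""
    out = []
    for row in rows:
        weight = float(row["weight"])
        if row["weight_unit"] == "lb":      # assumed unit convention
            weight *= 0.453592              # convert pounds to kilograms
        out.append({
            # One-way hash replaces the direct identifier (a stand-in
            # for a fuller anonymization process).
            "patient_key": hashlib.sha256(row["patient_id"].encode()).hexdigest(),
            "weight_kg": round(weight, 2),
            "diagnosis": row["diagnosis"].strip().lower(),
        })
    return out

def load(rows, db_path="warehouse.db"):
    """Load: merge the transformed rows into the target database."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS emr (patient_key TEXT, weight_kg REAL, diagnosis TEXT)")
    con.executemany("INSERT INTO emr VALUES (:patient_key, :weight_kg, :diagnosis)", rows)
    con.commit()
    con.close()

load(transform(extract("site_export.csv")))
```

Because each source organization gets its own transform rules mapped onto one target schema, data collected under different standards can be merged in the load step.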
* Best method to properly cleanse the data
Screening involves systematically looking for suspect features in assessment questionnaires, databases, or analysis datasets. The diagnosis (identifying the nature of the defective data) and treatment (deleting, editing, or leaving the data as it is) phases of data cleaning require an in-depth understanding of all the types and sources of errors possible during the data collection and entry processes. Documenting changes entails leaving an audit trail of errors detected, alterations, additions, and error checking, and allows a return to the original value if required.
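A small sketch of this screen-diagnose-treat-document loop using pandas (the file, column name, and valid range are assumed for illustration):

```python
import pandas as pd

df = pd.read_csv("assessment.csv")  # assumed analysis dataset
audit_log = []                      # audit trail of every change made

# Screening: systematically flag suspect values (an assumed valid range).
suspect = df[(df["age"] < 0) | (df["age"] > 120)]

# Diagnosis and treatment: here the defective values are set to missing;
# the audit trail records each original value so it can be restored.
for idx, row in suspect.iterrows():
    audit_log.append({"row": idx, "column": "age",
                      "old": row["age"], "new": None,
                      "reason": "age outside 0-120"})
    df.loc[idx, "age"] = float("nan")

# Documenting changes: persist the audit trail alongside the clean data.
pd.DataFrame(audit_log).to_csv("change_log.csv", index=False)
df.to_csv("assessment_clean.csv", index=False)
```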
OpenRefine (formerly Google Refine) and LODRefine are powerful tools for working with messy data, cleaning it, or transforming it from one format into another. Videos and tutorials are available for learning the different functionalities offered by this software. The facets function is particularly useful, as it can very quickly give a feel for the range of variation contained within a dataset.
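OpenRefine's facets are a GUI feature, but an analogous quick overview can be had in code; a sketch with pandas (the file and column name are assumed):

```python
import pandas as pd

df = pd.read_csv("emr_export.csv")  # assumed messy export

# A text facet in OpenRefine lists each distinct value with its count;
# value_counts() gives the same feel for the variation in a column.
print(df["diagnosis"].value_counts(dropna=False))
# Near-duplicate spellings (e.g. "Diabetes" vs "diabetes ") show up as
# separate rows here, just as they would as separate facet entries.
```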
If the data are cleaned by more than one person, the final step is to merge all the spreadsheets together so that there is only one database. The comments or change logs made as the cleaning progresses should be compiled into one document, and problem data should be discussed in the documentation file. Update the cleaning procedures, change log, and data documentation file as the cleaning progresses. Provide feedback to enumerators, team leaders, or data entry operators if the data collection and entry process is still ongoing; if the same mistakes are repeatedly made by one team or enumerator, make sure to inform them. Data cleaning is a continuous process: some problems cannot be identified until the analysis has begun, errors are discovered as analysts manipulate the data, and several cleaning stages are generally required as inconsistencies are discovered. In rapid assessments, it is very common for errors to be detected even during the peer review process.
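A sketch of that final merge step, assuming each cleaner produced a spreadsheet and a change log with the same layout (the file naming pattern is illustrative):

```python
import glob

import pandas as pd

# Merge every cleaner's spreadsheet into a single database table.
cleaned = pd.concat(
    (pd.read_csv(path) for path in sorted(glob.glob("cleaned_*.csv"))),
    ignore_index=True,
)
cleaned.to_csv("final_database.csv", index=False)

# Compile the individual change logs into one documentation file.
logs = pd.concat(
    (pd.read_csv(path) for path in sorted(glob.glob("change_log_*.csv"))),
    ignore_index=True,
)
logs.to_csv("combined_change_log.csv", index=False)
```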