In: Accounting
1. Discuss the high-level conceptual architecture for major elements and relationships in Big Data Solutions.
2. What are the similarities and differences between high-level
conceptual architecture and architecture for data warehousing and
data mining?
3. What information technologies can the computational demand of
Big Data Analytics fulfil?
ANSWER(1)
In the new era of Big Data and Data Sciences, it is vitally important for an enterprise to have a centralized data architecture aligned with business processes, which scales with business growth and evolves with technological advancements. A successful data architecture provides clarity about every aspect of the data, which enables data scientists to work with trustable data efficiently and to solve complex business problems. It also prepares an organization to quickly take advantage of new business opportunities by leveraging emerging technologies and improves operational efficiency by managing complex data and information delivery throughout the enterprise.
When compared with information architecture, system architecture, and software architecture, data architecture is relatively new. The role of Data Architects has also been nebulous and has fallen on the shoulders of senior business analysts, ETL developers, and data scientists. Nonetheless, I will use Data Architect to refer to those data management professionals who design data architecture for an organization.
When talking about architecture, we often think about the analogy with building architecture. A conventional building architect plans, designs, and reviews the construction of a building. The design process involves working with the clients to fully gather the requirements, understanding the legal and environmental constraints of the location, and working with engineers, surveyors and other specialists to ensure the design is realistic and within the budget. The complexity of the job is indeed very similar to the role of a data architect. However, there are a few fundamental differences between the two architect roles:
ANSWER(2)
Data Warehouse Architecture
The basic concept of a Data Warehouse is to facilitate a single version of truth for a company for decision making and forecasting. A Data warehouse is an information system that contains historical and commutative data from single or multiple sources. Data Warehouse concept, simplifies reporting and analysis process of the organization.
Data Mining Architecture
Data mining is a very important process where potentially useful and previously unknown information is extracted from large volumes of data. There are a number of components involved in the data mining process. These components constitute the architecture of a data mining system.
The major components of any data mining system are data source, data warehouse server, data mining engine, pattern evaluation module, graphical user interface and knowledge base.
Difference between Data warehousing and Data Mining
A data warehouse is database system which is designed for analytical analysis instead of transactional work. | Data mining is the process of analyzing data patterns. |
Data is stored periodically. | Data is analyzed regularly. |
Data warehousing is the process of extracting and storing data to allow easier reporting. | Data mining is the use of pattern recognition logic to identify patterns |
Data warehousing is solely carried out by engineers. | Data mining is carried by business users with the help of engineers. |
Data warehousing is the process of pooling all relevant data together. |
Data mining is considered as a process of extracting data from large data sets |
ANSWER(3):-
1. Conceptual Level Data Architecture Design based on Business Process and Operations
In modern IT, business processes are supported and driven by data entities, data flows, and business rules applied to the data. A data architect, therefore, needs to have in-depth business knowledge, including Financial, Marketing, Products, and industry-specific expertise of the business processes, such as Health, Insurance, Manufacturers, and Retailers. He or she can then properly build a data blueprint at the enterprise level by designing the data entities and taxonomies that represent each business domain, as well as the data flow underneath the business process. In particular, the following areas need to be considered and planned at this conceptual stage:
This conceptual level of design consists of the underlying data entities that support each business function. The blueprint is crucial for the successful design and implementation of Enterprise and System architectures and their future expansions or upgrades. In many organizations, this conceptual design is usually embedded in the business analysis driven by the individual project without guidance from the perspective of enterprise end-to-end solutions and standards.
2. Logical Level Data Architecture Design
This level of design is sometimes called data modeling by considering which type of database or data format to use. It connects the business requirements to the underlying technology platforms and systems. However, most organizations have data modeling designed only within a particular database or system, given the siloed role of the data modeler. A successful data architecture should be developed with an integrated approach, by considering the standards applicable to each database or system, and the data flows between these data systems. In particular, the following 5 areas need to be designed in a synergistic way:
The naming conventions and data integrity
The naming conventions for data entities and elements should be applied consistently to each database. Also, the integrity between the data source and its references should be enforced if the same data have to reside in multiple databases. Ultimately, these data elements should belong to a data entity in the conceptual design in the data architecture, which can then be updated or modified synergistically and accurately based on business requirements.
Data archival/retention policies
The data archival and retention policies are often not considered or established until every late-stage on Production, which caused wasted resources, inconsistent data states across different databases, and poor performance of data queries and updates. To enforce the data integrity, data architects should define the data archival and retention policy in the data architecture based on Operational standards.
Privacy and security information
Privacy and security become an essential aspect of the logical database design. While the conceptual design has defined which data component is sensitive information, the logical design should have the confidential information protected in a database with limited access, restricted data replication, particular data type, and secured data flows to protect the information.
Data Replications
Data Replication is a critical aspect to consider for three objectives: 1) High availability; 2) Performance to avoid data transferring over the network; 3) De-coupling to minimize the downstream impact. Excessive data replications, however, can lead to confusion, poor data quality, and poor performance. Any data replication should be examined by data architect and applied with principles and disciplines.
Data Flows and Pipelines
How data flows between different database systems and applications should be clearly defined at this level. Again, this flow is consistent with the flow illustrated in the business process and data architect conceptual level. Besides, the frequencies of the data ingestion, data transformations in the pipelines, and data access patterns against the output data should be considered in an integrated view in the logical design. For example, if an upstream data source comes in real-time, while a downstream system is mainly used for data access of aggregated information with heavy indexes (e.g., expensive for frequent updates and inserts), a data pipeline needs to be designed in between to optimize the performance.
3. Data Governance is the Key to the Continous Success of Data Architecture.
As data architecture reflects and supports the business processes and flow, it is subject to change whenever the business process is changed. As the underlying database system is changed, the data architecture also needs to be adjusted. The data architecture, therefore, is not static but needs to be continuously managed, enhanced, and audited. Data governance, therefore, should be adopted to ensure that enterprise data architecture is designed and implemented correctly as each new project is being kicked off.
Conclusion
Within a successful data architecture, a conceptual design based on the business process is the most crucial ingredient, followed by a logical design that emphasizes consistency, integrity, and efficiency across all the databases and data pipelines. Once the data architecture is established, the organization can see what data resides where and ensure that the data is secured, stored efficiently, and processed accurately. Also, when one database or a component is changed, the data architecture can allow the organization to assess the impact quickly and guides all relevant teams on the designs and implementations. Lastly, the data architecture is a live document of the enterprise systems, which is guaranteed to be up-to-date and gives a clear end-to-end picture. In summary, a holistic data architecture that reflects the end-to-end business process and operations is essential for a company to advance quickly and efficiently while undergoing significant changes such as acquisitions, digital transformation, or migration to the next-gen platform.