All About Distributed Databases
What Is a Distributed Database?
A distributed database is a database system that runs on several computers, or
nodes, linked together by a network. Each node can store a portion of the data,
and the data stored on all nodes together makes up the entire database.
The system aims to keep the data consistent and accessible to users despite
network outages or other system problems. Data is both stored and processed in
a distributed way. The main objectives of a distributed database are to give
applications that need access to vast amounts of data high availability,
scalability, and performance.
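The idea that each node holds a portion of the data, and that all portions together form the whole database, can be sketched in a few lines. This is a minimal illustration, not how any particular product works: the `ShardedStore` class, its method names, and the choice of hashing the key to pick a node are all assumptions made for the example.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads records across in-memory 'nodes'."""

    def __init__(self, node_count: int) -> None:
        # Each dict stands in for the local storage of one node.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key: str) -> int:
        # Use a stable hash (built-in hash() is salted per process for str)
        # so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key: str, value: str) -> None:
        self.nodes[self._node_for(key)][key] = value

    def get(self, key: str) -> str:
        return self.nodes[self._node_for(key)][key]

    def all_records(self) -> dict:
        # The full database is the union of what every node holds.
        merged = {}
        for node in self.nodes:
            merged.update(node)
        return merged


store = ShardedStore(node_count=3)
records = {"user:1": "Asha", "user:2": "Bram", "user:3": "Chen", "user:4": "Dana"}
for key, value in records.items():
    store.put(key, value)
```

A caller uses `store.get("user:3")` without knowing which node holds the record, and `store.all_records()` returns the same four records that were written.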
Need for a Distributed Database in a DBMS
· An organised collection of information is called a database. In a database,
the data can be conveniently accessed, managed, modified, updated, controlled,
and organised.
· The two main categories of databases are distributed and centralised
databases. The question at hand is why we need a distributed database in a DBMS
at all. For the time being, let's imagine that we only have centralised
databases.
· All of the data goes into a single database, which can grow to the point
where querying even a single entry takes a long time.
· Since we only have one database, once a fault happens, we are no longer able
to fulfil user requests.
· Even if we wanted to, we could not scale beyond what a single machine can
handle, and availability is reduced as well, which lowers throughput.
· Distributed databases address throughput, latency, scalability,
availability, fault tolerance, and many other difficulties that arise when
using a single machine and a single database. This is why we need distributed
databases. Let's go over them in more depth.
Distributed Database Types
There are two types of distributed databases:
- Homogeneous
- Heterogeneous
1. Homogeneous
A homogeneous distributed database is one in which every site runs the same
operating system and the same database management system (DBMS) software. Data
is partitioned and distributed across the sites in a way that preserves
consistency and availability, and each site has a copy of the same database
schema.
Homogeneous distributed databases are commonly used in large-scale enterprise
applications where data needs to be shared across multiple locations, such as
in global supply chain management, financial services, and telecommunications.
2. Heterogeneous
In a heterogeneous distributed database, the system's nodes use different
database management systems (DBMS) or data models. This means the data is
stored in various formats and accessed via several languages or protocols.
For example, one node in the system may use a relational DBMS, while another
node uses a NoSQL DBMS. Each node may also use a different data model, such as
a hierarchical, network, or object-oriented data model. Because the data must
be translated between various formats and languages, managing a heterogeneous
distributed database can be more difficult than managing a homogeneous one.
Data security, performance, and consistency problems may result from this.
Therefore, to manage a heterogeneous distributed database effectively,
specialised knowledge and tools are frequently needed.
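The translation step described above can be sketched with two toy nodes. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the relational node, a dictionary of JSON strings stands in for the NoSQL node, and the table name, field names, and `fetch_*` helpers are hypothetical.

```python
import json
import sqlite3

# Node A: a relational store (in-memory SQLite), queried with SQL.
relational_node = sqlite3.connect(":memory:")
relational_node.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
relational_node.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")

# Node B: a document-style store, modelled here as JSON strings keyed by id.
document_node = {"2": json.dumps({"customer_id": 2, "full_name": "Bram"})}


def fetch_from_relational(customer_id: int):
    row = relational_node.execute(
        "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    # Translate the row tuple into the shared format.
    return {"id": row[0], "name": row[1]} if row else None


def fetch_from_document(customer_id: int):
    raw = document_node.get(str(customer_id))
    if raw is None:
        return None
    doc = json.loads(raw)
    # Translate the document's field names into the shared format.
    return {"id": doc["customer_id"], "name": doc["full_name"]}


def fetch_customer(customer_id: int):
    """Ask each node in turn and return the first match in the shared format."""
    for fetch in (fetch_from_relational, fetch_from_document):
        record = fetch(customer_id)
        if record is not None:
            return record
    return None
```

Each node needs its own translation function, which is where the extra management effort comes from: every new DBMS or data model added to the system means another adapter to write and keep correct.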
Distributed Database Storage
Distributed database storage is managed in two ways:
- Replication
- Fragmentation
1. Replication
As the name implies, the system maintains copies of the data at several
locations. A database is fully redundant if every single record is available at
multiple locations.
Data replication has the benefit of making the data available at more sites.
Because copies exist at several locations, queries can be handled in parallel.
Data replication does, however, have significant drawbacks. Data must be kept
up to date at every site: if any site fails to update and synchronise its data
with the other sites, the database becomes inconsistent. Constant updating also
makes concurrency control more difficult and adds server overhead. So while
data availability benefits greatly from replication, that benefit comes at the
cost of extra synchronisation work.
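Both sides of replication, better availability and the risk of inconsistency, show up in a small simulation. This is a minimal sketch, not a real replication protocol: the `Site` and `ReplicatedStore` classes and the `online` flag are invented for the example, and a real system would also need a way to bring a returning site back in sync.

```python
class Site:
    """One location holding a full copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}
        self.online = True


class ReplicatedStore:
    def __init__(self, sites) -> None:
        self.sites = list(sites)

    def write(self, key, value) -> list:
        """Apply the write to every online site; return names of sites that missed it."""
        missed = []
        for site in self.sites:
            if site.online:
                site.data[key] = value
            else:
                missed.append(site.name)
        return missed

    def read(self, key):
        """Serve the read from the first online site that has the key."""
        for site in self.sites:
            if site.online and key in site.data:
                return site.data[key]
        raise KeyError(key)


a, b, c = Site("A"), Site("B"), Site("C")
store = ReplicatedStore([a, b, c])
store.write("stock:42", 10)

# Benefit: site A goes down, but the record is still readable elsewhere.
a.online = False
value_while_a_down = store.read("stock:42")

# Drawback: A misses this update, so its copy is stale when it comes back.
missed = store.write("stock:42", 7)
a.online = True
stale_value = store.read("stock:42")  # served by A, which never saw the update
```

After the last line, site A still holds the old value while sites B and C hold the new one, which is exactly the inconsistency the text warns about.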
2. Fragmentation
With fragmentation, the relations are divided into smaller portions called
fragments. Each fragment is kept at a different location, depending on where it
is needed.
A requirement for fragmentation is that the fragments can later be reassembled
into the original relation without losing data.
Fragmentation avoids data inconsistency, since there are no duplicates of the
data.
Two different types of fragmentation exist:
- Horizontal fragmentation - The rows (tuples) of the relation are divided into
groups, and each row is assigned to a fragment.
- Vertical fragmentation - The relation schema is divided into smaller schemas,
each of which contains a common candidate key to ensure a lossless join.
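Splitting a relation by rows (horizontal fragmentation) or by columns (vertical fragmentation), and then reassembling it without loss, can be sketched as follows. The `employees` relation, its column names, and the helper functions are made up for illustration; `emp_id` plays the role of the common candidate key.

```python
# A relation as a list of rows; "emp_id" is the candidate key.
employees = [
    {"emp_id": 1, "name": "Asha", "city": "Pune", "salary": 50000},
    {"emp_id": 2, "name": "Bram", "city": "Delft", "salary": 62000},
    {"emp_id": 3, "name": "Chen", "city": "Pune", "salary": 58000},
]


def horizontal_fragments(rows, column):
    """Split the rows into fragments, one per distinct value of `column`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragments(rows, key, column_groups):
    """Split the columns into fragments; every fragment keeps the key column."""
    return [
        [{col: row[col] for col in (key, *group)} for row in rows]
        for group in column_groups
    ]


def rebuild_horizontal(fragments, key):
    """Reassemble by union: collect the rows of all fragments."""
    rows = [row for fragment in fragments.values() for row in fragment]
    return sorted(rows, key=lambda row: row[key])


def rebuild_vertical(fragments, key):
    """Reassemble by joining the fragments on the shared key."""
    joined = {}
    for fragment in fragments:
        for row in fragment:
            joined.setdefault(row[key], {}).update(row)
    return sorted(joined.values(), key=lambda row: row[key])


by_city = horizontal_fragments(employees, "city")
by_columns = vertical_fragments(employees, "emp_id", [("name", "city"), ("salary",)])
```

Both `rebuild_horizontal(by_city, "emp_id")` and `rebuild_vertical(by_columns, "emp_id")` return the original relation. The vertical case only works because every fragment carries `emp_id`; without the shared key there would be no way to match a salary to a name.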
Distributed Database Features
Some general features of distributed databases are:
- Location independence - Data is physically stored at multiple sites, each
managed by its own DDBMS, and users can access it without knowing where it is
stored.
- Distributed query processing - Distributed databases answer
queries in a distributed environment that manages data at multiple sites.
High-level queries are transformed into a query execution plan for simpler
management.
- Distributed transaction management - Keeps the distributed database
consistent through commit protocols, distributed concurrency control
techniques, and distributed recovery methods, even when many transactions
run at once or failures occur.
- Seamless integration - Databases in a collection
usually represent a single logical database, and they are interconnected.
- Network linking - All databases in a
collection are linked by a network and communicate with each other.
- Transaction processing - Distributed databases incorporate transaction
processing. A transaction is a program unit made up of one or more database
operations, and it is atomic: it is either executed entirely or not at all.
- Vertical fragmentation - The relation schema is
fragmented into smaller schemas, and each fragment contains a common
candidate key to guarantee a lossless join.
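The "entirely or not at all" property of a transaction that spans several sites is typically enforced by a commit protocol. The sketch below uses two-phase commit, one well-known protocol of that kind, chosen here as an illustration since the text does not name a specific one. The `Participant` class and `run_transaction` function are hypothetical, and the sketch leaves out logging, timeouts, and coordinator failure, all of which a real implementation must handle.

```python
class Participant:
    """One site taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.committed = {}
        self.pending = None

    def prepare(self, changes: dict) -> bool:
        # Phase 1: stage the changes and vote yes, or vote no.
        if not self.can_commit:
            return False
        self.pending = dict(changes)
        return True

    def commit(self) -> None:
        # Phase 2 (every site voted yes): make the staged changes permanent.
        self.committed.update(self.pending or {})
        self.pending = None

    def rollback(self) -> None:
        # Phase 2 (some site voted no): discard the staged changes.
        self.pending = None


def run_transaction(participants, changes: dict) -> bool:
    """Commit `changes` on every participant, or on none of them."""
    votes = [p.prepare(changes) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


healthy = [Participant("A"), Participant("B")]
ok = run_transaction(healthy, {"balance:1": 90})

mixed = [Participant("C"), Participant("D", can_commit=False)]
failed = run_transaction(mixed, {"balance:1": 80})
```

In the second run, site D votes no, so site C discards its staged change as well and neither site ends up with the update.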
Distributed Database Advantages and Disadvantages
Advantages
Modular development - A distributed database that has been developed in a
modular fashion can be expanded to new locations or units by adding new servers
and data to the current configuration and connecting them to the distributed
system. The database continues to operate normally during and after this kind
of expansion.
Reliability - Distributed databases are more reliable than centralised ones.
When a centralised database fails, the entire system shuts down. When a failure
occurs in a distributed database, the system continues to run, although with
reduced performance, until the problem is fixed.
Disadvantages
Large overhead - When database replication is employed, operations that span
numerous sites require extra computation and ongoing synchronisation, adding a
significant amount of processing overhead.
Data reliability - Data integrity is a potential issue when employing database
replication, because it is jeopardised when the same data is changed at
different sites.
Data distribution errors - How well the data is distributed is a key factor in
how quickly user requests are answered. If data is not evenly distributed
across the sites, responsiveness may suffer.
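How uneven a data distribution is can be put into a number. The sketch below compares two made-up placement rules on the same set of keys and reports the ratio between the busiest site's load and the average load. The key format, the `by_region` and `round_robin` rules, and the `imbalance` measure are all assumptions chosen for the example.

```python
from collections import Counter


def load_per_site(keys, site_count, place):
    """Count how many keys the placement function `place` puts on each site."""
    counts = Counter(place(key, site_count) for key in keys)
    return [counts.get(site, 0) for site in range(site_count)]


def imbalance(loads):
    """Ratio of the busiest site's load to the average load (1.0 means even)."""
    return max(loads) / (sum(loads) / len(loads))


def by_region(key, site_count):
    # Hypothetical placement: the region prefix decides the site.
    regions = {"eu": 0, "us": 1, "ap": 2}
    return regions[key.split(":")[0]] % site_count


def round_robin(key, site_count):
    # Hypothetical placement: spread by the numeric id, ignoring the region.
    return int(key.split(":")[1]) % site_count


# Twelve order keys, ten of them from a single region.
keys = [f"eu:{i}" for i in range(10)] + ["us:10", "ap:11"]

skewed = load_per_site(keys, 3, by_region)
even = load_per_site(keys, 3, round_robin)
```

With placement by region, one site holds ten of the twelve keys and has to answer most requests on its own, while the other two sit nearly idle. Spreading by id gives every site four keys, so the request load is shared.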