AWS ML Specialty Certification

Data Engineering with AWS Machine Learning

**Data type characteristics help determine which AWS data store to use.**

Structured Data:
- Relational
- Has a predefined schema
- Defines relationships between tables
- Supports complex queries
Amazon Relational Database Service (Amazon RDS) supports the Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server database engines. AWS also places the Amazon Redshift data warehouse in this category.
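The three properties of structured data (predefined schema, relationships, complex queries) can be seen in a small relational example. This is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in for an RDS engine; the customers/orders tables and their data are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory SQLite stands in for a relational engine such as RDS for MySQL
# or PostgreSQL; table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Predefined schema: every row must fit these columns.
conn.execute("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )
""")
# Relationship: every order must point at an existing customer.
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    )
""")

conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(1, 20.0), (1, 35.5), (2, 12.0)])

# Complex query: join across the relationship, then aggregate per customer.
totals = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)

# The schema rejects a row that breaks the relationship.
try:
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (?, ?)", (99, 1.0))
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print("foreign key enforced:", fk_enforced)
```

The same SQL (schema, foreign key, join with GROUP BY) would run with minor dialect changes on the engines listed above; SQLite is used only so the sketch runs without an AWS account.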
Semi-Structured Data:
- Partially structured, such as JSON or XML documents
- Supported by key-value and document AWS databases such as Amazon DynamoDB
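A short sketch of what "partially structured" means in practice: every record shares a key, but the remaining attributes differ from record to record. The sample items are hypothetical, and the Python dict below only mimics the key-value access pattern of a store like DynamoDB; it does not use any AWS API.

```python
import json

# Semi-structured records: each item has an "id" key, but the other
# attributes vary from item to item (hypothetical sample data).
raw = """
[
  {"id": "u1", "name": "Alice", "email": "alice@example.com"},
  {"id": "u2", "name": "Bob", "tags": ["admin", "beta"]},
  {"id": "u3", "name": "Carol", "address": {"city": "Seattle"}}
]
"""
items = json.loads(raw)

# Key-value access: index items by their key, similar to how a key-value
# store retrieves an item by its primary key.
table = {item["id"]: item for item in items}

# No fixed schema: the full attribute set is only known after reading every item.
all_attributes = set().union(*(item.keys() for item in items))

# Attributes are not guaranteed, so list the optional ones per item.
for key in sorted(table):
    item = table[key]
    print(key, item["name"], sorted(item.keys() - {"id", "name"}))
print(sorted(all_attributes))
```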
Unstructured Data:
- No schema at all
- Stored as heterogeneous objects (for example images, audio, or free text)
- Amazon S3 supports this type.
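To tie the three categories back to the choice of data store, here is a toy heuristic. It is a hypothetical sketch, not an AWS tool: the caller states whether a schema is predefined, JSON parsing is the only test for semi-structured data (XML is not checked), and the store lists simply repeat the services named in these notes.

```python
import json
from enum import Enum


class DataType(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


# Candidate AWS stores per category, as listed in these notes.
CANDIDATE_STORES = {
    DataType.STRUCTURED: ["Amazon RDS", "Amazon Aurora", "Amazon Redshift"],
    DataType.SEMI_STRUCTURED: ["Amazon DynamoDB"],
    DataType.UNSTRUCTURED: ["Amazon S3"],
}


def is_json_object(sample: bytes) -> bool:
    """Hypothetical check: a sample counts as semi-structured if it parses as a JSON object."""
    try:
        return isinstance(json.loads(sample), dict)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False


def classify(sample: bytes, has_predefined_schema: bool) -> DataType:
    # Structured: the schema is fixed up front (tables, columns, relationships).
    if has_predefined_schema:
        return DataType.STRUCTURED
    # Semi-structured: no fixed schema, but each record carries its own keys.
    if is_json_object(sample):
        return DataType.SEMI_STRUCTURED
    # Unstructured: no schema at all (for example raw image bytes).
    return DataType.UNSTRUCTURED


# Hypothetical samples: (raw bytes, whether a schema is predefined).
samples = {
    "orders row": (b"1,Alice,20.0", True),
    "user profile": (b'{"id": "u1", "tags": ["beta"]}', False),
    "photo": (b"\x89PNG\r\n\x1a\n", False),
}
for label, (data, schema) in samples.items():
    kind = classify(data, schema)
    print(f"{label}: {kind.value} -> {CANDIDATE_STORES[kind]}")
```

Real decisions also weigh access patterns, query needs, and cost; the point here is only that the data's shape narrows the candidate stores.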

Batch and Stream Processing Characteristics:

Batch Processing:
- Data scope: queries or processing over all or most of the dataset
- Data size: large batches of data
- Performance: latencies from minutes to hours
- Analyses: complex analytics, such as OLAP-style aggregations or large-scale text processing
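The batch characteristics can be illustrated with a minimal sketch, assuming a hypothetical dataset of sales records with "region" and "amount" fields. The job reads every record in fixed-size batches and produces its answer only after the last batch, which is why batch latency grows with dataset size. The per-region average is a deliberately simple stand-in for the complex analytics a real batch job would run.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Split a record stream into fixed-size lists (the last one may be shorter)."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def batch_job(records: Iterable[dict], batch_size: int) -> Tuple[Dict[str, float], int]:
    """Average 'amount' per 'region' over the whole dataset, one batch at a time."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    n_batches = 0
    for chunk in batches(records, batch_size):
        n_batches += 1
        for r in chunk:
            totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
            counts[r["region"]] = counts.get(r["region"], 0) + 1
    # The result exists only after every batch has been read.
    averages = {region: totals[region] / counts[region] for region in totals}
    return averages, n_batches


# Hypothetical records: even ids belong to "us", odd ids to "eu".
records = ({"region": "us" if i % 2 == 0 else "eu", "amount": float(i)}
           for i in range(100_000))
averages, n_batches = batch_job(records, batch_size=10_000)
print(averages, n_batches)  # {'us': 49999.0, 'eu': 50000.0} 10
```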

Last updated on Dec 6, 2022