This project builds a real-time big data streaming architecture that captures and processes live data from YouTube. In the implemented pipeline, a Flask server exposing an HTTP API collects live comments from YouTube streams and sends them to an AWS Kinesis data stream. From there the data flows to two destinations at once: an Amazon S3 data lake for storage and Apache Flink for real-time processing. Flink analyzes the incoming data and produces statistics and insights that are visualized on a graphical dashboard. The project achieved real-time data capture, transmission to Kinesis, storage in S3, and stream processing with Flink; persistent export of the final processed results remains pending. The development environment is based on Python and uses the pychat, boto3, Flask, and pyflink libraries, along with .env files for parameters such as IP addresses, ports, and credentials. The Flask server exposes an endpoint at IP 52.2.10.125 that receives a YouTube live stream ID and starts capturing that stream's comments in real time.
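
The capture-to-Kinesis step described above might look roughly like the following. This is a minimal sketch, not the project's actual code: the function names, the comment fields (`author`, `message`), the `AWS_REGION` variable, the default region, and the choice of partition key are all assumptions made for illustration. Only the `put_record` call with `StreamName`, `Data`, and `PartitionKey` comes from the boto3 Kinesis client.

```python
import json
import os


def build_record(stream_name, comment):
    """Build the keyword arguments for a Kinesis PutRecord call."""
    return {
        "StreamName": stream_name,
        # Kinesis expects bytes; a trailing newline keeps records separable
        # when they are later concatenated into S3 objects.
        "Data": (json.dumps(comment) + "\n").encode("utf-8"),
        # Assumed choice: partition by author so that one author's
        # comments land on the same shard and stay ordered.
        "PartitionKey": str(comment.get("author", "unknown")),
    }


def forward_comment(client, stream_name, comment):
    """Send one comment through any client that exposes put_record()."""
    return client.put_record(**build_record(stream_name, comment))


def make_kinesis_client():
    """Create a real boto3 Kinesis client (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    # AWS_REGION and its default are hypothetical; the project reads its
    # real parameters from .env files.
    region = os.environ.get("AWS_REGION", "us-east-1")
    return boto3.client("kinesis", region_name=region)
```

In the Flask handler that receives the live stream ID, each captured comment could then be passed to `forward_comment(make_kinesis_client(), stream_name, comment)`, ideally reusing one client for the whole capture session. Passing the client in as a parameter also makes it possible to exercise the forwarding logic with a stub instead of a live AWS connection.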