InfoQ – January 2025

Here are the articles I wrote for InfoQ in January 2025, covering everything from Apache Hudi 1.0 to the latest controversies in the database world, from AWS Glue 5.0 to data transfer terminals.

AWS Glue 5.0 Introduces Spark 3.5.2 and Enhanced ETL Performance

At the latest re:Invent conference in Las Vegas, Amazon announced the general availability of AWS Glue 5.0, designed to accelerate ETL jobs powered by Apache Spark. The new release of the serverless data integration service introduces upgraded runtimes, including Spark 3.5.2, Python 3.11, and Java 17, along with performance and security enhancements.

Databases in 2024: Growth, Change and Controversy

Andrew Pavlo’s annual retrospective on the database world has recently been released, covering trends and innovations from the past year. The opinionated report, “Databases in 2024: A Year in Review,” highlights that while we may indeed be in the “golden era of databases,” last year brought significant license changes, the rapid growth of DuckDB, and some surprising new releases.

Apache Hudi 1.0 Now Generally Available

The Apache Software Foundation has recently announced the general availability of Apache Hudi 1.0, the transactional data lake platform with support for near real-time analytics. Initially introduced in 2017, Apache Hudi provides an open table format optimized for efficient writes in incremental data pipelines and fast query performance.

Google Expands Gemini Code Assist with Support for Atlassian, GitHub, and GitLab

Google recently announced support for third-party tools in Gemini Code Assist, including Atlassian Rovo, GitHub, GitLab, Google Docs, Sentry, and Snyk. The private preview enables developers to test the integration of widely used software tools with the personal AI assistant directly within the IDE.

AWS Announces Physical Data Transfer Terminal for High-Speed Uploads

AWS has recently introduced AWS Data Transfer Terminal, a new option for high-speed data uploads. Currently available only in the US, Data Transfer Terminals provide a physical location where customers can bring their storage devices for fast data transfer to and from the AWS cloud.

AWS Introduces S3 Tables Bucket: Is S3 Becoming a Data Lakehouse?

AWS has recently announced S3 Tables Bucket, a storage option that provides managed Apache Iceberg tables optimized for analytics workloads. According to the cloud provider, the new option delivers up to 3x faster query performance and up to 10x higher transaction rates than Apache Iceberg tables stored in general purpose S3 buckets.