  • Log processing—With minimal modification, you can process large volumes of text log data per day from several sources using your existing MapReduce code (see the Hadoop job sketch after this list).

  • Reporting—Aggregate data into reports and store the data in BigQuery, then push the aggregated data to the applications that power your dashboards and analysis (see the BigQuery sketch after this list).

  • On-demand Spark clusters—Quickly launch ad-hoc clusters to analyze data stored in blob storage (Cloud Storage) using Spark (Spark SQL, PySpark, the Spark shell); see the PySpark sketch after this list.

  • Machine learning—Use the Spark Machine Learning Libraries (MLlib), which are preinstalled on the cluster, to customize and run classification algorithms (see the MLlib sketch after this list).
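
A minimal sketch of the log-processing case, assuming an existing, unmodified MapReduce jar already uploaded to Cloud Storage and a running Dataproc cluster; the project, region, cluster name, and gs:// paths are placeholders:

```python
# Submit an existing MapReduce jar to a Dataproc cluster
# using the google-cloud-dataproc client.
from google.cloud import dataproc_v1

region = "us-central1"
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": "log-processing-cluster"},
    "hadoop_job": {
        # The unmodified MapReduce jar, staged in Cloud Storage.
        "main_jar_file_uri": "gs://my-bucket/jars/log-aggregator.jar",
        "args": ["gs://my-bucket/raw-logs/", "gs://my-bucket/output/"],
    },
}

operation = job_client.submit_job_as_operation(
    request={"project_id": "my-project", "region": region, "job": job}
)
result = operation.result()  # block until the job finishes
print(result.status.state)
```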
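
For the reporting case, a sketch of an aggregation query that writes a small summary table for dashboards to read instead of the raw data; the `logs.raw_events` and `reports.daily_events` tables are hypothetical:

```python
# Aggregate raw events into a daily report table in BigQuery.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

query = """
    SELECT source, DATE(event_ts) AS day, COUNT(*) AS events
    FROM `my-project.logs.raw_events`
    GROUP BY source, day
"""

dest = bigquery.TableReference.from_string("my-project.reports.daily_events")
job_config = bigquery.QueryJobConfig(
    destination=dest,
    # Rebuild the report table on each run.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(query, job_config=job_config).result()
```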
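
For the on-demand Spark case, a PySpark sketch as it might run on a Dataproc cluster, where gs:// paths are readable through the preinstalled Cloud Storage connector; the bucket layout is invented:

```python
# Ad-hoc analysis of data in Cloud Storage with Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adhoc-gcs-analysis").getOrCreate()

# Dataproc clusters read gs:// paths natively via the GCS connector.
df = spark.read.json("gs://my-bucket/raw-logs/2023/*.json")
df.createOrReplaceTempView("logs")

# Query the temp view with Spark SQL.
spark.sql("SELECT status, COUNT(*) AS n FROM logs GROUP BY status").show()
```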
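
For the machine-learning case, a sketch of a classification job with the preinstalled MLlib (the pyspark.ml API); the input path, feature columns, and label column are assumptions:

```python
# Train a classifier with Spark MLlib on a Dataproc cluster.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-classification").getOrCreate()
df = spark.read.parquet("gs://my-bucket/features/")

# Assemble raw columns into the single feature vector MLlib expects.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = assembler.transform(df)

model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
print(model.summary.accuracy)
```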

Dataproc supports Hadoop, Spark, Spark SQL, Hive, and Pig. You can create multiple clusters to resolve version and dependency conflicts between jobs; an existing cluster cannot be upgraded in place, but it can be destroyed and recreated. The most cost-effective and flexible way to migrate your Hadoop system to GCP is to shift away from thinking in terms of large, multi-purpose, persistent clusters and instead think in terms of small, short-lived clusters designed to run specific jobs, tailoring each cluster's configuration to the individual job (a sketch of this pattern follows).
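
A sketch of the ephemeral-cluster pattern with the google-cloud-dataproc client: create a cluster sized for one job, run the job, then delete the cluster. The project, region, cluster name, and machine types are placeholders:

```python
# Create a short-lived cluster, run a job, and tear the cluster down.
from google.cloud import dataproc_v1

region = "us-central1"
endpoint = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
cluster_client = dataproc_v1.ClusterControllerClient(client_options=endpoint)

cluster = {
    "project_id": "my-project",
    "cluster_name": "ephemeral-job-cluster",
    "config": {
        # Sized for this one job: small master, few workers.
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
    },
}

cluster_client.create_cluster(
    request={"project_id": "my-project", "region": region, "cluster": cluster}
).result()

# ... submit the job here (see the Hadoop job sketch above) ...

cluster_client.delete_cluster(
    request={
        "project_id": "my-project",
        "region": region,
        "cluster_name": "ephemeral-job-cluster",
    }
).result()
```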

Related tools: Sqoop transfers data between HDFS and relational databases; Oozie is a workflow scheduler that runs jobs as a DAG; Stackdriver monitoring identifies each time series by a resource type plus a metric type (see the monitoring sketch below).
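
A sketch of reading one Dataproc metric from Stackdriver (Cloud Monitoring) with the google-cloud-monitoring client, filtering on resource type and metric type; the project ID is a placeholder:

```python
# Query a Dataproc cluster metric from Cloud Monitoring (Stackdriver).
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "start_time": {"seconds": now - 3600},  # last hour
        "end_time": {"seconds": now},
    }
)

# Every time series is identified by a resource type plus a metric type.
results = client.list_time_series(
    request={
        "name": "projects/my-project",
        "filter": (
            'resource.type = "cloud_dataproc_cluster" AND '
            'metric.type = "dataproc.googleapis.com/cluster/hdfs/storage_utilization"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    if series.points:
        print(series.resource.labels, series.points[0].value)
```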