Moderniser.repo
  • 日本語
  • English✔️
  • 日本語
  • English✔️
[AWS]
What is a Design Guideline?
AWS Organizations AWS Control Tower
AWS IAM / AWS IAM Identity Center
AWS CloudTrail AWS Config Amazon GuardDuty
aws:executeScript action for SSM Automation
How to create custom checks with Prowler How to scrape the reference of Security Hub control and convert it to data CIS AWS Foundations Benchmark v3.0.0 How to use the latest Boto3 with SSM Automation (or Lambda) How to write a CommaDelimitedList in List (array) format in samconfig.yml How to specify a local file using the Script property of AWS::SSM::Document How to get information on multiple AWS resources in Terraforms Data Source
CloudFormation template / SAM template coding rules
Amazon S3
Amazon Athena (Glue Database)
[Visual Studio Code]
How to use the latest AWS icon in Draw.io of Visual Studio Code
Recommended plug-in summary of Visual Studio Code for cloud engineers
[iPhone]
[Others]
How to switch accounts for GitHub CLI commands using commands
Privacy Policy
Profile
...

Kanji

・ Cloud engineer / freelance
・ Born in 1993
・ Born in Ehime Prefecture / Lives in Shibuya-ku, Tokyo
・ AWS history 5 years

Profile details

Contact
Twitter(@kanji_aws_fl) Instagram(kanji_aws_freelance) Mail(kanji@cont-aid.com)


【IaC】Sample Code Collection for Amazon Athena (Glue Database)


Created date: 2025/04/17, Update date: 2025/04/17


Amazon Athena is a service that builds a data query service within the AWS cloud.
Athena allows you to query data stored in S3 using SQL queries.
When executing queries in Athena, it is necessary to create a Glue Database.
This article provides a sample code collection for Amazon Athena and Glue Database CloudFormation templates.

Table of Contents


  1. Revision History
  2. Coding Rules
  3. Athena Workgroup
  4. Glue Database


Revision History

Date Update Details
2025/03/08 ・Fixed the Prefix specified in LifecycleConfiguration for AthenaBucket01
・Updated the Type of useridentity and resources columns in CloudTrailTable01

Coding Rules

  • The CloudFormation templates described in this article are written based on 【IaC】CloudFormation Template / SAM Template Coding Rules .
  • CloudFormation Guard is used to evaluate rulesets that can assess “Center for Internet Security (CIS) Amazon Web Services Foundation v1.4 Level 1.”
    • Since it cannot be evaluated using the VSCode plugin, the following command is used.
cfn-guard validate \
--rules ~/cis-aws-benchmark-level-1.guard \
--data ./template.yml

Athena Workgroup

  • The following CloudFormation template configures a basic Athena workgroup.
  • The destination for server access logs is set to the bucket defined for storing audit logs in 【IaC】Amazon S3 Sample Code Collection .
    • As described in Amazon S3 Server Access Logging Now Supports Date-Based Partitioning , server access logs can now be partitioned by date. Therefore, the path is configured to be split by the event time of S3 events.
  • The log retention period is set to 45 days, the same as the query history retention period.
    • Viewing Recent Queries in the Athena Console - Amazon Athena
  • Although there is no reference information for S3 bucket policies in the AWS User Guide, allowing the s3:PutObject and s3:GetBucketLocation actions enables access from Athena.
AWSTemplateFormatVersion: 2010-09-09
Resources:
  AthenaBucket01:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub sample-athena-bucket-${AWS::AccountId}-${AWS::Region}
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      LifecycleConfiguration:
        Rules:
          - Status: Enabled
            ExpirationInDays: 45
            Prefix: athena-query-resultsd/
          - Status: Enabled
            AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 7
            NoncurrentVersionExpirationInDays: 7
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      LoggingConfiguration:
        DestinationBucketName: !Sub sample-server-access-log-bucket-${AWS::AccountId}-${AWS::Region}
        TargetObjectKeyFormat:
          PartitionedPrefix:
            PartitionDateSource: EventTime
      VersioningConfiguration:
        Status: Enabled

  AthenaBucketPolicy01:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref AthenaBucket01
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: AllowAthenaGetBucketLocation
            Effect: Allow
            Principal:
              Service:
                - !Sub athena.${AWS::Region}.amazonaws.com
            Action:
              - s3:GetBucketLocation
              - s3:PutObject
            Resource:
              - !Sub arn:${AWS::Partition}:s3:::${AthenaBucket01}
              - !Sub arn:${AWS::Partition}:s3:::${AthenaBucket01}/*
          # 参考: https://docs.aws.amazon.com/ja_jp/securityhub/latest/userguide/s3-controls.html#s3-5
          - Sid: AllowSSLRequestsOnly
            Effect: Deny
            Principal: '*'
            Action: s3:*
            Resource:
              - !Sub arn:${AWS::Partition}:s3:::${AthenaBucket01}
              - !Sub arn:${AWS::Partition}:s3:::${AthenaBucket01}/*
            Condition:
              Bool:
                aws:SecureTransport: false

  AthenaWorkgroup01:
    Type: AWS::Athena::WorkGroup
    Properties:
      Name: sample-athena-workgroup
      State: ENABLED
      WorkGroupConfiguration:
        EngineVersion:
          SelectedEngineVersion: AUTO
        ResultConfiguration:
          OutputLocation: !Sub s3://${AthenaBucket01}/athena-query-results/
          EncryptionConfiguration:
            EncryptionOption: SSE_S3

Glue Database

  • The following CloudFormation template configures a Glue database for analyzing logs accumulated in the audit log bucket and server access log bucket described in 【IaC】Amazon S3 Sample Code Collection .
    • Supported AWS services include AWS Config, AWS CloudTrail, Amazon GuardDuty, Amazon VPC Flow Logs, and S3 Server Access Logs.
  • A single database is created with five tables:
    • CloudTrail Table
    • Config Table
    • GuardDuty Table
    • VPC Flow Logs Table
    • Server Access Log Table
  • For all tables, three partition keys are set using partition projection: account , region , and timestamp .
    • account is a 12-digit integer, region is a string type for AWS region names, and timestamp is a date type with a range from seven years ago to the present.
    • Additionally, region is an enumerated type listing AWS region names.
  • For the Server Access Log Table, an additional partition key source_bucket_name is added.
    • This is set as the partition key for the S3 bucket name that records server access logs.
AWSTemplateFormatVersion: 2010-09-09
Resources:
  GlueDatabase01:
    Type: AWS::Glue::Database
    Properties:
      DatabaseInput:
        Name: sample_glue_database
      CatalogId: !Ref AWS::AccountId

  # reference: https://docs.aws.amazon.com/ja_jp/athena/latest/ug/create-cloudtrail-table-partition-projection.html
  CloudTrailTable01:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref GlueDatabase01
      CatalogId: !Ref AWS::AccountId
      TableInput:
        TableType: EXTERNAL_TABLE
        Name: cloud_trail
        PartitionKeys:
          - Name: account
            Type: string
          - Name: region
            Type: string
          - Name: timestamp
            Type: string
        Parameters:
          projection.enabled: true
          projection.account.type: integer
          projection.account.digit: 12
          projection.account.range: 000000000000,999999999999
          projection.timestamp.type: date
          projection.timestamp.range: NOW-7YEARS,NOW
          projection.timestamp.format: yyyy/MM/dd
          projection.timestamp.interval: 1
          projection.timestamp.interval.unit: DAYS
          projection.region.type: enum
          projection.region.values: us-east-1,us-east-2,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1
          EXTERNAL: TRUE
          storage.location.template: !Sub s3://sample-audit-bucket-${AWS::AccountId}-${AWS::Region}/AWSLogs/${!account}/CloudTrail/${!region}/${!timestamp}
        StorageDescriptor:
          Location: !Sub s3://sample-audit-bucket-${AWS::AccountId}-${AWS::Region}/AWSLogs/
          InputFormat: com.amazon.emr.cloudtrail.CloudTrailInputFormat
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          Compressed: false
          SerdeInfo:
            SerializationLibrary: com.amazon.emr.hive.serde.CloudTrailSerde
          StoredAsSubDirectories: false
          Columns:
            - Name: eventversion
              Type: string
            - Name: useridentity
              Type: struct<type:string,principalId:string,arn:string,accountId:string,invokedBy:string,accessKeyId:string,userName:string,sessionContext:struct<attributes:struct<mfaAuthenticated:string,creationDate:string>,sessionIssuer:struct<type:string,principalId:string,arn:string,accountId:string,userName:string>>>
            - Name: eventtime
              Type: string
            - Name: eventsource
              Type: string
            - Name: eventname
              Type: string
            - Name: awsregion
              Type: string
            - Name: sourceipaddress
              Type: string
            - Name: useragent
              Type: string
            - Name: errorcode
              Type: string
            - Name: errormessage
              Type: string
            - Name: requestparameters
              Type: string
            - Name: responseelements
              Type: string
            - Name: additionaleventdata
              Type: string
            - Name: requestid
              Type: string
            - Name: eventid
              Type: string
            - Name: resources
              Type: array<struct<arn:string,accountId:string,type:string>>
            - Name: eventtype
              Type: string
            - Name: apiversion
              Type: string
            - Name: readonly
              Type: string
            - Name: recipientaccountid
              Type: string
            - Name: serviceeventdetails
              Type: string
            - Name: sharedeventid
              Type: string
            - Name: vpcendpointid
              Type: string

  # 参考: https://aws.amazon.com/jp/blogs/mt/how-to-query-your-aws-resource-configuration-states-using-aws-config-and-amazon-athena/
  ConfigTable01:
    Type: AWS::Glue::Table
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: !Ref GlueDatabase01
      TableInput:
        Name: config
        PartitionKeys:
          - Name: account
            Type: string
          - Name: region
            Type: string
          - Name: timestamp
            Type: string
        Parameters:
          projection.enabled: true
          projection.account.type: integer
          projection.account.digit: 12
          projection.account.range: 000000000000,999999999999
          projection.timestamp.type: date
          projection.timestamp.range: NOW-7YEARS,NOW
          projection.timestamp.format: yyyy/MM/dd
          projection.timestamp.interval: 1
          projection.timestamp.interval.unit: DAYS
          projection.region.type: enum
          projection.region.values: us-east-1,us-east-2,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1
          storage.location.template: !Sub s3://sample-audit-bucket-${AWS::AccountId}-${AWS::Region}/AWSLogs/${!account}/Config/${!region}/${!timestamp}
        TableType: EXTERNAL_TABLE
        StorageDescriptor:
          Location: !Sub s3://sample-audit-bucket-${AWS::AccountId}-${AWS::Region}/AWSLogs/
          InputFormat: com.amazon.emr.cloudtrail.CloudTrailInputFormat
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          SerdeInfo:
            SerializationLibrary: org.apache.hive.hcatalog.data.JsonSerDe
          Columns:
            - Name: fileversion
              Type: string
            - Name: configSnapshotId
              Type: string
            - Name: configurationitems
              Type: array<struct<configurationItemVersion:string,configurationItemCaptureTime:string,configurationStateId:bigint,awsAccountId:string,configurationItemStatus:string,resourceType:string,resourceId:string,resourceName:string,ARN:string,awsRegion:string,availabilityZone:string,configurationStateMd5Hash:string,resourceCreationTime:string>>

  # reference: https://docs.aws.amazon.com/ja_jp/athena/latest/ug/querying-guardduty.html
  GuardDutyTable01:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref GlueDatabase01
      CatalogId: !Ref AWS::AccountId
      TableInput:
        TableType: EXTERNAL_TABLE
        Name: guard_duty
        PartitionKeys:
          - Name: account
            Type: string
          - Name: region
            Type: string
          - Name: timestamp
            Type: string
        Parameters:
          projection.enabled: true
          projection.account.type: integer
          projection.account.digit: 12
          projection.account.range: 000000000000,999999999999
          projection.timestamp.type: date
          projection.timestamp.range: NOW-1YEARS,NOW
          projection.timestamp.format: yyyy/MM/dd
          projection.timestamp.interval: 1
          projection.timestamp.interval.unit: DAYS
          projection.region.type: enum
          projection.region.values: us-east-1,us-east-2,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1
          EXTERNAL: TRUE
          storage.location.template: !Sub s3://sample-audit-bucket-${AWS::AccountId}-${AWS::Region}/AWSLogs/${!account}/GuardDuty/${!region}/${!timestamp}
        StorageDescriptor:
          Location: !Sub s3://sample-audit-bucket-${AWS::AccountId}-${AWS::Region}/AWSLogs/
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          Compressed: false
          SerdeInfo:
            SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
          StoredAsSubDirectories: false
          Columns:
            - Name: schemaversion
              Type: string
            - Name: accountid
              Type: string
            - Name: region
              Type: string
            - Name: partition
              Type: string
            - Name: id
              Type: string
            - Name: arn
              Type: string
            - Name: type
              Type: string
            - Name: resource
              Type: struct<resourcetype:string,accesskeydetails:struct<accesskeyid:string,principalid:string,usertype:string,username:string>>
            - Name: service
              Type: struct<servicename:string,detectorid:string,action:struct<actiontype:string,awsapicallaction:struct<api:string,servicename:string,callertype:string,remoteipdetails:struct<ipaddressv4:string,organization:struct<asn:string,asnorg:string,isp:string,org:string>,country:struct<countryname:string>,city:struct<cityname:string>,geolocation:struct<lat:float,lon:float>>,affectedresources:string>>,resourcerole:string,additionalinfo:struct<recentapicalls:array<struct<api:string,count:int>>>,evidence:string,eventfirstseen:string,eventlastseen:string,archived:boolean,count:int>
            - Name: severity
              Type: int
            - Name: createdat
              Type: string
            - Name: updatedat
              Type: string
            - Name: title
              Type: string
            - Name: description
              Type: string

  # reference: https://docs.aws.amazon.com/ja_jp/athena/latest/ug/vpc-flow-logs-create-table-statement.html
  VPCFlowLogsTable01:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref GlueDatabase01
      CatalogId: !Ref AWS::AccountId
      TableInput:
        TableType: EXTERNAL_TABLE
        Name: vpc_flow_logs
        PartitionKeys:
          - Name: account
            Type: string
          - Name: region
            Type: string
          - Name: timestamp
            Type: string
        Parameters:
          projection.enabled: true
          projection.account.type: integer
          projection.account.digit: 12
          projection.account.range: 000000000000,999999999999
          projection.timestamp.type: date
          projection.timestamp.range: NOW-7YEARS,NOW
          projection.timestamp.format: yyyy/MM/dd
          projection.timestamp.interval: 1
          projection.timestamp.interval.unit: DAYS
          projection.region.type: enum
          projection.region.values: us-east-1,us-east-2,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1
          EXTERNAL: TRUE
          storage.location.template: !Sub s3://sample-audit-bucket-${AWS::AccountId}-${AWS::Region}/AWSLogs/${!account}/vpcflowlogs/${!region}/${!timestamp}
        StorageDescriptor:
          Location: !Sub s3://sample-audit-bucket-${AWS::AccountId}-${AWS::Region}/AWSLogs/
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          SerdeInfo:
            SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            Parameters:
              field.delim: ' '
              skip.header.line.count: 1
          Parameters:
            classification: textfile
            skip.header.line.count: 1
          Columns:
            - Name: version
              Type: int
            - Name: account_id
              Type: string
            - Name: interface_id
              Type: string
            - Name: srcaddr
              Type: string
            - Name: dstaddr
              Type: string
            - Name: srcport
              Type: int
            - Name: dstport
              Type: int
            - Name: protocol
              Type: bigint
            - Name: packets
              Type: bigint
            - Name: bytes
              Type: bigint
            - Name: start
              Type: bigint
            - Name: end
              Type: bigint
            - Name: action
              Type: string
            - Name: vpc_id
              Type: string
            - Name: subnet_id
              Type: string
            - Name: instance_id
              Type: string
            - Name: tcp_flags
              Type: int
            - Name: type
              Type: string
            - Name: pkt_srcaddr
              Type: string
            - Name: pkt_dstaddr
              Type: string
            - Name: az_id
              Type: string
            - Name: sublocation_type
              Type: string
            - Name: sublocation_id
              Type: string
            - Name: pkt_src_aws_service
              Type: string
            - Name: pkt_dst_aws_service
              Type: string
            - Name: flow_direction
              Type: string
            - Name: traffic_path
              Type: int

  # reference: https://repost.aws/knowledge-center/analyze-logs-athena
  ServerAccessLogTable01:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref GlueDatabase01
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: server_access_log
        TableType: EXTERNAL_TABLE
        PartitionKeys:
          - Name: account
            Type: string
          - Name: region
            Type: string
          - Name: timestamp
            Type: string
          - Name: source_bucket_name
            Type: string
        Parameters:
          projection.enabled: true
          projection.account.type: integer
          projection.account.digit: 12
          projection.account.range: 000000000000,999999999999
          projection.timestamp.format: yyyy/MM/dd
          projection.timestamp.interval: 1
          projection.timestamp.interval.unit: DAYS
          projection.timestamp.range: NOW-7YEARS,NOW
          projection.timestamp.type: date
          projection.region.type: enum
          projection.region.values: us-east-1,us-east-2,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1
          projection.source_bucket_name.type: injected
          storage.location.template: !Sub s3://sample-server-access-log-bucket-${AWS::AccountId}-${AWS::Region}/${!account}/${!region}/${!source_bucket_name}/${!timestamp}
        StorageDescriptor:
          Location: !Sub s3://sample-server-access-log-bucket-${AWS::AccountId}-${AWS::Region}/
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          SerdeInfo:
            SerializationLibrary: org.apache.hadoop.hive.serde2.RegexSerDe
            Parameters:
              input.regex: '([^ ]*) ([^ ]*) \[(.*?)\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$'
          Columns:
            - Name: bucketowner
              Type: string
            - Name: bucket_name
              Type: string
            - Name: requestdatetime
              Type: string
            - Name: remoteip
              Type: string
            - Name: requester
              Type: string
            - Name: requestid
              Type: string
            - Name: operation
              Type: string
            - Name: key
              Type: string
            - Name: request_uri
              Type: string
            - Name: httpstatus
              Type: string
            - Name: errorcode
              Type: string
            - Name: bytessent
              Type: bigint
            - Name: objectsize
              Type: bigint
            - Name: totaltime
              Type: string
            - Name: turnaroundtime
              Type: string
            - Name: referrer
              Type: string
            - Name: useragent
              Type: string
            - Name: versionid
              Type: string
            - Name: hostid
              Type: string
            - Name: sigv
              Type: string
            - Name: ciphersuite
              Type: string
            - Name: authtype
              Type: string
            - Name: endpoint
              Type: string
            - Name: tlsversion
              Type: string
            - Name: accesspointarn
              Type: string
            - Name: aclrequired
              Type: string



©2025 ContAID