This section describes the AWS service information required for your dedicated instance to reach AWS services in your VPC. The sections below cover AWS managed services, AWS Managed Services with native PrivateLink, and Customer Managed Services (AWS) or AWS Services with NLB.

AWS managed services

This section covers AWS-managed services that Unstructured can access using AWS-native private networking features, without requiring you to create a customer-managed endpoint service or Network Load Balancer.
The Order column indicates the general sequence for the information exchange. Items with the same order value can usually be provided at the same stage.

Amazon S3 (gateway endpoint)

| Order | Information Required | Description | Example | Owner |
| --- | --- | --- | --- | --- |
| 1 | S3 Bucket Name | Buckets Unstructured needs to access | my-documents | Customer |
| 1 | S3 Bucket Region | Region where bucket is located | us-east-1 | Customer |
| 2 | Unstructured IAM Role ARN | IAM Role ARN that will access S3 | arn:aws:iam::987654321098:role/unstructured-s3-access | Unstructured |

This section also covers Delta Tables in Amazon S3; the S3 Gateway Endpoint configuration is the same.
Example S3 Bucket Policy
You must create a bucket policy that grants Unstructured’s IAM Role access to the required S3 buckets. For read-only access:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowUnstructuredAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<UNSTRUCTURED_IAM_ROLE_ARN>"
      },
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>",
        "arn:aws:s3:::<BUCKET_NAME>/*"
      ]
    }
  ]
}
Use this Action clause for write access (e.g., if S3 is a destination):
{
  "Action": [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:ListBucket"
  ]
}
Replace:
  • <UNSTRUCTURED_IAM_ROLE_ARN> — Unstructured’s IAM Role ARN (provided during setup).
  • <BUCKET_NAME> — Your S3 bucket name.
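As a minimal sketch, the placeholders in the bucket policy above could be filled in programmatically rather than by hand. The function name `build_bucket_policy` and the `writable` flag are hypothetical and exist only for illustration; the actions and resources are copied from the read-only and write-access examples above.

```python
import json

# Action lists copied from the read-only and write-access examples above.
READ_ACTIONS = ["s3:GetObject", "s3:ListBucket"]
WRITE_ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"]


def build_bucket_policy(role_arn: str, bucket_name: str, writable: bool = False) -> dict:
    """Return the example bucket policy with both placeholders substituted.

    Hypothetical helper: not part of any Unstructured or AWS SDK.
    """
    actions = WRITE_ACTIONS if writable else READ_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowUnstructuredAccess",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": list(actions),
                # Bucket-level ARN (for ListBucket) and object-level ARN (for object actions).
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    policy = build_bucket_policy(
        "arn:aws:iam::987654321098:role/unstructured-s3-access",
        "my-documents",
        writable=True,
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can then be pasted into the bucket's policy editor or saved to a file for use with your own tooling.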

Amazon Bedrock

Amazon Bedrock is accessed via AWS-provided VPC endpoints. Unstructured configures VPC endpoints in our VPC to ensure all traffic to Bedrock stays off the public internet. Access to customer-specific Bedrock resources is controlled via IAM policies.

| Order | Information Required | Description | Example | Owner |
| --- | --- | --- | --- | --- |
| 1 | Bedrock Region | AWS region where Bedrock resources are located | us-east-1 | Customer |
| 1 | Model IDs / ARNs | Foundation models or custom models to access | anthropic.claude-sonnet-4-5, arn:aws:bedrock:us-east-1:123456789012:custom-model/my-model | Customer |
| 2 | Unstructured AWS Account ID | Account ID to allow in IAM/resource policies | 987654321098 | Unstructured |
| 2 | Unstructured IAM Role ARN | IAM Role ARN that will access Bedrock | arn:aws:iam::987654321098:role/unstructured-bedrock | Unstructured |

Unstructured configures the Bedrock VPC endpoint on the Unstructured platform. You must create IAM policies that grant access to Unstructured’s IAM Role.
Example IAM Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowBedrockModelInvocation",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<UNSTRUCTURED_IAM_ROLE_ARN>"
      },
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:<REGION>::foundation-model/anthropic.claude-sonnet-4-5-*",
        "arn:aws:bedrock:<REGION>::foundation-model/anthropic.claude-opus-4-5-*",
        "arn:aws:bedrock:<REGION>:<CUSTOMER_ACC_NO>:custom-model/*"
      ]
    }
  ]
}
Replace:
  • <UNSTRUCTURED_IAM_ROLE_ARN> — Unstructured’s AWS IAM Role ARN (provided during setup).
  • <CUSTOMER_ACC_NO> — Your AWS Account ID.
  • <REGION> — Your Bedrock region.
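The Resource list in the policy above mixes two ARN shapes: foundation-model ARNs, which have an empty account field, and custom-model ARNs, which include your account ID. The sketch below shows one way to turn the "Model IDs / ARNs" values from the table into Resource entries. The helper names are hypothetical, and appending a `-*` version wildcard to foundation model IDs is an assumption that simply mirrors the example policy.

```python
from typing import Iterable, List


def bedrock_resource(region: str, model: str) -> str:
    """Map a model ID or a full ARN to a policy Resource entry.

    Hypothetical helper. Full ARNs (for example custom models) pass through
    unchanged; bare foundation model IDs get the foundation-model ARN form
    with a trailing version wildcard, as in the example policy above.
    """
    if model.startswith("arn:"):
        return model
    return f"arn:aws:bedrock:{region}::foundation-model/{model}-*"


def bedrock_resources(region: str, models: Iterable[str]) -> List[str]:
    """Build the Resource list for every requested model."""
    return [bedrock_resource(region, m) for m in models]


if __name__ == "__main__":
    for entry in bedrock_resources(
        "us-east-1",
        [
            "anthropic.claude-sonnet-4-5",
            "arn:aws:bedrock:us-east-1:123456789012:custom-model/my-model",
        ],
    ):
        print(entry)
```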

Amazon Managed Streaming for Apache Kafka (MSK)

Amazon MSK supports native multi-VPC private connectivity via PrivateLink. This enables Unstructured to connect to your MSK cluster (as a Kafka source) entirely within the AWS private network. You must have a provisioned (not serverless) MSK cluster with Multi-VPC Connectivity enabled. The cluster must use TLS or SASL/TLS authentication.

| Order | Information Required | Description | Example | Owner |
| --- | --- | --- | --- | --- |
| 1 | MSK Cluster ARN | ARN of the MSK cluster | arn:aws:kafka:us-east-1:123456789012:cluster/my-cluster/abc-123 | Customer |
| 1 | MSK Cluster Region | AWS region where cluster is deployed | us-east-1 | Customer |
| 1 | Kafka Port | Port the brokers listen on | 9094 (TLS) or 9096 (SASL/TLS) | Customer |
| 1 | Topic Name(s) | Kafka topics Unstructured needs to read | documents-raw, documents-processed | Customer |
| 2 | Unstructured AWS Account ID | Account ID to add as allowed principal | 987654321098 | Unstructured |
| 3 | VPC Endpoint Service Name | Service name created when Multi-VPC Connectivity is enabled | com.amazonaws.vpce.us-east-1.vpce-svc-0abc123 | Customer |
| 3 | Bootstrap Broker Endpoints | Private broker DNS names for the cluster | b-1.mycluster.abc123.kafka.us-east-1.amazonaws.com:9094 | Customer |

Enabling MSK Multi-VPC Connectivity
Use the AWS Console:
  1. Navigate to Amazon MSK > select your cluster.
  2. Choose Actions > Edit cluster connectivity.
  3. Enable Multi-VPC connectivity.
  4. Confirm — MSK will create a VPC Endpoint Service automatically.
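Use the AWS CLI. The update-connectivity command also requires the cluster's current version, which describe-cluster returns as ClusterInfo.CurrentVersion:
aws kafka update-connectivity \
  --cluster-arn "<MSK_CLUSTER_ARN>" \
  --current-version "<CLUSTER_CURRENT_VERSION>" \
  --connectivity-info '{
    "VpcConnectivity": {
      "ClientAuthentication": {
        "Tls": { "Enabled": true }
      }
    }
  }'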
After enabling, retrieve the VPC Endpoint Service name:
aws kafka describe-cluster \
  --cluster-arn "<MSK_CLUSTER_ARN>" \
  --query 'ClusterInfo.BrokerNodeGroupInfo.ConnectivityInfo'
Adding Unstructured as an Allowed Principal
Once Multi-VPC Connectivity is enabled, use the AWS CLI to add Unstructured’s AWS Account ID as an allowed principal on the endpoint service:
aws ec2 modify-vpc-endpoint-service-permissions \
  --service-id <MSK_ENDPOINT_SERVICE_ID> \
  --add-allowed-principals "arn:aws:iam::<UNSTRUCTURED_AWS_ACCOUNT_ID>:root"
Replace:
  • <MSK_ENDPOINT_SERVICE_ID> — The endpoint service ID created by MSK Multi-VPC Connectivity.
  • <UNSTRUCTURED_AWS_ACCOUNT_ID> — Unstructured’s AWS Account ID (provided during setup).
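The table lists the VPC Endpoint Service Name, while the CLI command above takes a service ID and an account root principal. The sketch below derives both values, assuming the service name ends with the `vpce-svc-...` identifier as in the table's example. The function names are hypothetical.

```python
import re


def endpoint_service_id(service_name: str) -> str:
    """Extract the trailing vpce-svc-... ID from an endpoint service name.

    Assumes the name has the shape shown in the table, for example
    com.amazonaws.vpce.us-east-1.vpce-svc-0abc123.
    """
    match = re.search(r"(vpce-svc-[0-9a-z]+)$", service_name)
    if match is None:
        raise ValueError(f"not an endpoint service name: {service_name!r}")
    return match.group(1)


def account_root_principal(account_id: str) -> str:
    """Build the account root ARN used with --add-allowed-principals."""
    if re.fullmatch(r"\d{12}", account_id) is None:
        raise ValueError("AWS account IDs are 12 digits")
    return f"arn:aws:iam::{account_id}:root"


if __name__ == "__main__":
    print(endpoint_service_id("com.amazonaws.vpce.us-east-1.vpce-svc-0abc123"))
    print(account_root_principal("987654321098"))
```

Validating the account ID up front catches copy-and-paste mistakes before the CLI call is made.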

Amazon OpenSearch Service

Amazon OpenSearch Service supports native Interface VPC Endpoints. Unstructured creates a VPC endpoint in our VPC targeting the customer’s OpenSearch domain.

| Order | Information Required | Description | Example | Owner |
| --- | --- | --- | --- | --- |
| 1 | OpenSearch Domain ARN | ARN of the OpenSearch domain | arn:aws:es:us-east-1:123456789012:domain/my-domain | Customer |
| 1 | OpenSearch Domain Region | AWS region where domain is deployed | us-east-1 | Customer |
| 1 | Service Port | Port the service listens on | 443 | Customer |
| 2 | Unstructured AWS Account ID | Account ID to add as allowed principal | 987654321098 | Unstructured |
| 2 | Unstructured IAM Role ARN | IAM Role that will access OpenSearch | arn:aws:iam::987654321098:role/unstructured-opensearch | Unstructured |
| 3 | VPC Endpoint DNS | The endpoint DNS name for connection | vpc-my-domain-xyz.us-east-1.es.amazonaws.com | Customer |

Example Domain Access Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<UNSTRUCTURED_IAM_ROLE_ARN>"
      },
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpHead",
        "es:ESHttpPost",
        "es:ESHttpPut",
        "es:ESHttpDelete"
      ],
      "Resource": "arn:aws:es:<REGION>:<CUSTOMER_ACC_NO>:domain/<DOMAIN_NAME>/*"
    }
  ]
}
Replace:
  • <UNSTRUCTURED_IAM_ROLE_ARN> — Unstructured’s AWS Role ARN (provided during setup).
  • <CUSTOMER_ACC_NO> — Your AWS Account ID.
  • <REGION> — Your OpenSearch region.
  • <DOMAIN_NAME> — Your OpenSearch domain name.
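The region, account ID, and domain name placeholders above are all contained in the OpenSearch Domain ARN from the table. As a rough sketch, the ARN can be split into those parts and turned into the policy's Resource value. The `DomainArn` type and both function names are hypothetical.

```python
from typing import NamedTuple


class DomainArn(NamedTuple):
    region: str
    account_id: str
    domain_name: str


def parse_domain_arn(arn: str) -> DomainArn:
    """Split arn:aws:es:<region>:<account>:domain/<name> into its parts."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "es":
        raise ValueError(f"not an OpenSearch domain ARN: {arn!r}")
    resource = parts[5]
    prefix = "domain/"
    if not resource.startswith(prefix) or len(resource) == len(prefix):
        raise ValueError(f"ARN has no domain name: {arn!r}")
    return DomainArn(parts[3], parts[4], resource[len(prefix):])


def policy_resource(arn: str) -> str:
    """Return the Resource value used in the example domain access policy."""
    d = parse_domain_arn(arn)
    return f"arn:aws:es:{d.region}:{d.account_id}:domain/{d.domain_name}/*"


if __name__ == "__main__":
    print(policy_resource("arn:aws:es:us-east-1:123456789012:domain/my-domain"))
```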

Amazon OpenSearch Serverless

OpenSearch Serverless uses a fundamentally different access model compared to OpenSearch Service. It does not use resource-based access policies. Instead, access is controlled through data access policies and network access policies tied to VPC endpoints.

| Order | Information Required | Description | Example | Owner |
| --- | --- | --- | --- | --- |
| 1 | Collection Name | Name of the OpenSearch Serverless collection | my-vector-store | Customer |
| 1 | Collection ARN | Full ARN of the collection | arn:aws:aoss:us-east-1:123456789012:collection/abc123 | Customer |
| 1 | Collection Endpoint | HTTPS endpoint of the collection | abc123.us-east-1.aoss.amazonaws.com | Customer |
| 1 | Collection Region | AWS region where collection is deployed | us-east-1 | Customer |
| 2 | Unstructured AWS Account ID | Account ID to add to network access policy | 987654321098 | Unstructured |
| 2 | Unstructured IAM Role ARN | IAM Role ARN to grant data access | arn:aws:iam::987654321098:role/unstructured-aoss | Unstructured |
| 3 | VPC Endpoint ID | VPC endpoint ID created by Unstructured for aoss.amazonaws.com | vpce-0abc123def456789 | Unstructured |

Step 1: Create a Network Access Policy
The network access policy must allow Unstructured’s VPC endpoint to access the collection. Create or update the network policy for your collection:
[
  {
    "Rules": [
      {
        "Resource": ["collection/my-vector-store"],
        "ResourceType": "collection"
      }
    ],
    "AllowFromPublic": false,
    "SourceVPCEs": ["<UNSTRUCTURED_VPC_ENDPOINT_ID>"]
  }
]
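Using AWS CLI (update-security-policy modifies an existing policy and also requires its current version, which get-security-policy returns as policyVersion; for a new policy, use create-security-policy with the same --name, --type, and --policy arguments):
aws opensearchserverless update-security-policy \
  --name "my-network-policy" \
  --type network \
  --policy-version "<POLICY_VERSION>" \
  --policy '[{"Rules":[{"Resource":["collection/my-vector-store"],"ResourceType":"collection"}],"AllowFromPublic":false,"SourceVPCEs":["<UNSTRUCTURED_VPC_ENDPOINT_ID>"]}]'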
Step 2: Create a Data Access Policy
The data access policy grants Unstructured’s IAM Role permissions to read/write the collection’s indexes. For a vector store destination (read/write):
[
  {
    "Rules": [
      {
        "Resource": ["index/my-vector-store/*"],
        "Permission": [
          "aoss:CreateIndex",
          "aoss:DeleteIndex",
          "aoss:UpdateIndex",
          "aoss:DescribeIndex",
          "aoss:ReadDocument",
          "aoss:WriteDocument"
        ],
        "ResourceType": "index"
      },
      {
        "Resource": ["collection/my-vector-store"],
        "Permission": ["aoss:DescribeCollectionItems"],
        "ResourceType": "collection"
      }
    ],
    "Principal": ["<UNSTRUCTURED_IAM_ROLE_ARN>"]
  }
]
Using AWS CLI:
aws opensearchserverless create-access-policy \
  --name "unstructured-access" \
  --type data \
  --policy '[{"Rules":[{"Resource":["index/my-vector-store/*"],"Permission":["aoss:CreateIndex","aoss:DeleteIndex","aoss:UpdateIndex","aoss:DescribeIndex","aoss:ReadDocument","aoss:WriteDocument"],"ResourceType":"index"},{"Resource":["collection/my-vector-store"],"Permission":["aoss:DescribeCollectionItems"],"ResourceType":"collection"}],"Principal":["<UNSTRUCTURED_IAM_ROLE_ARN>"]}]'
Replace:
  • <UNSTRUCTURED_VPC_ENDPOINT_ID>: VPC Endpoint ID provided by Unstructured (order 3 of the information exchange).
  • <UNSTRUCTURED_IAM_ROLE_ARN>: Unstructured’s IAM Role ARN (provided during setup).
  • my-vector-store: Your OpenSearch Serverless collection name.
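The single-line `--policy` arguments in the CLI commands above are compact serializations of the multi-line JSON documents shown before them. The sketch below builds both policies from the collection name, VPC endpoint ID, and role ARN, then serializes them for the command line. All three function names are hypothetical; the policy contents are copied from the examples above.

```python
import json


def network_policy(collection: str, vpce_id: str) -> list:
    """Network access policy allowing one VPC endpoint to reach the collection."""
    return [
        {
            "Rules": [
                {"Resource": [f"collection/{collection}"], "ResourceType": "collection"}
            ],
            "AllowFromPublic": False,
            "SourceVPCEs": [vpce_id],
        }
    ]


def data_access_policy(collection: str, role_arn: str) -> list:
    """Read/write data access policy for every index in the collection."""
    return [
        {
            "Rules": [
                {
                    "Resource": [f"index/{collection}/*"],
                    "Permission": [
                        "aoss:CreateIndex",
                        "aoss:DeleteIndex",
                        "aoss:UpdateIndex",
                        "aoss:DescribeIndex",
                        "aoss:ReadDocument",
                        "aoss:WriteDocument",
                    ],
                    "ResourceType": "index",
                },
                {
                    "Resource": [f"collection/{collection}"],
                    "Permission": ["aoss:DescribeCollectionItems"],
                    "ResourceType": "collection",
                },
            ],
            "Principal": [role_arn],
        }
    ]


def to_cli_argument(policy: list) -> str:
    """Serialize a policy without whitespace, suitable for --policy '...'."""
    return json.dumps(policy, separators=(",", ":"))


if __name__ == "__main__":
    print(to_cli_argument(network_policy("my-vector-store", "vpce-0abc123def456789")))
```

Generating the string from one source of truth avoids the JSON document and the CLI argument drifting apart when the collection name changes.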

Databricks on AWS

Databricks on AWS supports native PrivateLink connectivity. You must have a Databricks Enterprise plan with a customer-managed VPC and PrivateLink enabled on your workspace.

| Order | Information Required | Description | Example | Owner |
| --- | --- | --- | --- | --- |
| 1 | Databricks Workspace URL | The workspace URL | myworkspace.cloud.databricks.com | Customer |
| 1 | Databricks Workspace Region | AWS region where workspace is deployed | us-east-1 | Customer |
| 1 | Private Access Level | Whether access is at ACCOUNT or ENDPOINT level | ACCOUNT, ENDPOINT | Customer |
| 2 | Unstructured VPC Endpoint ID | VPC Endpoint ID to add to allowed list (if ENDPOINT level) | vpce-0abc123def456789 | Unstructured |
| 3 | Workspace VPC Endpoint DNS | The private endpoint DNS for the workspace | myworkspace.privatelink.cloud.databricks.com | Customer |

Example: Databricks Private Access Settings (ENDPOINT level)
If using ENDPOINT level access, add Unstructured’s VPC Endpoint ID to the allowed list via the Databricks Account Console or API:
{
  "private_access_settings_name": "unstructured-access",
  "region": "<REGION>",
  "public_access_enabled": false,
  "private_access_level": "ENDPOINT",
  "allowed_vpc_endpoint_ids": [
    "<UNSTRUCTURED_VPCE_ID>"
  ]
}
Replace:
  • <UNSTRUCTURED_VPCE_ID> — VPC Endpoint ID provided by Unstructured.
  • <REGION> — Your Databricks region.
For ACCOUNT level access, no explicit endpoint allowlisting is required — any VPC endpoint registered in the Databricks account can connect.
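The difference between the two access levels can be sketched as a small payload builder. The field names are taken from the example settings above; the function name is hypothetical, and omitting `allowed_vpc_endpoint_ids` at ACCOUNT level is an assumption based on the statement that no explicit allowlisting is required there.

```python
from typing import Sequence


def private_access_settings(
    name: str, region: str, level: str, vpce_ids: Sequence[str] = ()
) -> dict:
    """Build a private access settings payload for ACCOUNT or ENDPOINT level.

    Hypothetical helper, not a Databricks SDK call.
    """
    if level not in ("ACCOUNT", "ENDPOINT"):
        raise ValueError("level must be ACCOUNT or ENDPOINT")
    settings = {
        "private_access_settings_name": name,
        "region": region,
        "public_access_enabled": False,
        "private_access_level": level,
    }
    if level == "ENDPOINT":
        # ENDPOINT level only admits explicitly listed VPC endpoints.
        if not vpce_ids:
            raise ValueError("ENDPOINT level needs at least one VPC endpoint ID")
        settings["allowed_vpc_endpoint_ids"] = list(vpce_ids)
    return settings


if __name__ == "__main__":
    print(
        private_access_settings(
            "unstructured-access", "us-east-1", "ENDPOINT", ["vpce-0abc123def456789"]
        )
    )
```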

AWS Managed Services with native PrivateLink

Some AWS managed services support native PrivateLink endpoints. You must first create a VPC endpoint for the service; Unstructured then connects to it. This pattern applies to services such as Amazon ElastiCache (Redis) and Amazon Elasticsearch Service (legacy).
The Order column indicates the general sequence for the information exchange. Items with the same order value can usually be provided at the same stage.

| Order | Information Required | Description | Example | Owner |
| --- | --- | --- | --- | --- |
| 1 | Service Type | The AWS Service being accessed | ElastiCache, Elasticsearch | Customer |
| 1 | Service Region | Region where the service is hosted | us-east-1 | Customer |
| 1 | Service Port | Port the service listens on | 6379 (Redis), 443 (Elasticsearch) | Customer |
| 1 | Resource ARN | ARN of the resource | arn:aws:elasticache:us-east-1:123456789012:cluster/my-cache | Customer |
| 2 | Unstructured AWS Account ID | Account ID to add as allowed principal | 987654321098 | Unstructured |
| 2 | Unstructured IAM Role ARN | IAM Role that will access the service | arn:aws:iam::987654321098:role/unstructured-access | Unstructured |
| 3 | VPC Endpoint ID | The service-managed VPC endpoint ID | vpce-0abc123def456789 | Customer |
| 3 | VPC Endpoint DNS | The endpoint DNS name for connection | vpce-0abc123.us-east-1.es.amazonaws.com | Customer |

Customer Managed Services (AWS) or AWS Services with NLB

This information applies to:
  • Applications that your organization is self-hosting in your AWS VPC (e.g., Elasticsearch, MongoDB, Couchbase).
  • AWS services that do not have native PrivateLink support and require an NLB front-end, such as Amazon RDS, Aurora, Redshift, DocumentDB.
The Order column indicates the general sequence for the information exchange. Items with the same order value can usually be provided at the same stage.

| Order | Information Required | Description | Example | Owner |
| --- | --- | --- | --- | --- |
| 1 | Service Type | The service being accessed | PostgreSQL (RDS), MongoDB, Elasticsearch | Customer |
| 1 | Service Region | Region where the service is hosted | us-east-1 | Customer |
| 1 | Service Port | Port the service listens on | 5432 (PostgreSQL), 27017 (MongoDB), 9200 (Elasticsearch) | Customer |
| 2 | Unstructured AWS Account ID | Account ID to add as allowed principal | 987654321098 | Unstructured |
| 2 | Unstructured IAM Role ARN | IAM Role that will access the service | arn:aws:iam::987654321098:role/unstructured-access | Unstructured |
| 3 | VPC Endpoint Service Name | Service name for the endpoint service fronting the NLB | com.amazonaws.vpce.us-east-1.vpce-svc-0abc123 | Customer |
| 3 | Service Endpoint | The endpoint URL for connection | Custom DNS or endpoint service DNS | Customer |

You must create both of the following:
  • A Network Load Balancer targeting the service.
  • A VPC Endpoint Service pointing to the NLB.
Example: Allow Unstructured as a Principal on the Endpoint Service
Using AWS Console:
  1. Navigate to VPC > Endpoint Services.
  2. Select your endpoint service.
  3. Go to the “Allow principals” tab and click “Allow principals”.
  4. Add the Unstructured IAM Role ARN provided during setup.
Using AWS CLI:
aws ec2 modify-vpc-endpoint-service-permissions \
  --service-id vpce-svc-0abc123def456789 \
  --add-allowed-principals "<UNSTRUCTURED_IAM_ROLE_ARN>"
Replace:
  • vpce-svc-0abc123def456789: Your endpoint service ID.
  • <UNSTRUCTURED_IAM_ROLE_ARN>: Unstructured’s AWS Role ARN (provided during setup).
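Before sharing the Service Port from the table, it can help to confirm from a host inside your VPC that the NLB target actually accepts TCP connections on that port. The following is a minimal sketch using only the Python standard library. The function name is hypothetical, and a successful TCP connection says nothing about authentication or TLS configuration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example usage (host name is a placeholder for your NLB or service DNS name):
# can_connect("my-nlb.internal.example.com", 5432)
```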