EC2 - Scraping behind VPN (but allow VPC access)
I had a project that scraped tens of thousands of websites every day. I was running it on an EC2 instance, but didn't want the instance's IP to be blocked by any of these websites.
One solution for scraping with (near) infinite IPs that I've come across is AWS API Gateway, for example via requests-ip-rotator. But with hundreds of thousands of requests a day, I was afraid the costs would add up, so I used a VPN instead. The VPN I was using rotated through its own IPs pretty regularly (sorry, but I'm going to be fuzzy on the exact details of what I was using). It wasn't an insane number of IPs like API Gateway would have offered, but it was enough that I could rotate as needed.
Launching the VPN (OpenVPN) cost me access to everything in the VPC! I could no longer reach my RDS instance, because the VPN was routing all my traffic, including what should have stayed within the VPC.
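A quick way to confirm this kind of problem is to ask the kernel which interface it would use for a given address. The sketch below is only an illustration: the address 172.31.40.5 in the usage note is a made-up stand-in for the RDS instance's private IP, and the interface names (eth0, tun0) are assumptions that vary by setup.

```shell
#!/bin/sh
# Print the interface named in the "dev <name>" field of `ip route get` output.
# Reads that output on stdin.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Only query the live routing table when an address is passed as an argument.
if [ -n "${1:-}" ]; then
    ip route get "$1" | route_dev
fi
```

Saved as a hypothetical check-route.sh, running `sh check-route.sh 172.31.40.5` would print the tunnel interface (often tun0) if the VPN has captured that traffic, and the instance's normal interface otherwise.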
I still needed to access my RDS instance inside of the VPC, so I added a host route that sends that traffic through the VPC subnet's gateway (172.31.16.1 in my case) instead of through the VPN.
route add -host my-rds-instance.us-east-2.rds.amazonaws.com gw 172.31.16.1
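One caveat: the routing table stores IP addresses, so the hostname is resolved once, when the command runs. If the RDS endpoint's IP later changes (after a failover, say), the route would go stale. Here is a minimal sketch of a re-runnable variant using `ip route replace` instead of `route add`. It is not what I originally ran; the function names, the `--print`/`--apply` flags and the environment variable overrides are all hypothetical.

```shell
#!/bin/sh
# Sketch: route one host via the VPC gateway instead of the VPN tunnel.
# Hostname and gateway are the ones from the post; override via environment.
RDS_HOST="${RDS_HOST:-my-rds-instance.us-east-2.rds.amazonaws.com}"
VPC_GW="${VPC_GW:-172.31.16.1}"

# Resolve a hostname to its unique IPv4 addresses (getent ships with glibc).
resolve_ipv4() {
    getent ahostsv4 "$1" | awk '{ print $1 }' | sort -u
}

# Read addresses on stdin, print one idempotent host-route command per address.
build_route_cmds() {
    gw="$1"
    while read -r addr || [ -n "$addr" ]; do
        if [ -n "$addr" ]; then
            printf 'ip route replace %s/32 via %s\n' "$addr" "$gw"
        fi
    done
}

case "${1:-}" in
    --print) resolve_ipv4 "$RDS_HOST" | build_route_cmds "$VPC_GW" ;;
    --apply) resolve_ipv4 "$RDS_HOST" | build_route_cmds "$VPC_GW" | sh ;;  # needs root
esac
```

Running it with `--print` shows the commands without touching the routing table. Running it with `--apply` as root executes them, and because `ip route replace` overwrites an existing entry, it is safe to run again whenever the VPN restarts.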
I had a service that launched the VPN, so I configured it to run this command immediately after the VPN started.
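If there is no wrapper service to hook into, OpenVPN's own client config may be able to do the same job. The fragment below is a sketch under that assumption, not the setup described above. It uses the `route` directive with the special `net_gateway` keyword, which stands for the default gateway that was in place before the VPN came up (on EC2, that should be the subnet's router). As with the `route` command, the hostname is resolved once, when the route is added.

```
# Hypothetical addition to the OpenVPN client config file.
# Send traffic for the RDS host via the pre-VPN default gateway, not the tunnel.
route my-rds-instance.us-east-2.rds.amazonaws.com 255.255.255.255 net_gateway
```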