I’d never heard of Airflow before. Something with workflows… How does it compare to Node Red?
It’s mostly used as an ETL system, in my experience, and is more akin to systems like Luigi.
That’s right, Airflow solves the same problem that Luigi solves.
Thanks, looked a little enterprisey :P Now I’m off to look at what Luigi is ;)
My team uses EFS to solve this same problem.
We used to use S3, but the eventual consistency got annoying, as did the fact that people would just upload to it from their local machines, etc.
Our staging and production S3 buckets are not accessible from local machines, so all changes have to go through git commit and CI/CD.
What replication delay did you experience with S3? So far I’ve only seen a worst-case delay of a couple of seconds, which hasn’t caused any issues for us.
Because of how our large organization worked, we had to refresh our AWS creds every hour, which meant running a sidecar just for that refresh. Now that we just mount EFS, no cred management is needed. (This was the true root of our problem.)
Interesting, wouldn’t using an IAM role automatically handle the cred rotation for you?
Our role sessions only last for an hour
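For anyone curious what that hourly-refresh sidecar logic looks like, here’s a minimal sketch. The `needs_refresh` helper, the buffer size, and the session name are all hypothetical, not from the parent commenter’s setup; the actual refresh would go through STS (e.g. boto3’s `assume_role`, shown only as a comment since it needs live AWS creds):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical sidecar logic: decide whether a cached STS session
# needs refreshing. Role sessions here last an hour, so we refresh
# a few minutes before expiry rather than racing the cutoff.
REFRESH_BUFFER = timedelta(minutes=5)

def needs_refresh(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the session token expires within the buffer window."""
    now = now or datetime.now(timezone.utc)
    return expires_at - now <= REFRESH_BUFFER

# The refresh itself would call STS, e.g. with boto3:
#   sts = boto3.client("sts")
#   creds = sts.assume_role(
#       RoleArn=ROLE_ARN,                 # hypothetical role ARN
#       RoleSessionName="dag-sync",       # hypothetical session name
#   )["Credentials"]
```

With an EFS mount, all of this loop-and-refresh machinery goes away, which is the appeal.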
This is a really interesting way to set up your Airflow DAGs. We broke ours out into a couple of Airflow instances because one had grown too large. They also had a logical separation based on what they were processing, though.
Are you pulling directly into the current DAGs or are you pulling into a separate dir that you cut over to? IIRC, you have to signal Airflow to reload the DAGs.
We pull directly into the current DAG directory. Airflow now automatically detects DAG changes and reloads them on the fly, so we didn’t have to do anything special.
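For context on why no signal is needed: Airflow’s scheduler periodically re-scans the DAGs folder and re-parses changed files, so a plain git pull into the live directory gets picked up on the next scan. The relevant `airflow.cfg` settings (Airflow 2.x option names; the values below are just the documented defaults, not a recommendation):

```ini
[scheduler]
# How often (seconds) to scan the DAGs folder for new files
dag_dir_list_interval = 300
# Minimum interval (seconds) between re-parses of the same DAG file
min_file_process_interval = 30
```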