Loading multiple files into a database with Python and SQL
It’s been said that doing the same job repeatedly becomes boring, with no excitement left in the work. Loading files into a database, done by hand over and over, doesn’t inspire much passion either.
Core BI and data engineering teams need to load files from operational databases into data warehouses. This is typically done by developing pipelines that process and transform the data.
In coordinating with other teams, they must handle multiple file types, data spec versions, and data providers, each of which can require a different pipeline just to load data into the raw database.
That can mean maintaining 10, 20, or even 50 smaller pipelines, all subject to changes from the data providers. Each new data provider requires an entirely new pipeline to be developed, plus infrastructure around each pipeline to log and track its status. This quickly becomes very difficult to manage and tedious to build.
Step 2
Instead of managing many separate pipelines, we can develop a central system that manages the various insert statements required to load data into each different raw table.
This system is essentially a lookup that matches flat files to their insert queries. It removes the need to maintain a separate pipeline, each with its own insert statement, for every file type.
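A minimal sketch of such a lookup in Python, using the standard `csv` and `sqlite3` modules. The table names, column names, and the `INSERT_LOOKUP` mapping are illustrative assumptions, not part of any real schema:

```python
import csv
import sqlite3

# Hypothetical lookup: maps a file type to the insert statement
# for its raw table. In practice this could live in a config
# table rather than in code.
INSERT_LOOKUP = {
    "orders": "INSERT INTO raw_orders (order_id, amount) VALUES (?, ?)",
    "customers": "INSERT INTO raw_customers (customer_id, name) VALUES (?, ?)",
}

def load_flat_file(conn, file_path, file_type):
    """Look up the insert statement for this file type and load every row."""
    insert_sql = INSERT_LOOKUP[file_type]
    with open(file_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        conn.executemany(insert_sql, list(reader))
    conn.commit()
```

With this, adding a new file type means adding one entry to the lookup rather than building a new pipeline.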
Hence only one main system is needed to load all the various tables. With this,
1. you can put a version number on each of the files that come in, and
2. you can track each version in a metadata table, which allows you to tie a raw file to a specific insert-statement version.
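The version-tracking idea above can be sketched as follows, again with `sqlite3`. The metadata table, its columns, and the versioned insert statements are hypothetical assumptions chosen for illustration:

```python
import sqlite3

# Hypothetical metadata table tying each raw file to the
# insert-statement version used to load it.
METADATA_DDL = """
CREATE TABLE IF NOT EXISTS load_metadata (
    file_name      TEXT,
    spec_version   TEXT,
    insert_version TEXT,
    loaded_at      TEXT DEFAULT CURRENT_TIMESTAMP
)
"""

# Versioned insert statements: (file_type, spec_version) -> SQL.
# A new data spec version becomes a new entry, not a new pipeline.
VERSIONED_INSERTS = {
    ("orders", "v1"): "INSERT INTO raw_orders (order_id, amount) VALUES (?, ?)",
    ("orders", "v2"): "INSERT INTO raw_orders (order_id, amount, currency) VALUES (?, ?, ?)",
}

def record_load(conn, file_name, file_type, spec_version):
    """Pick the insert statement for this spec version and log the load."""
    insert_sql = VERSIONED_INSERTS[(file_type, spec_version)]
    conn.execute(
        "INSERT INTO load_metadata (file_name, spec_version, insert_version)"
        " VALUES (?, ?, ?)",
        (file_name, spec_version, f"{file_type}-{spec_version}"),
    )
    conn.commit()
    return insert_sql
```

Querying `load_metadata` then tells you exactly which insert version loaded any given raw file, which makes debugging provider-side spec changes much easier.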
We can observe that:
1. Loading multiple file types with individual pipelines is exhausting work.
2. Whether it is done through SSIS or another system, the constant development quickly weighs heavily on data engineers.
3. Engineers end up stuck maintaining and developing operational pipelines instead of delivering new value.