![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
A few months back, while sparring with the SAS zealots at the comp.soft-sys.sas , one of the SAS users said that he uses SAS ( the SAS datastep in particular ) for ETL all the time, instead of using SQL directly, or a GUI-based ETL tool. Does this make sense? |
#3
| |||
| |||
|
|
On Fri, 27 Apr 2007 16:43:47 -0700, irishhacker wrote: A few months back, while sparring with the SAS zealots at the comp.soft-sys.sas , one of the SAS users said that he uses SAS ( the SAS datastep in particular ) for ETL all the time, instead of using SQL directly, or a GUI-based ETL tool. Does this make sense? At a SAS course I attended, the instructor pointed out that nowadays, most use if SAS is _not_ really statistics-work, as one might expect. Rather, SAS is used for ETL-stuff, and reporting. Personally, I find that SAS may be a good ETL tool, albeit with some limitations: - SAS is expensive, and with no free test-only download option. So with SAS, you need to somehow know SAS already (perhaps through school) before you can know how it is to work with. - SAS is the ugliest and most inconsistent programming language I've ever seen. This may easily lead to "write-only" code, i.e. code which is very hard to understand for others than the one who wrote the code. About GUI/non-GUI: I found that GUI-based tools are often a pain with regards to traceability: They often produce "code" which is full of GUI-related nonsense which means that it's very hard to deduce what changed between different versions of the application. E.g., I've been recently been involved in an ETL-like task where Microsoft's SQL Server Integration Services (SSIS) tool was chosen. I'm never going to touch SSIS again for a number of reasons -- one being that SSIS's XML contained blobs of strange binary-like code; and even the smalles change in the SSIS package (like a changed file path) resulted in numerous strange and very hard-to- read code differences. In other words, some of the -- seemingly very nice -- GUI-based ETL tools may turn out to be a version control nightmare because _important_ changes are hidden in heaps of gibberish "code"- changes. In my opinion is that you may get very solid solutions, simply using SQL. Staging/temporary tables, views with INSTEAD-triggers, functions, and procedures will get you very far. The last pieces of glue needed (e.g. some functionality which downloads import-data via FTP) may easily be performed by scripts, such as perl/python/shell, or SAS (if you have relevant licensing); the script language should be chosen based on what's popular in your organization. |
![]() |
| Thread Tools | |
| Display Modes | |
| |