ETL – Data Wharehouse

Above is the common elements that make up a Data Warehouse. The data warehouse consists of several different elements, the source can come from legacy systems that are usually an operational system used by the corporation or external data sources, the data staging area where it is processed and moved to the presentation server where the data is organized and stored for future queries and reports. The last step is the end user data access point, currently one of the most popular forms to access data is through a web page.

The first step is extraction and involves obtaining the data from the source systems.

Next it is transforming the data using a series of steps to make the data usable for the data warehouse. The steps consists of cleaning the data to improve data quality and consistency, purging of any unnecessary data that is not required by the data warehouse, combining of similar data from different source systems and creating surrogate keys.

After the data is extracted, transformed and cleansed, it is ready to be loaded and indexed into the warehouse for fast querying.

Other steps include running a series of reports to ensure that the data stays consistent and that the quality stays intact. The data is released to users for generating reports and dashboards and is secured against unauthorized user access.

Main steps in the ETL cycle are:

  • Initiating the Cycle
  • Building of reference data
  • Extraction
  • Validation
  • Transforming
  • Staging (if staging tables are necessary)
  • Audit reports to verify that all business rules are kept
  • Publishing to warehouse tables
  • Archiving
  • Clean up.

rsnapshot: Auto Version Controlled Backup for Unix/Linux/Mac/BSD…

Red Had Enterprise Linux (RHEL) 6. I missed the ease of configuration and all the free tools that people smarter than me have created.

[fusion_builder_container hundred_percent=”yes” overflow=”visible”][fusion_builder_row][fusion_builder_column type=”1_1″ background_position=”left top” background_color=”” border_size=”” border_color=”” border_style=”solid” spacing=”yes” background_image=”” background_repeat=”no-repeat” padding=”” margin_top=”0px” margin_bottom=”0px” class=”” id=”” animation_type=”” animation_speed=”0.3″ animation_direction=”left” hide_on_mobile=”no” center_content=”no” min_height=”none”]

Systems that can take advantage of Rsnapshot

Systems that can take advantage of Rsnapshot

I would like to do a fast post on rsnapshot. I have seen ssh and rsnapshot scheduled in cron to automate backups of OSX to a Linux server. Since we didn’t want the wireless to slow down we only used the physical MAC address of the MAC. What makes rsnapshot so great is that it will wok on so many systems that are out there (Ubuntu, Debian GNU/Linux, Red Hat Linux, Fedora Linux, SuSE Linux, Gentoo Linux, Slackware Linux, FreeBSD, OpenBSD, NetBSD, Solaris, Mac OS X, and even IRIX) .

For now I’m using it for personal automated backups to my external hard drive. There are plenty of other advanced options and examples on the Internet. I just wanted to get out a fast an easy example.

  1. First – find and install rsnapshot. for Red Hat this was
    $ sudo yum install rsnapshot
    (rsynch is a dependancy that should already be installed).
  2. After install if you do not have this file /etc/rsnapshot.conf. Use the command:
    $ sudo cp /etc/rsnapshot.conf.default /etc/rsnapshot.conf
  3. Edit rsnapshot.conf – The defaults I changed from the default configuration file are below. These options allow me to back up everything in /etc/ and /home/. Backups kept will be twice a day, 7 days a week, 4 weeks, 12 months and 5 years (change this as you see fit).  Most important is that switch to make sure that the mount point will not be created and wrote to locally if the disk is not attached.
    1. WHERE TO PLACE BACKUPS
      # All snapshots will be stored under this root directory.
      #
      snapshot_root   /media/myexternal/rsnapshot/
    2. DO NOT CREATE IF DISK IS NOT CREATED
      # If no_create_root is enabled, rsnapshot will not automatically create the
      # snapshot_root directory. This is particularly useful if you are backing
      # up to removable media, such as a FireWire or USB drive.
      #
      no_create_root 1
    3. INTERVALS (make sure this is tabbed – do NOT use spaces)
      #########################################
      #           BACKUP INTERVALS            #
      # Must be unique and in ascending order #
      # i.e. hourly, daily, weekly, etc.      #
      #########################################
      interval        hourly  12
      interval        daily   7
      interval        weekly  4
      interval        monthly 12
      interval        yearly  5
       
  4. Time to configure cron. Most people will tell you to create your jobs using $ crontab e
    I prefer to use the root crontab using $ sudo vim /etc/crontab shown below:

    0 */12 * * * root /usr/bin/rsnapshot hourly # Every 12 hours
    30 23 * * * toot /usr/bin/rsnapshot daily   # Daily at 11:30PM
    20 2 * * 0 root /usr/bin/rsnapshot weekly   # Sunday at 2:20AM
    10 5 1 * * root /usr/bin/rsnapshot monthly  # First day of the month at 5:10AM
    01 8 1 1 * root /usr/bin/rsnapshot yearly   # January 1st at 8:01AM
     
  5. Test It – Following these steps you should have the basic setup needed to run rsnapshot on your personal computer to an external hard drive or usb. Just one last thing to do. Make sure that your hard drive is plugged in and  run:
    $ sudo rsnapshot -V hourly
    rsnapshot should give you plenty of verbose information as it creates your first hourly backup inside the location you specified. If there is a issue with the lock file, remove the lock file and try again.

Still stuck?

There are many other helpful documents out there  start with the rsnapshot how to:
http://www.rsnapshot.org/howto/1.2/rsnapshot-HOWTO.en.html#installation

If you want to learn how to do remote backup and use OSX? try this article:
http://blog.philippmetzler.com/?p=138

As I said in the beginning of this article, this was a fastpost and not meant to cove everything about rsnapshot. It took longer to write this article than it did to set up rsnapshot.
Good Luck – Adam M. Erickson

 

 [/fusion_builder_column][/fusion_builder_row][/fusion_builder_container]

Linux: Compiz Grid Plugin

After working on a few projects in windows, I finally moved my primary workstation back to Linux (RHEL 6.6).

Out of the box everything worked fine, only one thing was missing, the ability to place a window were I needed it. In Windows 7  you can drag a full screen window to the left or right side of the screen and Win-7 will auto-re-size to that side but only half of the screen. Moving the window to the top of the screen makes the window full screen again.

Sadly, this option is not an available default in Linux. You can drag a window to a different desktop, you can double click the header of a window to have it maximize to full screen or back to default size. I wanted some sort of ability that Win-7 had.

SOLUTION – Compiz

Compiz Grid gives you 9 options to place your window using your numeric key pad and unlike Win-7, this will work on multiple monitor setups for each monitor. That means you can have 18 small windows quickly placed in dual monitor mode! Windows only allows the far left side of the left monitor and the far right side of the right monitor.

You cannot move the window with your mouse because that would kill the ability to drag a window to another workspace (don’t worry, Linux has plenty or work-a-rounds but I’m happy with this solution). If you already have compiz installed to do the “cool” cube thing that Windows users ask to see. Then you are already ahead of the game and depending on the version of Gnome you have, this should work flawlessly. I am only going to go into details for how to do this for RHEL 6.6 with Gnome 2.28.2 but with a few pointers from your favorite LUG (Linux User Group), you should be able to do this from the command line interface (CLI) with no problem using zypper, apt-get, yum, yast or even using a GUI software installer.

There is 2 additional packages for  that need to be installed in order for this to work.
You may have to enable 3rd part packages in order to get these.

$sudo yum install :
  1. compiz-fusion.x86_64 0.8.2-8.el6
  2. compiz-fusion-extras.x86_64 0.8.2-6.el6

These are the 64 bit versions “x86” with out the “_64” is for 32 bit systems.

Once packages are installed:Grid for CCSM

  1.  open the Compiz Configuration Settings Manager (CCSM)
  2. Scroll to the bottom
  3. Enable Grid as seen in this picture

Now give any window focus and hit these combination of keys.

The numbers must be from the numeric keypad!

  •  Ctrl+Alt+7 (window goes to top left corner)
  •  Ctrl+Alt+8 (window goes to fills top or 1/3rd center)
  •  Ctrl+Alt+9 (window goes to files top right corner)
  •  Ctrl+Alt+4 (window goes to fills left side)
  • Ctrl+Alt+5 (maximize window)
  • Ctrl+Alt+6 (window goes fills right)
  •  Ctrl+Alt+1 (window goes to bottom left corner)
  • Ctrl+Alt+2 (window goes to fills bottom or 1/3rd center)
  •  Ctrl+Alt+3 (window goes to bottom right corner)

Press the same keys more than once in a row on the same window to get different sizing options. Basically the same directions in a game that you might be playing using the numeric keypad and it does not matter if number locks is on or not.

Thank you and good luck – Adam M. Erickson

WordPress: Upload Blocked File Extensions Without a Plugin

This is not for a novice, you should have a little experience in PHP and know what the .htaccess file does. If you are new and want to edit these files make sure to create a backup file in case you bring down your WordPress site.

It took me an hour to figure this out, I do not like to overload WordPress with plugins for things I can do myself.

To many plugins can slow your system, open security holes and add external links to places you do not want your visitors to visit. Hopefully you found my post before wasting an hour of your time to come to the same solution.

I recently wanted to share additional files beside the basic extensions that WordPress allows you to use, i.e.  doc, jpg, mpg… I tried editing my .htacess file a few time with no results. Then I found a post that referenced the “wp-includes/funtions.php”. From the functions.php I made 2 changes in the same file.

  1. ,  'vsd' ),WordPress Funtions.php file

 

  1.            'vsd' => 'application/msvisio',functions.php

That is all I had to do to upload visio files with the “vsd” extension.

If it does not work for you, try adding access through the root .htaccess file as well.

# additional mimetypes
    AddType application/visio .vsd
    AddType application/x-visio .vsd
    AddType application/vnd.visio .vsd
    AddType application/visio.drawing .vsd
    AddType application/vsd .vsd
    AddType application/x-vsd .vsd
    AddType image/x-vsd .vsd

Good luck,

Adam M. Erickson

Elements Used in a Data Warehouse

Many Business Intelligence solutions are based on the use of a data warehouse. Here is a view of the components of a data warehouse both logically and physically.  Data Warehouse Elements

The data warehouse consists of several different elements:

  • The source can come from legacy systems that are usually an operational system used by the corporation or external data sources,
  • The data staging area is where data is processed (normalized and some history is stored) and moved to the presentation server.
  • The presentation server takes the data, organizes it and stores is for future quires and reports
  • The last step is the end user data access point, currently one of the most popular forms to access data is through a web page and mobile applications.

You can download the visio drawing I created here –>data-warehouse-elements.vsd

Linux: Grep Script to Search Multiple Directories

Audience that may find this useful will mostly be new Linux administrators.ascii art for Grep

This is a Command Line Interface (CLI) Shell script that once set up using the directories you want to search.  All you will have to do is enter the word or phrase that you are trying to search and it will search across the directories (if you use *) and files for the key word/phrase.

In the past, I have used it to search dozens of logs in production at the same time for errors.

#!/bin/bash### First add the directories you want to search
locationAlpha=”/LOGS/Websphere-01/General.log”
locationBravo=”/LOGS/Websphere-02/General.log”
locationCharlie=”/LOGS/Websphere-03/General.log”
locationDelta=”/LOGS/Websphere-04/General.log”
locationEcho=”/LOGS/Websphere-05/General.log”
locationFoxtrot=”/LOGS/Websphere-06General.log”
locationGulf=”/LOGS/Websphere-07/General.log”### Next line will prompt for the word or phrase you want to search
read -p “Enter the word you want to find: ” word
 
## returns word you entered
echo “Searching for ‘$word'”
 
##Starts searching and returning hits
grep -iH “$word” $locationAlpha $locationBravo $locationCharlie $locationDelta $locationEcho $locationFoxtrot $locationGulf

–>                                               View file here without comments–> countProd.txt

Initially called countProd.sh because I was using it to count the number of certain errors. If you would like to only count the number of error. Change “grep -iH” to “grep -ic” (c for count, H to return filename for each match, and i to ignore case) . 

That’s it, short and simple.

Expect – For more remote work, automation, & CI

If you ever have a situation were you want to run scripts automatically this will help.
You can even time the events through cron or CI (continual integration) software like Hudson/Jenkins.
Let me introduce you to a new command I was not aware of until this year.

It is called Expect.expect

Expect can be installed like any other package in a Linux environment.

  • $ sudo yum install expect
  • $ sudo apt-get install expect

Here is an example of what created inside Jenkins to run after a build as an shell script.

Instead of #!/usr/bin/bash

Use:hudson

#!/usr/bin/expect
set prompt "$ "
spawn ssh -t user@remote.server
set timeout 30
expect {
timeout {
puts "Connection timed out"
exit 1
}
"yes/no" {
send "yes\r"
exp_continue
}
"assword:" {
send -- "job00ibm\r"
exp_continue
}
"$prompt" {
send "cd ~/test/\r"
}
}

Now, as long as you have Internet, the correct username, remote server, and password. You should of just created a script the will SSH into a remote server and accept the RSA fingerprint. Or you just created a script that tells you the connection timed out.
Don’t alter this part of the script much because you want to be able to re-use and if there is not a yes/no at the prompt it will continue to expect assword:.
You can get change the last command to anything you want. I added that part because you remotes into a server for no reason?

Add this to the script and it will send the command to find anything with a certain name in it to be removed from the current directory.
expect {
"$prompt" {
send "find . -name '*filename*' -type f -print0 |xargs -0 rm -f\r"
}
}

Add this to the script to find files of a certain name and copy to a new location. You should be able to use modified times in the find switch if that helps.
expect {
"$prompt" {
send "find . -name 'coke*' -type f | xargs -n1 -i cp {} /data02/home/ibmcorp/12180-us-mcrtest1/upload/\r"
}
}

Add this to the script to run a script in the current directory.
expect {
"$prompt" {
send ". name.sh\r"
}

Does your script prompt for anything? Expect can enter the reply for you.
}
expect {
"the expected promt" {
send -- "the reply\r"
}
}

Some things need to e escaped like these ampersands. It is not way to find out what all needs to be escaped but you can figure that through trial and error, intuition, or because of how smart you are.
expect {
":" {
send "me\@a-erickson.com\r"
}
}

That’s my real address if you want to reach me.

All done running your script? Don’t forget to exit your connection.
expect {
"$ " {
send "exit\r"
}
}remote

  Have fun work remotely 🙂

Force Jenkins to connect to remote sites.

Problem:

  1. You created an awesome script on Linux that builds your application that will automatically upload to another server via sftp or ssh.
  2. You want to go a step further and have Jenkins run the same script.
  3. Every time Jenkins runs the script it fails.
  4. You Curse Jenkins/Hudson and start pricing Atlassian tools instead.

One possible reasons could be because you have already accepted the RSA fingerprint.

The authenticity of host ‘216.64.211.4 (216.64.211.4)’ can’t be established.CLI Root
RSA key fingerprint is 2c:78:0d:08:b7:06:ac:1a:be:5f:9a:4e:37:06:8f:53.
Are you sure you want to continue connecting (yes/no)?

You cannot get through to a remote sight for upload using Jenkins because even with Jenkins using your credentials, Jenkins still needs to accept the RSA Fingerprint to proceed.

Answer:

You need to have sudo rights
There is a user switch in sudo that will allow you to ssh as jenkins.

$ sudo  -u jenkins username@server

Manually accept the RSA fingerprint key.
Now test the script in Jenkins again. It should work unless there are other rights and directory paths that you need to work out.

JenkinsRemember:

Jenkins is a user with his own user directory. My Jenkin’s user directory was /var/lib/jenkins/
If you want to have Jenkins run a script, make sure it is in Jenkins directory.

Don’t name any of your Jenkin Items with spaces.
The Items you create are made into a directory inside Jenkin’s ~/workspaces/item-name

I am not a Linux expert, I am not certified in Linux. I have been using Linux systems for over 10 years now and love coming up with solutions using Open Source Software.

Cyber Crime: Risk Assessment

Cyber Crime: A Clear and Present Danger

This is the excel file used for this post –>   Risk-Registry.xlsx

Information systems give great ways to communicate and learn, but also allow nefarious others access to exploit the power of the Internet for terrorist and/or criminal purposes. Criminal warfare has moved to the virtual world were more damage can be done in less time with a better potential of getting away with it. The term used for criminals that use the Internet as there method for committing crimes is know as cyber criminals. Moor’s Law describes how the number of transistors that can be placed on inexpensive integrated circuitry double every two years. Since a new generation of faster computers can be processed every two years, it also allows for criminals to afford faster, more complex computers to leverage against your organization

Analyzing and avoiding risk should be a part of any organization who, especially if transactions are made on the Net. Include\ing: email, web browsing, online stores, etc.

The rest of this article is an example of what may be used to start your own risk register.

Qualitative Risk Assessment

Threats that have been identified through the risk it holds to the organizations have been qualified in the attached risk register spreadsheet. Download the actual Excel File –>   Risk-Registry.xlsx

Risk RegisterWhile all risk poses a level of threat there are certain ones that can be identified as low medium or high risk based on the severity of risk against cost and time loss to our origination. Damage to reputation from faulty security is actually a side effect of not being able to protect our customer’s assets and personal information. Side by side comparison of the risk register with the risk matrix will give a better forecast of risk as it impacts our organization.

Quantitative Risk Matrix

Qualitative

On average credit scam counts for $260 dollars per customer per year and we can mitigate that price through the use of heuristic programs that can detect unusual purchases as they happen instead of reading reports after the fact.

While the risk of finances from cyber attacks can be measured after the fact the most damaging effects of cyber attacks is the lack of trust from our customers. Irreversible damage could be imposed upon are organization as customer retreat to companies that they believe can manage their finances better. The contingency budget set forth below would save us from going completely under in the future. The quantity of the budget needed is minimally estimated to be around 1% of what we risk loosing. We must be ready to budget at least 30% of our organizations earned income after taxes and deductions.

Contingency Budget

Contingency BudgetColors reflect the same qualitative selection as the risk Matrix and should be considered top priority when considering budgeting amounts. The contingency cost has been lowered based on the probability of the risks occurrence and is 65% of the total budget needed to address all risks.

 

Recommendations

When money is used it is gone, data is more valuable than money. Our data and systems are worth more than can be quantified in any report. Data can be reused over and over again. In order to protect our data we need to spend our budget targeting mitigation of the highest ranked risks. Targeting unknown risks through monitoring assets as they are accessed and having a fast corrective action time to save the organization from unknown intrusions. All internal and external metrical information that can help target cyber activities should be used to the up-most efficiently to be as effective as possible. Continually vigilance in our security reaction time, as we move towards future technologies to communicate and process information on the Internet, is imperative to our survival.

Additional Resources

Martin H. Bosworth. (2008). Losses From Cybercrime Nearly $240 Million in 2007. Consumer Affairs. Retrieved from: http://www.consumeraffairs.com/news04/2008/04/cybercrime.html

Tom Mochal. (2006, May). Creating a risk contingency budget using expected monetary value (EMV). TechRebulic. Retrieved July 24, 2011, from http://www.techrepublic.com/article/create-a-risk-contingency-budget-using-expected-monetary-value-emv/6069576

United States Department of Justice. Computer Crime & Intellectual Property Section. Cyberethics. Retrieved from: http://www.cybercrime.gov/cyberethics.htm

Qualitative risk assessment of RFID

Below is a short qualitative risk assessment of RFID (Radio Frequency Identification) I have pieced together.  In order to highlight strengths, weaknesses and benefits.

May it help you come to a conclusion on how would you deploy this technology while minimizing its risks?

Category Risk Impact Probability Affects
Customer QoS for customers declines because decrease in staff Medium Low Customers, Company rep.
Customer Customers may be dissatisfied with change Medium Medium Company rep, customers
Fraud Scale to RFID Fraud Low High inventory
Fraud Hackers making there own RFID tags High Low Revenue, company rep, emp trust
Inventory Management Mis-tagged or no RFID tag High Medium Inventory, revenue
Employees Adaptability Very High High Employee trust
Employees Lose or trust Medium Medium Employee loyalty
Employees strike Low Low Company rep, employee loyalty, revenue
Employees Productivity to meet new expectations Medium Low Inventory, company rep, customer loyalty
Pricing Lack or pricing from no supervision of ID creator Low Medium Revenue, company rep.
Pricing Inconsistencies Medium Medium revenue
Pricing Scanners not scanning all items Low Low Inventory, revenue
Inventory Management Mis-counts or stock and poor inventory control Medium Low Inventory control
Technical Issues RFID not functioning Very High Medium Data, company rep, employee trust, customer trust.
Technical Issues RFID limitations Medium Low data
Technical Issues Lack of knowledge for RFID or when system goes down High Medium Customer loyalty, company rep.
Technical Issues Designing Standards and processes Medium Medium Employee trust
Inventory Management RFID supplies not delivered on time Low High Inventory, customers, inventory
Inventory Management Lack of knowledge for RFID inventory processing Medium Low Employee, company rep.

 

Srength:

  • Become a leader through technology
  • Improve process flow
  • Customer satisfaction by adding value to customer services

Weaknesses:

  • Lack of technical support knowledge
  • RFID tagging on devices and produce
  • Major inventory fluctuations
  • Internal fraud
  • Damage to reputation of RFID is not effective

Benefits

  • Better real time data and additional data to make assumptions by
  • Possible opening to new market segments

Cost would be on schedule and worth the cost if all negative risks had a contingency plan. Costs can be fixed in the system and validated during pilot program to ensure cost consistency.

 

Reference:

Amber Russell, Curt Ireton, Damon Mulligan, Jan Bondoc, Tyler Rudolph. (November 2009). Risk Management Plan. RFID Implementation for Fresh Foods. Retrieved from http:// www.curtireton.com/Assets/Fresh_Foods_Risk.pdf