Monday, June 02, 2014

R Cookbook

The R Cookbook by Paul Teetor (O'Reilly Media) aims to introduce programmers to the technical aspects of the R language, and it succeeds. R is a programming language designed for statistical work, so it does not share much ground with typical general-purpose languages like Python or C. The R Cookbook gives programmers a tour of this wonderful language, from the very beginnings of printing output to the screen, through plotting, to the creation of statistical models and time series.

The author does a great job exposing the differences and similarities between R and other languages. For example, he explains that in R single brackets applied to a vector (myvector[1]) mean "give me a vector of one item from myvector", and that the correct way to get a single item is double brackets: myvector[[1]]. Comparisons to other languages like SQL are dispersed through the chapters.

Mr. Teetor provides a thorough explanation of other aspects of R that might not be common in other computer languages. For example, in Chapter 5, he explains in detail and with clear examples what the Recycling Rule is. I doubt I would have been able to grasp such concepts so easily without an explanation like his.

This book, however, is not a book about statistics. The author goes ankle deep into statistical theory, just deep enough to give a general understanding of the examples. That is fine, because he states at the beginning of the book that this is not a book on statistics. There are other books that deal with the theory, and some of them are referenced in the book as pointers to further reading. I recommend reading an introductory statistical analysis book such as Data Science for Business before reading the cookbook.

This book fits very nicely as a step between an introductory book on statistics and a book on regression modeling. I recommend it to programmers who have a basic notion of statistical modeling and want to learn how to implement those concepts in R. That is exactly where I was when I started reading the R Cookbook.


Thursday, May 29, 2014

List to ICollection AutoMapper and WebAPI problem

I was getting an exception from AutoMapper when it tried to map a List to an ICollection. The problem was that Web API was building the MyClass from the XML sent by the client and, instead of using a List for the ICollection property, it used a plain array. When AutoMapper tried to map the List to the array, it failed because even though arrays are ICollections, they have a fixed size and do not support some of the ICollection operations, such as Add.

To solve the problem, I switched the client to post JSON. After that, Web API chose to use a List instead of an array for the property.

Cryptic message when serializing Computed Observable to JSON

If you have an error in one of your computed observables in Knockout.js, the ko.toJSON function will set the value of that property to a cryptic script snippet when serializing it to JSON. For example:
var SortField = function (expression) {
        this.FieldDef = ko.observable(null);
        this.SortOrder = ko.observable("");

        this.ExpressionType = "SortField";

        this.Label = ko.computed(function () {
            // Bug: FieldDef().Label is itself a computed observable and is not being called here
            return this.FieldDef() == null? "" : this.FieldDef().Label + ' ' + this.SortOrder();
        }, this);

    }
var FieldDefinition = function (table_name, field_name, table_alias) {
        this.TableName = table_name;
        this.TableAlias = ko.observable(table_alias);
        this.FieldName = field_name;
        this.Label = ko.computed(function () {
            return this.TableAlias() + '⇨' + field_name;
        }, this);
    }
In this example, SortField’s Label function reads the Label property of FieldDef without calling it, so the computed observable itself (a function) is concatenated into the string instead of its value. If you serialize to JSON, you will get the following:
function dependentObservable() {\n        if (arguments.length > 0) {\n
 if (typeof writeFunction === \"function\") {\n
     // Writing a value\n
     writeFunction.apply(evaluatorFunctionTarget, arguments);\n
 } else {\n
     throw new Error(\"Cannot write a value to a ko.computed unless you specify a 'write' option. If you wish to read the current value, don't pass any parameters.\");\n
 }\n
 return this; // Permits chained assignments\n        } else {\n
 // Reading the value\n
 if (!_hasBeenEvaluated)\n
     evaluateImmediate();\n
 ko.dependencyDetection.registerDependency(dependentObservable);\n
 return _latestValue;\n        }\n    } ASC"}],"ExpressionType":"Hierarchy","Label":"function dependentObservable() {\n        if (arguments.length > 0) {\n
 if (typeof writeFunction === \"function\") {\n
     // Writing a value\n
     writeFunction.apply(evaluatorFunctionTarget, arguments);\n
 } else {\n
     throw new Error(\"Cannot write a value to a ko.computed unless you specify a 'write' option. If you wish to read the current value, don't pass any parameters.\");\n
 }\n
 return this; // Permits chained assignments\n        } else {\n
 // Reading the value\n
 if (!_hasBeenEvaluated)\n
     evaluateImmediate();\n
 ko.dependencyDetection.registerDependency(dependentObservable);\n
 return _latestValue;\n        }\n    }
That function source is what ends up as the value of the Label property of the SortField.

The fix is to call the Label function as this.FieldDef().Label() inside SortField's Label.
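
The underlying behavior can be reproduced without Knockout. The following is a minimal sketch, not Knockout's actual implementation: makeReadOnlyComputed is a hypothetical stand-in for ko.computed, and the label text is made up. It only shows that concatenating a function with a string uses the function's source text, while calling the function first uses its return value.

```javascript
// Hypothetical stand-in for a read-only Knockout computed:
// the returned function yields the current value when it is called.
function makeReadOnlyComputed(evaluator) {
    return function dependentObservable() {
        return evaluator();
    };
}

// Made-up label value, used only for illustration.
var label = makeReadOnlyComputed(function () {
    return "Customers.Name";
});

// Bug: the function itself is concatenated, so its source text is used.
var broken = label + " ASC";

// Fix: call the function first, then concatenate the value it returns.
var fixed = label() + " ASC";

console.log(broken.indexOf("function dependentObservable") === 0); // true
console.log(fixed); // Customers.Name ASC
```

This is why the serialized Label starts with "function dependentObservable()" and ends with the sort order.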




Monday, January 06, 2014

Compile Postgres in Windows with GSSAPI

Configuration Phase
  1. Install the 64-bit versions of OpenSSL and MIT Kerberos
  2. Get the professional version of Visual Studio 2010 or 2012.
  3. Open the 64-bit version of the Visual Studio Command Prompt (not the 32-bit version)
  4. cd to src\tools\msvc in the folder that contains Postgres’ source code
  5. Edit config_default.pl. After the line
“iconv   => undef,”
add these lines:

includes => 'C:\\Program Files\\MIT\\Kerberos\\include:C:\\OpenSSL-Win64\\include',
libraries => 'C:\\Program Files\\MIT\\Kerberos\\lib\\amd64:C:\\OpenSSL-Win64\\lib',

Then change the krb5 and openssl lines to:
krb5    => 'C:\\Program Files (x86)\\MIT\\Kerberos',    # --with-krb5=
openssl => 'C:\\OpenSSL-Win64',    # --with-ssl=

Build Phase
  1. In the Visual Studio command prompt, type build
  2. When it finishes without errors, type install C:\PathWhereYouWantToInstallPostgres
  3. Go to the folder where OpenSSL is installed
  4. Search the folder and its subfolders for libeay32.dll and ssleay32.dll and copy those files
  5. Go to the folder where Postgres was installed (C:\PathWhereYouWantToInstallPostgres)
  6. Go to the bin folder and paste libeay32.dll and ssleay32.dll
  7. Start the regular command prompt in that folder and call pg_ctl start -D PATH_TO_YOUR_DATA_FOLDER

Please refer to the PostgreSQL documentation on how to configure GSSAPI for an installed PostgreSQL service.


Tuesday, November 19, 2013

How to use Windows SSO with OpenXava

One of the nice things about the .NET web environment is how easy it is to implement Single Sign On in your web apps through Active Directory authentication. In the Java world there are multiple alternatives for using Windows’ Single Sign On with Java-based web apps. One of those alternatives is Waffle. Waffle allows your Java web app to authenticate against Active Directory groups (and users). The only caveat is that your web server needs to be running on Windows, which kind of makes sense.


In this article, you will learn the steps required to have your OpenXava web application use Waffle to authenticate your Windows users.


The first step is to download Waffle from its site and then copy the JAR files outlined in https://github.com/dblock/waffle/blob/master/Docs/tomcat/TomcatSingleSignOnValve.md to OpenXava’s Tomcat server.


In your OpenXava project, create servlets.xml in the WEB-INF folder, containing the following:


<!-- the role name (the domain group) must be entered EXACTLY as it appears in AD. It is case sensitive -->
<security-role>
<role-name>YOURDOMAIN\YourADGroup</role-name>
</security-role>

<security-constraint>
<web-resource-collection>
<web-resource-name>
Demo Application
</web-resource-name>
<url-pattern>/*</url-pattern>
<http-method>GET</http-method>
<http-method>POST</http-method>
</web-resource-collection>
<auth-constraint>
<role-name>YOURDOMAIN\YourADGroup</role-name>
</auth-constraint>
</security-constraint>


Add a new file called filters.xml to WEB-INF:


<filter>
<filter-name>SecurityFilter</filter-name>
<filter-class>waffle.servlet.NegotiateSecurityFilter</filter-class>
<init-param>
<param-name>allowGuestLogin</param-name>
<param-value>false</param-value>
</init-param>
<init-param>
<param-name>waffle.servlet.spi.NegotiateSecurityFilterProvider/protocols</param-name>
<param-value>
Negotiate
NTLM
</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>SecurityFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>


After creating the filters.xml file, create a context.xml file in the META-INF folder (not WEB-INF):

<?xml version="1.0" encoding="UTF-8"?>
<Context>
<Valve className="waffle.apache.NegotiateAuthenticator" />
<Realm className="waffle.apache.WindowsRealm" />
</Context>



That’s it! This limits access to members of YOURDOMAIN\YourADGroup, and users who visit the site will have their Windows credentials passed through by their browser. Please consult the OpenXava and Waffle documentation on how to get the current username in order to handle custom logic based on the current user’s Active Directory groups.



Wednesday, November 13, 2013

Data Science for Business by Foster Provost & Tom Fawcett (O’Reilly Media)

Data Science for Business is a book that does a phenomenal job of teaching the fundamental concepts of Data Science (a.k.a. Data Analysis and Data Mining). Foster Provost and Tom Fawcett explain the processes surrounding Data Science and the basics of its algorithms in plain English, with clear examples and beginner-level math.


The authors go over the various steps of the CRISP-DM process using situations found in the real world, such as customer churn and online advertising. The most common data analysis models, such as clustering, decision trees and support vector machines, are reviewed and explained in detail. Extensive explanation is given to the difference between supervised and unsupervised methods. Even if you use software tools that create those models, this book will help you understand how to use and test them correctly and how to avoid over-fitting.


Multiple examples are given in each chapter, and most of the math is visually aided with graphs. The authors explain every equation presented in the book step by step. A notable example is chapter 9, where they show how the different parts of the Bayes’ Rule equation come together. There are also special math-intensive sections that business managers might skip, but that software developers and future data scientists need to examine closely.

I would recommend this book to any DBA or developer looking for a useful introduction to Data Science. For a practical application of the concepts in the book, I recommend reading Data Analysis Using SQL and Excel by Gordon Linoff after Data Science for Business. As a SQL Server DBA, I will apply the concepts I learned from the book to SQL Server Analysis Services.

Saturday, November 02, 2013

How to make MS SQL integrated security work in Spoon

How to make Microsoft SQL Server's integrated security (SSPI) work in Spoon:


  1. Download the Microsoft SQL Server JDBC driver
  2. Copy enu\auth\x64\sqljdbc_auth.dll to {spoon installation folder}\libswt\win64
  3. Copy enu\auth\x64\sqljdbc_auth.dll to {spoon installation folder}\libswt\win32
  4. Copy C:\sqljdbc_4.0\enu\sqljdbc4.jar to {spoon installation folder}\libext\JDBC
  5. Open Spoon
  6. When creating the data source, make sure to check "Use integrated authentication"

Please note that, if you have a 64-bit processor, you copy the 64-bit version of sqljdbc_auth.dll to both \libswt\win64 and \libswt\win32.

Tuesday, July 02, 2013

Alert if file missing using Powershell

The following PowerShell script can be used to send an email alert when a file is missing from a folder, or when the file found is the same one seen in the previous check:

$path_mask = "yourfile_*.txt"
$previous_file_store = "lastfileread.txt"
$script_name = "File Check"



###### Functions ##########
Function EMailLog($subject, $message)
{
   $emailTo = "juanito@yourserver.com"
   $emailFrom = "alert@yourserver.com"
   $smtpserver="smtp.yourserver.com"   
   $smtp=new-object Net.Mail.SmtpClient($smtpServer)
   $smtp.Send($emailFrom, $emailTo, $subject, $message)
}



Try
{
   #get files that match the mask; @() forces an array even when only one file matches
   $curr_file = @(dir $path_mask | select name)

   if ($curr_file.count -gt 0)
   {
       #file found
       #check if the file is different from the previous file read
       $curr_file_name = $curr_file[0].Name

       #on the first run the store does not exist yet, so there is nothing to compare against
       $previous_file = ""
       if (Test-Path $previous_file_store)
       {
           $previous_file = [string](Get-Content $previous_file_store | select -first 1)
       }

       if ($previous_file.trim() -eq  $curr_file_name.trim())
       {
           $msg = "Found same file as previous check: $previous_file and $curr_file_name"
           $msg
           EMailLog "$script_name Error" $msg
       }
       else
       {
           #different file, record its name for comparing in the next run
           $curr_file_name | out-file -filepath $previous_file_store
       }

   }
   else
   {
       $msg = "$path_mask Not found"
       $msg
       EMailLog "$script_name Error" $msg
   }
}
Catch [system.exception]
{
   "Caught a system exception "
   #inside Catch, $_ is the error that was just thrown
   $_
   EMailLog "$script_name System error" ($_ | Out-String)
   
}