Reenergize Your Approach to Enterprise BI

Business Intelligence competency centers have not enjoyed anything close to the hype and acceptance of BI in general, meaning enterprises are missing out on some advanced data and business strategy opportunities, according to Gartner Research expert John Haggerty (Justin Kern, Information Management, 1 Aug 2012).

Business Intelligence should have not only a tactical impact, but a strategic one too. This is easier said than done because achieving a strategic impact in large organizations requires the leaders of the BI initiative to be dedicated and to drive for its success. To foster this leadership, the Business Intelligence Competency Center (BICC) was born, providing a more holistic approach to business intelligence. The BICC encompasses more than just technology; it extends to the organization’s overall information strategy.

One of the key elements the BICC must have to ensure success is a business process leader who, along with the data management leader and the business intelligence leader, forms the core of the BICC management group. The business process leader is responsible for defining process challenges within the BI implementation. BI is a process, not just a software product; processes can be changed, measured and documented. This makes them repeatable and adaptable to changing business requirements. The key to a successful process is people. Organizations that marry their human capital, culture, knowledge processes and infrastructure by creating a fluid and functional BICC are best prepared and poised to meet the continuously changing demands of their customers while maximizing corporate potential.

The BICC is formed around a number of axes:

  • Business needs
  • Organization and processes
  • Tools and applications
  • Data integration and management

The key to success in any BICC implementation is to treat it not as a project but as a process. Once in place, the BICC should stay and be expanded and refined throughout its life. As business requirements change, there is a constant need for more advanced or different types of intelligence. The “intelligent company” that is accustomed to fact-based decision making will make better use of information in more and more of its business processes.

In a 2003 report for Gartner, Kevin Strange and Bill Hostmann (BI Competency Center is Core to BI Success, AV-20-5294, dated 22 July 2003) postulated that BI success within any company depends on the formation, organization and staffing of the BICC. Yet many organizations have struggled with their BICC implementations, as identified by Haggerty in Kern’s report.

Whether this struggle is due to a lack of skills, understanding or commitment, the BICC still ranks as the No. 1 priority for CIOs. But the arrangement of processes and management of the output have not taken hold despite this stated interest. The problem most organizations face is that best practices and implementation of a BICC take time.

With that said, those organizations that put in the time will be rewarded. As Chris Carney from HP stated in an interview with Linda L. Briggs for TDWI (7 Feb 2012), the BICC offers so many benefits that it’s hard to state just a few. Nevertheless, the time required is a concern to business management, especially at the CFO level and above.

CFOs want to ensure that all business processes are being utilized in productive and cost-effective manners. In this fast-paced economy with margins cut to the bone, companies cannot always wait three months or more for the BICC to get up and running to provide BI on operational processes. This may result in business users looking for the ability to gain answers or Business Intelligence Right Now (BIRN), which leads to the proliferation within the business community of discovery tools and Excel spreadmarts.

CIOs, meanwhile, must maintain their commitment and support for the BICC and its obvious benefits around delivery of quality information and governance of the data supplies. Additionally, they should find a means to provide the business teams with access to the data in a way that maintains this security but allows flexibility to choose how business users integrate their data as well as visualize and discover trends and patterns. This is the recipe for a fully reenergized approach to enterprise BI.

Matching the Speed of Business Change

The inability to match the increasingly rapid speed of business change has been affecting the implementation of Business Intelligence for a number of years. Further complicating the problem is the fact that many enterprise information technology systems inhibit business flexibility, sometimes with dire consequences. A prime example: There is speculation that the Sept. 11, 2001 attacks might have been prevented if the FBI had had more flexible IT systems (National Commission on Terrorist Attacks Upon the United States; Final report, 2004).

The reality is that companies change, merge, split or reorganize. New products and services are introduced as old ones are retired or modified to meet evolving demand, competition or regulations. One thing is for certain: agile businesses are more likely to thrive than ones that do not adjust quickly. Companies can lack agility for many reasons; one of the most common is the inflexibility of the current BI system, resulting in the inability to gain insight within a timeframe that allows decisions to impact the company’s profitability.

The Digital Revolution is transforming the marketplace and creating an unquenched thirst for business analytics. Business customers are buying more carefully, while demanding greater budget and business value from their choices. Today, customers benefit from the ability to see—and say—more about the companies that serve them. They demand better services with more choices and value from the products they purchase. In addition, they expect the company they are using to act in a socially and environmentally responsible manner.

At the same time, the new world of Facebook and Twitter is providing unprecedented opportunities for companies to engage with customers. If a company is to realize the potential of this deluge of new information and communication, they must first intercept and interpret vast quantities of data to find the meaningful parts. The volume and variety of data, much of it unstructured, are increasing with ferocious velocity leading to the term “Big Data.” CMOs, therefore, have to do more than ever before. They must manage more data, understand and engage with more demanding customers, and ensure their employees consistently exemplify their companies’ values. To attain this, they need tools and technologies often understood better by their children than by them.

Within most companies today, CFOs are increasingly responsible for IT, as CIOs often report to the CFO, who, in turn, must deliver the proper information and analytics to line-of-business leaders to help them run their companies most effectively. CFOs also need to be able to put critical line-of-business data into proper perspective, so they can better understand whether organic business expansion or growing through acquisitions is the preferred growth strategy.

For that reason, the ability to take advantage of analytics to correlate performance and growth is essential. Recent research shows organizations that lead in analytics outperform those that are just beginning by up to three times. Furthermore, it has been shown that the top performers using an analytical approach–instead of intuition–outperform the beginners by up to five and a half times! (Analytics: The New Path to Value, a joint MIT Sloan Management Review and IBM Institute of Business Value Study – Massachusetts Institute of Technology 2010).

Unfortunately, some IT departments are using a “full-steam ahead” mentality, and excluding the business customer, which does not bode well for long-term survival. Some companies are only discovering now that, like the proverbial iceberg, just 10 percent of their future plans are starting to be incorporated in discussions of analytics along with mobile and cloud adoption, while the vast majority remains below the surface.

During the next decade, the requirements and costs under the surface will filter to the top. These will include:

  • Advice – Business Intelligence thought leaders will evolve into mavens to guide solutions with their influence and experience replacing IT best practices – enjoy the wisdom of crowds!
  • Data – Access will be easy and complete; if you want sensor data, you will have access in minutes, not months.
  • Analytics – There will be an app for that, which is easy to learn and use. It will serve the Gen Y business users who are used to just doing it.
  • Visualization – 3D reality gaming techniques will make their way into all areas of computing and will extend the human senses in ways not thought of today.
  • Processing Power – Massive computer farms will provide processing power on tap for those who need it; it will be easily accessed and budgeted through a credit card.

If IT is going to survive this quantum paradigm shift in the way business is performed, they must start adapting now. An excellent first step: Begin by improving customer communications with their most important customer: the business. Then stay on the fast track by improving flexibility and the ability of systems to provide data, analytics and reporting in an agile and user-driven environment.

Cassandra 12 Summit – Quick Follow Up

Cassandra 12 was an awesome experience, with interesting conversations and very knowledgeable people. My understanding of the Cassandra movement and its impact on everyday business has multiplied exponentially.  I was asked to present a five-minute lightning talk on Toad for Cloud and its ability to talk to Cassandra utilising standard SQL.  It was the first time I had taken part in this type of quick-fire presentation, but I enjoyed it and I hope you will too.  I will write more on the summit in the near future, but I was very encouraged by the numbers: over 1,000 production instances of Cassandra and growing.  Along with the exceptional job the Cassandra MVPs are doing in improving and growing the software base, I am really excited by the way that Cassandra can influence certain areas within Business Intelligence.  More to follow.

Cassandra Summit 2012

Heading over to Santa Clara tonight to take part in the Cassandra 2012 summit. Looking forward to some interesting and educational sessions along with presenting a lightning talk on Toad for Cloud Databases. NoSQL, Cloud and Big Data are the flavour of the moment, and it is interesting to see the difference between true big data (interaction data) and big data sets (large transactional data sets) and the different kinds of analysis they need. For example, true big data analysis may call for machine learning, as in Google AdWords costing. Transactional data analysis, even with large data sets, usually involves rolling up and aggregating to such an extent that real humans can analyze and understand the data. So when people talk about ‘big data analytics’, the kind of data they have should influence the kind of analysis they want to do. I will report back on my findings later this week, but I am excited to broaden my knowledge in this area.
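To make the "rolling up and aggregating" point concrete, here is a minimal Python sketch. The rows, field layout and the roll_up helper are all invented for illustration; it only shows detail rows being collapsed into a handful of totals a person can read.

```python
from collections import defaultdict

# Hypothetical transactional rows: (month, region, amount).
transactions = [
    ("2012-07", "West", 120.0),
    ("2012-07", "West", 80.0),
    ("2012-07", "East", 50.0),
    ("2012-08", "West", 30.0),
]

def roll_up(rows):
    """Aggregate detail rows to one total per (month, region)."""
    totals = defaultdict(float)
    for month, region, amount in rows:
        totals[(month, region)] += amount
    return dict(totals)

summary = roll_up(transactions)
for key, total in sorted(summary.items()):
    print(key, total)
```

Four detail rows become three summary rows here; on a real transactional data set the reduction would be far larger, which is what makes the result digestible for human analysts.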

Post Pacific Northwest BI Summit Discussions

Firstly, a big thank you to Scott Humphrey for his organisation of this annual, one-of-a-kind event, which was by far the best ‘Summit’ that I have attended.  The ability to talk to extremely knowledgeable and sharp analysts, along with my fellow vendor participants and members of the press, in such a relaxed atmosphere, where conversations last 30 minutes or more instead of the two to three grabbed at a normal event, has been invaluable to me personally.

The sessions during the weekend were presented exceptionally well by all those taking part, and I gained knowledge and understanding whilst also being able to discuss what-if scenarios with my peers.

My takeaways from the summit include:

  • The future of BI could be more about the application and less about the platform.
  • Analytics will continue to expand and become more mainstream – vendors will need to provide easier access to complex analytical suites.  This will lead to people who do not see themselves as analysts becoming involved in this type of work.
  • There will be a lot of disruption around new technology fitting in alongside legacy systems.
  • The way forward for NoSQL/Big Data seems to be finding a way to be part of a hybrid ecosystem within the enterprise – but only if we can find a way to provide both BI and analytic platforms that serve off this new architecture.
  • Social Data and Machine data will go beyond the hype and provide insights and benefit to organisations in the near future.
  • The Big Data management layer is ready to be expanded into the enterprise but needs support from applications.

It was obvious from the many discussions that all present believe the landscape within organisations is changing very rapidly, and that the progression to a hybrid ecosystem within companies will lead to a requirement for more flexible and user-friendly BI systems.  The inclusion of social media in many analytical applications is a sign that the trend of “Big Data” is very much alive and driving a lot of this change within companies.

The ability of BI vendors to provide a data catalogue when connecting to various data sources, allowing business users to complete their own data integration by interpreting the data sets provided to them, is key to the ability of self-service BI tools to fully support an agile and mobile BI infrastructure.  Mobile BI will need to become platform agnostic if it is to deliver on the promise of fully interactive, self-service BI for those who require it on the move; the way forward may lie in the development of HTML5 applications, which will work on any platform.

It was also interesting to note that recent studies have shown that many users of BI tools, especially at the information consumer level, prefer and place more trust in applications that provide a full touch interface; they like being able, with modern devices, to investigate and collaborate on the data in their hands by gestures instead of mouse clicks.  With Gen Y becoming more and more prevalent, not only in the workplace but as consumers of information, we should expect to see, and be able to react to, this change in interface requirements: from a static display of information on a dashboard to the user wanting to investigate the information presented to them by simple gestures.  Allowing the consumers of the information to ask the ‘what if’ questions that drive the requirements of analytics will lead to these systems being improved and simplified for general use.

 

Pacific Northwest BI Summit

It is just two days until I get to meet up with some of BI’s finest in beautiful southern Oregon for the 11th annual Pacific Northwest BI Summit.  Along with myself, there are a number of experts, including Claudia Imhoff, Colin White, Jill Dyche, William McKnight and Shawn Rogers.  We will all partake in roundtable discussions and activities around BI during the weekend.  Fellow attendees include:

Mark McNally, Predixion

Tarun Loomba, Armanta

Fred Funke, Gnip

Donald Farmer, QlikView

Kim Dossey, Teradata

Robert Eve, Composite

Dan Soceanu, SAS

Michael Whitehead, WhereScape

Harriet Fryman, IBM

Yves de Montcheuil, Talend

John Santaferraro, ParAccel

Mark Theissen, Cirro

Glen Rabie, Yellowfin

Nicole Laskowski, TechTarget

Ted Cuzzillo, Datadoodle

Stephen Swoyer, TDWI

The topics to be covered look great.  The weather is shaping up nicely.  See you soon at the #BISUM.

What is “Big Data”?

Why all the hype surrounding “Big Data”?

To understand, you really need to be able to define what the term “Big Data” actually means.  To me, the definition is clearly identified by the three V’s.

Volume – Variety – Velocity

According to surveys published late in 2011, over 1.5 trillion gigabytes of data were created and replicated in that year alone (IDC 5th Annual Survey).  This is a 100% increase on two years earlier, and the growth in data production is not expected to slow but to continue rising exponentially over each subsequent two-year period.  Not all of this data is useful; however, all of it can be, and is being, collected.  If the data can be collected, should we not be providing tools to connect, analyze and visualize the results to improve decision making?

Volume

With Twitter generating more than 6 terabytes of data per day, you can see that the sheer volume of data being stored today is exploding.  Some enterprises generate terabytes of data every hour of every day of the year, which leads to the conundrum facing today’s businesses across all industries. As the amount of data available to the enterprise is on the rise, the percentage of data it can process, understand, and analyze is on the decline, thereby creating an area of information that is clouded from view, not because we cannot store or retrieve the data but because we cannot process and analyze it quickly enough.

Variety

Data in the enterprise is becoming complex, including not only traditional relational data but also unstructured, semi-structured, and raw data from web pages, web log files, search indexes, social media forums and even sensor data from active and passive systems. Much of this information does not lend itself to being stored in traditional systems, so enterprises can struggle to store it and to perform the analytics required to gain understanding from it.  An organization’s success will rely on its ability to draw insights from the various kinds of data available to it, both traditional and non-traditional.

Velocity

Along with the other two V’s, the velocity of the data being generated has increased over the last thirty years.  To programme a computer used to be an inherently slow process of writing the code and punching cards, which were then read by a card reader and entered onto the mainframe.  Sometimes a hundred lines of code would take a day or more to get loaded into memory and run.  Today’s generation can create a mobile app in minutes, with object orientation and pattern modeling allowing those with no programming skills at all to produce slick data gathering models.  Just as the sheer volume and variety of data we collect and store has changed, so, too, has the velocity at which it is generated and needs to be handled. Today’s enterprises are dealing with petabytes of data instead of terabytes: a constant flow of data at a pace that has made it impossible for traditional systems to handle. To provide effective analytics you need to be able to deal with both the volume and variety of data while it is still in motion, not just when it has come to rest.

Quite simply, the hype around “Big Data” exists today because the world is changing. Through applications and devices not thought of twenty years ago, when the Data Warehouse was born, we are able to sense and record more things, and once we have recorded something we naturally want to save it. Through advances in technology, people and devices are collaborating on a level not seen before. I liken it to the very first telephone exchange coming on line, when we moved away from writing letters to talking on the telephone.  We have now moved away from analyzing just stored, static data, not because we have to but because everything is now so increasingly interconnected, and we need to if we are to understand the challenges that lie before us.

Gartner BI Summit LA 2012

Well, a really good two days; sorry to leave. Important points and takeaways:

Big Data is everywhere, but less than 3% of companies have a production version of any form. Logical Data Warehouse (LDW) details were expanded on and the future looks good in this area, but a lot of work is required to define requirements and to be able to conduct MDM against external sources before introduction to the EDW. Social networking, and the ability to add this to the marketing BI stack, is starting to be enabled but requires enablement from the IT team and support from both the CIO and CEO within the organisation.

Sorry to leave tomorrow but have gained so much in two days.

Gartner BI Summit 2012

An excellent start to the day at Gartner in LA: the keynote, delivered by Bill Hostmann, talked about BI being a team sport. This is nothing new to me as I have been preaching it for years; it is nice to hear the message is getting through to others. Now sat listening to how BI uptake is still below 30%. The data discovery market is to be greater than $1 billion by 2013. Major vendors are now following this path to catch up.

Business Intelligence – The Underpinnings

Business Intelligence – Data Storage 

POS applications, HR applications, customer survey results: these are just some of the myriad sources of data that we, as database administrators or developers, are responsible for and that a business intelligence system can consume.  Within Business Intelligence systems, these sources of data are usually consolidated into two main types of storage system to provide historical, current, and predictive views of business operations.

The Data Warehouse 

A data warehouse is a repository of an organization’s electronically stored data, designed to facilitate reporting and analysis.  The need for a data warehouse is driven by an organization’s need for reliable, consolidated, unique and integrated reporting and analysis of its data, at different levels of aggregation.  The practical reality of most organizations is that their data infrastructure is made up of a collection of heterogeneous systems. For example, an organization might have one system that handles service quality, one that handles employees, and others that handle sales data or production data. In practice, these systems are often poorly integrated or not integrated at all, and simple questions like “How many customers are complaining about branch A and have a credit account, and which employees are being targeted?” can be very hard to answer, even though the information is available “somewhere” in the different data systems.

Part of the purpose of data warehousing is to bridge such gaps, and also to make data appear consistent, integrated and consolidated despite the problems in the underlying source systems. The data warehouse achieves this by employing data integration techniques and creating a new data repository (i.e. the data warehouse) whose data model(s) support the needed reporting and analysis.
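As a rough illustration of why consolidation helps, the sketch below loads two made-up source systems into one in-memory SQLite repository and answers the first half of the branch A question. All table names, column names and rows are assumptions invented for the example; a real warehouse load would involve far more cleansing and key matching.

```python
import sqlite3

# Hypothetical source systems, modeled as plain Python rows for illustration.
complaints_system = [  # (customer_id, branch)
    (1, "A"), (2, "A"), (3, "B"), (1, "A"),
]
accounts_system = [  # (customer_id, account_type)
    (1, "credit"), (2, "debit"), (3, "credit"),
]

def build_warehouse(complaints, accounts):
    """Load both sources into one repository that shares a customer key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE complaint (customer_id INTEGER, branch TEXT)")
    conn.execute("CREATE TABLE account (customer_id INTEGER, account_type TEXT)")
    conn.executemany("INSERT INTO complaint VALUES (?, ?)", complaints)
    conn.executemany("INSERT INTO account VALUES (?, ?)", accounts)
    return conn

def complaining_credit_customers(conn, branch):
    """Count distinct customers who complained about a branch and hold a credit account."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT c.customer_id)
        FROM complaint AS c
        JOIN account AS a ON a.customer_id = c.customer_id
        WHERE c.branch = ? AND a.account_type = 'credit'
        """,
        (branch,),
    ).fetchone()
    return row[0]

warehouse = build_warehouse(complaints_system, accounts_system)
print(complaining_credit_customers(warehouse, "A"))
```

Once both sources sit behind a shared customer key, the question becomes a single join; without the consolidated repository, the same answer would require matching records across two separate applications by hand.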

Data Mart 

A data mart should be viewed as a subset of an organizational data store, usually oriented to a specific purpose or major data subject that may be distributed to support business needs.  Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization. Data marts are often derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology the data warehouse is created from the union of organizational data marts.
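The "subset of a data warehouse" idea can be sketched in a few lines. This is only an illustration using Python's sqlite3 module; the warehouse_sales and retail_mart tables, their columns and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table covering every department.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("North", "Retail", 120.0),
        ("North", "Wholesale", 300.0),
        ("South", "Retail", 80.0),
        ("South", "Retail", 40.0),
    ],
)

# The data mart: a subset of the warehouse, pre-aggregated for one community
# (here, an assumed retail team) so its frequent questions are cheap to answer.
conn.execute(
    """
    CREATE TABLE retail_mart AS
    SELECT region, SUM(amount) AS total_amount
    FROM warehouse_sales
    WHERE department = 'Retail'
    GROUP BY region
    """
)

retail_totals = dict(conn.execute("SELECT region, total_amount FROM retail_mart"))
print(retail_totals)
```

The mart holds only the rows and the level of aggregation its users need, which is where the response-time and ease-of-access benefits come from.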

Reasons for creating a data mart

  • Access to frequently needed data
  • Creates collective view by a group of users
  • End-user response time
  • Ease of creation
  • Lower cost than implementing a full Data warehouse
  • Users are more clearly defined than in a full Data warehouse

The Real World

In practice, the terms data mart and data warehouse each tend to imply the presence of the other in some form. However, most writers using the term seem to agree that the design of a data mart tends to start from an analysis of user needs and that a data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. A data warehouse is a central aggregation of data (which can be distributed physically); a data mart is a data repository that may or may not derive from a data warehouse and that emphasizes ease of access and usability for a particular designed purpose. In general, a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need.

Design schemas

 The Star

The star schema (sometimes referenced as star join schema) is the simplest style of data warehouse schema. The star schema consists of a few fact tables (possibly only one, justifying the name) referencing any number of dimension tables. The star schema is considered an important special case of the snowflake schema.

Model

Dimension tables have a simple primary key, while fact tables have a set of foreign keys that together make up a compound primary key consisting of a combination of the relevant dimension keys. It is common for dimension tables to consolidate redundant data in the most granular column and so be in second normal form. Fact tables are usually in third normal form because all data depends on either one dimension or all of them, not on combinations of a few dimensions.

The star schema is a way to implement multi-dimensional database (MDDB) functionality using a mainstream relational database: given the typical commitment to relational databases of most organizations, a specialized multidimensional DBMS is likely to be both expensive and inconvenient.

The facts that the data warehouse helps analyze are classified along different dimensions: the fact tables hold the main data, while the usually smaller dimension tables describe each value of a dimension and can be joined to fact tables as needed.

Another reason for using a star schema is its simplicity from the users’ point of view: queries stay simple because the only joins and conditions involve a fact table and a single level of dimension tables, without the indirect dependencies on other tables that are possible in a more highly normalized snowflake schema.

Business Intelligence, the primary consumer of star schema data, is also best expressed in business English, not programmer’s dialect. The aggregate navigator (OLAP) tools that are common in the industry then do not need to rename the elements, because they are already in proper business English.  Most SQL database engines allow schema qualifiers, and also permit decoration suffixes on surrogate key columns. Square brackets, which are physically easy to type on the keyboard (no shift key needed), are not intrusive and make the code easier to read.
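The key structure described above can be sketched with Python's sqlite3 module. The table and column names are assumptions chosen for the example; the point is only that each dimension has a simple primary key, while the fact table's primary key is the combination of its foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE dim_date    (date_pk INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE dim_store   (store_pk INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE dim_product (product_pk INTEGER PRIMARY KEY, brand TEXT);

    -- Fact table: the foreign keys together form the compound primary key.
    CREATE TABLE fact_sales (
        date_fk    INTEGER NOT NULL REFERENCES dim_date (date_pk),
        store_fk   INTEGER NOT NULL REFERENCES dim_store (store_pk),
        product_fk INTEGER NOT NULL REFERENCES dim_product (product_pk),
        units_sold INTEGER NOT NULL,
        PRIMARY KEY (date_fk, store_fk, product_fk)
    );

    INSERT INTO dim_date VALUES (1, 1997);
    INSERT INTO dim_store VALUES (1, 'France');
    INSERT INTO dim_product VALUES (1, 'Acme');
    INSERT INTO fact_sales VALUES (1, 1, 1, 10);
    """
)

def try_insert(row):
    """Return True if the fact row is accepted, False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert((1, 1, 1, 5))   # same combination of dimension keys
orphan_ok = try_insert((1, 1, 99, 5))     # product 99 has no dimension row
fact_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(duplicate_ok, orphan_ok, fact_rows)
```

Both extra inserts are rejected: the first repeats an existing combination of dimension keys, and the second points at a dimension row that does not exist.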

For example, the following query extracts how many SUVs have been sold, for each brand and country, in 1997.

SELECT   Brand, Country, SUM([Units Sold])
FROM     Fact.Sales (NOLOCK)
JOIN     Dim.Date (NOLOCK)    ON Date_FK = Date_PK
JOIN     Dim.Store (NOLOCK)   ON Store_FK = Store_PK
JOIN     Dim.Product (NOLOCK) ON Product_FK = Product_PK
WHERE    [Year] = 1997
  AND    [Product Category] = 'SUV'
GROUP BY Brand, Country
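For readers who want to run something similar, here is a self-contained approximation of the query above using Python's sqlite3 module. It is an adaptation, not the original environment: the schema-qualified names become fact_sales, dim_date and so on, the NOLOCK hints are dropped because SQLite has no equivalent, and every row of sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date    (Date_PK INTEGER PRIMARY KEY, [Year] INTEGER);
    CREATE TABLE dim_store   (Store_PK INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE dim_product (Product_PK INTEGER PRIMARY KEY, Brand TEXT,
                              [Product Category] TEXT);
    CREATE TABLE fact_sales  (Date_FK INTEGER, Store_FK INTEGER,
                              Product_FK INTEGER, [Units Sold] INTEGER);

    -- Hypothetical sample data.
    INSERT INTO dim_date VALUES (1, 1997), (2, 1998);
    INSERT INTO dim_store VALUES (1, 'France'), (2, 'Japan');
    INSERT INTO dim_product VALUES (1, 'Acme', 'SUV'), (2, 'Acme', 'Sedan'),
                                   (3, 'Zenith', 'SUV');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 10),  -- 1997, France, Acme SUV
        (1, 1, 1, 5),   -- 1997, France, Acme SUV (second sale)
        (1, 2, 3, 7),   -- 1997, Japan, Zenith SUV
        (1, 1, 2, 9),   -- 1997, France, Acme Sedan (filtered out)
        (2, 1, 1, 4);   -- 1998, France, Acme SUV (filtered out)
    """
)

# One fact table joined to a single level of dimension tables.
rows = conn.execute(
    """
    SELECT Brand, Country, SUM([Units Sold]) AS total_units
    FROM fact_sales
    JOIN dim_date    ON Date_FK = Date_PK
    JOIN dim_store   ON Store_FK = Store_PK
    JOIN dim_product ON Product_FK = Product_PK
    WHERE [Year] = 1997 AND [Product Category] = 'SUV'
    GROUP BY Brand, Country
    ORDER BY Brand, Country
    """
).fetchall()
print(rows)
```

The ORDER BY clause is an addition to make the output deterministic; the joins, filters and grouping mirror the original query.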

The Snowflake
 
A snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake in shape. Closely related to the star schema, the snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. In the snowflake schema, however, dimensions are normalized into multiple related tables whereas the star schema’s dimensions are denormalized with each dimension being represented by a single table. When the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and where child tables have multiple parent tables (“forks in the road”), a complex snowflake shape starts to emerge. The “snowflaking” effect only affects the dimension tables and not the fact tables.

Data normalization and storage

Normalization splits up data to avoid redundancy (duplication) by moving commonly repeating groups of data into a new table. Normalization therefore tends to increase the number of tables that need to be joined in order to perform a given query, but reduces the space required to hold the data and the number of places where it needs to be updated if the data changes.  From a space storage point of view, the dimensional tables are typically small compared to the fact tables. This often removes the storage space benefit of snowflaking the dimension tables, as compared with a star schema.  Some database developers compromise by creating an underlying snowflake schema with views built on top of it that perform many of the necessary joins to simulate a star schema. This provides the storage benefits achieved through the normalization of dimensions with the ease of querying that the star schema provides. The tradeoff is that requiring the server to perform the underlying joins automatically can result in a performance hit when querying as well as extra joins to tables that may not be necessary to fulfill certain queries.
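The compromise of a snowflake schema with star-style views on top might look roughly like the sketch below, again using Python's sqlite3 module. The dim_category table, the dim_product_star view and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Snowflaked dimension: category is normalized out of product.
    CREATE TABLE dim_category (category_pk INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product  (product_pk INTEGER PRIMARY KEY, brand TEXT,
                               category_fk INTEGER REFERENCES dim_category (category_pk));
    CREATE TABLE fact_sales   (product_fk INTEGER REFERENCES dim_product (product_pk),
                               units_sold INTEGER);

    -- View that performs the extra join, presenting a star-style dimension.
    CREATE VIEW dim_product_star AS
    SELECT p.product_pk, p.brand, c.category_name
    FROM dim_product AS p
    JOIN dim_category AS c ON c.category_pk = p.category_fk;

    INSERT INTO dim_category VALUES (1, 'SUV'), (2, 'Sedan');
    INSERT INTO dim_product VALUES (1, 'Acme', 1), (2, 'Acme', 2), (3, 'Zenith', 1);
    INSERT INTO fact_sales VALUES (1, 10), (2, 9), (3, 7), (1, 5);
    """
)

# The user writes a star-style query; the view hides the snowflake join.
units_by_category = dict(
    conn.execute(
        """
        SELECT d.category_name, SUM(f.units_sold)
        FROM fact_sales AS f
        JOIN dim_product_star AS d ON d.product_pk = f.product_fk
        GROUP BY d.category_name
        """
    )
)
print(units_by_category)
```

The category name is stored once per category rather than once per product, yet the query reads as if the dimension were a single table. The cost, as noted above, is that the join inside the view runs on every query that touches it, even when only the brand is needed.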

Benefits of “snowflaking”

  • Some OLAP multidimensional database  modeling tools that use dimensional data marts as a data source are optimized for snowflake schemas.
  • If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snowflaking may be appropriate.
  • A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized. A snowflake schema will therefore be easier to implement.
  • A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.
  • Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snowflaking would typically be required to permit simple query tools to form such a query, especially if provision for these forms of query wasn’t anticipated when the data warehouse was first designed.

Which schema to use

Your decision whether to employ a star schema or a snowflake schema should consider the relative strengths of the database platform in question and the query tool to be employed.  The Star schema should be favored with query tools that largely expose users to the underlying table structures and in environments where most queries are simpler in nature.  The Snowflake schema is often better with more sophisticated query tools that isolate users from the raw table structures and for environments having numerous queries with complex criteria.