Benchmarking Big Data Business Forecasting Data

The Semantic Approach to Building Accurate Project Plans

An idea.  Alone and unshared, it has great potential but is worthless in the wild.  In the consumer space, ideas are tweeted, trend, become memes, top the most-popular lists, and can even go viral.  The 140 characters of Twitter and the brief updates on Facebook, carrying singular or compound ideas, are being used to target marketing.  In the consumer space, ideas have value.  It costs over $54 (Google AdWords) to tap into the "buy insurance" idea.  So what about the business enterprise?

For over 60 years, since the advent of modern project management, ideas have happened once, and then the work begins.  There is no trending, no most popular, nothing like a business segment meme, and no winning idea going viral.  We keep developing each project plan as if it were a one-off exercise, with no recognition that the past shapes a new project’s prospects for success.  Projects are one-off things that we do very poorly (full success rates are below 50%).  It sounds more and more like the definition of crazy.

Last year, I completed an in-depth study of ideas in the workplace.  I consciously decided not to use the idea word, as it sounds so, well, fluffy.  I use lexeme (a basic unit of meaning), a word borrowed from linguistics.  I also use lexemetry to describe the process, measuring lexemes in a context.  For me, that context is projects.

Here’s what I did.  First, I had already developed an engine that does semantic matching.  While it can look at any string, I focused on task titles for this research.  For each task, the matching engine looks for other task titles that semantically match the new task title.  New matches are added to a matchset cache.  Now, we have a subset of all tasks that match semantically.
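
The matching loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the actual engine does semantic matching, which is not described here, so the sketch swaps in a simple Jaccard overlap of title words as a stand-in. The names `tokens`, `similarity`, `build_matchsets`, the stopword list, and the 0.6 threshold are all hypothetical.

```python
import re

# Hypothetical stopword list; a real semantic engine would do far more.
STOPWORDS = {"a", "an", "the", "to", "of", "for", "and"}

def tokens(title):
    """Lowercase a task title and return its set of non-stopword words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return frozenset(w for w in words if w not in STOPWORDS)

def similarity(a, b):
    """Jaccard overlap of two word sets (a stand-in for semantic matching)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def build_matchsets(titles, threshold=0.6):
    """Add each title to the first matchset whose master title it matches."""
    cache = []  # the matchset cache: (master word set, member titles)
    for title in titles:
        t = tokens(title)
        for master, members in cache:
            if similarity(t, master) >= threshold:
                members.append(title)
                break
        else:
            # No existing matchset matched, so this title starts a new one.
            cache.append((t, [title]))
    return [members for _, members in cache]

titles = [
    "Develop Project Charter",
    "Develop the project charter",
    "Deploy to Production",
    "Develop Requirements",
    "Deploy to production",
]
matchsets = build_matchsets(titles)
# Matchsets with more than one member are the tasks that iterate.
iterating = [m for m in matchsets if len(m) > 1]
```

In this toy input, the two charter titles and the two deployment titles each form a matchset, while "Develop Requirements" stays on its own.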

The premise is that the tasks that match, those that happen more than once in a project portfolio, form the cogs that make the day-to-day project assembly line operate.  These matchsets are the iterating steps of a formal process.  They are multiple copies of project templates.  The matchsets are also things that everyone adds to projects based on the culture of the enterprise.  They are the crowdsourced knowledge of how to get projects done.  Iterating tasks also allow us to build models based on the matchsets: cost, roles, durations, efforts, movements, relationships, all the metadata of project work.  In fact, I’ve argued that the bulk of our enterprise big data is metadata of work and performance.  Thus, these matchsets are the Higgs bosons of enterprise big data, because everything else is created from work.  As such, the matchsets can also provide insights into the tasks that don’t match, the value givers.  With models we have benchmarks.  With models we have ranges.  With models, we can even evaluate worst, bad, abnormal, normal, good and best.

While we could look at any other meta of the matchsets, I wanted to do something with time.  Yes, flexing our analytical muscles.  I wanted to show that these matchsets are not just static ideas/lexemes glued to the projects where they occurred.  They move, they materialize in different parts of a schedule.

However, time comparisons are not a simple endeavor.  Projects have different durations, and June 15th is comparable to what?  We need to normalize time across projects.  I chose percentage of the project timeline as the canonical form for time.  Thus, the beginning of the project becomes the 0th percentile of time, while the end of the project becomes the 100th percentile.  Now, all time is directly comparable, and all of our matchsets live in this normalized time.
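
As a minimal sketch of this normalization, assuming each task and project carries plain calendar dates: the function name `start_percentile` and the sample dates are hypothetical, chosen only to show two projects of different lengths landing on the same scale.

```python
from datetime import date

def start_percentile(task_start, project_start, project_end):
    """Express a task's start as a percentage of the project timeline."""
    span = (project_end - project_start).days
    if span <= 0:
        raise ValueError("project must end after it starts")
    elapsed = (task_start - project_start).days
    return 100.0 * elapsed / span

# A 100-day project and a 40-day project (illustrative dates only).
long_project = start_percentile(date(2013, 2, 5), date(2013, 1, 1), date(2013, 4, 11))
short_project = start_percentile(date(2013, 6, 15), date(2013, 6, 1), date(2013, 7, 11))
```

Here February 5th in the longer project and June 15th in the shorter one both fall 35% of the way through their timelines, so the two tasks become directly comparable.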

We have some cogs of commerce iterating in time, so what can we tell from this?  We can find benchmarks, mine process, and even identify best practices.  But first, our data.  I randomly selected data from a few databases. I also ensured that I did not get all similar businesses or industry types.  That resulted in just shy of one million tasks in over 20,000 projects.  I found:

  1. At least 40% of tasks are iterative.  We have a benchmark.  I’d also like to say that the concept that all projects are one-offs is a dead one.  At least two out of five lexemes in your project are iterative.  Plan on it.  Use it to your advantage.  Understand your iterating ideas by measuring them.
  2. On average, lexemes deviate in time by 11.52% of the project timeline.  We have a benchmark for a task buffer.  The critical chain folks are going nuts!  To me, this is so fundamental.  We have a statistically significant, empirical number that shows how difficult it is to schedule and perform work.  Any piece of work can move by around 10% of the project timeline!  Conversely, we can use this bedrock datum to plan better.  Agile, your stories now have a time scale.
  3. Look at this composite process I mined out:

     Matchset Master Title            Start Percentile
     -----------------------------    ----------------
     Develop Project Charter                  4
     Project Initiation Activities            5
     Develop Preliminary Plan                 6
     Update Charter                          10
     Complete Charter                        12
     Provide Detailed Plan                   13
     Develop Communications Plan             23
     Identify Supply Chain                   27
     Develop Requirements                    30
     Execution Phase Start                   42
     Development Complete                    43
     Deployment                              69
     Execution Complete                      70
     Deploy to Production                    86
     Training                                86
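
The two benchmarks in findings 1 and 2 can be computed from matchsets once every task has a normalized start percentile. The sketch below rests on assumptions: how the time deviation was actually measured is not stated here, so it uses the mean absolute deviation from each matchset's average start percentile. The function names and the toy numbers are hypothetical and are not the study's data.

```python
from statistics import mean

def iterative_share(matchsets):
    """Fraction of all tasks that sit in a matchset with more than one member."""
    total = sum(len(m) for m in matchsets)
    repeated = sum(len(m) for m in matchsets if len(m) > 1)
    return repeated / total if total else 0.0

def mean_abs_deviation(values):
    """Average distance of each value from the mean of the values."""
    center = mean(values)
    return mean(abs(v - center) for v in values)

def average_time_deviation(matchsets):
    """Mean of the per-matchset deviations, ignoring one-off tasks."""
    devs = [mean_abs_deviation(m) for m in matchsets if len(m) > 1]
    return mean(devs) if devs else 0.0

# Toy data: each inner list holds the start percentiles of one matchset.
toy = [[4, 6, 8, 6], [40, 50], [12], [30], [70], [90]]
share = iterative_share(toy)             # 6 of the 10 tasks iterate
deviation = average_time_deviation(toy)  # in points of project timeline
```

A weighted average, or a standard deviation instead of the mean absolute deviation, would be equally reasonable choices; the point is only that both numbers fall out of the matchsets directly.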

At a very large company, I found the commonalities in the plethora of templates across many business units.  Additionally, I found over 80 lexeme matchsets (in proper time sequence and at proper time distance) that were being consistently added to these project plan templates: subproject assembly lines.  Lastly, their very traditional waterfall approach meant that actual project work consistently did not start until almost halfway into the project!

I believe that by knowing the benchmark iteration number (40%) and the benchmark time deviation number (11.52%), any enterprise can increase its project success.  It is time the enterprise got its meme on, as lexemetry is poised to go viral.