Power BI Blog

Tips, ideas, and updates from the experts

  • Behind the Scenes: How Power BI Handled The World Cup

    In this post

    Importing the Data with Power Query

    Getting the right format with Power Query

    Power Pivot

    Power Q&A

    Crawl, then walk

    This year’s World Cup was filled with unbelievable surprises: Miroslav Klose’s record-breaking goal total, Germany’s thrashing of Brazil, Spain’s unfortunate crash-out, the United States’ resilience through the Group of Death, and moments of Messi magic. For the stat geek in all of us, Power BI Q&A made it easy to put the game in context and help you make better predictions about the outcome. If you didn’t have a chance to ask Q&A about the World Cup during the tournament, you can still give it a try here.

    We heard from many of you how amazing it was to see the breadth of questions Q&A could answer about the World Cup. If you’d like a glimpse into how we can teach Q&A a whole new sport in a matter of a day, read on!

    In this post we will discuss how we used the breadth of tools available in Excel and Power BI to bring this experience to you, including:

    • Power Query to import and format the data
    • Power Pivot to build the massive model that supports the questions being asked
    • Power Q&A to answer those questions through visualizations

    Importing the Data with Power Query

    We partnered with Opta Sports, a leading provider of historical sports statistics, to give Q&A all of the facts you’d expect a sports statistician to know. The historical data was delivered as a series of XML files, with the statistics for each game organized in folders by year. Each file contained information on the teams, coaches, referee, players, goals, cards, substitutes, and game statistics. This isn’t the easiest way to import data, but luckily Power Query has robust support for handling different data formats and folder structures.

    We used the From Folder option in Power Query to import all the data at once. Each row contained the path to an individual XML file. Power Query can also read in the content of an XML file: under Content, clicking on “binary” lets you expand out all the data within an XML file. After deleting and renaming a few columns, you end up with data for each game (see image below). As you can see, the value “Table” appears in a lot of cells. That’s because Opta’s XML feed was highly nested, but this nesting allowed for easy organization and navigation of the data.
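
    For readers curious what this step looks like in Power Query’s formula language (“M”), here is a minimal sketch of the From Folder pattern. The folder path and column names are hypothetical, not the ones we actually used:

        let
            // Point at a folder that contains one XML file per game
            Source = Folder.Files("C:\WorldCupData"),
            // Parse each file's binary Content into nested XML tables
            Parsed = Table.AddColumn(Source, "GameData", each Xml.Tables([Content])),
            // Keep just the file name and the parsed data before expanding further
            Trimmed = Table.SelectColumns(Parsed, {"Name", "GameData"})
        in
            Trimmed

    Each nested “Table” value in the result corresponds to one of the expandable cells described above.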

     

     

     

     

    We had to use a different approach for the 2014 data since it was updated in real time from Opta as the games progressed.  Using the From Web option in Power Query, it was easy to import the latest data from each game and refresh it whenever we wanted.   If you’d like to learn more about how to connect to almost any data source with Power Query, check out Getting Started with Power Query Part I.
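
    Under the covers, the From Web option produces a query along these lines; the feed URL below is only a placeholder, not Opta’s real endpoint:

        let
            // Pull the latest match feed over HTTP; hitting Refresh re-runs this step
            LiveFeed = Xml.Tables(Web.Contents("http://example.com/opta/latest-matches.xml"))
        in
            LiveFeed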

      

     

     

    Getting the right format with Power Query

    Once we pulled in the data, we had to create a query for each table we wanted in our data model: games, goals, cards, referees, managers, stadiums, etc. The motivation behind creating a query for each table is that when we load each table to the data model, we can easily create the relational diagram required to make Q&A shine. Most queries involved expanding out nested tables and filtering columns to get one query per table. One piece of functionality we would like to point out in Power Query is the advanced editor, which gives you full control over the transformation of your data. A full tour of the advanced editor is beyond the scope of this post, but we wanted to highlight the scenario of pivoting tables. In the Power Query editor, under the “View” tab, there is a button to launch into the advanced editor (shown below):

     

     

    Chris Webb wrote a great blog post on pivoting tables.
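
    As a rough, hypothetical sketch of what a pivot step can look like in the advanced editor (the query and column names are invented for illustration), turning a tall list of per-game statistics into one column per statistic:

        let
            // GameStats is assumed to be another query with StatName and StatValue columns
            Source = GameStats,
            // Create one column per distinct StatName, filled with the matching StatValue
            Pivoted = Table.Pivot(Source, List.Distinct(Source[StatName]), "StatName", "StatValue")
        in
            Pivoted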

    After each query was created for the particular tables we wanted, we loaded them to the data model for Power Pivot to consume.

    Power Pivot

    When you load to the data model from Power Query, your tables are automatically added to Power Pivot. All we had to do in Power Pivot was match unique IDs to one another to create relationships, so we know which players had goals and cards, and which referees, teams, players, and coaches were in a particular game. With the relationships created, we could take this to Power Q&A.

     

    Power Q&A

    With our relational model created, we optimized our data by going through the steps outlined in the Demystifying Power BI Q&A series (Parts 1 | 2).

    The model was now ready to be used in Power Q&A. However, we optimized the model further using the new cloud modeling environment so that it could answer a wider variety of questions. The cloud modeling environment is also an easy way to manage synonyms for your columns and tables. Since all the changes are saved to the cloud, everyone reaps the benefits.

    Documentation on how to get started with the cloud modeling environment can be found here. As an example of how to make a wider variety of questions work, we created the phrasing “aggressive team has large average cards per game”. This phrasing allowed us to ask more questions like:

    You can click the links above to see how one good phrasing allows Q&A to give really great answers!

    Crawl, then walk

    As you can see, there’s a lot more to teaching Q&A than just understanding natural language.  Finding the right data, choosing the right tables and relationships, and data modeling all contribute to a great experience in Q&A.  Luckily, the full suite of Power BI tools like Power Query and Power Pivot make it easy to achieve powerful results in a few minutes.

     

    #power query#Power BI#Q&A#Power Pivot#2014 world cup#Brazil#football#soccer#World Cup#futbol

    Tue, 29 Jul 2014 16:00:00 GMT by Power BI Team · 0 Comments

    Read full post
  • Power Map July Update

    Last month in the June update we focused on quality improvements for Power Map, including enhanced interaction with maps on touch devices. The touch experience should feel smoother, more fluid, and more predictable. Using your feedback, we also improved stability. Thank you for sending in your error reports! 

    This month in the July update we’re adding a new feature called “Automatic Refresh.” This feature makes the Repeat Tour feature we added back in May even better. During tour playback, Automatic Refresh will pull in the latest updates to your data every 10 minutes. When used together, Repeat Tour and Automatic Refresh let you use Power Map to monitor events in near real time. We’re very excited about the new opportunities this opens up, and we hope you are too.

    To enable Automatic Refresh, make sure Repeat Tour is turned on:

    Then press the “…” button when you’re playing your tour to bring up the new tour options menu:

    From the menu click the Automatic Refresh option to toggle it on:

     

    If you find these features useful, please “Send a Smile” in the app and let us know how you’re using them. We hope you enjoy using Power Map. Big things are coming soon, so stay tuned for our monthly updates.

    #Power BI#Power Map#data refresh#Office 365

    Wed, 16 Jul 2014 16:00:00 GMT by Power BI Team · 6 Comments

    Read full post
  • New in Power BI: Cloud Modeling for Q and A

    In May, the Q&A team released an update to Q&A that enabled a wider variety of natural language questions to be answered, as well as personal customization of certain types of questions. In the new cloud modeling environment, it is now possible to define new terms and resolve any ambiguity in your questions. For example, cloud modeling enabled these questions to bring back meaningful results:

    Which player scored the most unassisted goals per world cup

    Who are the diving medalists from China and what is their medal count

    Which player has received a yellow card and scored 2 goals in the same game

    How does Q&A know what an unassisted goal is? Or that athletes are from a particular country, not just that they compete in a host country? With cloud modeling, it is possible to define what an unassisted goal is and that athletes are from their home country. This update allows you to create new phrasings within your workbook’s language model.

    Before walking through an example, what is a language model? A language model is your workbook’s dictionary. It’s what Q&A uses to know how to understand each row and column and how they are related. Within a language model, phrasings are created to define what words are used to relate different columns in your model.

    Let’s walk through an example to show how creating phrasings through the cloud-modeling environment works.

     

    Getting Started with Cloud Modeling

    1) If you sign in to Power BI, you will be presented with the main dashboard:

     

    2) Make sure to add your workbook with the “Add” button, located next to Documents.

    3) After you add your workbook, click on the (…) located on your workbook’s tile (circled in red in the above picture).

    a) Click “Add to Q&A”

    4) After you add your workbook for Q&A, it is now time to launch into the language modeling experience. Click on the gear icon in the upper right (circled in green in the above picture) and click “Power BI site settings”.

    5) Click on “Q&A” and it’ll bring you to a screen like this:

     

    6) Click on the (…) and then click “Optimize for Q&A”.

    7) Now you are in the Language Modeling tool and are ready to customize your workbook to your liking. It will look like the screenshot below.

     

    On the right-hand side of the screen, you will see a pane with your workbook’s name on it. The pane lets you create your language model in several ways:

      1. The overview pane gives you a synopsis of what your language model contains;
      2. Synonyms are the synonyms you added in Power Pivot (you can add more here if you forgot to add them in Power Pivot);
      3. Phrasings are where you define the entity relationships. This is where the bulk of fine-tuning takes place, and it is the focus of the rest of this blog post; and
      4. Usage shows the utterances that have been typed in the Q&A text box. This is useful for seeing questions that weren’t recognized by Q&A (to know what phrasings to add), questions that didn’t give the most intuitive visualization, and the most popular utterances.

     

    Before making a phrasing, it is important to start from the question you are trying to ask in Q&A. In this example, the question “which player has the most unassisted goals” doesn’t work because Q&A doesn’t know what an unassisted goal is.

    To create the phrasing that will enable that question, we click on the “phrasings” tab.

     

    After clicking on “add phrasing”, you can start typing in what you want to define. Typing in the word “unassisted” prompts a suggestion box to help craft your phrasing. In the World Cup model, I have a column named AssistPlayerName. If the row value is “N/A”, it is an unassisted goal. Therefore, the proper phrasing needed would be the second one in the dropdown box. The full phrasing is shown below.
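
    As a side note, the same condition can also be made explicit in the data itself with a computed column back in Power Query, which is sometimes a useful complement to a phrasing. A minimal sketch, assuming a hypothetical Goals query that contains the AssistPlayerName column described above:

        let
            Source = Goals,
            // Flag goals where no assisting player was recorded
            WithFlag = Table.AddColumn(Source, "IsUnassisted", each [AssistPlayerName] = "N/A", type logical)
        in
            WithFlag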

     

    You can immediately test your phrasing to see if it works.

     

    Now that we have walked through how to get started with cloud modeling, we encourage you to read the documentation on the various phrasing types, what types of questions they enable, and how to create them.

     

    Importance of Cloud Modeling

    Cloud modeling allows you to make Q&A work for the kind of language you use in your company and industry.   It is the next step in model optimization, and it follows the optimizations done in Power Pivot (see Demystifying Power BI Q&A Part 1 | 2).

    Before you even create any phrasings on your own, Q&A already generates some phrasings for you so that basic queries will work. For example, if you have sales data with a Customers table where State and Customer Name are two of the columns, we automatically generate the phrasing “Customer name is the name of Customer” so that you can ask, “Show Pennsylvania customers”.

    However, if you have another table, Orders, with columns named Product, Price, Date Bought, and Customer ID (relating back to the Customers table), and you want to ask, “what products did Pennsylvania customers buy?”, Q&A wouldn’t know what to do with that question. By adding the phrasing “customers buy products”, Q&A has the context it needs to relate the columns together.

    Another example: suppose you are the owner of a small bakery and you want to decide which items to have on your seasonal display showcase. You have a database of ingredients and their respective prices. If you want to ask, “show expensive ingredients”, you can create a phrasing to define what expensive means for you! For a small bakery, it could be that ingredients over $26/lb are expensive; for bigger bakeries, ingredients over $40/lb are considered expensive. Depending on the needs of your company, you can tailor Q&A to give you the most pertinent results.

    With cloud modeling, it is possible to answer a wide range of questions with Q&A. Q&A is still a ways from being as intelligent as Batman’s batcomputer, but because Power BI is a cloud service, we can rapidly and continually improve the product based on your needs and wants from a self-service business intelligence product. Lastly, we hope this blog post has shown how Q&A can be tailored to fit your needs, and we encourage you to try making your own phrasings as explained here.

     
    #Power BI#QA#q and a#cloud modeling

    Tue, 15 Jul 2014 16:00:00 GMT by Power BI Team · 5 Comments

    Read full post
  • Getting Started with Excel and Power BI Series

    Over the next couple of weeks, we'll introduce you to the basic usage of all the latest and greatest features in Excel and Power BI. How do you know if this is the right series for you? Here are a few questions to help you decide:

    • Are you new to Power BI?
    • Did you sign up for the trial but don't know where to start?
    • Have you been using Excel for a long time, but you don't know where to start to use the new BI and visualization features?

    If you answered "yes" to any of them, then this is the right place for you to get started!

    Whether you're an Excel pro or you just downloaded the latest version of Excel, we decided to create this Power BI and Excel Getting Started series to get you up and running with our new set of features and services in no time.

    Our goal is to help you learn about and understand the features in Excel and Power BI by following the natural path of data analysis: getting some data in your hands (it doesn’t matter where it comes from), shaping it, creating a model, and visualizing your data, to finally share and consume it anytime, anywhere, taking advantage of Power BI in the cloud.

    What should you expect to see in this series?

    This series follows that natural path of data analysis and will cover the following features and processes:

    Excel

    Discover and Combine Data

    Model and Analyze

    • Getting Started with Power Pivot

    Visualize

    • Getting Started with Power View
    • Getting Started with Power Map

    Power BI for Office 365

    Share and Collaborate

    • Getting Started with Power BI Sites
    • Getting Started with Power View Forecasting Capabilities
    • Getting Started with Sharing Queries and Data Stewardship
    • Getting Started with Connecting On-Prem data to Power BI

    Ask Questions

    • Getting Started with Q&A

    Access your data anywhere

    • Getting Started with the Power BI App
    • Getting Started with Power BI HTML5

     

    You can always come back to this post to find the latest getting started post or to access past posts of this series.

    Enjoy and please let us know what we can do to make this series better for you!

    #power query#Power BI#Power Pivot#Power View#Excel#Power BI app#Getting Started#power bi sites

    Tue, 08 Jul 2014 18:35:00 GMT by Power BI Team · 3 Comments

    Read full post
World Cup: comparing and contrasting 20 years’ worth of data with natural language

    June 26th marked the end of the round robin stage, and we saw some of the world’s best teams crash out early while other teams fought their way to the next round. There is no easy path to winning the World Cup, and we’re in for a treat for the next 16 games.

    Try PowerBI Q&A on World Cup 1990-2010 Data

    But how did the 2014 group stage games compare with previous World Cups, and could we potentially use them as an indicator for the games to come? Power BI Q&A investigates and highlights key statistics surrounding the World Cup’s most popular facts: goals, cards, and game performance.

    2014 Performances

     

    Key insights: Of the top 5 winningest teams, Italy and Spain suffered an unfortunate fate. However, they have had similar performances before—Italy in 2002 and Spain in 1998. Their premature exit from the tournament is balanced by a bulk of the middle-tier teams doing really well (ex. Belgium, Colombia, Mexico, Switzerland, Costa Rica, etc.).

     

    Key insight: Taking a look at the club teams that Spanish players play for, we see that all of them play for the best clubs in the world. These club teams play an average of 45-50 games a year, so when the World Cup starts, the players are not mentally or physically fresh. This graph also points out that there isn’t necessarily a correlation between playing for the same club team (or in the same league) for the entire year and World Cup performance. The Guardian did a great post on this as well, using a complete data set from FIFA to show national team performance based on club teams.

     

    Key Insight: Scoring a lot of goals helps lead to victory, but at the end of the day, you have to stay consistent and win. Taking a look at Belgium and Croatia, both scored 4 goals, but Croatia scored all 4 of those goals in one game. Although Belgium didn’t score many goals, they won every game. One question I have looking at this graph is, can Colombia continue their perfect form?

    Goals, Cards, Attendance Statistics

    Key Insight: This year, compared to other World Cups, has been one of goals, not cards.

     

    Key Insight: Adding to the insight found from the above picture, this clearly has been a World Cup of goals, not cards.

     

    Key Insight: Longevity has helped Miroslav Klose tie the record for most goals scored (he has played in 4 World Cups as opposed to Ronaldo’s 3). Also, Messi may be the best player in the world, but he isn’t yet the best in Argentina: Gabriel Batistuta has the most goals for Argentina, and Messi needs to continue his form this year. Thomas Muller, the previous World Cup’s top goal scorer, is proving to be a consistently lethal threat.

      

    Key Insight: Although the United States hosted the World Cup back in 1994, it still is today the highest-attended World Cup. Given its existing infrastructure, the USA has the potential to host another World Cup. Maybe FIFA can bring the 2026 edition to the US.

    Can I Play With This Data?

    When the World Cup is finished, we will make the 2014 data available to play with in Q&A. Currently, we have a live demo of the 1990-2010 data that you can try here:

    Try PowerBI Q&A on World Cup 1990-2010 Data

    *Note: Data is provided by Opta Sports and covers 1990-2014 statistics.

    #Power BI#2014 world cup#Brazil#football#soccer#World Cup#QA

    Tue, 01 Jul 2014 16:00:00 GMT by Power BI Team · 0 Comments

    Read full post
  • 15 World Cup history facts with Power Map

    This blog post was authored by Igor Peev, Senior Program Manager who enjoys working on challenging projects that deliver new, innovative ways to analyze data and tell good stories with it. He works on interactive tours to present and play with data in 3D environments.

     

     

    I am not a big soccer fan. I am not very fond of watching TV, but every four years I can’t help it and get into the World Cup. There is something different about this particular competition, something totally awesome about watching teams from every country try their best to win the Champion Title. Perhaps the universal appeal has to do with the fact that in all parts of the world kids play soccer.  I too played when I was really young. I grew up in a small town in Bulgaria and we played straight on the asphalt street, pausing for the occasional car to pass by.  All we needed was a ball and four cardboard boxes to make goals. I bet that millions of others who watch the World Cup can tell you the same story.

    For this tour, I searched for some basic data with Microsoft Power Query. I found a table from Wikipedia – Teams Reaching the Top Four – Results – FIFA World Cup – which I imported and had to normalize. This is what the table looked like initially:

     

    This is how I normalized the above table (granted, there is more than one way to normalize this data):

     

    I had to do a few other things:

     

    • Convert the Year column contents to date format using the Date function in Excel, like this: =DATE([@Year],1,1)
    • Convert countries that no longer exist, like Czechoslovakia and Yugoslavia, to countries that do, such as the Czech Republic and Serbia (and I sincerely hope I am not going to offend anyone with that choice).
    • Build a storyboard on some sheets of paper about how I was going to make the tour.
    • As an intro I made a blank scene with the globe, no data, no text boxes, just the globe:

     

    Then I copied that scene and put a text box in it. That way the video will start with a blank globe and then the title appears a couple of seconds later. 

     

    My following scenes followed each logical part of the script, which told a story from the data above. The story followed the original table, which I imported using Power Query, and the Wikipedia article on the FIFA World Cup. I wanted to show all countries which have been and are in the World Cup. I made a scene for that, and used Category for shading the regions using a Region chart and a layer based on the Country column.

    After doing this, I made a scene about the first winners in 1930. That scene was shot from the bottom of South America looking north.

    Then I made a scene that had all of the countries which have won the World Cup Championship Title. That list has 8 countries, some in South America and some in Europe, so I positioned the camera over Argentina and set a ‘Fly over’ effect in the scene so that the motion is towards Europe. That way, I can have a smooth transition to the next scene, which showed all countries which have won the top three spots in the last 10 World Cup tournaments. I knew from the data in the original table that a lot of these countries were in Europe, and I positioned the scene such that you can see Europe best but you can see South America too.

     

     

    Later, I added another effect and set the Year in the Time well to play in accumulation.

    Adding more time to each scene (at least 10 seconds) and more time to each transition in Scene Options definitely makes for a better, smoother-moving tour.

    I made all other scenes about the European winners and added shaded regions for each country using the Region Chart type. I put scenes in the same geo locations together so that the camera does not have to travel back and forth too much. As a rule, I position and move the camera in each scene so that it always moves towards the beginning point of the next scene as much as possible. In that manner, the engine which applies the effects and arcs in transition works on both things at the same time and does better interpolation between adjacent scenes.

    I chose a theme with fewer colors so that the visuals stand out, and used monochrome colors for the shading of countries. The final scenes showed all countries who have ever scored a spot in the top 4, over time. I used the ‘First place’ and ‘Second place’ columns in the Height field, set to Count Not Blank. For the very last scene I removed the Year from the Time well and added a text box saying ‘Good luck to all teams’ and a few other things.

    Enjoy!

    #power query#Power Map#World Cup#wikipedia

    Mon, 30 Jun 2014 16:00:00 GMT by Power BI Team · 0 Comments

    Read full post
  • 6 new updates in Power Query - June 2014

    In this post

    Download Power Query Update

    What's New in Power Query

    Support for “Whole Number” column type

    Additional transformations in Query Editor ribbon

    Aggregate Column dialog

    Additional Date/Time filters

    Improved Navigator experience: Refresh Tree & Function Invocation

    Cache Management options

    What’s New in Power Query

    We are very happy to announce the availability of a new Power Query update. You can download the update from this page.

    There are quite a few new or improved features in this release, as well as several quality fixes. You can watch the following video or read the rest of this blog post for more details.

    Support for “Whole Number” column type

    Data Type menus in the Home & Transform tabs in the ribbon, and in the Change Type context menu on preview columns, now let users set a column type to Whole Number. The previously supported type (Number) has been renamed to Decimal Number in the UX. This provides better alignment with Data Model numeric types, which was a recurrent piece of feedback from our customers.
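
    Behind the ribbon button, choosing Whole Number simply writes a type-conversion step into the query; here is a small sketch with made-up query and column names:

        let
            Source = WorldCupGames,  // hypothetical source query
            // "Goals" becomes a 64-bit whole number; Decimal Number would map to type number instead
            Typed = Table.TransformColumnTypes(Source, {{"Goals", Int64.Type}})
        in
            Typed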

     

    Additional Transformations in Query Editor ribbon

     A few new transformations have been added to the Query Editor ribbon under the Home & Transform tabs. The motivation for this is to increase the discoverability of existing capabilities that customers were having trouble finding without UX support. (A short sketch of the corresponding formula steps follows the two lists below.)

    - Home tab:

    • Keep Bottom Rows: Remove all rows from a table except the last N rows.
    • Keep Errors: Keep rows with errors in the selected column(s). Note that there will be another improvement in the next SU geared towards providing a better experience to identify rows with errors in the output of a query.

    - Transform tab:

    • Count Rows: Return the number of rows of a table.
    • Unpivot Other Columns: Unpivot all columns in a table except the currently selected ones.
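
    For reference, here is a small sketch of roughly the formula-language steps these four commands correspond to; the query and column names are placeholders:

        let
            Source    = WorldCupGames,  // hypothetical source query
            KeptLast  = Table.LastN(Source, 10),                        // Keep Bottom Rows
            ErrorRows = Table.SelectRowsWithErrors(Source, {"Score"}),  // Keep Errors in a selected column
            RowTotal  = Table.RowCount(Source),                         // Count Rows
            Unpivoted = Table.UnpivotOtherColumns(Source, {"Team"}, "Statistic", "Value")  // Unpivot Other Columns
        in
            Unpivoted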

     Aggregate Column dialog

     Similarly to how we added a dialog for “Expand Column” in the previous release, this month we have added a dialog for “Aggregate Column”. The goal of this change is to improve discoverability of this operation by exposing it via the Editor ribbon, as well as enable editing Aggregate steps later (via the Edit Settings button next to the step).

    Additional filters for Previous/Next “N” days/weeks/months/quarters/years

     Previously, Power Query only supported filtering by the Last or Next Day/Week/etc., without letting users customize “N”. In this release, additional filters have been added to the Filter dropdown menu. Users can now specify the value of “N” as well as the unit of time that they want to use for their filter.
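
    The filter the UI builds resolves to a row-selection step; for example, “in the previous 7 days” on a hypothetical MatchDate column would look roughly like this:

        let
            Source   = Matches,  // hypothetical query with a MatchDate column
            Filtered = Table.SelectRows(Source, each Date.IsInPreviousNDays([MatchDate], 7))
        in
            Filtered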

     

      

    Improved Navigator experience: Refresh Navigator items & Function Invocation experience

    The Navigator task pane now exposes “Invoke” for functions. This lets users directly invoke a function instead of having to load the function into the workbook and invoke it from there. This is available for functions in Azure Marketplace, Databases and OData feeds.

     

    Also the Navigator Task pane now has a Refresh button to refresh the contents of the tree as needed by the user.

    Cache Management options

    The Power Query Options dialog now includes a new section for Cache Management. This section lets users check the amount of memory used by the Power Query cache, delete all cache items, and configure the maximum size allowed for this cache.

     

    That’s all for this update! We hope that you enjoy these new Power Query features. Please don't hesitate to contact us via the Power Query Forum or send us a smile/frown with any questions or feedback that you may have.

    Follow these links to access more resources about Power Query and Power BI:


    #power query#Excel#Query Editor#column dialog#cache management#refresh#date time filters#column type

    Thu, 26 Jun 2014 16:00:00 GMT by Power BI Team · 7 Comments

    Read full post
  • The Amir Netz Selection

    Yesterday Microsoft News featured Amir Netz in its People series. Amir is a Technical Fellow, Chief Designer of SQL Server, and Chief Architect of Microsoft BI offerings. He holds more than 40 patents, for technologies including PowerPivot and Power View. He has been with us since 1997, when the company he founded, Panorama Software, was acquired by Microsoft.

    Geoff Carter, the writer of the article, says: "When you watch the video - Amir's Power BI announcement at WPC last year - you think, wow, the man is cool. Amir Netz is cool." We agree with Geoff 100%.

    Some people may not be familiar with Amir's trajectory at Microsoft, and his pivotal role in the development of Power BI. For this reason, we wanted to follow up this great story with a selection of Amir's best interviews, demos and presentations, which include an amazing Power BI demo at Microsoft's Worldwide Partner Conference 2013.

    Enjoy!

    Amir Netz announces Power BI at WPC 2013

    Amir announces the preview of Power BI for Office 365 at WPC 2013, staying true to his style

     

     

    SQL Server 2014 Launch Event - April 2014

    Amir revisits his music rankings demo at the SQL Server 2014 launch event in San Francisco

     

      

    Interview: Accelerate your insights - making data more accessible

    Amir shares his view on how important it is to make data more accessible to everyone

     

    Interview: Accelerate your insights - Microsoft's vision for BI and analytics

     Amir explains Microsoft's vision for self service analytics and BI

     

    #Power BI#amir netz#WPC#microsoft news

    Tue, 24 Jun 2014 16:00:00 GMT by Power BI Team · 2 Comments

    Read full post
  • Visualizing the World Cup history with Excel and Power BI

    This blog post was authored by Marc Reguera, finance director, Microsoft

    Besides my family and friends, I have two additional passions in life: soccer and data visualization. Like most kids growing up in Europe, my dream was to become a professional soccer player. I wanted this so badly that I actually followed my dream, and I played 3rd division for Paris Saint-Germain's youth team in France. My father was smart in making sure I could only play soccer if I had good grades. After an unsuccessful tryout with Bastia in Corsica in 1990, I switched to plan B. I started a career in finance.

    Plan B was not bad. I have been fortunate to be at Microsoft in Finance since 1994, and I discovered very early the power of data visualization. I studied the best (Edward Tufte and Stephen Few, to name a couple) and applied this thinking to my job – I focused on telling stories about data with great visuals and strong narratives.

    In the last four years, our business intelligence (BI) tools have really transformed the way hundreds of finance folks do their job. First Power Pivot came out, which allows “technical dummies” like me to play with millions of records without knowing anything about SQL. Power View was released two years ago, which gave us the ability to create touch-enabled, interactive, and dynamic views in minutes. As of today, most of our reviews at Microsoft have evolved from static lectures to dynamic conversations because of Power View. Power Map brought a 3D local perspective. Power BI enabled us to ask questions against our data in the most natural way, using natural language. So, what does this have to do with the title of this blog post? Let me tell you.

    I got ahold of a few thousand rows of World Cup data, so I decided to combine my two passions. Using Power Query and Power Pivot I was able to add formulas, join mapping tables, and even duplicate records without any external help. Then the fun really began - I started building visualizations with Power View and Power BI. To be honest, I didn't think I would learn anything because I am a bit of a soccer encyclopedia, but I learned a lot.

    Here are six amazing insights I learned after exploring World Cup data using Microsoft’s BI tools

    1. There were no 0-0 draws in the first five World Cups, from 1930 to 1954. The first 0-0 took place in 1958.
    2. The World Cup tournament with the most spectators took place in the USA in 1994. In fact, it was the only time attendance topped 3.5M, even though only 52 games were played (versus 64 today). That is because the stadiums in the USA are all big and generated an average attendance per game of approximately 70k, while other tournaments peaked at an average attendance per game of 50k. This is why we usually get about 3M attending the World Cup (64 games times 50k).
    3. The decrease in average goals per game is really driven by the increase in the percentage of games with 1 or 2 goals. Games with 0 goals have actually been flat at around 10% since 1958, while the percentage of games with 1 or 2 goals has increased to half of all games played.
    4. Italy is the most effective when comparing game statistics. When looking at cumulative victories/ties/losses, Germany has a much better record, yet Italy has won the World Cup 4 times vs. Germany's 3. In fact, Germany has the 2nd best record of all teams in the last 4 World Cups (just behind Brazil) even though they haven't been able to win the trophy (in the last 3 World Cups, Germany lost in the final or semifinal).
    5. Part of the issue with Germany is that they have never been able to beat Italy at the World Cup. They have played 5 games against Italy (lost 3 and tied 2).
    6. If Germany plays Italy, they should be careful in the first 10 minutes and the last 10 minutes, as Italy scores over 1/3 of their goals on average during these periods. On the other hand, Brazil is the team that scores the fewest goals at the beginning and end of games.

    I don’t want you to miss out on the fun of playing with these visualizations! I invite you to try it out yourself.

    Marc Reguera
    Finance Director
    Microsoft

     

    #power query#Power BI#Power Map#Power View#Excel#live demo#World Cup

    Thu, 19 Jun 2014 16:00:00 GMT by Power BI Team · 5 Comments

    Read full post
  • Try Power BI Q&A with World Cup data!

    In this Post

    The World Cup through the lens of technology

    The Dataset

    Next Week

    Instructions: Try Power BI Q&A yourself

    The World Cup through the lens of technology

    The 2014 World Cup starts today in Brazil, with the host country kicking off against Croatia in what should be the biggest sporting event in history. The World Cup is generally the most watched and talked-about event whenever it comes around every four years, but this one is special. This year, many people around the world have the ability to interact with the World Cup like never before. Twitter, Facebook, live streaming, and cheaper access to TV and internet allow people to participate by sharing their thoughts and the overall roller coaster journey of emotions as they see their favorite team(s) challenge for the most coveted trophy in the world.

    Not only has technology improved to allow people to share through social media and watch games through various channels, but it has also allowed for richer insights into the data itself. Business intelligence, or more plainly, data analysis, has improved greatly in 4 years. With that, Microsoft proudly introduced Power BI Q&A back in February 2014, a natural language query experience which some of you may or may not be familiar with. Rather than trying to explain what it does through lengthy gibberish, let’s follow Q&A through its emotional roller coaster as it works to uncover meaningful insight about players, teams, coaches, referees, and everything in between.

    What we hope to find:

    1. Players who have scored the most goals
    2. Which team has scored the most goals in the second half
    3. Which manager has won the most
    4. Which teams win when they have more fouls than their opponent? Does that team win frequently by fouling a lot?
    5. What are the records for each team when a certain referee has officiated their game
    6. How well does Brazil do against Argentina
    7. Anything and everything statistical about the world cup

    Let’s have fun with this!

    The Dataset

    Our Dataset: Historical data from the 1990-2010 World Cup. Provided by Opta Sports Ltd.

    We have data from the 1990-2010 world cups ranging from player statistics, team statistics, managers, and referees, to which stadium each game was played in and its attendance. Let’s go through a Q&A experience and answer some questions about world cup statistics people will have.

    Make sure to go through the guided preview first to see what kind of questions and visuals we can show through Q&A.

    Players

    1. brazil players in 2010
    2. brazil defenders in 2010
    3. daniel alves da Silva total tackles by game in 2010

    Teams

    1. teams in the 2010 world cup
    2. which team has played in the most world cups
    3. which team has won the most world cups
    4. which team has played the most games

    Bookings

    1. Show time with the most bookings as line chart
    2. stadiums with the most bookings
    3. Which team has received the most cards for dissent

    Goals

    1. show games with greater than or equal to 4 goals and more than 5 bookings
    2. which player has the most goals
    3. which player has the most goals for Germany
    4. which player has the most goals in the firsthalf
    5. who scored the most goals in 2010

    Substitutions

    1. how many substitutions in 2010 by manager
    2. subs in 2010 final

    Awards

    1. 1998 awards
    2. France awards
    3. Who won golden boot in 2006
    4. Golden ball 

    Managers

    1. who coached the USA team in 2010
    2. Germany managers by year

    Referees

    1. number of bookings by referee in 2010
    2. referees that called finals and which game and year

    Next Week

    We will be adding more statistics for teams and players, as well as ways to talk about those statistics. (ex. An aggressive team is a team that fouls a lot. A reckless team is a team that fouls a lot and gets a lot of bookings.) If there are certain statistics that you really want to see, post a comment below. Another great feature of Q&A is that we know what questions you are asking that we aren’t answering correctly. So each week, check back to see if those questions that weren’t previously working are working now. This perfectly highlights the communication between a model creator and the end user. Sometimes Q&A won’t get every answer correct, but we can identify which ones aren’t working and ensure that they do.

    If there are excitements and frustrations you want to express, please leave a comment below. 

    Instructions: Try Power BI Q&A yourself

    You can go now to PowerBI.com and start playing with our World Cup model right away

    Let us show you how Q&A works: Type a question and get your answer as an interactive chart. That’s it. No more instructions necessary to start.

    As you start this experience, we have predefined a few questions in the model for you to try out.

    For example: which player scored the most unassisted goals per world cup?

     

    At any point in time you can stop or pause the predefined questions by going to the Q&A bar or clicking the pause button when you hover over the answer

     

    Also, while a question is being typed (by you or by our preloaded questions) you also get recommendations on different types of questions you can ask based on the data behind the model and what is being typed.

     

    You can modify the type of visualization being used by clicking on the right side menu. This menu also includes filters and the fields available in the model, highlighting the ones being used in the current visualization.

    For example, our predefined questions include the number of substitutions by team in 2010

     

    Select for example ‘map’ and you get fouls by country in a map instead of a scatterplot.

     

    Then click on Fields and select ‘fouls per game’. Now you have the average fouls per game for each team

     

     

    General Tips:

    1. Currently, we are adding nicknames/shorter names for players. For now, you will have to type in a player’s full name. Luckily, Q&A autocompletes very well, so if you are looking for a certain player but don’t know the full name, start typing the first name of that player and hit “tab” when autocomplete has found the player you want. (ex. Typing in Daniel will let you find Daniel Alves…a Brazilian defender)
    2. If one of your questions brings back an almost-correct answer, it might not seem like the “best” experience. Q&A should magically work on any type of question; however, natural language processing is still in its early stages as a technology, and we are working on improving it every month (the beauty of being a cloud-based SaaS…monthly iterations). Use the filter/field panes to *tweak* your result.
    3. We are currently working on adding synonyms for the ways you talk about certain things. For now, if you want to talk about the following terms, you have to use the words as is, and can’t use a variation (yet):
      1. Firsthalf
      2. Secondhalf
      3. ExtraFirstHalf
      4. ExtraSecondHalf
      5. ShootOut
      6. Semi-finals
      7. Quarter-finals
      8. Group  A/B/C/D/E/F/G/H (two spaces needed in between group and the letter)
      9. 3rd and 4th Place
      10. Round of 16

     

    Don’t forget that we have another great Q&A demo at PowerBI.com using our classic Olympics dataset. Feel free to play around with it too, using the same features that we described in this post.

    Finally, we’ll be doing an open demo of the World Cup Q&A today at 5 PM at the Microsoft Store in Bellevue Square, WA. If you happen to live in the area, stop by to try Q&A in the store with our Power BI experts.

    Happy World Cup and Q&A-ing with Power BI

    #Q&A#Power BI#2014 world cup#Brazil#football#soccer#World Cup

    Thu, 12 Jun 2014 16:00:00 GMT by Power BI Team · 1 Comment

    Read full post
  • Explore the Top 250 Movies since 1921

    Moviegoers have a lot of tools at their disposal today compared to decades ago. With the number of movies being released every week, they can now research, find showtimes for, and even rate their favorite flicks on sites like IMDb.com.

    Dan English created this interactive dashboard using IMDb data for the top 250 movies of all time based on more than 62 million reviews.

    Play around with this interactive visualization to unlock your own insights about this ranking.

    Some interesting facts from this dashboard:

    • The Shawshank Redemption, a movie based on a famous Stephen King novel, was not a critics’ or fans’ favorite when it was released 20 years ago, but it has since become a cult flick, and its number one spot on this list confirms that.
    • The R rating is the one that drives the most votes in the top 250 movies, more than doubling the number two rating, PG-13.
    • Starting in the 1980s, you can see how the number of movies from each decade that appear in the ranking increases by around 30%.

    You can also hear from Dan about how he used Excel to tap into this online data and Power BI to visualize it.

    Let us know what you think about this visualization and find more information about Power BI at www.PowerBI.com.

    #Power BI#BI#Power BI Demo Contest#Power View#Power BI Training#demo contest#live demo#IMDb#top 250#movies

    Tue, 10 Jun 2014 16:00:00 GMT by Power BI Team · 1 Comment

    Read full post
  • The visualization that changed the world of data

    In 1854, a cholera outbreak in London puzzled the scientific community as well as the local authorities. Dr. John Snow, an epidemiologist studying the outbreak, was trying to find the root causes by looking at the data available.

    Dr. Snow's approach is very similar to the approach scientists use today to unlock insights from data, but in 1854, things were a bit different. Back then, researchers had very strong preconceptions about how diseases were transmitted, and the concept of germs was not yet understood. To try to understand what was happening, Dr. Snow mapped the cholera cases. Sure, today this might seem an obvious way to tackle this challenge, but back then, it was groundbreaking.

    Data visualization enabled Dr. Snow to recognize that the cases were clustered around water pumps, specifically the one in Broad Street. This insight opened an amazing new branch of questions and answers for this outbreak and epidemiology in general.

    As part of our demo contest, Jason Thomas created this visual tribute to John Snow taking advantage of maps, bar charts and slicers in Power View and Power BI.

    We invite you to explore Dr. Snow's findings by using this interactive visualization. Play with the different radii and pumps to see the impact they had on cholera cases. Also, check out the London map to see where pumps were located back in 1854 and their impact.

     

     

    You can also hear from Jason firsthand about how he used Excel and Power BI to build this modern-age tribute.

    Let us know what you think about this visualization and find more information about Power BI at www.PowerBI.com.

    #Power BI#BI#Power BI Demo Contest#Power View#Power BI Training#demo contest#live demo#epidemiology#cholera map

    Thu, 05 Jun 2014 16:00:00 GMT by Power BI Team · 2 Comments

    Read full post
  • New in Power BI: Data Source Management

    Introduction

    The data management capabilities in Power BI for Office 365 are delivered both in the Power Query add-in for Excel, and in the Manage Data Portal in Office 365. To help data stewards and information workers get the most from their Power BI experience, we’ve updated the Manage Data Portal to improve the visibility and manageability of arguably the most important metadata in Power BI: data source metadata.

    In the next paragraphs, we’ll show you how this new feature works and how you can begin leveraging it. You can also let Matt Roche, Senior Program Manager, walk you through these updates.

    The Importance of Data Source Metadata in Self-Service BI

    Data source metadata, specifically the “Approver for access requests” property for Data Sources, is probably the most valuable metadata in the Power BI Data Catalog. Query metadata may get more visibility and attention because it is prominently displayed, but data source metadata does the heavy lifting when it’s needed the most. Data source metadata can answer the most vexing questions that users face – what the data source is, and how to get access to the data you need when you don’t already have permissions.

    Here’s the experience when a user attempts to connect to a secure data source, and that data source has not been annotated in the Data Catalog:

    Data source metadata is displayed when a user attempts to connect to that source. Instead of displaying a lengthy address that may confuse non-technical information workers (servername.networkname.com;databasename or \\servername\sharedfoldername) the connection dialog in Power Query will instead display the steward-supplied business-friendly name. Even more importantly, when the data source has a request access URL specified, Power Query will also display a “Request access” link within the connection dialog itself, so that users can easily initiate the process to gain access. This property accepts either an email address or a URL, and should typically be set to one of the following values:

    • The email address of the person or team who manages access to the data
    • The URL for the documentation (on a SharePoint site or shared folder) that describes how to request access to the data
    • The URL for a self-service identity management tool (such as Forefront Identity Manager) where users can directly request access to the data.

    Here’s the experience when a user attempts to connect to a secure data source, and that data source has been annotated in the Data Catalog by a data steward:

    The difference between these two experiences is significant. In the first scenario, the user is blocked – unless he knows the person or team to contact to request access, he has no way to proceed, and cannot open the query that uses this data. In the second scenario, the user can clearly see what the data source is, and by clicking on the Request access link can easily follow the process established by the data source owner to get the permissions required to access the data.

    Making Data Source Management More Discoverable

    Given the importance of data source metadata, we’ve updated the Power BI Manage Data Portal to increase its visibility and to make it even easier for data stewards to discover data sources that lack metadata, and opportunities to make their data more usable.

    In the “My Queries” page in the portal, the list of shared queries now includes a “Data Source Status” column that displays the completeness of the metadata for the data sources that are used by each query. This column can be red, yellow, or green:

    • Red: Missing – no metadata has been provided for the data sources
    • Yellow: Incomplete – some metadata has been provided, but at least one attribute is missing
    • Green: Complete – all metadata has been provided for the data sources

    The “…” action fly-out for queries has also been updated to display the full list of data sources used by a query, the metadata status for each data source, and a link to jump directly to the edit experience for a data source.

    By clicking on any of the data sources, the user can view and edit the data source metadata without needing to first navigate to the data sources list. Once the data source metadata has been completed, the magic can begin. When a user previews the query, the friendly data source name will be displayed. And any time a user opens a query, the connection dialog will include a “Request access” link. When the user clicks on that link, the action specified by the data steward will take place, opening email or a web browser. This is “magical” because it represents a best-of-both-worlds experience. The user gets help exactly where he needs it, and the data source admin retains control of the data and of the process for granting access to the data.

    Summary

    Complete data source metadata adds value to any query that uses that data source, and can help users successfully complete the “last mile” of opening the query and connecting to the underlying data. The new capabilities in the Manage Data Portal make it easier than ever for data stewards to identify opportunities to add missing data source metadata, and to make enterprise data discoverable and understandable by business users.

    Follow these links to access more resources about Power Query and Power BI:

     

    #power query#Power BI#Office 365

    Tue, 03 Jun 2014 16:00:00 GMT by Power BI Team · 0 Comments

    Read full post
  • What’s New in Power Query

    We are very happy to announce the availability of a new Power Query update. You can download the update from this page.

    Here is a summary of five exciting new features included in this release:

    1. Connect to Recent Data Sources (including pinned/favorite sources).
    2. Search results tabs for different Search scopes (Enterprise & Public).
    3. Query Editor improvements:
      • Improved Insert tab capabilities & new Transform tab, which expose several new transformations.
      • New “Expand Column” dialog.
      • New “Choose Columns” dialog.
    4. Data source lineage information in Workbook Queries task pane.
    5. Better experience for subscribing to data services from Windows Azure Marketplace.

    In the next paragraphs, we’ll show you how each of these new features works and how you can begin leveraging them. You can also let Miguel Llopis, Program Manager for our Power Query team, walk you through these updates.

    Recent Data Sources

    One of the most commonly requested features in Power Query has been the ability to easily reconnect to a previously used data source. This is very handy as it saves users from typing in the same data source information every time. It also helps users remember data sources that they don’t connect to very often, but that they need to remember and use from time to time.

    After installing this update, you will find a new “Recent Sources” button under the “Get External Data” group in the Power Query tab. This button will be enabled after you create a successful connection to any data source. From this menu you will have the option to access the 15 most recently used sources. Furthermore, you can manage the sources list via the “More Recents” option and pin up to 10 entries to the dropdown menu. Pinned entries are sorted alphabetically and most recent sources are sorted by last accessed time.

    In addition to easy access to sources via the ribbon dropdown menu, there is a management dialog view where you can perform additional actions using your recent and pinned sources. You can also clear your Recent Sources list.


    Search results tabs for different Search scopes

    A major area of feedback from Power BI customers using Power Query Online Search was the need for better distinction between Public results and results from their own organization. As part of this update, we have changed the way search results are displayed inside the “Online Search” task pane.

    Search results are now organized in tabs according to the domain they belong to (public results vs. organizational results). This allows the user to retrieve the search results for a given term from each scope all at once, without having to select a different Search scope option and perform a new search every time. In addition to better separation between Search scopes, Power Query now also provides the count of results within each scope. As part of this change we have also removed the “All” tab/scope where public results were displayed together with enterprise results.

    Here is a screenshot of the updated “Online Search” task pane.

    Note: the “Organization” tab will only be visible when the user is logged in. Likewise, the “Public” tab won’t be displayed if the user is connected from a restricted market (where public search is not available).

    New Transformations available under Transform and Insert tabs

    The number of transform operations available in the Query Editor ribbon has significantly increased in this release. The Insert tab has been enhanced to include several new options to insert new columns based on Text, Number, Date & Time and other common operations referencing values from other columns in a query.

    Furthermore, a new Transform tab has been added to the ribbon. This new tab exposes common transformations that can be applied to an entire table or to specific columns within a table.

    Expand builder

    “Expand Column” has been available for quite some time in Power Query. However, it was not very discoverable to end users and it was not possible to modify an existing Expand step via the UI (it required manually modifying the generated formula).

    In order to improve discoverability of Expand operations, we have added an entry point in the Transform ribbon tab. This option brings up a new Expand dialog which lets users search and pick the columns that they would like to expand, similar to the existing Expand dropdown menu on column headers. This dialog can also be used for editing an existing Expand step via the “Edit Settings” option (or gear icon) in an existing step.
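
    What the dialog ultimately writes into the query is an expand step; here is a rough sketch with invented query and column names:

        let
            Source   = Games,  // hypothetical query whose Goals column holds nested tables
            // Bring two nested columns up to the top level, with prefixed names
            Expanded = Table.ExpandTableColumn(Source, "Goals", {"PlayerName", "Minute"}, {"Goals.PlayerName", "Goals.Minute"})
        in
            Expanded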

    Choose Columns dialog

    To facilitate working with wide tables, we’ve introduced the “Choose Columns” dialog. This dialog can be accessed from the Home tab in the Query Editor ribbon.

    “Choose Columns” allows users to search for columns by name and quickly subset a table to the relevant columns, instead of having to scroll in the preview and select several columns to keep or remove. The functionality of this transformation is equivalent to “Remove Other Columns”. In other words, Power Query keeps the columns that are selected by the user and discards any other columns that may appear in the table.
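
    In formula terms, Choose Columns keeps only the listed columns, much like this hypothetical step:

        let
            Source = Games,  // hypothetical wide table
            Subset = Table.SelectColumns(Source, {"Team", "Opponent", "Score"})
        in
            Subset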

    As a bonus feature, we’ve also enabled the ability to edit existing “Remove Other Columns” steps using this same dialog. As mentioned previously with “Expand Column”, having an easy way to modify existing steps was a frequent customer ask.

    Data Source lineage information in Workbook Queries task pane

    Query peeks in the Workbook Queries pane now also include the list of data sources that a query depends on. This enables users to easily understand where the data of a given query or report is coming from, which helps increase their level of confidence in the data retrieved by their queries.

    Better experience for subscribing to data services from Windows Azure Marketplace

    In order to improve discoverability of how to subscribe to Azure Marketplace services and consume them in Power Query, we have added a link in the Power Query Navigator task pane to the Azure Marketplace Catalog page for services recommended for consumption inside Excel. Users can click this link, browse services and subscribe to new ones. Then they will be able to see them in the Navigator the next time that they connect to Azure Marketplace using Power Query.

    That’s all for this update! We hope that you enjoy these new Power Query features. Please don't hesitate to contact us via the Power Query Forum or send us a smile/frown with any questions or feedback that you may have.

    Follow these links to access more resources about Power Query and Power BI:

    #power query #Query Editor #Recent Data Sources #Windows Azure #Workbook Queries

    Thu, 29 May 2014 16:00:00 GMT by Power BI Team | 20 Comments

  • May updates for Power Map now available to all Office 365 customers and Preview extension available for Office 2013

    Today we are announcing that Power Map will become available to all Office 365 subscriptions that include the Office applications. This is part of our larger investment in making data visualizations more accessible to our customers. We are in the process of updating both Office.com and TechNet to reflect these changes.

    Last month, we announced via the Power BI blog that we will be removing the expiration on the Power Map Preview and making it available for all versions of Office 2013 and Excel 2013 standalone. The updated version of the preview is now available. Office 2013 customers will need to re-download the Preview on the Download Center in order to get the Preview without the original May 31st expiration date. Office 2013 customers will not receive future updates delivered to Office 365. Office 365 customers will automatically get the latest feature updates like Repeat Tour.

    Please note that preview features are not supported and we do not recommend using the preview version in production. A supported version of Power Map is available as part of the Office 365 subscription today. Power Map will also be added to Excel in the next version of Office for customers purchasing Office under a perpetual licensing agreement.

    Power Map Updates

    As part of our continued effort to bring new features requested by customers to Power Map, we are adding the ‘Repeat Tour’ feature. Next month we’ll cover the ‘Refresh Data’ feature and explain how the two can be used together.

     

    Since launch we’ve heard from many customers who want to use Power Map as a presentation tool in scenarios such as demos and kiosks. In these scenarios, they need to always have their tour playing. Currently, these customers must export their Power Map tour to video and set the video to repeat. However, doing so eliminates one of the most powerful elements of Power Map, the ability to pause the tour and interact with the map directly. To address this need in the May update we’ve added a “Repeat” button to the Power Map tour playback. This will automatically cycle back to the beginning of a tour once the tour is complete. This feature allows interactive Power Map tours to repeat and thereby opens up Power Map to a whole new set of uses.

    Look for more Power Map feature updates to enable new scenarios that you request. Leave more suggestions in the comments and Map On!

    #Power BI #Power Map #Office 365

    Wed, 21 May 2014 16:00:00 GMT by Power BI Team | 6 Comments

  • Power BI Connectivity to SAP BusinessObjects BI Now Generally Available

    Today Microsoft and SAP announced the general availability of Power BI connectivity to SAP BusinessObjects BI. This joint business intelligence interoperability solution is delivered in Microsoft Excel, Power BI for Office 365, and SAP BusinessObjects BI. With this solution, users can seamlessly connect to SAP BusinessObjects BI Universes as another supported data source in Power Query for Excel, enabling them to access and analyze data across the enterprise and share their data and insights in the cloud and on any device through Power BI.

    Customers that have invested in both companies’ BI stacks can now extend their trusted enterprise data into familiar business tools for self-service business intelligence. Access and analyze data with Power Query for Excel:

    • Seamless access to trusted enterprise data – connect to the latest, accurate, and trusted data from an SAP BusinessObjects BI Universe including SAP BW and SAP Business Suite.
    • Business friendly semantics in familiar business tools – easily work with data by navigating dimensions, measures, and business views from an SAP BusinessObjects BI Universe directly in Excel.
    • Combining data from within and outside your organization – discover and combine SAP data with public, corporate and big data sources such as Microsoft Azure HDInsight.
    • No hassle data refresh on-premises – refresh your data without a manual process on-premises.
    • Deep interactive analysis and visualization – explore and present your data from an SAP BusinessObjects BI Universe visually through charts, graphs, and a 3D map experience.

    Share and collaborate on newfound data and insights from an SAP BusinessObjects BI Universe by unlocking them across your organization with Power BI.

    Watch this video to learn how to access and analyze enterprise data from SAP BusinessObjects BI Universes in Excel and Power BI: 

    To get started today, download Power Query for Excel with the generally available SAP BusinessObjects BI connectivity and follow the SAP BusinessObjects Tutorial. You can also sign up for a free trial of Power BI for Office 365. As part of the Power BI trial you’ll receive a free 30-day trial of Office 365 ProPlus, giving you access to the latest version of Excel.

    #power query #Power BI #SAP #SAP BusinessObjects BI

    Mon, 19 May 2014 15:00:00 GMT by Power BI Team | 1 Comment

  • Introducing new forecasting capabilities in Power View for Office 365

    Today at the PASS Business Analytics conference we announced the availability of new forecasting capabilities for Power View in Office 365. In Power View, you can now apply predictive forecasting, and hindcasting, when visualizing and exploring your data. Forecasting in Power View uses built-in exponential smoothing models that automatically detect seasonality in the data and produce forecast results from a series of data. Explore forecast results by adjusting the desired confidence interval or by adjusting outlier data to see how they affect results. You can also hindcast to see how Power View would have predicted the present and recent past based on older data.

    The best data for forecasting is time series data, or data whose x-axis values are uniformly increasing whole numbers. The line chart has to have only one line; multiple-line charts won’t work, even if all but one line is filtered out. See more things to keep in mind about forecasting below.

    Evaluate the prediction quality

    You can evaluate how accurate a prediction is by combining these techniques:

    • Vary the confidence interval to understand expected variance in future forecast results
    • Try hindcasting to see how the algorithm performs in practice on your historical data.

    The forecasting in Power View performs advanced statistical analysis of your data to generate forecasts that incorporate trends and seasonal factors. Learn more about these methods and how to customize your forecasts to get the best results.

     

    Try forecasting

    Try the new forecasting capabilities of Power View today on your own data or with the sample report available as part of the Power BI report samples. To view your own data, upload a workbook with a Power View time series line chart to Power BI for Office 365.

    Open the file in Power BI and switch to Power View in HTML5 (choose Try the HTML5 version of Power View in the lower-right corner).

    Then click the forecast arrow or drag the forecast dot in the line chart, and you’ll see the forecasting options appear in the Analysis pane to the right of your report, as seen in the following image. 

     

     

    Try hindcasting

    Hindcasting shows how the forecast would've predicted the recent past, from the vantage point of the past.

    1. Now that you've got a forecast, you also see an arrow pointing left, at the beginning of the forecast. That’s the hindcast arrow.
    2. Click the hindcast arrow or drag the hindcast dot to the left to see how it would've predicted current values, if you had asked in the past.

    To undo forecasting, click the arrow to the left of the forecast dot.

    Things to keep in mind about hindcasting

    • For hindcasting to work, the line chart needs a minimum number of seasonality periods (complete data cycles).
    • Filtering can impact hindcasting.
    • Hindcasting is really a new and separate forecast, with its own seasonality and values in the Analysis area.

    Adjust your forecast

    The following capabilities are available to help you adjust your forecast:

    Adjust for Seasonality

    Power View automatically detects what appears to be the seasonality of the data – in other words, a complete cycle of peaks and dips. For example, if you have high summer sales and low winter sales year over year, and you have daily data points, then the seasonality may be 365, that is, 365 data points in the one-year data cycle. You can override automatic seasonality detection.

    1. Start a forecast by dragging the forecast handle in a line chart.
    2. The Analysis Pane will automatically pop up on your right-hand side. Can’t find it? Look for Analysis in the upper-right corner.
    3. In the Analysis area, manually set the seasonality to none, 2, 3, 4, 5, 6, 7, 12, 24, 52, or 365.

       Tip: Seasonality works best if the chart has at least four times more values than the data cycle. For example, if you set the seasonality to 52, you’d want to have at least 4 times 52, or 208, values in the time series axis of your chart.

     Adjust your confidence interval

    The confidence interval shows visually how reliable the forecast is. The Standard Deviation setting controls the confidence interval of the forecast results.

    1. Start a forecast by dragging the forecast handle in a line chart.
    2. The Analysis Pane will automatically pop up on your right-hand side. Can’t find it? Look for Analysis in the upper-right corner.
    3. In the Analysis pane, adjust Standard Deviation to show values with a standard deviation of 0, 1, 2, or 3.
     

    Line chart with one standard deviation.

    Adjust for outliers and explore what-if scenarios

    When you change data points in a line chart, the forecast changes based on the new values. You’re not changing the underlying data in the workbook or data source.

    Note: What-if analysis currently works only in the HTML5 version of Power View with forecasting, and not in the Power BI app.

    1. Start a forecast by dragging the forecast handle in a line chart.
    2. Drag data point dots either up or down in the historical data to see how that changes the forecast.

    Drag the dot up or down.

    The line chart after dragging the dot down.

    To undo the change, double-click the data point dot.

    Things to keep in mind about forecasting

    Forecasting is available for line charts only, and the chart needs to meet a few requirements (for a quick way to check them in code, see the sketch after this list):

    • The x-axis value needs to have a date/time format or be a uniformly increasing whole number.
      • It can’t contain text or decimal numbers.
      • Sometimes x-axis values appear to be dates, but they’re actually formatted as text. They won’t work for forecasting.
    • The chart has to have only one line. Multiple-line charts don’t work, even if all but one line is filtered out.
    • The line chart has to have fewer than 1,000 values. Power View samples data in charts with more than 1,000 values, and forecasting won’t work on sampled data.
    • The four most recent values in the time series must increase uniformly, meaning there’s the same length of time between each value.
    • The interval between values has to be at least one day.
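    If you want to sanity-check a column of x-axis values against these rules before uploading a workbook, a rough helper like the hypothetical one below can help. It is written directly against the rules listed above, not against any Power View API; the function name and the way the checks are expressed are illustrative assumptions.

```python
import pandas as pd

def check_forecast_ready(x_values, max_points=1000):
    """Rough pre-flight check of the x-axis rules listed above (illustrative only)."""
    x = pd.Series(x_values)
    problems = []
    if len(x) >= max_points:
        problems.append("1,000 or more values: Power View samples the data, so forecasting won't work.")
    if pd.api.types.is_numeric_dtype(x):
        if (x % 1 != 0).any():
            problems.append("The x-axis contains decimal numbers.")
    else:
        x = pd.to_datetime(x, errors="coerce")
        if x.isna().any():
            problems.append("Some x-axis values are not real dates (text formatted to look like dates?).")
    # The four most recent values must be evenly spaced.
    steps = x.tail(4).diff().dropna()
    if steps.nunique() > 1:
        problems.append("The four most recent values are not evenly spaced.")
    return problems or ["Looks forecast-ready."]

# Example: four evenly spaced daily values pass the checks.
print(check_forecast_ready(["2014-05-01", "2014-05-02", "2014-05-03", "2014-05-04"]))
```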

    Power View completes missing values before forecasting

    Sometimes data has gaps in it—maybe a value is missing in a row, or maybe a whole row is missing. Power View fills in missing values before forecasting. Power View can add new values up to 100% of the existing values, so up to half the values in the line chart could be invented. If the chart has more missing values than that, Power View can’t forecast.

    Filtering forecasts

    A line chart contains all the values in the fields on the x and y axes. To limit those values, filter the line chart or filter the whole sheet. More about filtering in Power View. But filtering historical data may reduce the forecast quality.

    Filtering may also result in a chart with more than 50% of the values completed by Power View. If so, Power View can no longer create a forecast for that line chart.

    Also keep in mind

    You can print the HTML5 version of Power View sheets, but the prediction and analysis don't print.

    Undo (Ctrl-Z) doesn't work. See Remove predictions for undo tips. 

     

    If you want to try the new forecasting capabilities right away, you can interact with the following report right here:

     

     

     

     
    #Power BI #Power View #Forecasting

    Thu, 08 May 2014 16:15:00 GMT by Power BI Team | 10 Comments

  • Describing the forecasting models in Power View

    Power View performs advanced statistical analysis of the data in your line charts to generate forecasts that incorporate trends and seasonal factors. If you want to learn more about these methods and how to customize your forecasts to get the best results, read on!

    We've also provided some references at the end of this article if you are interested in the background behind these methods.

    In this article

    Which algorithm does Power View use for forecasting? Can I view the formula?

    Does Power View use the same forecasting algorithm as in the Data Mining Add-ins for Excel?

    What is hindcasting?

    How are missing values handled?

    What is seasonality? How does the seasonality value affect my forecasts?

    How can I tell if the forecasts are any good?

    References

    Which algorithm does Power View use for forecasting? Can I view the formula?

    Forecasting in Power View is based on an established suite of methods for time series prediction called exponential smoothing. Over the years many methods have been developed for the analysis of time series, depending on whether the data is strongly seasonal or has no seasonality, how much noise there is in the data, and whether the data contains “surprises” or irregular peaks. The exponential smoothing method has a good track record in both academia and business, and has the advantage that it suppresses noise, or unwanted variation that can distort the model, while efficiently capturing trends.

    For Power View in Excel, we provided two versions of exponential smoothing, one for seasonal data (ETS AAA), and one for non-seasonal data (ETS AAN). Power View uses the appropriate model automatically when you start a forecast for your line chart, based on an analysis of the historical data.

    The formulas themselves are not exposed, but the general method is widely accepted in academia, and we’ve described the details here:

    Seasonal algorithm (ETS AAA)

    The seasonal algorithm (ETS AAA) models the time series using an equation that accounts for additive error, additive trend, and additive seasonality. This algorithm is also popularly known as the Holt-Winters algorithm, after the researchers who described the characteristics of the model. The Holt-Winters method is widely used, for example, in predicting and planning demand in businesses.
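    For readers who want to see the additive model in concrete terms, here is a minimal Python sketch of the classical additive Holt-Winters recursion. It is only a textbook illustration, not Power View’s production code, and it omits the robustness enhancements described below; the smoothing parameters alpha, beta, gamma and the season length m are assumed values chosen for readability.

```python
import numpy as np

def holt_winters_additive(y, m, alpha=0.3, beta=0.05, gamma=0.2, horizon=12):
    """Classical additive Holt-Winters (level + trend + seasonality).
    y: observations (at least two full seasons), m: season length, e.g. 12 for monthly data."""
    y = np.asarray(y, dtype=float)
    # Naive initialization: level from the first season, trend from the average
    # per-step change between the first two seasons, seasonals as deviations.
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = y[:m] - level
    fitted = []
    for t, obs in enumerate(y):
        s = season[t % m]
        fitted.append(level + trend + s)          # one-step-ahead prediction
        prev_level = level
        # Additive updates: each component is nudged by the new observation.
        level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (obs - level) + (1 - gamma) * s
    # h-step-ahead forecasts extend the last level/trend and reuse the seasonal pattern.
    n = len(y)
    forecast = [level + (h + 1) * trend + season[(n + h) % m] for h in range(horizon)]
    return np.array(fitted), np.array(forecast)
```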

    For forecasting in Power View charts, we made several enhancements to the Holt-Winters algorithm to make it more resistant to noise in the data. Specifically, we have made the following changes:

    Use of validation window for optimal parameter selection

    The classical Holt-Winters method finds the optimal smoothing parameters by minimizing the mean sum of squares of errors for predictions in the training window, looking only at predictions that are one-step ahead. However, the errors you get from looking just one step ahead might not be representative of the errors you get when you want a longer horizon forecast. Therefore, to improve long-range forecasting error, we introduced a validation window, which contains the last few points of the training window. Within this validation window, we do not adjust the state at each and every step, but instead, compute the sum of squares of prediction errors for the window as a whole. This has the effect of dampening variation and preserving trend across the validation window.
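    To make the idea concrete, the hypothetical sketch below picks the smoothing parameter for a simple exponential smoother by scoring each candidate on the sum of squared errors of its multi-step forecasts over a held-out validation window at the end of the series, rather than on one-step-ahead errors. The smoother, the window size, and the parameter grid are illustrative assumptions, not the exact optimization Power View performs.

```python
import numpy as np

def ses_forecast(y, alpha, horizon):
    """Simple exponential smoothing: the h-step-ahead forecast is the final level."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(horizon, level)

def pick_alpha_with_validation_window(y, window=6):
    """Score each candidate alpha by the sum of squared multi-step errors over a
    held-out validation window at the end of the series, not by one-step-ahead errors."""
    y = np.asarray(y, dtype=float)
    train, valid = y[:-window], y[-window:]
    best_alpha, best_sse = None, np.inf
    for alpha in np.linspace(0.05, 0.95, 19):
        sse = np.sum((ses_forecast(train, alpha, window) - valid) ** 2)
        if sse < best_sse:
            best_alpha, best_sse = alpha, sse
    return best_alpha
```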

    State vector correction at the end of training window when data is noisy

    The original algorithm (ETS AAA) is a state-space-based forecasting method. Essentially, forecasts are weighted averages of past observations, with recent observations given more weight. A state vector is calculated throughout the training window and is used to compute the training fit. However, when the optimal smoothing parameters in the model are relatively high, the model can become sensitive to outliers. If the outliers appear in the latter part of the training window, this sensitivity is increased, because the most recent observations are weighted more heavily. In essence, an outlier in the wrong place can distort the model, pulling the training fit towards itself. As a result, forecasts can look very strange – for example, the forecast might move in a trend opposite to that in the input time series.

    To avoid such distortions, we automatically track variations in the training state. When we detect large variations, we adjust the trend in the time window to more closely match the overall trend of the time series and adjust the forecast values accordingly.

    Non-seasonal algorithm (ETS AAN)

    The non-seasonal algorithm (ETS AAN) uses a simpler equation to model the time series, which includes only a term for additive trend and additive error, and does not consider seasonality at all. We assume data values increase or decrease in some way that can be described by a formula, but that the increase or decrease is not cyclical.
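    In code, the non-seasonal case looks like Holt’s linear method: a level and an additive trend, updated by each new observation, with no seasonal component. The sketch below is an assumed minimal implementation in that spirit, not the exact ETS AAN formulation used by Power View.

```python
import numpy as np

def holt_linear(y, alpha=0.3, beta=0.1, horizon=6):
    """Additive-trend, non-seasonal smoothing (Holt's linear method):
    a level and a trend, updated by each observation, no seasonal component."""
    y = np.asarray(y, dtype=float)
    level, trend = y[0], y[1] - y[0]      # naive initialization from the first two points
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Forecasts extend the last level by the last estimated trend.
    return np.array([level + (h + 1) * trend for h in range(horizon)])

# Example: a roughly linear series keeps climbing in the forecast.
print(holt_linear([10, 12, 13, 15, 18, 19, 21, 24], horizon=4))
```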

    Does Power View use the same forecasting algorithm as in the Data Mining Add-ins for Excel?

    No, this is a different tool, and a different (but similar) algorithm.

    • Charting and forecasting in Power View does not require a connection to Analysis Services, and instead runs on data stored in Excel, with processing done “behind the scenes” on the Excel Data Model.
    • Forecasting on Power View charts does not use the ARTXP or ARIMA algorithms provided by SQL Server Analysis Services. Although those algorithms were developed by Microsoft Research in response to common theories and research on time series forecasting, and solve many of the same problems, they are implemented on a different code base and might yield different results.

    We recommend that you experiment with various methods of forecasting and compare results. Typically data scientists will try out many models and evaluate their accuracy before accepting a prediction as valid for business application.

    What is hindcasting?

    The forecasting chart in Power View lets you test the accuracy of forecasts by predicting past values. How does it work?

    1. The chart computes a data model based on the historical data preceding the currently selected point in time.
    2. The chart generates new predictions. Now you can see how your predictions would have looked had you been forecasting at some point in the past, with only the information available up to that point.

    You can compare the predicted values against the real values and visually determine whether the model is doing a fair job of predicting.
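    Conceptually, a hindcast is just a forecast trained on a truncated series. The hypothetical sketch below fits a deliberately simple model (a plain linear trend, purely for illustration, not the exponential smoothing Power View uses) on the points before a chosen split, "predicts" the values that actually followed, and reports the per-point error.

```python
import numpy as np

def hindcast_linear_trend(y, split):
    """Fit only on y[:split], 'predict' the already-known values y[split:], and compare."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t[:split], y[:split], 1)   # fit on the past only
    predicted = slope * t[split:] + intercept                # extrapolate into the known "future"
    actual = y[split:]
    return predicted, actual, np.abs(predicted - actual)

# Example: how well would a model trained on the first 8 points have
# predicted the 4 observations that followed?
series = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
predicted, actual, error = hindcast_linear_trend(series, split=8)
```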

    It is important to understand that in Power View hindcasts are based only on the portion of the data preceding your selection, and not on the complete data.  This can affect the quality of the predictions in several ways:

    • Hindcasts will not necessarily mirror predictions based on the larger data set, simply because the data is different.
    • The farther you go back in time, the less information you have, and the less it will represent your current forecast.
    • If there have been any significant changes in the distribution of values throughout your historical data, you should take those into account when assessing the quality of the hindcasts.

    How are missing values handled?

    In some cases, your timeline might be missing some historical values. Does this pose a problem?

    Not usually – the forecasting chart can automatically fill in some values to provide a forecast. If the total number of missing values is less than 40% of the total number of data points, the algorithm will perform linear interpolation prior to performing the forecast.

    If more than 40% of your values are missing, try to fill in more data, or perhaps aggregate values into larger time units, to ensure that a more complete data series is available for analysis.
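    To see what that rule looks like in code, here is an assumed sketch of the same idea: linear interpolation over interior gaps, refusing to proceed when too large a share of points is missing. The function name and the way the threshold is applied are illustrative, not Power View internals.

```python
import numpy as np

def fill_gaps(values, max_missing_ratio=0.40):
    """Linearly interpolate NaN gaps, or give up when too much of the series is missing."""
    y = np.asarray(values, dtype=float)
    missing = np.isnan(y)
    if missing.mean() > max_missing_ratio:
        raise ValueError("Too many missing values to forecast reliably; "
                         "add data or aggregate into larger time units.")
    idx = np.arange(len(y))
    # np.interp fills each missing index from the surrounding known points.
    y[missing] = np.interp(idx[missing], idx[~missing], y[~missing])
    return y

# Example: one missing value out of six (~17% missing) is interpolated to 14.
print(fill_gaps([10.0, 12.0, np.nan, 16.0, 18.0, 20.0]))
```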

    What is seasonality? How does the seasonality value affect my forecasts?

    Seasonality (or periodicity) is defined as the number of time steps that it takes for a full data cycle, after which the cycle repeats throughout the time series. For example, if you have sales data that changes over the course of a year but tends to look the same year over year, then the time series has a seasonality (or periodicity) of one year. If your historical data is presented in units of months, and of course 12 months comprise a year, then to get the best results you would set a seasonality value of 12, meaning 12 units makes one complete data cycle.

    Seasonality estimation has a strong effect on time series forecasts, and is an important first step when predicting just about everything cyclical, from the weather to sales. Popular time series forecasting algorithms such as exponential smoothing (ETS) and Auto-Regressive Integrated Moving Average (ARIMA) require seasonality as an input and are often very sensitive to different input values.

    Therefore, we’ve added algorithms to help detect seasonality accurately, and provided you with the ability to specify a seasonality value when you know what kind of data cycles to expect. Note that seasonality detection works best when there are several seasons of data.

    Detecting seasonality in your data

    When analyzing chart data to create a forecast, we use the following methods to detect seasonality and adjust the model appropriately (a simplified code sketch follows the list):

    1. Preprocess data. We remove any linear trend we find, and normalize the time series values by using the Z normalization method – in other words, rather than using the raw values, we transform the input values to get a mean centered on 0 and standard deviation of close to 1.
    2. Identify candidate periods from the power spectrum. We compute the power spectrum, which is a way of decomposing a signal into its frequencies. From the spectrum, we identify the top 3 peaks and define intervals around each peak by including the adjacent bins in the frequency domain. Further, we modify the interval boundaries to include seasonality values that are common in business applications (such as months or fiscal quarters), assuming the interval boundary is close to the business seasonality. We use these intervals to find the candidate periods from the auto-correlation function.
    3. Rank candidate periods. Given a set of possible periods based on the autocorrelation function, we find the best one by applying the following procedure:
      1. We compute the Pearson and Spearman auto-correlation of the preprocessed data.
      2. We find peaks in the Pearson and Spearman auto-correlation and match those to the intervals found from the power spectrum. We treat the highest value of the auto-correlation function (acf) in the interval as the peak.
      3. We test for hills in this interval by fitting two line segments between interval boundaries. If we don’t detect a hill we check the next harmonic for a hill, and if we detect a hill we report the peak as a harmonic peak.
      4. We arrange the detected peaks in decreasing order of power, using only the top k peaks, where the power of peak k is more than twice the power of peak k+1.
      5. Among these top peaks, we pick the one with the highest correlation value.
    4. Choose which peak is best, from among the power spectrum and auto-correlation.
    • If the value from the autocorrelation function is 1, we use the candidate from the power spectrum as the final seasonality.
    • If the value from the autocorrelation function is less than 6, we calculate the powers of both candidates normalized by length, and use the value with the higher power.
    • In all other cases, we use the candidate derived from the autocorrelation function.
    5. Look for specified or common seasonality values. Additionally, the algorithm includes a term that gives weight to a specified common seasonality value, if selected by the user. For example, if the methods described above identify the best seasonality as 11, 12, or 6, but you specify a yearly (12-month) seasonality that provides better confidence intervals, we apply the 12-month seasonality. In other words, the algorithm gives priority to user selections and to common seasonality values.
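    The production heuristics above are fairly involved; as a heavily simplified illustration of the same pipeline, the assumed sketch below detrends and z-normalizes a series, picks candidate periods from the power spectrum, and confirms the candidate with the highest autocorrelation. It skips the harmonic tests, interval adjustments, and business-seasonality weighting described above.

```python
import numpy as np

def detect_seasonality(y, max_period=None, n_candidates=3):
    """Simplified detector: detrend + z-normalize, take candidate periods from the
    power spectrum, then keep the candidate with the strongest autocorrelation."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    max_period = max_period or n // 2
    # 1. Preprocess: remove a linear trend and z-normalize.
    t = np.arange(n)
    slope, intercept = np.polyfit(t, y, 1)
    z = y - (slope * t + intercept)
    z = (z - z.mean()) / (z.std() + 1e-12)
    # 2. Candidate periods from the power spectrum.
    power = np.abs(np.fft.rfft(z)) ** 2
    freqs = np.fft.rfftfreq(n)
    candidates = set()
    for k in np.argsort(power[1:])[::-1] + 1:     # strongest bins first, skipping frequency 0
        period = int(round(1.0 / freqs[k]))
        if 2 <= period <= max_period:
            candidates.add(period)
        if len(candidates) >= n_candidates:
            break
    # 3. Rank candidates by autocorrelation at that lag; the highest wins.
    def acf(lag):
        return np.corrcoef(z[:-lag], z[lag:])[0, 1]
    return max(candidates, key=acf) if candidates else None

# Example: a noisy, trending series with a 12-step cycle typically comes back as 12.
t = np.arange(120)
series = np.sin(2 * np.pi * t / 12) + 0.1 * t + 0.2 * np.random.default_rng(0).standard_normal(120)
print(detect_seasonality(series))
```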

    Note: Automatic detection of seasonality works best when the cycles being evaluated are not too long. The problem with very long data is that the auto-correlation values tend to have less information value (are more variable depending on adjacent sequences). The Pearson correlation coefficient that we use also tends to work best on linear relationships and in a long data series, linearity can be easily broken. Various heuristics have been proposed by researchers for assessing residuals in long time series, but in general, better results sometimes come from constraining the data series to a smaller range, or identifying probable data cycles by specifying a seasonality value.

    Generally, you can get better results if you have at least 3-4 cycles or seasons of data to work with.

    Specifying a seasonality value manually

    If you know your data and domain well, you might be able to provide a hint to the algorithm to help it calculate seasonality more accurately. You do this by looking at the units of time used in the data, and determining how many units make up a complete data cycle.

    For example, let’s say you have weekly rainfall data, and you expect that patterns of rainfall repeat on a yearly basis, more or less. In that case, you would set your seasonality value to 52, because 1 week = 1 unit of time, and there are 52 units of time in a data cycle.

    When you specify a seasonality value, the number is incorporated in the formula that calculates trends overall, making the forecast better able to compensate for outliers such as extremely rainy days. However, if you have any doubt about what the data cycle is, you should leave it to the algorithm to test all possibilities and determine the most likely one.

    How can I tell if the forecasts are any good?

    Hindcasting and adjusting confidence intervals are two good ways to evaluate the quality of the forecast.

    Hindcasting is one way to verify whether the model is doing a good job. If the observed value doesn’t exactly match the predicted value, it does not mean the forecast is all wrong – instead, consider both the amount of variation and the direction of the trend line. Predictions are a matter of probability and estimation, so if the predicted value is fairly close to, but not exactly the same as, the real value, that can be a better indicator of prediction quality than an exact match. In general, when a model too closely mirrors the values and trends within the input dataset, it might be overfitted, meaning it likely won’t provide good predictions on new data.

    Another way to assess accuracy is to view the probability error in the forecast by adjusting the confidence intervals. The shaded area shows you the range of predicted values at different confidence levels. Depending on your domain, you might require that values meet a very high confidence interval, or that possible predictions fall within a standard deviation of 0.05. However, in other cases, variations of plus or minus 30% might represent plausible scenarios.
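    As a rough illustration of what the shaded band represents (not Power View’s exact computation), a confidence band can be derived from the spread of the in-sample residuals: for a chosen number of standard deviations, the band is the forecast plus or minus that many residual standard deviations. The helper name and example values below are assumptions for illustration.

```python
import numpy as np

def confidence_band(fitted, actual, forecast, n_std=2):
    """Derive a +/- n_std band around a forecast from the in-sample residual spread."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(fitted, dtype=float)
    sigma = residuals.std()
    forecast = np.asarray(forecast, dtype=float)
    return forecast - n_std * sigma, forecast + n_std * sigma

# Example: a wider band (n_std=3) expresses less certainty than a narrow one (n_std=1).
fitted = [20.0, 21.5, 22.8, 24.1]       # what the model predicted for known points
actual = [20.0, 22.0, 22.0, 25.0]       # what actually happened
forecast = [26.0, 27.0, 28.0]           # the model's future predictions
lower, upper = confidence_band(fitted, actual, forecast, n_std=2)
```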

    You are the best judge of how reliable the input data is, and what the real range of possible predictions might be.

    References

    If you are interested in learning more about time series models and the methods that have been developed over the years for forecasting, we recommend the following resources.

     

    Box and Jenkins

    http://www.amazon.com/Time-Series-Analysis-Forecasting-Control/dp/0130607746

    This is the classic book on time series forecasting. Here you will find a detailed description of the many different types of problems posed by time series data, along with proposed solutions and formulas used by the researchers. Much later work in this field is based on Box-Jenkins and their work remains pertinent and valuable.

     

    Holt-Winters

    http://www.bauer.uh.edu/gardner/research.asp

    The Holt-Winters method is a specific implementation of exponential smoothing that is widely used in business and now has many variants. To get an idea of the arc of research, see Dr. Gardner’s published papers, Exponential smoothing: State of the Art (Part 1 and Part 2).

     

    Exponential smoothing (Wikipedia)

    http://en.wikipedia.org/wiki/Exponential_smoothing

    This Wikipedia article describes the history and basic mechanisms of smoothing as applied to time series.

     

    Overview of time series methods

    http://sqlmag.com/sql-server-analysis-services/understanding-time-series-forecasting-concepts

    Still confused about the terms and choices? This article by Microsoft Technical Support Specialist Tyler Chessman provides an overview of the different forecasting options, what they try to accomplish, and how they compare. He also walks through creation of a forecasting model in Excel.

     

     

     

     
    #Power BI #Power View #Forecasting

    Thu, 08 May 2014 16:15:00 GMT by Power BI Team | 0 Comments
