Channel: Business Intelligence y Big Data: Learn Analytics for Free!

STAgile (easy and fast web dashboards from Excel), open source based



STAgile is a quick and simple dashboard generator that lets users create their own dashboards from Excel and CSV files, with save, share, filter and export features...

What does STAgile offer?


  •     Simple design for intuitive operation
  •     You don't have to write a single line of code
  •     Generation of charts from Excel or CSV files
  •     Navigate through hierarchies using drill-down
  •     Synchronized charts
  •     Simple and user-friendly configuration system
  •     Export to CSV
  •     Table mode: view all your dashboard data
  •     Save and share your dashboards
  •     Integration with Pentaho and web portals
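Generating a chart from a CSV boils down to grouping rows by a dimension and aggregating a measure. As a rough illustration of what a no-code tool like STAgile does behind the scenes (this is not STAgile's code; the column names and data are invented), using only the Python standard library:

```python
import csv
import io
from collections import defaultdict

# Invented sample data, in the shape a user might load into STAgile.
CSV_DATA = """region,product,sales
North,Widget,120
South,Widget,80
North,Gadget,60
South,Gadget,40
"""

def aggregate_by(rows, dimension, measure):
    """Group rows by one dimension and sum a numeric measure --
    essentially what a dashboard chart does behind the scenes."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += float(row[measure])
    return dict(totals)

rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
print(aggregate_by(rows, "region", "sales"))   # one bar per region
print(aggregate_by(rows, "product", "sales"))  # drill to another dimension
```

Switching the dimension argument is, conceptually, what the drill-down feature above does.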

In this series of video tutorials you can see the main features of STAgile (an open source based web dashboarding tool built from Excel, with no license fees and professional support included) and how it works.

STAgile is part of LinceBI Open Analytics solution




 


0. From Excel to Dashboards for end users
1. STAgile Basic example import csv file, basic graphs, dashboard view, export to csv
2. STAgile Advanced example I. geo choropleth, numbers graph
3. STAgile Advanced example II. Heat map, drill and filters with advanced graphs
4. STAgile Advanced I. Line graphs, edit CSV and export data
5. STAgile Advanced II. Scatter plot, Box plot, Bubble graph
6. STAgile Advanced III. custom text, images and links
7. STAgile Advanced IV. custom iFrames



Learn more:



STDashboard (Web Dashboard Editor open source based), Video Tutorials



In this series of video tutorials you can see the main features of STDashboard (an open source based web dashboarding tool, with no license fees and professional support included) and how it works. STDashboard is part of the LinceBI Open Analytics solution. 0. STDashboard (Dashboard for end users in minutes) 1. STDashboard (LinceBI Open Source BI/BigData Solution) 2. STDashboard (LinceBI Vertical Dashboarding Solution) 3. STDashboard...



STPivot (Web Analytics open source based) complete Videotutorials



In this series of video tutorials you can see the main features of STPivot (an open source based web analysis tool, with no license fees and professional support included) and how it works. Besides, you can embed, customize and modify it to fit your needs. STPivot is part of the LinceBI Open Analytics solution. 1. LinceBI OLAP interactive analysis 2. STPivot OLAP Analytics for Big Data 3. Powerful Forecasts in STPivot 4. STPivot...



Introducing STMonitoring for Pentaho



One of the most useful things when you are running a Pentaho production environment with many users accessing the BI server (reports, dashboards, OLAP analysis...) is monitoring overall user performance. That's why we've created STMonitoring (included free in all of the projects we help to develop and in some solutions, such as LinceBI)....


STReport (Web Reporting Open Source based tool) Video Tutorials



In this series of video tutorials you can see the main features of STReport (an open source based web reporting tool, with no license fees and professional support included) and how it works. STReport is part of the LinceBI Open Analytics solution. 1. STReport (creating a simple report using rows, groups, filters) 2. STReport (models, exploring categories and glossary) 3. STReport (work area, hidden sections, limit results, info options...) 4. STReport...


List of Open Source Business Intelligence tools



Here you can find an updated list of the main open source business intelligence tools. If you know any other, don't hesitate to write to us. - Talend, including ETL, Data Quality and MDM; OS and Enterprise versions - Pentaho, including Kettle, Mondrian, JFreeReport and Weka; OS and Enterprise versions - BIRT, for reporting - Seal Report, for reporting - LinceBI, including Kettle, Mondrian, STDashboard, STCard and STPivot - Jasper Reports, including...


STDashboard, a free license way to create Dashboards



The improvements in this version of STDashboard focus on the user interface for panels and dashboards, along with some performance enhancements and fixes for old bugs. It works with Pentaho and embedded in web applications. You can see it in action in this Pentaho Demo Online and as part of the LinceBI suite. STDashboard doesn't require an annual license, you can manage unlimited users, and it's open source based. STDashboard includes professional...


Comparison of Tableau and Pentaho


We often publish studies and comparisons of different Business Intelligence and Big Data technologies. But, as so often happens, the best thing is to see them working in practice.

That's why we are showing you examples of dashboards created with Tableau and Pentaho, using data from the Spanish football league, so you can compare them.

Click on each of the dashboards to access them:

Tableau:





Pentaho (you can also see another Pentaho Demo Online)







Business Intelligence Tools Comparison



Migration and upgrade of Pentaho versions


Pentaho CE has been deployed in many organizations for more than 10 years.

Fortunately, in most cases users get a great deal out of it, but as new versions have been released and the community has introduced improvements, an upgrade usually becomes necessary in order to improve:

- Performance and bottlenecks
- The front-end and user experience
- New features and enhancements

You can take a look at the improvements introduced by Stratebi's Pentaho specialists, which include:

- Console improvements (tags, search, comments)
- Improved OLAP and Reporting tools
- New tools for generating Dashboards and Scorecards
- Powerful predefined dashboards
- Integration with Big Data and Real Time environments

See the improvements in action:

Demo_Pentaho - Big Data



Big Data applications in Tourism

Dashboards and Business Intelligence for Smart Cities


More and more cities are implementing Smart City solutions, covering a wide range of aspects in terms of technologies, devices, data analytics, etc.

The key point in all of them is that these solutions must integrate information and indicators from all kinds of data sources: traditional relational databases, social networks, mobile applications, sensors... It is essential to avoid silos and closed technologies, which is why Open Source is fundamental, since it can be adapted to all kinds of solutions.

Based on our experience in some of the smart city projects we have taken part in, we would like to share a few technologies, resources and demos that may help you:

1. List of Open Source solutions for Smart Cities - Internet of Things projects

2. List of Open Source Business Intelligence tool for Smart Cities 

3. 35 Open Source Tools for the Internet of Things (IoT)



Demos:

Big Data technologies

Business Intelligence demos





Near real time traffic monitoring at the Madrid City Council (Access)



Geopositioning of dynamic routes (Access/Video)




Route recommendation (graphs) (Access/Video)



STCard Videotutorials (Open Source based Scorecard solution)



The improvements in this version of STCard, an open source based solution, focus on the user interface for panels and dashboards, along with some performance enhancements and fixes for old bugs:

- Import with ETL
- Fixed the "new KPIs always in red" bug
- Tooltip and character issues solved
- Export to PDF
- Modified colors of new scorecards
- Some other minor bugs...

It works with Pentaho and embedded in web applications.

You can manage your organization with powerful KPI control based on a Balanced Scorecard using STCard.

You can see it in action in this Demo Online and as part of the LinceBI suite.

STCard doesn't require an annual license, you can manage unlimited users, and it's open source based.

Videotutorials:

- STCard 01 Global View
- STCard 02 Create a new scorecard and security
- STCard 03 Configuration
- STCard 04 Planning and write back data
- STCard 05 Scorecard Analysis and dashboard

STCard includes professional services (training, support and maintenance, documentation, and bug resolution), so a high enterprise level is guaranteed.

Interested? Contact Stratebi or LinceBI.



See a Video Demo:




About main functionalities:

STCard works on top of Pentaho and is the best tool for managing your KPIs (Key Performance Indicators) and targets, and for keeping track of your Balanced Scorecard strategy.









Fully integrated with Pentaho CE, you can leverage all the power of this Open Source BI Suite



STCard is an open source tool developed by StrateBI for the creation, management and analysis of Scorecards.
A Scorecard is a global management system within an organization that allows you to have a view of it based on a number of perspectives. All these as a whole define the vision and strategy of the organization.
To define a Scorecard you have to define a clear strategy:
  • Strategic Objectives for the units of the organization.
  • Indicators (KPIs) that mark the fulfillment of the strategic objectives.
The main features of STCard are:
  • Flexibility: A Scorecard always refers to an organization as a whole, but with STCard we can create a scorecard for a specific area of the organization, for example: Treasury, Financial Area, Consolidation, Suppliers, etc. On the other hand, the concept of flexibility also applies to the number of strategic perspectives and objectives in a scorecard: as many as you like. The philosophy of Kaplan and Norton is not limited to 4 perspectives (customer, financial, internal business processes, and learning and growth); you can create as many as you need.
  • Flexibility does not break with the original philosophy. A scorecard in STCard consists of a weighted hierarchical structure of 3 levels:
    • Perspective: from what point of view we will see our system. For example, financial, quality, customers, IT, etc.
    • Strategic Objective: what is our goal. For example, increase profitability, customer loyalty, incentive and motivation HR, etc.
    • Indicator (KPI): the measure or metric. Indicators can be quantitative or qualitative (confirmation / domain values), and these always have a real value and a target value.
For the launch of the Scorecard we can consider three scenarios:
  • The organization already has a system / repository of indicators:
    This scenario has a rapid implementation, and only requires defining load processes to obtain the organization's indicator data and adapt it to STCard.
  • The organization lacks a system / repository of indicators:
    This variant requires more consulting work, because a pure BI project must first be carried out in the organization to obtain the indicators that will later be handled in STCard.
    For example: data sources; ETL processes; indicator system / repository; load processes into STCard.
  • Immediate start-up:
    This is the fastest alternative; it only requires installation / configuration and training. Data management is done through Excel templates, so no additional consulting work is required.
    Users set values through Excel templates, where the data is filled in. These values are loaded into STCard, and after that it is the users who interact with STCard.
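The 3-level weighted structure described above can be sketched as a simple data model. This is illustrative only, not STCard's actual implementation; the classes, weights and figures are invented:

```python
from dataclasses import dataclass, field

# Sketch of the hierarchy: Perspective -> Strategic Objective -> KPI.
# Each KPI has a real value and a target value; scores roll up by weight.

@dataclass
class KPI:
    name: str
    real: float
    target: float
    weight: float = 1.0
    def score(self):
        # Degree of target fulfilment, capped at 100%
        return min(self.real / self.target, 1.0)

@dataclass
class Objective:
    name: str
    kpis: list = field(default_factory=list)
    weight: float = 1.0
    def score(self):
        total = sum(k.weight for k in self.kpis)
        return sum(k.score() * k.weight for k in self.kpis) / total

@dataclass
class Perspective:
    name: str
    objectives: list = field(default_factory=list)
    def score(self):
        total = sum(o.weight for o in self.objectives)
        return sum(o.score() * o.weight for o in self.objectives) / total

financial = Perspective("Financial", [
    Objective("Increase profitability", [
        KPI("Operating margin", real=12.0, target=15.0),
        KPI("Revenue growth %", real=8.0, target=8.0),
    ]),
])
print(f"{financial.score():.2f}")  # weighted fulfilment of the perspective
```

The weighted average at each level is what lets a single traffic-light color summarize an entire perspective.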

These are the main features of STCard:





7 practical Big Data examples and applications


In the following applications, dashboards and examples you can see Big Data working in practice, in different cases and using different technologies: Kafka, Spark, Apache Kylin, Neo4J...

Access the examples

If you want to know more about Big Data, these links may interest you:

OLAP for Big Data. Is it possible?
How to start learning Big Data in 2 hours (Spanish)
List of Open Source Business Intelligence tools
Big Data OLAP analysis on Hadoop with Apache Kylin (Spanish)
Apache Kafka real-time use case, Big Data (Spanish)


Tutorial and Demo: working with Grafana


We now have a Grafana demo with public occupancy data from the Málaga City Council, collected via an API.


The purpose of this document is to walk through the process of creating a dashboard that monitors the status of Málaga's public car parks in real time using Grafana.

Grafana is a free software tool for building dashboards and charts from multiple data sources. It is commonly used for real-time data visualization and monitoring.



In this practical example, the data source will be the Málaga City Council open data portal (https://datosabiertos.malaga.eu/), specifically the dataset on the occupancy of municipal public car parks. This information is published in CSV format and updated every minute.
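Since the portal publishes the data as CSV, the collection step is just fetching and parsing the file every minute. A rough sketch with the Python standard library follows; the column names are assumed for illustration, not the dataset's real schema:

```python
import csv
import io

# Sketch of the ingestion step: fetch the CSV every minute (e.g. with
# urllib.request.urlopen on the dataset URL from the open data portal)
# and parse it into values a Grafana data source can read.
def parse_occupancy(csv_text):
    """Turn the portal's CSV into {car park name: free spaces}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["nombre"]: int(row["libres"]) for row in reader}

# Invented sample rows in that assumed shape:
sample = "nombre,capacidad,libres\nCamas,420,137\nSalitre,510,88\n"
print(parse_occupancy(sample))
```

In the live demo, a scheduled job would run this parse and write the values into the time-series store Grafana queries.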




Demo access:

https://grafana.demo.stratebi.com
User: demo
Password: tKPnruDeN4YJWiTa



Types of roles in Analytics (Business Intelligence, Big Data)



As the Analytics industry grows, it becomes harder and harder to know the description of each role and position. What's more, they are often used incorrectly, mixing up tasks, job descriptions, etc.

This confuses both the specialists themselves and the people who are training and studying for these jobs. In such a fast-changing industry, new and increasingly specialized positions appear all the time. Here we describe each of them:


Business Analyst:




Data Analyst:



Data and Analytics Manager:


Data Architect:



Data Engineer:



Data Scientist:



Database Administrator:



Statistician:





You may also be interested in:

How to pass a Pentaho BI Open Source interview?
Data Analyst skills and their differences
Start learning Big Data in 2 hours?

Seen on KDnuggets

Project Management with Redmine Analytics


Redmine Analytics is the companion solution to the Redmine project management tool, built on LinceBI's Pentaho-based open source Business Intelligence, with all the models prepared and ready to use. It is also integrated with PowerBI.

Ask our colleagues at Stratebi


Productivity Analysis model:







Per-Project Analysis model:




For an organization it is vitally important to structure its projects in a correct and agile way, for its own benefit.

Alongside project execution, it is equally important to know whether the team working on the projects is productive, and whether the forecasts for cost, schedule and effort still hold.

Alerts:

Productivity alert:

Goal: Monitor the hours logged by each employee against the planned hours.
Communication: By e-mail. Each employee receives an hours report; managers receive a per-employee summary of the hours difference.

Estimated hours consumption alert:

Goal: For the projects and services offered, identify those that have exceeded their estimate by 50%, 75%, 85% and 100%.
Communication: By e-mail.

Project configuration alert:

Goal: Check the configuration of each project in Redmine in terms of phases, estimated reference time, profiles and cost per profile.

Communication: By e-mail.
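The estimated-hours alert is essentially a threshold check on logged versus estimated hours. A minimal sketch follows; this is not Redmine Analytics' actual implementation, and the function name is invented:

```python
# Flag the highest threshold a project has crossed among 50%, 75%,
# 85% and 100% of its hour estimate, as in the alert described above.
THRESHOLDS = (0.50, 0.75, 0.85, 1.00)

def consumption_alert(logged_hours, estimated_hours):
    """Return the highest crossed threshold, or None if under 50%."""
    ratio = logged_hours / estimated_hours
    crossed = [t for t in THRESHOLDS if ratio >= t]
    return crossed[-1] if crossed else None

print(consumption_alert(90, 100))   # 0.85 threshold crossed
print(consumption_alert(30, 100))   # None -> no alert yet
```

A scheduler would evaluate this per project and e-mail the result, as the alert definition describes.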






With the right variables, decision making becomes meaningful and corrective action can be taken in due time and form, and for that Redmine Analytics provides everything needed.
Moreover, in a fully automated way.










Tutorial: Creating Dashboards with Open Source solutions


Dashboards are in ever greater demand, and the good news is that a large part of them can be built with Open Source solutions: Pentaho, CDE, dc.js...

As a novelty, you can also create them with STDashboard: How to create your own Dashboards in Pentaho

Here are the main keys to building powerful dashboards, from the Open Source Dashboard creation course:




If you found this interesting, you can also:

- See working examples of Open Source dashboards
- See the Open Source dashboard gallery and video tutorial
- See the syllabus and on-site and 'in company' courses for building dashboards hands-on
- See dashboards built with 'Real Time' Big Data technologies

You can also watch this very practical video tutorial:

Truths and Lies about Free Software



BBVA has produced a very interesting study (around 120 pages) on Open Source: history, technologies, business models, etc.

Don't hesitate to download it.

It starts out interesting...



Glossary of Business Intelligence Terms


For everyone who is getting started in the Business Intelligence world, here is a glossary of the main Business Intelligence terms.

If you want to play with an open, open source demo to get to know and try out these concepts, that is the best way to become familiar with them.

Glossary of Business Intelligence terms:

  • Automated Analysis: Automatic analysis of data to find hidden insights in the data and show users the answers to questions they have not even thought of yet.
  • BI Analyst: As stated by modernanalyst.com, a data analyst is a professional who is in charge of analyzing and mining data to identify patterns and correlations, mapping and tracing data from system to system in order to solve a problem, using BI and data discovery tools to help business executives in their decision making, and perform statistical analysis of business data, among other things. (Can be called a data analyst too)
  • BI Governance: According to Boris Evelson, from Forrester Research, BI governance is a key part of data governance, but it focuses on the BI system and governs who uses the data, when, and how.
  • Big Data: Enormous and complex data sets that traditional data processing tools cannot deal with.
  • Bottlenecks: Points of congestion or blockage that hinder the efficiency of the BI system.
  • Business Intelligence: According to Gartner, “Business Intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”
  • Centralized Business Intelligence: A BI model that enables users to work connected and share insights, while seeing the same and only version of the truth. IT governs over data permissions to ensure data security.
  • Collaborative BI: An approach to Business Intelligence where the BI tool empowers users to collaborate between colleagues, share insights, and drive collective knowledge to improve decision making.
  • Collective Knowledge: Knowledge that benefits the whole enterprise as it comes from the sharing of insights and data findings across groups and departments to enrich analysis.
  • Dark Data: According to Gartner, the definition for Dark Data is “information assets that organizations collect, process and store in the course of their regular business activity, but generally fail to use for other purposes”. 90% of companies’ data is dark data.
  • Dashboards: A data visualization tool that displays the current enterprise health, the status of metric and KPIs, and the current data analysis and insights.
  • Data Analyst: As stated by modernanalyst.com, a data analyst is a professional who is in charge of analyzing and mining data to identify patterns and correlations, mapping and tracing data from system to system in order to solve a problem, using BI and data discovery tools to help business executives in their decision making, and perform statistical analysis of business data, among other things.
  • Data Analytics: According to TechTarget, “data analytics is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software.”
  • Data Governance: According to Boris Evelson, from Forrester Research, data governance "deals with the entire spectrum (creation, transformation, ownership, etc.) of people, processes, policies, and technologies that manage and govern an enterprise's use of its data assets (such as data governance stewardship applications, master data management, metadata management, and data quality)."
  • Data Mashup: An integration of multiple data sets into a unified analytical and visual representation.
  • Data Silos: According to Tech Target, a data silo is “data that is under the control of one department or person and is isolated from the rest of the organization.” Data silos are a bottleneck for effective business operations.
  • Data Sources: The source where the data to be analyzed comes from. It can be a file, a database, a dataset, etc. Modern BI solutions like Necto can mashup data from multiple data sources.
  • Data Visualization: The graphic visualization of data. Can include traditional forms like graphs and charts, and modern forms like infographics.
  • Data Warehouse: A relational database that integrates data from multiple sources within a company.
  • Embedded Analytics: The integration of reporting and data analytic capabilities in a BI solution. Users can access full data analysis capabilities without having to leave their BI platform.
  • Excel Hell: A situation where the enterprise is full of unnecessary copies of data, thousands of spreadsheets get shared, and no one knows with certainty which is the most updated and real version of the data.
  • Federated Business Intelligence: A BI model where users work in separate desktops, creating data silos and unnecessary copies of data, leading to multiple versions of the truth.
  • Geo-analytic capabilities: The ability that a BI or data discovery tool has to analyze data by geographical area and reflect such analysis on maps on the user’s dashboard.
  • Infographics: Visual representations of data that are easily understandable and drive engagement.
  • Insights: According to Forrester Research, insights are “actionable knowledge in the context of a process or decision.”
  • KPI: Key Performance Indicator. A quantifiable measure that a business uses to determine how well it meets the set operational and strategic goals. KPIs give managers insights of what is happening at any specific moment and allow them to see in what direction things are going.
  • Modern BI: An approach to BI using state of the art technology, providing a centralized and secure platform where business users can enjoy self-service capabilities and IT can govern over data security.
  • OLAP: Stands for Online Analytical Processing and it is a technology for data discovery invented by Panorama Software and then sold to Microsoft in 1996. It has many capabilities, such as complex analytics, predictive “what if” scenario planning, and limitless report viewing.
  • Scalability: The ability of a BI solution to be used by a larger number of users as time passes.
  • Self-Service BI: An approach that allows business users to access and work with data sources even though they do not have an analyst or computer science background. They can access, profile, prepare, integrate, curate, model, and enrich data for analysis and consumption by BI platforms. In order to have successful self-service BI, the BI tool must be centralized and governed by IT.
  • Smart Data: Smaller data sets from Big Data that are valuable to the enterprise and can be turned into actionable data.
  • Smart Data Discovery: The processing and analysis of Smart Data to discover insights that can be turned into actions to make data-driven decisions in an organization.
  • Social BI: An approach where social media capabilities, such as social networking, crowdsourcing, and thread-based discussions are embedded into Business Intelligence so that users can communicate and share insights.
  • Social Enterprise: An enterprise that has a new level of corporate connectivity, leveraging the social grid to share and collaborate on information and ideas. It drives a more efficient operation where problems are uncovered and fixed before they can affect the revenue streams.
  • SQL: Stands for Structured Query Language. It is a programming language used for managing relational databases and manipulating data.
  • State of the Art BI: The highest level of technology, the most up-to date features, and the best analysis capabilities in a Business Intelligence solution.
  • Suggestive Discovery Engine: An engine behind the program that recommends to the users the most relevant insights to focus on, based on personal preferences and behavior.
  • Systems of Insight: This is a term coined by Boris Evelson, VP of Forrester Research. It is a Business Intelligence system that combines data availability with business agility, where both IT and business users work together to achieve their goals.
  • Workboards: An interactive data visualization tool. It is like a dashboard that displays the current status of KPIs and other data analysis, with the possibility to work directly on it and do further analysis.
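Several of these terms can be made concrete in a few lines. The sketch below (schema and figures are invented for the example) shows a tiny Data Warehouse table queried with SQL to compute a KPI against a target value, the kind of number a Dashboard would then visualize:

```python
import sqlite3

# A Data Warehouse in miniature: one fact table, queried with SQL to
# compute a KPI (real value vs. target value).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, month TEXT, revenue REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("North", "2019-01", 50000.0),
    ("North", "2019-02", 55000.0),
    ("South", "2019-01", 40000.0),
])

TARGET = 140000.0  # hypothetical revenue target for the period

total = con.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]
print(f"Revenue KPI: {total:.0f} / {TARGET:.0f} = {total / TARGET:.0%}")
```

Swapping `SUM(revenue)` for a `GROUP BY region` query is the step from a single KPI to the per-dimension breakdowns an OLAP tool serves.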

Seen on the Panorama blog

Cloudera changes strategy and goes Open Source

For those who thought that Cloudera's acquisition of Hortonworks would endanger the open source model, quite the opposite: Cloudera will be 100% Open Source, as they have just announced (read the link above carefully).

Cloudera has just announced that it will focus on a services and support model.

Great news for everyone working with open source based Big Data stacks, such as LinceBI.



These t-shirts are looking for Data Ninjas!!




Our friends at Stratebi, contributors to this portal, are looking for specialists who want to progress, learn and enjoy new challenges in Business Intelligence, Machine Learning and Big Data. They are also the creators of the open source based Big Data platform LinceBI.

Some of their online demos:



Write, send your CV, and earn the right to wear the t-shirts if you master, or want to master, the following!!






The DataOps CookBook. Free download

Why can't I make decisions even though I have a dashboard?



This reflection by Tristán Elosegui, now a couple of years old, is very interesting and remains fully relevant. Below we outline the main points he covers:

At TodoBI we talk a lot about dashboards (see posts), among which we highlight:

12 free applications for creating Dashboards
Open Source Dashboard creation tutorial
Dashboard examples
- Balanced Scorecard

According to Tristán, companies have a huge amount of data within reach, but they are unable to bring order to so much chaos and, as a result, they do not have a clear view of the situation.

The noise is greater than the 'signal'

The volume of data and the speed at which it is generated produce more noise than signal.
This situation leads companies to make decisions without the necessary data, or to post-analysis paralysis, instead of enabling action (decision making).
Data arrives from different sources, in different formats, from different tools... and it all ends up in reports, which they try to integrate into a dashboard that helps them make decisions.

Why, with so much data, are companies unable to make strategic decisions?

Having a lot of data does not always mean having a better view of the situation. Surely more than one of you reading this post will feel identified.
Companies make data-driven decisions every day (and decisions without data too); the problem is that these decisions are tactical, since they are made in 'silos' (by area).
To make decisions that optimize the company's global strategy we need to:
  • Have the necessary data, no more and no less, to make them (the most complete possible picture of the context), and
  • be able to understand the data, in order to turn it into information and then into knowledge.
There is nothing worse than having come all the way to a strategic dashboard, only for the person who has to make the decisions not to make them. Why does this happen?

Lack of context

The main reason for not making decisions is that the data represented on the dashboard is not relevant or not actionable.
This happens when we have not defined the dashboard correctly (the right steps are laid out in the digital analytics maturity model). The most common mistakes are:
  • Poorly defined objectives and KPIs: if the starting point is badly defined, everything that follows will lead us astray. And of course, the context will be completely wrong.
  • Irrelevant or non-actionable data: either because of a poor definition of the objectives and of the KPIs that help us track them, or simply because we have chosen the wrong data, we end up with a dashboard full of numbers and charts that does not let us make decisions. Either it does not show data matching the area of responsibility of the person making the decisions, or the data is simply not actionable. In either case the result is the same.
  • Incomplete data: the opposite extreme of the previous case. We lack the data needed to make decisions.

Visualización de datos

The second big problem is that the decision-maker does not understand the data.
Just as we have to show each stakeholder the data relevant to their job (the previous case), we have to adapt the language and the visualization so that the decision-maker understands what they are looking at.
So, for a strategic dashboard to work, you must start by defining objectives and KPIs properly, work on data quality, make sure the data is telling you what you care about, and integrate data from the different sources you manage.

Do not skip any phase of the digital analytics maturity model, or you may run into the problems we have covered in this post.

Read the full article

How to use Machine Learning for Data Quality


Companies increasingly need to store and process more data about their customers, suppliers, staff and orders.
However, the greater the Volume of data, the higher the probability of incorrect records, such as wrong addresses or phone numbers, that hurt the business. For example, an order with a wrong address will be returned, reducing the company's profit and the customer's trust.
With this problem in mind, we propose a solution: Big Data Quality. Download the paper.

It is a solution for processing personal data with Big Data characteristics (Volume, Variety, Velocity), based on applying cleansing, normalization and validation processes to a group of fields common to almost any database.
This scalable and extensible solution includes modules for the cleansing, normalization, validation, de-duplication and correction of personal and address data of the following types: phone numbers, emails, identity documents, first and last names, addresses...

This Data Quality solution for Big Data has been implemented with Apache Spark, which scales processing to any data volume without rewriting a single line of code. Big Data Quality can therefore run on a single machine and, if the size of the input data, name dictionaries, street gazetteers or data retrieved from APIs requires it, on a Spark cluster such as Databricks or Amazon EMR in the cloud, or on an on-premise Hadoop installation such as Hortonworks or Cloudera.

In addition, these modules validate and correct data against reference data obtained from dictionaries and APIs of names, street gazetteers (e.g. Correos) or frequent email domains. In the following sections we analyze the main features of each module.
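As an illustration only (the product's actual code is not published in this post), here is a minimal stdlib sketch of the kind of cleansing and validation rules such a module applies to phone numbers and emails. The country prefix and the email regex are simplifying assumptions; in the real solution, rules like these run as Spark transformations over DataFrames so they scale to any volume.

```python
import re

# Hypothetical rules, not the product's real ones: a simple email pattern
# and a phone normalizer that assumes Spain as the default country.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def normalize_phone(raw, default_country="+34"):
    """Strip separators and prepend a country code when missing."""
    digits = re.sub(r"[^\d+]", "", raw)
    return digits if digits.startswith("+") else default_country + digits

def validate_email(raw):
    """Trim and lower-case, then accept only well-formed addresses."""
    email = raw.strip().lower()
    return email if EMAIL_RE.match(email) else None
```

In Spark, each function would be registered as a UDF and applied column-wise, which is what lets the same logic run unchanged on one machine or on a cluster.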

Download the paper

TECHNOLOGIES USED AND SCALABILITY

To tackle this problem as efficiently as possible, we used the Big Data technology Apache Spark, which brings all the benefits of Big Data technologies:



• Scalability: the program can run on 1 or n machines of a cluster without any change to the code.
• Processing speed: thanks to scalability and to Spark's architecture based on the distributed use of RAM.
• Fault tolerance: even data quality processes involving huge amounts of data will always finish.
• Extensibility: new data quality features can be added.

We chose this technology because of the large number and variety of data sources that must be processed to achieve data quality, both the input data and the dictionaries used to correct and validate it.

Apache Spark is an Open Source technology with a very active community. It is a tool where computations run between 10 and 100 times faster than on other platforms. Moreover, a single tool lets us combine in-memory SQL, streaming, Machine Learning and graphs.


As mentioned, Spark can run in a variety of environments: Spark standalone, Mesos, cloud clusters on Databricks, EMR or HDInsight, or on-premise Hadoop distributions such as Hortonworks or Cloudera.



More than 20 Machine Learning and Analytics Techniques and Types of Analysis

Below we describe the main techniques and types of analysis used in Big Data, often grouped under names such as algorithms, machine learning, etc., but not always explained correctly.

Here we have created some online examples using several of these techniques.

If you want to know more, you can also check these related posts:

The 53 Keys to understanding Machine Learning
69 keys to understanding Big Data
How to start learning Big Data in 2 hours
Types of roles in Analytics (Business Intelligence, Big Data)
Free book: Big Data, the power of turning data into decisions

Let's look at these techniques:

1. A/B testing: A technique in which a control group is compared with a variety of test groups in order to determine what treatments (i.e., changes) will improve a given objective variable, e.g., marketing response rate. This technique is also known as split testing or bucket testing. An example application is determining what copy text, layouts, images, or colors will improve conversion rates on an e-commerce Web site. Big data enables huge numbers of tests to be executed and analyzed, ensuring that groups are of sufficient size to detect meaningful (i.e., statistically significant) differences between the control and treatment groups (see statistics). When more than one variable is simultaneously manipulated in the treatment, the multivariate generalization of this technique, which applies statistical modeling, is often called “A/B/N” testing
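To make the statistics concrete, here is a small self-contained sketch (with made-up numbers, not from any real campaign) of the two-proportion z-test typically used to decide whether variant B's conversion rate differs significantly from control A's:

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: compare conversion rates of control A
    and variant B, returning the z statistic and two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# hypothetical test: 200/5000 conversions for A, 260/5000 for B
z, p = ab_test(200, 5000, 260, 5000)
print(z, p)
```

With these illustrative numbers the p-value falls below 0.05, so B's lift would be called statistically significant; with a smaller sample the same lift might not be.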

2. Association rule learning: A set of techniques for discovering interesting relationships, i.e., “association rules,” among variables in large databases. These techniques consist of a variety of algorithms to generate and test possible rules. One application is market basket analysis, in which a retailer can determine which products are frequently bought together and use this information for marketing (a commonly cited example is the discovery that many supermarket shoppers who buy diapers also tend to buy beer). Used for data mining.
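A brute-force sketch of pairwise rule mining over toy shopping baskets (didactic only, not the optimized Apriori algorithm real tools use); support is how often items co-occur, confidence how often the consequent follows the antecedent:

```python
from itertools import combinations
from collections import Counter

def association_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Mine pairwise rules A -> B whose support and confidence
    exceed the given thresholds (brute-force, for illustration)."""
    n = len(baskets)
    item_counts = Counter(i for b in baskets for i in set(b))
    pair_counts = Counter(p for b in baskets
                          for p in combinations(sorted(set(b)), 2))
    rules = []
    for (a, b), cnt in pair_counts.items():
        support = cnt / n
        if support < min_support:
            continue
        for ante, cons in ((a, b), (b, a)):
            confidence = cnt / item_counts[ante]
            if confidence >= min_confidence:
                rules.append((ante, cons, support, confidence))
    return rules

baskets = [{"diapers", "beer", "milk"},
           {"diapers", "beer"},
           {"milk", "bread"},
           {"diapers", "beer", "bread"}]
print(association_rules(baskets))
```

On these four baskets the only surviving rules are diapers → beer and beer → diapers, matching the classic example in the definition above.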

3. Classification: A set of techniques to identify the categories in which new data points belong, based on a training set containing data points that have already been categorized. One application is the prediction of segment-specific customer behavior (e.g., buying decisions, churn rate, consumption rate) where there is a clear hypothesis or objective outcome. These techniques are often described as supervised learning because of the existence of a training set; they stand in contrast to cluster analysis, a type of unsupervised learning. Used for data mining.
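A minimal k-nearest-neighbours sketch on hypothetical churn data shows how a labelled training set classifies a new point (the features and labels are invented for illustration):

```python
import math
from collections import Counter

def knn_classify(train, point, k=3):
    """Label a new point by majority vote among its k closest
    training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# toy training set: (monthly_spend, visits) -> churn or stay
train = [((1.0, 1.0), "churn"), ((1.2, 0.8), "churn"),
         ((5.0, 5.0), "stay"), ((4.8, 5.2), "stay"), ((5.1, 4.9), "stay")]
print(knn_classify(train, (4.9, 5.0)))
```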



4. Cluster analysis: A statistical method for classifying objects that splits a diverse group into smaller groups of similar objects, whose characteristics of similarity are not known in advance. An example of cluster analysis is segmenting consumers into self-similar groups for targeted marketing. This is a type of unsupervised learning because training data are not used. This technique is in contrast to classification, a type of supervised learning. Used for data mining.
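A small sketch of Lloyd's k-means algorithm, the classic clustering method, on made-up 2-D points: assign each point to its nearest centroid, then move each centroid to the mean of its cluster, and repeat.

```python
import math
import random

def kmeans(points, k=2, iters=20, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # recompute each centroid as its cluster mean (keep old if empty)
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

With two well-separated groups, the centroids converge to the means of the two natural clusters regardless of which points are sampled as the initial centroids.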

5. Crowdsourcing: A technique for collecting data submitted by a large group of people or community (i.e., the “crowd”) through an open call, usually through networked media such as the Web. This is a type of mass collaboration and an instance of using the Web.

6. Data fusion and data integration: A set of techniques that integrate and analyze data from multiple sources in order to develop insights in ways that are more efficient and potentially more accurate than if they were developed by analyzing a single source of data. Signal processing techniques can be used to implement some types of data fusion. One example of an application is sensor data from the Internet of Things being combined to develop an integrated perspective on the performance of a complex distributed system such as an oil refinery. Data from social media, analyzed by natural language processing, can be combined with real-time sales data, in order to determine what effect a marketing campaign is having on customer sentiment and purchasing behavior.

7. Data mining: A set of techniques to extract patterns from large datasets by combining methods from statistics and machine learning with database management. These techniques include association rule learning, cluster analysis, classification, and regression. Applications include mining customer data to determine segments most likely to respond to an offer, mining human resources data to identify characteristics of most successful employees, or market basket analysis to model the purchase behavior of customers

8. Ensemble learning: Using multiple predictive models (each developed using statistics and/or machine learning) to obtain better predictive performance than could be obtained from any of the constituent models. This is a type of supervised learning.
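The idea can be sketched as simple majority voting over several hypothetical rule-of-thumb models (the models and thresholds are invented for illustration):

```python
from collections import Counter

def majority_vote(models, x):
    """Combine several (possibly weak) classifiers by majority vote --
    the simplest form of ensemble learning."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# three made-up rule-of-thumb models for loan approval
models = [
    lambda x: "approve" if x["income"] > 30000 else "reject",
    lambda x: "approve" if x["debt"] < 5000 else "reject",
    lambda x: "approve" if x["years_employed"] >= 2 else "reject",
]
applicant = {"income": 45000, "debt": 8000, "years_employed": 3}
print(majority_vote(models, applicant))
```

Two of the three models approve this applicant, so the ensemble approves; the point is that the combined vote is usually more robust than any single rule.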

9. Genetic algorithms: A technique used for optimization that is inspired by the process of natural evolution or “survival of the fittest.” In this technique, potential solutions are encoded as “chromosomes” that can combine and mutate. These individual chromosomes are selected for survival within a modeled “environment” that determines the fitness or performance of each individual in the population. Often described as a type of “evolutionary algorithm,” these algorithms are well-suited for solving nonlinear problems. Examples of applications include improving job scheduling in manufacturing and optimizing the performance of an investment portfolio.
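A toy genetic algorithm on the standard "max ones" problem (evolving a bitstring toward all ones) shows the selection, crossover and mutation loop; the population sizes and rates are arbitrary choices for illustration:

```python
import random

def genetic_max_ones(length=20, pop_size=30, generations=60, seed=1):
    """Toy GA: fitness = number of 1 bits; truncation selection,
    one-point crossover, single-bit mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sum, reverse=True)           # fitter chromosomes first
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)        # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(length)] ^= 1     # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=sum)

best = genetic_max_ones()
print(sum(best))
```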

10. Machine learning: A subspecialty of computer science (within a field historically called “artificial intelligence”) concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data. Natural language processing is an example of machine learning

11. Natural language processing (NLP): A set of techniques from a subspecialty of computer science (within a field historically called “artificial intelligence”) and linguistics that uses computer algorithms to analyze human (natural) language. Many NLP techniques are types of machine learning. One application of NLP is using sentiment analysis on social media to determine how prospective customers are reacting to a branding campaign.

12. Neural networks: Computational models, inspired by the structure and workings of biological neural networks (i.e., the cells and connections within a brain), that find patterns in data. Neural networks are well-suited for finding nonlinear patterns. They can be used for pattern recognition and optimization. Some neural network applications involve supervised learning and others involve unsupervised learning. Examples of applications include identifying high-value customers that are at risk of leaving a particular company and identifying fraudulent insurance claims.
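The smallest possible neural network, a single perceptron trained to learn logical AND, sketches the learn-from-error loop (illustrative only; real networks stack many such units with smooth activations):

```python
def train_perceptron(data, epochs=50, lr=0.1):
    """One artificial neuron with a step activation, trained with the
    classic perceptron rule: nudge weights in proportion to the error."""
    w = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            out = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
            err = target - out
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            bias += lr * err
    return lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0

# learn logical AND, which is linearly separable
and_gate = train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)])
print(and_gate(1, 1), and_gate(0, 1))
```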

13. Network analysis: A set of techniques used to characterize relationships among discrete nodes in a graph or a network. In social network analysis, connections between individuals in a community or organization are analyzed, e.g., how information travels, or who has the most influence over whom. Examples of applications include identifying key opinion leaders to target for marketing, and identifying bottlenecks in enterprise information flows.

14. Optimization: A portfolio of numerical techniques used to redesign complex systems and processes to improve their performance according to one or more objective measures (e.g., cost, speed, or reliability). Examples of applications include improving operational processes such as scheduling, routing, and floor layout, and making strategic decisions such as product range strategy, linked investment analysis, and R&D portfolio strategy. Genetic algorithms are an example of an optimization technique

15. Pattern recognition: A set of machine learning techniques that assign some sort of output value (or label) to a given input value (or instance) according to a specific algorithm. Classification techniques are an example.

16. Predictive modeling: A set of techniques in which a mathematical model is created or chosen to best predict the probability of an outcome. An example of an application in customer relationship management is the use of predictive models to estimate the likelihood that a customer will “churn” (i.e., change providers) or the likelihood that a customer can be cross-sold another product. Regression is one example of the many predictive modeling techniques.

17. Regression: A set of statistical techniques to determine how the value of the dependent variable changes when one or more independent variables is modified. Often used for forecasting or prediction. Examples of applications include forecasting sales volumes based on various market and economic variables or determining what measurable manufacturing parameters most influence customer satisfaction. Used for data mining.
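A closed-form sketch of ordinary least squares with a single predictor, on made-up advertising-vs-sales numbers, shows how the dependent variable's response to the independent variable is estimated:

```python
def linear_regression(xs, ys):
    """Ordinary least squares for y = a + b*x (one predictor):
    slope b = cov(x, y) / var(x), intercept a = mean_y - b * mean_x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

# hypothetical data: advertising spend (k EUR) vs. units sold
spend = [1, 2, 3, 4, 5]
sales = [52, 55, 59, 60, 64]
a, b = linear_regression(spend, sales)
print(a, b)
```

Here the fitted slope says each extra unit of spend is associated with roughly 2.9 extra units sold, which is the kind of relationship a forecasting model would then exploit.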

18. Sentiment analysis: Application of natural language processing and other analytic techniques to identify and extract subjective information from source text material. Key aspects of these analyses include identifying the feature, aspect, or product about which a sentiment is being expressed, and determining the type, “polarity” (i.e., positive, negative, or neutral) and the degree and strength of the sentiment. Examples of applications include companies applying sentiment analysis to analyze social media (e.g., blogs, microblogs, and social networks) to determine how different customer segments and stakeholders are reacting to their products and actions.
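Real sentiment analysis relies on NLP models, but a tiny lexicon-based scorer (with hand-picked word lists, purely illustrative) shows the core idea of scoring polarity:

```python
import re

# toy lexicons -- production systems use large, weighted vocabularies
POSITIVE = {"great", "love", "excellent", "good", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "poor", "angry"}

def polarity(text):
    """Return 'positive', 'negative' or 'neutral' for a piece of text
    by counting sentiment-bearing words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("I love this product, excellent support"))
```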

19. Signal processing: A set of techniques from electrical engineering and applied mathematics originally developed to analyze discrete and continuous signals, i.e., representations of analog physical quantities (even if represented digitally) such as radio signals, sounds, and images. This category includes techniques from signal detection theory, which quantifies the ability to discern between signal and noise. Sample applications include modeling for time series analysis or implementing data fusion to determine a more precise reading by combining data from a set of less precise data sources (i.e., extracting the signal from the noise).

20. Spatial analysis: A set of techniques, some applied from statistics, which analyze the topological, geometric, or geographic properties encoded in a data set. Often the data for spatial analysis come from geographic information systems (GIS) that capture data including location information, e.g., addresses or latitude/longitude coordinates. Examples of applications include the incorporation of spatial data into spatial regressions (e.g., how is consumer willingness to purchase a product correlated with location?) or simulations (e.g., how would a manufacturing supply chain network perform with sites in different locations?).

21. Statistics: The science of the collection, organization, and interpretation of data, including the design of surveys and experiments. Statistical techniques are often used to make judgments about what relationships between variables could have occurred by chance (the “null hypothesis”), and what relationships between variables likely result from some kind of underlying causal relationship (i.e., that are “statistically significant”). Statistical techniques are also used to reduce the likelihood of Type I errors (“false positives”) and Type II errors (“false negatives”). An example of an application is A/B testing to determine what types of marketing material will most increase revenue.

22. Supervised learning: The set of machine learning techniques that infer a function or relationship from a set of training data. Examples include classification and support vector machines. This is different from unsupervised learning.

23. Simulation: Modeling the behavior of complex systems, often used for forecasting, predicting and scenario planning. Monte Carlo simulations, for example, are a class of algorithms that rely on repeated random sampling, i.e., running thousands of simulations, each based on different assumptions. The result is a histogram that gives a probability distribution of outcomes. One application is assessing the likelihood of meeting financial targets given uncertainties about the success of various initiatives
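A Monte Carlo sketch of the financial-target example: each run draws uncertain outcomes for three hypothetical initiatives (revenues and success probabilities are invented) and the simulation estimates the probability of hitting the target:

```python
import random

def prob_meet_target(n_sims=100_000, target=1000, seed=42):
    """Monte Carlo estimate of the probability that total revenue
    from three uncertain initiatives reaches the target."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # each initiative succeeds with some probability and, if so,
        # contributes its full revenue (a deliberately crude model)
        total = (600 * (rng.random() < 0.8) +
                 400 * (rng.random() < 0.6) +
                 300 * (rng.random() < 0.4))
        hits += total >= target
    return hits / n_sims

print(prob_meet_target())
```

For these assumptions the target is met exactly when the first two initiatives both succeed, so the estimate converges on 0.8 × 0.6 = 0.48; the simulation's value is that it still works when the model is too messy for such a hand calculation.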

24. Time series analysis: Set of techniques from both statistics and signal processing for analyzing sequences of data points, representing values at successive times, to extract meaningful characteristics from the data. Examples of time series analysis include the hourly value of a stock market index or the number of patients diagnosed with a given condition every day. Time series forecasting is the use of a model to predict future values of a time series based on known past values of the same or other series. Some of these techniques, e.g., structural modeling, decompose a series into trend, seasonal, and residual components, which can be useful for identifying cyclical patterns in the data. Examples of applications include forecasting sales figures, or predicting the number of people who will be diagnosed with an infectious disease.
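A centred moving average, one of the simplest trend-extraction steps in time series analysis, sketched on made-up monthly sales with a seasonal wobble around an upward trend:

```python
def moving_average(series, window=3):
    """Centred moving average: smooth short-term fluctuations so the
    trend component of the series becomes visible."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]

sales = [10, 14, 12, 13, 17, 15, 16, 20, 18]
print(moving_average(sales))
```

The smoothed output climbs steadily even though the raw series zig-zags, which is exactly the trend/seasonal separation that structural decomposition generalizes.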

25. Unsupervised learning: A set of machine learning techniques that finds hidden structure in unlabeled data. Cluster analysis is an example of unsupervised learning (in contrast to supervised learning).


26. Visualization: Techniques used for creating images, diagrams, or animations to communicate, understand, and improve the results of big data analyses.

Seen on Big Data made simple

(Book and video tutorial) Storytelling with Data: A Data Visualization Guide for Business Professionals

A very interesting book for anyone interested in visualization and in how to 'tell' stories with data, colors and shapes, a topic very much in fashion under the concept of 'storytelling'.

Below you can also watch this video tutorial, presented by the author:
