Brief recap of the Open Data Science Conference - San Francisco, October 2019
On AI ROI: the questions you need to be asking - Kerstin Frailey, Metis
Success is unpredictable in AI – feasibility is often unknown before a project has begun. Projects are esoteric – they require highly specialized training. Application is new – methods to track ROI haven’t been adjusted for AI; managing AI is a challenge.
Performance is volatile – and there’s an iterative lifecycle. Feedback loops and responses to AI interventions speed up the expiration of data and its dependent models.
Targets are fuzzy – executives don’t have experience with AI projects, so they can’t set clear expectations. Data scientists inevitably miss the intangible goals.
Data science teams strive to achieve good data science, which doesn’t always translate to achieving business goals.
Data scientists are not informed of the strategy of the business.
AI ROI is urgent – companies continue to double down on investments even without seeing a return on them. Investments will dry up.
Planning – what type of ROI should we expect? Some projects are explorations for gaining knowledge, some make recommendations or inform decisions, and some are automated interventions; e.g. a fraud prevention project intervenes to stop fraudulent transactions from going through. Identifying fraudulent transactions isn’t impactful in itself – reducing the costs due to fraud is the desired impact. Sanity check – is this a pure automation play? Could descriptive statistics do the job without AI? Does it just seem fun but promise no clear impact? Is it more cost-effective than pre-existing or vendor-based solutions? Ensure that the project is strategically positioned, and ask whether the system can be leveraged elsewhere.
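To make the fraud example concrete, here is a rough back-of-the-envelope sketch in Python of what "reducing the costs due to fraud" could look like as a number. The function name, the cost terms and every figure are hypothetical and only for illustration; a real estimate would use the business's own loss and cost data.

```python
def fraud_project_net_savings(
    fraud_losses_per_year: float,
    share_of_fraud_blocked: float,
    legit_transactions_blocked: int,
    cost_per_blocked_legit: float,
    annual_running_cost: float,
) -> float:
    """Net yearly savings: avoided fraud losses minus false-alarm and running costs."""
    avoided_losses = fraud_losses_per_year * share_of_fraud_blocked
    # Blocking a legitimate transaction has a cost too (support calls, lost sales).
    false_alarm_cost = legit_transactions_blocked * cost_per_blocked_legit
    return avoided_losses - false_alarm_cost - annual_running_cost


# Hypothetical figures, for illustration only.
net = fraud_project_net_savings(
    fraud_losses_per_year=1_000_000.0,
    share_of_fraud_blocked=0.5,
    legit_transactions_blocked=2_000,
    cost_per_blocked_legit=25.0,
    annual_running_cost=200_000.0,
)
print(f"Estimated net savings per year: {net:,.0f}")
```

The point of the sketch is that the model's detection rate is only one input; the business impact also depends on what false alarms and operations cost.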
Development – what kind of errors do we prefer? When we compare models and present them to the business, we have to show we’ve thought about this. What kind of volatility can we handle? Volatility in performance is expected; ask your stakeholders what they can accept. Benchmark – simulate, use ghost mode and a control group to see how performance compares. Sanity check – is the solution user-friendly? Can the solution be scaled to address the entirety of the problem? Are we building feedback loops? Will the data become stale quickly? Will we need to update this model often?
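One way to show the business that error preferences have been thought through is to put a cost on each kind of error and pick the decision threshold that minimizes the total. The Python sketch below is a minimal illustration and not something presented in the talk; the scores, labels, candidate thresholds and costs are all made-up assumptions.

```python
from typing import Sequence


def error_cost(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float,
    cost_fp: float,
    cost_fn: float,
) -> float:
    """Total cost of false positives and false negatives at a given threshold."""
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return false_pos * cost_fp + false_neg * cost_fn


def pick_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    candidates: Sequence[float],
    cost_fp: float,
    cost_fn: float,
) -> float:
    """Return the candidate threshold with the lowest total error cost."""
    return min(
        candidates,
        key=lambda t: error_cost(scores, labels, t, cost_fp, cost_fn),
    )


# Toy model scores and true labels (1 = positive class), for illustration only.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]
candidates = [0.3, 0.5, 0.7]

# If missing a positive is ten times worse than a false alarm, a low threshold wins.
print(pick_threshold(scores, labels, candidates, cost_fp=1.0, cost_fn=10.0))
# If false alarms are the expensive error, a high threshold wins.
print(pick_threshold(scores, labels, candidates, cost_fp=10.0, cost_fn=1.0))
```

The same model yields different operating points depending on which error the stakeholders would rather live with, which is why the question has to be asked before models are compared.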
Deployment – is it performing close to expectations? Are business metrics moving as expected in response to the data science metrics? Is variability close to expectations?
Governance – sanity check: are we monitoring feedback loops? Are we iterating to the point of overoptimizing? Are we duct-taping updates and iterations?
Peter Welinder OpenAI - Learning Robot Dexterity
In order to interact with the world, we have to make contact with it. OpenAI wants to combine learning and manipulation to allow robots to carry out useful tasks. They decided to test on a Rubik’s Cube – they can do this in a fraction of a second
Past – high robotics expertise
Future – all you need is learning
- Deep reinforcement learning – learning by trial and error; like teaching a dog. Drawback is that it takes a long time to get results
- Simulation to reality
- Cool results
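To give a feel for "learning by trial and error", here is a toy Python sketch. It is not the deep reinforcement learning used for the robot hand; it swaps in a much simpler technique, an epsilon-greedy multi-armed bandit, and the success probabilities, step count and seed are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence


def run_epsilon_greedy(
    success_probs: Sequence[float],
    steps: int = 5000,
    epsilon: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """Estimate the value of each action purely from trial and error."""
    rng = random.Random(seed)
    n_actions = len(success_probs)
    pulls = [0] * n_actions
    value_estimates = [0.0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: take the action that currently looks best.
            action = max(range(n_actions), key=lambda a: value_estimates[a])
        # The environment returns a reward of 1 or 0 (hypothetical probabilities).
        reward = 1.0 if rng.random() < success_probs[action] else 0.0
        pulls[action] += 1
        # Incremental running average of the rewards seen for this action.
        value_estimates[action] += (reward - value_estimates[action]) / pulls[action]
    return value_estimates


estimates = run_epsilon_greedy([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("Estimated values:", [round(v, 2) for v in estimates])
print("Action preferred after training:", best_action)
```

Even in this tiny setting the learner needs thousands of trials to settle on the best action, which hints at why trial-and-error learning takes a long time on a real task.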
Harry Glaser Sisense - sources of bias: strategies for tackling inherent bias in AI
AI judge developed by UCL computer scientists
AI could identify gang crimes and ignite an ethical firestorm
False positives are a societal issue here; a machine treats a specific geography as “gang related”, which results in severe punishments
Another example is using facial recognition during TSA security checks. This was an issue for people of Asian descent; you have to build a team that represents the wider world that you plan to apply your model to; the broader and more diverse the team you can get, the better the outcomes of the models.
AI left unchallenged runs a strong risk of delivering immoral outcomes. It is your job and responsibility as a data professional to use your skills to be the moral compass of your organization and make it right.
Turning data teams into superheroes – Sisense – think about human outcomes
Google employees quit over controversial Pentagon work; targeting ads vs. targeting drone strikes – they got to the right outcome because the engineers who developed the system took ownership and knew the difference between the outcomes
How can you incorporate data ethically?
Holistic metrics – grade yourself on metrics you think match well with societal outcomes
Representative & diverse teams – key to building models with a positive outcome on society
Check your sources – get diverse sources and consider the bias in your data
Who do you report to?
Data professionals reporting directly to sales lean towards the biases of sales objectives. Centralized data teams are more likely to remove bias.
You need a Chief Data Officer who thinks holistically about the ethical use of data; the CDO is the conscience of the organization.
You are the conscience of AI – this is your responsibility
Building AI Products: delivery vs. discovery
Companies face challenges with getting data science to work for them:
Information technology – integrate, deploy and manage finished products in production
Software engineering – design and code new products using best practices
Data engineering – build data pipelines that collect, organize and validate data
Data science – discover the unknown patterns in data and algorithms that add business value
Data science is different – it is cross-functional (engineering, product, marketing, finance) and must work autonomously – separate from the traditional engineering product lifecycle, self-organizing and self-managing. It’s also experimental – form a hypothesis, analyze data, make predictions, run backtests and A/B tests. It’s also self-sustaining – not a cost center; it generates revenue
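As a small illustration of the experimental loop, the Python sketch below checks an A/B test result with a two-proportion z-test. The talk did not prescribe a particular test; this is one common choice, and the conversion counts are invented for the example.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    conv_a: int, n_a: int, conv_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0.0:
        raise ValueError("No variation in outcomes; the test is undefined.")
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical experiment: 1,000 users per variant.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A quick check like this fits the "make tests, not plans" spirit: state the hypothesis, run the experiment, and let the numbers decide whether to iterate.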
The problem is that companies hoard data; data stores are a cost sink.
Rapid prototyping is key for data science; back-of-the-envelope calculations; simple experiments; don’t make plans – make tests. Repeat until it works.
Kirk Borne Adapting Machine Learning Algorithms to Novel Use Cases
It’s not about telling the business about the coolness of your algorithm – it’s about connecting it to their needs and using storytelling to do this
Innovations are inspired by data, informed by data, enabled by data, and create value from data
Confucius says “study your past to know your future” –machine learning
Travel sites raise prices for Mac users because they assume they make more money and would be willing to pay more
Most important thing in your data is metadata
Hilary Mason – getting specific about algorithm bias
Facial recognition products fail to do a good job with darker shades of skin; accuracy was 99.7% for white males versus 65.3% for darker-skinned females.
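A gap like this only shows up when accuracy is broken down by group rather than reported as one overall number. The Python sketch below is a minimal illustration of that breakdown; the group names and the tiny dataset are invented and are not the figures quoted in the talk.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy_by_group(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Compute accuracy separately for each group label."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Toy data, for illustration only: the model is right 6 times out of 8 overall.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = accuracy_by_group(y_true, y_pred, groups)
print("Overall accuracy:", overall)
print("Accuracy by group:", per_group)
```

Here the overall figure looks acceptable while one group does far worse, which is the pattern an aggregate metric hides.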
Sources of bias enter at different stages; machine learning can amplify bias
People are more likely to assume algorithms are objective or error-free – even if they’re given the option of a human override
Algorithms are more likely to be implemented with no appeals process in place
Algorithms are often used at scale
The privileged are processed by people; the poor are processed by algorithms (Cathy O’Neil)