Daily Dose of Data Science

Message frequency:  0.21 / day

Message History

Recap

In the previous chapter, we made the transition from tables to parameterized value functions.

The reasons were structural. Tables do not scale, and they do not generalize. In Mountain Car, a state is a pair of real numbers (position and velocity), so there is no finite set of cells to index into. And updating one cell of a Q-table tells us nothing about the cell next to it. The...
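
To make the generalization point concrete, here is a minimal sketch, not the chapter's own code. It compares one update to a binned table with one update to a linear value function over radial basis features. The one-dimensional state in [0, 1], the feature centers, the width, and the step size are all assumptions chosen for illustration.

```python
import math

# Hypothetical 1-D state in [0, 1], standing in for a single continuous
# dimension such as position. All constants below are illustrative choices.
N_BINS = 10
CENTERS = [i / 4 for i in range(5)]  # assumed RBF centers
WIDTH = 0.25                         # assumed RBF width
ALPHA = 0.1                          # assumed step size


def bin_index(s):
    """Map a real-valued state to one cell of a discretized table."""
    return min(int(s * N_BINS), N_BINS - 1)


def features(s):
    """Radial basis features of a real-valued state."""
    return [math.exp(-((s - c) ** 2) / (2 * WIDTH ** 2)) for c in CENTERS]


def value(w, s):
    """Linear value estimate: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, features(s)))


def update(w, s, target):
    """One gradient step that moves value(w, s) toward the target."""
    error = target - value(w, s)
    return [wi + ALPHA * error * xi for wi, xi in zip(w, features(s))]


# Tabular update at s = 0.50: only that one cell changes.
table = [0.0] * N_BINS
cell = bin_index(0.50)
table[cell] += ALPHA * (1.0 - table[cell])

# Parameterized update at s = 0.50: the shared weights change,
# so the estimate at nearby states moves as well.
w = update([0.0] * len(CENTERS), 0.50, target=1.0)

print("table at 0.62:", table[bin_index(0.62)])
print("approximator at 0.62:", value(w, 0.62))
```

After the single update, the table still reports zero for the neighboring state, while the parameterized estimate there has moved, because both states share the same weights.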


Read full story
Recap

In the previous chapter, we entered the model-free setting, where the agent learns purely from interaction.

We began by clarifying what "model-free" means. The environment still has dynamics, of course. The point is that the algorithm does not get to see $P$ or $R$ directly. It only gets to interact, observe rewards, and adapt.
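
The distinction can be sketched in a few lines. The toy environment below is hypothetical and is not taken from the chapter: its transition and reward rules exist, but the learner never reads them. It only calls `step`, observes rewards, and averages what it sees.

```python
import random


class TwoStateEnv:
    """Hypothetical toy environment whose dynamics stay internal."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        # Internal dynamics: the chosen action (0 or 1) becomes the next
        # state with probability 0.8, otherwise the other state is reached.
        if self._rng.random() < 0.8:
            self.state = action
        else:
            self.state = 1 - action
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward


def estimate_mean_rewards(env, n_steps=5000, seed=1):
    """Average the observed reward per action, using samples only."""
    rng = random.Random(seed)
    totals = [0.0, 0.0]
    counts = [0, 0]
    for _ in range(n_steps):
        action = rng.randrange(2)
        _, reward = env.step(action)
        totals[action] += reward
        counts[action] += 1
    return [t / c for t, c in zip(totals, counts)]


estimates = estimate_mean_rewards(TwoStateEnv())
print(estimates)
```

The estimates approach the environment's true expected rewards even though the probabilities inside `step` were never exposed to the estimator. This sketch only averages immediate rewards; it is meant to show the interaction pattern, not a full control algorithm.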

We then covered Monte Carlo me...


Read full story
Recap

In the previous chapter, we explored the recursive structure that sits at the heart of reinforcement learning: the Bellman equations.

We started with the Bellman expectation equations for $v_\pi$ and $q_\pi$. We saw that the value of a state, under a fixed policy, equals the expected immediate reward plus the discounted value of the next state. The equations gave ...
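
For reference, the relationship described in words above can be written as follows. The notation is an assumption, since the excerpt does not show the chapter's own symbols: $\gamma$ is the discount factor, $R_{t+1}$ the immediate reward, and $S_t$, $A_t$ the state and action at time $t$.

$$
v_\pi(s) = \mathbb{E}_\pi\left[ R_{t+1} + \gamma \, v_\pi(S_{t+1}) \mid S_t = s \right]
$$

$$
q_\pi(s, a) = \mathbb{E}_\pi\left[ R_{t+1} + \gamma \, q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a \right]
$$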


Read full story

Hermes Agent crossed 90,000 GitHub stars in two months. Developers are quietly building personal AI agents that learn their workflow, remember their context, and run 24/7.

Hermes Agent takes a fundamentally different approach, which makes it much more practical than OpenClaw. It ships with a learning loop that:

Remembers across sessions
Writes its own reusabl...

Read full story
Recap

In the previous chapter, we formalized the agent-environment interaction as a Markov decision process (MDP).

We began with the Markov property, which states that the future depends on the past only through the present state. This is the assumption that makes the entire framework tractable: once you know the current state, you can discard the history.
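
Stated formally, under the assumed notation $S_t$ for the state and $A_t$ for the action at time $t$ (the excerpt does not show the chapter's own symbols), the Markov property says that conditioning on the full history adds nothing beyond the current state and action:

$$
\Pr\left( S_{t+1} = s' \mid S_t, A_t \right) = \Pr\left( S_{t+1} = s' \mid S_0, A_0, S_1, A_1, \dots, S_t, A_t \right)
$$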

We then...


Read full story